skip to Main Content

I am new into databases and PostgreSQL and today I wanted to import this dataset: https://www.kaggle.com/datasets/wilmerarltstrmberg/recipe-dataset-over-2m/data into a PostgreSQL so I can make a recipe finder app.

I started by creating a new database, then creating a table and then creating the exact columns from the dataset:

create table recipes_table (
title text,
    ingredients text,
    directions text,
    link text,
    source text,
    NER text,
    site text
);

Then I proceeded to import all the data from my .csv (by rightclicking on the table and selecting import). I left everything on default and started importing but after 10 seconds I always get errors.

Unfortunately I lost track of every modification I did to my code and table (there were a lot), but I’ll leave the results I had from the processes tab:

--command " "\copy public.recipes_table (title, ingredients, directions, link, source, "NER", site) FROM 'C:/Users/silvi/Desktop/WEBPRO~1/archive/RECIPE~1.CSV' DELIMITER ',' CSV HEADER QUOTE '"' ESCAPE '''';""

ERROR: unquoted newline found in data HINT: Use quoted CSV field to represent newline. CONTEXT: COPY recipes_table, line 1274879 – first iteration;
I opened my terminal in VS Code and did this to solve it: sed -i ‘s/r//g’ recipes_data.csv

And then, with this command:
--command " "\copy public.recipes_table (title, ingredients, directions, link, source, "NER", site) FROM 'C:/Users/silvi/Desktop/WEBPRO~1/archive/RECIPE~1.CSV' DELIMITER ',' CSV HEADER QUOTE '"' ESCAPE '''';""

I got this error

ERROR: extra data after last expected column CONTEXT: COPY recipes_table, line 1274879: "Spaghetti Sauce,"[""1-2 lb. hamburger"", ""1-2 lb. Italian Sausage"", ""2 clove garlic"", ""salt"", ..."

Or this similar one:
ERROR: extra data after last expected column CONTEXT: COPY recipes_table, line 2: "No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. evaporated milk"", ""1/2 tsp. van..."

Now I’m as confused as I can get. How can I import this table in my pgAdmin app so I can visualize it and use it in my project?

And even more, I don’t need every column from this .csv so how can I get for example only the title, ingredients, directions and NER?

Thank you all for your help!

2

Answers


  1. Looking at your error code it shows ESCAPE '''' and I suspect that is causing the issue (including the one you band-aided with the sed command).

    The first two lines of this file are:

    title,ingredients,directions,link,source,NER,site
    No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. evaporated milk"", ""1/2 tsp. vanilla"", ""1/2 c. broken nuts (pecans)"", ""2 Tbsp. butter or margarine"", ""3 1/2 c. bite size shredded rice biscuits""]","[""In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine."", ""Stir over medium heat until mixture bubbles all over top."", ""Boil and stir 5 minutes more. Take off heat."", ""Stir in vanilla and cereal; mix well."", ""Using 2 teaspoons, drop and shape into 30 clusters on wax paper."", ""Let stand until firm, about 30 minutes.""]",www.cookbooks.com/Recipe-Details.aspx?id=44874,Gathered,"[""bite size shredded rice biscuits"", ""vanilla"", ""brown sugar"", ""nuts"", ""milk"", ""butter""]",www.cookbooks.com
    

    The ingredients column is encapsulated in double quotes " characters, HOWEVER there are double quote characters also inside the data, so they have escaped those double quotes with a second double quote.

    A simple example of what this looks like: Imagine your data is Hi, they call me "Joe", in this data that would be encapsulated with double quotes, and the already existing double quotes would be doubled to indicate that they are part of the data, so:

     "Hi, they call me ""Joe"""
    

    Therefore your ESCAPE character is " not '.

    What I would do is download the file once again so you have a fresh copy that has been tampered with by sed and the set everything up the same, but change the ESCAPE character to " and rerun the import. This will change that command line run to look like ESCAPE '"'

    You may find it helpful to read through the postgres COPY command documentation to understand all of the options. Even if you are using a UI to set this up, that COPY command is being ran under the hood: https://www.postgresql.org/docs/current/sql-copy.html

    Lastly, the ingredient, directions and NER columns look to be JSON arrays, so you may want to think about how that affects your schema and app and treat them appropriately (not as text).

    Login or Signup to reply.
  2.     create table recipes_table (
    title text,
        ingredients text,
        directions text,
        link text,
        source text,
        NER text,
        site text
    );
    

    Then:

    copy recipes_table(title, ingredients, directions, link, source, NER, site)
        from  'C:/Users/silvi/Desktop/WEBPRO~1/archive/RECIPE~1.CSV' CSV HEADER;
    

    PS: I downloaded and tested the same:

    2,231,142 rows affected in 26 s 752 ms

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search