I am trying to store a large number of news articles (eventually maybe more than several thousand) that I am using in a Python script. For convenience, I would like to store them in a 2D array within a text file, like [[ID, Title, Article], [1, "Bill's Burgers", "The owner, Bill, makes good burgers."]]. However, the solutions I find online require some character such as a comma, space, newline, etc. to delimit the entries. As these commonly appear in news articles, I can't use them to delimit elements.
I tried to format my 2D array using json, but found this didn't do anything to my array. When printing/opening the txt file, it appears exactly as when I declare it: [["ID", "URL", "Title", "Date", "Article"], ["1", "2", "3", "4", "5"]]. My code is as follows:
import json

scraped_articles_array_headings = [["ID", "URL", "Title", "Date", "Article"], ["1", "2", "3", "4", "5"]]
headings_encoded = json.dumps(scraped_articles_array_headings)
print(headings_encoded)
f = open("articles_encoded2.txt", "w", encoding="utf-8")
f.write(headings_encoded)
f.close()
How am I misusing json here? I would welcome any suggestions regarding a suitable approach to storing this data. Ideally, I just want a system that allows for easy searching of the contents of each parameter (ID, Title, etc.), and I appreciate the above approach might not be following a sensible path to achieve this.
2 Answers
I think you may want to look at an example of a JSON-formatted document.
Keep in mind that this is different from the CSV format, which is line-based and uses a delimiter, as you've mentioned.
To store those values in JSON you may want to do something like this:
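A minimal sketch of that idea: one object per article, keyed by field name, using the fields from the question (the URL and date values are placeholders for illustration):

import json

articles = [
    {
        "ID": "1",
        "URL": "https://example.com/bills-burgers",
        "Title": "Bill's Burgers",
        "Date": "2024-01-01",
        "Article": "The owner, Bill, makes good burgers.",
    },
]

# json.dump writes the whole list as one JSON document; quoting and
# escaping are handled for you, so commas and newlines in the article
# text are not a problem.
with open("articles.json", "w", encoding="utf-8") as f:
    json.dump(articles, f, ensure_ascii=False, indent=2)

# json.load gives the same list of dicts back, so you can search a
# single field, e.g. [a for a in data if "burgers" in a["Article"]].
with open("articles.json", encoding="utf-8") as f:
    data = json.load(f)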
For storing large amounts of data you may want to consider a database, which improves performance and also allows for querying (e.g. finding all articles from a certain URL).
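For example, here is a minimal sketch using Python's built-in sqlite3 module (the database file name, table name and column layout are assumptions based on the fields in the question):

import sqlite3

conn = sqlite3.connect("articles.db")
# One row per article, one column per field.
conn.execute(
    "CREATE TABLE IF NOT EXISTS articles "
    "(id INTEGER PRIMARY KEY, url TEXT, title TEXT, date TEXT, article TEXT)"
)
conn.execute(
    "INSERT INTO articles (url, title, date, article) VALUES (?, ?, ?, ?)",
    ("https://example.com/bills-burgers", "Bill's Burgers", "2024-01-01",
     "The owner, Bill, makes good burgers."),
)
conn.commit()

# Querying a single field, e.g. all articles from a certain URL.
for row in conn.execute("SELECT id, title FROM articles WHERE url = ?",
                        ("https://example.com/bills-burgers",)):
    print(row)
conn.close()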
You can have articles that contain "comma, space, newline, etc" and still use common formats that use them for delimiters. An example using the Python csv module:
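This is a minimal sketch reusing the sample data from the question (the URL and date values are placeholders for illustration); the resulting file and the read-back output shown below follow from it:

import csv

rows = [
    ["ID", "URL", "Title", "Date", "Article"],
    ["1", "https://example.com/bills-burgers", "Bill's Burgers",
     "2024-01-01", "The owner, Bill, makes good burgers."],
]

# The csv module quotes any field that contains the delimiter, so the
# commas inside the article text are preserved.
with open("articles.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the data back to show that the fields survive intact.
with open("articles.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)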
Resulting articles.csv:
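ID,URL,Title,Date,Article
1,https://example.com/bills-burgers,Bill's Burgers,2024-01-01,"The owner, Bill, makes good burgers."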
Output after reading data back:
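['ID', 'URL', 'Title', 'Date', 'Article']
['1', 'https://example.com/bills-burgers', "Bill's Burgers", '2024-01-01', 'The owner, Bill, makes good burgers.']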