
Json – How can I process only files with a certain name?

PeterPark
January 21, 2023
290 views
0 votes
2 Answers

I am writing Python code that automatically processes and combines JSON datasets.
When I look inside each folder, there are two dataset files, for example:

download/2019/201901/dragon.csv
download/2019/201901/kingdom.csv

The file names are the same across all folders; in other words, every folder contains two datasets with the names above.
The ‘download’ folder contains four folders (2019, 2020, 2021, 2022), and
each year folder contains one folder per month, e.g., 2019/201901, 2019/201902, and so on.
In this situation, I want to process only the ‘dragon.csv’ files. How can I do that? My current code is:

import os
import pandas as pd
import numpy as np

path = 'download/2019'
save_path = 'download'

class Preprocess:
    
    def __init__(self, path, save_path):  
        self.path = path
        self.save_path = save_path

After the processing is done, I save the results with:

def save_dataset(path, save_path):

    for dir in os.listdir(path):
        for file in os.listdir(os.path.join(path, dir)):
            if file[-3:] == 'csv':
                df = pd.read_csv(os.path.join(path, dir, file))
                print(f'Reading data from {os.path.join(path, dir, file)}')

                print('Start Preprocessing...')
                df = preprocessing(df)
                print('Finished!')
                
                if not os.path.exists(os.path.join(save_path, dir)):
                    os.makedirs(os.path.join(save_path, dir))
                df.to_csv(os.path.join(save_path, dir, file), index=False)

save_dataset(path, save_path)

Answers

- ljmc
- January 21, 2023 at 4:41 pm
- 0 votes
You can use pathlib’s glob method:
```
from pathlib import Path

p = Path()  # current directory; point this at the folder containing `download` if you are elsewhere

dragon_paths = p.glob("download/**/dragon.csv")
```
dragon_paths is a generator that yields the path of every dragon.csv file under the download folder.

PS. You should avoid shadowing the built-in dir; call your variable dir_ or d instead.
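
One way the glob approach could be wired into the save step from the question is sketched below. The helper names find_named_files and mirrored_output_path, and the idea of writing to a separate output folder, are assumptions made for illustration; they are not part of the original code.

```python
from pathlib import Path


def find_named_files(root, filename):
    """Return every file called `filename` anywhere below `root`, sorted by path."""
    # rglob(name) searches recursively, like glob("**/" + name).
    return sorted(p for p in Path(root).rglob(filename) if p.is_file())


def mirrored_output_path(src, root, save_root):
    """Map a file under `root` to the same relative location under `save_root`."""
    # Hypothetical layout: keep the year/month sub-folders in the output tree.
    return Path(save_root) / Path(src).relative_to(root)
```

Each path returned by find_named_files("download", "dragon.csv") could then be passed to pd.read_csv, and the processed frame written to the mirrored location after creating its parent folder with mkdir(parents=True, exist_ok=True).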

- FrederikKliemt
- January 21, 2023 at 4:43 pm
- 0 votes
If I understand your question correctly, you only want to process files whose names include the substring "dragon". You can do this by adding a condition to your if statement: instead of if file[-3:] == 'csv', write if file[-3:] == 'csv' and 'dragon' in file.
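
Note that a substring check would also match a name such as dragonfly.csv. If only the exact file is wanted, comparing the whole name may be safer. A minimal sketch of that variant follows, assuming the one-level folder layout from the question; iter_target_files and its target parameter are hypothetical names.

```python
import os


def iter_target_files(path, target="dragon.csv"):
    """Yield (subdir, filename) for files named exactly `target` one level below `path`."""
    for d in sorted(os.listdir(path)):
        subdir = os.path.join(path, d)
        if not os.path.isdir(subdir):
            continue  # skip stray files sitting next to the month folders
        for file in sorted(os.listdir(subdir)):
            if file == target:  # exact match instead of a substring test
                yield d, file
```

The loop in save_dataset could iterate over iter_target_files(path) and keep its read, preprocess, and save steps unchanged.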
