So using a script, with Python, in my CSV file, I want to see if IP and timestamp values exists in some way in the line entries of the JSON log file, and if so, return that specific JSON log entry to another file. I tried to make it universal so it’s applicable to all IP addresses. Here’s what the sample CSV file would look like;
"clientip",""destip","dest_hostname","timestamp"
"127.0.0.1","0.0.0.0","randomhost","2023-09-09T04:18:22.542Z"
A sample line entry from the Json Log File
{"log": "09-Sept-2023 rate-limit: info: client @xyz 127.0.0.1, "stream":"stderr", "time": 2023-09-09T04:18:22.542Z"}
It’s the lines from the JSON log file we want to return in the output.txt file when there’s a match. The JSON file doesn’t have the same fields and organization like the CSV does (with clientip, destip, dest_hostname, timestamp, but I was hoping that I could still at least return lines from the JSON log files to a new file that had matches on the clientip (like we see here with 127.0.0.1 in "info: client @xyz 127.0.0.1) and maybe the timestamp.
I tried shell previously but could not get any matches. I tried the join command join file.csv xyz-json.log > output.txt
but it didn’t yield anything, neither did awk
with specification like "NR==FR".
That’s why I’m trying to get this done in Python now. I’m new to Python as well, but this is what I roughly had in mind, ignoring indentation for now.
import csv
for line in csv
for line in json-logs
if csv == json-logs
print l1 == l2
I would appreciate any help/assistance with this!
2
Answers
One possibility would be to read both csv and json files into a dataframe; extract any
ip
values from the jsonlog
then do an inner merge from the json file onip
andtime
and output rows remaining after the merge:Here is some code structure to get you started. I had to take some liberties with the CSV and JSON you provided as they don’t appear to be well formed (as already mentioned in comments), so I coerced them to make them usable.
If you are able to post some better examples of the CSV and JSON I can update this to match, presuming the structures I guessed are inaccurate. Feel free to ask for additional guidance on any part of this.