scrape a file and delete lines with a certain range of numbers - Ubuntu

fonzman
March 27, 2023
147 views
2 votes
3 Answers

I am trying to scrape a file with content similar to this:

"addressString":"12366 NY","eId":"64174f8e42b7fdfb837f68b","hasImage":false,"Price":5800,"Name":Bernard Bernoulli,"headline":"nice Fiat 500, red, slight damage to left mirror"
"addressString":"451 Citadel","eId":"sd3448e42b7368b","year":1976,"hasImage":true,"Price":12220,"Name":Edward Diego,"headline":"Mercedes SLX, no issues"
"addressString":"1321 Bejing","eId":"3102ffdb837fssdff3","Price":350,"Name":Jet Li,"headline":"Dodge Viper, no engine, no tires, no windshield; only cash"

I want to delete all lines whose "Price" is lower than 950.

Also, the number and position of the named fields (like "Price", "year", "Name", etc.) differ from line to line.

I tried sed with a regex that matches the numbers 0 to 950:

sed '/"Price":([0-9]|[1-9][0-9]|[1-8][0-9]{2}|9[0-4][0-9]|950),/d' <inputfile >outputfile

…but it did not work.

Any help is appreciated.
I am using sed on Ubuntu Linux 20.04.

Answers

- markalex
- March 27, 2023 at 9:54 am
- 0 votes
That’s because you did not pass the -E flag to sed.

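Without -E, sed uses BRE (basic regular expressions), in which parentheses have no special meaning unless they are escaped with \. The same goes for | and {}: in GNU sed's BRE they have to be written as \| and \{ \}.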

You can either escape all of these metacharacters with \ or use the -E option, which switches sed to ERE (extended regular expressions).
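A minimal sketch of both fixes, reusing the asker's regex unchanged. The function names drop_cheap_ere and drop_cheap_bre are hypothetical, and the BRE variant assumes GNU sed (as on Ubuntu), since \| is a GNU extension. Both read stdin and write stdout.

```shell
#!/bin/sh
# Hypothetical helper names; both apply the question's regex to stdin
# and delete every matching line.

# ERE (-E): ( ), | and { } are special without backslashes.
drop_cheap_ere() {
  sed -E '/"Price":([0-9]|[1-9][0-9]|[1-8][0-9]{2}|9[0-4][0-9]|950),/d'
}

# BRE (GNU sed): the same regex with the metacharacters backslash-escaped.
drop_cheap_bre() {
  sed '/"Price":\([0-9]\|[1-9][0-9]\|[1-8][0-9]\{2\}\|9[0-4][0-9]\|950\),/d'
}

# Example: drop_cheap_ere <inputfile >outputfile
```

Note that the |950 alternative also deletes a price of exactly 950; drop it if only prices strictly below 950 should go.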


This almost looks like JSON data; with a few small tweaks it is, so you can parse it with jq, e.g.:

<infile sed -E 's/"Name":([^,]+)/"Name":"\1"/; s/^/{ /; s/$/ }/' |
jq 'select(.Price >= 950)'

Output:

{
  "addressString": "12366 NY",
  "eId": "64174f8e42b7fdfb837f68b",
  "hasImage": false,
  "Price": 5800,
  "Name": "Bernard Bernoulli",
  "headline": "nice Fiat 500, red, slight damage to left mirror"
}
{
  "addressString": "451 Citadel",
  "eId": "sd3448e42b7368b",
  "year": 1976,
  "hasImage": true,
  "Price": 12220,
  "Name": "Edward Diego",
  "headline": "Mercedes SLX, no issues"
}
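If the result should stay one record per line, as in the input, a variant along these lines might work. It is a sketch with a hypothetical function name, keep_by_price_jq; it uses jq's -c (compact output) option and then strips the braces again. Name values stay quoted in the output, the first sed step assumes a Name never contains a comma, and lines without a Price are dropped too (in jq, null >= 950 is false).

```shell
#!/bin/sh
# Hypothetical helper; requires jq. Reads stdin, writes stdout.
keep_by_price_jq() {
  # Quote the bare Name value and wrap each line in braces so it parses as JSON.
  sed -E 's/"Name":([^,]+)/"Name":"\1"/; s/^/{/; s/$/}/' |
  # Keep objects whose Price is at least 950; -c prints one object per line.
  jq -c 'select(.Price >= 950)' |
  # Strip the braces again to get close to the original line format.
  sed 's/^{//; s/}$//'
}

# Example: keep_by_price_jq <infile >outfile
```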

Don’t use regexps for numeric comparisons; use numeric comparisons. For example, to print the lines whose Price is below 950 (the ones you want to delete), using GNU awk for the 3rd argument to match():

$ awk 'match($0,/"Price":([^,]+)/,a) && (a[1] < 950)' file
"addressString":"1321 Bejing","eId":"3102ffdb837fssdff3","Price":350,"Name":Jet Li,"headline":"Dodge Viper, no engine, no tires, no windshield; only cash"

or using any awk:

$ awk 'match($0,/"Price":([^,]+)/) && (substr($0,RSTART+8,RLENGTH-8)+0 < 950)' file
"addressString":"1321 Bejing","eId":"3102ffdb837fssdff3","Price":350,"Name":Jet Li,"headline":"Dodge Viper, no engine, no tires, no windshield; only cash"
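Both commands above print the lines to be deleted. To delete them instead and keep everything else, the condition can be inverted. The sketch below does that in POSIX awk under the hypothetical name drop_cheap_awk, and it also keeps lines that have no Price field at all, which is an assumption about what the asker wants.

```shell
#!/bin/sh
# Hypothetical helper: print every line except those whose Price is below 950.
# Accepts file names as arguments, or reads stdin when none are given.
drop_cheap_awk() {
  awk '
    # Lines with a Price field: the number starts 8 characters after the
    # match start ("Price": is 8 characters); +0 forces a numeric value.
    match($0, /"Price":[^,]+/) {
      price = substr($0, RSTART + 8, RLENGTH - 8) + 0
      if (price < 950) next   # delete this line
    }
    { print }                 # keep everything else, including lines without Price
  ' "$@"
}

# Example: drop_cheap_awk file >outputfile
```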
