I am trying to scrape a file with content similar to this:

"addressString":"12366 NY","eId":"64174f8e42b7fdfb837f68b","hasImage":false,"Price":5800,"Name":Bernard Bernoulli,"headline":"nice Fiat 500, red, slight damage to left mirror"
"addressString":"451 Citadel","eId":"sd3448e42b7368b","year":1976,"hasImage":true,"Price":12220,"Name":Edward Diego,"headline":"Mercedes SLX, no issues"
"addressString":"1321 Bejing","eId":"3102ffdb837fssdff3","Price":350,"Name":Jet Li,"headline":"Dodge Viper, no engine, no tires, no windshield; only cash"

I want to delete all lines with a "Price": lower than 950.

Also, the number and position of named fields (like "Price", "year", "Name", etc.) differ from line to line.

I tried sed with a regex that matches the numbers from 0 to 950:

sed '/"Price":([0-9]|[1-9][0-9]|[1-8][0-9]{2}|9[0-4][0-9]|950),/d' <inputfile >outputfile

…but it did not work.

Any help is appreciated.
Using sed on Ubuntu Linux 20.04

3 Answers


  1. That’s because you did not pass the -E flag to sed.

    Without it, sed uses basic regular expressions (BRE), in which parentheses and braces have no special meaning unless they are escaped with \ (in GNU sed the same goes for |).

    You can either escape every such operator with \ or pass the -E option to switch to extended regular expressions (ERE).
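
    As a sketch of both options, the question's pattern is run below once with -E and once as an escaped BRE. The three sample records are shortened, made-up lines in the question's format, and \| in a BRE is a GNU extension, so the second form assumes GNU sed (as shipped with Ubuntu):

    ```shell
    # Hypothetical sample records in the question's format (shortened).
    sample=$(printf '%s\n' \
      '"eId":"a1","Price":5800,"Name":Bernard Bernoulli,"headline":"nice Fiat 500"' \
      '"eId":"b2","Price":350,"Name":Jet Li,"headline":"Dodge Viper"' \
      '"eId":"c3","Price":950,"Name":Edward Diego,"headline":"Mercedes SLX"')

    # Option 1: -E enables ERE, so ( ) | { } act as operators without escaping.
    kept_ere=$(printf '%s\n' "$sample" |
      sed -E '/"Price":([0-9]|[1-9][0-9]|[1-8][0-9]{2}|9[0-4][0-9]|950),/d')

    # Option 2: stay in BRE and backslash-escape every operator (\| needs GNU sed).
    kept_bre=$(printf '%s\n' "$sample" |
      sed '/"Price":\([0-9]\|[1-9][0-9]\|[1-8][0-9]\{2\}\|9[0-4][0-9]\|950\),/d')

    printf '%s\n' "$kept_ere"
    ```

    Both commands drop the 350 and 950 records and keep the 5800 one. The pattern requires a comma right after the number, so it assumes "Price" is never the last field on a line.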

  2. This almost looks like JSON data, and with some small tweaks it is, so you can parse the data with jq, e.g.:

    <infile sed -E 's/"Name":([^,]+)/"Name":"\1"/; s/^/{ /; s/$/ }/' |
    jq 'select(.Price > 950)'
    

    Output:

    {
      "addressString": "12366 NY",
      "eId": "64174f8e42b7fdfb837f68b",
      "hasImage": false,
      "Price": 5800,
      "Name": "Bernard Bernoulli",
      "headline": "nice Fiat 500, red, slight damage to left mirror"
    }
    {
      "addressString": "451 Citadel",
      "eId": "sd3448e42b7368b",
      "year": 1976,
      "hasImage": true,
      "Price": 12220,
      "Name": "Edward Diego",
      "headline": "Mercedes SLX, no issues"
    }
    
  3. Don’t use regexps for numeric comparisons; use numeric comparisons, e.g. using GNU awk for the 3rd arg to match():

    $ awk 'match($0,/"Price":([^,]+)/,a) && (a[1] < 950)' file
    "addressString":"1321 Bejing","eId":"3102ffdb837fssdff3","Price":350,"Name":Jet Li,"headline":"Dodge Viper, no engine, no tires, no windshield; only cash"
    

    or using any awk:

    $ awk 'match($0,/"Price":([^,]+)/) && (substr($0,RSTART+8,RLENGTH-8)+0 < 950)' file
    "addressString":"1321 Bejing","eId":"3102ffdb837fssdff3","Price":350,"Name":Jet Li,"headline":"Dodge Viper, no engine, no tires, no windshield; only cash"
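
    The commands above print the lines whose Price is below 950. To get the filtered file the question asks for, the condition can be negated, as in this sketch for any POSIX awk. The sample records are made up, and keeping lines that have no "Price" field at all is an assumption, since the question does not say what should happen to them:

    ```shell
    # Hypothetical sample records; the last one has no "Price" field.
    records=$(printf '%s\n' \
      '"eId":"a1","Price":5800,"Name":Bernard Bernoulli' \
      '"eId":"b2","year":1976,"Price":12220,"Name":Edward Diego' \
      '"eId":"c3","Price":350,"Name":Jet Li' \
      '"eId":"d4","Name":No Price')

    # Print every line EXCEPT those with a numeric Price below 950.
    # RSTART + 8 skips the 8 characters of "Price": to reach the digits.
    kept=$(printf '%s\n' "$records" | awk '
      !(match($0, /"Price":[0-9]+/) && substr($0, RSTART + 8, RLENGTH - 8) + 0 < 950)
    ')

    printf '%s\n' "$kept"
    ```

    Redirecting the awk output to a new file, rather than capturing it in a variable, gives the same result as the sed approach without encoding a numeric range in a regex.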
    