
I have data in the following format:

>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer

>ab:xy_b0ha81-5 Movie= Thor actor= chris hemsworth Genere=Action Length=321 Credits=20 pe=0 summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace

>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.

>ab:xy_d0fc78-2 Movie= Joker actor= Phoenix Genere=thriller Length=341 Credits=35 pe=2 summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver

>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors

I want to extract every entry that contains pe=1 (each entry starts with the > symbol), like this:

>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer

>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.

>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors

and then format a few of the values into a table like this:

Name            Length
ab:xy_a0by98-2  234
ab:xy_c0ma65-1  251
ab:xy_e0ra81-2  254

I tried grep "pe=1" input.txt > output.txt, but it extracted only the header lines, not the descriptions.
Any help is appreciated.
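For the first part (keeping each whole entry, not only its header line), grep on its own returns just the matching header lines, because the description lines never contain pe=1. A minimal awk sketch, assuming every entry starts with a line beginning with > and that pe=1 appears on that header line as its own whitespace-separated field: each header line turns a keep flag on or off, and every line is printed while the flag is on. The heredoc only recreates the question's sample so the sketch runs on its own.

```shell
#!/bin/sh
# Recreate the sample input from the question so the sketch runs on its own.
cat > input.txt <<'EOF'
>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer

>ab:xy_b0ha81-5 Movie= Thor actor= chris hemsworth Genere=Action Length=321 Credits=20 pe=0 summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace

>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.

>ab:xy_d0fc78-2 Movie= Joker actor= Phoenix Genere=thriller Length=341 Credits=35 pe=2 summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver

>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors
EOF

# A header line (starting with ">") switches the keep flag on or off;
# the bare "keep" pattern then prints every line while the flag is on,
# so the description lines (and the blank separator) follow their header.
awk '
/^>/ {
  keep = 0
  for (i = 2; i <= NF; i++)
    if ($i == "pe=1") keep = 1   # exact field match: pe=0, pe=2, pe=10 are not kept
}
keep
' input.txt > output.txt

cat output.txt
```

Because entries are delimited by their > line rather than by blank lines, this sketch should also work if the blank separator lines are missing.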


Answers


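  1. This GNU sed command should do the job. It prints the name (without the leading >) and the Length value of every header line that contains pe=1:

    sed -En 's/^>([^[:blank:]]*).*\<Length=([0-9]*).*\<pe=1\>.*/\1 \2/p' file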
    
  2. 1st solution (GNU awk): With your shown samples, please try the following awk code, written and tested in GNU awk. In short, it checks that the line starts with > and contains pe=1, AND uses the match function with the regex \<Length=([0-9]+)
    to capture the numeric value into the array arr (the three-argument form of match is a GNU awk extension). If both conditions are true, it prints the 1st field without its leading > followed by arr[1], the Length value.

    awk '/^>.*\<pe=1 / && match($0,/\<Length=([0-9]+)/,arr){print substr($1,2),arr[1]}' Input_file
    


    2nd solution (any awk): With any version of awk, please try the following code, a small tweak of the 1st solution.

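    awk '
    # pe=1 and Length= are matched via the space in front of them,
    # because \< (start of word) is a GNU extension, not POSIX awk
    /^>.* pe=1 / && match($0,/ Length=[0-9]+/){
      val=substr($0,RSTART,RLENGTH)   # e.g. " Length=234"
      sub(/.*=/,"",val)               # keep only the digits after "="
      print substr($1,2),val          # name without ">", then the length
    }
    ' Input_file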
    
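Neither answer prints the Name/Length header row shown in the question. A minimal portable-awk sketch that builds the whole table in one pass, assuming (as in the sample) that Length=... and pe=... sit on the header line as single whitespace-separated fields. The output file name table.txt is hypothetical, and the 16-character column width was chosen only to match the question's layout. The heredoc again just recreates the sample input.

```shell
#!/bin/sh
# Recreate the sample input from the question so the sketch runs on its own.
cat > input.txt <<'EOF'
>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer

>ab:xy_b0ha81-5 Movie= Thor actor= chris hemsworth Genere=Action Length=321 Credits=20 pe=0 summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace

>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.

>ab:xy_d0fc78-2 Movie= Joker actor= Phoenix Genere=thriller Length=341 Credits=35 pe=2 summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver

>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors
EOF

awk '
BEGIN { printf "%-16s%s\n", "Name", "Length" }      # header row
/^>/ {                                              # header lines only
  pe = ""; len = ""
  for (i = 2; i <= NF; i++) {
    if ($i ~ /^Length=/) len = substr($i, 8)        # text after "Length=" (7 chars)
    if ($i == "pe=1") pe = 1                        # exact match, so pe=10 is skipped
  }
  if (pe) printf "%-16s%s\n", substr($1, 2), len    # drop the leading ">"
}
' input.txt > table.txt

cat table.txt
```

Splitting on whitespace is safe here because fields such as Length=234 and pe=1 contain no spaces; values like Movie= top gun do, which is why the loop only tests fields by prefix or exact equality.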