
Let’s consider two text files, ‘main_list’ and ‘ignore_list’.
For each line in the ignore_list, I want to remove every line in the main_list that starts with that string.

Basically, this should be doable with sed and a while loop.

E.g.

while read line; do echo ^$line; sed -i "/^$line/d" ./main_list; done < ./ignore_list

To improve on this, I wanted to build the sed pattern first and then run sed only once:

while read line; do
    if [ $SED_PATTERN="" ]; then 
      SED_PATTERN="^$line"
    else
      SED_PATTERN=$SED_PATTERN"|^$line"
    fi
  done < ./ ignore_list
echo $SED_PATTERN
sed -i "/$SED_PATTERN/d" ./main_list

Unfortunately, it does not work, and I assume the subshell used by the while loop is the reason.

“A variable modified inside a while loop is not remembered” and https://mywiki.wooledge.org/BashFAQ/024 give useful explanations and workarounds, but I haven’t yet managed to get one working in a simple way.
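For reference, a loop fed by `done < file` runs in the current shell in bash, dash and BusyBox ash; the subshell problem described in BashFAQ/024 applies when the loop is part of a pipeline. The snippet above more likely fails for other reasons: `[ $SED_PATTERN="" ]` expands to a single non-empty word and is therefore always true, the stray space in `./ ignore_list` breaks the redirection, and `|` only means alternation in extended regular expressions. Below is a minimal sketch of a corrected version. It assumes a sed that accepts `-E` (GNU and BusyBox sed do) and that the ignore_list entries contain no regex metacharacters and no `/`. The sample data is taken from the extract in this question, and the lowercase variable name `pattern` is just an illustrative choice.

```sh
#!/bin/sh
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Sample data based on the extract from the question.
cat > main_list <<'EOF'
e2fsprogs|1.46.2-2|e2fsprogs|1.46.2-2
edgeonboarding-config|0.1|edgeonboarding-config|0.1
efibootguard|0.13+cip|efibootguard|0.13+cip
ethtool|1:5.9-1|ethtool|1:5.9-1
EOF
printf '%s\n' 'edgeonboarding-config' 'ethtool' > ignore_list

# Build one alternation such as "^foo|^bar" from the ignore list.
pattern=""
while IFS= read -r line; do
    [ -n "$line" ] || continue        # skip blank lines
    if [ -z "$pattern" ]; then
        pattern="^$line"
    else
        pattern="$pattern|^$line"
    fi
done < ./ignore_list

# Run sed once; -E makes | an alternation operator.
# Writing to a new file and renaming avoids relying on sed -i.
if [ -n "$pattern" ]; then
    sed -E "/$pattern/d" ./main_list > ./main_list.new &&
        mv ./main_list.new ./main_list
fi

cat ./main_list
```

With the sample data, only the e2fsprogs and efibootguard lines remain in main_list.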

Ideally, I want to use the sh shell (the script will run in a GitLab pipeline with a simple Alpine image).

Any ideas for keeping it simple before I move to a Python script (and use a fat image instead of Alpine)? As a middle ground, I could also use an image with bash.

Maybe another approach than sed and the while loop?

Thanks.

Edit: some more context about the content of both files. I am dealing with a list of Debian packages installed by a build step.
The main_list is the output of a dpkg-query command (see below), so it should not contain any unusual characters.
The ignore_list contains the packages I want to ignore in a later post-processing step; these are internal components that are not relevant for my output.

Here is a small extract of both files:

main_list

e2fsprogs|1.46.2-2|e2fsprogs|1.46.2-2
ebtables|2.0.11-4|ebtables|2.0.11-4
edgeonboarding-config|0.1|edgeonboarding-config|0.1
efibootguard|0.13+cip|efibootguard|0.13+cip
ethtool|1:5.9-1|ethtool|1:5.9-1

for the ignore_list

edgeonboarding-config

You can generate the main_list on any Debian-based Linux system by running

dpkg-query -f '${source:Package}|${source:Version}|${binary:Package}|${Version}\n' -W > main_list

and for the ignore_list, just pick a few strings from the beginning of lines in the main_list.

2 Answers


  1. You can do this with the grep -v command. Use the -f option to read the list of patterns to filter out from a file. Use process substitution to put ^ at the beginning of every line in ignore_list and use that as the pattern file.

    grep -v -f <(sed 's/^/^/' ignore_list) main_list > main_list.new && mv main_list.new main_list
    
  2. Using any awk (untested due to no sample input/output provided):

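    awk '
        # First file (ignore_list): remember each line as a prefix to ignore
        NR==FNR{ ign[$0]; next }
        {
            for ( str in ign ) {
                if ( index($0,str) == 1 ) {
                    next    # line starts with an ignored prefix: drop it
                }
            }
            print           # no prefix matched: keep the line
        }
    ' ignore_list main_list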
    

    That does a literal string comparison against just the start of each line. If you were to use sed and/or grep for this, you’d need to escape all possible regexp metachars in ignore_list first; see “Is it possible to escape regex metacharacters reliably with sed”.
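The two answers can be combined for plain sh. Process substitution (`<(...)`) is not part of POSIX sh, so the sketch below writes the patterns to a temporary file instead, and it backslash-escapes the basic-regex metacharacters in each entry before anchoring it with `^`. It assumes `mktemp` is available (it is in GNU coreutils and BusyBox). The sample data comes from the question, and the `patterns` file name is only illustrative.

```sh
#!/bin/sh
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Sample data based on the extract from the question.
cat > main_list <<'EOF'
ebtables|2.0.11-4|ebtables|2.0.11-4
edgeonboarding-config|0.1|edgeonboarding-config|0.1
efibootguard|0.13+cip|efibootguard|0.13+cip
EOF
printf '%s\n' 'edgeonboarding-config' > ignore_list

patterns=$(mktemp) || exit 1

# 1. drop blank lines (an empty pattern would match every line)
# 2. backslash-escape the BRE metacharacters ] [ \ . * ^ $
# 3. anchor each entry at the start of the line
sed -e '/^$/d' -e 's/[][\.*^$]/\\&/g' -e 's/^/^/' ignore_list > "$patterns"

# Keep only the lines that match none of the anchored patterns.
grep -v -f "$patterns" main_list > main_list.new &&
    mv main_list.new main_list
rm -f "$patterns"

cat main_list
```

With the sample data, the edgeonboarding-config line is removed and the other two lines are kept. Like the original grep command, this leaves main_list untouched if every line would be filtered out, because grep then exits with a non-zero status.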
