
I am trying to replace this:

The number at the start of each line is always 3 digits, zero-padded. There may, however, be empty lines or regular sentences before the table, as shown in my full example that uses echo.

227| Mathematics | 9 | search,seo |
038| Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies |

with

227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |

Here is what I have so far (this works but the number is not 0 padded):

echo -e "Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227| Mathematics | 9 | search,seo |
038| Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies |" | awk '/^[0-9]{3}/{$0 = substr($0,1,3) "." ++i substr($0,4)} 1'

I think I need to add something like ‘printf("%03d\n", ++i)’ but when I tried replacing ++i with this, it did not work.

7 Answers


  1. One solution (you have plenty of choices ^^):

    <INPUT> | 
        awk '
            BEGIN{FS=OFS="|"}
            /^[0-9]{3}/{$1=$1 sprintf(".%03d", ++i)}  # count matching rows only; NR would count every input line
            1
        '
    

    Output:

    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
    
  2. With sub replacement (by regex pattern):

    awk '/^[0-9]{3}/{ sub(/^[0-9]{3}/, "&."sprintf("%03d", ++i)) }1' test.txt
    
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
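
    A note on why the `printf` attempt in the question fails while `sprintf` works: in awk, `printf` is an output statement that writes immediately, whereas `sprintf` returns the formatted string, so only the latter can be concatenated into `$0`. The sketch below is a minimal illustration rather than one of the original answers. The function name `number_rows` is hypothetical, and the pattern spells out `[0-9][0-9][0-9]` on the assumption that the awk at hand may lack `{3}` interval support.

    ```shell
    # number_rows: hypothetical helper name, used here only for illustration.
    # Appends ".NNN" (a zero-padded running counter) after a leading 3-digit score.
    number_rows() {
        awk '
            /^[0-9][0-9][0-9]/ {
                # sprintf returns the formatted string, so it can be concatenated;
                # printf would write to the output instead of returning a value.
                $0 = substr($0, 1, 3) sprintf(".%03d", ++i) substr($0, 4)
            }
            1   # print every line, modified or not
        '
    }

    # Usage: non-table lines pass through unchanged.
    printf '%s\n' 'Last Updated: Mar 15 2023' '' '| Score | KiB | Stats |' \
        '227| Mathematics | 9 | search,seo |' \
        '038| Top Games | 2391 | search,seo,cookies |' | number_rows
    ```

    Only lines that start with three digits consume a counter value, so any text or blank lines before the table do not shift the numbering.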
    
  3. Assuming 3 digits at the beginning and one line per entry

    % awk '$1~/[[:digit:]]/{printf("%s.%03d", substr($0,1,3), ++x) 
        $0=substr($0,4,length($0))}1' file
    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
    
  4. Assumptions:

    • first (numeric) field in the table has no leading/trailing white space
    • first (numeric) field is made up solely of digits (no periods, no exponential/scientific notations)
    • don’t know in advance the number of digits in the first (numeric) field (eg, could be 1 digit, 2 digits, 7 digits, …)

    One awk idea:

    awk '
    BEGIN { FS=OFS="|" } 
          { if ($1+0==$1)                     # if 1st field is numeric
               $1=$1 sprintf(".%03d",++i)
          }
    1
    ' sample.dat
    
    # or as a one-liner:
    
    awk 'BEGIN { FS=OFS="|" } { if ($1+0==$1) $1=$1 sprintf(".%03d",++i) } 1' sample.dat
    

    This generates:

    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
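
    To see which first fields the `$1+0==$1` test accepts, here is a small sketch. The function name `classify_first_field` is made up for illustration, and the sample lines are trimmed versions of the rows above.

    ```shell
    # classify_first_field: hypothetical helper, not part of the answer above.
    # Reports whether the text before the first "|" passes the numeric test.
    classify_first_field() {
        awk 'BEGIN { FS = "|" }
             {
                 # A field that looks like a number compares numerically with $1+0;
                 # anything else is compared as a string against "0" and fails.
                 kind = ($1 + 0 == $1) ? "numeric" : "text"
                 print kind ": [" $1 "]"
             }'
    }

    printf '%s\n' '227| a |' '038| b |' '| Score |' '- KiB: size' | classify_first_field
    ```

    The header row has an empty first field and the bullet lines have no digits at all, so neither is treated as a table row.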
    

    If the 1st column of the table could contain spaces …

    $ tail sample.dat
    | Score | KiB | Stats |
    | --- | --- | --- |
    227| Mathematics | 9 | search,seo |
    038| Top Games | 2391 | search,seo,cookies |
    136| Top Programming Languages | 219 | cookies |
    44 | Top Programming Languages | 219 | cookies |
     55| Top Programming Languages | 219 | cookies |
     6 | Top Programming Languages | 219 | cookies |
    

    Modifying the awk script and supplying the input via stdin (to mimic OP’s echo | awk):

    cat sample.dat | awk '
    BEGIN { FS=OFS="|" } 
          { if ($1+0==$1) {
               x=$1
               gsub(/ /,"",x)
               $1=x sprintf(".%03d",++i)
            }
          }
    1'
    

    This generates:

    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
    44.004| Top Programming Languages | 219 | cookies |
    55.005| Top Programming Languages | 219 | cookies |
    6.006| Top Programming Languages | 219 | cookies |
    
  5. Using any awk:

    $ awk '{sub(/\|/,sprintf(".%03d|",NR))}1' file
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
    
  6. Since you mention potentially processing the markdown in the table, ruby (or perl) will allow the table to be captured as a fully processable, sortable data element.

    Here is an example that produces your desired output:

    ruby  -e '
    $<.read.scan(/(\A[\s\S]+?)(^\|[\s\S]+\z)/){|b1,b2|
        puts b1        # the non-table part
        data=b2.split(/\R/)  # second capture is the table part
        puts data[..1].join("\n")    # the two line header
        puts data[2..].map.with_index(1){|l, i|    # this deals with the table data
            row=l.split("|")
            ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
        }
    }
    ' file
    

    Or use a split with a look ahead instead of a multi-line regex:

    ruby -e '
    $<.read.split(/(?=^\|\s+\w+\s+\|)/,2).each_slice(2){|b1,b2|
        puts b1        # the non-table part
        data=b2.split(/\R/)  # second split is the table part
        puts data[..1].join("\n")    # the two line header
        puts data[2..].
            map{|l| l.split("|")}.
            map.with_index(1){|row, i|    
            ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
        }
    }
    ' file
    

    Either prints:

    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    227.001| Mathematics | 9 | search,seo 
    038.002| Top Games | 2391 | search,seo,cookies 
    136.003| Top Programming Languages | 219 | cookies
    

    Here is the same but sorted by the existing first column:

    ruby -e '
    $<.read.split(/(?=^\|\s+\w+\s+\|)/,2).each_slice(2){|b1,b2|
        puts b1        # the non-table part
        data=b2.split(/\R/)  # second split is the table part
        puts data[..1].join("\n")    # the two line header
        puts data[2..].
            map{|l| l.split("|")}.       # split into cells
            sort_by{|row| row[0].to_f}.  # sort by digits in col 1
            map.with_index(1){|row, i|    
            ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
        }
    }
    ' file
    

    Prints:

    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    038.001| Top Games | 2391 | search,seo,cookies 
    136.002| Top Programming Languages | 219 | cookies 
    227.003| Mathematics | 9 | search,seo 
    

    Or by the third column:

    ruby -e '
    $<.read.split(/(?=^\|\s+\w+\s+\|)/,2).each_slice(2){|b1,b2|
        puts b1        # the non-table part
        data=b2.split(/\R/)  # second split is the table part
        puts data[..1].join("\n")    # the two line header
        puts data[2..].
            map{|l| l.split("|")}.       # split into cells
            sort_by{|row| row[2].to_f}.  # sort by digits in col 3
            map.with_index(1){|row, i|    
            ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
        }
    }
    ' file
    

    Prints:

    Last Updated: Mar 15 2023
    
    - Score: cumulative of benchmarking tools.
    - KiB: total size of the page.
    - Stats: metrics related to the page.
    
    | Score | KiB | Stats |
    | --- | --- | --- |
    227.001| Mathematics | 9 | search,seo 
    136.002| Top Programming Languages | 219 | cookies 
    038.003| Top Games | 2391 | search,seo,cookies
    

    etc…

  7. This version is entirely agnostic to what’s before the first |, and simply tacks on the padded row number after a period (.), so it could be numbers, padded numbers, unicode text, or even emojis.

    mawk 'sub("[|]", sprintf(".%.3u&", NR))^_'
    
    227.001| Mathematics | 9 | search,seo |
    038.002| Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
    

    The one above attaches the row number directly to the pipe (|) itself, so if there are spaces before the pipe, gaps will be created. To attach it to the data value instead while preserving any leading spaces/tabs, try this on input such as:

    227| Mathematics | 9 | search,seo |
     Q | Top Games | 2391 | search,seo,cookies |
    136| Top Programming Languages | 219 | cookies | 
    
    gawk 'sub("[ \t]*[|]", sprintf(".%.3u&",NR))^_'
    
    227.001| Mathematics | 9 | search,seo |
     Q.002 | Top Games | 2391 | search,seo,cookies |
    136.003| Top Programming Languages | 219 | cookies |
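
    For readers unfamiliar with the trailing `^_`: `_` is an unset variable, so the pattern evaluates to "something to the power of 0", which is always 1, and every line gets printed whether or not `sub` matched. A more conventional spelling of the same idea might look like the sketch below. The function name `tag_rows` is hypothetical, and `%03d` is used in place of `%.3u` on the assumption that both pad to three digits for row numbers.

    ```shell
    # tag_rows: hypothetical helper name, used here only for illustration.
    # Inserts ".NNN" (zero-padded input line number) before the first pipe,
    # keeping any blanks that sit between the value and the pipe.
    tag_rows() {
        awk '{
            # "&" in the replacement stands for the matched text:
            # the optional spaces/tabs plus the pipe itself.
            sub(/[ \t]*[|]/, sprintf(".%03d&", NR))
            print   # print every line, including lines without a pipe
        }'
    }

    printf '%s\n' '227| Mathematics |' ' Q | Top Games |' | tag_rows
    ```

    Like the one-liners above, this numbers by `NR`, so it assumes the input holds only table rows; with leading text, a separate counter would be needed.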
    