Use Awk to Insert 0 padded Incrementing number in lines that contain a pattern - SEO

Jieiku
March 15, 2023
217 views
4 votes
7 Answers

I am trying to replace this:

The number at the start of each table line is always 3 digits, 0 padded. There may, however, be empty lines or regular sentences before the table, as shown in my full example below that uses echo.

227| Mathematics | 9 | search,seo |
038| Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies |

with

227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |

Here is what I have so far (this works but the number is not 0 padded):

echo -e "Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227| Mathematics | 9 | search,seo |
038| Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies |" | awk '/^[0-9]{3}/{$0 = substr($0,1,3) "." ++i substr($0,4)} 1'

I think I need to add something like ‘printf("%03d\n", ++i)’, but when I tried replacing ++i with this, it did not work.

Tags: awk

Answers

One solution (you have plenty of choices ^^):

<INPUT> | 
    awk '
        BEGIN{FS=OFS="|"}
        /^[0-9]{3}/{$1=$1 sprintf(".%03d", ++i)}   # ++i counts matching lines only; NR would count every input line
        1
    '

Output:

Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
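
On why the printf attempt in the question fails: in awk, printf is a statement that writes straight to the output, so it cannot be concatenated into an expression, while sprintf() returns the formatted string. A minimal sketch of the question's own command with that one change follows. The shortened sample input is an assumption for illustration, and the three digits are spelled out as [0-9][0-9][0-9] rather than {3} purely as a precaution for awk builds without interval-expression support.

```shell
# Hypothetical, shortened sample: two non-table lines followed by the table rows.
input='Last Updated: Mar 15 2023

227| Mathematics | 9 | search,seo |
038| Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies |'

# sprintf() returns the zero-padded counter as a string, so it can be
# concatenated between the first three characters and the rest of the line.
out=$(printf '%s\n' "$input" |
  awk '/^[0-9][0-9][0-9]/ { $0 = substr($0, 1, 3) sprintf(".%03d", ++i) substr($0, 4) } 1')

printf '%s\n' "$out"
```

Because i is incremented only inside the action, the non-table lines pass through unchanged and the numbering starts at 001 on the first table row.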

- RomanPerekhrest
- March 15, 2023 at 8:29 pm
- 0 votes
With sub replacement (by regex pattern):
```
awk '/^[0-9]{3}/{ sub(/^[0-9]{3}/, "&."sprintf("%03d", ++i)) }1' test.txt

227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
```

Assuming 3 digits at the beginning and one line per entry

% awk '$1~/[[:digit:]]/{printf("%s.%03d", substr($0,1,3), ++x) 
    $0=substr($0,4,length($0))}1' file
Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |

Assumptions:

- the first (numeric) field in the table has no leading/trailing white space
- the first (numeric) field is made up solely of digits (no periods, no exponential/scientific notation)
- the number of digits in the first (numeric) field is not known in advance (e.g., it could be 1 digit, 2 digits, 7 digits, …)

One awk idea:

awk '
BEGIN { FS=OFS="|" } 
      { if ($1+0==$1)                     # if 1st field is numeric
           $1=$1 sprintf(".%03d",++i)
      }
1
' sample.dat

# or as a one-liner:

awk 'BEGIN { FS=OFS="|" } { if ($1+0==$1) $1=$1 sprintf(".%03d",++i) } 1' sample.dat

This generates:

Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
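
The $1+0==$1 test relies on awk treating a field that looks like a number as numeric in comparisons, while anything else is compared as a string. A small sketch of which values pass, using made-up first fields (the sample rows here are assumptions, not part of the original data):

```shell
# Hypothetical rows; only the first |-separated field matters here.
out=$(printf '%s\n' '227|a' '038|b' 'abc|c' '12ab|d' |
  awk -F'|' '{ print $1 ": " (($1 + 0 == $1) ? "numeric" : "text") }')

printf '%s\n' "$out"
```

With the awk implementations I am aware of, 227 and 038 are reported as numeric, and abc and 12ab as text, which is why the header and prose lines are left alone.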

If the 1st column of the table could contain spaces …

$ tail sample.dat
| Score | KiB | Stats |
| --- | --- | --- |
227| Mathematics | 9 | search,seo |
038| Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies |
44 | Top Programming Languages | 219 | cookies |
 55| Top Programming Languages | 219 | cookies |
 6 | Top Programming Languages | 219 | cookies |

Modifying the awk script and supplying the input via stdin (to mimic OP’s echo | awk):

cat sample.dat | awk '
BEGIN { FS=OFS="|" } 
      { if ($1+0==$1) {
           x=$1
           gsub(/ /,"",x)
           $1=x sprintf(".%03d",++i)
        }
      }
1'

This generates:

Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
44.004| Top Programming Languages | 219 | cookies |
55.005| Top Programming Languages | 219 | cookies |
6.006| Top Programming Languages | 219 | cookies |

- EdMorton
- March 15, 2023 at 9:42 pm
- 0 votes
Using any awk:
```
$ awk '{sub(/\|/,sprintf(".%03d|",NR))}1' file
227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
```
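
Note that NR is the input line number, so numbering by NR gives 001, 002, … only when the file holds nothing but table rows. A rough sketch of the difference, on a made-up input with a heading and a blank line before the table (the input and the variable names by_nr and by_counter are assumptions for illustration):

```shell
# Hypothetical input: one heading line, a blank line, then two table rows.
input='Scores

227| Mathematics | 9 |
038| Top Games | 2391 |'

# NR is the line number of the whole input, so the first table row (line 3) gets 003.
by_nr=$(printf '%s\n' "$input" |
  awk '/^[0-9]/ { sub(/[|]/, sprintf(".%03d|", NR)) } 1')

# A counter incremented only on matching lines starts at 001.
by_counter=$(printf '%s\n' "$input" |
  awk '/^[0-9]/ { sub(/[|]/, sprintf(".%03d|", ++n)) } 1')

printf '%s\n---\n%s\n' "$by_nr" "$by_counter"
```

The /^[0-9]/ guard also keeps the header and separator rows, which contain pipes too, from being numbered.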

Since you mention potentially processing the markdown in the table, ruby (or perl) allows the table to be captured as a fully processable, sortable data element.

Here is an example that produces your desired output:

ruby -e '
$<.read.scan(/(\A[\s\S]+?)(^\|[\s\S]+\z)/){|b1,b2|
    puts b1        # the non-table part
    data=b2.split(/\R/)  # second capture is the table part
    puts data[..1].join("\n")    # the two line header
    puts data[2..].map.with_index(1){|l, i|    # this deals with the table data
        row=l.split("|")
        ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
    }
}
' file

Or use a split with a look ahead instead of a multi-line regex:

ruby -e '
$<.read.split(/(?=^\|\s+\w+\s+\|)/,2).each_slice(2){|b1,b2|
    puts b1        # the non-table part
    data=b2.split(/\R/)  # second split is the table part
    puts data[..1].join("\n")    # the two line header
    puts data[2..].
        map{|l| l.split("|")}.
        map.with_index(1){|row, i|    
        ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
    }
}
' file

Either prints:

Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227.001| Mathematics | 9 | search,seo 
038.002| Top Games | 2391 | search,seo,cookies 
136.003| Top Programming Languages | 219 | cookies

Here is the same but sorted by the existing first column:

ruby -e '
$<.read.split(/(?=^\|\s+\w+\s+\|)/,2).each_slice(2){|b1,b2|
    puts b1        # the non-table part
    data=b2.split(/\R/)  # second split is the table part
    puts data[..1].join("\n")    # the two line header
    puts data[2..].
        map{|l| l.split("|")}.       # split into cells
        sort_by{|row| row[0].to_f}.  # sort by digits in col 1
        map.with_index(1){|row, i|    
        ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
    }
}
' file

Prints:

Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
038.001| Top Games | 2391 | search,seo,cookies 
136.002| Top Programming Languages | 219 | cookies 
227.003| Mathematics | 9 | search,seo

Or by the third column:

ruby -e '
$<.read.split(/(?=^\|\s+\w+\s+\|)/,2).each_slice(2){|b1,b2|
    puts b1        # the non-table part
    data=b2.split(/\R/)  # second split is the table part
    puts data[..1].join("\n")    # the two line header
    puts data[2..].
        map{|l| l.split("|")}.       # split into cells
        sort_by{|row| row[2].to_f}.  # sort by digits in col 3
        map.with_index(1){|row, i|    
        ([sprintf("%s.%03d", row[0], i)]+row[1..]).join("|")
    }
}
' file

Prints:

Last Updated: Mar 15 2023

- Score: cumulative of benchmarking tools.
- KiB: total size of the page.
- Stats: metrics related to the page.

| Score | KiB | Stats |
| --- | --- | --- |
227.001| Mathematics | 9 | search,seo 
136.002| Top Programming Languages | 219 | cookies 
038.003| Top Games | 2391 | search,seo,cookies

etc…

- RAREKpopManifesto
- March 18, 2023 at 9:28 pm
- 0 votes
This version is entirely agnostic to what comes before the first |: it simply appends the padded row number after a period (.), so the first column could hold numbers, padded numbers, unicode text, or even emojis.
```
mawk 'sub("[|]", sprintf(".%.3u&", NR))^_'
```
```
227.001| Mathematics | 9 | search,seo |
038.002| Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
```
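The version above attaches the row number directly to the pipe (|), so any spaces before the pipe leave a gap between the value and the number. To attach the number to the data value instead, while preserving any leading spaces/tabs, try the gawk command below, shown with this sample input: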
```
227| Mathematics | 9 | search,seo |
 Q | Top Games | 2391 | search,seo,cookies |
136| Top Programming Languages | 219 | cookies | 
```
```
gawk 'sub("[ \t]*[|]", sprintf(".%.3u&",NR))^_'
```
```
227.001| Mathematics | 9 | search,seo |
 Q.002 | Top Games | 2391 | search,seo,cookies |
136.003| Top Programming Languages | 219 | cookies |
```
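
The trailing ^_ in both one-liners of this answer is a terse way to print every line: sub() returns 0 or 1, and raising either to the power of the unset variable _ (treated as 0) yields 1, which is a true pattern. A sketch of a more conventional spelling of the first one-liner follows. The two-row input is an assumption, and %03d stands in for %.3u, which pads the same way for non-negative integers.

```shell
# Hypothetical two-row input, including a row with spaces around its first field.
input='227| Mathematics | 9 |
 Q | Top Games | 2391 |'

# Terse form: the substitution result raised to the power 0 is always 1, so every line prints.
terse=$(printf '%s\n' "$input" | awk 'sub("[|]", sprintf(".%03d&", NR))^_')

# Conventional form: do the substitution in an action, then print with the usual trailing 1.
plain=$(printf '%s\n' "$input" | awk '{ sub(/[|]/, sprintf(".%03d&", NR)) } 1')

printf '%s\n' "$plain"
```

Both forms should produce identical output. In the replacement string, & stands for the matched text, here the pipe itself.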