
I am working on an Ubuntu Linux VM, and my goal is to transform this input

4,4
3,8
4,2
4,8
6,8
5,1
4,5
6,9
4,2
5,4
5,9
5,9
7,2
4,9
4,4
5,1

to this output

4,4 6,8 4,2 7,2
3,8 5,1 5,4 4,9
4,2 4,5 5,9 4,4
4,8 6,9 5,9 5,1

I then want to copy these values into an Excel file on Windows.

I wrote this command

cat text.txt | cut -f3 | awk '1 ; NR%4==0 {printf "\n"} ' | pr -4 -t | column -t | tr -s '[:blank:]' '\t'

which works fine!

To find the number of columns, I counted the blank lines as below:
cat text.txt | cut -f3 | awk '1 ; NR%4==0 {printf "\n"} ' | awk '!NF {sum += 1} END {print sum}'
This gave 4 blank lines, so I pass 4 to this part of the command: | pr -4 -t |

My problem appeared with a larger file, say 160 lines (40 blank lines).
Initially, changing this part to | pr -40 -t gave the error pr: page width too narrow,
so I changed the command to

 cat text.txt | cut -f3 | awk '1 ; NR%4==0 {printf "\n"} ' | pr -W200 -40 -t | column -t | tr -s '[:blank:]' '\t'

which seems to work; however, the copy and paste merges 4 values in the 26th column, like

  4,83,6     
  5,66,2     
  6,57,6     
  8,18,1    

instead of

  4,8  3,6   
  5,6  6,2   
  6,5  7,6   
  8,1  8,1  

Although I tried different values for the -W option, I assume that -W200 is responsible for this issue.
I have two questions.

  1. Is there a suitable ratio between the page width and the number of blank lines (columns) that avoids this merging issue?
  2. As my goal is to "paste" every 4 lines next to each other, could someone help me paste the values directly, instead of first counting the blank lines and then adjusting the width as I did?

I really appreciate any help you can provide.

6 Answers


  1. Instead of trying to reverse-engineer and understand OP’s current code, and since OP is already using awk, I’d like to propose a single awk script to replace all of OP’s current code:

    awk -v n=4 '                   # set awk variable "n" to the number of output lines
    BEGIN { OFS="\t" }
          { sub(/\r$/,"")          # strip out a trailing dos/windows line ending "\r"
            ndx = NR%n
            lines[ndx] = lines[ndx] (NR <= n ? "" : OFS) $1
          }
    END   { for (i=1; i<=n; i++) {
                print lines[i%n]
            }
          }
    ' text.txt
    

    NOTE: you can remove the sub(/\r$/,"") if you know for a fact that the input file will never contain DOS/Windows line endings (\r); if the file has no \r line endings, the sub(/\r$/,"") is a no-op (other than using up a few CPU cycles).

    For 4 lines (n=4) of output:

    4,4     6,8     4,2     7,2
    3,8     5,1     5,4     4,9
    4,2     4,5     5,9     4,4
    4,8     6,9     5,9     5,1
    

    For 3 lines (n=3) of output:

    4,4     4,8     4,5     5,4     7,2     5,1
    3,8     6,8     6,9     5,9     4,9
    4,2     5,1     4,2     5,9     4,4
    

    For 7 lines (n=7) of output:

    4,4     6,9     4,4
    3,8     4,2     5,1
    4,2     5,4
    4,8     5,9
    6,8     5,9
    5,1     7,2
    4,5     4,9
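
    Regarding the note on line endings, it may help to check whether a file actually contains carriage returns before deciding whether the sub() call is needed. The following is a minimal sketch, assuming a POSIX shell and any awk; count_cr_lines and the sample file names are hypothetical and not part of the answer's script.

```shell
# Hypothetical helper: count the lines of a file that end in a carriage return.
count_cr_lines() {
    awk '/\r$/ { c++ } END { print c + 0 }' "$1"
}

# Illustrative sample with Windows (CRLF) line endings.
printf '4,4\r\n3,8\r\n4,2\r\n4,8\r\n' > crlf_sample.txt

count_cr_lines crlf_sample.txt        # every line carries a CR here

# The same kind of sub() call removes the trailing CR from each line.
awk '{ sub(/\r$/, "") } 1' crlf_sample.txt > lf_sample.txt

count_cr_lines lf_sample.txt          # no CR left after stripping
```

    A non-zero count on the real input suggests keeping the sub() call; otherwise the stray carriage returns would end up inside the pasted cells.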
    
  2. Another simple approach is to buffer the elements in an array using string concatenation in awk. No other process is needed, e.g.

    awk '{n++; a[n] = (FNR > 4) ? a[n] " " $0 : $0; if (n == 4) n = 0 } END {for (i=1; i<=4; i++) print a[i] }' file
    

    Essentially you build a 4-element array with each element being one of the final rows you want in your output. You concatenate every 4th value to each array element 1, 2, 3 and 4 in order, adding a space before each value after the first in each element.

    After having read and concatenated all values into your array, you simply loop over the array elements in the END rule outputting each.

    In an easier to read expanded form that would be:

    awk '{
      n++
      a[n] = (FNR > 4) ? a[n] " " $0 : $0
      if (n == 4)
        n = 0
    }
    END {
      for (i=1; i<=4; i++)
         print a[i]
    }' file
    

    (adjust for whitespace as needed)
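
    The same buffering idea can be sketched with the row count passed in as a parameter and a tab as separator, which tends to paste cleanly into a spreadsheet. This is only an illustration under assumptions: reshape_rows, its second argument, and sample.txt are hypothetical names, and the CR stripping is an addition for Windows-edited files.

```shell
# Hypothetical helper: reshape one-value-per-line input into "rows" output rows,
# joining the values of each row with tabs. Usage: reshape_rows FILE [ROWS]
reshape_rows() {
    awk -v rows="${2:-4}" '
    {
        sub(/\r$/, "")                    # tolerate CRLF input
        r = (NR - 1) % rows + 1           # output row for this value: 1..rows
        out[r] = (NR <= rows) ? $0 : (out[r] "\t" $0)
    }
    END {
        last = (NR < rows) ? NR : rows    # skip rows that never received a value
        for (i = 1; i <= last; i++) print out[i]
    }' "$1"
}

# Illustrative input: the first eight values from the question.
printf '%s\n' 4,4 3,8 4,2 4,8 6,8 5,1 4,5 6,9 > sample.txt

reshape_rows sample.txt 4
```

    With rows given as an argument, a 160-line file needs no column counting: reshape_rows text.txt 4 still produces 4 rows, however many columns result.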

    Example Use/Output

    Using a heredoc to feed your example to awk you can do:

    $ awk '{n++; a[n] = (FNR > 4) ? a[n] " " $0 : $0; if (n == 4) n = 0 } END {for (i=1; i<=4; i++) print a[i] }' << eof
    4,4
    3,8
    4,2
    4,8
    6,8
    5,1
    4,5
    6,9
    4,2
    5,4
    5,9
    5,9
    7,2
    4,9
    4,4
    5,1
    eof
    
    4,4 6,8 4,2 7,2
    3,8 5,1 5,4 4,9
    4,2 4,5 5,9 4,4
    4,8 6,9 5,9 5,1
    
  3. Using any awk:

    awk '
    { x++ }                 # advance to the next output row
    x > 4 { x = 1 }         # wrap around so every 5th line goes back to row 1
    { col[x] = ((x in col) ? col[x] " " : "") $1 }   # append the value; no leading space
    END { for (i = 1; i <= 4; i++) print col[i] }
    ' "$input_file"
    
  4. Here is a Ruby way to do that:

    ruby -e 'inp=$<.read.split(/\R/) # \R matches both DOS and Unix line endings
    puts inp.each_slice(4).to_a.transpose.map{|e| e.join("\t")}' file 
    

    Prints:

    4,4 6,8 4,2 7,2
    3,8 5,1 5,4 4,9
    4,2 4,5 5,9 4,4
    4,8 6,9 5,9 5,1
    

    If the line count of your input is potentially not an exact multiple of the number of desired columns (i.e., if your input is 15 or 17 lines rather than 16 lines, which is an exact multiple of 4), then you can do something like this:

    ruby -e '
    inp=$<.read.split(/\R/)
    
    cols=4
    
    inp=inp.each_slice(inp.length/cols+(inp.length % cols == 0 ? 0 : 1)).to_a
    inp[0].zip(*inp[1..]){|sa| puts sa.join("\t")}
    ' <(seq 14) 
    

    Prints:

    1   5   9   13
    2   6   10  14
    3   7   11  
    4   8   12  
    

    That is the same behavior as pr.

  5. If you have access to GNU datamash, here is yet another solution. I added a few numbers to your example to show what happens with an item total not divisible by 4.

    $ paste -d ' ' - - - - < file | datamash transpose -t ' '
    4,4 6,8 4,2 7,2 2,3 2,7
    3,8 5,1 5,4 4,9 2,4 2,8
    4,2 4,5 5,9 4,4 2,5 
    4,8 6,9 5,9 5,1 2,6
    

    The paste command puts 4 items at a time from your list into a single line with items separated by a single space. GNU datamash then transposes the resulting table and separates the items with a single space.
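
    To see the intermediate table that paste builds before any transposing, the grouping step can be run on its own. A small sketch, assuming only a POSIX shell and paste; group4 is a hypothetical helper name and the numbers are illustrative.

```shell
# Hypothetical helper: group stdin into lines of four space-separated items.
# Each "-" tells paste to read the next line from standard input.
group4() {
    paste -d ' ' - - - -
}

printf '%s\n' 1 2 3 4 5 6 7 8 | group4
# prints:
# 1 2 3 4
# 5 6 7 8
```

    Each output line here corresponds to one column of the final table, which is why a transpose step follows.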

    Please note that this answer gives you a space-separated table. If you want a tab-separated table, the code is simpler:

    paste - - - - < file | datamash transpose
    

    Also, if you would like to make the length of the columns 7 (for example) instead of 4, you can do

    $ paste $(printf '\x2d %.0s' {1..7}) < file | datamash transpose
    4,4 6,9 4,4 2,8
    3,8 4,2 5,1 
    4,2 5,4 2,3 
    4,8 5,9 2,4 
    6,8 5,9 2,5 
    5,1 7,2 2,6 
    4,5 4,9 2,7
    

    (tab-separated, but not obvious in this Stack Overflow display)
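
    Brace expansion such as {1..7} does not accept a variable in Bash, so if the column length should come from a variable, the list of dashes has to be built another way. A minimal sketch, assuming a POSIX shell; dashes is a hypothetical helper, and the transpose step is left out so the example runs without datamash.

```shell
# Hypothetical helper: print N dashes separated by spaces, e.g. "- - - " for N=3.
dashes() {
    n=$1
    out=
    while [ "$n" -gt 0 ]; do
        out="$out- "
        n=$((n - 1))
    done
    printf '%s\n' "$out"
}

len=3
# The command substitution is left unquoted on purpose, so that the shell
# splits it into separate "-" arguments for paste.
printf '%s\n' 1 2 3 4 5 6 | paste $(dashes "$len")
```

    The output is tab-separated (paste's default delimiter), with three items per line.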

  6. A Bash approach: for example, read these values into an array:

    arr=(
        4,4 #0
        3,8 #1
        4,2 #2
        4,8 #3
    
        6,8 #4
        5,1 #5
        4,5 #6
        6,9 #7
    
        4,2 #8
        5,4 #9
        5,9 #10
        5,9 #11
    
        7,2 #12
        4,9 #13
        4,4 #14
        5,1 #15
    )
    

    and process like this:

    n=4
    
    for ((i = 0; i < n; i++)); do
        for ((j = i; j < ${#arr[@]}; j += n)); do
            printf '%s ' "${arr[j]}"    # pass the value as an argument, not as the format string
        done
        echo
    done
    

    output:

    4,4 6,8 4,2 7,2 
    3,8 5,1 5,4 4,9 
    4,2 4,5 5,9 4,4 
    4,8 6,9 5,9 5,1 
    

    Hardcoded variant:

    for ((a = 0, b = 4, c = 8, d = 12; a < ${#arr[@]} / 4; a++, b++, c++, d++)); do
        echo "${arr[a]} ${arr[b]} ${arr[c]} ${arr[d]}"
    done
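
    Rather than typing the values into the array by hand, they could be read from the file. The sketch below assumes Bash 4 or later (for mapfile); sample.txt and reshaped.tsv are illustrative file names, and the tab join is an addition aimed at pasting into a spreadsheet.

```shell
#!/usr/bin/env bash
# Sketch: load one-value-per-line input into an array, then print it in n rows.
# Requires Bash 4+ for mapfile; file names are illustrative.
printf '%s\n' 4,4 3,8 4,2 4,8 6,8 5,1 4,5 6,9 > sample.txt

mapfile -t arr < sample.txt          # one array element per input line
n=4

for ((i = 0; i < n; i++)); do
    row=()
    for ((j = i; j < ${#arr[@]}; j += n)); do
        row+=("${arr[j]}")           # collect every n-th value for this row
    done
    (IFS=$'\t'; printf '%s\n' "${row[*]}")   # join the row with tabs
done > reshaped.tsv

cat reshaped.tsv
```

    Joining with "${row[*]}" avoids the trailing separator that a per-item printf leaves at the end of each line.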
    