I have the following lines in a file on CentOS:
real1 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
real2 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
real3 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
173corr 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
512corr 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
513corr 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
There are two blocks, "real" and "corr", and each block may contain several sub-entries, e.g. real1, real2, etc.
I would like the sub-entries of each block to be joined into a single line, so that the output looks like this:
real1 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
173corr 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
To accomplish this in EditPlus (taking the real block as an example), I highlight the whole real block, find all occurrences of \nreal\d+ and replace them with \t.
The challenges are:
- How to select a range of lines in sed. For example, one real block runs from line 5 to line 10 and another real block from line 30 to line 50. In EditPlus I highlight each real block in turn and perform the same replacement on it, block by block. I don't know whether sed can handle all the blocks at once; if not, specifying each block and doing the replacement on it separately is fine.
- The header of each sub-entry has a name+digit format, e.g. real1, real2 and so on, so I added \d+ in my attempt on CentOS, but it does not seem to work.
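Two sed details bear on these points. sed applies a command only to the lines selected by an address range such as 5,10, and sed regular expressions have no \d digit shorthand (in GNU sed, \d introduces a decimal character code, so \d065 matches A), so a digit has to be written as [0-9] or [[:digit:]]. The sketch below wraps both ideas in a hypothetical helper, strip_label_digits, which removes the number from labels such as real1 only within a given line range; the helper name and its arguments are illustrative assumptions, not part of the question.

```sh
#!/bin/sh
# Hypothetical helper (not from the question): remove the trailing number from
# labels such as real1 or real2, but only on lines FIRST through LAST.
#   usage: strip_label_digits FIRST LAST [file...]
# [0-9] is used because sed does not treat \d as a digit class.
# -E selects extended regexes (older GNU sed releases spell it -r).
strip_label_digits() {
  first=$1
  last=$2
  shift 2
  # The address "FIRST,LAST" restricts the s command to that line range.
  sed -E "${first},${last}"'s/^([[:alpha:]]+)[0-9]+/\1/' "$@"
}
```

For example, strip_label_digits 5 10 data.txt changes only lines 5 to 10. Several ranges can be handled in one pass by giving sed one -e expression per range, e.g. sed -E -e '5,10s/^([[:alpha:]]+)[0-9]+/\1/' -e '30,50s/^([[:alpha:]]+)[0-9]+/\1/' data.txt.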
I know this is a very complex problem. I just hope sed can do the trick.
Answers
I’m sure sed can do the trick; I just don’t do sed very well … How about an ugly awk script? Here is the script:
And the outcome:
This might work for you (GNU sed):
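A minimal GNU sed sketch in this spirit (a hypothetical illustration, not necessarily this answer's exact script) assumes every line starts with a label made of optional digits, letters and optional digits (real1, 173corr) followed by a space or tab. It appends the next line with N; while the next label has the same letters as the current one (checked with the back-reference \2), it deletes the line break and that label and loops, and when the letters differ it prints the finished line with P and carries on with D. The wrapper name join_blocks_sed is made up for the example.

```sh
#!/bin/sh
# Hypothetical wrapper around a GNU sed script.
# Joins consecutive lines whose labels share the same letters
# (real1/real2/real3, or 173corr/512corr), keeping the first label of each run.
join_blocks_sed() {
  sed -E '
    :a
    # Append the next input line, unless this is the last one.
    $!N
    # \2 holds the letters of the current label. If the next label has the same
    # letters, drop the newline and the next label but keep its separator (\3).
    s/^([0-9]*([[:alpha:]]+)[0-9]*[[:blank:]].*)\n[0-9]*\2[0-9]*([[:blank:]])/\1\3/
    # A join happened: try to append another line.
    ta
    # Otherwise print the finished first line and restart with the remainder.
    P
    D
  ' "$@"
}
```

Usage: join_blocks_sed data.txt. Because it compares labels line by line, separate real blocks (say lines 5 to 10 and 30 to 50) are each joined in a single pass, without selecting them first.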
Print lines as normal until one containing either real or corr, then gather up the following lines, removing the newline and the start-of-line label from each. On a change of key, print the joined line.
With the sample input saved as data.txt, the command gives this output:
Explanation: I use two variables: seen, to keep the current category, where a category is the content of the first column with all digits removed, and acc, to accumulate the content of lines with a common category. For every line I compute the current category. If it is the same as in the previous line, I only append the content of the current line (without its first column) to acc; otherwise I print acc, set seen accordingly and set acc to the content of the current line. In END I do print acc, as otherwise the content of the last category would be missing. (tested in GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0))