
Let’s say I have a file called courses.txt with contents like those below.
The file has sections (the course provider and the email I used), each followed by various courses. Example: edX ([email protected]) and then the course names, each preceded by a serial number.

udemy ([email protected])  
"=========================="-  
1) foo bar
2) java programming language
3) redis stephen grider
4) javascript
5) react with typescript
6) kotlin
7) Etherium and Solidity : the Complete Developer's Guide
8) reactive programming with spring  


coursera ([email protected])  
"==========================-"  
1) python
2) typescript
3) java concurrency
4) C#

edX ([email protected])  
"==========================-"  
1) excel
2) scala
3) risk management
4) stock
5) oracle
6) mysql  
7) java  
==========================-    

Question: I want to grep for a course, say "java". I want the result to show the particular line(s) that match (for example "java") together with the corresponding section name (say, "edX ([email protected])").

If I search for "java", what regex will give me the following matches? (I use grep/perl on Windows.)

udemy ([email protected])    
2) java programming language  

coursera ([email protected])  
3) java concurrency

edX ([email protected])    
7) java    

I tried lookbehind/lookahead but couldn’t figure out how to print the course provider name with email and the course name.

Thoughts?

3 Answers


  1. I won’t give you a complete solution, but you can start with this:

    grep -iE "java|@" filename.txt
    

    Some explanation:

    • the -i makes it case insensitive
    • the -E uses extended regular expressions
    • the | is an example of those extended regular expressions and it means "OR": show the lines which contain ‘java’ OR ‘@’ (the latter being all the e-mail addresses)

    As a result, you get output with all the e-mail address lines and all the ‘java’ courses, with one catch: if a line with an e-mail address is directly followed by another line with an e-mail address (or is the last line of the output), then there is no ‘java’ course for that address. Hence, you can now use Perl to remove the e-mail address lines that are not followed by a course line.
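
    A minimal Perl sketch of that clean-up step, assuming headers are the lines containing ‘@’ and that whole-word matching is wanted (so ‘javascript’ is skipped, which the plain grep above would not do). The helper name filter_matches and the sample addresses are hypothetical, made up for illustration:

    ```perl
    use strict;
    use warnings;

    # Keep a header line (one containing '@') only if a matching course
    # line follows it before the next header.
    sub filter_matches {
        my ($term, @lines) = @_;
        my ($header, @out);
        for my $line (@lines) {
            if ($line =~ /@/) {
                $header = $line;            # remember the current section header
            }
            elsif ($line =~ /\b\Q$term\E\b/i) {
                if (defined $header) {
                    push @out, $header;     # emit each header at most once
                    undef $header;
                }
                push @out, $line;
            }
        }
        return @out;
    }

    # Hypothetical sample data (single quotes, so '@' is not interpolated).
    my @sample = (
        'udemy (me@example.com)',
        '1) foo bar',
        '2) java programming language',
        '4) javascript',
        'coursera (me@example.org)',
        '1) python',
    );

    print "$_\n" for filter_matches('java', @sample);
    ```

    With this sample the coursera header is dropped because no course line in that section matches.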

  2. Looking at the input data, we can see that each section starts with a line that includes an email address.

    The data lines of a section each start with a serial number.

    Based on this, we can build a hash %sections keyed by the line that includes the email; all lines starting with a serial number are stored in an array under that key.

    Once the hash is built, the code goes through all sections and looks for lines that include the search term; when the term is found, it outputs the section name with the matching line.

    Note: to work on a real file, replace <DATA> with <> and then run it as ./script.pl java filename.dat (the search term comes first, because the script takes it with shift before <> reads the remaining arguments as files).

    use strict;
    use warnings;
    use feature 'say';
    
    my($lookfor, %sections, $key);
    
    $lookfor = shift || die "Provide search term";
    
    while( <DATA> ) {
        chomp;
        $key = $_ if /@/;
        push @{$sections{$key}}, $_ if /^\d+\) /;   # course lines: serial number, ')' and a space
    }
    
    for my $section (keys %sections ) {
        for( @{$sections{$section}} ) {
            say "$section\n"
              . '-' x 30
              . "\n$_\n" if /\b$lookfor\b/i;
        }
    }
    
    exit 0;
    
    __DATA__
    udemy ([email protected])  
    "=========================="-  
    1) foo bar
    2) java programming language
    3) redis stephen grider
    4) javascript
    5) react with typescript
    6) kotlin
    7) Etherium and Solidity : the Complete Developer's Guide
    8) reactive programming with spring  
    
    
    coursera ([email protected])  
    "==========================-"  
    1) python
    2) typescript
    3) java concurrency
    4) C#
    
    edX ([email protected])  
    "==========================-"  
    1) excel
    2) scala
    3) risk management
    4) stock
    5) oracle
    6) mysql  
    7) java  
    ==========================-    
    

    Output

    edX ([email protected])
    ------------------------------
    7) java
    
    coursera ([email protected])
    ------------------------------
    3) java concurrency
    
    udemy ([email protected])
    ------------------------------
    2) java programming language
    
    
  3. If you process the file in paragraphs (chunks of text separated by blank lines), then within each paragraph it is fairly straightforward to match the needed pattern: the header (followed by a line of = characters) and a line with java in it

    perl -00 -wnE'say "$1\n$2" 
        if /(.+?) \n "=+.+? \n .+? \n ([^\n]+\sjava\s[^\n]+)/sx' file
    

    (Tested on Linux; read on for Windows. Broken into lines for easier reading. See below for explanation of the pattern.)

    At the end of the line with =s I use .+? instead of the specific characters that follow =s in your input because your sample input isn’t consistent; it has both -" and "-, in different paragraphs. Adjust as suitable.

    Since this is on Windows, where you may have to use " delimiters for the one-liner (I don’t know what shell you use), you may need to replace the literal " inside the pattern with \x22 (hex for "), or your other favorite sequence.

    Hopefully good for Windows (can’t test on Windows right now)

    perl -00 -wnE "say qq($1\n$2) 
        if /(.+?) \n \x22=+.+? \n .+? \n ([^\n]+\sjava\s[^\n]+) /sx
    " file
    

    The -00 switch makes it read in paragraphs. With the /x modifier, spaces inside the pattern are ignored, so we can use them to space things out for readability. With the /s modifier, the . matches a newline as well. This is important for the middle .+? to match multiple lines, up to the one with java (surrounded by whitespace).

    If you don’t mind having a script instead of a one-liner, which is what I recommend, then for example

    use warnings;
    use strict;
    use feature 'say';
    
    local $/ = "\n\n";
    
    while (<>) { 
        say "$1\n$2" 
            if /(.+?) \n "=+.+? \n .+? \n ([^\n]+ \sjava\s [^\n]+)/sx;
    }
    

    The <> operator reads files given on the command line, line by line, but the notion of a "line" is earlier set to a paragraph with local $/ = "\n\n". That local is there in case this is a part of a larger program where you don’t want to change the $/ variable for the whole program!
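
    To illustrate how local confines the change to $/, here is a small self-contained sketch. It reads from an in-memory filehandle instead of a real file so that it runs as is; the helper name read_paragraphs and the sample addresses are hypothetical, not part of the script above:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample text: two paragraphs separated by one blank line.
    my $text = join "\n",
        'udemy (me@example.com)',
        '1) foo bar',
        '',
        'coursera (me@example.org)',
        '1) python',
        '';

    sub read_paragraphs {
        my ($string) = @_;
        local $/ = "\n\n";    # a "line" is a paragraph, only until this sub returns
        open my $fh, '<', \$string or die "Cannot open in-memory file: $!";
        my @paragraphs = <$fh>;
        close $fh;
        return @paragraphs;
    }

    my @paragraphs = read_paragraphs($text);
    printf "%d paragraphs\n", scalar @paragraphs;
    print "record separator is back to a newline\n" if $/ eq "\n";
    ```

    Outside the sub, $/ has its default value again, so any later line-by-line reads elsewhere in the program are unaffected.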


    Or, instead of using /s that makes . match newlines, use a pattern for multiple lines

    perl -00 -wnE'say "$1\n$2" 
        if /(.+) \n "=+.+ \n (?:.+\n)* (.+\sjava\s.+)/x' file
    

    Or, if you need "..." on Windows, like

    perl -00 -wnE "say qq($1\n$2) 
        if /(.+) \n \x22=+.+ \n (?:.+\n)* (.+\sjava\s.+)/x" file
    

    (Again, I can’t test on Windows right now.)

    Note that now we don’t have to make all those .+ non-greedy with the added ? (.+?) like in the patterns with /s above — now that .+ stops at a newline, just as needed here.

    Or, use the /s modifier dynamically, via extended patterns

    perl -00 -wnE "say qq($1\n$2) 
        if /(.+) \n \x22=+.+ \n (?s).+?(?-s) (.+\sjava\s.+)/x
    " file
    

    Here (?s) "turns on" the /s modifier, which would be in effect until the end of the enclosing group (the rest of the pattern in this case), but (?-s) turns it off.
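
    A tiny sketch of that switching behavior on a made-up three-line string (the variable names are just for illustration):

    ```perl
    use strict;
    use warnings;

    my $text = "a\nb\nc";

    # Without /s the dot does not match a newline.
    my $plain  = $text =~ /a.b/            ? 1 : 0;
    # (?s) lets every following dot match a newline.
    my $on     = $text =~ /a(?s).b.c/      ? 1 : 0;
    # (?-s) switches it off again, so the second dot fails on the newline.
    my $on_off = $text =~ /a(?s).(?-s)b.c/ ? 1 : 0;

    print "plain=$plain on=$on on_off=$on_off\n";
    ```

    Only the middle pattern matches, because it is the only one where every dot that has to cross a newline is under (?s).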
