
I’m currently parsing some Gherkin files along with their associated step definition files. What is the best way to extract the regex inside each step along with the step's function code? For example, I have the following step definitions:

this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
    var world = this;
    this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
      world.topicArn = resp.data.TopicArn;
    });
  });

  this.Given(/^I list the SNS topics$/, function(callback) {
    this.request(null, 'listTopics', {}, callback);
  });

I want to extract both the regex ^I create an SNS topic with name "([^"]*)"$ and the function code:

    var world = this;
    this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
      world.topicArn = resp.data.TopicArn;
    });

I’ve been able to extract the step regex using the following pattern: ‘this\.(?:Given|Then|When)\(/(.+?)/’

However, extracting the function code is a lot trickier. How can I extract everything from the first { to the last } of the function? Is there a better way to do this, e.g. a library that can extract it automatically?

2 Answers


  1. Regular expressions are not suited to correctly parsing general programs (1). You should use a JavaScript parser instead.

    Another way would be to choose a proxy; for example:

    • you can split your file in line chunks starting each with this.Given(,
    • keep whatever lies between that this.Given( and the last }); you see in the chunk as the "function body"

    This simplistic approach has some obvious blind spots (that’s why I called it "a proxy"):

    • it won’t work if you happen to have nested this.Given( statements,
    • it would incorrectly catch a final }); in a comment line,
    • it would incorrectly include the code from another function declaration (if you happen to have some that are declared between two this.Given( statements), …

    But if your code has a regular structure, this may be quicker to implement than using a complete JavaScript parser.


    (1): Programming languages generally belong to the "context-free" or "context-sensitive" language classes, while regular expressions can only parse "regular" languages.
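
The chunk-splitting proxy described in this answer could be sketched roughly as follows in Python. This is only an illustration under assumptions: the names SOURCE, STEP_START, STEP_HEAD and split_steps are hypothetical, the step keywords are assumed to be Given/When/Then, and every step is assumed to end with "});". It shares all the blind spots listed above.

```python
import re

# Sample input copied from the question (assumed to be representative).
SOURCE = r'''
this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
    var world = this;
    this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
      world.topicArn = resp.data.TopicArn;
    });
  });

  this.Given(/^I list the SNS topics$/, function(callback) {
    this.request(null, 'listTopics', {}, callback);
  });
'''

# Marks the beginning of each chunk.
STEP_START = re.compile(r'this\.(?:Given|When|Then)\(')
# Captures the step regex and consumes everything up to the opening "{".
STEP_HEAD = re.compile(r'/(.+?)/,\s*function\s*\([^)]*\)\s*\{')


def split_steps(source):
    """Return (step_regex, body) pairs using the chunk-splitting proxy."""
    starts = [m.start() for m in STEP_START.finditer(source)]
    results = []
    # Each chunk runs from one step start to the next (or to end of file).
    for begin, end in zip(starts, starts[1:] + [len(source)]):
        chunk = source[begin:end]
        head = STEP_HEAD.search(chunk)
        last_close = chunk.rfind('});')  # last "});" seen in the chunk
        if head is None or last_close < head.end():
            continue  # chunk does not look like a step definition
        results.append((head.group(1), chunk[head.end():last_close].strip()))
    return results


steps = split_steps(SOURCE)
```

Each entry of steps pairs the step regex with whatever lies between the opening { and the last }); of its chunk, so any unrelated code sitting between two steps would end up in the preceding body.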

  2. For your sample data, a recursive regex (such as supported by the regex module) could work:

    this\.(?:Given|Then|When)\(/(.+?)/,\s*function\s*(\((?:[^()]|(?2))*\))\s*({(?:[^{}]|(?3))*})\);
    

    This matches:

    • this\.(?:Given|Then|When)\(/(.+?)/ : your original regex
    • ,\s*function\s* : the start of the function declaration
    • (\((?:[^()]|(?2))*\)) : a recursive regex for the function arguments (allows for nested ())
    • ({(?:[^{}]|(?3))*}) : a recursive regex for the code block (allows for nested {})
    • \); : the trailing part of the outer function call

    Regex demo on regex101

    In python:

    import regex
    
    text = '''this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
        var world = this;
        this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
          world.topicArn = resp.data.TopicArn;
        });
      });
    
      this.Given(/^I list the SNS topics$/, function(callback) {
        this.request(null, 'listTopics', {}, callback);
      });
    '''
    
    pattern = regex.compile(r'this\.(?:Given|Then|When)\(/(.+?)/,\s*function\s*(\((?:[^()]|(?2))*\))\s*({(?:[^{}]|(?3))*})\);')
    pattern.findall(text)
    

    Output:

    [
        (
            '^I create an SNS topic with name "([^"]*)"$',
            '(name, callback)',
            "{\n    var world = this;\n    this.request(null, 'createTopic', {Name: name}, callback, function (resp) {\n      world.topicArn = resp.data.TopicArn;\n    });\n  }"
        ),
        (
            '^I list the SNS topics$',
            '(callback)',
            "{\n    this.request(null, 'listTopics', {}, callback);\n  }"
        )
    ]
    

    Note that this will fail when there are brackets in quotes in the code, although there are ways you can work around that too, for example using ideas from this answer.
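
One possible workaround for braces inside quotes, sketched here with a small hand-written scanner rather than a more elaborate regex (so this is a different technique from the recursive pattern above): find the opening { with an ordinary regex, then walk forward counting braces while skipping string literals and comments. The names matching_brace, extract_steps, STEP_HEAD and SAMPLE are hypothetical, and the sketch deliberately ignores regex literals and template-literal interpolation inside the function body.

```python
import re

# Matches the step head up to and including the opening "{" of the body.
STEP_HEAD = re.compile(
    r'this\.(?:Given|When|Then)\(/(.+?)/,\s*function\s*\([^)]*\)\s*\{'
)


def matching_brace(source, open_index):
    """Return the index of the '}' that closes the '{' at open_index.

    Braces inside string literals and comments are not counted.
    Regex literals and template interpolation are NOT handled.
    """
    if source[open_index] != '{':
        raise ValueError('open_index must point at "{"')
    depth = 0
    i = open_index
    n = len(source)
    while i < n:
        ch = source[i]
        if ch in '"\'`':
            # Skip a quoted string, honouring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                i += 2 if source[i] == '\\' else 1
        elif source.startswith('//', i):
            # Skip a line comment up to the end of the line.
            newline = source.find('\n', i)
            i = n if newline == -1 else newline
        elif source.startswith('/*', i):
            # Skip a block comment up to its closing "*/".
            end = source.find('*/', i + 2)
            i = n if end == -1 else end + 1
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError('unbalanced braces')


def extract_steps(source):
    """Return (step_regex, body) pairs for every step definition found."""
    steps = []
    pos = 0
    while True:
        head = STEP_HEAD.search(source, pos)
        if head is None:
            return steps
        open_index = head.end() - 1  # position of the body's "{"
        close_index = matching_brace(source, open_index)
        steps.append((head.group(1), source[open_index + 1:close_index].strip()))
        pos = close_index + 1


# Made-up step with a brace inside a string and inside a comment.
SAMPLE = r'''
this.Then(/^the message is "([^"]*)"$/, function(text, callback) {
  // a stray } in a comment
  if (text === "}") { callback(); }
});
'''

steps = extract_steps(SAMPLE)
```

Because the scanner only understands quotes and comments, it is still a heuristic; a real JavaScript parser remains the more robust option for arbitrary step files.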
