I’m currently parsing some Gherkin files along with their associated step definition files, and I’m wondering what the best way would be to extract the regex inside each step along with its code. For example, I have the following functions:
this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
  var world = this;
  this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
    world.topicArn = resp.data.TopicArn;
  });
});
this.Given(/^I list the SNS topics$/, function(callback) {
  this.request(null, 'listTopics', {}, callback);
});
I want to extract both the regex
^I create an SNS topic with name "([^"]*)"$
and the function code:
var world = this;
this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
  world.topicArn = resp.data.TopicArn;
});
I’ve been able to extract the regex using the following regex: ‘this\.(?:Given|Then|When)\(/(.+?)/’
However, extracting the function code is a lot trickier. How can I match everything from the first { to the last } of the function? Is there a better way to do this, e.g. a library that can extract it automatically?
2 Answers
Regular expressions are not suited to correctly parsing general programs (1). You should use a JavaScript parser instead.
Another way would be to choose a proxy; for example: take the chunk of text between one 'this.Given(' and the next 'this.Given(', and treat everything up to the last '});' you see in the chunk as the "function body".

This simplistic approach has some obvious blind spots (that’s why I called it "a proxy"):
- it won’t work if you happen to have nested 'this.Given(' statements,
- it would incorrectly catch a final '});' in a comment line,
- it would incorrectly include the code from another function declaration (if you happen to have some that are declared between two 'this.Given(' statements),
- …

But if your code has a regular structure, this may be quicker to implement than using a complete JavaScript parser.
(1): programming languages are generally in the "context-free" or "context-sensitive" language classes, while regular expressions can only parse "regular" languages.
For your sample data, a recursive regex (such as supported by the regex module) could work:
This matches:
- this\.(?:Given|Then|When)\(/(.+?)/ : your original regex
- ,\s*function\s* : the start of the function declaration
- (\((?:[^()]|(?2))*\)) : a recursive regex for the function arguments (allows for nested ())
- ({(?:[^{}]|(?3))*}) : a recursive regex for the code block (allows for nested {})
- \); : the trailing part of the outer function call

Regex demo on regex101
In python:
Output:
Note that this will fail when there are brackets in quotes in the code, although there are ways you can work around that too, for example using ideas from this answer.
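One possible workaround for the quoted-brackets problem, sketched here with only the standard library, is to locate the step header with an ordinary regex and then find the closing brace with a small hand-written scanner that skips string literals. This is a swapped-in technique, not the approach from the linked answer, and the names STEP_HEADER, find_matching_brace, extract_steps_balanced and TRICKY are hypothetical. It deliberately does not handle comments, JavaScript regex literals, or template-literal interpolation.

```python
import re

# Step keyword, regex literal, and function header up to the opening brace.
STEP_HEADER = re.compile(
    r"this\.(?:Given|When|Then)\(/(.+?)/,\s*function\s*\([^)]*\)\s*\{"
)


def find_matching_brace(source, open_index):
    """Return the index of the '}' closing the '{' at open_index.

    Braces inside '...', "..." or `...` string literals are ignored.
    """
    if source[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    quote = None  # quote character of the string we are inside, if any
    i = open_index
    while i < len(source):
        ch = source[i]
        if quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None  # end of the string literal
        elif ch in "'\"`":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("no matching '}' found")


def extract_steps_balanced(source):
    """Return (step_regex, function_body) pairs using brace matching."""
    steps = []
    pos = 0
    while True:
        head = STEP_HEADER.search(source, pos)
        if head is None:
            return steps
        open_index = head.end() - 1  # the '{' that ends the header match
        close_index = find_matching_brace(source, open_index)
        body = source[open_index + 1:close_index].strip()
        steps.append((head.group(1), body))
        pos = close_index + 1


TRICKY = '''this.When(/^I log "([^"]*)"$/, function(msg, callback) {
  console.log("unbalanced } brace: " + msg);
  callback();
});
'''

for step_regex, body in extract_steps_balanced(TRICKY):
    print(step_regex)
    print(body)
```

The scanner only tracks nesting depth and whether it is inside a string, so it stays short; anything beyond that (comments, regex literals in the body) is where a real JavaScript parser becomes the safer choice.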