I am working on a project that requires me to extract the names and positions of the Python packages that were installed with the pip install command.
A webpage contains a code element that holds multiline text with bash commands. I want to write JavaScript code that parses this text and finds the packages and their positions in the text.
For example, if the text is:
$ pip install numpy
pip install --global-option build_ext -t ../ pandas>=1.0.0,<2
sudo apt update
pip uninstall numpy
pip install "requests==12.2.2"
I want to get something like this:
[
{
"name": "numpy",
"position": 14
},
{
"name": "pandas",
"position": 65
},
{
"name": "requests",
"position": 131
}
]
How can I do this in JavaScript?
2 Answers
Here is one possible solution, which tries to use loops instead of regex:
The idea is to find the lines that contain the pip install text, since those are the lines we are interested in. Then we break the command into words and loop over them until we reach the packages part of the command.
First, we define the regex for a package. Remember that a command can list several packages, with quotes and version specifiers, for example:
pip install 'stevedore>=1.3.0,<1.4.0' "MySQL_python==1.2.2"
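A minimal sketch of what such a package regex could look like, with the two named groups package_part and package_name. The exact pattern and the helper name matchPackage are assumptions made for illustration; the pattern only covers optionally quoted names followed by a simple version specifier, not extras, URLs, or local paths.

```javascript
// Hypothetical pattern: optional quote, package name, optional version
// specifier (e.g. ">=1.3.0,<1.4.0" or "==1.2.2"), optional closing quote.
const PACKAGE_RE =
  /^["']?(?<package_part>(?<package_name>[A-Za-z0-9][A-Za-z0-9._-]*)(?:[<>=!~]=?[^"'\s]*)?)["']?$/;

// Returns the matched parts for a single command word, or null if the word
// does not look like a package (for example "--upgrade" or "../").
function matchPackage(word) {
  const m = PACKAGE_RE.exec(word);
  if (!m) return null;
  return {
    part: m.groups.package_part, // name plus version, without quotes
    name: m.groups.package_name, // name only
    // Offset of the name inside the word (1 when the word starts with a quote).
    offset: word.indexOf(m.groups.package_name),
  };
}
```

Note that a plain option value such as build_ext also matches this pattern, which is why the options have to be handled separately before a word is treated as a package.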
Note the named groups: package_part identifies the "package with version" string, while package_name extracts the package name.
About the arguments
We have two types of CLI arguments: options and flags.
The problem with options is that we need to understand that the next word is not a package name, but the option value.
So first I listed all the options of the pip install command.
Then I wrote a function, used later by the parser, that decides what to do when we see an argument:
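A rough sketch of both pieces, under some assumptions: OPTIONS_WITH_VALUE is only a partial list of pip install options that take a value, and the handleArgument signature (argument, remaining words, full text, current index) is a guess at the shape described here, not the original code.

```javascript
// Partial, illustrative list of pip install options that expect a value.
const OPTIONS_WITH_VALUE = new Set([
  "-r", "--requirement", "-c", "--constraint", "-e", "--editable",
  "-t", "--target", "-i", "--index-url", "--extra-index-url",
  "-f", "--find-links", "--global-option", "--prefix", "--root", "--src",
  "--no-binary", "--only-binary", "--upgrade-strategy",
]);

// Skips one argument (and its value, if any) and returns the updated index.
// restWords is mutated: a consumed option value is removed from its front.
function handleArgument(argument, restWords, text, counterIndex) {
  // Move the index to just after the argument itself.
  counterIndex = text.indexOf(argument, counterIndex) + argument.length;

  const name = argument.split("=")[0];
  const hasInlineValue = argument.includes("=");

  // "--option=something": the value is already part of this word.
  // "--option something": the next word is the value, not a package.
  if (OPTIONS_WITH_VALUE.has(name) && !hasInlineValue && restWords.length > 0) {
    const value = restWords.shift();
    counterIndex = text.indexOf(value, counterIndex) + value.length;
  }
  return counterIndex;
}
```

Flags such as --upgrade are not in the set, so only the flag itself is skipped and the following word is still considered a package candidate.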
This function receives the identified argument and the rest of the command, split into words.
(Here you start to see the "index counter". Since we also need the position of each match, we have to keep track of the current position in the original text.)
In the last lines of the function, you can see that I handle both --option=something and --option something.
The Parser
The main parser splits the original text into lines, and then each line into words.
Each action has to update the global index so that we keep track of our position in the text. The index also lets us search inside the text without landing on the wrong substring, by using indexOf(str, counterIndex).
You can see my explained code in this answer.
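The approach described above can be sketched end to end roughly as follows. This is a simplified sketch under assumptions, not the original code: findPipPackages, OPTIONS_WITH_VALUE, and PACKAGE_RE are illustrative names, the option list is partial, and quoted arguments that contain spaces are not handled.

```javascript
// Partial, illustrative list of pip install options that expect a value.
const OPTIONS_WITH_VALUE = new Set([
  "-r", "--requirement", "-c", "--constraint", "-e", "--editable",
  "-t", "--target", "-i", "--index-url", "--extra-index-url",
  "-f", "--find-links", "--global-option", "--prefix", "--root", "--src",
]);

// Optional quote, package name, optional version specifier, optional quote.
const PACKAGE_RE =
  /^["']?(?<package_name>[A-Za-z0-9][A-Za-z0-9._-]*)(?:[<>=!~]=?[^"'\s]*)?["']?$/;

function findPipPackages(text) {
  const results = [];
  let counterIndex = 0; // start of the current line within the full text

  for (const line of text.split("\n")) {
    const lineStart = counterIndex;
    counterIndex += line.length + 1; // +1 for the "\n" removed by split

    const installAt = line.indexOf("pip install");
    if (installAt === -1) continue; // not a line we are interested in

    const afterCommand = installAt + "pip install".length;
    let cursor = lineStart + afterCommand; // position reached so far
    const words = line.slice(afterCommand).split(/\s+/).filter(Boolean);

    while (words.length > 0) {
      const word = words.shift();
      // Search from the cursor so an earlier identical substring is ignored.
      const wordAt = text.indexOf(word, cursor);
      cursor = wordAt + word.length;

      if (word.startsWith("-")) {
        // "--option something": skip the value too. "--option=something"
        // is a single word, so nothing else needs to be skipped.
        if (OPTIONS_WITH_VALUE.has(word) && words.length > 0) {
          const value = words.shift();
          cursor = text.indexOf(value, cursor) + value.length;
        }
        continue;
      }

      const m = PACKAGE_RE.exec(word);
      if (m) {
        const name = m.groups.package_name;
        results.push({ name, position: wordAt + word.indexOf(name) });
      }
    }
  }
  return results;
}
```

Run on the example text from the question, this sketch yields numpy at 14, pandas at 65, and requests at 131, while the pip uninstall line is skipped because it does not contain pip install.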
Here is another similar solution, based more on Regex: