Ebay API - regex filter for list of words until nth occurrence of character

copamundial
June 17, 2020

I have a DataFrame with URLs and a blacklist of words to filter these URLs.
Now I want to apply the filter only up to the third occurrence of /.
So for example:

http://example.com/abc/def/

Here I would like to check only the part up to the third occurrence of /, i.e. just:
http://example.com/
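To pin down what "up to the third occurrence of /" means, here is a minimal sketch in plain Python without any regex. The helper prefix_until_nth and its ch/n parameters are hypothetical, not from the question; it returns the whole string when there are fewer than n slashes.

```python
def prefix_until_nth(s: str, ch: str = "/", n: int = 3) -> str:
    """Return s up to and including the n-th occurrence of ch.

    If ch occurs fewer than n times, the whole string is returned.
    """
    pos = -1
    for _ in range(n):
        pos = s.find(ch, pos + 1)  # index of the next occurrence, or -1
        if pos == -1:
            return s  # fewer than n occurrences: keep everything
    return s[: pos + 1]  # include the n-th occurrence itself


print(prefix_until_nth("http://example.com/abc/def/"))  # http://example.com/
print(prefix_until_nth("http://example.com"))           # http://example.com
```

For http://example.com/abc/def/ this yields http://example.com/, which is the part the blacklist would then be checked against.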

I read some similar questions, and I guess I need to combine two regexes.

I think `/.*?/(.*?)/` should do the job of matching up to the third occurrence of /.
To filter for a list of words, I use this expression:

mask = df["url"].str.contains(r'\b(?:{})\b'.format('|'.join(blacklist)))
df_new = df[~mask]

Now I don't know how to combine these two expressions. I'm new to Python and especially to regex, so there might also be a smarter way of doing this task.
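One way to combine the two expressions into a single regex (an assumption, not taken from either answer below) is to anchor at the start of the string and allow at most two / before the blacklisted word, so the word can only match before the third /. A sketch with the standard re module on the sample URLs from the question, plus one hypothetical URL whose blacklisted word is only in the path:

```python
import re

blacklist = ["ebay", "shop", "camping", "car"]

# Escape entries so characters like "." or "+" in a blacklist word match literally.
words = "|".join(map(re.escape, blacklist))

# ^(?:[^/]*/){0,2}  consume at most two "/" from the start of the string,
# [^/]*             then a run of non-"/" characters (cannot cross the third "/"),
# \b(?:...)\b       then a blacklisted whole word.
# So a word only counts if it appears before the third "/".
pattern = re.compile(rf"^(?:[^/]*/){{0,2}}[^/]*\b(?:{words})\b", re.IGNORECASE)

urls = [
    "http://example.com/abc/def/",
    "http://abcde.com/yzt/egd/",
    "http://ebay.com/buy/something",
    "http://example.com/shop/item/",  # "shop" is after the third "/", so it is kept
]

kept = [u for u in urls if not pattern.search(u)]
print(kept)
```

The ^ anchor is what stops the search from starting later in the URL; without it, a match could begin after the second / and reach words in the path. The pattern string (pattern.pattern) should also work with df["url"].str.contains(pattern.pattern, case=False), though that is untested here.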

Thank you.

EDIT:
The blacklist looks like this: ["ebay", "shop", "camping", "car"]

The DataFrame looks like this:

url                            text
http://example.com/abc/def/    fdogjdfgfd
http://abcde.com/yzt/egd/      oijfgfdgdf
http://ebay.com/buy/something  fgfgeg

Answers

- ShubhamSharma
- June 17, 2020 at 10:00 am
Use Series.str.contains with the following regex pattern:
```
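# For each word: "//", then any non-slash characters, the word, then more
# non-slash characters, i.e. the word must appear in the domain part.
pattern = '|'.join(rf'(?://[^/]*?{b}[^/]+)' for b in blacklist)
m = df['url'].str.contains(pattern, case=False)
df = df[~m]  # keep only rows whose domain contains no blacklisted word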
```
```
# print(df)
                           url        text
0  http://example.com/abc/def/  fdogjdfgfd
1    http://abcde.com/yzt/egd/  oijfgfdgdf
```
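To see what this pattern matches outside pandas, here is a sketch that applies the same pattern string with re.search and re.IGNORECASE, which is roughly what str.contains(pattern, case=False) does for each element. The last two URLs are hypothetical extra test cases, not from the question.

```python
import re

blacklist = ["ebay", "shop", "camping", "car"]

# Same pattern construction as in the answer above.
pattern = "|".join(rf"(?://[^/]*?{b}[^/]+)" for b in blacklist)

urls = [
    "http://example.com/abc/def/",
    "http://ebay.com/buy/something",
    "http://example.com/shop/item/",  # blacklisted word only in the path
    "http://scarf.com/",              # "car" inside a longer word
]

flagged = [bool(re.search(pattern, u, flags=re.IGNORECASE)) for u in urls]
print(flagged)
```

The path-only "shop" is not flagged, so the domain restriction works, but because there are no word boundaries, "car" also matches scarf.com. If only whole words should count, writing the alternative as rf'(?://[^/]*?\b{b}\b[^/]+)' would restrict it accordingly.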

- Stef
- June 17, 2020 at 10:04 am
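You can first extract the part of the URL up to the third '/' and then apply your logic to that part only. Passing expand=False makes str.extract return a Series instead of a DataFrame, so .str.contains can be used on the result:
```
head = df["url"].str.extract(r'^((?:[^/]*/){0,2}[^/]*/?)', expand=False)  # URL up to the third '/'
mask = head.str.contains(r'\b(?:{})\b'.format('|'.join(blacklist)))  # blacklisted whole word?
```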
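A sketch of the same two-step idea with the standard re module, useful for checking what the extraction step returns before wiring it into pandas. The prefix regex ^((?:[^/]*/){0,2}[^/]*/?) is an assumption: it keeps everything up to and including the third /, or the whole string when there are fewer. The last two URLs are hypothetical test cases.

```python
import re

blacklist = ["ebay", "shop", "camping", "car"]

# Step 1: everything up to and including the third "/",
# or the whole string if it contains fewer than three "/".
head_re = re.compile(r"^((?:[^/]*/){0,2}[^/]*/?)")

# Step 2: any blacklisted whole word (entries escaped so they match literally).
word_re = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, blacklist))))

urls = [
    "http://example.com/abc/def/",
    "http://ebay.com/buy/something",
    "http://example.com/shop/item/",  # blacklisted word only in the path
    "http://example.com",             # no trailing slash
]

results = []
for u in urls:
    head = head_re.match(u).group(1)  # the pattern always matches (possibly empty)
    results.append((head, bool(word_re.search(head))))

for head, flagged in results:
    print(head, flagged)
```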
