
I have a MongoDB database named baike. It contains a collection named baike_items whose documents have the following fields:

id
title
baike_id
page_url
text

All fields are fine except page_url. Some of the URLs are normal, like:

'https://baike.baidu.hk/item/%E5%A5%91%E4%B8%B9%E6%97%8F/2390374'

But some URLs end with the string #viewPageContent, like:

https://baike.baidu.hk/item/%E5%E6%97%8F/11435374#viewPageContent

I want to write a MongoDB query that removes the #viewPageContent string from every URL while keeping the rest of the string, that is, to change:

https://baike.baidu.hk/item/123#viewPageContent
https://baike.baidu.hk/item/456#viewPageContent
.
.
.

to

https://baike.baidu.hk/item/123
https://baike.baidu.hk/item/456
.
.
.

Any suggestions? Thanks.

Update 1
The following Python (PyMongo) code should do it:

db.baike_items.update_many(
  { "page_url": { "$regex": "#viewPageContent"} },
  [{
    "$set": { "page_url": {
      "$replaceOne": { "input": "$page_url", "find": "#viewPageContent", "replacement": "" }
    }}
  }]
)
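
The regex in the update above matches #viewPageContent anywhere in page_url. A minimal sketch, assuming the string only ever needs removing at the end of the URL, anchors the pattern with $ and builds the filter and pipeline separately so they can be inspected before touching the database. build_update is a hypothetical helper name, and the commented-out connection lines are assumptions about the setup. Pipeline-style updates need MongoDB 4.2 or later, and $replaceOne needs 4.4 or later.

```python
SUFFIX = "#viewPageContent"

def build_update(field="page_url"):
    """Return (filter, pipeline) for update_many. Hypothetical helper."""
    # "$" anchors the pattern so only values ENDING with the suffix match.
    # SUFFIX contains no regex metacharacters, so it is used unescaped.
    query = {field: {"$regex": SUFFIX + "$"}}
    # $replaceOne replaces the first occurrence of "find" in "input".
    pipeline = [{
        "$set": {field: {
            "$replaceOne": {"input": "$" + field, "find": SUFFIX, "replacement": ""}
        }}
    }]
    return query, pipeline

query, pipeline = build_update()
print(query)

# With a live connection (assumed setup, not executed here):
# from pymongo import MongoClient
# db = MongoClient()["baike"]
# result = db.baike_items.update_many(query, pipeline)
# print(result.modified_count)
```

Running db.baike_items.count_documents(query) first is one way to see how many documents would be touched before applying the update.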

3 Answers


  1. Chosen as BEST ANSWER
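    db.baike_items.update_many(
      # Match only documents whose page_url contains the string.
      { "page_url": { "$regex": "#viewPageContent"} },
      # Pipeline-style update: rewrite page_url with the string removed.
      [{
        "$set": { "page_url": {
          "$replaceOne": { "input": "$page_url", "find": "#viewPageContent", "replacement": "" }
        }}
      }]
    )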
    

  2. old_url = "https://baike.baidu.hk/item/%E7%89%A9%E7%90%86%E5%85%89%E5%AD%B8/61334055#viewPageContent"
    
    # str.replace removes every occurrence of the substring.
    new_url = old_url.replace("#viewPageContent", "")
    
    print(old_url)
    # https://baike.baidu.hk/item/%E7%89%A9%E7%90%86%E5%85%89%E5%AD%B8/61334055#viewPageContent
    
    print(new_url)
    # https://baike.baidu.hk/item/%E7%89%A9%E7%90%86%E5%85%89%E5%AD%B8/61334055
    
  3. import re
    a = "https://baike.baidu.hk/item/%E7%89%A9%E7%90%86%E5%85%89%E5%AD%B8/61334055#viewPageContent"
    print(re.sub(r"#viewPageContent", '', a))
    # Output: https://baike.baidu.hk/item/%E7%89%A9%E7%90%86%E5%85%89%E5%AD%B8/61334055

    Hope I could help you!
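
Both string-based answers above remove #viewPageContent wherever it occurs in a single Python string. As a rough sketch, assuming the goal is to touch only a trailing occurrence, the following compares a suffix-only strip with dropping any URL fragment via the standard library. strip_suffix and strip_any_fragment are hypothetical helper names, and the sample URLs are the ones from the question.

```python
from urllib.parse import urldefrag

SUFFIX = "#viewPageContent"

def strip_suffix(url, suffix=SUFFIX):
    """Remove suffix only when the URL ends with it; otherwise return it unchanged."""
    if url.endswith(suffix):
        return url[: -len(suffix)]
    return url

def strip_any_fragment(url):
    """Drop whatever follows '#' in the URL, not just #viewPageContent."""
    return urldefrag(url).url

urls = [
    "https://baike.baidu.hk/item/123#viewPageContent",
    "https://baike.baidu.hk/item/456",
]
for u in urls:
    print(strip_suffix(u), strip_any_fragment(u))
```

strip_suffix is the narrower choice. strip_any_fragment would also remove other fragments, which may or may not be wanted for this collection.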
