I have a MongoDB database named baike with a collection named baike_items. Its documents have the following fields:
id
title
baike_id
page_url
text
All fields are fine except page_url. Some of the URLs are normal, like:
'https://baike.baidu.hk/item/%E5%A5%91%E4%B8%B9%E6%97%8F/2390374'
But some URLs end with the string #viewPageContent, like:
https://baike.baidu.hk/item/%E5%E6%97%8F/11435374#viewPageContent
My goal is to write a MongoDB query that removes the trailing #viewPageContent from every URL while keeping the rest of the string. That is, I want to change:
https://baike.baidu.hk/item/123#viewPageContent
https://baike.baidu.hk/item/456#viewPageContent
.
.
.
to
https://baike.baidu.hk/item/123
https://baike.baidu.hk/item/456
.
.
.
Any suggestions? Thanks.
Update 1
The following Python (PyMongo) should do it:
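from pymongo import MongoClient

db = MongoClient()["baike"]  # assumption: MongoDB running on localhost:27017

# Pipeline-style update; the $replaceOne operator requires MongoDB 4.4+.
db.baike_items.update_many(
    # "$" anchors the match so only URLs ending in the fragment are selected
    {"page_url": {"$regex": "#viewPageContent$"}},
    [{
        "$set": {"page_url": {
            "$replaceOne": {"input": "$page_url", "find": "#viewPageContent", "replacement": ""}
        }}
    }]
)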
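If the server is older than MongoDB 4.4 and $replaceOne is not available, one possible fallback is to do the string edit on the client and write each document back. This is only a rough sketch: strip_suffix and clean_page_urls are names I made up for illustration, and collection is assumed to be a PyMongo Collection such as db.baike_items.

```python
import re

SUFFIX = "#viewPageContent"


def strip_suffix(url, suffix=SUFFIX):
    """Return url without a trailing suffix; other URLs are returned unchanged."""
    return url[: -len(suffix)] if url.endswith(suffix) else url


def clean_page_urls(collection, suffix=SUFFIX):
    """Strip the suffix from page_url client-side, one document at a time.

    `collection` is assumed to behave like a PyMongo Collection
    (find / update_one). Returns the number of documents modified.
    """
    modified = 0
    # Escape the suffix and anchor it with "$" so only URLs ending in it match.
    query = {"page_url": {"$regex": re.escape(suffix) + "$"}}
    for doc in collection.find(query, {"page_url": 1}):
        result = collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"page_url": strip_suffix(doc["page_url"], suffix)}},
        )
        modified += result.modified_count
    return modified
```

Calling clean_page_urls(db.baike_items) would then do the cleanup. It makes one round trip per matching document, so it is slower than the single update_many above, but it does not depend on any aggregation operators.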
3 Answers
Output:
https://baike.baidu.hk/item/%E7%89%A9%E7%90%86%E5%85%89%E5%AD%B8/61334055
Hope this helps!