So I made a script that just scrapes all the hyperlinks from LinkedIn; I don't want anything else. This is what I did:
import re
import requests
import logging
from telegram.ext import *
from bs4 import BeautifulSoup
from datetime import datetime
url = 'https://www.linkedin.com/jobs/search/?f_E=2%2C3&f_TPR=r86400&geoId=103644278&keywords=data%20analytics&location=United%20States'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
urls = []
for link in soup.find_all('a', href=True):
    words = link.get('href')
    if "jobs/view" in words:
        print(words)
ResultText = requests.get("https://api.telegram.org/bot[TOKEN]/sendMessage?chat_id=[CHANNEL ID]&text={}".format(words))
import urllib
ParsedRestultText = urllib.parse.quote_plus(words)
The for link in soup.find_all loop prints everything properly in the terminal, like this:
But my Telegram shows me just the "community guidelines" page, like this:
And occasionally I get a {}, like this:
I looked through a lot of documentation, but I can't find an example where someone is JUST getting the hyperlinks and sending them. After many hours of figuring it out, it looks like I finally made the connection to my bot, but I don't understand why the hyperlinks are not being sent to the bot I built.
2 Answers
This just overwrites the variable words in every iteration. If you want to gather all the links that you find, you should probably append them to a list and use something like ', '.join(found_links) as the text to send. See the str.join docs.

ResultText = requests.get("https://api.telegram.org/bot[TOKEN]/sendMessage?chat_id=[CHANNEL ID]&text={}".format(words))
Since this statement is outside of the for loop, you only get the last link that is scraped. If you want to send all the links to your Telegram bot, place the above statement inside the for loop. (But this will result in multiple calls to the Telegram Bot API.) The best approach, though, would be to store all the links in a list and send that list to your Telegram bot in one call.
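
A minimal sketch of the list-based approach, assuming the hrefs have already been pulled out of the page as in the question. The helper names collect_job_links, build_send_url and send_links are made up for illustration. The standard library's urllib is used here in place of requests so that the message text is URL-encoded explicitly; passing params= to requests.get should encode it as well.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.telegram.org"


def collect_job_links(hrefs):
    """Append every matching href to a list instead of overwriting one variable."""
    found_links = []
    for href in hrefs:
        if "jobs/view" in href:
            found_links.append(href)
    return found_links


def build_send_url(token, chat_id, text):
    """Build a sendMessage URL with chat_id and text URL-encoded.

    Encoding matters because job links contain '?' and '&', which would
    otherwise be read as part of the API request's own query string.
    """
    query = urlencode({"chat_id": chat_id, "text": text})
    return f"{API_BASE}/bot{token}/sendMessage?{query}"


def send_links(token, chat_id, found_links):
    """Send all collected links in one Telegram message, one link per line."""
    text = "\n".join(found_links)
    with urlopen(build_send_url(token, chat_id, text)) as response:
        return response.read()
```

In the question's script this might be called as send_links("[TOKEN]", "[CHANNEL ID]", collect_job_links(link.get('href') for link in soup.find_all('a', href=True))). Telegram limits the length of a single message, so a very long list may need to be split across several calls.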