I’m trying to scrape the TradingView page for my own chart to read boolean states.
Here is exactly what I mean:
This is the HTML code of the page:
I’m working on a Debian Linux server and programming in Python. I tried using BeautifulSoup to read the page and found out that BeautifulSoup can’t run JavaScript, so the HTML it receives doesn’t contain everything the browser displays.
This code only outputs empty brackets `[]`, so it didn’t find the class I’m searching for:
```python
import requests
import soupsieve
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
url = 'https://de.tradingview.com/chart/zDAFlgZJ/#'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
output = soup.find_all('div', attrs={'class':'valueValue-3kA0oJs5'})
print(output)
```
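The empty list can be reproduced without any network access. The sketch below is only an illustration: the page content in `SERVED_HTML` is hypothetical, and it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. It shows that a parser only sees the markup as served, so a `div` that a script would create later is never found.

```python
from html.parser import HTMLParser

# Hypothetical page: the value <div> is created by a script and is
# therefore not part of the markup the server sends.
SERVED_HTML = """
<html><body>
  <div id="chart"></div>
  <script>
    var el = document.createElement('div');
    el.className = 'valueValue-3kA0oJs5';
    el.textContent = '1.00';
    document.getElementById('chart').appendChild(el);
  </script>
</body></html>
"""


class ClassFinder(HTMLParser):
    """Record every <div> start tag that carries the wanted class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.wanted_class in classes:
            self.matches.append(tag)


def find_divs(html, wanted_class):
    """Return one entry per <div> in the static markup that has wanted_class."""
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.matches


# The script body is treated as plain text and never executed.
print(find_divs(SERVED_HTML, "valueValue-3kA0oJs5"))  # prints []
```

Getting the value therefore needs something that actually executes the page's JavaScript before the HTML is parsed.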
After that I tried it with PyQt5, following this video.
I ported the script from the video to PyQt5 but can’t get the code to run.
That script outputs:
qt.qpa.screen: QXcbConnection: Could not connect to display :99
Could not connect to any X display.
But I don’t have a screen, only the terminal.
```python
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage as QWebPage
import bs4 as bs
import urllib.request
import os


class Client(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def on_page_load(self):
        self.app.quit()


url = 'https://pythonprogramming.net/parsememcparseface/'
client_response = Client(url)
source = client_response.mainFrame().toHtml()
soup = bs.BeautifulSoup(source, 'lxml')
js_test = soup.find('p', class_='jstest')
print(js_test.text)
```
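Two things stand out in this attempt. The "Could not connect to display" message comes from Qt's X11 platform plugin, and `QWebEnginePage` has no `mainFrame()`: `load()` and the asynchronous `toHtml(callback)` are called on the page itself. The following is a minimal sketch under those assumptions, not a tested fix for this server. The helper names `use_offscreen_platform` and `render_html` are made up for illustration, and whether the `offscreen` platform plugin is available depends on the installed Qt build.

```python
import os
import sys


def use_offscreen_platform():
    """Ask Qt to render without an X server, unless a platform is already set."""
    return os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")


def render_html(url, timeout_ms=30000):
    """Load url in a QWebEnginePage and return the HTML after the page has loaded."""
    use_offscreen_platform()
    # Imported lazily so the environment variable is set before Qt starts.
    from PyQt5.QtCore import QTimer, QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {"html": ""}

    def store_html(html):
        # Called by Qt once the serialized markup is available.
        result["html"] = html
        app.quit()

    # toHtml() is asynchronous in QtWebEngine and delivers the markup to a callback.
    page.loadFinished.connect(lambda ok: page.toHtml(store_html))
    page.load(QUrl(url))
    QTimer.singleShot(timeout_ms, app.quit)  # stop waiting after timeout_ms
    app.exec_()
    return result["html"]
```

Calling `render_html('https://pythonprogramming.net/parsememcparseface/')` would then return a string that can be passed to BeautifulSoup as before. Content that scripts add well after the load event may still be missing, and when run as root QtWebEngine may additionally refuse to start unless its sandbox is disabled.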
After that I tried Selenium and ChromeDriver, following this guide. But the installation stopped when starting start_headless.sh:
./start_headless.sh: command not found
So I pasted it manually into the terminal and tried to start demo.py,
but I’m getting errors again.
With Python 2.7:
```
Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    from pyvirtualdisplay import Display
  File "/usr/local/lib/python2.7/dist-packages/pyvirtualdisplay/__init__.py", line 4, in <module>
    from pyvirtualdisplay.display import Display
  File "/usr/local/lib/python2.7/dist-packages/pyvirtualdisplay/display.py", line 26
    backend: Optional[str] = None,
           ^
SyntaxError: invalid syntax
```
With Python 3.7:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 169, in start
    cmd, stdout=stdout, stderr=stderr, cwd=self.cwd, env=self.env,
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'Xvfb': 'Xvfb'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    display = Display(visible=0, size=(800, 600))
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/display.py", line 63, in __init__
    **kwargs
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/abstractdisplay.py", line 88, in __init__
    helptext = get_helptext(program)
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/util.py", line 10, in get_helptext
    p.call()
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 141, in call
    self.start().wait(timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 174, in start
    raise EasyProcessError(self, "start error")
easyprocess.EasyProcessError: start error <EasyProcess cmd_param=['Xvfb', '-help'] cmd=['Xvfb', '-help'] oserror=[Errno 2] No such file or directory: 'Xvfb': 'Xvfb' return_code=None stdout="None" stderr="None" timeout_happened=False>
```
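The two tracebacks point at different causes. Under Python 2.7 the interpreter fails on a type annotation in pyvirtualdisplay, which Python 2 cannot parse, so the installed version of that package needs Python 3. Under Python 3.7 the `Xvfb` executable itself could not be found; on Debian it is provided by the `xvfb` package. A small check like the one below can confirm this before `Display(...)` is created. The helper name `missing_binaries` is hypothetical.

```python
import shutil


def missing_binaries(names=("Xvfb",)):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("Xvfb is available, pyvirtualdisplay should be able to start it.")
```

If `Xvfb` is reported as missing, installing it through the system package manager and rerunning demo.py with Python 3 would be the next thing to try.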
I also tried it with a WebSocket but could only read the data from the standard chart, so I’ll leave that out here.
Does anyone have any idea how I can solve this initial problem?
2 Answers
I figured out how to extract the data rendered by JavaScript. For people who want to build something similar, here is the working script, using requests_html:
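What follows is not the original script but a minimal sketch of how this approach can look with requests_html. It assumes that the values sit in `div.valueValue-3kA0oJs5` as in the question, that each value is a plain decimal such as `1.00` or `0,00`, and that a few seconds are enough for the chart to render. The function names `parse_bool_state` and `read_bool_states` are made up for illustration.

```python
def parse_bool_state(text):
    """Interpret an indicator value such as '1.00' or '0,00' as a boolean."""
    return float(text.strip().replace(",", ".")) != 0.0


def read_bool_states(url, selector="div.valueValue-3kA0oJs5", wait_seconds=5):
    """Render the page with requests_html and convert the matching values."""
    # Third-party package, imported lazily: pip install requests-html
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # render() executes the page's JavaScript in a headless Chromium,
        # which requests_html downloads on first use.
        response.html.render(sleep=wait_seconds, timeout=30)
        return [parse_bool_state(el.text) for el in response.html.find(selector)]
    finally:
        session.close()
```

A call such as `read_bool_states('https://de.tradingview.com/chart/zDAFlgZJ/#')` would return a list of booleans, or an empty list if the selector matches nothing after rendering. A private chart that requires a login is not covered by this sketch.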
Might be on the heavier side, but have you thought about doing it with Selenium? You’d be able to run the full browser. If I’m not mistaken, you can still use BeautifulSoup with it as well.
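A rough sketch of that combination, assuming Selenium 4, a Chrome or Chromium install the driver can find, and BeautifulSoup for the parsing step. The helper names and the list of Chrome flags are illustrative choices, not something taken from the question.

```python
# Flags commonly used on a server without a display; --no-sandbox is
# typically only needed when the script runs as root.
HEADLESS_ARGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def css_for_class(class_name):
    """Build a CSS selector for a <div> with the given class."""
    return "div." + class_name


def fetch_rendered_html(url, css_selector, timeout=20):
    """Open url in headless Chrome and return the HTML once css_selector exists."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_texts(html, css_selector):
    """Return the text of every element matching css_selector."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(css_selector)]
```

With `selector = css_for_class('valueValue-3kA0oJs5')`, passing the result of `fetch_rendered_html(url, selector)` to `extract_texts(html, selector)` would give the value strings. Because Chrome runs in headless mode, no Xvfb or virtual display is involved.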
As an alternative route, you can probably find brokers that offer that info via a proper API, which would obviously be the ideal scenario. Interactive Brokers comes to mind.