
I’m trying to scrape the TradingView page that shows my own chart in order to read boolean states from it.

Here is exactly what I mean:

This is the HTML code of the website:

I’m working on a Debian Linux server and programming in Python. I tried reading the page with BeautifulSoup and found out that BeautifulSoup can’t run JavaScript, so the HTML it sees doesn’t contain the elements that the page renders dynamically.

This code only outputs empty brackets, []. So it didn’t find the class I’m searching for:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
url = 'https://de.tradingview.com/chart/zDAFlgZJ/#'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')


output = soup.find_all('div', attrs={'class':'valueValue-3kA0oJs5'})
print(output)
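
The empty list is consistent with the class simply not being present in the HTML that requests receives, because the element is created later by JavaScript. A minimal diagnostic sketch, using only the standard library, lists the class names that actually occur in a document. The sample markup and the helper names (`ClassCollector`, `classes_in`) are made up for illustration and are not TradingView's real page.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(html):
    """Return the set of class names found in an HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical static response: an empty container that a script fills later.
STATIC_SAMPLE = """<html><body>
<div id="chart-root" class="chart-container"></div>
<script src="/static/bundle.js"></script>
</body></html>"""

found = classes_in(STATIC_SAMPLE)
print(sorted(found))                              # ['chart-container']
print(any("valueValue" in c for c in found))      # False
```

On the real response, `classes_in(r.text)` would show which classes exist before any JavaScript has run.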

After that I tried PyQt5, following this video.

I ported the script from the video to PyQt5 but can’t get the code to run.

That script outputs:

qt.qpa.screen: QXcbConnection: Could not connect to display :99
Could not connect to any X display.

But I don’t have a screen, only the terminal.

import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage as QWebPage
import bs4 as bs
import urllib.request
import os

class Client(QWebPage):

    def __init__(self, url) :
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def on_page_load(self) :
        self.app.quit()


url = 'https://pythonprogramming.net/parsememcparseface/'
client_response = Client(url)
source = client_response.mainFrame().toHtml()
soup = bs.BeautifulSoup(source, 'lxml')
js_test = soup.find('p', class_='jstest')
print(js_test.text)
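
The Qt message suggests that DISPLAY points to :99 while no X server (for example Xvfb) is listening there. One possible way to handle this, sketched below, is to check whether the local X socket exists and otherwise fall back to Qt's offscreen platform plugin. The helper names are hypothetical, the socket location is the conventional Linux one, and whether QtWebEngine renders a given page correctly under offscreen is an assumption that needs to be verified.

```python
import os
import re


def x_socket_path(display):
    """Map a local DISPLAY value such as ':99' or ':0.0' to its Unix socket path."""
    match = re.fullmatch(r":(\d+)(?:\.\d+)?", display or "")
    if match is None:
        # Empty value, or a remote display that this simple check cannot verify.
        return None
    return "/tmp/.X11-unix/X" + match.group(1)


def pick_qt_platform(display):
    """Return 'xcb' if a local X server socket exists, else 'offscreen'."""
    socket_path = x_socket_path(display)
    if socket_path and os.path.exists(socket_path):
        return "xcb"
    return "offscreen"


# This has to happen before QApplication is constructed.
# setdefault keeps any value the user has already exported.
os.environ.setdefault("QT_QPA_PLATFORM", pick_qt_platform(os.environ.get("DISPLAY")))
print(os.environ["QT_QPA_PLATFORM"])
```

The alternative is to keep the xcb platform and start a virtual X server on the display number that DISPLAY names.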

After that I tried Selenium and Chromedriver, following this set of instructions. But the installation stopped when I tried to run start_headless.sh:

./start_headless.sh: command not found

So I pasted the commands into the terminal manually and tried to start demo.py, but I’m getting errors again.

With Python 2.7:

Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    from pyvirtualdisplay import Display
  File "/usr/local/lib/python2.7/dist-packages/pyvirtualdisplay/__init__.py", line 4, in <module>
    from pyvirtualdisplay.display import Display
  File "/usr/local/lib/python2.7/dist-packages/pyvirtualdisplay/display.py", line 26
    backend: Optional[str] = None,
           ^
SyntaxError: invalid syntax

With Python 3.7:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 169, in start
    cmd, stdout=stdout, stderr=stderr, cwd=self.cwd, env=self.env,
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'Xvfb': 'Xvfb'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    display = Display(visible=0, size=(800, 600))
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/display.py", line 63, in __init__
    **kwargs
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/xvfb.py", line 50, in __init__
    manage_global_env=manage_global_env,
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/abstractdisplay.py", line 88, in __init__
    helptext = get_helptext(program)
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/util.py", line 10, in get_helptext
    p.call()
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 141, in call
    self.start().wait(timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 174, in start
    raise EasyProcessError(self, "start error")
easyprocess.EasyProcessError: start error <EasyProcess cmd_param=['Xvfb', '-help'] cmd=['Xvfb', '-help'] oserror=[Errno 2] No such file or directory: 'Xvfb': 'Xvfb' return_code=None stdout="None" stderr="None" timeout_happened=False>
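
The two tracebacks point to different problems. Under Python 2.7 the installed pyvirtualdisplay fails at import because it uses type annotations, which Python 2 cannot parse. Under Python 3.7 the import works, but the Xvfb executable is not on PATH. A small preflight check along the following lines could surface the second problem before Display() is called. The helper names are hypothetical, and the mapping to the Debian package `xvfb` is stated as an assumption.

```python
import shutil

# Assumption: on Debian the Xvfb binary is provided by the package "xvfb".
DEBIAN_PACKAGE_FOR = {"Xvfb": "xvfb"}


def missing_executables(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def install_hint(missing):
    """Build an apt-get command line for the missing executables."""
    packages = sorted(DEBIAN_PACKAGE_FOR.get(name, name) for name in missing)
    return "apt-get install " + " ".join(packages) if packages else ""


missing = missing_executables(["Xvfb"])
if missing:
    print("Missing:", ", ".join(missing))
    print("Possible fix:", install_hint(missing))
else:
    print("Xvfb found; pyvirtualdisplay should be able to start it")
```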

I also tried a WebSocket connection, but could only read the data of the standard chart, so I’ll leave that out here.

Does anyone have any idea how I can solve this initial problem?

2 Answers


  1. Chosen as BEST ANSWER

    I figured out how to extract the JavaScript-rendered data. For anyone who wants to build something similar, here is the working script, which uses requests_html:

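    from requests_html import HTMLSession

    session = HTMLSession()
    url = 'YOUR WEBSITE'
    r = session.get(url)
    # Run the page's JavaScript in headless Chromium (downloaded on first use)
    r.html.render()

    # Match on part of the class name and print the text of each element
    for item in r.html.xpath("//*[contains(@class,'CLASS NAME')]"):
        print(item.text)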

  2. Might be on the heavier side, but have you thought about doing it with Selenium? You’d be able to run the full browser. If I’m not mistaken, you can still use BeautifulSoup with it as well.

    As an alternative route, you can probably find brokers that offer that data via a proper API, which would be the ideal scenario. Interactive Brokers comes to mind.
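
Both answers end with a string of rendered HTML, whether it comes from requests_html after render() or from a Selenium-driven browser. The extraction step can be kept independent of how the page was rendered. The sketch below uses only the standard library and matches on a fragment of the class name, in the spirit of the contains(@class, ...) XPath above, which helps when the class carries a generated suffix such as valueValue-3kA0oJs5. The helper names and the sample values are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class attribute contains a fragment."""

    def __init__(self, class_fragment):
        super().__init__()
        self.class_fragment = class_fragment
        self.results = []   # finished text snippets, in document order
        self._depth = 0     # nesting depth inside the current match (0 = outside)
        self._parts = []    # text pieces of the element being collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.class_fragment in (dict(attrs).get("class") or ""):
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_text_by_class(html, class_fragment):
    """Return the text of all elements whose class contains class_fragment."""
    parser = ClassTextExtractor(class_fragment)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical rendered markup; the values are invented for the example.
RENDERED_SAMPLE = (
    '<div class="valueValue-3kA0oJs5">1.0000</div>'
    '<div class="other">x</div>'
    '<div class="valueValue-3kA0oJs5"><span>0.0000</span></div>'
)

print(extract_text_by_class(RENDERED_SAMPLE, "valueValue"))  # ['1.0000', '0.0000']
```

Matching on the stable prefix rather than the full class name is a design choice: the hashed suffix may change when the site is rebuilt.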
