
I’m trying to access a protected page on Twitter (for example, my own likes list) via urllib2 in Python, but this code always sends me back to the login page. Any idea why that is?

(I know I could use the Twitter API, but I want to learn how this is done in general.)

Thanks,
Roy


The code:

url = "https://twitter.com/login"
protectedUrl = "https://twitter.com/username/likes

USER = "myTwitterUser"
PASS = "myTwitterPassword"

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0'), ("Referer", "https://twitter.com")]

hdr = {'User-Agent': 'Mozilla/5.0', "Referer":"https://twitter.com"}
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req)

html = page.read()
s = BeautifulSoup(html, "lxml")
AUTH_TOKEN = s.find(attrs={"name": "authenticity_token"})["value"]

login_details = {"session[username_or_email]": USER,
                 "session[password]": PASS,
                 "remember_me": 1,
                 "return_to_ssl": "true",
                 "scribe_log": "",
                 "redirect_after_login": "/",
                 "authenticity_token": AUTH_TOKEN
                 }

login_data = urllib.urlencode(login_details)
opener.open(url, login_data)
resp = opener.open(protectedUrl)
print resp.read()

2 Answers


  1. You need to post to the correct URL, which is "https://twitter.com/sessions". It is also essential to use the opener when you make the initial request to get the authenticity_token, so use page = opener.open(req) in place of page = urllib2.urlopen(req) so that we get the cookies needed:

    url = "https://twitter.com/"
    USER = "username"
    PASS = "pass"
    post = "https://twitter.com/sessions"
    likes = "https://twitter.com/{}/likes"
    
    # cookies
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    
    # headers
    head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    
    # create request
    req = urllib2.Request(url, headers=head)
    # must use the opener so the cookies persist
    page = opener.open(req)
    html = page.read()
    
    # extract the authenticity_token
    s = BeautifulSoup(html, "lxml")
    AUTH_TOKEN = s.select_one("input[name=authenticity_token]")["value"]
    
    login_details = {"session[username_or_email]": USER,
                     "session[password]": PASS,
                     "remember_me": 1,
                     "redirect_after_login": "/",
                     "authenticity_token": AUTH_TOKEN
                     }
    # encode form data
    login_data = urllib.urlencode(login_details)
    r = opener.open(post, login_data)
    
    # get likes now we have logged in
    resp = opener.open(likes.format(USER))
    
    print(resp.read())
    

    If we run the code using one of my Twitter accounts with no likes:

    In [72]: login_details = {"session[username_or_email]": USER,
       ....:                  "session[password]": PASS,
       ....:                  "remember_me": 1,
       ....:                  "redirect_after_login": "/",
       ....:                  "authenticity_token": AUTH_TOKEN
       ....:                  }
    
    In [73]: # encode form data
    
    In [74]: login_data = urllib.urlencode(login_details)
    
    In [75]: r = opener.open("https://twitter.com/sessions", login_data)
    
    In [76]: # get likes now we have logged in
    
    In [77]: resp = opener.open(likes.format(USER))
    
    In [78]: soup = BeautifulSoup(resp.read(),"lxml")
    
    In [79]: print(soup.select_one("p.empty-text"))
    <p class="empty-text">
            You haven't liked any Tweets yet.
    
        </p>
    

    You can see we successfully get to the page we want.
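
    A quick way to confirm the login actually stuck, without scraping the page, is to check where the opener ends up; a minimal sketch, assuming the variables from the code above are still in scope:

    # if the session cookies were rejected, twitter redirects back to the login page
    r = opener.open(likes.format(USER))
    print(r.geturl())  # should print the likes url, not https://twitter.com/login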

    Doing the same with a requests.Session() object, the code has a lot less going on:

    USER = "username"
    PASS = "pass"
    post = "https://twitter.com/sessions"
    likes = "https://twitter.com/{}/likes"
    url = "https://twitter.com"
    
    data = {"session[username_or_email]": USER,
            "session[password]": PASS,
            "scribe_log": "",
            "redirect_after_login": "/",
            "remember_me": "1"}
    
    post = "https://twitter.com/sessions"
    
    with requests.Session() as s:
        r = s.get(url)
        soup = BeautifulSoup(r.content, "lxml")
        AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
        data["authenticity_token"] = AUTH_TOKEN
        r = s.post(post, data=data)
        soup = BeautifulSoup(r.content, "lxml")
        print(s.get(likes.format(USER)).content)
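
    As a quick sanity check (continuing inside the with block, right after the s.post call), the final URL of the response is a cheap way to tell whether the login stuck, since requests follows redirects and a failed login typically lands back on a login page; a minimal sketch:

        # r is the response from s.post(post, data=data) above
        print(r.url)          # final url after redirects; a login url here means failure
        print(r.status_code)  # typically 200 on the landing page either way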
    
  2. In my experience with websites like this, you need to send a complete, browser-like set of HTTP headers, including:

    • accept
    • accept-encoding
    • accept-language
    • referer
    • upgrade-insecure-requests
    • user-agent

    Omit only the Cookie header and let the session manage cookies itself; see the example header set below.
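
    For illustration, a header set along those lines might look like the following (the values here are typical desktop-browser values, not required verbatim):

    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.5",
        "Referer": "https://twitter.com/",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        # no "Cookie" entry on purpose; the session object manages cookies
    }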

    You also need to create a session and handle cookies, since Twitter presumably works like Facebook in this respect. I personally prefer “requests”, as it makes creating a session and handling cookies easy.

    You can do something like this:

    import requests
    from time import sleep

    # placeholders: fill in the full header set listed above
    hd = {'h11': 'h12', 'h21': 'h22', 'h31': 'h32'}
    # placeholders: the real form fields are session[username_or_email],
    # session[password] etc., as shown in the first answer
    usrdata = {'user': USER, 'pass': PASS}

    sess = requests.Session()
    req = sess.get('http://www.twitter.com')  # initial GET to start the session and collect cookies
    sleep(1)
    req = sess.post('https://twitter.com/sessions', data=usrdata, headers=hd)
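
    One small refinement: rather than passing headers=hd on every call, you can set the headers once on the session with sess.headers.update() (a standard requests API) so that every request inherits them; a minimal sketch:

    import requests

    sess = requests.Session()
    # browser-like headers set once; every request on this session sends them
    sess.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Accept-Language": "en-US,en;q=0.5",
    })
    r = sess.get("https://twitter.com")  # response cookies are stored on the session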
    

    Hope this helps.
