
I’m trying to access a protected page on Twitter (for example, my own likes list) via urllib2 in Python, but this code always sends me back to the login page. Any idea why that is?

(I know I could use the Twitter API, but I want to learn how this is done in general.)

Thanks,
Roy


The code:

url = "https://twitter.com/login"
protectedUrl = "https://twitter.com/username/likes

USER = "myTwitterUser"
PASS = "myTwitterPassword"

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0'), ("Referer", "https://twitter.com")]

hdr = {'User-Agent': 'Mozilla/5.0', "Referer":"https://twitter.com"}
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req)

html = page.read()
s = BeautifulSoup(html, "lxml")
AUTH_TOKEN = s.find(attrs={"name": "authenticity_token"})["value"]

login_details = {"session[username_or_email]": USER,
                 "session[password]": PASS,
                 "remember_me": 1,
                 "return_to_ssl": "true",
                 "scribe_log": "",
                 "redirect_after_login": "/",
                 "authenticity_token": AUTH_TOKEN
                 }

login_data = urllib.urlencode(login_details)
opener.open(url, login_data)
resp = opener.open(protectedUrl)
print resp.read()

2 Answers


  1. You need to post to the correct URL, which is "https://twitter.com/sessions". It is also essential to use the opener when you make the initial request to get the authenticity_token, so use page = opener.open(req) in place of page = urllib2.urlopen(req) so that we get the cookies needed:

    url = "https://twitter.com/"
    USER = "username"
    PASS = "pass"
    post = "https://twitter.com/sessions"
    likes = "https://twitter.com/{}/likes"
    
    # cookies
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    
    # headers
    head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    
    # create request
    req = urllib2.Request(url, headers=head)
    # must use the opener so the cookies persist
    page = opener.open(req)
    html = page.read()
    
    # extract the authenticity_token
    s = BeautifulSoup(html, "lxml")
    AUTH_TOKEN = s.select_one("input[name=authenticity_token]")["value"]
    
    login_details = {"session[username_or_email]": USER,
                     "session[password]": PASS,
                     "remember_me": 1,
                     "redirect_after_login": "/",
                     "authenticity_token": AUTH_TOKEN
                     }
    # encode form data
    login_data = urllib.urlencode(login_details)
    r = opener.open(post, login_data)
    
    # get likes now we have logged in
    resp = opener.open(likes.format(USER))
    
    print(resp.read())
    

    If we run the code using one of my Twitter accounts with no likes:

    In [72]: login_details = {"session[username_or_email]": USER,
       ....:                  "session[password]": PASS,
       ....:                  "remember_me": 1,
       ....:                  "redirect_after_login": "/",
       ....:                  "authenticity_token": AUTH_TOKEN
       ....:                  }
    
    In [73]: # encode form data
    
    In [74]: login_data = urllib.urlencode(login_details)
    
    In [75]: r = opener.open("https://twitter.com/sessions", login_data)
    
    In [76]: # get likes now we have logged in
    
    In [77]: resp = opener.open(likes.format(USER))
    
    In [78]: soup = BeautifulSoup(resp.read(),"lxml")
    
    In [79]: print(soup.select_one("p.empty-text"))
    <p class="empty-text">
            You haven't liked any Tweets yet.
    
        </p>
    

    You can see we successfully get to the page we want.
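
    A quick way to confirm the login actually stuck, without scraping the page, is to check where the opener ends up; a minimal sketch, assuming the variables from the code above are still in scope:

    # if the session cookies were rejected, twitter redirects back to the login page
    r = opener.open(likes.format(USER))
    print(r.geturl())  # should print the likes url, not https://twitter.com/login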

    Doing the same with a requests.Session() object, the code has a lot less going on:

    USER = "username"
    PASS = "pass"
    post = "https://twitter.com/sessions"
    likes = "https://twitter.com/{}/likes"
    url = "https://twitter.com"
    
    data = {"session[username_or_email]": USER,
            "session[password]": PASS,
            "scribe_log": "",
            "redirect_after_login": "/",
            "remember_me": "1"}
    
    post = "https://twitter.com/sessions"
    
    with requests.Session() as s:
        r = s.get(url)
        soup = BeautifulSoup(r.content, "lxml")
        AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
        data["authenticity_token"] = AUTH_TOKEN
        r = s.post(post, data=data)
        soup = BeautifulSoup(r.content, "lxml")
        print(s.get(likes.format(USER)).content)
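
    As a quick sanity check (continuing inside the with block, right after the s.post call), the final URL of the response is a cheap way to tell whether the login stuck, since requests follows redirects and a failed login typically lands back on a login page; a minimal sketch:

        # r is the response from s.post(post, data=data) above
        print(r.url)          # final url after redirects; a login url here means failure
        print(r.status_code)  # typically 200 on the landing page either way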
    
  2. In my experience with websites like this, you need to send a complete, browser-like set of HTTP headers, including:

    • accept
    • accept-encoding
    • accept-language
    • referer
    • upgrade-insecure-requests
    • user-agent

    Omit only the Cookie header and let the session manage cookies itself; see the example header set below.
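
    For illustration, a header set along those lines might look like the following (the values here are typical desktop-browser values, not required verbatim):

    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.5",
        "Referer": "https://twitter.com/",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        # no "Cookie" entry on purpose; the session object manages cookies
    }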

    You also need to create a session and handle cookies, since Twitter presumably works like Facebook in this respect. I personally prefer “requests”, as it makes creating a session and handling cookies easy.

    You can do something like this:

    import requests
    from time import sleep

    # placeholders: fill in the full header set listed above
    hd = {'h11': 'h12', 'h21': 'h22', 'h31': 'h32'}
    # placeholders: the real form fields are session[username_or_email],
    # session[password] etc., as shown in the first answer
    usrdata = {'user': USER, 'pass': PASS}

    sess = requests.Session()
    req = sess.get('http://www.twitter.com')  # initial GET to start the session and collect cookies
    sleep(1)
    req = sess.post('https://twitter.com/sessions', data=usrdata, headers=hd)
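
    One small refinement: rather than passing headers=hd on every call, you can set the headers once on the session with sess.headers.update() (a standard requests API) so that every request inherits them; a minimal sketch:

    import requests

    sess = requests.Session()
    # browser-like headers set once; every request on this session sends them
    sess.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Accept-Language": "en-US,en;q=0.5",
    })
    r = sess.get("https://twitter.com")  # response cookies are stored on the session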
    

    Hope this helps.
