
I made a website where people can post links to other websites, and the backend then generates a preview of each link (by fetching the page with curl and parsing the Open Graph tags available on most websites, or by falling back to the first image, the HTML title, etc.). It works fine after some tweaking, but sometimes I run into some kind of rate limit.
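For reference, the parsing step described above can be sketched roughly like this in Python, using only the standard library. This is a minimal sketch, not my actual backend code: the names `PreviewParser` and `build_preview` are made up for illustration, and it assumes the HTML has already been downloaded (for example with curl) and is passed in as a string.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects Open Graph tags, the <title> text, and the first <img> src."""

    def __init__(self):
        super().__init__()
        self.og = {}             # e.g. {"title": "...", "image": "..."}
        self.title = None        # text of the <title> element
        self.first_image = None  # src of the first <img> on the page
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content"):
                # Keep the first value seen for each og:* property.
                self.og.setdefault(prop[3:], attrs["content"])
        elif tag == "title":
            self._in_title = True
        elif tag == "img" and self.first_image is None:
            self.first_image = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None and data.strip():
            self.title = data.strip()


def build_preview(html):
    """Prefer Open Graph values; fall back to <title> and the first image."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("title") or parser.title,
        "image": parser.og.get("image") or parser.first_image,
        "description": parser.og.get("description"),
    }
```

Calling `build_preview(html)` returns a dict with `title`, `image`, and `description` keys; any of them can be `None` when the page provides neither the Open Graph tag nor a fallback.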

Here is one example of a link I want to parse: https://www.facebook.com/HBR/posts/10157131816732787

I can parse it 4 or 5 times and get a title, image, etc., but if I repeat the request after that, I get sent to the Facebook login page. How can I avoid this?

I tried to parse the link at https://developers.facebook.com/tools/debug/sharing however it says “Facebook URLs cannot be crawled”. So my question is: how am I even supposed to parse those links if they don’t even allow it on their debugger?

Is there any kind of API that allows me to get this information without user login? I don’t want to parse entire facebook pages, profiles etc, just get a preview for a link that my users might post on the website.

2 Answers


  1. You MUST use the Facebook Graph API if you want to get data from Facebook Pages (or anything else on Facebook); scraping is not allowed.

    To get data from Pages you do not own, you need to apply for Page Public Content Access: https://developers.facebook.com/docs/apps/review/feature/#reference-PAGES_ACCESS

    An App Access Token (without Login) is sufficient in that case.

    API Reference for Pages: https://developers.facebook.com/docs/graph-api/reference/page/

  2. I don't think so. You can crawl posts in a public group using Python with Selenium and Beautiful Soup.

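To illustrate the first answer, here is a rough sketch of how a Graph API request for a Page could be put together with an App Access Token. It only builds the request URL and does not send it. The helper names, the version string `v19.0`, and the chosen fields are assumptions for illustration; check the Page reference linked in that answer for the fields and version that apply to your app. The request will only succeed for Pages you do not own once the app has been granted Page Public Content Access.

```python
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"


def app_access_token(app_id, app_secret):
    """An App Access Token can be written as '<app-id>|<app-secret>'."""
    return f"{app_id}|{app_secret}"


def page_request_url(page_id, app_id, app_secret,
                     fields=("name", "link"), version="v19.0"):
    """Build (but do not send) a Graph API URL for a Page.

    The version and the default fields are illustrative assumptions.
    """
    query = urlencode({
        "fields": ",".join(fields),
        "access_token": app_access_token(app_id, app_secret),
    })
    return f"{GRAPH_HOST}/{version}/{page_id}?{query}"
```

Since the token contains the app secret, a URL like this should only ever be built and requested on the server, never in client-side code.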