
First question in a long while! I need to find any and all URLs in a string returned from a Facebook page request (I’m requesting the website of a page using the Graph API) and put the values into an array that I subsequently display in a DataTables JS table.

Anyhow, I’m having issues: when I build the JSON data for the datatable, it breaks in some cases:

http://socialinsightlab.com/datatable_fpages.json

The issue is that the website field can contain erroneous characters, broken structure, stray white space and so on.

Anyhow I found the perfect regex to use to find all websites in the field (there can be more than one website listed in the return).

The regex is

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

When I try to assign it to a PHP variable for use with preg_match_all, I can’t: PHP won’t accept the regex string, I guess because it contains both kinds of quotes.

So my question is: how can I extract only the URLs found in the website field and assign them to a variable so I can add them to the datatable?

Here is an example of a call that fails:-

http://socialinsightlab.com/datatable_fpages.json

I need to be able to just return websites and nothing more.

Any ideas?

Thanks

Jonathan

2 Answers


  1. This regex is specifically made as a solution to this problem:

    (?:https?://|www)[^"\s]+
    


  2. If you don’t want to deal with all this quote escaping, you can do the following:

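    1. Save the regex to a file, say, regex.txt.
    2. Read the file into a variable and trim it: $regex = trim(file_get_contents("regex.txt"));
    3. Use it with preg_match(), preg_match_all() etc. (the pattern still needs PCRE delimiters, either in the file or added in code).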
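The file-based steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the file path, the `~` delimiter, and the sample `$website` string are made up for the demo, and the file is assumed to hold the bare pattern without delimiters. The script writes the file itself so that it runs on its own.

```php
<?php
// Demo setup only: create the pattern file at a hypothetical path.
// A single-quoted PHP string keeps the double quote and the backslash literal.
$path = sys_get_temp_dir() . '/regex.txt';
file_put_contents($path, '(?:https?://|www)[^"\s]+' . "\n");

// Steps 1-2 from the answer: read the pattern and trim the trailing newline.
$pattern = trim(file_get_contents($path));

// PCRE functions need delimiters. '~' does not occur in this pattern,
// so it can be wrapped without any escaping; 'i' makes it case-insensitive.
$regex = '~' . $pattern . '~i';

// Hypothetical website field with stray whitespace and a line break.
$website = "Shop: http://example.com/shop\n  blog at www.example.org/blog  ";

// Step 3: collect every match; $matches[0] holds the full matches.
preg_match_all($regex, $website, $matches);
$urls = $matches[0];

print_r($urls);
```

Because the pattern never appears inside PHP source, the quoting problem from the question does not arise; the only thing to watch is that the chosen delimiter does not occur in the pattern.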
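Another way around the quoting problem, not mentioned in the answers, is a nowdoc string, which PHP takes literally, so both kinds of quotes and all backslashes survive untouched. The sketch below is a guess at how that could look: the pattern is the long regex from the question with its backslashes restored (an assumption about what the original was), the `~` delimiter and the sample `$website` value are invented, and `json_encode` is used so that the extracted values cannot break the JSON handed to the datatable.

```php
<?php
// Nowdoc: everything between the markers is taken literally.
// Pattern assumed to be the question's regex with backslashes restored.
$pattern = <<<'REGEX'
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
REGEX;

// '~' does not occur in the pattern, so it is a safe delimiter.
// The 'u' flag treats pattern and subject as UTF-8 (the pattern has curly quotes).
$regex = '~' . $pattern . '~u';

// Hypothetical website field as it might come back from the API.
$website = "Main site http://example.com/a_(b) \t and also www.example.org.";

// $matches[0] holds the full matches, one entry per URL found.
preg_match_all($regex, $website, $matches);
$urls = $matches[0];

// Let json_encode do the escaping instead of building JSON by hand.
$json = json_encode(
    array('website' => implode(', ', $urls)),
    JSON_UNESCAPED_SLASHES
);

echo $json, "\n";
```

The trailing full stop after `www.example.org` is left out of the match because the pattern's last character class excludes sentence punctuation.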