First question in a long while! I need to find any and all URLs in a string returned from a Facebook page request (I'm requesting the website of a page using the Graph API) and put the values into an array that I subsequently display in a DataTables JS table.
Anyhow, I'm having issues: when I build the JSON data for the DataTable, it breaks in some cases:
http://socialinsightlab.com/datatable_fpages.json
The issue is that the website field contains erroneous characters, odd structure, stray whitespace, etc.
Anyhow, I found the perfect regex to find all websites in the field (there can be more than one website listed in the return).
The regex is
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
When I try to assign it to a PHP variable for use in preg_match_all, I can't: PHP won't accept the regex string, I guess because it has quotes in it.
So my question is: how can I extract only the URLs found in the website field and assign them to a variable so I can add them to the DataTable?
Here is an example of a call that fails:-
http://socialinsightlab.com/datatable_fpages.json
I need to be able to just return websites and nothing more.
Any ideas?
Thanks
Jonathan
2 Answers
This regex is specifically made as a solution to this problem:
Live demo
If you don't want to deal with all this quote escaping, you can do the following:
Put the regex in a file, e.g. regex.txt.
Load it with $regex = trim(file_get_contents("regex.txt"));
Use $regex in preg_match(), preg_match_all(), etc.
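As a rough sketch of the file-based approach above, something like the following might work. The pattern here is a deliberately simpler, hypothetical stand-in for the regex in the question, the temporary file stands in for regex.txt, and the sample website value is made up. A nowdoc is used only to write the pattern out without escaping its quotes.

```php
<?php
// Hypothetical stand-in pattern, simpler than the regex from the question.
// A nowdoc (<<<'EOT') keeps quotes and backslashes exactly as typed.
// The ~ delimiters and the trailing i flag are required by PHP's preg_* functions.
$pattern = <<<'EOT'
~\b(?:https?://|www\.)[^\s<>"']+~i
EOT;

// Simulate regex.txt with a temporary file so the example is self-contained.
$file = tempnam(sys_get_temp_dir(), 'regex');
file_put_contents($file, $pattern . "\n");

// Load the pattern; trim() strips the trailing newline from the file.
$regex = trim(file_get_contents($file));
unlink($file);

// Made-up "website" field containing several URLs and stray whitespace.
$website = "http://example.com \n www.example.org/page and   https://example.net/a?b=1";

// $matches[0] holds every full match, i.e. just the URLs.
preg_match_all($regex, $website, $matches);
$urls = $matches[0];

// json_encode() escapes the values, so the JSON for the table stays valid.
echo json_encode(['website' => implode(', ', $urls)]), "\n";
```

Note that the pattern stored in the file still needs delimiters (here ~, chosen because the pattern itself contains /), and an inline (?i) can be replaced by the i flag after the closing delimiter. Building the JSON with json_encode() rather than by hand is an assumption about how the table data is produced, but it should avoid the broken output described in the question.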