
I am using the Wikimedia API to retrieve all the URLs linked from a Wikipedia article, with 'https://en.wikipedia.org/w/api.php?action=query&prop=links&redirects&pllimit=500&format=json', but it returns only a list of link titles. For example, the Artificial Intelligence Wikipedia page has a link titled "delivery networks", but the actual URL is "https://en.wikipedia.org/wiki/Content_delivery_network", which is what I want.

2

Answers


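  1. I have replaced most of my previous answer, including the code, to use the information provided in Tgr's answer, in case someone else would like sample Python code. The request now uses 'links' as a generator together with prop=info and inprop=url, so each linked page comes back with its full URL. This code is heavily based on code from MediaWiki for so-called 'raw continuations'.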

    I have deliberately limited the number of links requested per invocation to five (gpllimit) so that the continuation parameter, gplcontinue, is also demonstrated.

    import requests
    
    API_URL = 'https://en.wikipedia.org/w/api.php'
    
    def query(request):
        """Yield the 'query' part of each batch of results until none remain."""
        request['action'] = 'query'
        request['format'] = 'json'
        request['prop'] = 'info'         # page information ...
        request['generator'] = 'links'   # ... for every page linked from the titles
        request['inprop'] = 'url'        # ... including each page's full URL
        previousContinue = {}
        while True:
            # Merge the continuation values from the previous batch into the request
            req = request.copy()
            req.update(previousContinue)
            result = requests.get(API_URL, params=req).json()
            if 'error' in result:
                raise RuntimeError(result['error'])
            if 'warnings' in result:
                print(result['warnings'])
            if 'query' in result:
                yield result['query']
            if 'continue' in result:
                # Pass back everything the API returned, not only gplcontinue
                previousContinue = result['continue']
            else:
                break
    
    for result in query({'titles': 'Estelle Morris', 'gpllimit': '5'}):
        # 'pages' maps page IDs to page records; each record carries 'fullurl'
        for page in result['pages'].values():
            print(page['fullurl'])
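
    The shape of the data that the loop above unpacks can be checked without calling the API. The following is only a sketch: extract_urls is a hypothetical helper name, and the sample dictionary is hand-written for illustration (the page IDs are made up), not real API output. It assumes the default JSON format, where 'pages' is keyed by page ID.

```python
def extract_urls(query_part):
    """Return the 'fullurl' of every page in the 'query' part of a response.

    Hypothetical helper: it assumes 'pages' is a dict keyed by page ID,
    as in the default JSON output format.
    """
    pages = query_part.get('pages', {})
    # Skip any record that lacks 'fullurl' instead of raising KeyError
    return [page['fullurl'] for page in pages.values() if 'fullurl' in page]


# Hand-written sample for illustration only; the page IDs are invented.
sample_query = {
    'pages': {
        '1001': {
            'pageid': 1001,
            'title': 'Content delivery network',
            'fullurl': 'https://en.wikipedia.org/wiki/Content_delivery_network',
        },
        '1002': {
            'pageid': 1002,
            'title': 'Estelle Morris',
            'fullurl': 'https://en.wikipedia.org/wiki/Estelle_Morris',
        },
    }
}

urls = extract_urls(sample_query)
for url in urls:
    print(url)
```

    Looking up 'pages' by name, rather than by position, keeps the helper working if the response also carries other keys.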
    

    I mentioned in my first answer that, if the OP wanted to do something similar with artificial intelligence, they should begin with 'Artificial intelligence', noting the capitalisation. Otherwise the search would start with a disambiguation page, with all of the complications that could arise from that.
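
    For readers who already have a list of titles from prop=links, as in the question, a URL can also be built locally from each title. This is a minimal sketch and not the method used in this answer: title_to_url and WIKI_BASE are hypothetical names, the base path assumes English Wikipedia, and the percent-encoding may differ in detail from the URL the API reports, although both should lead to the same page. It does not resolve redirects.

```python
from urllib.parse import quote

# Assumption: articles on English Wikipedia are served under this path.
WIKI_BASE = 'https://en.wikipedia.org/wiki/'


def title_to_url(title):
    """Build an article URL from a page title (hypothetical helper)."""
    # MediaWiki page URLs use underscores where the title has spaces
    path = title.strip().replace(' ', '_')
    # Percent-encode everything else, leaving '/' and ':' readable
    return WIKI_BASE + quote(path, safe='/:')


print(title_to_url('Content delivery network'))
```

    Asking the API for the URL, as the code in this answer does, remains the safer route because it avoids guessing how a particular wiki is configured.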
