I would like to run a Wikidata query over many values that are listed in a column of a CSV file on my computer.
How can I load the values from the CSV file into the Wikidata query automatically without copying them in manually?

So far I have been working on the Wikidata query in Visual Studio Code.

This is the query I made for one person:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
   WHERE {
   VALUES ?VIAF {"2467372"}
   ?Author wdt:P214 ?VIAF ;
     wdt:P19 ?birthLocation .
           
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}

I want to automatically load many values into the curly brackets of the query above from the column of my CSV file.

2 Answers


  1. So, say you have a file my_file.csv with the following content:

    2467372
    63468347
    12447
    

    First of all, import a Python library for reading files (like fileinput).

    Then declare the pattern that you want to use for your query, using %s as a placeholder for the identifiers.

    Now, build a list of identifiers as follows:

    identifiers = ['"' + line.strip() + '"' for line in fileinput.input(files='my_file.csv')]
    

    And finally, join the list using a space character as a separator and pass the resulting string to your query pattern:

    query = query_pattern % ' '.join(identifiers)
    

    This is the final code:

    import fileinput
    
    filename = 'my_file.csv'
    query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
       WHERE {
       VALUES ?VIAF { %s }
       ?Author wdt:P214 ?VIAF ;
         wdt:P19 ?birthLocation .
               
       SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
    }'''
    
    identifiers = ['"'+line.strip()+'"' for line in fileinput.input(files=filename)]
    query = query_pattern % ' '.join(identifiers)
    print(query)
    

    Executing it, you’ll get:

    SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
       WHERE {
       VALUES ?VIAF { "2467372" "63468347" "12447" }
       ?Author wdt:P214 ?VIAF ;
         wdt:P19 ?birthLocation .
    
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
    }
    
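    If you then want to execute the generated query from the same script rather than pasting it into a query editor, here is a minimal sketch that sends it to the Wikidata SPARQL endpoint, assuming the requests package is installed (the User-Agent string is a placeholder; Wikimedia asks clients to identify themselves with a descriptive one):

    import requests

    response = requests.get(
        'https://query.wikidata.org/sparql',
        params={'query': query, 'format': 'json'},
        headers={'User-Agent': 'viaf-lookup-example/0.1'},  # placeholder User-Agent
    )
    response.raise_for_status()
    for binding in response.json()['results']['bindings']:
        print(binding['AuthorLabel']['value'], binding['birthLocation']['value'])
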
  2. First, I feel compelled to point out that if you don’t already know a programming language, OpenRefine can do this for you in a few clicks.

    Having said that, here’s a basic Python program that accomplishes what you literally asked for – reading a set of VIAF ids and adding them to your query:

    import csv
    
    
    def expand_query(ids):
        # Quote each VIAF id as a SPARQL string literal and join them with spaces.
        values = ' '.join('"' + viaf_id + '"' for viaf_id in ids)
        query = """
        SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
            VALUES ?VIAF { """ + values + """ }
            ?Author wdt:P214 ?VIAF.
            OPTIONAL { ?Author wdt:P19 ?birthLocation. }
            SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
        }
        """
        return query
    
    
    def main():
        with open('../data/authors.csv', "rt") as csvfile:
            csvreader = csv.DictReader(csvfile, dialect=csv.excel)
            ids = [row["viaf"] for row in csvreader]
            print(expand_query(ids))
    
    
    if __name__ == "__main__":
        main()
    

    It expects a CSV file with a column called viaf and ignores all other columns, e.g.:

    name,viaf
    Douglas Adams,113230702
    William Shakespeare,96994048
    Bertolt Brecht,2467372
    

    I’ve tweaked the query slightly to:

    • always output a row even if the birth location isn’t available
    • output the label for the birth location
    • add English as an additional fallback language for labels

    This assumes that your set of identifiers is small enough to fit in a single query, but you can extend it (see the sketch after this list) to:

    • read identifiers in batches of a convenient size
    • use SPARQLWrapper to send the query to the Wikidata SPARQL endpoint and parse the results
    • write the results to a different CSV file in chunks as they’re received
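
    Here’s a sketch of that batched variant, assuming the SPARQLWrapper package (pip install sparqlwrapper) and reusing the expand_query function above; the file names, batch size, and User-Agent string are illustrative:

    from SPARQLWrapper import SPARQLWrapper, JSON
    import csv

    ENDPOINT = "https://query.wikidata.org/sparql"
    BATCH_SIZE = 100  # identifiers per query; tune to stay within query limits

    sparql = SPARQLWrapper(ENDPOINT, agent="viaf-lookup-example/0.1")  # placeholder User-Agent
    sparql.setReturnFormat(JSON)

    with open('../data/authors.csv', "rt") as csvfile:
        ids = [row["viaf"] for row in csv.DictReader(csvfile, dialect=csv.excel)]

    with open('results.csv', "wt", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["author", "authorLabel", "viaf", "birthLocationLabel"])
        for start in range(0, len(ids), BATCH_SIZE):
            batch = ids[start:start + BATCH_SIZE]
            sparql.setQuery(expand_query(batch))
            results = sparql.query().convert()
            for b in results["results"]["bindings"]:
                writer.writerow([
                    b["Author"]["value"],
                    b["AuthorLabel"]["value"],
                    b["VIAF"]["value"],
                    # birthLocationLabel is OPTIONAL, so it may be absent
                    b.get("birthLocationLabel", {}).get("value", ""),
                ])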