I would like to run a Wikidata query over many values that are listed in a column of a CSV file on my computer.
How can I load the values from the CSV file into the Wikidata query automatically without copying them in manually?

So far I have been working on the Wikidata query in Visual Studio Code.

This is the query I made for one person:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
   WHERE {
   VALUES ?VIAF {"2467372"}
   ?Author wdt:P214 ?VIAF ;
     wdt:P19 ?birthLocation .
           
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}

I want to automatically load many values into the curly brackets of the query above from the column of my CSV file.

2 Answers


  1. So, say you have a file my_file.csv with the following content:

    2467372
    63468347
    12447
    

    First of all, import a Python library for reading files (like fileinput).

    Then declare the pattern that you want to use for your query, using %s as a placeholder for the identifiers.

    Now, build a list of identifiers as follows:

    identifiers = ['"' + line.strip() + '"' for line in fileinput.input(files='my_file.csv')]
    

    And finally, join the list using a space character as a separator and pass the resulting string to your query pattern:

    query = query_pattern % ' '.join(identifiers)
    

    This is the final code:

    import fileinput
    
    filename = 'my_file.csv'
    query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
       WHERE {
       VALUES ?VIAF { %s }
       ?Author wdt:P214 ?VIAF ;
         wdt:P19 ?birthLocation .
               
       SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
    }'''
    
    identifiers = ['"'+line.strip()+'"' for line in fileinput.input(files=filename)]
    query = query_pattern % ' '.join(identifiers)
    print(query)
    

    Executing it, you’ll get:

    SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
       WHERE {
       VALUES ?VIAF { "2467372" "63468347" "12447" }
       ?Author wdt:P214 ?VIAF ;
         wdt:P19 ?birthLocation .
    
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
    }
    
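    If you then want to execute the generated query from the same script rather than pasting it into a query editor, here is a minimal sketch that sends it to the Wikidata SPARQL endpoint, assuming the requests package is installed (the User-Agent string is a placeholder; Wikimedia asks clients to identify themselves with a descriptive one):

    import requests

    response = requests.get(
        'https://query.wikidata.org/sparql',
        params={'query': query, 'format': 'json'},
        headers={'User-Agent': 'viaf-lookup-example/0.1'},  # placeholder User-Agent
    )
    response.raise_for_status()
    for binding in response.json()['results']['bindings']:
        print(binding['AuthorLabel']['value'], binding['birthLocation']['value'])
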
  2. First, I feel compelled to point out that if you don’t already know a programming language, OpenRefine can do this for you in a few clicks.

    Having said that, here’s a basic Python program that accomplishes what you literally asked for – reading a set of VIAF ids and adding them to your query:

    import csv
    
    
    def expand_query(ids):
        # Quote each VIAF id as a SPARQL string literal and join them with spaces.
        values = ' '.join('"' + viaf_id + '"' for viaf_id in ids)
        query = """
        SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
            VALUES ?VIAF { """ + values + """ }
            ?Author wdt:P214 ?VIAF.
            OPTIONAL { ?Author wdt:P19 ?birthLocation. }
            SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
        }
        """
        return query
    
    
    def main():
        with open('../data/authors.csv', "rt") as csvfile:
            csvreader = csv.DictReader(csvfile, dialect=csv.excel)
            ids = [row["viaf"] for row in csvreader]
            print(expand_query(ids))
    
    
    if __name__ == "__main__":
        main()
    

    It expects a CSV file with a column called viaf and ignores all other columns, e.g.:

    name,viaf
    Douglas Adams,113230702
    William Shakespeare,96994048
    Bertolt Brecht,2467372
    

    I’ve tweaked the query slightly to:

    • always output a row even if the birth location isn’t available
    • output the label for the birth location
    • add English as an additional fallback language for labels

    This assumes that your set of identifiers is small enough to fit in a single query, but you can extend it (see the sketch after this list) to:

    • read identifiers in batches of a convenient size
    • use SPARQLWrapper to send the query to the Wikidata SPARQL endpoint and parse the results
    • write the results to a different CSV file in chunks as they’re received
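
    Here’s a sketch of that batched variant, assuming the SPARQLWrapper package (pip install sparqlwrapper) and reusing the expand_query function above; the file names, batch size, and User-Agent string are illustrative:

    from SPARQLWrapper import SPARQLWrapper, JSON
    import csv

    ENDPOINT = "https://query.wikidata.org/sparql"
    BATCH_SIZE = 100  # identifiers per query; tune to stay within query limits

    sparql = SPARQLWrapper(ENDPOINT, agent="viaf-lookup-example/0.1")  # placeholder User-Agent
    sparql.setReturnFormat(JSON)

    with open('../data/authors.csv', "rt") as csvfile:
        ids = [row["viaf"] for row in csv.DictReader(csvfile, dialect=csv.excel)]

    with open('results.csv', "wt", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["author", "authorLabel", "viaf", "birthLocationLabel"])
        for start in range(0, len(ids), BATCH_SIZE):
            batch = ids[start:start + BATCH_SIZE]
            sparql.setQuery(expand_query(batch))
            results = sparql.query().convert()
            for b in results["results"]["bindings"]:
                writer.writerow([
                    b["Author"]["value"],
                    b["AuthorLabel"]["value"],
                    b["VIAF"]["value"],
                    # birthLocationLabel is OPTIONAL, so it may be absent
                    b.get("birthLocationLabel", {}).get("value", ""),
                ])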