I’m working with RDF data stored in a PostgreSQL database that was added using the RDFLib-SQLAlchemy library. While querying the asserted_statements table using SQL, I noticed that some objects and subjects have IDs that start with the letter "N".
Here’s a snippet of my SQL query and the results:
SELECT a.subject, a.predicate, a.object, b.subject, b.predicate, b.object
FROM public.kb_d5c47fc464_asserted_statements a
JOIN public.kb_d5c47fc464_asserted_statements b
ON a.object = b.subject
WHERE a.object LIKE 'N%' AND b.subject LIKE 'N%' AND a.subject LIKE 'http%'
ORDER BY a.id ASC;
Sample Data Output:
subject | predicate | object |
---|---|---|
http://purl.obolibrary.org/obo/BFO_0000062 | http://www.w3.org/2002/07/owl#propertyChainAxiom | N160ea22f83814f728990ceaafb6fbc43 |
http://purl.obolibrary.org/obo/BFO_0000062 | http://www.w3.org/2002/07/owl#propertyChainAxiom | N1cb51000d673480fb7bff6975709ab97 |
I’m curious about the significance of these IDs starting with N. Are they generated by RDFLib or related to blank nodes? What role do they play in the RDF structure, and how should I interpret them?
Any insights into why these IDs are being used and their purpose would be helpful.
2 Answers
The rest of the id looks like a UUID (version 4, variant 1) encoded as hex without dashes.
Looking through the RDFLib source for UUID leads to `BNode.__new__`, where `value` becomes the node id and `_prefix` defaults to `_unique_id()` (source).
What is this unique id? It is the letter ‘N’! (source) (alternative source)
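The UUID observation can be checked with the standard library alone. This is a minimal sketch that takes one of the IDs from the question, strips the leading letter, and parses the remainder as a UUID; the helper name `split_bnode_id` is made up for illustration and is not part of RDFLib.

```python
import uuid


def split_bnode_id(node_id: str):
    """Split an 'N'-prefixed id into its one-letter prefix and the UUID part.

    Hypothetical helper for illustration; it assumes the id has the shape
    <one letter><32 hex digits>, as in the question's sample rows.
    """
    prefix, hex_part = node_id[0], node_id[1:]
    return prefix, uuid.UUID(hex=hex_part)


# First object id from the sample output in the question.
prefix, parsed = split_bnode_id("N160ea22f83814f728990ceaafb6fbc43")

print(prefix)                           # N
print(parsed)                           # 160ea22f-8381-4f72-8990-ceaafb6fbc43
print(parsed.version)                   # 4
print(parsed.variant == uuid.RFC_4122)  # True
```

`uuid.RFC_4122` is what the standard library calls the variant referred to above as "variant 1".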
The nodes whose IDs start with the letter N are just blank nodes. Think of them as existential variables: you can't reuse them in further SPARQL queries, although you can use them inside rdflib.
They are helpful when you need to compare two graphs and need some nodes without a name; see e.g. `rdflib.compare`.