I want to map structured data (microdata, JSON-LD) extracted from HTML into a Java POJO. For extraction I use Apache Any23, configured with a JSONLDWriter to convert the structured data found in the HTML document into JSON-LD.
This works as expected and gives me the following output:
[ {
  "@graph" : [ {
    "@id" : "_:node1gn1v4pudx1",
    "@type" : [ "http://schema.org/JobPosting" ],
    "http://schema.org/datePosted" : [ {
      "@language" : "en-us",
      "@value" : "Wed Jan 11 02:00:00 UTC 2023"
    } ],
    "http://schema.org/description" : [ {
      "@language" : "en-us",
      "@value" : "Comprehensive Job Description"
    } ],
    "http://schema.org/hiringOrganization" : [ {
      "@language" : "en-us",
      "@value" : "Org AG"
    } ],
    "http://schema.org/jobLocation" : [ {
      "@id" : "_:node1gn1v4pudx2"
    } ],
    "http://schema.org/title" : [ {
      "@language" : "en-us",
      "@value" : "Recruiter (m/f/d)\n "
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx2",
    "@type" : [ "http://schema.org/Place" ],
    "http://schema.org/address" : [ {
      "@id" : "_:node1gn1v4pudx3"
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx3",
    "@type" : [ "http://schema.org/PostalAddress" ],
    "http://schema.org/addressCountry" : [ {
      "@language" : "en-us",
      "@value" : "Company Country"
    } ],
    "http://schema.org/addressLocality" : [ {
      "@language" : "en-us",
      "@value" : "Company City"
    } ],
    "http://schema.org/addressRegion" : [ {
      "@language" : "en-us",
      "@value" : "Company Region"
    } ]
  }, {
    "@id" : "https://career.company.com/job/Recruiter/",
    "http://www.w3.org/1999/xhtml/microdata#item" : [ {
      "@id" : "_:node1gn1v4pudx1"
    } ]
  } ],
  "@id" : "https://career.company.com/job/Recruiter/"
} ]
Next I want to deserialize the JSON-LD object into a Java bean using Jackson. The POJO class should look something like this:
public class JobPosting {
    private String datePosting;
    private String hiringOrganization;
    private String title;
    private String description;
    // Following members could be enclosed in a class too if easier
    // Like class Place{private PostalAddress postalAddress;}
    // private Place place;
    private String addressCountry;
    private String addressLocality;
    private String addressRegion;
}
I would like to do it with the annotations provided by the Jackson library, but I struggle with a few things:
- The @type value is wrapped in an array node
- The actual data has an extra @value layer
- Some objects only hold a reference to other objects in the graph via @id fields
How can I map these fields to my Java POJO properly?
3 Answers
The trick is to process the JSON-LD with a JSON-LD processor to get more developer-friendly JSON. The titanium-json-ld library provides such processors.
The above code snippet resolves references via @id and resolves JSON keys with the given IRI.
That leads to the following output, which is easy to parse with the Jackson library:
Looking at the elements you are interested in (for example the "datePosted" and "hiringOrganization" values), they are always labelled by "@value" and contained in the array stored under their property name (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). These all sit in a part of your JSON file that can be converted to a JsonNode, which can be obtained in the following way:
So if you have a POJO like below:
and you want to retrieve the datePosted and hiringOrganization values, you can check that the relative position is still the same in the JSON file and can be calculated in a for loop:
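The lookup described here boils down to one access path: the property IRI, then the first array element, then "@value". The sketch below only illustrates that path and is not the answerer's original code. To stay dependency-free it works on plain java.util collections (the shape Jackson's untyped binding produces) instead of JsonNode, and it assumes the caller has already pulled the "@graph" array out of the top-level document. The class names JobPostingValues and ValueLookup and their methods are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Illustrative target type; field names follow the JSON-LD property names. */
final class JobPostingValues {
    String datePosted;
    String hiringOrganization;
}

/** Hypothetical helper showing the property -> [0] -> "@value" access path. */
final class ValueLookup {
    private static final String SCHEMA = "http://schema.org/";

    /** Returns node[SCHEMA + property][0]["@value"] as a string, or null if any step is absent. */
    static String firstValue(Map<String, Object> node, String property) {
        Object entries = node.get(SCHEMA + property);
        if (!(entries instanceof List) || ((List<?>) entries).isEmpty()) {
            return null;
        }
        Object first = ((List<?>) entries).get(0);
        if (!(first instanceof Map)) {
            return null;
        }
        Object value = ((Map<?, ?>) first).get("@value");
        return value == null ? null : value.toString();
    }

    /** Scans the "@graph" nodes and fills the POJO from the first JobPosting node, or returns null. */
    static JobPostingValues extract(List<Map<String, Object>> graph) {
        for (Map<String, Object> node : graph) {
            Object types = node.get("@type");
            if (types instanceof List && ((List<?>) types).contains(SCHEMA + "JobPosting")) {
                JobPostingValues result = new JobPostingValues();
                result.datePosted = firstValue(node, "datePosted");
                result.hiringOrganization = firstValue(node, "hiringOrganization");
                return result;
            }
        }
        return null;
    }
}
```

With a Jackson JsonNode the same path can be written as node.path("http://schema.org/datePosted").path(0).path("@value"). Note that this approach does not follow @id references, so nested data such as the postal address needs a separate lookup.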
This would require some preprocessing first to turn your graph with id pointers into a simplified tree before mapping it with Jackson:
- Replace the @id references with the actual objects themselves.
- Unwrap the values nested under @value.
Full code below, using Java 17 and a bit of recursion:
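The two preprocessing steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the answerer's original listing: the graph is assumed to be already parsed into plain Maps and Lists, the class name JsonLdTreeBuilder is hypothetical, single-element arrays are collapsed to their only element, and references that are unknown or would form a cycle are left as the bare @id string. It avoids newer syntax, so it should also compile on Java versions older than 17.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turns a flat JSON-LD "@graph" into a nested tree. */
final class JsonLdTreeBuilder {

    // Index of all graph nodes by their "@id".
    private final Map<String, Map<String, Object>> nodesById = new LinkedHashMap<>();

    JsonLdTreeBuilder(List<Map<String, Object>> graph) {
        for (Map<String, Object> node : graph) {
            Object id = node.get("@id");
            if (id instanceof String) {
                nodesById.put((String) id, node);
            }
        }
    }

    /** Returns the node with the given "@id" with references inlined, or null if the id is unknown. */
    Object resolve(String id) {
        Map<String, Object> node = nodesById.get(id);
        if (node == null) {
            return null;
        }
        Set<String> path = new HashSet<>();
        path.add(id);
        return simplify(node, path);
    }

    private Object simplify(Object value, Set<String> path) {
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(simplify(item, path));
            }
            // Collapse single-element arrays to the element itself.
            return out.size() == 1 ? out.get(0) : out;
        }
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            // Step 2: drop the extra "@value" layer (language tags are discarded).
            if (map.containsKey("@value")) {
                return map.get("@value");
            }
            // Step 1: an object holding only "@id" is a reference to another node.
            Object ref = map.get("@id");
            if (map.size() == 1 && ref instanceof String) {
                Map<String, Object> target = nodesById.get(ref);
                if (target == null || !path.add((String) ref)) {
                    return ref; // unknown target or cycle: keep the bare id
                }
                Object resolved = simplify(target, path);
                path.remove(ref);
                return resolved;
            }
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                out.put(String.valueOf(entry.getKey()), simplify(entry.getValue(), path));
            }
            return out;
        }
        return value; // strings, numbers, booleans
    }
}
```

Calling resolve("_:node1gn1v4pudx1") on the graph from the question should yield one nested map in which jobLocation contains the Place, which in turn contains the PostalAddress. The keys are still full IRIs, so a POJO bound from this map would need matching names, for example @JsonProperty("http://schema.org/title"), before ObjectMapper.convertValue can populate it.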