
I want to map structured data (microdata, JSON-LD) extracted from HTML text into a Java POJO. For the extraction I use the Apache Any23 library, with a JSONLDWriter configured to convert the structured data found in the HTML document into JSON-LD format.

This works as expected and gives me the following output:

[ {
  "@graph" : [ {
    "@id" : "_:node1gn1v4pudx1",
    "@type" : [ "http://schema.org/JobPosting" ],
    "http://schema.org/datePosted" : [ {
      "@language" : "en-us",
      "@value" : "Wed Jan 11 02:00:00 UTC 2023"
    } ],
    "http://schema.org/description" : [ {
      "@language" : "en-us",
      "@value" : "Comprehensive Job Description"
    } ],
    "http://schema.org/hiringOrganization" : [ {
      "@language" : "en-us",
      "@value" : "Org AG"
    } ],
    "http://schema.org/jobLocation" : [ {
      "@id" : "_:node1gn1v4pudx2"
    } ],
    "http://schema.org/title" : [ {
      "@language" : "en-us",
      "@value" : "Recruiter (m/f/d)\n    "
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx2",
    "@type" : [ "http://schema.org/Place" ],
    "http://schema.org/address" : [ {
      "@id" : "_:node1gn1v4pudx3"
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx3",
    "@type" : [ "http://schema.org/PostalAddress" ],
    "http://schema.org/addressCountry" : [ {
      "@language" : "en-us",
      "@value" : "Company Country"
    } ],
    "http://schema.org/addressLocality" : [ {
      "@language" : "en-us",
      "@value" : "Company City"
    } ],
    "http://schema.org/addressRegion" : [ {
      "@language" : "en-us",
      "@value" : "Company Region"
    } ]
  }, {
    "@id" : "https://career.company.com/job/Recruiter/",
    "http://www.w3.org/1999/xhtml/microdata#item" : [ {
      "@id" : "_:node1gn1v4pudx1"
    } ]
  } ],
  "@id" : "https://career.company.com/job/Recruiter/"
} ]

Next I want to deserialize the JSON-LD object into a Java bean using Jackson. The POJO class should look something like this:

public class JobPosting {
    private String datePosting;
    private String hiringOrganization;
    private String title;
    private String description;

    // Following members could be enclosed in a class too if easier
    // Like class Place{private PostalAddress postalAddress;}
    // private Place place;
    private String addressCountry;
    private String addressLocality;
    private String addressRegion;
}

I would like to do it with the annotations provided by the Jackson library, but I struggle with a few things:

  • The @type value is wrapped in an array node
  • The actual data has an extra @value layer
  • Some objects only hold a reference to other objects in the graph via @id fields

How can I map these fields to my Java POJO properly?

3 Answers


  1. Chosen as best answer.

    The trick is to run the JSON-LD through a JSON-LD processor to get a more developer-friendly JSON. The titanium-json-ld library provides such processors.

    JsonDocument input = JsonDocument.of(jsonLdAsInputStream);
    JsonObject frame = JsonLd.frame(input, URI.create("http://schema.org")).get();
    

    The snippet above resolves the references made via @id, embedding the referenced objects, and resolves the JSON keys against the given IRI.
    This leads to the following output, which is easy to parse with Jackson:

    [{
      "id": "_:b0",
      "type": "JobPosting",
      "datePosted": {
        "@language": "en-us",
        "@value": "Wed Jan 11 02:00:00 UTC 2023"
      },
      "description": {
        "@language": "en-us",
        "@value": "Comprehensive Job Description"
      },
      "hiringOrganization": {
        "@language": "en-us",
        "@value": "Org AG"
      },
      "jobLocation": {
        "id": "_:b1",
        "type": "Place",
        "address": {
          "id": "_:b2",
          "type": "PostalAddress",
          "addressCountry": {
            "@language": "en-us",
            "@value": "Company Country"
          },
          "addressLocality": {
            "@language": "en-us",
            "@value": "Company City"
          },
          "addressRegion": {
            "@language": "en-us",
            "@value": "Company Region"
          }
        }
      },
      "title": {
        "@language": "en-us",
        "@value": "Recruiter (m/f/d)\n    "
      }
    }]
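
    The framed output above still wraps every literal in a value object carrying "@language" and "@value", so a plain String field would not bind to it directly. The following is a minimal sketch of one way to strip that layer, assuming the framed JSON has already been parsed into plain Map and List objects (for instance with Jackson's mapper.readValue(json, Object.class)). The class name ValueObjects and the method unwrap are hypothetical and not part of titanium-json-ld or Jackson.

    ```java
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    /** Hypothetical helper: replaces JSON-LD value objects with their "@value". */
    final class ValueObjects {

      private ValueObjects() {}

      /** Recursively unwraps every map that carries an "@value" key. */
      static Object unwrap(Object node) {
        if (node instanceof Map<?, ?> map) {
          if (map.containsKey("@value")) {
            // A value object such as {"@language": "en-us", "@value": "Org AG"}
            return map.get("@value");
          }
          // A node object: keep the keys and unwrap each value.
          // LinkedHashMap keeps the key order and tolerates null values.
          Map<Object, Object> result = new LinkedHashMap<>();
          for (Map.Entry<?, ?> entry : map.entrySet()) {
            result.put(entry.getKey(), unwrap(entry.getValue()));
          }
          return result;
        }
        if (node instanceof List<?> list) {
          return list.stream().map(ValueObjects::unwrap).toList();
        }
        return node; // strings, numbers, booleans and null stay as they are
      }
    }
    ```

    The unwrapped tree could then be handed to mapper.convertValue(...) with a POJO whose fields are named datePosted, title, jobLocation and so on. Note that this sketch discards the "@language" information.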
    

  2. Looking at the elements you are interested in (for example the datePosted and hiringOrganization values), they are always labelled "@value" and sit inside the array named after their property (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). They are all contained in the first element of the "@graph" array, which can be obtained as a JsonNode in the following way:

    JsonNode root = mapper.readTree(json)
                          .get(0)
                          .get("@graph")
                          .get(0);
    

    So if you have a pojo like below:

    @Data
    public class JobPosting {
    
        private String datePosted;
        private String hiringOrganization;
    }
    

    and you want to retrieve the datePosted and hiringOrganization values, note that their relative position in the JSON is always the same, so they can be read in a for loop:

    JsonNode root = mapper.readTree(json)
                          .get(0)
                          .get("@graph")
                          .get(0);
    
    String strSchema = "http://schema.org/";
    String[] fieldNames = {"datePosted", "hiringOrganization"};
    // Collect the values in a Map<String, String> that is then converted to a JobPosting
    Map<String, String> map = new HashMap<>();
    for (String fieldName : fieldNames) {
        map.put(fieldName,
                root.get(strSchema + fieldName)
                    .get(0)
                    .get("@value")
                    .asText());
    }
    
    JobPosting jobPosting = mapper.convertValue(map, JobPosting.class);
    // prints JobPosting(datePosted=Wed Jan 11 02:00:00 UTC 2023, hiringOrganization=Org AG)
    System.out.println(jobPosting);
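
    One caveat with the chained get(...) calls above: JsonNode.get returns null when a property is absent, and a reference entry such as "http://schema.org/jobLocation" has only an "@id" and no "@value", so either case ends in a NullPointerException. The following is a rough sketch of the same lookup done defensively. It assumes the document has been parsed into plain Map and List objects rather than a JsonNode tree, and the names ExpandedJsonLd and firstValue are made up for illustration.

    ```java
    import java.util.List;
    import java.util.Map;
    import java.util.Optional;

    /** Hypothetical helper for reading expanded JSON-LD properties defensively. */
    final class ExpandedJsonLd {

      private ExpandedJsonLd() {}

      /**
       * Returns the "@value" of the first entry of an expanded property such as
       * "http://schema.org/title": [ { "@value": "..." } ],
       * or an empty Optional if any step along the way is missing.
       */
      static Optional<String> firstValue(Map<?, ?> node, String property) {
        if (!(node.get(property) instanceof List<?> entries) || entries.isEmpty()) {
          return Optional.empty(); // property absent, not an array, or an empty array
        }
        if (!(entries.get(0) instanceof Map<?, ?> first)) {
          return Optional.empty(); // e.g. a plain string such as an "@type" entry
        }
        // A reference like {"@id": "_:node2"} has no "@value" and yields empty
        return Optional.ofNullable(first.get("@value")).map(Object::toString);
      }
    }
    ```

    A caller could then write firstValue(node, strSchema + fieldName).ifPresent(v -> map.put(fieldName, v)) so that missing properties are skipped instead of aborting the whole mapping.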
    
  3. This would require some preprocessing first to turn your graph with id pointers into a simplified tree before mapping it with Jackson:

    1. Turn it into a tree by replacing the @id references with the actual objects themselves.
    2. Flatten those troublesome object/array wrappers around @value.

    Full code below, using Java 17 and a bit of recursion:

    package org.example;
    
    import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
    import com.fasterxml.jackson.annotation.JsonProperty;
    import com.fasterxml.jackson.annotation.JsonSubTypes;
    import com.fasterxml.jackson.annotation.JsonTypeInfo;
    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;
    
    import java.io.File;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;
    
    import static java.util.stream.Collectors.toMap;
    
    class Main {
    
      public static void main(String[] args) throws Exception {
        var mapper = new ObjectMapper();
        var node = mapper.readValue(new File("test.json"), Object.class);
    
        // Build a lookup map of "@id" to the actual object.
        var lookup = buildLookup(node, new HashMap<>());
    
        // Replace "@id" references with the actual objects themselves instead
        var referenced = lookupReferences(node, lookup);
    
        // Flatten single-element arrays: a lone string or "@value" wrapper becomes the value itself
        var flattened = flatten(referenced);
    
        // Jackson should be able to understand our objects at this point, so convert them
        var jobPostings =
            mapper.convertValue(flattened, new TypeReference<List<RootObject>>() {}).stream()
                .flatMap(it -> it.graph().stream())
                .filter(it -> it instanceof JobPosting)
                .map(it -> (JobPosting) it)
                .toList();
    
        System.out.println(jobPostings);
      }
    
      private static Map<String, Object> buildLookup(Object node, Map<String, Object> lookup) {
        if (node instanceof List<?> list) {
          for (var value : list) {
            buildLookup(value, lookup);
          }
        } else if (node instanceof Map<?, ?> map) {
          for (var value : map.values()) {
            buildLookup(value, lookup);
          }
          if (map.size() > 1 && map.get("@id") instanceof String id) {
            lookup.put(id, node);
          }
        }
        return lookup;
      }
    
      private static Object lookupReferences(Object node, Map<String, Object> lookup) {
        if (node instanceof List<?> list
            && list.size() == 1
            && list.get(0) instanceof Map<?, ?> map
            && map.size() == 1
            && map.get("@id") instanceof String id) {
          return lookupReferences(lookup.get(id), lookup);
        }
    
        if (node instanceof List<?> list) {
          return list.stream().map(value -> lookupReferences(value, lookup)).toList();
        }
    
        if (node instanceof Map<?, ?> map) {
          return map.entrySet().stream()
              .map(entry -> Map.entry(entry.getKey(), lookupReferences(entry.getValue(), lookup)))
              .collect(toMap(Entry::getKey, Entry::getValue));
        }
    
        return node;
      }
    
      private static Object flatten(Object node) {
        if (node instanceof List<?> list && list.size() == 1) {
          if (list.get(0) instanceof String s) {
            return s;
          }
          if (list.get(0) instanceof Map<?, ?> map) {
            var value = map.get("@value");
            if (value != null) {
              return value;
            }
          }
        }
    
        if (node instanceof List<?> list) {
          return list.stream().map(Main::flatten).toList();
        }
    
        if (node instanceof Map<?, ?> map) {
          return map.entrySet().stream()
              .map(entry -> Map.entry(entry.getKey(), flatten(entry.getValue())))
              .collect(toMap(Entry::getKey, Entry::getValue));
        }
    
        return node;
      }
    }
    
    @JsonIgnoreProperties(ignoreUnknown = true)
    record RootObject(@JsonProperty("@graph") List<GraphObject> graph) {}
    
    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type", defaultImpl = Ignored.class)
    @JsonSubTypes({
      @JsonSubTypes.Type(value = JobPosting.class, name = "http://schema.org/JobPosting"),
      @JsonSubTypes.Type(value = Place.class, name = "http://schema.org/Place"),
      @JsonSubTypes.Type(value = PostalAddress.class, name = "http://schema.org/PostalAddress"),
    })
    interface GraphObject {}
    
    @JsonIgnoreProperties(ignoreUnknown = true)
    record Ignored() implements GraphObject {}
    
    @JsonIgnoreProperties(ignoreUnknown = true)
    record JobPosting(
        @JsonProperty("http://schema.org/title") String title,
        @JsonProperty("http://schema.org/description") String description,
        @JsonProperty("http://schema.org/hiringOrganization") String hiringOrganization,
        @JsonProperty("http://schema.org/datePosted") String datePosted,
        @JsonProperty("http://schema.org/jobLocation") Place jobLocation)
        implements GraphObject {}
    
    @JsonIgnoreProperties(ignoreUnknown = true)
    record Place(@JsonProperty("http://schema.org/address") PostalAddress address)
        implements GraphObject {}
    
    @JsonIgnoreProperties(ignoreUnknown = true)
    record PostalAddress(
        @JsonProperty("http://schema.org/addressLocality") String locality,
        @JsonProperty("http://schema.org/addressRegion") String region,
        @JsonProperty("http://schema.org/addressCountry") String country)
        implements GraphObject {}
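
    The records above keep the nesting JobPosting -> Place -> PostalAddress, whereas the question also considered a flat class with addressCountry, addressLocality and addressRegion next to the job fields. Below is a minimal sketch of that last copy step. The records inside it are annotation-free stand-ins for the ones above so that the example compiles on its own, and the names JobPostingFlattener and FlatJobPosting are hypothetical.

    ```java
    import java.util.Optional;

    /** Hypothetical helper that copies the nested records into the flat shape from the question. */
    final class JobPostingFlattener {

      private JobPostingFlattener() {}

      // Annotation-free stand-ins for the records above, so the sketch compiles on its own.
      record PostalAddress(String locality, String region, String country) {}

      record Place(PostalAddress address) {}

      record JobPosting(
          String title,
          String description,
          String hiringOrganization,
          String datePosted,
          Place jobLocation) {}

      /** Flat variant with the fields listed in the question (hypothetical name). */
      record FlatJobPosting(
          String datePosted,
          String hiringOrganization,
          String title,
          String description,
          String addressCountry,
          String addressLocality,
          String addressRegion) {}

      static FlatJobPosting flatten(JobPosting posting) {
        // jobLocation or its address may be missing, so walk the chain through Optional
        Optional<PostalAddress> address =
            Optional.ofNullable(posting.jobLocation()).map(Place::address);
        // Remove leading and trailing whitespace from the title, if there is one
        String title = posting.title() == null ? null : posting.title().strip();
        return new FlatJobPosting(
            posting.datePosted(),
            posting.hiringOrganization(),
            title,
            posting.description(),
            address.map(PostalAddress::country).orElse(null),
            address.map(PostalAddress::locality).orElse(null),
            address.map(PostalAddress::region).orElse(null));
      }
    }
    ```

    Keeping the nested records for deserialization and flattening afterwards means the Jackson mapping stays close to the JSON-LD structure, while the rest of the application only sees the flat object.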
    
    