I am creating a basic JSON string parser and I’ve run into a situation.
I’ve been using the following RFC.
RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format.
In Section 3 – Values, it defines that a value is as follows.
value = false / null / true / object / array / number / string
And, Section 4 – Objects defines an object.
object = begin-object [ member *( value-separator member ) ]
         end-object
member = string name-separator value
Thus, in my code I created the following class structures.
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Holds any JSON value as a plain java.lang.Object (a null reference for JSON null).
private static class Value {
    private java.lang.Object object;

    @Override
    public String toString() {
        String string;
        if (object instanceof Boolean) string = (boolean) object ? "true" : "false";
        else if (object instanceof BigDecimal) string = ((BigDecimal) object).toPlainString();
        else if (object instanceof String) string = (String) object;
        else string = String.valueOf(object); // also covers a null reference
        return string;
    }
}

// Note: this class shadows java.lang.Object, hence the qualified name in Value.
private static class Object {
    List<Member> list = new ArrayList<>();

    private static class Member {
        private String name;
        private List<Value> value;

        @Override
        public String toString() {
            return "{" + name + ": " + value + "}";
        }
    }

    @Override
    public String toString() {
        return list.toString();
    }
}
I am attempting to determine if there is a more logical way to do this.
Is there a better way to contain each of those types, and then capture their content accordingly?
I thought of just storing them all as String values, although an object has less precedence than a value.
Therefore, there would not be any difference in what I currently have implemented.
I feel inheritance would prove abstruse and lacking, since it won’t ever scale beyond this.
The values will always have the forms described.
The problem I encounter is that the user will have to decouple an array of JSON objects.
To provide them with Java Objects seems daunting.
Essentially, I want to have the following pseudo-code as an end-point.
parser.get("key").asString();
parser.get(0).asArray();
parser.get(1).asObject();
parser.findObject("key").asObject();
parser.findNumber(123).asNumber();
Is there some sort of concept, or design, that can be used to harness a set of uncorrelated values?
How can I offer the JSON object, or array, to the user, in a way that won’t require them to cast a Java Object?
Answers
The easiest way to implement this in Java (similar to all the big libs) is an inheritance tree of object types.
Optionally, add casting methods as you described onto Value. Checking the type of a Value is now a type-level operation, so you can use instanceof.

In broad strokes you have 3 options.
But before we go through those, there are perfectly fine JSON parsing libraries out there. The standard org.json one is extremely hard to use and not at all recommended, but there’s GSON and Jackson-json. Open source, very widely used, and do all or most of this stuff already. Not sure why you want to reinvent this wheel. But, in the spirit of the question:
Let’s work with an example: an input JSON with a "director" k/v pair whose value is a JSON object that contains a key named "dob".
Just cast em
This is what the org.json library does, and results in excessively verbose and hard to write code.
Your json.parse method returns a JsonObject, which is an abstract class and not particularly useful until you instanceof check what it actually is and cast from there.
If as a user of this library you know pretty much exactly how that data is structured and what you’re looking for, in old timey Java, you’d write an instanceof check followed by a cast at every level of nesting, which surely needs no further debate about how ridiculous that is.
With patterned switches, you have a few more options but it really doesn’t get any better.
This style of parsing is considerably less convoluted if you do not know what you’re looking for; for example, if you’re writing a JSON UI widget that just renders it, this really isn’t that bad.
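A minimal sketch of what the cast-heavy style looks like, assuming a hand-built tree instead of a real parser. Every type and method name here (JsonValue, JsonObject, JsonString, CastStyleDemo, directorDob) is hypothetical and not taken from org.json or any other library; the base type is called JsonValue so that it does not clash with the node representing a JSON object.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

// Hypothetical node types for this sketch; a real parser would build them.
sealed interface JsonValue
        permits JsonObject, JsonArray, JsonString, JsonNumber, JsonBoolean, JsonNull {}
record JsonObject(Map<String, JsonValue> members) implements JsonValue {}
record JsonArray(List<JsonValue> elements) implements JsonValue {}
record JsonString(String value) implements JsonValue {}
record JsonNumber(BigDecimal value) implements JsonValue {}
record JsonBoolean(boolean value) implements JsonValue {}
enum JsonNull implements JsonValue { INSTANCE }

final class CastStyleDemo {
    /** Digs out director.dob the cast-heavy way: one check and one cast per level. */
    static String directorDob(JsonValue root) {
        if (!(root instanceof JsonObject)) {
            throw new IllegalArgumentException("root is not an object");
        }
        JsonValue director = ((JsonObject) root).members().get("director");
        if (!(director instanceof JsonObject)) {
            throw new IllegalArgumentException("director is not an object");
        }
        JsonValue dob = ((JsonObject) director).members().get("dob");
        if (!(dob instanceof JsonString)) {
            throw new IllegalArgumentException("dob is not a string");
        }
        return ((JsonString) dob).value();
    }

    public static void main(String[] args) {
        // Tree built by hand; the date is a placeholder value.
        JsonValue root = new JsonObject(Map.of(
                "director", new JsonObject(Map.of("dob", new JsonString("1970-12-31")))));
        System.out.println(directorDob(root)); // prints 1970-12-31
    }
}
```

Three checks and three casts to read one nested string is the verbosity being complained about; the same hierarchy is pleasant enough when the caller genuinely has to branch on every node type.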
The key problem
The problem is that JSON is untyped, but in Java we really need those types so that the parser can provide the data in the way the library user wants it. This means the structure and types of the JSON need to be provided externally.
Marshalling
One obvious way to do it is by using simple Java classes or records to represent this stuff. The library is given the JSON and a target type, and hands back a populated instance of that type.
Field names are public, record constructors are well defined, field types are public and ‘reified’ (you can look up the part in the <>; that is not erased). This complicates the library as it has to go on a reflective spree to bind it all together, but it can be done, and this is something popular JSON libraries such as GSON and Jackson offer. The same system can be used to turn an instance into JSON. Some obvious concerns:

You really need a library of custom parsers; often LocalDate x; should just work for some data in your JSON consisting of {"x": "1970-12-31"}. GSON and co let you add ‘parsers’ and the like which are handed some JsonValue and a type (e.g. LocalDate), and this lets you handle such things. That does make it more complicated. You hardcode a few parsers (certainly all number types, string, and list, and possibly (hash)map, need custom handling).

You want the ability to add parse or format hints, such as wishing for a long to nevertheless be rendered as a string; annotations with RUNTIME level retention can do that. Note that if you send your JSON to JavaScript, or even just get it parsed elsewhere given that JSON is defined in terms of JavaScript, all numbers are effectively double, so the usual problems with doubles and rounding occur. Trying to store a very large id, e.g. above 2^53, in a JSON number is a really bad idea as a consequence; whatever you send it to is extremely likely to round that.

Marshalling is entirely useless if the user of the library doesn’t know the structure of the JSON you’re working with (such as when writing a ‘JSON viewer’ UI widget). This is why GSON and Jackson offer both this and a more ‘just cast-em’ like library.
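The reflective binding described above can be sketched roughly as follows. This is not how GSON or Jackson are implemented; it is a small illustration that assumes the JSON has already been parsed into plain Map, String and Number values. The names RecordBinder, bind, convert, Person and Film are all hypothetical, and only a LocalDate and an int conversion are hardcoded.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.time.LocalDate;
import java.util.Map;

// Hypothetical target types; component names double as the expected JSON keys.
record Person(String name, LocalDate dob) {}
record Film(String title, int year, Person director) {}

final class RecordBinder {
    /** Binds an already-parsed JSON object (modelled here as a plain Map) to a record. */
    static <T> T bind(Map<String, Object> json, Class<T> type) {
        RecordComponent[] components = type.getRecordComponents();
        if (components == null) {
            throw new IllegalArgumentException(type + " is not a record");
        }
        Class<?>[] paramTypes = new Class<?>[components.length];
        Object[] args = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            paramTypes[i] = components[i].getType();
            args[i] = convert(json.get(components[i].getName()), paramTypes[i]);
        }
        try {
            // The canonical constructor takes the components in declaration order.
            Constructor<T> canonical = type.getDeclaredConstructor(paramTypes);
            canonical.setAccessible(true);
            return canonical.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + type.getName(), e);
        }
    }

    /** The tiny 'library of parsers': turns a raw JSON value into the requested type. */
    @SuppressWarnings("unchecked")
    private static Object convert(Object raw, Class<?> target) {
        if (raw instanceof Map && target.isRecord()) return bind((Map<String, Object>) raw, target);
        if (raw instanceof String s && target == LocalDate.class) return LocalDate.parse(s);
        if (raw instanceof Number n && target == int.class) return n.intValue();
        return raw; // strings, booleans and anything already of the right type
    }

    public static void main(String[] args) {
        // Placeholder data standing in for parsed JSON.
        Map<String, Object> json = Map.of(
                "title", "Some Film", "year", 1999,
                "director", Map.of("name", "Jane Doe", "dob", "1970-12-31"));
        Film film = bind(json, Film.class);
        System.out.println(film.director().dob().getYear()); // prints 1970
    }
}
```

A missing key leaves the argument null, which works for reference-typed components but makes the constructor call fail for a primitive such as int; a fuller version would need a policy for that, plus collections and user-registered converters.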
Replace get() with asInt() and friends.

Instead of forcing the user to check what a thing is (with instanceof), or cast a thing to what they know it is, make methods for each type instead.

There is no JsonNumber, JsonBoolean, etcetera; instead there’s just Json (though you may have package-private subtypes that implement each JSON principal data type).

.get() just returns a wrapper object with the ‘path’ encoded in it (you don’t want ‘pathing’ into non-existent spaces to throw exceptions); only asX() does a lookup, and these methods come in many variants (asBoolean, asStringList, and so on), each variant overloaded: a no-args variant that throws if the path you are in doesn’t exist, and an argsed variant that takes a default that is returned if the path does not exist. If the path does exist but the value you find there is fundamentally incompatible, it’s a bit of a design decision whether that means the default should be returned or an exception should be thrown (e.g. if you call .asInt(100) and the value there is "foobar", what now?).

The same ‘what now?’ design question arises if e.g. the input JSON has a "director" k/v pair but its value is simply 18, instead of a JSON object that contains a key named "dob".

The point is, the vast majority of JSON out there tends to have known structure (known to the library user), but often omits data that cannot be supplied or isn’t relevant, and usually you want to handle that by going with a default value. This path solution is compatible with that idea, and is highly efficient (in the ‘simple to write and understand, succinct code’ sense), if you know what you’re looking for.
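A rough sketch of such a path wrapper, assuming the parsed tree is made of plain Map, List, String and Number values. The class layout and the method names of, extend and resolve are hypothetical, and only the asInt and asString variants are shown. For the ‘what now?’ question this sketch arbitrarily picks one answer: an incompatible value is treated like a missing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical path wrapper: get() only records where to look; asX() does the lookup. */
final class Json {
    private final Object root;       // parsed tree of Map, List, String, Number, Boolean, null
    private final List<Object> path; // String keys and Integer indexes collected so far

    private Json(Object root, List<Object> path) {
        this.root = root;
        this.path = path;
    }

    static Json of(Object parsedTree) { return new Json(parsedTree, List.of()); }

    Json get(String key) { return extend(key); }
    Json get(int index) { return extend(index); }

    private Json extend(Object next) {
        List<Object> longer = new ArrayList<>(path);
        longer.add(next);
        return new Json(root, longer);
    }

    /** Follows the path; yields null as soon as a step cannot be taken. */
    private Object resolve() {
        Object current = root;
        for (Object step : path) {
            if (step instanceof String key && current instanceof Map<?, ?> map) {
                current = map.get(key);
            } else if (step instanceof Integer i && current instanceof List<?> list
                    && i >= 0 && i < list.size()) {
                current = list.get(i);
            } else {
                return null;
            }
        }
        return current;
    }

    int asInt() {
        if (resolve() instanceof Number n) return n.intValue();
        throw new NoSuchElementException("no number at " + path);
    }

    // Choice made for this sketch: missing and incompatible both yield the default.
    int asInt(int fallback) {
        return resolve() instanceof Number n ? n.intValue() : fallback;
    }

    String asString() {
        if (resolve() instanceof String s) return s;
        throw new NoSuchElementException("no string at " + path);
    }

    String asString(String fallback) {
        return resolve() instanceof String s ? s : fallback;
    }

    public static void main(String[] args) {
        // "director" is a bare number here, so pathing into "dob" falls back.
        Json json = Json.of(Map.of("year", 1999, "director", 18));
        System.out.println(json.get("director").get("dob").asString("unknown")); // prints unknown
    }
}
```

Because get() never fails, a chain such as json.get("director").get("dob") is safe to write even when "director" is absent; the decision about failing or defaulting is made once, at the asX() call.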
You can make an explicit JsonValue type hierarchy and add an asValue() method to cater to the ‘UI widget that can view arbitrary JSON’ use case if you need this.