In this article EB is saying that this
'{"oranges": 2, "apples": 6, "pears": 5}'
is an unordered set in JSON, while this s-expression
'((ORANGES 2) (APPLES 6) (PEARS 5))
is a nested list, but apparently not an unordered set. He says
Yep, this [the Lisp s-expression] has less punctuation, which is nice. However, now suppose I
want my data to be not just a nested list, but rather a mapping of
fruit names to numbers. And I want it to be an efficient mapping, too.
In other words, I want the JSON'{"oranges": 2, "apples": 6, "pears": 5}'
He goes on to distinguish between ordered and unordered collections. What about the JSON makes it an unordered set, or, as he says, a mapping between fruit and numbers? Why, how is the s-expression lacking? Then there is this discussion on the article, which I wasn’t following so well either.
Further, EB uses this
#S(HASH-TABLE :TEST FASTHASH-EQL (ORANGES . 2) (APPLES . 6) (PEARS . 5))
when I, a beginner, have only seen this sort of thing for hash (emacs lisp example):
(setq hash-of-countries (make-hash-table :test 'equal))
(puthash "Mexico" "MX" hash-of-countries)
(puthash "United States" "USA" hash-of-countries)
What does this #S(HASH-TABLE...
mean? And in general, why is he down on s-expressions versus JSON? Any clearing up would be appreciated.
2
Answers
JSON is just a textual representation of a data structure, using Javascript inspired notation. A JSON object is, by definition, an unordered collection of key/value pairs.
{"oranges": 2, "apples": 6, "pears": 5}
is equivalent to{"oranges": 2, "pears": 5, "apples": 6}
. How the written out object is mapped to a given language’s internal data structures depends on the particular JSON parser and the language’s conventions. In Common Lisp land, there are libraries that represent JSON object as hash tables, association lists, property lists, CLOS objects, structures, and more. It doesn’t fundamentally matter what as long as it’s some sort of key-value mapping.'((ORANGES 2) (APPLES 6) (PEARS 5))
is an association list (alist for short), though it’d be more commonly written using dotted pairs:'((ORANGES . 2) (APPLES . 6) (PEARS . 5))
. It’s a very common convention for a set of key-value pairs in Common Lisp – there are even standard functions for working with such lists. They might even be more efficient than hash tables when dealing with a small number of keys. You see them a lot because CL doesn’t have standard syntax for literal hash tables (Unlike some other Lisp-family languages like Racket and Clojure). Order only matters if there are duplicates of the same key in an alist – having a single instance of each key is not enforced the way it is for, say, hash tables, just a convention.#S(HASH-TABLE :TEST FASTHASH-EQL (ORANGES . 2) (APPLES . 6) (PEARS . 5))
is just some pseudo-code the author of that article came up with as a "what if" it did have literal hash tables. It appears to be based on the syntax for reading literal structures, but isn’t quite the same. (Maybe there’s some implementation that supports it as an extension?)The author’s preference for JSON over a S-expression representation of the same data structure is just that – personal preference. For exchanging data between services, especially over web APIs, JSON is a de facto standard, but how programs represent it for their own use – say in a configuration or data file – is much more opinion based and thus off topic for Stack Overflow.
This is dangerously close to an opinion-based question and answer, so I will keep it factual rather than offering criticism of the article.
There are two distinct issues here:
For serializing.
Some languages have textual representations for some of their data types which can be used to serialize them painlessly. JavaScript is one such, Lisp (specifically Common Lisp here) is another. The types for which they have standard serialized representations, and whether you want to map the serialized representation to the ‘obvious’ type at all are entirely up to you. In the case of s-expressions an obvious approach would be to indicate the intention of the data by tagging it somehow: if you want an association-list (see below) you’d serialize it as
(alist <representation of alist>)
and if you want a mapping you’d serialize it as(mapping <representation of mapping>)
. Or something.It is obvious that, in a language with more than a few data types, and especially in a language where the set of types is not fixed (Common Lisp is such a language, as is C, Python, …) it is never possible to have a fixed mapping between data types and their serialized form, or even whether they can be serialized at all. Rather you almost certainly must rely on a much more limited serialization format which provides enough information to reconstruct the object.
So, for the actual differences between types, in Lisp.
Lisp’s
list
type is quite different than, for instance Python’s. Lisp lists are eithernil
.(<object> . <list>)
.As a special convenience, a list like
(x . (y . (z . ())))
is written(x y z)
and this applies everywhere, so((a . (b . ())) ...)
is written((a b) ...)
.Conses do not have to form lists:
(a . b)
is a fine cons, but it’s not a list. Things like this are often called improper lists, with the list type I defined above called proper lists.In other words, lists in Lisp are singly-linked lists, not extensible arrays of some kind.
So consider this list
A list like this can be used to represent a mapping between keys and values: you can walk along the list until you find the key you care about. In Lisp this is an association list (often called ‘alist’). There is no special ‘association list’ type: it’s just a convention that has grown up.
Such a list can be used as a poor-persons table. It’s a poor-persons table because finding a key in it requires searching the list, and thus takes time proportional to the number of keys in the list. And there is nothing in the structure which prevents duplicate keys. If there are duplicate keys then searching will generally find the first one: the order of the keys matters.
So the alist
((a b) (a c) ...)
is different than((a c) (a b) ...)
, but both are legitimate alist.The dependence on ordering is for many purposes an advantage! Because it means that you can very cheaply temporarily shadow a mapping in such a table, without having to make an expensive copy of the entire table. Many programs have taken advantage of this.
Common Lisp has another data type, which is more akin to JavaScript’s ‘object’: hash tables. This type is a mapping from keys to values where
Hash tables have no standard serialized representation in CL. That’s partly because they’re rather rich objects: you can specify things like the test for equality that keys must satisfy and so on. It’s perhaps also because people just did not think of how they should be serialized. In particular
#S(...)
is not a standard representation for hash tables.Common Lisp allows you to extend its textual representation for objects. It would be easy for instance to write a program which read hash tables as
#H(:test ... :data ...)
or whatever. However that’s not generally useful for a serialization format common between languages (or even between different CL programs where only one may have defined the extended syntax). Better would be to agree on some format, as I mentioned above, which tags things explicitly.In summary: