Incidentally, decoding JSON data (or really, almost any data structure) is really easy in Go (golang). We simply call json.Unmarshal(…) and boom! We have nice data structures.
Well, except if our input source is not very well defined (meaning not strictly typed).
Objects with loose schema
Take this example. We want to decode a JSON object that looks like this:
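Something along these lines (the title value here is made up for illustration; what matters is the author field holding an email address):

```json
{
    "title": "Decoding JSON in Go",
    "author": "attilaolah@gmail.com"
}
```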
The usual way to go is to decode it into a struct:
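A minimal sketch (the decodeRecord helper is made up for illustration):

```go
import "encoding/json"

type Record struct {
	Title  string `json:"title"`
	Author string `json:"author"`
}

func decodeRecord(data []byte) (*Record, error) {
	r := &Record{}
	return r, json.Unmarshal(data, r)
}
```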
That’s fairly easy. But what happens if suddenly we add a new data source that uses numeric author IDs instead of emails? For example, we might have an input stream that looks like this:
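Illustrative values again:

```json
{
    "title": "Decoding JSON in Go",
    "author": 1234
}
```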
Decoding to interface{}
A quick & easy fix is to decode the author field to an interface{} and then do a type switch. Something like this:
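A sketch (authorOf is a made-up helper; the switch is the interesting part):

```go
import "fmt"

type Record struct {
	Title  string      `json:"title"`
	Author interface{} `json:"author"`
}

// authorOf inspects the decoded author field.
func authorOf(r *Record) (email string, id uint64, err error) {
	switch v := r.Author.(type) {
	case string:
		return v, 0, nil // an email address
	case float64: // encoding/json decodes every JSON number into a float64
		return "", uint64(v), nil
	}
	return "", 0, fmt.Errorf("unsupported author type %T", r.Author)
}
```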
That was easy… except, it doesn’t work. What happens when our IDs get close to 2⁶⁴−1? They can no longer be represented exactly in a float64, so our decoder will end up rounding some IDs. Too bad.
Decoder.UseNumber() to the rescue!
Luckily there’s an easy way to fix this: by calling Decoder.UseNumber().

“UseNumber causes the Decoder to unmarshal a number into an interface{} as a Number instead of as a float64” — from the docs.
Now our previous example would look something like this:
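Roughly like this (describeAuthor is another made-up helper, reusing the Record type with the interface{} author field):

```go
import (
	"bytes"
	"encoding/json"
	"fmt"
)

// describeAuthor decodes a record and reports what kind of author it has.
func describeAuthor(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber() // JSON numbers stay json.Number instead of float64
	r := &Record{}
	if err := dec.Decode(r); err != nil {
		return err
	}
	switch v := r.Author.(type) {
	case string:
		fmt.Println("email:", v)
	case json.Number:
		id, err := v.Int64() // no more float64 rounding...
		if err != nil {
			return err
		}
		fmt.Println("id:", id)
	}
	return nil
}
```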
Seems fine now, right? Nope! This will still fail for numbers above 2⁶³−1, as they would overflow the int64.
Now we see that if we want to decode a JSON number into a uint64, we really have to call Decoder.Decode(…) (or json.Unmarshal(…)) with a *uint64 argument (a pointer to a uint64). We could do that simply by directly decoding the string representation of the number. Instead of:
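(a sketch; parseID is a made-up helper that takes the json.Number from the switch above)

```go
// First attempt: Number.Int64 overflows for IDs above 2^63-1.
func parseID(n json.Number) (uint64, error) {
	id, err := n.Int64()
	return uint64(id), err
}
```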
…we could write:
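(the same made-up helper, second take)

```go
// Second attempt: decode the number's string representation
// directly into a uint64, so the full range fits.
func parseID(n json.Number) (id uint64, err error) {
	err = json.Unmarshal([]byte(n.String()), &id)
	return id, err
}
```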
Wait… Let’s use json.RawMessage instead.
Now we’re correctly decoding large numbers into uint64. But at this point we’re only using the json.Number type to delay decoding of a particular value. For exactly that, the json package provides a more powerful type: json.RawMessage. RawMessage simply delays the decoding of part of a message, so we can do it ourselves later. (We can also use it to special-case the encoding of a value.)

Here is our example, using json.RawMessage:
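A sketch (decodeAuthor is a made-up helper):

```go
import "encoding/json"

type Record struct {
	Title  string          `json:"title"`
	Author json.RawMessage `json:"author"` // decoded by hand, below
}

// decodeAuthor handles both schemas: a bare numeric ID or an email string.
func decodeAuthor(raw json.RawMessage) (id uint64, email string, err error) {
	if err = json.Unmarshal(raw, &id); err == nil {
		return id, "", nil // a numeric ID
	}
	err = json.Unmarshal(raw, &email) // otherwise: an email address
	return 0, email, err
}
```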
This looks better. Now we can even extend it to accept more schemas. Say we want to accept a third format:
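For instance, a full author object (made-up values again):

```json
{
    "title": "Decoding JSON in Go",
    "author": {"id": 1234, "email": "attilaolah@gmail.com"}
}
```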
It seems obvious that we are going to need an Author type. Let’s define one.
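Something like this; note that all the author decoding still happens in the made-up decodeRecord helper, next to the RawMessage-based Record from above:

```go
type Author struct {
	ID    uint64 `json:"id"`
	Email string `json:"email"`
}

// decodeRecord still does all the author decoding itself.
func decodeRecord(data []byte) (*Record, *Author, error) {
	r := &Record{}
	if err := json.Unmarshal(data, r); err != nil {
		return nil, nil, err
	}
	a := &Author{}
	// Try the three schemas in turn: bare ID, bare email, full object.
	if err := json.Unmarshal(r.Author, &a.ID); err == nil {
		return r, a, nil
	}
	if err := json.Unmarshal(r.Author, &a.Email); err == nil {
		return r, a, nil
	}
	return r, a, json.Unmarshal(r.Author, a)
}
```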
This looks fine… Except that now we’re doing all the decoding of the Author type in the function that decodes the Record object. And we can see that with time, our Record object’s decoder will grow bigger and bigger. Wouldn’t it be nice if the Author type could somehow decode itself?
Behold, the json.Unmarshaler interface!
Implement it on any type, and the json package will use it to unmarshal your object.
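The interface itself is tiny; this is its definition in the encoding/json package:

```go
type Unmarshaler interface {
	UnmarshalJSON([]byte) error
}
```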
Let’s move the decode logic to the Author struct:
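A sketch of how that could look (the private author type is explained in the note below):

```go
import "encoding/json"

type Author struct {
	ID    uint64 `json:"id"`
	Email string `json:"email"`
}

// author has the same fields as Author but no UnmarshalJSON method,
// so decoding into it uses the built-in struct machinery.
type author Author

func (a *Author) UnmarshalJSON(data []byte) error {
	aux := author(*a) // the conversion avoids recursing into this method

	// Full object form?
	if err := json.Unmarshal(data, &aux); err == nil {
		*a = Author(aux)
		return nil
	}
	// Bare numeric ID?
	if err := json.Unmarshal(data, &a.ID); err == nil {
		return nil
	}
	// Bare email address, then.
	return json.Unmarshal(data, &a.Email)
}
```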
Much better. Now that the Author object knows how to decode itself, we don’t have to worry about it any more (and we can extend Author.UnmarshalJSON when we want to support extra schemas, e.g. username or email). Furthermore, now that Record objects can be decoded without any additional work, we can move one more level higher:
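For instance, the Record type can now hold an Author directly, and whole lists of records decode with a plain json.Unmarshal call (decodeRecords is a made-up helper):

```go
type Record struct {
	Title  string `json:"title"`
	Author Author `json:"author"` // Author decodes itself now
}

func decodeRecords(data []byte) ([]Record, error) {
	var records []Record
	return records, json.Unmarshal(data, &records)
}
```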
You can go play with this.
NOTE: Thanks to Riobard Zhan for pointing out a mistake in the previous version of this article. The reason I have two types above, Author and author, is to avoid an infinite recursion when unmarshalling into an Author instance. The private author type is used to trigger the built-in JSON unmarshal machinery, while the exported Author type is used to implement the json.Unmarshaler interface. The trick with the conversion near the top of UnmarshalJSON is used to avoid the recursion.
What about encoding?
Let’s say we want to normalise all these data sources in our API and always return the author field as an object. With the above implementation, we don’t have to do anything: re-encoding records will normalise all objects for us. However, we might want to save some bandwidth by not sending defaults. For that, we can tag our fields with json:",omitempty":
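For example:

```go
type Author struct {
	ID    uint64 `json:"id,omitempty"`
	Email string `json:"email,omitempty"`
}
```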
Now 1234 will be turned into {"id":1234}, "attilaolah@gmail.com" to {"email":"attilaolah@gmail.com"}, and {"id":1234,"email":"attilaolah@gmail.com"} will be left intact when re-encoding objects.
Using json.Marshaler
For encoding custom stuff, there’s the json.Marshaler interface. It works similarly to json.Unmarshaler: you implement it, and the json package uses it.
Let’s say that we want to save some bandwidth, and always transfer the minimal information required by the json.Unmarshaler to reconstruct the objects. We could implement something like this:
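A sketch (the private author type is the same recursion-avoiding trick as before):

```go
func (a Author) MarshalJSON() ([]byte, error) {
	switch {
	case a.Email == "":
		return json.Marshal(a.ID) // only the ID is set: a bare number
	case a.ID == 0:
		return json.Marshal(a.Email) // only the email is set: a bare string
	}
	return json.Marshal(author(a)) // both set: the full object
}
```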
Now 1234, "attilaolah@gmail.com" and {"id":1234,"email":"attilaolah@gmail.com"} are left intact, but {"id":1234} is turned into 1234 and {"email":"attilaolah@gmail.com"} is turned into "attilaolah@gmail.com".
Another way to do the same would be to have two types, one that always encodes to an object (Author), and one that encodes to the minimal representation (CompactAuthor):
Using pointers
We see now that the omitempty tag is pretty neat. But we can’t use it with the author field, because json can’t tell if an Author object is “empty” or not (for json, a struct is always non-empty).
To fix that, we can turn the author field into an *Author instead. Now json.Unmarshal(…) will leave that field nil when the author information is completely missing, and json.Marshal(…) will not include the field in the output when Record.Author is nil.
Example:
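A sketch:

```go
type Record struct {
	Title  string  `json:"title"`
	Author *Author `json:"author,omitempty"`
}
```

Decoding {"title":"Decoding JSON in Go"} now leaves Record.Author nil, and re-encoding that record omits the author field entirely.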
Timestamps
time.Time implements both json.Marshaler and json.Unmarshaler. Timestamps are formatted as RFC 3339.
However, it is important to remember that time.Time is a struct type, hence json will never consider it “empty” (json will not consult Time.IsZero()). To omit zero timestamps with the omitempty tag, use a pointer (i.e. *time.Time) instead.
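For example (the created_at field name is made up here):

```go
import "time"

type Record struct {
	Title     string     `json:"title"`
	Author    *Author    `json:"author,omitempty"`
	CreatedAt *time.Time `json:"created_at,omitempty"` // nil => omitted
}
```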
Conclusion
Interfaces are awesome. Even more awesome than you probably think. And tags. Combine these two, and you have objects that you can expose through an API supporting various encoding formats. Let me finish with a simple yet powerful example:
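Here is one way the pieces from this post fit together (a sketch, using the same made-up field names as above):

```go
import (
	"encoding/json"
	"time"
)

type Record struct {
	Title     string     `json:"title"`
	Author    *Author    `json:"author,omitempty"`
	CreatedAt *time.Time `json:"created_at,omitempty"`
}

type Author struct {
	ID    uint64 `json:"id,omitempty"`
	Email string `json:"email,omitempty"`
}

// author has no methods, so it always uses the default JSON machinery.
type author Author

func (a *Author) UnmarshalJSON(data []byte) error {
	aux := author(*a) // conversion: avoids recursion
	if err := json.Unmarshal(data, &aux); err == nil {
		*a = Author(aux)
		return nil
	}
	if err := json.Unmarshal(data, &a.ID); err == nil {
		return nil
	}
	return json.Unmarshal(data, &a.Email)
}

func (a Author) MarshalJSON() ([]byte, error) {
	switch {
	case a.Email == "":
		return json.Marshal(a.ID)
	case a.ID == 0:
		return json.Marshal(a.Email)
	}
	return json.Marshal(author(a))
}
```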
Authors can now be encoded/decoded to/from a number of formats:
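…namely any of these (using the example values from earlier):

```json
1234
"attilaolah@gmail.com"
{"id": 1234, "email": "attilaolah@gmail.com"}
```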