Attila Oláh

SRE @ Google Zürich

  • 10 Sep 2014

    JSON and struct composition in Go

    Say you are decoding a JSON object into a Go struct. It comes from a service that is not under your control, so you cannot do much about the schema. However, you want to encode it differently.

    You could go wild with json.Marshaler, but it has some drawbacks:

    • complexity: adds lots of extra code for big structs
    • memory usage: must be careful not to do needless allocations

    To be fair, in most cases you can avoid allocations in your MarshalJSON(), but that may lead to even more complexity, which now sits in your code base (instead of encoding/json), so it’s your job to unit test it. And that’s even more boring code to write.
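
    To make the trade-off concrete, here is a rough sketch of the hand-rolled approach for the User struct introduced below (this variant allocates a map, illustrating both drawbacks):

    // Hand-rolled marshalling: every field except the omitted one has to
    // be repeated here, and kept in sync with the struct by hand.
    func (u User) MarshalJSON() ([]byte, error) {
        return json.Marshal(map[string]interface{}{
            "email": u.Email,
            // …and so on, for every other field…
        })
    }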

    Here are a few tricks to be used with big structs.

    Omitting fields

    Let’s say you have this struct:

    type User struct {
        Email    string `json:"email"`
        Password string `json:"password"`
        // many more fields…
    }

    What you want is to encode User, but without the password field. A simple way to do that with struct composition would be to wrap it in another struct:

    type omit *struct{}
    
    type PublicUser struct {
        *User
        Password omit `json:"password,omitempty"`
    }
    
    // when you want to encode your user:
    json.Marshal(PublicUser{
        User: user,
    })

    The trick here is that we never set the Password property of the PublicUser, and since it is a pointer type, it will default to nil, and it will be omitted (because of omitempty).

    Note that there’s no need to declare the omit type, we could have simply used *struct{} or even bool or int, but declaring the type makes it explicit that we’re omitting that field from the output. Which built-in type we use does not matter as long as it has a zero value that is recognised by the omitempty tag.

    We could have used only anonymous values:

    json.Marshal(struct {
        *User
        Password bool `json:"password,omitempty"`
    }{
        User: user,
    })

    Try it in the playground.

    Also note that we only include a pointer to the original User struct in our wrapper struct. This indirection avoids having to allocate a new copy of User.

    Adding extra fields

    Adding fields is even simpler than omitting. To continue our previous example, let’s hide the password but expose an additional token property:

    type omit *struct{}
    
    type PublicUser struct {
        *User
        Token    string `json:"token"`
        Password omit   `json:"password,omitempty"`
    }
    
    json.Marshal(PublicUser{
        User:  user,
        Token: token,
    })

    Try it in the playground.

    Composing structs

    This is handy when combining data coming from different services. For example, here’s a BlogPost struct that also contains analytics data:

    type BlogPost struct {
        URL   string `json:"url"`
        Title string `json:"title"`
    }
    
    type Analytics struct {
        Visitors  int `json:"visitors"`
        PageViews int `json:"page_views"`
    }
    
    json.Marshal(struct {
        *BlogPost
        *Analytics
    }{post, analytics})

    Try it in the playground.

    Splitting objects

    This is the opposite of composing structs. Just as we can encode a combined struct, we can decode into one and use the values separately:

    json.Unmarshal([]byte(`{
      "url": "attilaolah@gmail.com",
      "title": "Attila's Blog",
      "visitors": 6,
      "page_views": 14
    }`), &struct {
      *BlogPost
      *Analytics
    }{&post, &analytics})

    Try it in the playground.

    Renaming fields

    This one is a combination of removing fields and adding extra fields: we simply remove the field and add it back with a different json: tag. This can be done with pointer indirection to avoid allocating memory, although for small fields the pointer costs about as much memory as a copy of the field would, plus the runtime overhead of the indirection.

    Here is an example where we rename two struct fields, using indirection for the nested struct and copying the integer:

    type CacheItem struct {
        Key    string `json:"key"`
        MaxAge int    `json:"cacheAge"`
        Value  Value  `json:"cacheValue"`
    }
    
    json.Marshal(struct {
        *CacheItem
    
        // Omit bad keys
        OmitMaxAge omit `json:"cacheAge,omitempty"`
        OmitValue  omit `json:"cacheValue,omitempty"`
    
        // Add nice keys
        MaxAge int    `json:"max_age"`
        Value  *Value `json:"value"`
    }{
        CacheItem: item,
    
        // Set the int by value:
        MaxAge: item.MaxAge,
    
        // Set the nested struct by reference, avoid making a copy:
        Value: &item.Value,
    })

    Try it in the playground.

    Note that this is only practical when you want to rename one or two fields in a big struct. When renaming all fields, it is often simpler (and cleaner) to just create a new object altogether (i.e. a serialiser) and avoid the struct composition.
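
    For instance, a dedicated serialiser for CacheItem could be a plain struct with the nice keys, filled in by copying the fields (a sketch, reusing the types from above):

    type CacheItemJSON struct {
        Key    string `json:"key"`
        MaxAge int    `json:"max_age"`
        Value  *Value `json:"value"`
    }
    
    json.Marshal(CacheItemJSON{
        Key:    item.Key,
        MaxAge: item.MaxAge,
        Value:  &item.Value,
    })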

    Related posts:

    • JSON decoding in Go

    Tags: programming, golang
  • 12 May 2014

    A simple but powerful zsh prompt

    Over the years, I’ve been changing my bash prompt every now and then. Since I switched to zsh last year, and started using oh-my-zsh, I’ve slowly put together the perfect prompt for my needs.

    Here’s how it looks right now (with extra-large font size for better visibility):

    [screenshot: zsh prompt]

    Parts of the left prompt, from left to right:

    • 1z shows that there is one background job (vim), suspended with Ctrl+Z (hence the z) — this goes away if there are no background jobs
    • tp is the hostname, useful to tell apart ssh sessions
    • git:master shows that I’m in a git repo and that master is the currently active branch, this one is very useful
    • … after the git branch indicates that there are unstaged changes or newly added files — this goes away in a clean tree
    • ~/github.com/attilaolah/… is just the $PWD collapsed with ~ when applicable
    • $ shows that I’m not the root user
    • there’s a trailing space to make it a word boundary when selecting with the mouse

    There are spaces between these parts so that I can select them with a double-click, if I want to quickly navigate here, for example in another tmux window.

    Parts of the right prompt, from right to left:

    • 1:23:52 is the time, which is useful when I forget to prefix a long running command with time
    • = before the time indicates a non-zero exit status from the previous command

    I used to have git_prompt_status in the right prompt (that shows a summary of changes in the current repo), but it was making the terminal noticeably slower, which is not something I tolerate. Hitting enter in a terminal must feel instant.

    The source, if anyone likes it:

    ZSH_THEME_GIT_PROMPT_PREFIX=" %{$fg[blue]%}git%{$reset_color%}:%{$fg[red]%}"
    ZSH_THEME_GIT_PROMPT_SUFFIX="%{$reset_color%}"
    ZSH_THEME_GIT_PROMPT_DIRTY="%{$fg[yellow]%}…%{$reset_color%}"
    ZSH_THEME_GIT_PROMPT_CLEAN=""
    
    local prompt_jobs="%(1j.%{$fg[yellow]%}%j%{$reset_color%}%{$fg[red]%}z%{$reset_color%} .)"
    local prompt_host="%{$fg[cyan]%}%m%{$reset_color%}"
    local prompt_root="%(!.%{$fg_bold[red]%}#.%{$fg[green]%}$)%{$reset_color%}"
    
    local return_status="%{$fg[red]%}%(?..=)%{$reset_color%}"
    
    PROMPT='${prompt_jobs}${prompt_host}$(git_prompt_info) %~ ${prompt_root} '
    
    RPROMPT="${return_status}%*"
    Tags: programming, linux
  • 10 Jan 2014

    Quick security checklist

    This is intended to be a short list of things to check before you go publish a website or web app (or really, anything that interacts with a browser). It starts with the easy things and continues with less obvious stuff. It is in no way complete.

    Use HttpOnly cookies

    • Pretty much eliminates XSS-based session hijacking ✓
    • Easy to set up on most servers ✓
    • Does not completely eliminate XSS attacks ✗
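
    In Go, for example, this is a single field on the cookie (a minimal sketch using net/http; the cookie name and value are placeholders):

    http.SetCookie(w, &http.Cookie{
        Name:     "session",
        Value:    sessionID,
        HttpOnly: true, // not readable from JavaScript
        Secure:   true, // only ever sent over HTTPS
    })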

    Always use a CSRF token

    • Pretty much eliminates CSRF ✓
    • Many frameworks support it out of the box ✓
    • No use against XSS ✗
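
    A sketch of the two halves in Go, in case your framework does not do it for you (crypto/rand and crypto/subtle from the standard library):

    // Generate a random token to store in the session and embed in forms.
    func newCSRFToken() (string, error) {
        b := make([]byte, 32)
        if _, err := rand.Read(b); err != nil {
            return "", err
        }
        return base64.StdEncoding.EncodeToString(b), nil
    }
    
    // Compare in constant time on every state-changing request.
    func validCSRFToken(got, want string) bool {
        return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
    }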

    Always, ALWAYS escape all user input

    • Most decent template engines will do it automatically ✓
    • Eliminates XSS attacks ✓
    • People tend to forget it ✗

    I cannot stress this last item enough. It doesn’t matter that you use a CSRF token. The XSS attack vector will have access to it. It also doesn’t matter that you use an HttpOnly cookie. While an attacker cannot steal the cookie, they can still wreak havoc. They can do almost everything the user can do.

    Sometimes it may be less obvious what “user input” means. Recently I’ve found an XSS vulnerability in a website because they did not escape the file names of user-uploaded images stored in an Amazon S3 bucket. That data is coming from the database, so it may seem unnecessary to sanitise it. However, as long as the user can put that data in the database, it counts as user input.
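
    In Go, for instance, html/template escapes interpolated values automatically, while text/template does not (a quick sketch):

    t := template.Must(template.New("t").Parse("<p>{{.}}</p>"))
    t.Execute(os.Stdout, "<script>alert(1)</script>")
    // Output: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>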

    Use HTTPS

    • Might be tricky to set up ✗
    • Costs $$$ (but then, what doesn’t?) ✗

    Handling sensitive user data (or any user data for that matter; go ask your users which part of their data isn’t sensitive)? HTTPS is the way to serve that. Or really, anything you can. You lose some benefits, like caching by intermediate proxies, but most people are trying to avoid those caches anyway, not leverage them.

    I’ve heard a number of arguments against HTTPS, and they all sucked. You should just use a secure connection, period.
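
    In Go, for example, serving HTTPS is one call away (the certificate and key file names here are placeholders):

    log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil))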

    Give out as little permission as possible

    When using a third party service like Amazon S3, if you’re generating upload tickets for the users (temporary credentials allowing them to upload data to S3), you want to restrict that data as much as possible.

    • Uploading images or videos? Restrict the Content-Type header. You don’t want someone uploading executable files that cause download/run prompts to pop up.
    • Jail each user/group/organisation to their own bucket/directory.

    To let a user upload /avatars/{userid}/128.jpg, grant them write access to /avatars/{userid} only. Don’t grant them access to the whole /avatars tree, or write access to /avatars/*.jpg.
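
    In code, the check can be as simple as validating the object key against the user’s own prefix (a sketch based on the avatar layout above):

    // Only allow writes under the user's own prefix,
    // and reject path traversal attempts.
    func allowedUpload(userID, key string) bool {
        prefix := "/avatars/" + userID + "/"
        return strings.HasPrefix(key, prefix) && !strings.Contains(key, "..")
    }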

    Hide as much of the data as possible

    This is not to be confused with hiding the infrastructure of your servers or client applications. It is OK if people know that your API runs on Heroku or some other platform. It is OK if they know that you use a certain programming language or framework or library.

    However, it is not good to let people find out more and more information by providing some. Don’t show a user’s phone number to anyone who knows their email (or vice versa). When rejecting a failed login or a password reset request, don’t reveal whether a user with that username or email exists in your database.

    I’ve seen people focusing on obfuscating code and trying to hide logic in the client, thinking that nobody will figure out that you can access a resource without proper authorisation. Somebody will. And you won’t be prepared.
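
    For the login case, this boils down to returning the same error on every failure path (a sketch; findUser and checkPassword are hypothetical helpers):

    func login(email, password string) error {
        u, err := findUser(email)
        if err != nil || !u.checkPassword(password) {
            // Same message whether the user exists or not; in practice,
            // also hash a dummy password here to keep the timing uniform.
            return errors.New("invalid email or password")
        }
        return nil
    }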

    Extra authentication for important changes

    Don’t forget that despite all the security measures you employ and force on your users, often the easiest way to breach their account is to just steal your phone or tablet for a few minutes. Don’t allow email address or password changes without asking for a password (or making sure you have the right person on the other end of the wire).

    Stop doing stupid things

    • Stop adding stupid password length limits (you’re going to hash it anyway)
    • Stop telling me my password is not secure enough because it only contains 28 random letters, but no digits
    • And lastly, stop asking me to use upper case letters, lower case letters, at least one digit, two punctuation marks and one beauty mark in my password.
    Tags: programming, security
  • 07 Jan 2014

    Evil ELFs

    In this post I am going to demonstrate how to easily find out what an evil ELF is doing to your system. This can be useful if you have one that is making secure network connections and you want to have a closer look… Or just for fun.

    Linked library dependencies and ldd

    The easiest to start with are linked library dependencies. In our example:

    $ ldd ./evil-elf
      […]
      libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00007fa94ba57000)
      […]

    The rest of the output is stripped; the important thing is that our app seems to use libcurl to communicate with the evil servers.

    LD_PRELOAD and debug libraries

    To have some more info on what is going on behind the scenes, we can grab a copy of libcurl and build a debug version that has verbose logging enabled by default.

    $ wget https://curl.haxx.se/download/curl-7.34.0.tar.lzma
    $ lzma -d curl-7.34.0.tar.lzma
    $ tar xf curl-7.34.0.tar
    $ cd curl-7.34.0
    $ ./configure --enable-debug
    $ make

    Now we can use the debug version of libcurl.so to get a lot of debugging output about the network connections made:

    $ LD_PRELOAD=./curl-7.34.0/lib/.libs/libcurl.so ./evil-elf

    The debug build automatically enables the CURLOPT_VERBOSE option, which logs all connection information, except the transferred payload. To also log the payload, have a look at the sample code in debug.c (part of the libcurl project).

    Static (built-in) libs and objdump

    Now that we can inspect the traffic, we can use curl to impersonate the app. But what if the requests are signed, and the signature is verified on the server? We want to be able to generate those fingerprints ourselves.

    Let’s assume that we’ve noticed a 40-character hex string in every request. 40 characters? It is most likely SHA1. But we didn’t see any linked library that could be used to generate such hashes… Perhaps they are not dynamically linked (that happens often with distributed binaries).
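
    A quick way to confirm the hunch (using Go’s crypto/sha1, just for the arithmetic):

    // SHA1 digests are 20 bytes, i.e. 40 hex characters:
    sum := sha1.Sum([]byte("some request payload"))
    fmt.Println(len(hex.EncodeToString(sum[:]))) // 40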

    To have a closer look at the evil app, let’s take it apart with objdump:

    $ objdump -ClDgTt -M intel evil-elf > evil-elf.asm
    $ ag -i sha1 evil-elf.asm
    35011:0000000000a8faae  w   DF .text    000000000000001a  Base        boost::uuids::detail::sha1::sha1()
    2983642:  a8ec8e:       e8 61 10 00 00          call   a8fcf4 <boost::uuids::detail::sha1::process_bytes(void const*, unsigned long)>
    2983647:  a8eca4:       e8 b7 13 00 00          call   a90060 <boost::uuids::detail::sha1::get_digest(unsigned int (&) [5])>
    […]

    Bingo! It seems the Boost library is used to generate the SHA1 hashes. A quick look at the source reveals that the routines live inside boost/uuid/sha1.hpp.

    Runtime inspection with gdb

    Instead of preloading a debug version of this, we’ll use gdb to break execution of the app when it feeds the string to be hashed:

    $ gdb
    GNU gdb (Gentoo 7.6.2 p1) 7.6.2
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu".
    For bug reporting instructions, please see:
    <https://bugs.gentoo.org/>.
    (gdb) file ./evil-elf
    (gdb) break boost::uuids::detail::sha1::process_bytes
    (gdb) run
    […]

    Now when execution stops at process_bytes, we know that the string (char * to be precise) we need is somewhere at hand. Probably near the top of the stack, or maybe in a register. We know it is the first parameter when calling the function, but the compiler may have mangled that away, plus we have to consider the hidden argument (this) implied when calling a C++ method.

    Breakpoint 1, 0x0000000000a8fcf8 in boost::uuids::detail::sha1::process_bytes(void const*, unsigned long) ()
    (gdb) info registers
    rax            0x7fffffff8480   140737488323712
    rbx            0x7fffffff8850   140737488324688
    rcx            0x1208008        18907144
    rdx            0x94     148
    […]

    We can try printing these addresses as characters to see if we find our char *.

    (gdb) x/10c $rax
    0x7fffffff8480: 1 '\001'        35 '#'  69 'E'  103 'g' -119 '\211'     -85 '\253'      -51 '\315'      -17 '\357'
    0x7fffffff8488: -2 '\376'       -36 '\334'
    (gdb) x/10c $rbx
    0x7fffffff8850: 64 '@'  0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'
    0x7fffffff8858: 0 '\000'        0 '\000'
    (gdb) x/10c $rcx
    0x1208008:      80 'P'  79 'O'  83 'S'  84 'T'  38 '&'  104 'h' 116 't' 116 't'
    0x1208010:      112 'p' 115 's'

    There it is, in the RCX register! Let’s print it as a string!

    (gdb) x/s $rcx
    0x1208008:      "POST&https%3A%2F%evil%2Ecom%2Fapi%2Fv1%2Fauth%2Fclient&&SECRET&1389088091&NONCE&SIGNATURE&"

    Awesome. Now that we see how the signature is being generated, we can do the same when faking the requests with curl.
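
    Reproducing the signature is then straightforward (a sketch in Go; the base string below is a placeholder mirroring the captured one, and the exact format is specific to the app being inspected):

    base := "POST&https%3A%2F%2Fevil.example%2F…&&SECRET&1389088091&NONCE&"
    sum := sha1.Sum([]byte(base))
    signature := hex.EncodeToString(sum[:])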

    Automate it all with .gdbinit

    One easy way to automate printing the data being hashed is by creating a .gdbinit file like this:

    set environment LD_PRELOAD=./curl-7.34.0/lib/.libs/libcurl.so
    
    file ./evil-elf
    
    break boost::uuids::detail::sha1::process_bytes
    commands $bpnum
    x/s $rcx
    continue
    end
    
    run

    Now, to start the monitored version of evil-elf, just run gdb, and the rest will be taken care of.

    A few more hints

    When running commands from gdb, you can call functions as well, using the call command. However, it is easy to run into recursion, and gdb will stop there without completing the called function correctly. An easy fix is to disable the breakpoint at the beginning of the commands block, and re-enable it just before the end.

    Sources

    • gdb — the GNU debugger: man gdb
    • objdump — display information from object files: man objdump
    • libcurl — client-side URL transfers: man libcurl
    • lsof — list open files: man lsof
    • .gdbinit commands: this answer on stack exchange
    Tags: programming, reverse-engineering
  • 29 Nov 2013

    JSON decoding in Go

    Incidentally, decoding JSON data (or really, almost any data structure) is really easy in Go (golang). We simply call json.Unmarshal(…) and boom! We have nice data structures.

    Well, except if our input source is not very well defined (meaning not strictly typed).

    Objects with loose schema

    Take this example. We want to decode a JSON object that looks like this:

    {
      "author": "attilaolah@gmail.com",
      "title":  "My Blog",
      "url":    "https://attilaolah.eu"
    }

    The usual way to go is to decode it into a struct:

    type Record struct {
        Author string `json:"author"`
        Title  string `json:"title"`
        URL    string `json:"url"`
    }
    
    func Decode(r io.Reader) (x *Record, err error) {
        x = new(Record)
        err = json.NewDecoder(r).Decode(x)
        return
    }

    That’s fairly easy. But what happens if suddenly we add a new data source that uses numeric author IDs instead of emails? For example, we might have an input stream that looks like this:

    [{
      "author": "attilaolah@gmail.com",
      "title":  "My Blog",
      "url":    "https://attilaolah.eu"
    }, {
      "author": 1234567890,
      "title":  "Westartup",
      "url":    "https://www.westartup.eu"
    }]

    Decoding to interface{}

    A quick & easy fix is to decode the author field into an interface{} and then do a type switch. Something like this:

    type Record struct {
        AuthorRaw interface{} `json:"author"`
        Title     string      `json:"title"`
        URL       string      `json:"url"`
    
        AuthorEmail string
        AuthorID    uint64
    }
    
    func Decode(r io.Reader) (x *Record, err error) {
        x = new(Record)
        if err = json.NewDecoder(r).Decode(x); err != nil {
            return
        }
        switch t := x.AuthorRaw.(type) {
        case string:
            x.AuthorEmail = t
        case float64:
            x.AuthorID = uint64(t)
        }
        return
    }

    That was easy… except, it doesn’t work. What happens when our IDs get close to 2⁶⁴−1? They will not fit in a float64 without losing precision, so our decoder will end up rounding some IDs. Too bad.
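
    The rounding is easy to demonstrate (2⁵³+1 is the first integer a float64 cannot represent):

    var v interface{}
    json.Unmarshal([]byte("9007199254740993"), &v) // 2⁵³+1
    fmt.Printf("%.0f\n", v.(float64))              // 9007199254740992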

    Decoder.UseNumber() to the rescue!

    Luckily there’s an easy way to fix this: by calling Decoder.UseNumber(). “UseNumber causes the Decoder to unmarshal a number into an interface{} as a Number instead of as a float64” — from the docs.

    Now our previous example would look something like this:

    func Decode(r io.Reader) (x *Record, err error) {
        x = new(Record)
        if err = json.NewDecoder(r).Decode(x); err != nil {
            return
        }
        switch t := x.AuthorRaw.(type) {
        case string:
            x.AuthorEmail = t
        case json.Number:
            var n uint64
            // We would shadow the outer `err` here by using `:=`
            n, err = t.Int64()
            x.AuthorID = n
        }
        return
    }

    Seems fine, now, right? Nope! This will still fail for numbers > 2⁶³, as they would overflow the int64.

    Now we see that if we want to decode a JSON number into a uint64, we really have to call Decoder.Decode(…) (or json.Unmarshal(…)) with a *uint64 argument (a pointer to a uint64). We could do that simply by directly decoding the string representation of the number. Instead of:

            n, err = t.Int64()

    …we could write:

            err = json.Unmarshal([]byte(t.String()), &n)

    Wait… Let’s use json.RawMessage instead.

    Now we’re correctly decoding large numbers into uint64. But now we’re also just using the json.Number type to delay decoding of a particular value. To do that, the json package provides a more powerful type: json.RawMessage. RawMessage simply delays the decoding of part of a message, so we can do it ourselves later. (We can also use it to special-case encoding of a value.)

    Here is our example, using json.RawMessage:

    type Record struct {
        AuthorRaw json.RawMessage `json:"author"`
        Title     string          `json:"title"`
        URL       string          `json:"url"`
    
        AuthorEmail string
        AuthorID    uint64
    }
    
    func Decode(r io.Reader) (x *Record, err error) {
        x = new(Record)
        if err = json.NewDecoder(r).Decode(x); err != nil {
            return
        }
        var s string
        if err = json.Unmarshal(x.AuthorRaw, &s); err == nil {
            x.AuthorEmail = s
            return
        }
        var n uint64
        if err = json.Unmarshal(x.AuthorRaw, &n); err == nil {
            x.AuthorID = n
        }
        return
    }

    This looks better. Now we can even extend it to accept more schemas. Say we want to accept a third format:

    [{
      "author": "attilaolah@gmail.com",
      "title":  "My Blog",
      "url":    "https://attilaolah.eu"
    }, {
      "author": 1234567890,
      "title":  "Westartup",
      "url":    "https://www.westartup.eu"
    }, {
      "author": {
        "id":    1234567890,
        "email": "nospam@westartup.eu"
      },
      "title":  "Westartup",
      "url":    "https://www.westartup.eu"
    }]

    It seems obvious that we are going to need an Author type. Let’s define one.

    type Record struct {
        AuthorRaw json.RawMessage `json:"author"`
        Title     string          `json:"title"`
        URL       string          `json:"url"`
    
        Author Author
    }
    
    type Author struct {
        ID    uint64 `json:"id"`
        Email string `json:"email"`
    }
    
    func Decode(r io.Reader) (x *Record, err error) {
        x = new(Record)
        if err = json.NewDecoder(r).Decode(x); err != nil {
            return
        }
        if err = json.Unmarshal(x.AuthorRaw, &x.Author); err == nil {
            return
        }
        var s string
        if err = json.Unmarshal(x.AuthorRaw, &s); err == nil {
            x.Author.Email = s
            return
        }
        var n uint64
        if err = json.Unmarshal(x.AuthorRaw, &n); err == nil {
            x.Author.ID = n
        }
        return
    }

    This looks fine… Except that now we’re doing all the decoding of the Author type in the function that decodes the Record object. And we can see that with time, our Record object’s decoder will grow bigger and bigger. Wouldn’t it be nice if the Author type could somehow decode itself?

    Behold, the json.Unmarshaler interface!

    Implement it on any type, and the json package will use it to unmarshal your objects.

    Let’s move the decode logic to the Author struct:

    type Record struct {
        Author Author `json:"author"`
        Title  string `json:"title"`
        URL    string `json:"url"`
    }
    
    type Author struct {
        ID    uint64 `json:"id"`
        Email string `json:"email"`
    }
    
    // Used to avoid recursion in UnmarshalJSON below.
    type author Author
    
    func (a *Author) UnmarshalJSON(b []byte) (err error) {
        j, s, n := author{}, "", uint64(0)
        if err = json.Unmarshal(b, &j); err == nil {
            *a = Author(j)
            return
        }
        if err = json.Unmarshal(b, &s); err == nil {
            a.Email = s
            return
        }
        if err = json.Unmarshal(b, &n); err == nil {
            a.ID = n
        }
        return
    }
    
    func Decode(r io.Reader) (x *Record, err error) {
        x = new(Record)
        err = json.NewDecoder(r).Decode(x)
        return
    }

    Much better. Now that the Author object knows how to decode itself, we don’t have to worry about it any more (and we can extend Author.UnmarshalJSON when we want to support extra schemas, e.g. username or email).

    Furthermore, now that Record objects can be decoded without any additional work, we can move one more level higher:

    type Records []Record
    
    func Decode(r io.Reader) (x Records, err error) {
        err = json.NewDecoder(r).Decode(&x)
        return
    }

    You can go play with this.

    NOTE: Thanks to Riobard Zhan for pointing out a mistake in the previous version of this article. The reason I have two types above, Author and author, is to avoid infinite recursion when unmarshalling into an Author instance. The private author type is used to trigger the built-in JSON unmarshalling machinery, while the exported Author type implements the json.Unmarshaler interface. The conversion near the top of UnmarshalJSON avoids the recursion.

    What about encoding?

    Let’s say we want to normalise all these data sources in our API and always return the author field as an object. With the above implementation, we don’t have to do anything: re-encoding records will normalise all objects for us.

    However, we might want to save some bandwidth by not sending defaults. For that, we can tag our fields with json:",omitempty":

    type Author struct {
        ID    uint64 `json:"id,omitempty"`
        Email string `json:"email,omitempty"`
    }

    Now 1234 will be turned into {"id":1234}, "attilaolah@gmail.com" to {"email":"attilaolah@gmail.com"}, and {"id":1234,"email":"attilaolah@gmail.com"} will be left intact when re-encoding objects.

    Using json.Marshaler

    For encoding custom stuff, there’s the json.Marshaler interface. It works similarly to json.Unmarshaler. You implement it, and the json package uses it.

    Let’s say that we want to save some bandwidth, and always transfer the minimal information required for the json.Unmarshaler to reconstruct the objects. We could implement something like this:

    func (a *Author) MarshalJSON() ([]byte, error) {
        if a.ID != 0 && a.Email != "" {
            return json.Marshal(map[string]interface{}{
                "id":    a.ID,
                "email": a.Email,
            })
        }
        if a.ID != 0 {
            return json.Marshal(a.ID)
        }
        if a.Email != "" {
            return json.Marshal(a.Email)
        }
        return json.Marshal(nil)
    }

    Now 1234, "attilaolah@gmail.com" and {"id":1234,"email":"attilaolah@gmail.com"} are left intact, but {"id":1234} is turned into 1234 and {"email":"attilaolah@gmail.com"} is turned into "attilaolah@gmail.com".

    Another way to do the same would be to have two types, one that always encodes to an object (Author), and one that encodes to the minimal representation (CompactAuthor):

    type Author struct {
        ID    uint64 `json:"id,omitempty"`
        Email string `json:"email,omitempty"`
    }
    
    type CompactAuthor Author
    
    func (a *CompactAuthor) MarshalJSON() ([]byte, error) {
        if a.ID != 0 && a.Email != "" {
            return json.Marshal(Author(*a))
        }
        if a.ID != 0 {
            return json.Marshal(a.ID)
        }
        if a.Email != "" {
            return json.Marshal(a.Email)
        }
        return json.Marshal(nil)
    }

    Using pointers

    We see now that the omitempty tag is pretty neat. But we can’t use it with the author field, because json can’t tell if an Author object is “empty” or not (for json, a struct is always non-empty).

    To fix that, we can turn the author field into an *Author instead. Now json.Unmarshal(…) will leave that field nil when the author information is completely missing, and it will not include the field in the output when Record.Author is nil.

    Example:

    type Record struct {
        Author *Author `json:"author"`
        Title  string  `json:"title"`
        URL    string  `json:"url"`
    }

    Timestamps

    time.Time implements both json.Marshaler and json.Unmarshaler. Timestamps are formatted as RFC3339.

    However, it is important to remember that time.Time is a struct type, so json will never consider it “empty” (json does not consult Time.IsZero()).

    To omit zero timestamps with the omitempty tag, use a pointer (i.e. *time.Time) instead.
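
    A minimal sketch (the Event type is made up for the example):

    type Event struct {
        Name      string     `json:"name"`
        CreatedAt *time.Time `json:"created_at,omitempty"`
    }
    
    // A nil CreatedAt is omitted; a non-nil one encodes as RFC 3339:
    json.Marshal(Event{Name: "signup"}) // {"name":"signup"}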

    Conclusion

    Interfaces are awesome. Even more awesome than you probably think. And tags. Combine these two, and you have objects that you can expose through an API supporting various encoding formats. Let me finish with a simple yet powerful example:

    type Author struct {
        XMLName xml.Name `json:"-" xml:"author"`
        ID      uint64   `json:"id,omitempty" xml:"id,attr"`
        Email   string   `json:"email,omitempty" xml:"email"`
    }

    Authors can now be encoded/decoded to/from a number of formats:

    [{
      "id": 1234,
      "email": "attilaolah@gmail.com"
    },
      "nospam@westartup.eu",
      5678
    ]
    <author id="1234">
      <email>attilaolah@gmail.com</email>
    </author>
    <author>
      <email>nospam@westartup.eu</email>
    </author>
    <author id="5678"/>

    Related posts:

    • JSON and struct composition in Go
    Tags: programming
  • 21 Nov 2013

    Convert HTML files to PNG

    For some stupid reason I ended up having to convert a whole lot of HTML files to images. Not wanting to waste space, the plan was to compress the files as much as possible without losing too much quality.

    So here’s how to do it:

    for f in *.html; do
      wkhtmltoimage -f png $f $f.png
      mogrify -trim $f.png
      pngquant --speed 1 $f.png --ext .png.o
      mv $f.png.o $f.png
    done

    Here’s what each line does:

    • for f in *.html; do will loop through each .html file in the current directory
    • wkhtmltoimage -f png $f $f.png will convert the .html files to .png (see the wkhtmltoimage project page)
    • mogrify -trim $f.png will auto-crop the image and cut off the body padding, this saves a few bytes (see the mogrify documentation)
    • pngquant --speed 1 $f.png --ext .png.o compresses the .png files (note that pngquant uses lossy compression; to compress losslessly, use pngcrush instead)
    • mv $f.png.o $f.png replaces the original png with the optimised one
    Tags: programming