Go Bloviations (Optional Reading)
September 15th, 2012

This post was written sixteen months ago, when fish briefly explored Go with the intention of writing a new command line shell. (fish abandoned Go after discovering that terminal programming is what makes shells so horrible, and term programming is least horrible in C.)

These are some notes fish took as he explored Go, and as a result they're mostly unstructured, unbalanced, and surface-level. There's likely errors, dumb omissions, and simple misunderstandings on his part. Still, it's interesting to see what's changed in those sixteen months. Original text is in black, new text is in red.

I used Google's new Go language for two days. This qualifies me to bloviate on it, so here goes.

The language (unsurprisingly) feels like a modernized C, with design decisions that reflect an apparent consensus on best practices. The language is strictly typed, but supports some limited type inference, to save on keypresses (what the designers call "stuttering"). There's no more header files. It's garbage collected and supports closures. There's pointers, but no pointer arithmetic. There's multiple return values. Strings are built-in and immutable. It feels modern!

But it is C-like, in that it omits a lot of the heavyweight bondage and discipline found in other languages. Data aggregation is done through structs, and there's no access controls: all fields are public. There's no subtyping - in fact, there's no type hierarchy at all. There's no generics, no exceptions, no operator overloading, nada.

In C you spend less time building up a superstructure of type relationships, const-correctness, and abstraction, and more time working on the actual problem. Go seems to be designed in the same spirit of simplicity and transparency. Where so many other modern languages focus on this superstructure, it is refreshing to see a modernized language in the spirit of C.

Syntax

Much has been made of Go's syntax, which at first blush seems pointlessly inverted from C. For example, variable and function return types go after the identifier. But I found the syntax to be simpler and more regular than C: there are fewer extraneous elements, like required parentheses and useless voids. For example, this Go function I wrote:
func SetInactivityTimeout(fn func() int, sec, nsec uint64)
compares favorably, syntax-wise, to its C analog:
void SetInactivityTimeout(int (*fn)(void), uint64_t sec, uint64_t nsec)
However in other ways, brevity suffers. Branching is one of the most serious victims: with no ternary operator, and with the requirement that 'if' uses open braces with a particular style, the best you can do is this:
if expr {
    n = trueVal
} else {
    n = falseVal
}

This remains true.

Another syntax / semantics oddity is the behavior of reading from channels (like a pipe). Whether a read from a channel blocks depends on how the return value is used:

 res := <- queue /* waits if the queue is empty */
 res, ok := <- queue /* returns immediately if the queue is empty */
 
This bears repeating: the behavior of a channel read depends on how the return value is (will be) used. This seems like a violation of the laws of time and space! (Go 1 has since changed this: both forms now block, the boolean instead reports whether the channel is closed, and non-blocking reads are written as a select with a default case.)

By the way, the :=<- idiom is called the Overbite Man.

Semicolons

An aside on semicolons: Go programs don't terminate statements with semicolons. Wait, let me rephrase: Go allows you to insert the semicolons, but doesn't require them. Losing semicolons is nice, but the simplicity is only apparent, because to be proficient in Go you still must understand the rules governing Go semicolons.

This is because semicolons have not been removed from the grammar; instead, the lexer injects them automatically. This isn't an academic distinction, because the abstraction is leaky. For example, here's an error I got from the cgo tool:

test.go:75:1: expected ';', found 'EOF'
The error message's advice is incorrect. The true problem is that the file didn't end with a newline.
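
The injection rule is also why Go can mandate its particular brace style. Here's the standard illustration (my addition, not from the original notes): move an opening brace to its own line, and the lexer sabotages you.

package main

import "fmt"

// The lexer injects a semicolon after the ")" ending the first line,
// effectively turning the declaration into "func main();", so the
// compiler rejects this with something like
// "unexpected semicolon or newline before {".
func main()
{
    fmt.Println("this will not compile")
}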

That Damnable Use Requirement

Go will refuse to compile a file that has an unused variable or package import. This sounds hygienic, like it's a way to prevent the inevitable accumulation of unused header imports that torment C projects. But in practice, this is a dreadful, dreadful feature. Imagine this:
  • Something doesn't work right, so you add a call to fmt.Printf to help debug it.
  • Compile error: "Undefined: fmt."
  • You add an import "fmt" at the top.
  • It works, and you debug the problem.
  • Remove the now annoying log.
  • Compile error: "imported and not used: fmt."
  • Remove the "fmt" knowing full well you're just going to be adding it back again in a few minutes.

Repeat a dozen times a day, and it's a recipe for hair-pulling.

Furthermore, some developers compile every few lines, as a sort of sanity check. This is not possible in Go: inevitably you will introduce a variable that you just haven't used yet, and the compiler will error out.

This one irritant is the most annoying part of writing in Go.
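
For what it's worth, the blank identifier offers a partial workaround while iterating; a small sketch:

package main

import "fmt"

var _ = fmt.Printf // marks the import as "used" while debugging

func main() {
    debugging := true
    _ = debugging // likewise silences the unused-variable error
}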

The damnable use requirement lives on to this day. This requirement would be right at home in a bondage and discipline language, which may explain why it feels so out of place in Go.

C Compatibility

Here's a brighter spot. Go has a foreign function interface to C, but it receives only a cursory note on the home page. This is unfortunate, because the FFI works pretty darn well. You pass a C header to the "cgo" tool, and it generates Go code (types, functions, etc.) that reflects the C code (but only the code that's actually referenced). C constants get reflected into Go constants, and the generated Go functions are stubby and just call into the C functions.
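
A minimal sketch of the flavor, assuming a working C toolchain (the comment immediately above import "C" is how cgo receives the C declarations):

package main

/*
#include <stdlib.h>
*/
import "C"

import "fmt"

func main() {
    // C.rand is a generated stub that calls into the C function;
    // its C.int result converts explicitly to a Go int.
    fmt.Println("a C random number:", int(C.rand()))
}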

The cgo tool failed to parse my system's ncurses headers, but it worked quite well for a different C library I tried, successfully exposing enums, variables, and functions. Impressive stuff.

Where it falls down is function pointers: it is difficult to use a C library that expects you to pass it a function pointer. I struggled with this for an entire afternoon before giving up. Ostsol got it to work, by his own description, through three levels of indirection.

The cgo documentation has since been vastly improved and is given higher billing on the home page. While I don't think it's quite up to the task of handling term programming, it remains a fabulous feature.

Another welcome change is that Go seems to have hidden some of its Plan 9 naming conventions. For example, at the time of the original post, the Go compiler was '6g'; now it is just 'go'.

Unicode

Go looooves UTF-8. It's thrilling that Go takes Unicode seriously at all in a language landscape where Unicode support ranges from tacked-on to entirely absent. Strings are all UTF-8 (unsurprisingly, given the identity of the designers). Source code files themselves are UTF-8. Moreover, the API exposes operations like case conversion in terms of large-granularity strings, as opposed to something like C or Haskell where case conversion is built atop a function that converts individual characters. Also, there is explicit support for 32-bit Unicode code points ("runes"), and converting between runes, UTF-8, and UTF-16. There's a lot to like about the promise of the language with respect to Unicode.
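
The conversions among these representations are pleasantly direct. A quick sketch:

package main

import (
    "fmt"
    "unicode/utf16"
)

func main() {
    s := "Weiße Elster"
    runes := []rune(s)           // UTF-8 bytes to 32-bit runes
    units := utf16.Encode(runes) // runes to UTF-16 code units
    fmt.Println(len(s), len(runes), len(units)) // 13 12 12 (ß is two bytes)
}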

But it's not all good. There is no case-insensitive compare (presumably, developers are expected to convert case and then compare, which is different).

Since this was written, Go added an EqualFold function, which reports whether strings are equal under Unicode case-folding. This seems like a bizarre addition: Unicode-naïve developers looking for a case insensitive compare are unlikely to recognize EqualFold, while Unicode-savvy developers may wonder which of the many folding algorithms you actually get. It is also unsuitable for folding tasks like a case-insensitive sort or hash table.

Furthermore, EqualFold doesn't implement a full Unicode case insensitive compare. You can run the following code at golang.org; it ought to output true, but instead outputs false.

package main
import "fmt"
import "strings"
func main() {
    fmt.Println(strings.EqualFold("ss", "ß"))
}

Bad Unicode support remains an issue in Go.

Operations like substring searching return indexes instead of ranges, which makes it difficult to handle canonically equivalent character sequences. Likewise, string comparison is based on literal byte comparisons: there is no obvious way to handle the precomposed "San José" as the same string as the decomposed "San José". These are distressing omissions.
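
The byte-wise comparison is easy to see (Unicode escapes used so the two spellings are distinguishable):

package main

import "fmt"

func main() {
    precomposed := "San Jos\u00e9" // é as a single code point
    decomposed := "San Jose\u0301" // e followed by a combining acute
    // Canonically equivalent strings, but == compares bytes:
    fmt.Println(precomposed == decomposed) // false
}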

To give a concrete example, do a case-insensitive search for "Berliner Weisse" on this page in a modern Unicode-savvy browser (sorry Firefox users), and it will correctly find the alternate spelling "Berliner Weiße", a string with a different number of characters. The Go strings package could not support this.

My enthusiasm for its Unicode support was further dampened when I exercised some of the operations it does support. For example, it doesn't properly handle the case conversions of Greek sigma (as in the name "Odysseus") or German eszett:

package main

import (
    "os"
    . "strings"
)

func main() {
    os.Stdout.WriteString(ToLower("ὈΔΥΣΣΕΎΣ\n"))
    os.Stdout.WriteString(ToUpper("Weiße Elster\n"))
}
This outputs "ὀδυσσεύσ" and "WEIßE ELSTER", instead of the correct "ὀδυσσεύς" and "WEISSE ELSTER."

In fact, reading the source code it's clear that string case conversions are currently implemented in terms of individual character case conversion. For the same reason, title case is broken even for Roman characters: strings.ToTitle("ridiculous fish") results in "RIDICULOUS FISH" instead of the correct "Ridiculous Fish." D'oh.

Go has addressed this by documenting this weirdo existing behavior and then adding a Title function that does proper title case mapping. So Title does title case mapping on a string, while ToTitle does title case mapping on individual characters. Pretty confusing.

Unicode in Go might be summed up as good types underlying a bad API. This sounds like a reparable problem: start with a minimal incomplete string package, and fix it later. But we know from Python the confusion that results from that approach. It would be better to have a complete Unicode-savvy interface from the start, even if its implementation lags somewhat.

Errors

In the language FAQ, the designers explain why Go does not support assertions:
...our experience has been that programmers use them as a crutch to avoid thinking about proper error handling and reporting. Proper error handling means that servers continue operation after non-fatal errors instead of crashing. Proper error reporting means that errors are direct and to the point, saving the programmer from interpreting a large crash trace. Precise errors are particularly important when the programmer seeing the errors is not familiar with the code...Time invested writing a good error message now pays off later when the test breaks.
This is the "moon rover" philosophy: if something unexpected happens to the moon rover, it should relay as much information as it can, and keep going no matter the cost. This is a defensible position. I would expect to see some sort of error handling infrastructure, and precise error reporting. But there's not:
  • If you index beyond the bounds of an array, the error is "index out of range." It does not report what the index is, or what the valid range is.
  • If you dereference nil, the error is "invalid memory address or nil pointer dereference" (which is it, and why doesn't it know?)
  • If your code has so much as a single unused variable or import, the compiler will not "continue operation," and instead refuse to compile it entirely.

Some of what I wrote above seems a little snarky / petty, but there it is. Regardless, Go still chooses to not support assertions.

Compile times

Go's compilation speed receives top billing on the home page, with the claim "typical builds take a fraction of a second." At first blush it seems to be so. The single-file project I spent a day on compiles in two hundredths of a second. The 45 file math package compiles in just under a second. Wow!

The compile speed claim seems to have since been removed, so I also removed some ill-conceived ramblings. Here's a summary of what I found 16 months ago:

  • For small compiles, the Go compiler was blazingly fast; on a large synthetic codebase (700 files), it was three times slower than clang compiling C.
  • The Go compiler does not support incremental or parallel compilation (yet). Changing one file requires recompiling them all, one by one.
  • You could theoretically componentize an app into separate packages. However it appears that packages cannot have circular dependencies, so packages are more like libraries than classes.

I don't know to what extent these findings still apply, if at all. Building on the latest release errored with a cryptic "nosplit stack overflow" message, which I did not dig into.

Concurrency

The most important and unusual feature of Go is its concurrency mechanism. To summarize, the "go" keyword can be applied in front of a function call, which will be executed in the background, concurrent with the remainder of the function and all other so-called "goroutines." Goroutines are lightweight. Communication between goroutines is via "channels," which are thread safe queues. A channel is parametrized by some type: you can make a channel of ints, of strings, of structs, etc. This is an attractive mechanism, especially compared to traditional pthreads.
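
In miniature:

package main

import "fmt"

func main() {
    results := make(chan string) // a typed, thread-safe queue

    // "go" runs the call concurrently with the rest of main.
    go func() {
        results <- "hello from a goroutine"
    }()

    fmt.Println(<-results) // blocks until the goroutine sends
}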

At this point the notes become sparse; the remainder of the text is new content presented in black so as not to exhaust your retinas.

Goroutines

A goroutine is a thread that is scheduled in user space, and so is cheaper than a kernel thread. Overhead is a few KB. The docs say, "It is practical to create hundreds of thousands of goroutines in the same address space." Cool!

You can create a goroutine with any function, even a closure. But be careful: a questionable design decision was to make closures capture variables by reference instead of by value. To use an example from Go's FAQ, this innocent looking code actually contains a serious race:

    values := []string{"a", "b", "c"}
    done := make(chan bool)
    for _, v := range values {
        go func() {
            fmt.Println(v) // every goroutine sees the loop's shared v
            done <- true
        }()
    }
  
The for loop and goroutines share memory for the variable v, so the loop's modifications to the variable are seen within the closure. For a language that exhorts us to "do not communicate by sharing memory," it sure makes it easy to accidentally share memory! (This is one reason why the default behavior of Apple's blocks extension is to capture by value.)
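
The usual fix is to make the copy explicit by passing the variable as an argument to the closure:

    for _, v := range values {
        go func(v string) { // this v is a per-iteration copy
            fmt.Println(v)
            done <- true
        }(v)
    }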

fish Fails at Making a Thread Safe Set

To explore Go's concurrency, I attempted to make a thread-safe set. The set "runs" in its own goroutine, which not only enables thread safety, but also allows clients to insert data into the set and move on, while the set rehashes in the background - something that a lock-based implementation cannot do!

Let's make a type for the set, SafeSet:

type SafeSet struct {
    set map[string] bool
    adder chan string
}

There's a map that will be protected by a goroutine, and a channel. The goroutine reads values from the channel, and adds them to the map.

The set needs a way to test for membership. I took a design cue from the old Go tutorial, which implements an object's methods by having a separate channel for each request "type," so we add a "test" channel. The test channel must receive the value to test, and also a channel to send the result. So we package up the value to be looked up and the result channel into a little struct. We send this on the "test" channel:

type SetTest struct { 
    val string
    result chan bool
}

type SafeSet struct {
    set map[string] bool
    adder chan string
    tester chan SetTest
}

Little single-use types like SetTest seem to be a common idiom in Go. Next, we can introduce a function that services a SafeSet, and all of its channels:

func (set *SafeSet) run() {
    for {
        select {
        case toadd := <-set.adder:
            set.set[toadd] = true
        case testreq := <-set.tester:
            testreq.result <- set.set[testreq.val]
        }
    }
}
Lastly we make a function that creates a SafeSet, by allocating all of its components and kicking off the goroutine:
func newSet() (result SafeSet) {
    result.set = make(map[string] bool)
    result.adder = make(chan string, 16)
    result.tester = make(chan SetTest, 16)
    go result.run()
    return
}

That magic number "16" is the buffer size of the channel: it can hold 16 values in-flight. (A channel can also be unbuffered, which causes a reader to block until a writer is available, and vice-versa.)
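
In channel-creation terms:

    unbuffered := make(chan string)   // a send blocks until a receiver arrives
    buffered := make(chan string, 16) // sixteen sends can complete immediately;
                                      // the seventeenth blocks until a read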

The channels are buffered so the client can insert into the set and then move on, even if the set is busy. Not shown is deletion, or wrapper functions; the entire code is here.

As far as I can tell, this is idiomatic Go (or at least it was 16 months ago). Much of the code is concerned with packaging requests and then demultiplexing them in the goroutine. This seems like needless boilerplate: why not instead simply pass a closure over a channel that the goroutine will execute? I have never seen this technique used in Go, but it seems natural to me. (It's essentially how libdispatch works.)
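
Here's roughly what I have in mind, as a hypothetical sketch (FuncSet is my invention, not an established Go idiom):

type FuncSet struct {
    set  map[string]bool
    cmds chan func()
}

func newFuncSet() *FuncSet {
    s := &FuncSet{make(map[string]bool), make(chan func(), 16)}
    go func() {
        for f := range s.cmds {
            f() // only this goroutine ever touches s.set
        }
    }()
    return s
}

func (s *FuncSet) add(v string) {
    s.cmds <- func() { s.set[v] = true }
}

func (s *FuncSet) contains(v string) bool {
    reply := make(chan bool)
    s.cmds <- func() { reply <- s.set[v] }
    return <-reply
}

One channel, no request structs, and requests are necessarily handled in the order they were enqueued.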

For comparison purposes, I wrote a similar set in Objective-C.

In the ObjC SafeSet, the role of the goroutine is played by the dispatch queue, which is passed closures that it executes. "Ravioli types" like SetTest are obviated by the closures, which automatically package referenced values up into blocks. And there's a convenient facility ('dispatch_sync') to execute a block synchronously, which in Go you must simulate by reading from a channel.

On the other hand, Go's channel mechanism gives you close control over buffer sizes, allowing you to implement rate-limiting of client callers. Channels also provide a natural replacement for callbacks. For example, in Go, you can ask to receive OS signals by simply providing a channel of ints, and then reading from the channel however you like. Dispatch has no such natural mechanism: instead you must specify both a handling block and the queue on which it should be executed.
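
In Go 1's os/signal package, for instance, the channel is supplied by the caller:

package main

import (
    "fmt"
    "os"
    "os/signal"
)

func main() {
    // The runtime delivers signals on a channel we provide,
    // to be read whenever and however we like.
    c := make(chan os.Signal, 1)
    signal.Notify(c, os.Interrupt)
    fmt.Println("got:", <-c)
}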

So some tasks are simpler in Go, and others are simpler in libdispatch. There is no need to pick a winner. Both concurrency mechanisms are a huge improvement over traditional techniques like pthreads.

Deadlocks

Our SafeSet has a function that checks whether a value is in the set. Perhaps we want to add a new function that takes an array and returns whether any of its members are in the set. Recall that, in order to check if a value is in a set, we allocate a channel and pass it to the set; it returns the boolean answer on the channel. As an optimization, I allocated one channel and used it for all the values:
func (set *SafeSet) get_any(strs []string) bool {
    result := false
    recv_chan := make(chan bool)
    for _, s := range strs {
        request := SetTest{s, recv_chan}
        set.tester <- request
    }
    for i := 0; i < len(strs); i++ {
        result = result || <- recv_chan
    }
    return result
}
This works for the first call, but it fails for subsequent calls. The problem is that get_any does not read all of the values out of the channel: once result becomes true, the || short-circuits and the remaining receives never execute, so the SafeSet gets stuck writing the unread answers. We could fix it in a few ways; one is to make the channel big enough to hold all values:
func (set *SafeSet) get_any(strs []string) bool {
    result := false
    recv_chan := make(chan bool, len(strs))
    for _, s := range strs {
        request := SetTest{s, recv_chan}
        set.tester <- request
    }
    for i := 0; i < len(strs); i++ {
        result = result || <- recv_chan
    }
    return result
}
Better, because the SafeSet now has enough space to write all of the output values. But are we guaranteed enough space to write all of the input values? Might the set.tester <- request line block?

It might. Or maybe we get lucky, depending on the buffer size that we give the input channel. Up above, we chose a buffer size of 16, without any real justification for that number, but now we see that it has a deep significance. We can pass get_any an array of size 16 or less, and it will work; if we are incautious, testing may never reveal that larger arrays fail.

Or maybe we do discover it, but what we don't realize is that the size of 16 is a global resource. Imagine if two goroutines both attempt to call get_any with an array of length 10: it may be that both manage to get 8 of their values on the input channel, and then deadlock.

It's worth pointing out that Go detects and reports deadlocks that involve all goroutines. However, if any goroutine in the process is able to run, the deadlock is unreported. So while this deadlock detection is a cool feature, it can be defeated by a simple infinite loop. In a real program, with multiple independent goroutines, the deadlock reporting is unlikely to be useful.
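
The all-goroutines case is easy to demonstrate:

package main

func main() {
    ch := make(chan int)
    // No other goroutine exists to send, so the runtime aborts with
    // "fatal error: all goroutines are asleep - deadlock!"
    <-ch
}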

Races

But there's a far more serious bug: a client that inserts into SafeSet may not see that value appear in the set immediately. The client enqueues on the adder channel, and then the tester channel, but there's no guarantee that the SafeSet will handle the requests in that order. Using multiple channels was an irreparable mistake on my part.
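
Spelled out against the SafeSet internals (using the wrapper-level operations not shown above), the hazard looks like this:

    reply := make(chan bool)
    set.adder <- "x"                  // enqueued on one channel...
    set.tester <- SetTest{"x", reply} // ...then on another
    // run()'s select may service the tester request first,
    // so <-reply can report false even after the add.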

SafeSet Conclusions

My attempt as a newbie to write a SafeSet was discouraging, because I introduced lots of bugs that naive testing missed:
  • add followed by get may falsely return that the value is not in the set.
  • get_any worked the first call, but not subsequent calls.
  • get_any failed for arrays larger than size 16.
  • get_any might fail on any size for concurrent access.
These bugs occurred only because I chose to make the channels buffered. If all channels were unbuffered, none of these problems could occur (but there would be more limited parallelism).

Mark Summerfield's Attempt

Mark Summerfield, in his book Programming in Go, also implemented a similar object, which he coincidentally called a SafeMap. Summerfield avoided all of these bugs by multiplexing up all different commands onto a single channel. This means he needs a way to distinguish between commands, and here it is:

const (
    remove commandAction = iota
    end
    find
    insert
    length
    update
)
The commands are wrapped up in functions like these:
func (sm safeMap) Len() int {
    reply := make(chan interface{})
    sm <- commandData{action: length, result: reply}
    return (<-reply).(int)
}
(Check out that last line.)

Lastly, the commands are demultiplexed in the goroutine in a big switch statement. So each method of SafeMap must be represented three different ways:

  • A function exposed to clients
  • A value in an enum (i.e. the Command pattern)
  • The actual implementation in the goroutine

Summerfield's approach avoids the bugs I introduced, but requires a lot of boilerplate and does not allow for much parallelism.

Conclusions

On balance, I like Go and I hope it succeeds. My impression is that it's a strong foundation that gets marred in the details by some poor design decisions. Here's what I thought was good, and what was other.

Thumbs Up

  • Go captures much of the spirit of C, by eschewing the super-structure of type relationships, const-correctness, and "bondage and discipline" common in other modern languages. A modernized C is a compelling and unfilled niche.
  • Go feels modern in a balanced and familiar way, by incorporating features such as limited type inference, garbage collection, and multiple return values. In many areas Go does not try to introduce anything new, but instead codifies and unifies existing best practices from other languages. It's conservative in its design.
  • Go's inverted syntax for variable declarations is unusual and unfamiliar at first, but proves quickly to be simpler than and superior to C.
  • Channels and goroutines combine to make a powerful and flexible concurrency mechanism.
  • The C foreign function interface "cgo" works quite well.

Thumbs Down

  • The Damnable Use Requirement leads to hair pulling.
  • Syntactical warts: lexer-injected semicolons, and the Overbite Man (:=<-).
  • Despite what they say, the string type is not Unicode savvy, and the Unicode additions to it are sparse and non-conforming.
  • Closures capture by reference, which makes it easy to introduce subtle, hard to spot bugs that may not be caught by testing.
  • Mark Summerfield's SafeMap feels like Java, because it requires repeating everything multiple times. It's a distressing example that I hope is not representative of Go.
  • I found buffered channels hard to reason about, for two, uh, reasons:
    • A deadlock can be masked in testing by a channel's buffer. Unfortunately there are no channels with a dynamic or growable buffer: you must pick a fixed size at channel creation time.
    • Threads exchange data only, instead of code and data (like in libdispatch). As a result, it's tempting to send different actions over different channels, as in the original Go tutorial. But this can introduce bugs: the actions can be dequeued and executed in an order different from how they were enqueued.

Good luck to Go, and I look forward to hearing about all the things I got wrong!

You can bloviate back at reddit or Hacker News.