Advent of Performance: Go Edition
June 03, 2024

Every so often I like to spend some time working on Advent of Code problems when I need a break from my primary projects. It's also a great way to learn a new language, or sharpen skills in one I already know.

Another fun angle is to spend time making solutions fast. Usually they are "fast enough" for the purposes of solving the problem and moving on, since the point is mostly to practice thinking algorithmically. But in some cases there is an obvious slowdown when going from part 1 to part 2 of a problem statement, which usually means there's a faster alternative. Day 4 from the 2015 problem set provided exactly that opportunity.

Mining AdventCoin

The problem statement for this day is essentially a simplified version of how Bitcoin mining works: a proof-of-work search rather than a true hash collision. Given some input string (e.g. yzbqklnj), the goal is to find some "nonce" integer such that, when the nonce is appended to the input string and the result is hashed with MD5, the hash has some number of leading zeros in its hex representation.

For example,

If your secret key is abcdef, the answer is 609043, because the MD5 hash of abcdef609043 starts with five zeroes (000001dbbfa...), and it is the lowest such number to do so.
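As a quick sanity check, that example is easy to reproduce with Go's crypto/md5 package:

package main

import (
	"crypto/md5"
	"fmt"
)

func main() {
	hash := md5.Sum([]byte("abcdef609043"))
	fmt.Printf("%x\n", hash) // a 32-character hex string starting with 000001dbbfa
}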

Hashes can't be "reversed", meaning that the only way to do this is to try a bunch of nonces until we find a hash with the required number of leading zeros.

Fortunately, Go's standard library crypto/md5 package gives us a function to compute an MD5 hash of a byte slice, so we can do this pretty simply.

func naive(input []byte, prefix string) string {
	seed := strings.TrimSpace(string(input))
	nonce := 1

	for {
		preimage := []byte(fmt.Sprintf("%s%d", seed, nonce))
		hash := md5.Sum(preimage)

		if check(hash, prefix) {
			break // found nonce
		}

		nonce += 1
	}

	return fmt.Sprintf("%d", nonce)
}

The check function just formats the resulting hash in hex and looks at the prefix. We've separated it out so we can change it later...

func check(hash [16]byte, prefix string) bool {
	hex := fmt.Sprintf("%x", hash)
	return strings.HasPrefix(hex, prefix)
}

Easy enough! We've solved part 1. For part 2, we run the same loop, but instead of looking for a hash with 5 leading zeros, we need to find one with 6.

Updating our input and running it again... Oh. Well. It definitely takes longer. Noticeably longer. In these situations, you always wonder if something else went wrong and you've introduced an infinite loop. In this case, we definitely get correct output, but it's just slow.

$ go build -o day4 solutions/day4/main.go

# find hash with 5 leading zeros (00000)
$ ./day4 -part 1 input/day4
282749
time: 134.816113ms

# find hash with 6 leading zeros (000000)
$ ./day4 -part 2 input/day4
9962624
time: 4.029076709s

Technically, nothing is "wrong" with our algorithm here. It's just dealing with a much larger space of possibilities: each additional leading zero makes a matching hash roughly 16 times less likely on average, since every hex digit can take 16 values. Generally speaking, the more specific your target hash is, the longer the search takes; we're effectively trying arbitrary nonce values until we find the needle in a ridiculously large haystack.

Of course, Go is known for making concurrency relatively easy, so let's just spin up a goroutine for each CPU core and try as many different nonces as we can simultaneously.

Originally, I had thought I would have runtime.NumCPU() goroutines as "workers" crunching on our hash function, with another goroutine that would act as a "ticket counter", giving out the next nonce over a channel when a worker needed a new one after failing to find the desired hash prefix.
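For reference, here's a rough sketch of that first attempt (a hypothetical reconstruction, not the exact code from the repository; it reuses the string-prefix version of check from above).

func ticketCounter(input []byte, prefix string) string {
	seed := strings.TrimSpace(string(input))

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// the "ticket counter": every worker pulls its next nonce from this channel
	nonces := make(chan int)
	go func() {
		defer close(nonces) // unblocks any workers still waiting on a ticket
		for n := 1; ; n++ {
			select {
			case nonces <- n:
			case <-ctx.Done():
				return
			}
		}
	}()

	solution := make(chan int, 1)
	for w := 0; w < runtime.NumCPU(); w++ {
		go func() {
			for nonce := range nonces {
				hash := md5.Sum([]byte(fmt.Sprintf("%s%d", seed, nonce)))
				if check(hash, prefix) {
					solution <- nonce
					return
				}
			}
		}()
	}

	return fmt.Sprintf("%d", <-solution)
}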

It didn't actually work, though. At least not in the way I expected. We got the right answer, but it took even longer than the naive solution when looking for 6 leading zeros.

I figured that the overhead of coordinating all the goroutines was just too much to be worth the parallel effort. A single hash is so quick to compute that the workers were likely hammering the nonce channel, causing heavy contention and, with it, a lot of runtime context switching and blocking.

Actually making it fast

I knew that parallelization was still the way to go here, because each hash computation is completely independent of the others. No state needs to be carried over between testing nonce 32535 and nonce 871279. Ultimately, that was the key insight needed to make this work.

If we consider each worker in isolation, we can define a simple formula for the next nonce in its sequence to test: just make it count upwards, skipping by a fixed amount each iteration, in this case the number of goroutines. This way, each worker tests its own sequence of integers that never overlaps with any other worker's.

For simplicity, let's assume we have 4 workers. If they each count by 4, starting from a different place, they'll never overlap.

worker 0 -> 0, 4, 8, 12, ...
worker 1 -> 1, 5, 9, 13, ...
worker 2 -> 2, 6, 10, 14, ...
worker 3 -> 3, 7, 11, 15, ...

Each worker can calculate this sequence from two constants: its "base" (starting point) and "N" (the number of workers).

nonce = base + (i * N)

If we let i increase until we either find a solution or are told to stop looking, we can parallelize this and eliminate virtually all coordination between workers; the only communication a worker needs is to notify us when it finds a solution.

func optimized(input []byte, prefix string) string {
	seed := strings.TrimSpace(string(input))
	numWorkers := runtime.NumCPU()

	// wait group ensures all goroutines clean up after cancellation
	var wg sync.WaitGroup

	// cancellable context is cancelled once a solution is found
	ctx, cancel := context.WithCancel(context.Background())

	// result channel given to each worker; buffered so no worker blocks
	// on send if more than one happens to find a valid nonce
	solution := make(chan int, numWorkers)

	// spawn a worker for each CPU core
	for b := 0; b < numWorkers; b++ {
		wg.Add(1)
		go func(base, n int) {
			defer wg.Done()

			i := 0
			for {
				select {
				case <-ctx.Done():
					return

				default:
					nonce := base + (i * n) // independently calculate next nonce in sequence
					preimage := []byte(fmt.Sprintf("%s%d", seed, nonce))
					hash := md5.Sum(preimage)

					if check(hash, prefix) {
						solution <- nonce
						return
					}

					i = i + 1
				}
			}
		}(b, numWorkers)
	}

	winner := <-solution
	cancel()

	wg.Wait()
	return fmt.Sprintf("%d", winner)
}

And the result...

$ ./day4 -part 2 input/day4
9962624
time: 3.976446064s

$ ./day4 -part 2 -optimized input/day4
9962624
time: 699.66703ms

Boom. Huge difference!

This solution actually helps improve performance in part 1 as well.

$ ./day4 -part 1 input/day4
282749
time: 119.919216ms

$ ./day4 -part 1 -optimized input/day4
282749
time: 22.183944ms

One more thing!

Remember the check function above? It encodes the resulting MD5 hash into hexadecimal and looks at that string to see if the nonce results in a hash with the desired number of leading zeros.

Because it leans on fmt.Sprintf() and the %x formatting verb, it allocates heap memory to build the hex string, and that allocation happens on every iteration of the loop.

Since all we really need to do is check the leading byte values of the hash, we can perform that check without any heap allocation at all.

// Check if the given MD5 hash has the given leading number of zeros.
func check(hash [16]byte, leading int) bool {
	numBytes := leading / 2
	for i := 0; i < numBytes; i++ {
		if hash[i] != 0x00 {
			return false
		}
	}

	if leading%2 != 0 {
		// looking for odd number of zeros...
		// since each hex character represents 4 bits, we have to use a bit shift to check our last byte
		// in the hash if the desired number of leading zeros is odd. For example, if we are looking for
		// `000` (odd number of zeros), the byte sequence _could_ be `0x00 0x01`, or `0x00 0x02`, etc...
		// In this case, we bit shift the last byte to the right by 4, and check if that result is 0,
		// ignoring 4 least significant bits.

		// last index in hash to check is equal to `numBytes` because of the integer division above.
		// for example, if leading == 5, numBytes == 2 (due to integer division), and we need to
		// check the 3rd byte in the sequence (i.e. index 2)
		return (hash[numBytes] >> 4) == 0x00
	}

	return true
}

Because a single hex digit represents 4 bits, looking for 6 leading zeros in the resulting hash means looking at the first 3 bytes. (0x000000 split into bytes is 0x00 0x00 0x00.)

If we need to check for an odd number of leading zeros (say, 5 like in part 1), we can lean on some integer division and a modulo check to ensure we check the last byte in the sequence correctly. As we said, that last 0 to check only represents 4 bits, so we can use a bit shift to ignore the 4 least significant bits of the byte, and check that result against the 0x00 byte value.
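As a quick illustration with made-up hash bytes, here's how the updated check behaves when asked for 5 leading zeros:

fmt.Println(check([16]byte{0x00, 0x00, 0x0f}, 5)) // true:  hex "00000f...", hash[2]>>4 == 0x0
fmt.Println(check([16]byte{0x00, 0x00, 0x1f}, 5)) // false: hex "00001f...", hash[2]>>4 == 0x1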

Assuming we get through those checks, we've found a solution!

Looking at the run time, this updated version of check improves performance across the board, in both the naive solution, and optimized solution that uses parallelization!

$ ./day4 -part 1 input/day4
282749
time: 55.216603ms

$ ./day4 -part 2 input/day4
9962624
time: 1.818860088s

$ ./day4 -part 1 -optimized input/day4
282749
time: 11.829511ms

$ ./day4 -part 2 -optimized input/day4
9962624
time: 319.841772ms

Summary

Here's a table summarizing the performance improvements for both part 1 and part 2. In each case, I ran the solutions 10 times and took the average of those reported run times, rounded to 2 decimal places.

("no alloc" means the version of check that doesn't allocate a string and encode to hex)

Solution   Version                 Run Time     Speedup
Part 1     naive                   118.16 ms    1.0x
           naive + no alloc        53.2 ms      2.2x
           optimized               20.04 ms     5.9x
           optimized + no alloc    10.2 ms      11.6x
Part 2     naive                   4.01 s       1.0x
           naive + no alloc        1.81 s       2.2x
           optimized               778.41 ms    5.2x
           optimized + no alloc    358.53 ms    11.2x

Environment

I ran these tests using Go version go1.22.3 linux/amd64, on a ThinkPad X1 Extreme Gen 4i, rocking an 11th Gen Intel i7-11800H (16) @ 4.600GHz (reported by neofetch).

All code for this and other solutions can be found on GitHub
