Problem: sort a billion rows

https://github.com/gunnarmorling/1brc

Generate a billion rows

The code does this by:

  1. Using a Wikipedia page that lists cities by continent.
  2. Consuming the five CSV files with a SQL statement.
    • Interestingly, there are five files mentioned in the comment but six continents listed on the Wikipedia page.
      • Listed: Asia, Africa, Oceania, Europe, North America
      • Not listed: South America. Spot-checking a few South American cities shows them missing.
  3. Selecting a station uniformly at random.
  4. Drawing a Gaussian-distributed value around that station’s mean.
  5. Repeating until enough rows have been generated (see the sketch below).

The seeding of these random numbers is hard to figure out:

  • One implementation seems to use 0.
  • OpenJDK seems to use a combination of a setting, the network addresses, and the current time.
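
For the Go reimplementation, a minimal sketch of that generation loop could look like the following. The station list, the output file name, the Gaussian spread, and the fixed seed are all assumptions here, not taken from the reference generator:

// Sketch of the generation loop; not the actual generate.go.
package main

import (
	"bufio"
	"fmt"
	"math/rand"
	"os"
)

type station struct {
	name string
	mean float64
}

func main() {
	// Assumed: in the real generator the stations and their means come from the CSV files above.
	stations := []station{{"Hamburg", 9.7}, {"Bulawayo", 18.9}, {"Palembang", 31.3}}

	f, err := os.Create("measurements.txt") // output file name assumed
	if err != nil {
		panic(err)
	}
	defer f.Close()
	out := bufio.NewWriter(f)
	defer out.Flush()

	rng := rand.New(rand.NewSource(0)) // arbitrary fixed seed; the reference seeding is unclear
	const rows = 1_000_000_000
	for i := 0; i < rows; i++ {
		s := stations[rng.Intn(len(stations))] // pick a station uniformly at random
		temp := rng.NormFloat64()*10 + s.mean  // Gaussian around the station's mean; a spread of 10 is assumed
		fmt.Fprintf(out, "%s;%.1f\n", s.name, temp)
	}
}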

Golang reimplementation

generate.go is a reimplementation based on the above. It takes:

go run generate.go  247.27s user 9.32s system 98% cpu 4:19.81 total

and produces a 13 GB file.

Sorting a billion rows.

First attempt: simple.

process1.go is a simple implementation (sketched below).

  1. Open the file and read it with encoding/csv.
  2. Keep a map from city to a struct, accumulating into that struct.
  3. Once every line has been processed, project the map into a slice and sort the slice.
  4. Print the output.
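
A rough sketch of that shape, assuming the challenge’s semicolon-separated station;temperature layout and min/mean/max per city (the exact per-city statistics and file name aren’t spelled out above):

// Sketch of a simple single-threaded pass, roughly as described above.
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"sort"
	"strconv"
)

type stats struct {
	min, max, sum float64
	count         int
}

func main() {
	f, err := os.Open("measurements.txt") // file name assumed
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	r.Comma = ';' // the measurements file is semicolon-separated

	byCity := map[string]*stats{}
	for {
		rec, err := r.Read()
		if err != nil {
			break // io.EOF or a parse error; a real version would distinguish them
		}
		v, _ := strconv.ParseFloat(rec[1], 64)
		s, ok := byCity[rec[0]]
		if !ok {
			s = &stats{min: v, max: v}
			byCity[rec[0]] = s
		}
		if v < s.min {
			s.min = v
		}
		if v > s.max {
			s.max = v
		}
		s.sum += v
		s.count++
	}

	// Project the map into a slice and sort it by city name.
	cities := make([]string, 0, len(byCity))
	for c := range byCity {
		cities = append(cities, c)
	}
	sort.Strings(cities)
	for _, c := range cities {
		s := byCity[c]
		fmt.Printf("%s: min=%.1f mean=%.1f max=%.1f\n", c, s.min, s.sum/float64(s.count), s.max)
	}
}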

This runs in:

go run process1.go --cpuprofile=prof.cpu  141.79s user 4.70s system 101% cpu 2:24.75 total

Nearly two and a half minutes of wall clock in total, of which 141.79s is user time and only 4.7s is system time.
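
The --cpuprofile flag is presumably wired up in the standard runtime/pprof way; a minimal sketch of that wiring (the actual process1.go may differ):

// Hypothetical wiring for the --cpuprofile flag, following the standard runtime/pprof pattern.
package main

import (
	"flag"
	"log"
	"os"
	"runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write a CPU profile to this file")

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}
	// ... the processing loop runs here ...
}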

profile

% go tool pprof prof.cpu
File: process1
Type: cpu
Time: Mar 3, 2024 at 3:52pm (GMT)
Duration: 144.04s, Total samples = 147.39s (102.32%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 144.84s, 98.27% of 147.39s total
Dropped 137 nodes (cum <= 0.74s)
Showing top 10 nodes out of 58
      flat  flat%   sum%        cum   cum%
   131.25s 89.05% 89.05%    131.25s 89.05%  syscall.syscall
     6.57s  4.46% 93.51%      6.57s  4.46%  runtime.pthread_cond_wait
     2.25s  1.53% 95.03%      2.25s  1.53%  runtime.usleep
     1.41s  0.96% 95.99%      1.41s  0.96%  runtime.pthread_cond_timedwait_relative_np
     1.25s  0.85% 96.84%      1.25s  0.85%  runtime.pthread_kill
     1.07s  0.73% 97.56%      1.07s  0.73%  runtime.madvise
     0.74s   0.5% 98.07%      0.74s   0.5%  runtime.kevent
     0.19s  0.13% 98.20%      1.52s  1.03%  runtime.stealWork
     0.08s 0.054% 98.25%    131.77s 89.40%  encoding/csv.(*Reader).readRecord
     0.03s  0.02% 98.27%    132.16s 89.67%  main.main

Nearly all the time is spent in syscalls, most likely read.

file benchmark

The next step was to write file_benchmark.go, to see how slow the disk was. It can read the file (single-threaded) in 4.5 seconds.
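
file_benchmark.go isn’t shown here, but a sequential read benchmark of that sort can be as simple as reading the file into one reused buffer and timing it; a sketch:

// Sketch of a sequential read benchmark: how fast can the disk deliver the file?
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

func main() {
	f, err := os.Open("measurements.txt") // file name assumed
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1<<20) // 1 MB read buffer
	var total int64
	start := time.Now()
	for {
		n, err := f.Read(buf)
		total += int64(n)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	fmt.Printf("read %d bytes in %s\n", total, time.Since(start))
}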

Attempt 2: faster parser.

Trying with a 1 MB buffer: 138.45s user 2.33s system 101% cpu 2:19.06 total

Trying with a 1 GB buffer: 146.99s user 3.13s system 98% cpu 2:32.72 total
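
The reader in process2.go isn’t shown; one plausible shape for a faster parser is to drop encoding/csv, read through a large bufio.Reader, and split each line on ';' by hand. A sketch, with the file name and details assumed:

// Sketch: drop encoding/csv, read through a large bufio.Reader and split each line on ';' by hand.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"strconv"
)

func main() {
	f, err := os.Open("measurements.txt") // file name assumed
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := bufio.NewReaderSize(f, 1<<20) // 1 MB buffer; 1<<30 for the 1 GB experiment
	var lines int
	for {
		line, rerr := r.ReadBytes('\n')
		if sep := bytes.IndexByte(line, ';'); sep >= 0 {
			city := line[:sep]
			val, err := strconv.ParseFloat(string(bytes.TrimRight(line[sep+1:], "\n")), 64)
			if err == nil {
				_, _ = city, val // a real version accumulates into the per-city map here
				lines++
			}
		}
		if rerr != nil {
			break // io.EOF, or a real error a full version would report
		}
	}
	fmt.Println("lines parsed:", lines)
}

Note that ReadBytes and the string conversion still allocate per line, which is consistent with runtime.mallocgc and runtime.growslice featuring prominently in the profile below.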

% go tool pprof prof.cpu
File: process2
Type: cpu
Time: Mar 3, 2024 at 4:58pm (GMT)
Duration: 152.15s, Total samples = 124.89s (82.08%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 105.42s, 84.41% of 124.89s total
Dropped 108 nodes (cum <= 0.62s)
Showing top 10 nodes out of 58
      flat  flat%   sum%        cum   cum%
    36.11s 28.91% 28.91%     93.80s 75.11%  main.(*reader).read
    25.67s 20.55% 49.47%     36.65s 29.35%  runtime.mallocgc
    11.45s  9.17% 58.64%     11.45s  9.17%  runtime.pthread_cond_signal
     6.90s  5.52% 64.16%     11.17s  8.94%  runtime.mapaccess2_faststr
     5.49s  4.40% 68.56%      5.94s  4.76%  runtime.newArenaMayUnlock
     4.70s  3.76% 72.32%      4.70s  3.76%  strconv.readFloat
     4.62s  3.70% 76.02%      4.62s  3.70%  runtime.memclrNoHeapPointers
     3.94s  3.15% 79.17%      3.94s  3.15%  runtime.madvise
     3.29s  2.63% 81.81%      3.29s  2.63%  aeshashbody
     3.25s  2.60% 84.41%     33.81s 27.07%  runtime.growslice

The next idea was to share buffers: roughly 100 seconds, but still a lot of time in GC.
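
One way to share buffers is to read into a single reused []byte and slice lines out of it in place, so the hot loop allocates almost nothing; a sketch of that idea (not necessarily what process2 does):

// Sketch: one reused read buffer, lines sliced out in place, no per-line allocation.
package main

import (
	"bytes"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("measurements.txt") // file name assumed
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1<<20) // single reused 1 MB buffer (assumes no line is longer than this)
	carry := 0                 // bytes of a partial line carried over from the previous read
	var lines int
	for {
		n, rerr := f.Read(buf[carry:])
		data := buf[:carry+n]
		for {
			nl := bytes.IndexByte(data, '\n')
			if nl < 0 {
				break
			}
			line := data[:nl]
			_ = line // a real version splits on ';' and accumulates here, without copying
			lines++
			data = data[nl+1:]
		}
		// Move any trailing partial line to the front of the buffer for the next read.
		carry = copy(buf, data)
		if rerr != nil {
			break // io.EOF
		}
	}
	fmt.Println("lines:", lines)
}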

Attempt 3: many threads

Before running out of time, I had a go at using many goroutines. It was faster, but buggy.
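
That code isn’t shown, but the usual shape of such a fan-out is to split the file into byte ranges aligned to line boundaries, give each goroutine its own map, and merge the maps at the end. A sketch of that pattern (file name assumed; simple per-city counts stand in for full stats):

// Sketch: split the file into byte ranges, give each goroutine its own map, merge at the end.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
	"runtime"
	"sync"
)

func main() {
	const path = "measurements.txt" // file name assumed
	fi, err := os.Stat(path)
	if err != nil {
		panic(err)
	}
	size := fi.Size()
	workers := runtime.NumCPU()
	chunk := size / int64(workers)

	results := make([]map[string]int, workers) // per-worker maps; a real version would hold full stats
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			f, err := os.Open(path)
			if err != nil {
				panic(err)
			}
			defer f.Close()

			start := int64(w) * chunk
			end := start + chunk
			if w == workers-1 {
				end = size
			}
			// Back up one byte so a boundary landing exactly on '\n' is handled correctly,
			// then skip the partial first line: it belongs to the previous worker.
			pos := start
			if w > 0 {
				pos = start - 1
			}
			if _, err := f.Seek(pos, io.SeekStart); err != nil {
				panic(err)
			}
			r := bufio.NewReaderSize(f, 1<<20)
			if w > 0 {
				skipped, _ := r.ReadBytes('\n')
				pos += int64(len(skipped))
			}
			counts := map[string]int{}
			for pos < end {
				line, rerr := r.ReadBytes('\n')
				pos += int64(len(line))
				if sep := bytes.IndexByte(line, ';'); sep >= 0 {
					counts[string(line[:sep])]++ // real version: parse the value and accumulate stats
				}
				if rerr != nil {
					break
				}
			}
			results[w] = counts
		}(w)
	}
	wg.Wait()

	// Merge the per-worker maps.
	total := map[string]int{}
	for _, m := range results {
		for city, n := range m {
			total[city] += n
		}
	}
	fmt.Println("distinct cities:", len(total))
}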

Thoughts

The challenge is really “how fast can you read CSV?”, which is a little odd. It would be better to use a format better suited to the job.