https://github.com/gunnarmorling/1brc
The code does this by:
wikitable2csv
for making CSV files.The seeding of these random numbers is hard to figure out.
generate.go
is a reimplementation based on above. This takes
go run generate.go 247.27s user 9.32s system 98% cpu 4:19.81 total
And produces a 13G file.
process1.go
is a simple implementation.
encoding/csv
.This runs in
go run process1.go --cpuprofile=prof.cpu 141.79s user 4.70s system 101% cpu 2:24.75 total
141 seconds, of which only 4.7s is user time.
% go tool pprof prof.cpu
File: process1
Type: cpu
Time: Mar 3, 2024 at 3:52pm (GMT)
Duration: 144.04s, Total samples = 147.39s (102.32%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 144.84s, 98.27% of 147.39s total
Dropped 137 nodes (cum <= 0.74s)
Showing top 10 nodes out of 58
flat flat% sum% cum cum%
131.25s 89.05% 89.05% 131.25s 89.05% syscall.syscall
6.57s 4.46% 93.51% 6.57s 4.46% runtime.pthread_cond_wait
2.25s 1.53% 95.03% 2.25s 1.53% runtime.usleep
1.41s 0.96% 95.99% 1.41s 0.96% runtime.pthread_cond_timedwait_relative_np
1.25s 0.85% 96.84% 1.25s 0.85% runtime.pthread_kill
1.07s 0.73% 97.56% 1.07s 0.73% runtime.madvise
0.74s 0.5% 98.07% 0.74s 0.5% runtime.kevent
0.19s 0.13% 98.20% 1.52s 1.03% runtime.stealWork
0.08s 0.054% 98.25% 131.77s 89.40% encoding/csv.(*Reader).readRecord
0.03s 0.02% 98.27% 132.16s 89.67% main.main
Nearly all time goes in syscalls, likely read
.
Next stop was to write file_benchmark.go
, to see how slow the disk was. It can read the file (single threaded) in 4.5 seconds.
Trying with a 1Mb buffer: 138.45s user 2.33s system 101% cpu 2:19.06 total
Trying with a 1Gb buffer: 146.99s user 3.13s system 98% cpu 2:32.72 total
% go tool pprof prof.cpu
File: process2
Type: cpu
Time: Mar 3, 2024 at 4:58pm (GMT)
Duration: 152.15s, Total samples = 124.89s (82.08%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 105.42s, 84.41% of 124.89s total
Dropped 108 nodes (cum <= 0.62s)
Showing top 10 nodes out of 58
flat flat% sum% cum cum%
36.11s 28.91% 28.91% 93.80s 75.11% main.(*reader).read
25.67s 20.55% 49.47% 36.65s 29.35% runtime.mallocgc
11.45s 9.17% 58.64% 11.45s 9.17% runtime.pthread_cond_signal
6.90s 5.52% 64.16% 11.17s 8.94% runtime.mapaccess2_faststr
5.49s 4.40% 68.56% 5.94s 4.76% runtime.newArenaMayUnlock
4.70s 3.76% 72.32% 4.70s 3.76% strconv.readFloat
4.62s 3.70% 76.02% 4.62s 3.70% runtime.memclrNoHeapPointers
3.94s 3.15% 79.17% 3.94s 3.15% runtime.madvise
3.29s 2.63% 81.81% 3.29s 2.63% aeshashbody
3.25s 2.60% 84.41% 33.81s 27.07% runtime.growslice
Next idea was to share buffers. 100 seconds, but still lots of GC.
Before running out of time, I had a go at using many go routines. It was faster, but buggy.
The challenge is really “how fast can you read CSV?” which is a little odd. Would be better to use a better format.