Basic command-line tools never cease to surprise me, it seems.
Today's surprise: by default, `du` reports the number of bytes a file occupies on disk, not its apparent size as shown by `ls` (i.e. the number of bytes you can read out of that file). To get apparent sizes, you need to pass `--apparent-size`, or use the shorthand `-b` (in GNU `du` this also sets the block size to 1 byte, which only changes the units of the output).
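To see the two numbers side by side without `du`, here is a minimal Python sketch. It assumes a Unix-like system where `os.stat` exposes `st_blocks` counted in 512-byte units (true on Linux, not available on Windows). The helper name `sizes` is made up for illustration.

```python
import os


def sizes(path):
    """Return (apparent_bytes, disk_bytes) for a single file.

    apparent_bytes is what `ls -l` shows; disk_bytes approximates what
    `du` reports by default. Assumes st_blocks is in 512-byte units.
    """
    st = os.stat(path)
    apparent = st.st_size        # bytes you can read out of the file
    on_disk = st.st_blocks * 512  # bytes actually allocated on disk
    return apparent, on_disk


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        apparent, on_disk = sizes(p)
        print(f"{p}: apparent={apparent} on_disk={on_disk}")
```

The two values can differ in either direction: lots of small files round up to whole blocks, while sparse files or filesystem-level compression can make the on-disk figure smaller than the apparent size.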
Brought to you by "why are the Unicode 16.0 data files 30MiB standalone, but 80MiB when I tar them up?"