Commit graph

5 commits

Author SHA1 Message Date
Yorhel
6b7983b2f5 binfmt: Support larger (non-data) block sizes
I realized that the 16 MiB limitation implied that the index block could
only hold ((2^24)-16)/8 ≈ 2 million data block pointers. At the default
64 KiB data block size, that means an export can only reference up to
~128 GiB of uncompressed data. That's pretty limiting.

This change increases the maximum size of the index block to 256 MiB,
supporting ~33 million data block pointers and ~2 TiB of uncompressed data
with the default data block size.
2024-08-09 09:40:29 +02:00
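
The capacity figures in the 6b7983b2f5 message can be checked with a few lines of arithmetic. This is only a sketch in Python, not code from the project: the constant and function names are made up for illustration, and the 16-byte overhead and 8-byte pointer size are read directly off the ((2^24)-16)/8 formula in the message rather than from the format specification.

```python
# Hypothetical names; values are taken from the commit message's formula.
OVERHEAD_BYTES = 16           # the "-16" term in ((2^24)-16)/8 (assumed per-block overhead)
POINTER_BYTES = 8             # the "/8" term (assumed size of one data block pointer)
DATA_BLOCK_BYTES = 64 * 1024  # default 64 KiB data block size


def max_pointers(index_block_limit: int) -> int:
    """Number of data block pointers that fit in one index block."""
    return (index_block_limit - OVERHEAD_BYTES) // POINTER_BYTES


def max_uncompressed_bytes(index_block_limit: int,
                           data_block_size: int = DATA_BLOCK_BYTES) -> int:
    """Upper bound on uncompressed data an export can reference."""
    return max_pointers(index_block_limit) * data_block_size


if __name__ == "__main__":
    for label, limit in (("16 MiB", 2**24), ("256 MiB", 2**28)):
        ptrs = max_pointers(limit)
        gib = max_uncompressed_bytes(limit) / 2**30
        print(f"{label} index block: {ptrs} pointers, ~{gib:.0f} GiB")
```

With these assumptions, the 16 MiB limit gives 2,097,150 pointers (just under 128 GiB) and the 256 MiB limit gives 33,554,430 pointers (just under 2 TiB), matching the numbers quoted in the message.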
Yorhel
9418079da3 binfmt: Remove CBOR-null-based padding hack
Seems like unnecessary complexity.
2024-08-09 09:19:27 +02:00
Yorhel
8ad61e87c1 Stick with zstd-4 + 64k block, add --compress-level, fix 32bit build
And do dynamic buffer allocation for bin_export, removing 128k of
.rodata that I accidentally introduced earlier and reducing memory use
for parallel scans.

Static binaries now also include the minimal version of zstd, current
sizes for x86_64 are:

  582k ncdu-2.5
  601k ncdu-new-nocompress
  765k ncdu-new-zstd

That's not great, but also not awful. Even zlib or LZ4 would've resulted
in a 700k binary.
2024-08-03 13:16:44 +02:00
Yorhel
5a0c8c6175 Add hardlink counting support for the new export format
This ended up a little different than I had originally planned.

The bad part is that my idea for the 'prevlnk' references wasn't going
to work out. First, the reader has no efficient way to determine the
head reference of this list, and implementing a lookup table would be
pretty costly and complex. Second, even with those references working,
they'd be pretty useless because there's no way to go from an itemref
to a full path. I don't see an easy way to solve these problems, so I'm
afraid the efficient hardlink list feature will have to be disabled
when reading from this new format. :(

The good news is that removing these references simplifies the hardlink
counting implementation and removes the requirement for a global inode
map and associated mutex. \o/

Performance is looking really good so far, too.
2024-08-01 07:32:38 +02:00
Yorhel
f25bc5cbf4 Experimental new export format
The goals of this format are:
- Streaming parallel export with minimal mandatory buffering.
- Exported data includes cumulative directory stats, so reader doesn't
  have to go through the entire tree to calculate these.
- Fast-ish directory listings without reading the entire file.
- Built-in compression.

Current implementation is missing compression, hardlink counting and
actually reading the file. Also need to tune and measure stuff.
2024-07-30 14:27:41 +02:00