Commit graph

544 commits

Author SHA1 Message Date
Yorhel
28d9eaecab Version 2.6 2024-09-27 10:49:22 +02:00
Yorhel
61d7fc8473 man: Mention new flags in the synopsis 2024-09-27 10:40:13 +02:00
Yorhel
e142d012f0 Man page updates
Haven't mentioned the new -O flag in the examples section yet. Let's
first keep it as a slightly lower-profile feature while the format gains
wider testing and adoption.
2024-09-26 15:07:09 +02:00
Yorhel
39517c01a8 Remove kernfs dev id cache
Kernfs checking was previously done for every directory scanned, but the
new parallel scanning code only performs the check when the dev id is
different from parent, which isn't nearly as common.
(In fact, in typical scenarios this only ever happens once per dev id,
rendering the cache completely useless. But even people will 10k bind
mounts are unlikely to notice a performance impact)
2024-08-25 09:29:41 +02:00
Yorhel
cc26ead5f8 Fix integer overflow and off-by-one in binfmt itemref parsing 2024-08-23 09:53:33 +02:00
Yorhel
ca46c7241f Fix off-by-one in binfmt reader 2024-08-18 08:38:57 +02:00
Yorhel
e324804cdd Strip stack unwinding info from static binaries
Saves another 70k on the x86_64 binary, more on x86.

None of the included C or Zig code will unwind the stack at any point,
so these sections seem pretty useless.
2024-08-11 16:26:40 +02:00
Yorhel
26229d7a63 binfmt: Remove "rawlen" field, require use of ZSTD_getFrameContentSize()
The zstd frame format already supports this functionality and I don't
really see a benefit in not making use of that.
2024-08-11 15:56:14 +02:00
Yorhel
4ef9c3e817 bin_export: Adaptively adjust block size 2024-08-11 10:56:43 +02:00
Yorhel
c30699f93b Track which extended mode fields we have + bugfixes
This prevents displaying invalid zero values or writing such values out
in JSON/bin exports. Very old issue, actually, but with the new binfmt
experiments it's finally started annoying me.
2024-08-09 18:32:47 +02:00
Yorhel
6b7983b2f5 binfmt: Support larger (non-data) block sizes
I realized that the 16 MiB limitation implied that the index block could
only hold ((2^24)-16)/8 =~ 2 mil data block pointers. At the default
64k data block size that means an export can only reference up to
~128 GiB of uncompressed data. That's pretty limiting.

This change increases the maximum size of the index block to 256 MiB,
supporting ~33 mil data block pointers and ~2 TiB of uncompressed data
with the default data block size.
2024-08-09 09:40:29 +02:00
Yorhel
9418079da3 binfmt: Remove CBOR-null-based padding hack
Seems like unnecessary complexity.
2024-08-09 09:19:27 +02:00
Yorhel
18f322c532 Throw an error when running out of DevId numbers
I find it hard to imagine that this will happen on a real filesystem,
but it can be triggered by a malicious export file. Better protect
against that than invoke undefined behavior.
2024-08-08 15:38:56 +02:00
Yorhel
252f7fc253 Use u64 for item counts in binary export
They're still clamped to a u32 upon reading, but at least the file will
have correct counts and can be read even when it exceeds 4.2 billion
items.
2024-08-08 11:37:55 +02:00
Yorhel
49ef7cc34e Add progress indicator to loadDir()
Ugh browser.zig is becoming such a complex mess.
2024-08-07 10:39:29 +02:00
Yorhel
17e384b485 Disable refresh, delete and link list when reading from file
TODO: Add an option to re-enable these features by importing the file
into RAM?
2024-08-07 09:44:21 +02:00
Yorhel
ad166de925 Fix hardlink counting off-by-one in binary export 2024-08-06 14:51:42 +02:00
Yorhel
22dca22450 Add custom panic handler
Sadly, it doesn't seem to be called on segfaults, which means that will
still output garbage. I could install a custom segfault handler, but not
sure that's worth the effort.
2024-08-06 14:43:46 +02:00
Yorhel
30d6ddf149 Support direct browsing of a binary export
Code is more hacky than I prefer, but this approach does work and isn't
even as involved as I had anticipated.

Still a few known bugs and limitations left to resolve.
2024-08-06 09:50:10 +02:00
Yorhel
8fb2290d5e Fix division by zero in percent calculation
Broken in previous commit.
2024-08-05 07:07:46 +02:00
Yorhel
90b43755b8 Use integer formatting instead of floating points
This avoids embedding Zig's floating point formatting tables and
ancillary code, shaving 17k off the final static binary for x86_64.

Also adjusted the cut-off points for the units to be more precise.
2024-08-03 15:37:54 +02:00
Yorhel
8ad61e87c1 Stick with zstd-4 + 64k block, add --compress-level, fix 32bit build
And do dynamic buffer allocation for bin_export, removing 128k of
.rodata that I accidentally introduced earlier and reducing memory use
for parallel scans.

Static binaries now also include the minimal version of zstd, current
sizes for x86_64 are:

  582k ncdu-2.5
  601k ncdu-new-nocompress
  765k ncdu-new-zstd

That's not great, but also not awful. Even zlib or LZ4 would've resulted
in a 700k binary.
2024-08-03 13:16:44 +02:00
Yorhel
85e12beb1c Improve performance of bin format import by 30%
By calling die() instead of propagating error unions. Not surprising
that error propagation has a performance impact, but I was hoping it
wasn't this bad.

Import performance was already quite good, but now it's even better!
With the one test case I have it's faster than JSON import, but I expect
that some dir structures will be much slower.
2024-08-02 14:09:46 +02:00
Yorhel
025e5ee99e Add import function for the new binary format
This isn't the low-memory browsing experience I was hoping to implement,
yet, but it serves as a good way to test the new format and such a
sink-based import is useful to have anyway.

Performance is much better than I had expected, and I haven't even
profiled anything yet.
2024-08-02 14:03:30 +02:00
Yorhel
cd00ae50d1 refactor: Merge sink.Special and bin_export.ItemType into model.EType
Simplifies code a little bit and saves one whole byte off of file
entries.
2024-08-01 14:24:56 +02:00
Yorhel
5a0c8c6175 Add hardlink counting support for the new export format
This ended up a little different than I had originally planned.

The bad part is that my idea for the 'prevlnk' references wasn't going
to work out. For one because the reader has no efficient way to
determine the head reference of this list and implementing a lookup
table would be pretty costly and complex, and second because even with
those references working, they'd be pretty useless because there's no
way to go from an itemref to a full path. I don't see an easy way to
solve these problems, so I'm afraid the efficient hardlink list feature
will have to be disabled when reading from this new format. :(

The good news is that removing these references simplifies the hardlink
counting implementation and removes the requirement for a global inode
map and associated mutex. \o/

Performance is looking really good so far, too.
2024-08-01 07:32:38 +02:00
Yorhel
ebaa9b6a89 Add (temporary) compression support for the new export format
This is mainly for testing and benchmarking, I plan to choose a single
block size and compression library before release, to avoid bloating the
ncdu binary too much.

Currently this links against the system-provided zstd, zlib and lz4.
ncdubinexp.pl doesn't support compressed files yet.

Early benchmarks of `ncdu -f firefox-128.0.json` (407k files) with
different block sizes and compression options:

            bin8k        bin16k       bin32k       bin64k       bin128k      bin256k      bin512k      json
  algo      size  time   size  time   size  time   size  time   size  time   size  time   size  time   size  time

  none      16800  128   16760  126   16739  125   16728  124   16724  125   16722  124   16721  124   24835  127
  lz4        7844  143    7379  141    7033  140    6779  140    6689  138    6626  139    6597  139    5850  179

  zlib-1     6017  377    5681  310    5471  273    5345  262    5289  259    5257  256    5242  255    4415  164
  zlib-2     5843  386    5496  319    5273  284    5136  276    5072  271    5037  270    5020  268    4164  168
  zlib-3     5718  396    5361  339    5130  316    4977  321    4903  318    4862  324    4842  319    3976  196
  zlib-4     5536  424    5153  372    4903  341    4743  339    4665  338    4625  340    4606  336    3798  212
  zlib-5     5393  464    4993  419    4731  406    4561  414    4478  422    4434  426    4414  420    3583  261
  zlib-6     5322  516    4902  495    4628  507    4450  535    4364  558    4318  566    4297  564    3484  352
  zlib-7     5311  552    4881  559    4599  601    4417  656    4329  679    4282  696    4260  685    3393  473
  zlib-8     5305  588    4864  704    4568 1000    4374 1310    4280 1470    4230 1530    4206 1550    3315 1060
  zlib-9     5305  589    4864  704    4568 1030    4374 1360    4280 1510    4230 1600    4206 1620    3312 1230

  zstd-1     5845  177    5426  169    5215  165    5030  160    4921  156    4774  157    4788  153    3856  126
  zstd-2     5830  178    5424  170    5152  164    4963  161    4837  160    4595  162    4614  158    3820  134
  zstd-3     5683  187    5252  177    5017  172    4814  168    4674  169    4522  169    4446  170    3664  145
  zstd-4     5492  235    5056  230    4966  173    4765  170    4628  169    4368  222    4437  170    3656  145
  zstd-5     5430  270    4988  266    4815  234    4616  229    4485  224    4288  241    4258  223    3366  189
  zstd-6     5375  323    4928  322    4694  282    4481  279    4334  276    4231  275    4125  271    3234  235
  zstd-7     5322  400    4866  420    4678  319    4464  314    4315  312    4155  300    4078  295    3173  269
  zstd-8     5314  454    4848  689    4636  344    4420  346    4270  345    4137  350    4060  342    3082  330
  zstd-9     5320  567    4854  615    4596  392    4379  398    4228  401    4095  408    4060  345    3057  385
  zstd-10    5319  588    4852  662    4568  458    4350  466    4198  478    4066  491    4024  395    3005  489
  zstd-11    5310  975    4857 1040    4543  643    4318  688    4164  743    4030  803    3999  476    2967  627
  zstd-12    5171 1300    4692 1390    4539  699    4313  765    4154  854    4018  939    3999  478    2967  655
  zstd-13    5128 1760    4652 1880    4556 1070    4341 1130    4184 1230    3945 1490    3980  705    2932 1090
  zstd-14    5118 2040    4641 2180    4366 1540    4141 1620    3977 1780    3854 1810    3961  805    2893 1330

  mzstd-1    5845  206    5426  195    5215  188    5030  180    4921  176    4774  175    4788  172
  mzstd-2    5830  207    5424  196    5152  186    4963  183    4837  181    4765  178    4614  176
  mzstd-3    5830  207    5424  196    5150  187    4960  183    4831  180    4796  181    4626  180
  mzstd-4    5830  206    5427  196    5161  188    4987  185    4879  182    4714  180    4622  179
  mzstd-5    5430  347    4988  338    5161  189    4987  185    4879  181    4711  180    4620  180
  mzstd-6    5384  366    4939  359    4694  390    4481  391    4334  383    4231  399    4125  394
  mzstd-7    5328  413    4873  421    4694  390    4481  390    4334  385    4155  442    4078  435
  mzstd-8    5319  447    4854  577    4649  417    4434  421    4286  419    4155  440    4078  436
  mzstd-9    5349  386    4900  385    4606  469    4390  478    4241  478    4110  506    4078  436
  mzstd-10   5319  448    4853  597    4576  539    4360  560    4210  563    4079  597    4039  502
  mzstd-11   5430  349    4988  339    4606  468    4390  478    4241  478    4110  506    4013  590
  mzstd-12   5384  366    4939  361    4576  540    4360  556    4210  559    4079  597    4013  589
  mzstd-13   5349  387    4900  388    4694  390    4481  392    4334  386    4155  439    4078  436
  mzstd-14   5328  414    4873  420    4649  417    4434  424    4286  420    4155  444    4039  500

I'll need to do benchmarks on other directories, with hardlink support
and in extended mode as well to get more varied samples.

Another consideration in choosing a compression library is the size of
its implementation:

  zlib: 100k
  lz4:  106k
  zstd: 732k (regular), 165k (ZSTD_LIB_MINIFY, "mzstd" above)
2024-07-31 12:55:43 +02:00
Yorhel
f25bc5cbf4 Experimental new export format
The goals of this format being:
- Streaming parallel export with minimal mandatory buffering.
- Exported data includes cumulative directory stats, so reader doesn't
  have to go through the entire tree to calculate these.
- Fast-ish directory listings without reading the entire file.
- Built-in compression.

Current implementation is missing compression, hardlink counting and
actually reading the file. Also need to tune and measure stuff.
2024-07-30 14:27:41 +02:00
Yorhel
87d336baeb Add progress indicator to hardlink counting + fix import/mem UI updating 2024-07-28 10:54:58 +02:00
Yorhel
0a6bcee32b Fix counts when new link to existing inode is found on refresh
And the inode should not already have been in the directory before
refresh. Seems like a fairly obscure bug, but a bug nonetheless.
2024-07-28 10:35:59 +02:00
Yorhel
3c055810d0 Split mem import and json export out of sink.zig
Mainly to make room for another export format, though that'll take a lot
more experimenting before it'll get anywhere.
2024-07-27 11:58:08 +02:00
Yorhel
f6bffa40c7 Version 2.5 2024-07-24 14:07:17 +02:00
Yorhel
08d373881c Fix JSON export of "otherfs" excluded type
The exporter would write "othfs" while the import code was expecting
"otherfs". This bug also exists in the 1.x branch and is probably as old
as the JSON import/export feature. D'oh.

Normalized the export to use "otherfs" now (which is what all versions can
read correctly) and fixed the importer to also accept "othfs" (which
is what all previous versions exported).
2024-07-24 10:30:30 +02:00
Yorhel
dc42c91619 Fix JSON export of special entries 2024-07-24 07:34:12 +02:00
Yorhel
2b2b4473e5 mem_src: Fix setting nlink for non-hardlinks
That field is still used in sink.zig for total size estimation.
2024-07-23 10:51:21 +02:00
Yorhel
9cbe1bc91f Use slower and smaller heap sort for hardlink list
Saves 20 KiB off of the ReleaseSafe + stripped binary. That feature is
(1) rarely used and (2) rarely deals with large lists, so no point
spending that much space on an efficient sort implementation.
2024-07-18 21:43:20 +02:00
Yorhel
f28f69d831 README: Mention zig 0.13 as well 2024-07-18 10:53:55 +02:00
Yorhel
a5e57ee5ad Fix use of u64 atomic integers on 32-bit platforms 2024-07-18 10:53:27 +02:00
Yorhel
b0d4fbe94f Rename threading flag to -t,--threads + update man page 2024-07-18 07:49:41 +02:00
Yorhel
99f92934c6 Improve JSON export performance
When you improve performance in one part of the code, another part
becomes the new bottleneck. The slow JSON writer was very noticeable
with the parallel export option.

This provides a 20% improvement on total run-time when scanning a hot
directory with 8 threads.
2024-07-18 07:11:32 +02:00
Yorhel
9b517f27b1 Add support for multithreaded scanning to JSON export
by scanning into memory first.
2024-07-17 16:40:02 +02:00
Yorhel
705bd8907d Move nlink count from inode map into Link node
This adds another +4 bytes* to Link nodes, but allows for the in-memory
tree to be properly exported to JSON, which we'll need for multithreaded
export. It's also slightly nicer conceptually, as we can now detect
inconsistencies without throwing away the actual data, so have a better
chance of recovering on partial refresh. Still unlikely, anyway, but
whatever.

(* but saves 4+ bytes per unique inode in the inode map, so the memory
increase is only noticeable when links are repeated in the scanned tree.
Admittedly, that may be the common case)
2024-07-17 14:15:53 +02:00
Yorhel
e5508ba9b4 Fix OOM handling to be thread-safe 2024-07-17 11:48:58 +02:00
Yorhel
6bb31a4653 More consistent handling of directory read errors
These are now always added as a separate dir followed by setReadError().
JSON export can catch these cases when the error happens before any
entries are read, which is the common error scenario.
2024-07-17 09:09:04 +02:00
Yorhel
7558fd7f8e Re-add single-threaded JSON export
That was the easy part, next up is fixing multi-threaded JSON export.
2024-07-17 07:05:18 +02:00
Yorhel
1e56c8604e Improve JSON import performance by another 10%
Profiling showed that string parsing was a bottleneck. We rarely need
the full power of JSON strings, though, so we can optimize for the
common case of plain strings without escape codes. Keeping the slower
string parser as fallback, of course.
2024-07-16 17:36:39 +02:00
Yorhel
d2e8dd8a90 Reimplement JSON import + minor fixes
Previous import code did not correctly handle a non-empty directory with
the "read_error" flag set. I have no clue if that can ever happen in
practice, but at least ncdu 1.x can theoretically emit such JSON so we
handle it now.

Also fixes mtime display of "special" files. i.e. don't display the
mtime of the parent directory - that's confusing.

Split a generic-ish JSON parser out of the import code for easier
reasoning and implemented a few more performance improvements as well.
New code is ~30% faster in both ReleaseSafe and ReleaseFast.
2024-07-16 14:20:30 +02:00
Yorhel
ddbed8b07f Some fixes in mtime propagation and hardlink refresh counting 2024-07-15 11:00:14 +02:00
Yorhel
db51987446 Re-add hard link counting + parent suberror & stats propagation
Ended up turning the Links into a doubly-linked list, because the
current approach of refreshing a subdirectory makes it more likely to
run into problems with the O(n) removal behavior of singly-linked lists.

Also found a bug that was present in the old scanning code as well;
fixed here and in c41467f240.
2024-07-14 20:17:34 +02:00
Yorhel
cc12c90dbc Re-add scan progress UI + directory refreshing 2024-07-14 20:17:19 +02:00