I'm tagging this as a "stable" 2.0 release because the 2.0-beta#
numbering will get confusing when I'm working on new features and fixes.
It's still only usable for people who can use the particular Zig version
that's required (currently 0.9.0), and it will certainly break with
different Zig versions. But once you have a working binary for a
supported arch, it's perfectly stable.
The --enable-* options also work for imported files; this fixes #120.
Most other options are not super useful on their own, but they will be
useful once there's a config file.
As alluded to in the previous commit. This approach keeps track of hard
link information much the same way as ncdu 1.16, with the main
difference being that the actual /counting/ of hard link sizes is
deferred until the scan is complete, thus allowing the use of a more
efficient algorithm and amortizing the counting costs.
As an additional benefit, the links listing in the information window
no longer needs a full scan through the in-memory tree.
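The gist of the deferred approach, as a toy sketch (invented types and
data, not the actual ncdu code): the scan only records hard links in a
table keyed by (dev, ino), and their sizes are added up in one cheap
pass afterwards:

    const std = @import("std");

    // Invented types for illustration; the real ncdu structures differ.
    const Inode = struct { dev: u64, ino: u64 };
    const Info = struct { size: u64, nlink: u32 };

    pub fn main() !void {
        var links = std.AutoHashMap(Inode, Info).init(std.heap.page_allocator);
        defer links.deinit();

        // Fake scan results; two entries are links to the same inode.
        const scanned = [_]struct { id: Inode, info: Info }{
            .{ .id = .{ .dev = 1, .ino = 10 }, .info = .{ .size = 4096, .nlink = 2 } },
            .{ .id = .{ .dev = 1, .ino = 10 }, .info = .{ .size = 4096, .nlink = 2 } },
            .{ .id = .{ .dev = 1, .ino = 11 }, .info = .{ .size = 512, .nlink = 1 } },
        };

        // Scan phase: hard-linked files are only recorded, not counted.
        var total: u64 = 0;
        for (scanned) |f| {
            if (f.info.nlink > 1) {
                try links.put(f.id, f.info);
            } else {
                total += f.info.size;
            }
        }

        // Counting phase, amortized over the whole scan: each unique
        // inode contributes its size once, however many links were seen.
        var it = links.valueIterator();
        while (it.next()) |info| total += info.size;

        std.debug.print("total: {} bytes\n", .{total});
    }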
A few memory usage benchmarks:
                   1.16   2.0-beta1   this commit
    root:           429         162           164
    backup:        3969        1686          1601
    many links:     155         194           106
    many links2*:   155         602           106
(I'm surprised my backup dir had enough hard links for this to be an
improvement)
(* this is the same as the "many links" benchmark, but with a few
parent directories added to increase the tree depth. 2.0-beta1 doesn't
like that at all)
Performance-wise, refresh and delete operations can still be improved a
bit.
While this simplifies the code a bit, it's a regression in the sense
that it increases memory use.
This commit is yak shaving for another hard link counting approach I'd
like to try out, which should be a *LOT* less memory hungry than the
current approach, even though it does add the extra cost of these
parent node pointers.
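Concretely, the extra cost is one optional pointer per node. A
hypothetical layout (the real structs differ):

    const std = @import("std");

    // Hypothetical node layout, for illustration only.
    const Node = struct {
        name: []const u8,
        parent: ?*Node, // the extra per-node cost mentioned above
    };

    // With parent pointers, walking up to the root no longer requires
    // threading the path through every call:
    fn rootOf(node: *Node) *Node {
        var n = node;
        while (n.parent) |p| n = p;
        return n;
    }

    pub fn main() void {
        var root = Node{ .name = "/", .parent = null };
        var child = Node{ .name = "usr", .parent = &root };
        std.debug.print("{s}\n", .{rootOf(&child).name});
    }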
It's a bit ugly, but appears to work. I've not tested the 32-bit ARM
version, but the others run.
The static binaries are about twice as large as the ncdu 1.x
counterparts.
I had planned to check out async functions here so I could avoid
recursing onto the stack altogether, but it's still unclear to me how
to safely call into libc from async functions, so let's wait for all
that to get fleshed out a bit more.
Sticking to "compile-time known" error types will essentially just
bring in *every* possible error anyway, so we might as well take
advantage of @errorName.
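A minimal illustration of that trade-off, with a hypothetical stand-in
for the scan code:

    const std = @import("std");

    // Hypothetical stand-in for a routine that can fail in many ways.
    fn doScan() anyerror!void {
        return error.AccessDenied;
    }

    pub fn main() void {
        doScan() catch |err| {
            // @errorName yields the error's name as a string at
            // runtime, so there's no need to enumerate every error.
            std.debug.print("scan failed: {s}\n", .{@errorName(err)});
        };
    }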
This complicated the scan code more than I had anticipated, and it has
a few inherent bugs with respect to calculating shared hard link sizes.
Still, the merge approach avoids creating a full copy of the subtree,
so that's another memory usage win compared to the C version. On the
other hand, it does leak memory if nodes can't be reused.
Not quite as well tested as it should have been, so I'm sure there are bugs.
Two differences compared to the C version:
- You can now select individual paths in the listing; pressing enter
will open the selected path in the browser window.
- Creating this listing is much slower and requires, in the worst case,
a full traversal through the in-memory tree (see the sketch after this
list). I've tested this without the same-dev and shared-parent
optimizations (i.e. worst case) on an import with 30M files and
performance was still quite acceptable - the listing completed in a
second - so I didn't bother adding a loading indicator. On slower
systems and even larger trees this may be a little annoying, though.
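That worst case is conceptually just a recursive walk that visits every
node. A toy sketch (invented types, nothing like the real tree):

    const std = @import("std");

    // Toy tree walk counting how often an inode occurs; this only shows
    // the shape of the traversal, not the real node layout.
    const Node = struct {
        ino: u64,
        children: []const Node = &[_]Node{},
    };

    fn countLinks(node: Node, ino: u64) usize {
        var n: usize = @boolToInt(node.ino == ino);
        for (node.children) |child| n += countLinks(child, ino);
        return n;
    }

    pub fn main() void {
        const tree = Node{ .ino = 1, .children = &[_]Node{
            .{ .ino = 2 },
            .{ .ino = 2 },
            .{ .ino = 3, .children = &[_]Node{.{ .ino = 2 }} },
        } };
        std.debug.print("occurrences: {}\n", .{countLinks(tree, 2)});
    }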
(also, calling nonl() apparently breaks detection of the return key -
neither \n nor KEY_ENTER is emitted for some reason)
The good news is: apart from this little thing, everything seems to
just work(tm) on FreeBSD. I think I had more trouble with C because of
minor header file differences.
I had already been using them as a HashSet with mutable keys in order
to avoid padding problems. That's not always necessary anymore now that
Zig's new HashMap uses separate arrays for keys and values, but I still
need the HashSet trick for the link_count nodes table, as the key
itself would otherwise have padding.
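Roughly what the trick looks like (field names invented for
illustration): the value type is void, the whole record lives in the
key, and a custom context hashes and compares only the identity part,
so the payload can be mutated through key_ptr without invalidating the
table:

    const std = @import("std");

    const Entry = struct {
        ino: u64,   // identity, used by hash/eql
        count: u32, // mutable payload, ignored by hash/eql
    };

    const Ctx = struct {
        pub fn hash(_: Ctx, e: Entry) u64 {
            return std.hash.Wyhash.hash(0, std.mem.asBytes(&e.ino));
        }
        pub fn eql(_: Ctx, a: Entry, b: Entry) bool {
            return a.ino == b.ino;
        }
    };

    const Set = std.HashMap(Entry, void, Ctx, std.hash_map.default_max_load_percentage);

    pub fn main() !void {
        var set = Set.init(std.heap.page_allocator);
        defer set.deinit();

        var i: usize = 0;
        while (i < 3) : (i += 1) {
            const r = try set.getOrPut(.{ .ino = 10, .count = 1 });
            // Mutating through key_ptr is safe as long as the identity
            // field (ino) stays untouched.
            if (r.found_existing) r.key_ptr.count += 1;
        }

        var it = set.keyIterator();
        while (it.next()) |e|
            std.debug.print("ino {}: seen {} times\n", .{ e.ino, e.count });
    }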
Under the assumption that there are no external references to files
mentioned in the dump, i.e. a file's nlink count matches the number of
times the file occurs in the dump.
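In code form, the assumption is essentially this check (a toy sketch,
not the actual import logic):

    const std = @import("std");

    // Toy illustration of the assumption: once a file with st_nlink = n
    // has occurred n times in the dump, all of its links are inside the
    // dump, so none of its size is shared with anything outside it.
    fn fullyInternal(times_seen: u32, nlink: u32) bool {
        return times_seen >= nlink;
    }

    test "nlink matches occurrences in the dump" {
        try std.testing.expect(fullyInternal(3, 3));
        try std.testing.expect(!fullyInternal(2, 3));
    }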
This machinery could also be used for regular scans, when you want to
scan an individual directory without caring about external hard links.
Maybe that should be the default, even? Not sure...