Ended up turning the Links into a doubly-linked list, because the
current approach of refreshing a subdirectory makes it more likely to
run into problems with the O(n) removal behavior of singly-linked lists.
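Roughly the point, as a sketch rather than ncdu's actual Link struct
(names made up here): unlinking a node only touches its neighbours.
```
const Link = struct {
    next: ?*Link = null,
    prev: ?*Link = null,

    // O(1): only the neighbours are patched up, no walk from the list
    // head as a singly-linked list would need.
    fn unlink(self: *Link, head: *?*Link) void {
        if (self.prev) |p| p.next = self.next else head.* = self.next;
        if (self.next) |n| n.prev = self.prev;
        self.prev = null;
        self.next = null;
    }
};
```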
Also found a bug that was present in the old scanning code as well;
fixed here and in c41467f240.
Benchmarks are looking very promising this time. This commit breaks a
lot, though:
- Hard link counting
- Refreshing
- JSON import
- JSON export
- Progress UI
- OOM handling is not thread-safe
All of which need to be reimplemented and fixed again. I also haven't
really tested this code very well yet, so there are likely to be bugs.
There's also a behavioral change: --exclude-kernfs is not checked on the
given root directory anymore, meaning that the filesystem the user asked
to scan is being scanned even if that's a 'kernfs'. I suspect that's
more sensible behavior.
The old scan.zig was quite messy and hard for me to reason about and
extend; this new sink API is looking to be less confusing. I hope it
stays that way as more features are added.
* rearrangement of entries in `std.os` and `std.c`, `std.posix`
finally extracted in https://github.com/ziglang/zig/pull/19354 .
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
* New `redundant inline keyword in comptime scope` error
introduced in https://github.com/ziglang/zig/pull/18227 .
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
Fixes these errors (introduced in https://github.com/ziglang/zig/pull/18017
and 6b1a823b2b ):
```
src/main.zig:290:13: error: local variable is never mutated
var line_ = line_fbs.getWritten();
^~~~~
src/main.zig:290:13: note: consider using 'const'
src/main.zig:450:17: error: local variable is never mutated
var path = std.fs.path.joinZ(allocator, &.{p, "ncdu", "config"}) catch unreachable;
^~~~
src/main.zig:450:17: note: consider using 'const'
...
```
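A minimal, self-contained illustration of the kind of change involved
(not the actual ncdu code):
```
const std = @import("std");

pub fn main() !void {
    var buf: [16]u8 = undefined;
    var fbs = std.io.fixedBufferStream(&buf);
    try fbs.writer().writeAll("hi");

    // `var` here now trips "local variable is never mutated";
    // a plain `const` avoids the error.
    const line = fbs.getWritten();
    std.debug.print("{s}\n", .{line});
}
```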
Will be included in the upcoming Zig 0.12; this fix is backward compatible:
ncdu still builds and runs fine on Zig 0.11.0.
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
With a little help from IRC:
<ifreund> Ayo: its probaly stupidly copying that array to the stack to do the
safety check, pretty sure there's an open issue on this still
<ifreund> you may be able to work around the compiler's stupidity by using a
pointer to the array or slice or something
<Ayo> ifreund: Yup, (&self.rdbuf)[self.rdoff] does the trick, thanks.
<ifreund> no problem! should get fixed eventually
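In sketch form (the surrounding struct is invented for the example,
matching the field names from the quote):
```
const Reader = struct {
    rdbuf: [65536]u8 = undefined,
    rdoff: usize = 0,

    fn peekByte(self: *Reader) u8 {
        // self.rdbuf[self.rdoff] may copy the whole array to the stack
        // for the safety check; indexing through *[65536]u8 avoids that.
        return (&self.rdbuf)[self.rdoff];
    }
};
```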
Still using a few embedded packed structs for those fields that benefit
from bit packing. This isn't much cleaner than using packed structs for
everything, but it does have better semantics. In particular, all fields
(except those inside nested packed structs) are now guaranteed to be
byte-aligned and I don't have to worry about the memory representation
of integers when pointer-casting between the different Entry types.
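Roughly what that layout looks like, with made-up field names rather
than ncdu's actual ones:
```
// Bit-packed flags stay in a nested packed struct...
const Flags = packed struct {
    etype: u3,
    blocks_known: bool,
    excluded: bool,
    _pad: u3 = 0,
};

// ...while the outer struct is a regular one, so `size` and `blocks` are
// guaranteed byte-aligned and pointer casts between Entry variants don't
// depend on the memory representation of the integers.
const Entry = struct {
    flags: Flags,
    size: u64,
    blocks: u64,
};
```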
Zig requires the alignment to be specified along with a fill character,
as otherwise digits after ':' are interpreted as part of the field
width.
The missing alignment specifier caused character codes < 0x10 to be
serialized incorrectly, producing an export file ncdu could not import.
For example, a character with code 1 would be serialized as '\u00 1'
instead of '\u0001'.
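As an illustration of the Zig behaviour in question (not necessarily the
exact format string used by the export code):
```
const std = @import("std");

pub fn main() !void {
    var buf: [16]u8 = undefined;
    // Fill '0' with an explicit '>' alignment: prints "\u0001"
    std.debug.print("{s}\n", .{try std.fmt.bufPrint(&buf, "\\u00{x:0>2}", .{@as(u8, 1)})});
    // Without the alignment, the '0' is read as part of the width and the
    // value gets space-padded instead: prints "\u00 1"
    std.debug.print("{s}\n", .{try std.fmt.bufPrint(&buf, "\\u00{x:02}", .{@as(u8, 1)})});
}
```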
A directory of test files can be generated using:
mkdir test_files; i=1; while [ $i -le 255 ]; do c="$(printf "$(printf "\\\\x%02xZ" "$i")")"; c="${c%Z}"; touch "test_files/$c"; i=$((i+1)); done
Behavioral changes:
- A single wildcard ('*') does not cross a directory boundary anymore.
Previously 'a*b' would also match 'a/b', but no other tool that I am
aware of matches paths that way. This change breaks compatibility with
old exclude patterns but improves consistency with other tools.
- Patterns with a trailing '/' now prevent recursing into the directory.
Previously any directory excluded with such a pattern would show up as
a regular directory with all its contents excluded, but now the
directory entry itself shows up as excluded.
- If the path given to ncdu matches one of the exclude patterns, the old
implementation would exclude every file/dir being read, whereas this new
implementation instead ignores the rule. Not quite sure how to
best handle this case, perhaps just exit with an error message?
Performance-wise, I haven't yet found a scenario where this
implementation is slower than the old one and it's *significantly*
faster in some cases - in particular when using a large number of
patterns, especially with literal paths and file names.
That's not to say this implementation is anywhere near optimal:
- A list of relevant patterns is constructed for each directory being
scanned. It may be possible to merge pattern lists that share
the same prefix, which could reduce both memory use and the number of
patterns that need to be matched upon entering a directory.
- A hash table with dynamic arrays as values is just garbage from a
memory allocation point of view.
- This still uses libc fnmatch(), but there's an opportunity to
precompile patterns for faster matching.
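For illustration, checking a single name against a set of patterns
through fnmatch() could look roughly like this (assuming libc is linked;
this is not ncdu's actual matching code):
```
const c = @cImport(@cInclude("fnmatch.h"));

// True if any pattern matches the given (null-terminated) name. Patterns
// are applied per path component; directory boundaries are handled by the
// per-directory pattern lists instead.
fn matchesAny(patterns: []const [*:0]const u8, name: [*:0]const u8) bool {
    for (patterns) |pat| {
        if (c.fnmatch(pat, name, 0) == 0) return true;
    }
    return false;
}
```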
I wasn't planning on (publicly) keeping up with Zig master before the
next release, but it's looking like 0.10 will mainly focus on the new
stage2 compiler and there might not be any significant language/stdlib
changes. If that's the case, might as well pull in this little change in
order to increase chances of ncdu working out of the box when 0.10 is
out.
While it's true that the root item can't be a special, the first item to
be added is not necessarily the root item. In particular, it isn't when
refreshing.
Probably fixes #194
That *usually* doesn't take longer than a few milliseconds, but it can
take a few seconds for some extremely large dirs, on very slow computers
or with optimizations disabled. Better to display a message than to make
it seem as if ncdu has stopped doing anything.
And also adjust the graph width calculation to do a better job when the
largest item is smaller than the number of columns used for the graph,
which would previously draw either nothing (if size = 0) or a full bar
(if size > 0).
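Something along these lines, as a sketch of the idea rather than the
actual drawing code:
```
// Scale each bar by the largest size in the directory. Multiplying before
// dividing keeps a usable gradient even when max_size < graph_cols, which
// previously collapsed to either an empty or a full bar.
fn barWidth(size: u64, max_size: u64, graph_cols: u32) u32 {
    if (size == 0 or max_size == 0) return 0;
    return @intCast((@as(u128, size) * graph_cols) / max_size);
}
```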
Fixes #172.
I'm tagging this as a "stable" 2.0 release because the 2.0-beta#
numbering will get confusing when I'm working on new features and fixes.
It's still only usable for people who can use the particular Zig version
that's required (0.9.0 currently) and it will certainly break on
different Zig versions. But once you have a working binary for a
supported arch, it's perfectly stable.
The --enable-* options also work for imported files; this fixes #120.
Most other options are not super useful on their own, but these will be
useful when there's a config file.
As alluded to in the previous commit. This approach keeps track of hard
link information in much the same way as ncdu 1.16, with the main
difference being that the actual /counting/ of hard link sizes is
deferred until the scan is complete, thus allowing the use of a more
efficient algorithm and amortizing the counting costs.
As an additional benefit, the links listing in the information window
now doesn't need a full scan through the in-memory tree anymore.
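Very roughly, the deferred accounting amounts to something like this
during the scan (type and field names are illustrative, not ncdu's):
```
const std = @import("std");

const LinkKey = struct { dev: u64, ino: u64 };
const LinkInfo = struct { size: u64, nlink: u32, seen: u32 };

// During the scan only this cheap bookkeeping happens; attributing the
// shared sizes is a single pass over the table after scanning completes.
fn noteLink(map: *std.AutoHashMap(LinkKey, LinkInfo), dev: u64, ino: u64, size: u64, nlink: u32) !void {
    const gop = try map.getOrPut(.{ .dev = dev, .ino = ino });
    if (!gop.found_existing)
        gop.value_ptr.* = .{ .size = size, .nlink = nlink, .seen = 0 };
    gop.value_ptr.seen += 1;
}
```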
A few memory usage benchmarks:
                1.16  2.0-beta1  this commit
root:            429        162          164
backup:         3969       1686         1601
many links:      155        194          106
many links2*:    155        602          106
(I'm surprised my backup dir had enough hard links for this to be an
improvement)
(* this is the same as the "many links" benchmarks, but with a few
parent directories added to increase the tree depth. 2.0-beta1 doesn't
like that at all)
Performance-wise, refresh and delete operations can still be improved a
bit.
While this simplifies the code a bit, it's a regression in the sense
that it increases memory use.
This commit is yak shaving for another hard link counting approach I'd
like to try out, which should be a *LOT* less memory hungry compared to
the current approach, even though it does, indeed, add the extra cost of
these parent node pointers.
I had planned to check out async functions here so I could avoid
recursing on the stack altogether, but it's still unclear to me how
to safely call into libc from async functions so let's wait for all that
to get fleshed out a bit more.
Sticking to "comptime-known" error types will essentially just bring
in *every* possible error anyway, so might as well take advantage of
@errorName.
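That is, rather than enumerating error sets, something along these lines
(hypothetical call site, not the actual ncdu code):
```
const std = @import("std");

fn scanDir(path: []const u8) !void {
    var dir = try std.fs.cwd().openDir(path, .{});
    defer dir.close();
    // ... scan ...
}

fn scanOrWarn(path: []const u8) void {
    scanDir(path) catch |err| {
        // Whatever the concrete error set ends up being, @errorName turns
        // it into a printable string at runtime.
        std.log.warn("error reading {s}: {s}", .{ path, @errorName(err) });
    };
}
```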
This complicated the scan code more than I had anticipated and has a
few inherent bugs with respect to calculating shared hardlink sizes.
Still, the merge approach avoids creating a full copy of the subtree, so
that's another memory usage related win compared to the C version.
On the other hand, it does leak memory if nodes can't be reused.
Not tested quite as well as it should be, so I'm sure there are bugs.
Two differences compared to the C version:
- You can now select individual paths in the listing, pressing enter
will open the selected path in the browser window.
- Creating this listing is much slower and requires, in the worst case,
a full traversal through the in-memory tree. I've tested this without
the same-dev and shared-parent optimizations (i.e. worst case) on an
import with 30M files and performance was still quite acceptable - the
listing completed in a second - so I didn't bother adding a loading
indicator. On slower systems and even larger trees this may be a
little annoying, though.
(also, calling nonl() apparently breaks detection of the return key;
neither \n nor KEY_ENTER is emitted for some reason)
The good news is: apart from this little thing, everything seems to just
work(tm) on FreeBSD. Think I had more trouble with C because of minor
header file differences.
I had used them as a HashSet with mutable keys already in order to avoid
padding problems. This is not always necessary anymore now that Zig's
new HashMap uses separate arrays for keys and values, but I still need
the HashSet trick for the link_count nodes table, as the key itself
would otherwise have padding.
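The trick, in rough form (not ncdu's actual types): hash and compare only
the identity part of the key, then mutate the payload in place through
key_ptr.
```
const std = @import("std");

// Only `ino` participates in hashing/equality, so `count` can be updated
// in place through key_ptr without invalidating the table.
const Node = struct {
    ino: u64,   // identity
    count: u32, // mutable payload
};

const Ctx = struct {
    pub fn hash(_: Ctx, n: Node) u64 {
        return std.hash.Wyhash.hash(0, std.mem.asBytes(&n.ino));
    }
    pub fn eql(_: Ctx, a: Node, b: Node) bool {
        return a.ino == b.ino;
    }
};

const Set = std.HashMap(Node, void, Ctx, std.hash_map.default_max_load_percentage);

fn bump(set: *Set, ino: u64) !void {
    const gop = try set.getOrPut(.{ .ino = ino, .count = 0 });
    gop.key_ptr.count += 1;
}
```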
Under the assumption that there are no external references to files
mentioned in the dump, i.e. a file's nlink count matches the number of
times the file occurs in the dump.
This machinery could also be used for regular scans, when you want to
scan an individual directory without caring about external hard links.
Maybe that should be the default, even? Not sure...
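In rough terms (hypothetical names), the check that assumption enables:
```
const LinkInfo = struct { nlink: u32, seen: u32 };

// Under the stated assumption: every directory entry referring to the
// inode is present in the dump, so its size can be attributed from the
// dump alone, without consulting the filesystem.
fn fullyContainedInDump(info: LinkInfo) bool {
    return info.seen == info.nlink;
}
```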