This isn't the low-memory browsing experience I was hoping to implement,
yet, but it serves as a good way to test the new format and such a
sink-based import is useful to have anyway.
Performance is much better than I had expected, and I haven't even
profiled anything yet.
The goals of this format being:
- Streaming parallel export with minimal mandatory buffering.
- Exported data includes cumulative directory stats, so reader doesn't
have to go through the entire tree to calculate these.
- Fast-ish directory listings without reading the entire file.
- Built-in compression.
Current implementation is missing compression, hardlink counting and
actually reading the file. Also need to tune and measure stuff.
Previous import code did not correctly handle a non-empty directory with
the "read_error" flag set. I have no clue if that can ever happen in
practice, but at least ncdu 1.x can theoretically emit such JSON so we
handle it now.
Also fixes mtime display of "special" files. i.e. don't display the
mtime of the parent directory - that's confusing.
Split a generic-ish JSON parser out of the import code for easier
reasoning and implemented a few more performance improvements as well.
New code is ~30% faster in both ReleaseSafe and ReleaseFast.
Benchmarks are looking very promising this time. This commit breaks a
lot, though:
- Hard link counting
- Refreshing
- JSON import
- JSON export
- Progress UI
- OOM handling is not thread-safe
All of which needs to be reimplemented and fixed again. Also haven't
really tested this code very well yet so there's likely to be bugs.
There's also a behavioral change: --exclude-kernfs is not checked on the
given root directory anymore, meaning that the filesystem the user asked
to scan is being scanned even if that's a 'kernfs'. I suspect that's
more sensible behavior.
The old scan.zig was quite messy and hard for me to reason about and
extend, this new sink API is looking to be less confusing. I hope it
stays that way as more features are added.
* rearrangment of entries in `std.os` and `std.c`, `std.posix`
finally extracted in https://github.com/ziglang/zig/pull/19354 .
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
Fixes these errors (introduced in https://github.com/ziglang/zig/pull/18017
and 6b1a823b2b ):
```
src/main.zig:290:13: error: local variable is never mutated
var line_ = line_fbs.getWritten();
^~~~~
src/main.zig:290:13: note: consider using 'const'
src/main.zig:450:17: error: local variable is never mutated
var path = std.fs.path.joinZ(allocator, &.{p, "ncdu", "config"}) catch unreachable;
^~~~
src/main.zig:450:17: note: consider using 'const'
...
```
Will be included in future Zig 0.12, this fix is backward compatible:
ncdu still builds and runs fine on Zig 0.11.0.
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
Behavioral changes:
- A single wildcard ('*') does not cross directory boundary anymore.
Previously 'a*b' would also match 'a/b', but no other tool that I am
aware of matches paths that way. This change breaks compatibility with
old exclude patterns but improves consistency with other tools.
- Patterns with a trailing '/' now prevent recursing into the directory.
Previously any directory excluded with such a pattern would show up as
a regular directory with all its contents excluded, but now the
directory entry itself shows up as excluded.
- If the path given to ncdu matches one of the exclude patterns, the old
implementation would exclude every file/dir being read, this new
implementation would instead ignore the rule. Not quite sure how to
best handle this case, perhaps just exit with an error message?
Performance wise, I haven't yet found a scenario where this
implementation is slower than the old one and it's *significantly*
faster in some cases - in particular when using a large amount of
patterns, especially with literal paths and file names.
That's not to say this implementation is anywhere near optimal:
- A list of relevant patterns is constructed for each directory being
scanned. It may be possible to merge pattern lists that share
the same prefix, which could both reduce memory use and the number of
patterns that need to be matched upon entering a directory.
- A hash table with dynamic arrays as values is just garbage from a
memory allocation point of view.
- This still uses libc fnmatch(), but there's an opportunity to
precompile patterns for faster matching.
And also adjust the graph width calculation to do a better job when the
largest item is smaller than the number of columns used for the graph,
which would previously draw either nothing (if size = 0) or a full bar
(if size > 0).
Fixes#172.
I'm tagging this as a "stable" 2.0 release because the 2.0-beta#
numbering will get confusing when I'm working on new features and fixes.
It's still only usable for people who can use the particular Zig version
that's required (0.9.0 currently) and it will certainly break on
different Zig versions. But once you have a working binary for a
supported arch, it's perfectly stable.