This ended up a little different than I had originally planned.
The bad news is that my idea for the 'prevlnk' references wasn't going
to work out. First, the reader has no efficient way to determine the
head reference of this list, and implementing a lookup table would be
pretty costly and complex. Second, even with those references working,
they'd be pretty useless because there's no way to go from an itemref
to a full path. I don't see an easy way to solve these problems, so I'm
afraid the efficient hardlink list feature will have to be disabled
when reading from this new format. :(
The good news is that removing these references simplifies the hardlink
counting implementation and removes the requirement for a global inode
map and associated mutex. \o/
Performance is looking really good so far, too.
The goals of this format are:
- Streaming parallel export with minimal mandatory buffering.
- Exported data includes cumulative directory stats, so the reader
  doesn't have to go through the entire tree to calculate these.
- Fast-ish directory listings without reading the entire file.
- Built-in compression.
Current implementation is missing compression, hardlink counting and
actually reading the file. Also need to tune and measure stuff.
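
To illustrate the "cumulative directory stats" goal, here is a minimal
sketch in Python (not Zig, and not the actual export format). The
`Entry` type and `annotate` function are hypothetical names made up for
this example; the point is only that a post-order walk can total each
directory once at export time, so a reader never has to re-walk the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical in-memory node, not the real on-disk layout.
    name: str
    size: int = 0                                   # own size in bytes
    children: list["Entry"] = field(default_factory=list)
    cum_size: int = 0                               # own size + all descendants
    cum_items: int = 0                              # number of descendants


def annotate(entry: Entry) -> None:
    """Post-order walk: every child is totalled before its parent."""
    entry.cum_size = entry.size
    entry.cum_items = 0
    for child in entry.children:
        annotate(child)
        entry.cum_size += child.cum_size
        entry.cum_items += 1 + child.cum_items


root = Entry("/", 0, [
    Entry("a", 10),
    Entry("d", 4, [Entry("b", 20), Entry("c", 30)]),
])
annotate(root)
```

After `annotate(root)`, each directory carries its own totals, which is
the property a reader would rely on for fast listings.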
The exporter would write "othfs" while the import code was expecting
"otherfs". This bug also exists in the 1.x branch and is probably as old
as the JSON import/export feature. D'oh.
Normalized the export to use "otherfs" now (which is what all versions can
read correctly) and fixed the importer to also accept "othfs" (which
is what all previous versions exported).
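
The compatibility rule described above can be sketched as follows. This
is Python for brevity, not the actual Zig code; `is_other_fs` and
`export_item` are hypothetical helper names, and the item layout is
simplified to a flat dict.

```python
import json

# "othfs" is what previous versions exported;
# "otherfs" is what all versions can read.
LEGACY_KEY = "othfs"
CANONICAL_KEY = "otherfs"


def is_other_fs(item: dict) -> bool:
    """Import side: accept both spellings of the flag."""
    return bool(item.get(CANONICAL_KEY) or item.get(LEGACY_KEY))


def export_item(name: str, other_fs: bool) -> str:
    """Export side: always write the canonical spelling."""
    item = {"name": name}
    if other_fs:
        item[CANONICAL_KEY] = True
    return json.dumps(item)
```

Writing only the canonical key while reading both keeps old exports
importable without changing what older readers see.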
Saves 20 KiB off of the ReleaseSafe + stripped binary. That feature is
(1) rarely used and (2) rarely deals with large lists, so no point
spending that much space on an efficient sort implementation.
When you improve performance in one part of the code, another part
becomes the new bottleneck. The slow JSON writer was very noticeable
with the parallel export option.
This provides a 20% improvement on total run-time when scanning a hot
directory with 8 threads.
This adds another +4 bytes* to Link nodes, but allows for the in-memory
tree to be properly exported to JSON, which we'll need for multithreaded
export. It's also slightly nicer conceptually, as we can now detect
inconsistencies without throwing away the actual data, so we have a
better chance of recovering on partial refresh. Still unlikely, anyway,
but whatever.
(* but saves 4+ bytes per unique inode in the inode map, so the memory
increase is only noticeable when links are repeated in the scanned tree.
Admittedly, that may be the common case.)
These are now always added as a separate dir followed by setReadError().
JSON export can catch these cases when the error happens before any
entries are read, which is the common error scenario.
Profiling showed that string parsing was a bottleneck. We rarely need
the full power of JSON strings, though, so we can optimize for the
common case of plain strings without escape codes. Keeping the slower
string parser as fallback, of course.
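
A rough sketch of the fast-path idea, in Python rather than Zig. The
function name `parse_string` is made up for this example, and the slow
path here simply defers to Python's standard `json` decoder instead of a
hand-written parser; the real implementation may differ in every detail.

```python
import json

_decoder = json.JSONDecoder()


def parse_string(buf: str, pos: int) -> tuple[str, int]:
    """Parse the JSON string whose opening quote is at buf[pos].

    Returns (decoded value, index just past the closing quote).
    """
    if buf[pos] != '"':
        raise ValueError("expected opening quote")
    end = buf.find('"', pos + 1)
    if end == -1:
        raise ValueError("unterminated string")
    body = buf[pos + 1:end]
    # Fast path: no backslash and no control characters, so the first
    # quote we found is the real terminator and the raw slice is
    # already the decoded value.
    if "\\" not in body and all(ch >= " " for ch in body):
        return body, end + 1
    # Slow path: escapes (or invalid input) go to the full decoder.
    return _decoder.raw_decode(buf, pos)
```

If the slice up to the first quote contains a backslash, that quote may
itself be escaped, which is why the check has to cover the whole slice
before the fast path is taken.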
Previous import code did not correctly handle a non-empty directory with
the "read_error" flag set. I have no clue if that can ever happen in
practice, but at least ncdu 1.x can theoretically emit such JSON so we
handle it now.
Also fixes mtime display of "special" files, i.e. don't display the
mtime of the parent directory - that's confusing.
Split a generic-ish JSON parser out of the import code for easier
reasoning and implemented a few more performance improvements as well.
New code is ~30% faster in both ReleaseSafe and ReleaseFast.
Ended up turning the Links into a doubly-linked list, because the
current approach of refreshing a subdirectory makes it more likely to
run into problems with the O(n) removal behavior of singly-linked lists.
Also found a bug that was present in the old scanning code as well;
fixed here and in c41467f240.
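
For reference, a minimal sketch of why the doubly-linked variant makes
removal cheap. This is Python, not the Zig code; the `Link` class here
is a hypothetical stand-in, and the circular layout is an assumption
made to keep the example short.

```python
class Link:
    """Hypothetical node: one per hard link sharing an inode."""

    def __init__(self, name):
        self.name = name
        self.prev = self   # circular: a lone node points at itself
        self.next = self

    def insert_after(self, other):
        """Add `other` to this node's ring, right after self."""
        other.prev = self
        other.next = self.next
        self.next.prev = other
        self.next = other

    def remove(self):
        """Unlink in O(1): no walk is needed to find the predecessor."""
        self.prev.next = self.next
        self.next.prev = self.prev
        self.prev = self.next = self


def ring(node):
    """Collect the names in the ring, starting at `node`."""
    out, cur = [node.name], node.next
    while cur is not node:
        out.append(cur.name)
        cur = cur.next
    return out


a, b, c = Link("a"), Link("b"), Link("c")
a.insert_after(b)
b.insert_after(c)
b.remove()
```

With a singly-linked list, `remove` would have to walk from some known
node to find the predecessor of `b`, which is the O(n) behavior
mentioned above.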
Benchmarks are looking very promising this time. This commit breaks a
lot, though:
- Hard link counting
- Refreshing
- JSON import
- JSON export
- Progress UI
- OOM handling is not thread-safe
All of which need to be reimplemented and fixed again. Also haven't
really tested this code very well yet, so there are likely to be bugs.
There's also a behavioral change: --exclude-kernfs is not checked on the
given root directory anymore, meaning that the filesystem the user asked
to scan is being scanned even if that's a 'kernfs'. I suspect that's
more sensible behavior.
The old scan.zig was quite messy and hard for me to reason about and
extend; this new sink API is looking to be less confusing. I hope it
stays that way as more features are added.
* Rearrangement of entries in `std.os` and `std.c`; `std.posix` was
finally extracted in https://github.com/ziglang/zig/pull/19354 .
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
* ZBS was reorganized around `Module` struct:
https://www.github.com/ziglang/zig/pull/18160 .
* Changes for ReleaseSafe: error return tracing is now off by default.
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
* New `redundant inline keyword in comptime scope` error
introduced in https://github.com/ziglang/zig/pull/18227 .
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
Still not a fan of roff, but even less a fan of build system stuff and a
dependency on a tool that is getting less ubiquitous over time.
I've removed the "hard links" section from the man page for now. Such a
section might be useful, but much of it was outdated.
Fixes these errors (introduced in https://github.com/ziglang/zig/pull/18017
and 6b1a823b2b ):
```
src/main.zig:290:13: error: local variable is never mutated
        var line_ = line_fbs.getWritten();
            ^~~~~
src/main.zig:290:13: note: consider using 'const'
src/main.zig:450:17: error: local variable is never mutated
            var path = std.fs.path.joinZ(allocator, &.{p, "ncdu", "config"}) catch unreachable;
                ^~~~
src/main.zig:450:17: note: consider using 'const'
...
```
That compiler change will be included in the future Zig 0.12. This fix
is backward compatible: ncdu still builds and runs fine on Zig 0.11.0.
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>