Turns out that zstd can consume compressed data without returning any
decompressed data when the input buffer isn't full enough. I just
increased the input buffer as a workaround.
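
For illustration, a rough sketch of the underlying pitfall: a streaming decompressor may consume input and return nothing, and that must not be mistaken for end of stream. This is not the actual fix (which only enlarges the input buffer). It uses Python's zlib as a stand-in for zstd, and decompress_stream is a made-up helper name.

```python
import zlib

def decompress_stream(chunks):
    """Decompress an iterable of byte chunks (zlib stands in for zstd here)."""
    d = zlib.decompressobj()
    out = bytearray()
    empty_reads = 0
    for chunk in chunks:
        piece = d.decompress(chunk)
        if not piece:
            # Input was consumed but no output is available yet.
            # This is not EOF, so keep feeding the decompressor.
            empty_reads += 1
        out += piece
        if d.eof:
            break
    if not d.eof:
        raise ValueError("truncated stream")
    return bytes(out), empty_reads

payload = b"ncdu " * 1000
compressed = zlib.compress(payload)
# Feed one byte at a time to mimic an input buffer that is too small.
data, empty_reads = decompress_stream(
    compressed[i:i + 1] for i in range(len(compressed))
)
```

The loop only stops on the decompressor's own end-of-stream flag, never on an empty output.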
Fixes #245
json/scanner.zig in std notes inconsistencies in the standard as to
whether unpaired surrogate halves are allowed. That implementation
disallows them and so does this commit.
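
A minimal sketch of what "disallowing unpaired surrogate halves" means, assuming the \u escapes of one string have already been read into a list of 16-bit code units. decode_u_escapes is a hypothetical helper, not the code in this commit.

```python
def decode_u_escapes(units):
    """Turn UTF-16 code units from JSON u-escapes into text, rejecting unpaired surrogates."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows directly.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            lo = units[i + 1]
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate without a preceding high surrogate.
            raise ValueError("unpaired low surrogate")
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)
```

For example, the pair D83D DE00 decodes to U+1F600, while D83D on its own is rejected.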
This prevents displaying invalid zero values or writing such values out
in JSON/bin exports. Very old issue, actually, but with the new binfmt
experiments it's finally started annoying me.
This isn't the low-memory browsing experience I was hoping to implement,
yet, but it serves as a good way to test the new format and such a
sink-based import is useful to have anyway.
Performance is much better than I had expected, and I haven't even
profiled anything yet.
The goals of this format are:
- Streaming parallel export with minimal mandatory buffering.
- Exported data includes cumulative directory stats, so the reader doesn't
  have to go through the entire tree to calculate these.
- Fast-ish directory listings without reading the entire file.
- Built-in compression.
Current implementation is missing compression, hardlink counting and
actually reading the file. Also need to tune and measure stuff.
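
To illustrate the "cumulative directory stats" goal, here is a rough sketch assuming a post-order walk in which each directory record is written after its children and carries the totals for its whole subtree. The record layout and the export helper are invented for this example and say nothing about the real on-disk format.

```python
def export(node, sink):
    """Post-order walk: emit children first, then the directory with cumulative totals.

    Returns (cumulative_size, cumulative_item_count) for the subtree.
    """
    total_size, total_items = node["size"], 1
    for child in node.get("children", []):
        size, items = export(child, sink)
        total_size += size
        total_items += items
    # Each record already holds subtree totals, so a reader can list a
    # directory without walking everything below it.
    sink.append((node["name"], total_size, total_items))
    return total_size, total_items

tree = {
    "name": "/", "size": 4, "children": [
        {"name": "a", "size": 10},
        {"name": "d", "size": 4, "children": [{"name": "b", "size": 6}]},
    ],
}
records = []
export(tree, records)
```

Only the running totals of the ancestors of the current entry need to be held in memory during the walk.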
The exporter would write "othfs" while the import code was expecting
"otherfs". This bug also exists in the 1.x branch and is probably as old
as the JSON import/export feature. D'oh.
Normalized the export to use "otherfs" now (which is what all versions can
read correctly) and fixed the importer to also accept "othfs" (which
is what all previous versions exported).
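
The compatibility rule boils down to "write one spelling, accept both". A small sketch, with made-up names (EXPORT_NAME, IMPORT_ALIASES, normalize_on_import) and without assuming anything about where the value lives in the JSON:

```python
EXPORT_NAME = "otherfs"                # what the exporter writes from now on
IMPORT_ALIASES = {"otherfs", "othfs"}  # "othfs" is what older versions wrote

def normalize_on_import(value):
    """Map either spelling to the canonical one; leave other values untouched."""
    return EXPORT_NAME if value in IMPORT_ALIASES else value
```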
Profiling showed that string parsing was a bottleneck. We rarely need
the full power of JSON strings, though, so we can optimize for the
common case of plain strings without escape codes. Keeping the slower
string parser as fallback, of course.
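
A sketch of the fast-path idea, not the parser from this commit: look for the closing quote, and if the text in between contains no backslash or control character, take it as-is. Anything else goes to a full parser. Here Python's json.JSONDecoder.raw_decode plays the role of the slow fallback, and parse_string is a hypothetical name.

```python
import json

_slow = json.JSONDecoder()

def parse_string(buf, pos):
    """Parse the JSON string starting at buf[pos]; return (value, next_pos)."""
    if buf[pos] != '"':
        raise ValueError("expected a string")
    end = buf.find('"', pos + 1)
    if end == -1:
        raise ValueError("unterminated string")
    body = buf[pos + 1:end]
    # Fast path: no escapes and no raw control characters, so the first
    # quote found really is the terminator and the body needs no decoding.
    if "\\" not in body and not any(c < " " for c in body):
        return body, end + 1
    # Slow path: let the full parser deal with escapes and errors.
    return _slow.raw_decode(buf, pos)
```

If the first quote found is actually an escaped one, the body necessarily contains a backslash, so the fast path never returns a truncated string.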
Previous import code did not correctly handle a non-empty directory with
the "read_error" flag set. I have no clue if that can ever happen in
practice, but at least ncdu 1.x can theoretically emit such JSON so we
handle it now.
Also fixes mtime display of "special" files, i.e. don't display the
mtime of the parent directory - that's confusing.
Split a generic-ish JSON parser out of the import code for easier
reasoning and implemented a few more performance improvements as well.
New code is ~30% faster in both ReleaseSafe and ReleaseFast.