Commit graph

20 commits

Author SHA1 Message Date
Yorhel
5929bf57cc Keep track of uncounted hard links to speed up refresh+delete operations 2021-07-28 20:12:50 +02:00
Yorhel
ba14c0938f Fix Dir.fmtPath() when given the root dir 2021-07-28 20:09:48 +02:00
Yorhel
0d314ca0ca Implement a more efficient hard link counting approach
As aluded to in the previous commit. This approach keeps track of hard
links information much the same way as ncdu 1.16, with the main
difference being that the actual /counting/ of hard link sizes is
deferred until the scan is complete, thus allowing the use of a more
efficient algorithm and amortizing the counting costs.

As an additional benefit, the links listing in the information window
now doesn't need a full scan through the in-memory tree anymore.

A few memory usage benchmarks:

              1.16  2.0-beta1  this commit
root:          429        162          164
backup:       3969       1686         1601
many links:    155        194          106
many links2*:  155        602          106

(I'm surprised my backup dir had enough hard links for this to be an
improvement)
(* this is the same as the "many links" benchmarks, but with a few
parent directories added to increase the tree depth. 2.0-beta1 doesn't
like that at all)

Performance-wise, refresh and delete operations can still be improved a
bit.
2021-07-28 10:35:56 +02:00
Yorhel
36bc405a69 Add parent node pointers to Dir struct + remove Parents abstraction
While this simplifies the code a bit, it's a regression in the sense
that it increases memory use.

This commit is yak shaving for another hard link counting approach I'd
like to try out, which should be a *LOT* less memory hungry compared to
the current approach. Even though it does, indeed, add an extra cost of
these parent node pointers.
2021-07-26 14:03:10 +02:00
Yorhel
c8636b8982 Add REUSE-compliant copyright headers 2021-07-18 11:50:50 +02:00
Yorhel
e9c8d12c0f Store Ext before Entry
Which is slightly simpler and should provide a minor performance
improvement.
2021-07-16 19:13:04 +02:00
Yorhel
6c2ab5001c Implement directory refresh
This complicated the scan code more than I had anticipated and has a
few inherent bugs with respect to calculating shared hardlink sizes.

Still, the merge approach avoids creating a full copy of the subtree, so
that's another memory usage related win compared to the C version.
On the other hand, it does leak memory if nodes can't be reused.

Not quite as well tested as I should have, so I'm sure there's bugs.
2021-07-13 13:45:08 +02:00
Yorhel
ff3e3bccc6 Add link path listing to information window
Two differences compared to the C version:
- You can now select individual paths in the listing, pressing enter
  will open the selected path in the browser window.
- Creating this listing is much slower and requires, in the worst case,
  a full traversal through the in-memory tree. I've tested this without
  the same-dev and shared-parent optimizations (i.e. worst case) on an
  import with 30M files and performance was still quite acceptable - the
  listing completed in a second - so I didn't bother adding a loading
  indicator. On slower systems and even larger trees this may be a
  little annoying, though.

(also, calling nonl() apparently breaks detection of the return key,
neither \n nor KEY_ENTER are emitted for some reason)
2021-07-06 18:33:31 +02:00
Yorhel
618972b82b Add item info window
Doesn't display the item's path anymore (seems rather redundant) but
adds a few more other fields.
2021-06-11 13:12:00 +02:00
Yorhel
40f9dff5d6 Update for Zig 0.8 HashMap changes
I had used them as a HashSet with mutable keys already in order to avoid
padding problems. This is not always necessary anymore now that Zig's
new HashMap uses separate arrays for keys and values, but I still need
the HashSet trick for the link_count nodes table, as the key itself
would otherwise have padding.
2021-06-07 10:57:30 +02:00
Yorhel
e6b2cff356 Support hard link counts when importing old ncdu dumps
Under the assumption that there are no external references to files
mentioned in the dump, i.e. a file's nlink count matches the number of
times the file occurs in the dump.

This machinery could also be used for regular scans, when you want to
scan an individual directory without caring about external hard links.
Maybe that should be the default, even? Not sure...
2021-06-01 13:00:58 +02:00
Yorhel
59ef5fd27b Improved error reporting + minor cleanup 2021-05-29 19:22:00 +02:00
Yorhel
2390308883 Handle allocation failures
In a similar way to the C version of ncdu: by wrapping malloc(). It's
simpler to handle allocation failures at the source to allow for easy
retries, pushing the retries up the stack will complicate code somewhat
more. Likewise, this is a best-effort approach to handling OOM,
allocation failures in ncurses aren't handled and display glitches may
occur when we get an OOM inside a drawing function.

This is a somewhat un-Zig-like way of handling errors and adds
scary-looking 'catch unreachable's all over the code, but that's okay.
2021-05-29 13:18:23 +02:00
Yorhel
c077c5bed5 Implement JSON file import
Performance is looking great, but the code is rather ugly and
potentially buggy. Also doesn't handle hard links without an "nlink"
field yet.

Error handling of the import code is different from what I've been doing
until now. That's intentional, I'll change error handling of other
pieces to call ui.die() directly rather than propagating error enums.
The approach is less testable but conceptually simpler, it's perfectly
fine for a tiny application like ncdu.
2021-05-29 10:54:45 +02:00
Yorhel
9474aa4329 Only keep total_items + Zig test update + pointless churn 2021-05-24 11:02:26 +02:00
Yorhel
7b3ebf9241 Implement all existing browsing display options + some fixes
I plan to add more display options, but ran out of keys to bind.
Probably going for a quick-select menu thingy so that we can keep the
old key bindings for people accustomed to it.

The graph width algorithm is slightly different, but I think this one's
a minor improvement.
2021-05-23 17:34:40 +02:00
Yorhel
27cb599e22 More UI stuff + shave off 16 bytes from model.Dir
I initially wanted to keep a directory's block count and size as a
separate field so that exporting an in-memory tree to a JSON dump would
be easier to do, but that doesn't seem like a common operation to
optimize for. We'll probably need the algorithms to subtract sub-items
from directory counts anyway, so such an export can still be
implemented, albeit slower.
2021-05-06 19:20:55 +02:00
Yorhel
3e27d37012 Correct int truncating/saturating + avoid one toPosixPath() 2021-05-01 11:10:24 +02:00
Yorhel
097f49d9e6 Fix some scanning bugs + support --exclude-caches and --follow-symlinks
Supporting kernfs checking is going to be a bit more annoying.
And so is exclude patterns. Ugh.
2021-04-30 19:15:29 +02:00
Yorhel
0783d35793 WIP: Experimenting with a rewrite to Zig & a new data model
The new data model is supposed to solve a few problems with ncdu 1.x's
'struct dir':
- Reduce memory overhead,
- Fix extremely slow counting of hard links in some scenarios
  (issue #121)
- Add support for counting 'shared' data with other directories
  (issue #36)

Quick memory usage comparison of my root directory with ~3.5 million
files (normal / extended mode):

  ncdu 1.15.1:     379M / 451M
  new (unaligned): 145M / 178M
  new (aligned):   155M / 200M

There's still a /lot/ of to-do's left before this is usable, however,
and there's a bunch of issues I haven't really decided on yet, such as
which TUI library to use.

Backporting this data model to the C version of ncdu is also possible,
but somewhat painful. Let's first see how far I get with Zig.
2021-04-29 12:48:52 +02:00