That *usually* doesn't take longer than a few milliseconds, but it can
take a few seconds for some extremely large dirs, on very slow computers
or with optimizations disabled. Better to display a message than to make
it seem as if ncdu has stopped doing anything.
And also adjust the graph width calculation to do a better job when the
largest item is smaller than the number of columns used for the graph,
which would previously draw either nothing (if size = 0) or a full bar
(if size > 0).
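A minimal sketch of the intended mapping, with hypothetical names (the
actual drawing code is structured differently):

    #include <stdint.h>

    /* Map an item's size to a bar width in [0, cols]: any non-zero size
     * gets at least one column, and only the largest item fills the bar. */
    static int graph_width(int cols, uint64_t size, uint64_t max) {
        if (size == 0 || max == 0)
            return 0;
        uint64_t w = (size * (uint64_t)cols + max - 1) / max;  /* ceil */
        return w > (uint64_t)cols ? cols : (int)w;
    }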
Fixes #172.
I'm tagging this as a "stable" 2.0 release because the 2.0-beta#
numbering will get confusing when I'm working on new features and fixes.
It's still only usable for people who can use the particular Zig version
that's required (0.9.0 currently) and it will certainly break on
different Zig versions. But once you have a working binary for a
supported arch, it's perfectly stable.
The --enable-* options also work for imported files; this fixes #120.
Most other options are not super useful on their own, but these will be
useful when there's a config file.
As alluded to in the previous commit. This approach keeps track of hard
link information in much the same way as ncdu 1.16, with the main
difference being that the actual /counting/ of hard link sizes is
deferred until the scan is complete, thus allowing the use of a more
efficient algorithm and amortizing the counting costs.
As an additional benefit, the links listing in the information window
now doesn't need a full scan through the in-memory tree anymore.
A few memory usage benchmarks:
                  1.16   2.0-beta1   this commit
  root:            429         162           164
  backup:         3969        1686          1601
  many links:      155         194           106
  many links2*:    155         602           106
(I'm surprised my backup dir had enough hard links for this to be an
improvement)
(* this is the same as the "many links" benchmarks, but with a few
parent directories added to increase the tree depth. 2.0-beta1 doesn't
like that at all)
Performance-wise, refresh and delete operations can still be improved a
bit.
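Very roughly, the deferred counting pass boils down to something like the
C sketch below; the names, the flat array and the shared-size bookkeeping
are my own simplification, not the actual Zig code:

    #include <stdint.h>
    #include <stdlib.h>

    struct dir;                               /* in-memory tree node */
    void add_size(struct dir *, uint64_t);    /* hypothetical helper */

    struct link_ent { uint64_t dev, ino, size; struct dir *parent; };

    static int cmp_link(const void *a, const void *b) {
        const struct link_ent *x = a, *y = b;
        if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
        if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
        return 0;
    }

    /* Run once after the scan: sort the recorded links so that all links
     * to the same inode are adjacent, then charge each file's size only
     * once instead of recounting it for every link seen during the scan. */
    static void count_hardlinks(struct link_ent *l, size_t n) {
        qsort(l, n, sizeof *l, cmp_link);
        for (size_t i = 0; i < n; i++)
            if (i == 0 || l[i].dev != l[i-1].dev || l[i].ino != l[i-1].ino)
                add_size(l[i].parent, l[i].size);
    }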
While this simplifies the code a bit, it's a regression in the sense
that it increases memory use.
This commit is yak shaving for another hard link counting approach I'd
like to try out, which should be a *LOT* less memory hungry than the
current approach, even though it does, indeed, add the extra cost of
these parent node pointers.
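For illustration only: with a parent pointer on every node, totals can be
propagated iteratively rather than by recursing through the tree
(hypothetical field names):

    #include <stdint.h>

    struct node { struct node *parent; uint64_t total_size; };

    /* Charge a size to a node and all of its ancestors without recursion. */
    static void add_to_ancestors(struct node *n, uint64_t size) {
        for (; n; n = n->parent)
            n->total_size += size;
    }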
I had planned to check out async functions here so I could avoid
recursing onto the stack altogether, but it's still unclear to me how
to safely call into libc from async functions, so let's wait for all
that to get fleshed out a bit more.
Sticking to "compiletime-known" error types will essentially just bring
in *every* possible error anyway, so might as well take advantage of
@errorName.
This complicated the scan code more than I had anticipated and has a
few inherent bugs with respect to calculating shared hardlink sizes.
Still, the merge approach avoids creating a full copy of the subtree, so
that's another memory usage related win compared to the C version.
On the other hand, it does leak memory if nodes can't be reused.
Not quite as well tested as I should have, so I'm sure there's bugs.
Two differences compared to the C version:
- You can now select individual paths in the listing, pressing enter
will open the selected path in the browser window.
- Creating this listing is much slower and requires, in the worst case,
a full traversal through the in-memory tree. I've tested this without
the same-dev and shared-parent optimizations (i.e. worst case) on an
import with 30M files and performance was still quite acceptable - the
listing completed in a second - so I didn't bother adding a loading
indicator. On slower systems and even larger trees this may be a
little annoying, though.
(also, calling nonl() apparently breaks detection of the return key,
neither \n nor KEY_ENTER are emitted for some reason)
The good news is: apart from this little thing, everything seems to just
work(tm) on FreeBSD. Think I had more trouble with C because of minor
header file differences.
I had used them as a HashSet with mutable keys already in order to avoid
padding problems. This is not always necessary anymore now that Zig's
new HashMap uses separate arrays for keys and values, but I still need
the HashSet trick for the link_count nodes table, as the key itself
would otherwise have padding.
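The padding issue is the usual problem with hashing or comparing a struct
as raw bytes; a C analogy, purely for illustration:

    #include <stdint.h>

    struct key {
        uint32_t dev;   /* 4 bytes            */
                        /* 4 bytes of padding */
        uint64_t ino;   /* 8 bytes            */
    };
    /* Hashing or memcmp()ing the full sizeof(struct key) would include the
     * uninitialized padding bytes, so the struct either has to be zeroed
     * before the fields are set, or the fields hashed/compared one by one. */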
Under the assumption that there are no external references to files
mentioned in the dump, i.e. a file's nlink count matches the number of
times the file occurs in the dump.
This machinery could also be used for regular scans, when you want to
scan an individual directory without caring about external hard links.
Maybe that should be the default, even? Not sure...
In a similar way to the C version of ncdu: by wrapping malloc(). It's
simpler to handle allocation failures at the source to allow for easy
retries; pushing the retries up the stack would complicate the code
somewhat more. Likewise, this is a best-effort approach to handling OOM:
allocation failures in ncurses aren't handled, and display glitches may
occur when we get an OOM inside a drawing function.
This is a somewhat un-Zig-like way of handling errors and adds
scary-looking 'catch unreachable's all over the code, but that's okay.
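Roughly, the wrapper amounts to the following, where oom_handler() stands
in for whatever lets the user free up memory (or give up) before retrying:

    #include <stdlib.h>

    void oom_handler(void);   /* hypothetical: prompt / save state / abort */

    static void *xmalloc(size_t size) {
        void *p;
        while ((p = malloc(size)) == NULL)
            oom_handler();    /* returns only if retrying makes sense */
        return p;
    }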
Performance is looking great, but the code is rather ugly and
potentially buggy. Also doesn't handle hard links without an "nlink"
field yet.
Error handling of the import code is different from what I've been doing
until now. That's intentional; I'll change error handling of other
pieces to call ui.die() directly rather than propagating error enums.
The approach is less testable but conceptually simpler; that's perfectly
fine for a tiny application like ncdu.
I plan to add more display options, but ran out of keys to bind.
Probably going for a quick-select menu thingy so that we can keep the
old key bindings for people accustomed to them.
The graph width algorithm is slightly different, but I think this one's
a minor improvement.
Now we're getting somewhere. This works surprisingly well, too. Existing
ncdu behavior is to remember which entry was previously selected but not
which entry was displayed at the top, so the view would be slightly
different when switching directories. This new approach remembers both
the entry and the offset.
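Conceptually, the state saved per directory is just a pair along these
lines (hypothetical names):

    struct entry;

    struct view_state {
        struct entry *selected;   /* entry that had the cursor */
        int top_idx;              /* index of the entry drawn on the top row */
    };
    /* Saved when leaving a directory and restored when entering it again,
     * so the view comes back exactly as it was left. */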
I initially wanted to keep a directory's block count and size as a
separate field so that exporting an in-memory tree to a JSON dump would
be easier to do, but that doesn't seem like a common operation to
optimize for. We'll probably need the algorithms to subtract sub-items
from directory counts anyway, so such an export can still be
implemented, albeit slower.
libc locale-dependent APIs are pure madness, but I can't avoid them as
long as I use ncurses. libtickit seems like a much saner alternative (at
first glance), but no popular application seems to use it. :(
Easier to implement now that we're linking against libc.
But exclude pattern matching is extremely slow, so that should really be
rewritten with a custom fnmatch implementation. It's exactly as slow as
in ncdu 1.x; I'm surprised nobody's complained about it yet.
And while I'm at it, supporting .gitignore-style patterns would be
pretty neat, too.
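For context, the slow path is essentially a per-file, per-pattern
fnmatch() loop like the one below (the exclude list type is made up); a
custom matcher would replace fnmatch() itself, not the loop:

    #include <fnmatch.h>

    struct exclude { const char *pattern; struct exclude *next; };

    /* Return non-zero when 'name' matches any exclude pattern. Every file
     * scanned runs through every pattern via fnmatch(). */
    static int excluded(const struct exclude *list, const char *name) {
        for (; list; list = list->next)
            if (fnmatch(list->pattern, name, 0) == 0)
                return 1;
        return 0;
    }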
I tried playing with zbox (pure Zig termbox-like lib) for a bit, but I
don't think I want to have to deal with the terminal support issues that
will inevitably come with it. I already stumbled upon one myself: it
doesn't properly put the terminal in a sensible state after cleanup in
tmux. As much as I dislike ncurses, it /is/ ubiquitous and tends to kind
of work.
The new data model is supposed to solve a few problems with ncdu 1.x's
'struct dir':
- Reduce memory overhead,
- Fix extremely slow counting of hard links in some scenarios
(issue #121)
- Add support for counting 'shared' data with other directories
(issue #36)
Quick memory usage comparison of my root directory with ~3.5 million
files (normal / extended mode):
  ncdu 1.15.1:       379M / 451M
  new (unaligned):   145M / 178M
  new (aligned):     155M / 200M
There's still a /lot/ of to-do's left before this is usable, however,
and there's a bunch of issues I haven't really decided on yet, such as
which TUI library to use.
Backporting this data model to the C version of ncdu is also possible,
but somewhat painful. Let's first see how far I get with Zig.
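As a C-flavored illustration of the direction (not the actual Zig
layout): the idea is to store entries compactly, with the name inline and
any extended data only present when it's actually used:

    #include <stdint.h>

    struct entry {
        uint64_t size;
        uint64_t blocks;
        uint8_t  type;
        char     name[];   /* name stored inline, flexible array member */
    };
    /* Extended information (mtime, uid, mode, ...) would live in an
     * optional extra part, so it only costs memory in extended mode, which
     * is where the separate normal / extended numbers above come from. */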
Reduces memory by a tiny bit. Arguably we never needed tombstones
because entries are never removed, so there shouldn't be any performance
hit there. We don't even need a 'used' flag either, considering that can
be represented by a NULL value, but I'm not really up for
implementing/modifying my own hash table.
ref: https://attractivechaos.wordpress.com/2019/12/28/deletion-from-hash-tables-without-tombstones/
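Since entries are never removed, an insert-only open-addressing scheme is
enough; a rough C sketch of what that looks like, with a hypothetical
key_equals() and a power-of-two capacity:

    #include <stddef.h>
    #include <stdint.h>

    struct entry;
    int key_equals(const struct entry *, uint64_t key);   /* hypothetical */

    struct table {
        struct entry **slots;   /* NULL = never used; no tombstones needed */
        size_t mask;            /* capacity - 1 */
    };

    /* Probing stops at the first NULL slot: with no deletions there is
     * nothing to skip over, and NULL can double as the 'used' flag. */
    static struct entry **lookup(struct table *t, uint64_t key, uint64_t hash) {
        size_t i = hash & t->mask;
        while (t->slots[i] && !key_equals(t->slots[i], key))
            i = (i + 1) & t->mask;
        return &t->slots[i];   /* the matching slot, or the empty one to fill */
    }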
This is a best-effort approach to save ncdu state when memory is low.
There's likely allocation in libraries that isn't being checked
(ncurses, printf).
Fixes #132 (it actually doesn't, that needs a 64-bit static binary too,
but I'll get to that)
This allocation is currently leaked, but as long as we don't allocate
new ones for each refresh, that shouldn't be much of an issue.
(cherry picked from commit 9dc2d32a8f)
This adds an 'm' command to show the latest modified time of all files
in a directory. The 'M' command allows for ascending and descending
mtime sorting. These are only enabled with the -e flag and overload
the dir_ext mtime field.
I had taken care to not sort empty directories during dirlist_open(),
but forgot that manual user actions can still cause dirlist_set_sort()
to be called, which does not handle empty directories.
Reported by Alex Wilson.
'char' may be unsigned on some architectures, which will cause the
"overflow check" on decrement to fail.
This would at most result in a confusing UI issue where no confirmation
option appears to be selected.
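A minimal example of the failure mode (the surrounding code is made up):

    /* On e.g. ARM and PowerPC Linux, plain 'char' is unsigned by default. */
    static int prev_choice(void) {
        char sel = 0;
        sel--;           /* intended to wrap to -1 ...                  */
        if (sel < 0)     /* ... but with an unsigned char, sel is 255,  */
            sel = 2;     /* so this wrap-around check never fires       */
        return sel;
    }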
Unfortunately, there wasn't a single bit free in struct dir.flags, so I
had to increase its size to 16 bits. This commit is just the initial
preparation, there are still a few things to do:
- Add an "extended information" CLI flag to enable/disable this
functionality.
- Export and import extended information when requested
- Do something with the data.
I also did a few memory measurements on a file list with 12769842 items:
  before this commit:     1.239 GiB
  without extended info:  1.318 GiB
  with extended info:     1.698 GiB
It's surprising what adding a single byte to a struct can do to the
memory usage. :(
Fixes https://dev.yorhel.nl/ncdu/bug/103
I don't think a stack overflow as a result of recursion is exploitable
on a modern system. It should just result in an unfortunate write to a
page that is not writable, followed by a crash.
I've decided not to use ls-like file name coloring for now, instead just
coloring the difference between a (regular) file and a dir.
Still looking for a good color scheme for light backgrounds.
This should fix https://dev.yorhel.nl/ncdu/bug/99 - with the downside
that this requires a C99 compiler.
I also replaced all occurrences of static allocation of struct dir with
dynamic allocation, because I wasn't really sure if static
allocation of flexible structs is allowed. In the case of dirlist.c the
dynamic allocation is likely required anyway, because it does store a
few bytes in the name field.
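For reference, the C99 construct in question, heavily simplified (the
real struct dir has more fields):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct dir {
        uint64_t size;
        /* ... */
        char name[];   /* C99 flexible array member, no fixed size */
    };

    /* Static or stack allocation reserves no room for 'name', so every
     * instance is allocated dynamically with the name appended: */
    static struct dir *dir_new(const char *name) {
        struct dir *d = calloc(1, offsetof(struct dir, name) + strlen(name) + 1);
        if (d)
            strcpy(d->name, name);
        return d;
    }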
TODO:
- Add (ls-like) colors to the actual file names
-> Implement full $LS_COLORS handling or something simple and custom?
- Test on a white/black terminal, and provide an alternate color scheme
if necessary.
- Make colors opt-in?
Check if the environment variable NCDU_SHELL is defined before the SHELL
variable is checked. This makes it possible to specify a program to
execute when 'b' is pressed. Setting SHELL to, for example, "mc" (Midnight
Commander) didn't work because mc already uses SHELL to execute
commands.
The check for the system() exit status is slightly problematic, because
bash returns the status code of the last command it executed. I've set
it to only check for status code 127 now (command not found) in order to
at least provide a message when the $SHELL command can't be found. This
error can still be triggered when executing a nonexistent command within
the shell and then exiting.
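A sketch of the lookup order and the exit-status check described above
(the error message wording is made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    static void spawn_shell(void) {
        const char *sh = getenv("NCDU_SHELL");   /* checked first */
        if (!sh || !*sh) sh = getenv("SHELL");   /* then the login shell */
        if (!sh || !*sh) sh = "/bin/bash";       /* compile-time default */

        int r = system(sh);
        /* Only 127 ("command not found") is treated as an error; any other
         * code may just be whatever the last command in the shell returned. */
        if (r != -1 && WIFEXITED(r) && WEXITSTATUS(r) == 127)
            fprintf(stderr, "Error spawning shell: %s\n", sh);
    }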
Key 'b' in the browse window spawns a shell in the current directory.
We first check the $SHELL environment variable of the user for the preferred
shell interpreter. If it's not set, we fall back to the compile time
configured default shell (usually /bin/bash).
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Turns out that being able to open an empty directory actually has its
uses:
- If you delete the last file in a directory, you now won't be directed
to the parent directory anymore. This allows keeping 'd' pressed
without worrying that you'll delete stuff outside of the current dir.
(This is the primary motivation for doing this)
- You can now scan and later refresh an empty directory, as suggested by
#2 in http://dev.yorhel.nl/ncdu/bug/15