Unfortunately, there wasn't a single bit free in struct dir.flags, so I
had to increase its size to 16 bits. This commit is just the initial
preparation; there are still a few things to do:
- Add "extended information" cli flag to enable/disable this
functionality.
- Export and import extended information when requested
- Do something with the data.
I also did a few memory measurements on a file list with 12769842 items:
- before this commit:    1.239 GiB
- without extended info: 1.318 GiB
- with extended info:    1.698 GiB
It's surprising what adding a single byte to a struct can do to the
memory usage: alignment padding rounds the struct size up, so one
extra byte can cost a full word per item. :(
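As a standalone illustration (the field layout here is made up, not
ncdu's actual struct dir): widening a flags field from 8 to 16 bits
can push the struct across an alignment boundary:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical layouts, only meant to show why widening flags
     * costs more than one byte per item once padding is counted. */
    struct item_old {
        int64_t  size;
        uint32_t dev;
        uint16_t mode;
        uint8_t  type;
        uint8_t  flags;   /* ends exactly on an 8-byte boundary */
    };

    struct item_new {
        int64_t  size;
        uint32_t dev;
        uint16_t mode;
        uint8_t  type;
        uint16_t flags;   /* needs 2-byte alignment: 1 byte of padding
                             before it, 6 after, so sizeof jumps by 8 */
    };

    int main(void) {
        printf("old: %zu bytes\n", sizeof(struct item_old)); /* 16 on x86-64 */
        printf("new: %zu bytes\n", sizeof(struct item_new)); /* 24 on x86-64 */
        return 0;
    }

Eight extra bytes times ~12.8 million items is roughly 100 MiB, which
is the same order of magnitude as the difference measured above.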
I've decided not to use ls-like file name coloring for now, instead just
coloring the difference between a (regular) file and a dir.
Still looking for a good color scheme for light backgrounds.
TODO:
- Add (ls-like) colors to the actual file names
-> Implement full $LS_COLORS handling or something simple and custom?
   (A sketch of the simple option follows after this list.)
- Test on a white/black terminal, and provide an alternate color scheme
if necessary.
- Make colors opt-in?
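For the simple-and-custom route, this is roughly what a lookup could
look like; lscolor_lookup is a hypothetical helper, not existing ncdu
code, and error handling is kept minimal:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Look up one key (e.g. "di" for directories) in $LS_COLORS,
     * which is a colon-separated list of key=SGR pairs like
     * "di=01;34:ln=01;36:...". Returns a malloc'd SGR string, or
     * NULL when the key is absent. */
    static char *lscolor_lookup(const char *key) {
        const char *env = getenv("LS_COLORS");
        size_t klen = strlen(key);
        while (env && *env) {
            const char *end = strchr(env, ':');
            size_t len = end ? (size_t)(end - env) : strlen(env);
            if (len > klen + 1 && !strncmp(env, key, klen) && env[klen] == '=') {
                char *val = malloc(len - klen);
                if (val) {
                    memcpy(val, env + klen + 1, len - klen - 1);
                    val[len - klen - 1] = '\0';
                }
                return val;
            }
            env += len + (end ? 1 : 0);
        }
        return NULL;
    }

    int main(void) {
        char *di = lscolor_lookup("di");
        printf("\033[%smsome-directory/\033[0m\n", di ? di : "0");
        free(di);
        return 0;
    }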
I realized that addparentstats() is called with negative values when
removing stuff, so it had to be done this way to avoid rewriting
everything. It's a simple solution, anyway.
This mostly avoids the issue of getting negative sizes. It's still
possible to get a negative size after a refresh or deletion; I'll get
to that in a bit.
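A sketch of the combined idea; the struct layout and signature here
are assumptions, not the actual code:

    #include <stdint.h>

    /* addparentstats() takes signed deltas, so the same function adds
     * stats while scanning and subtracts them (negative values) while
     * removing. Clamping at zero keeps a stray double-subtraction
     * from producing a negative size. */
    struct dir {
        struct dir *parent;
        int64_t size, asize;
        int items;
    };

    static void addparentstats(struct dir *d, int64_t size, int64_t asize, int items) {
        for (; d; d = d->parent) {
            d->size  = d->size  + size  < 0 ? 0 : d->size  + size;
            d->asize = d->asize + asize < 0 ? 0 : d->asize + asize;
            d->items = d->items + items < 0 ? 0 : d->items + items;
        }
    }

    /* Removing an item then becomes a call with negated values:
     *   addparentstats(item->parent, -item->size, -item->asize, -1);
     */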
2 billion files should be enough for everyone. You probably won't have
enough memory to scan such a filesystem. int is a better choice than
long, as sizeof(int) is 4 on pretty much any system where ncdu runs.
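For reference, the bound falls straight out of the type; the ~100
bytes per item below is a rough guess, not a measured figure:

    #include <limits.h>
    #include <stdio.h>

    /* A 32-bit signed int counts up to INT_MAX items. At a guess of
     * ~100 bytes of bookkeeping per item, actually reaching that
     * limit would need on the order of 200 GiB of memory, which is
     * why the bound is mostly theoretical. */
    int main(void) {
        printf("INT_MAX = %d items\n", INT_MAX); /* 2147483647 */
        return 0;
    }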
The architecture is explained in dir.h. The reasons for these changes
are twofold:
- calc.c was too complex; it simply did too many things. 399ccdeb is a
  nice example of that: it should have been an easy fix, but it
  introduced a segfault (fixed in 0b49021a) and added a small memory
  leak.
- This architecture features a pluggable input/output system, which
should make a file export/import feature relatively simple.
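A rough sketch of the shape such a pluggable interface could take; the
names and fields here are illustrative, the real definitions live in
dir.h:

    struct dir;

    /* An input (filesystem scan, or later a file import) produces
     * items and hands them to whatever output is active (the
     * in-memory tree, or later a file export). Export/import should
     * then only require one new input and one new output, without
     * touching the scanning code. */
    struct dir_output {
        void (*item)(struct dir_output *out, struct dir *item); /* per item  */
        int  (*final)(struct dir_output *out);  /* end of input; 0 = success */
        void *data;                             /* implementation state      */
    };

    int dir_scan(const char *path, struct dir_output *out);   /* scan input   */
    int dir_import(const char *file, struct dir_output *out); /* import input */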
The current commit does not feature any user interface, so there's no
feedback yet when scanning a directory. I'll get to that in a bit.
I've also not tested the new scanning code very well yet, so I might
have introduced some bugs.
The directory sizes are now incorrect as hard links will be counted
twice again (as if there wasn't any detection in the first place), but
this will get fixed by adding a shared size field.
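Roughly what that fix could look like; shared_size is an assumed name
for a field that doesn't exist yet:

    #include <stdint.h>

    /* Next to the regular size, each directory carries the portion of
     * that size that is also counted elsewhere in the tree because of
     * hard links. The UI can then report size - shared_size (or both
     * numbers) instead of silently counting linked data twice. */
    struct dir {
        int64_t size;        /* total size of this subtree           */
        int64_t shared_size; /* part also counted under another path */
        /* ... other fields ... */
    };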
This method of keeping track of hard links is a lot faster and allows
adding an interface which lists the found links.
Hard link detection is now done in a separate pass on the in-memory tree,
and duplicates can be 'removed' and 're-added' on the fly. When making any
changes in the tree, all hard links are re-added before the operation and
removed again afterwards.
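In code, the pattern looks roughly like this; hlnk_readd, hlnk_dedup
and tree_modify are made-up names for illustration:

    struct dir;

    /* Any operation that modifies the tree is wrapped so that it
     * always runs on a tree where every hard link carries its full
     * size; deduplication is redone in a separate pass afterwards. */
    void hlnk_readd(struct dir *root); /* undo deduplication (assumed name) */
    void hlnk_dedup(struct dir *root); /* redo deduplication (assumed name) */

    static void tree_modify(struct dir *root, void (*op)(struct dir *)) {
        hlnk_readd(root); /* re-add duplicates: all links counted in full */
        op(root);         /* the actual change: delete, refresh, ...      */
        hlnk_dedup(root); /* separate pass: first link found is counted   */
    }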
While this guarantees that all hard link information is correct, it does
have a few drawbacks. I can currently think of two:
1. It's not the most efficient way to do it, and may be quite slow on
   large trees. I'll have to do some benchmarks later to see whether
   it is anything to be concerned about.
2. The first encountered item is considered 'counted' and all items
   encountered after that are considered 'duplicate'. Because the
   order in which we traverse the tree isn't always the same, the
   items considered 'duplicate' can vary with each deletion or
   re-calculation. This might cause confusion for people who aren't
   aware of how hard links work.