Fix symlink dirs #59

donald · 2023-10-24T14:38:18Z

Better handle files installed via existing symlinks (e.g. /lib -> /usr/lib)

mariux64/mariux64-issues#91

The bee inventory file `/var/cache/bee/bee-cache/INVENTORY` contains the filenames of the files installed by bee. One problem is that a file can have more than one name because of symlinks. For example, we currently have a symlink `/usr/doc -> share/doc` and the inventory currently contains this entry:

    cunit-2.1_3-1.x86_64 1619448799 0 0 0100644 683 7eb1686ef4ebac0f4743c407da5aab52 /usr/doc/CUnit/CUnit_doc.css

So the file is not registered by its canonical name `/usr/share/doc/CUnit/CUnit_doc.css` but by the alias name `/usr/doc/CUnit/CUnit_doc.css`. This can happen if a package is installed "through" a symlink in the system.

The problem with multiple paths to the same file is that if one version of a package installs a file through one path and another version of the package installs the same file through another path, then the file is lost after `bee update`. This is because all files from the old package which are not registered by the new version of the same package (or by any other installed package) are removed. While the removal works through a symlink, the protection by the single registered filename does not.

To mitigate this problem, bee should protect the file itself during certain operations like `bee update`. It is not enough to protect the name variant used in the inventory. We want to achieve this by translating the directory names used in the inventory to their canonical form (if possible).

This patch adds a filter tool for the bee index format, which translates the filenames into a canonical form.

Don't just use realpath(3) or canonicalize_file_name(3) for that, because this would be much too slow. These functions do readlink() for every path component of the provided name. As we have to do this for every file of the inventory, the system call usage and file system access would explode. Processing the inventory file this way took more than two minutes.

Instead, cache the results of readlink for a single invocation of the tool, so that the operation is done only once per file. Use another cache for complete translated path names to reduce the load on the readlink cache. This way, the same result can be achieved in 0.3 seconds.

    $ ls -ld /usr/doc
    lrwxrwxrwx 1 root root 9 Mar 7 2011 /usr/doc -> share/doc
    $ grep CUnit_doc.css /var/cache/bee/bee-cache/INVENTORY
    cunit-2.1_3-1.x86_64 1619448799 0 0 0100644 683 7eb1686ef4ebac0f4743c407da5aab52 /usr/doc/CUnit/CUnit_doc.css
    $ ./beeindextr /var/cache/bee/bee-cache/INVENTORY | grep CUnit_doc.css
    cunit-2.1_3-1.x86_64 1619448799 0 0 0100644 683 7eb1686ef4ebac0f4743c407da5aab52 /usr/share/doc/CUnit/CUnit_doc.css
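The two-level caching described above can be illustrated with a minimal Python sketch. This is not the code of `beeindextr`; the class `PathTranslator` and its members (`link_cache`, `dir_cache`, `max_links`, `readlink_calls`) are hypothetical names chosen for illustration. The sketch assumes absolute paths and canonicalizes only the directory part of each name, leaving the final component untouched.

```python
import os


class PathTranslator:
    """Illustrative sketch: canonicalize directory names using two caches."""

    def __init__(self, max_links=40):
        self.link_cache = {}     # path -> symlink target, or None if not a symlink
        self.dir_cache = {}      # directory name -> canonical directory name
        self.max_links = max_links
        self.readlink_calls = 0  # counts real readlink() calls, for demonstration

    def _readlink(self, path):
        """Return the symlink target of path, asking the kernel only once per path."""
        if path not in self.link_cache:
            self.readlink_calls += 1
            try:
                self.link_cache[path] = os.readlink(path)
            except OSError:
                # Not a symlink, or the path does not exist: keep the name as-is.
                self.link_cache[path] = None
        return self.link_cache[path]

    def canonical_dir(self, directory):
        """Resolve symlinks in every component of an absolute directory name."""
        if directory in self.dir_cache:
            return self.dir_cache[directory]
        # Stack of components still to process, next component on top.
        pending = [c for c in directory.split("/") if c][::-1]
        resolved = ""            # canonical prefix so far; "" stands for "/"
        links = 0
        while pending:
            comp = pending.pop()
            if comp == ".":
                continue
            if comp == "..":
                resolved = resolved.rpartition("/")[0]
                continue
            candidate = resolved + "/" + comp
            target = self._readlink(candidate)
            if target is None:
                resolved = candidate
                continue
            links += 1
            if links > self.max_links:
                # Probably a symlink loop: give up and keep the original name.
                self.dir_cache[directory] = directory
                return directory
            if target.startswith("/"):
                resolved = ""    # absolute target: restart from the root
            # Process the components of the link target next.
            pending.extend([c for c in target.split("/") if c][::-1])
        result = resolved or "/"
        self.dir_cache[directory] = result
        return result

    def translate(self, path):
        """Return path with its directory part replaced by the canonical form."""
        if not path.startswith("/"):
            return path          # assumption: only absolute names are translated
        directory, _, name = path.rpartition("/")
        canon = self.canonical_dir(directory or "/")
        return canon.rstrip("/") + "/" + name
```

With a symlink `/usr/doc -> share/doc` in place, `PathTranslator().translate("/usr/doc/CUnit/CUnit_doc.css")` would yield the `/usr/share/doc/...` name. Every further file in the same directory is answered from `dir_cache` without any additional `readlink()` call, which is the effect the description attributes to the second cache.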

donald · 2024-05-09T15:56:44Z

Oh, nice, the change unexpectedly also resolves 42 files not known to bee (wild files), because they were installed over the /lib64 symlink.
Old:

buczek@claptrap:~$ ls -l /usr/lib/libwoff2common.so.1.0.2
-rwxr-xr-x 1 root root 33880 Apr 17  2018 /usr/lib/libwoff2common.so.1.0.2
buczek@claptrap:~$ fakeroot bee query /usr/lib/libwoff2common.so.1.0.2
buczek@claptrap:~$ fakeroot bee query libwoff2common.so.1.0.2
woff2-1.0.2-0.x86_64
  /usr/lib64/libwoff2common.so.1.0.2
  /usr/lib64/libwoff2common.so//libwoff2common.so.1.0.2

New:

buczek@dose:~/git/bee (fix-symlink-dirs)$ ls -l /usr/lib/libwoff2common.so.1.0.2
-rwxr-xr-x 1 root root 33880 Apr 17  2018 /usr/lib/libwoff2common.so.1.0.2
buczek@dose:~/git/bee (fix-symlink-dirs)$ fakeroot bee query /usr/lib/libwoff2common.so.1.0.2
woff2-1.0.2-0.x86_64
   /usr/lib/libwoff2common.so.1.0.2
buczek@dose:~/git/bee (fix-symlink-dirs)$

When `bee query PATTERN` is used to grep for installed files, use the inventory file and translate its filenames to their canonical form, i.e. the real path after following symlinks rather than the path used at install time. The new code is also a lot faster than the old one: the wall time for `fakeroot bee query otop` is reduced from 0m11.712s to 0m0.760s.
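As a rough illustration of querying the inventory with translated names, here is a small Python sketch. It is not bee's implementation. The function names `query` and `default_translate` are hypothetical, the line layout (seven space-separated fields followed by the filename) is an assumption read off the inventory entry quoted in the description, and the default translation swaps in a memoized `os.path.realpath` on the directory part as a stand-in for the filter tool.

```python
import os
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def _canonical_dir(directory):
    # Stand-in for the filter tool: resolve each directory once and memoize it.
    return os.path.realpath(directory)


def default_translate(path):
    """Canonicalize only the directory part of an inventory filename."""
    directory, name = os.path.split(path)
    return os.path.join(_canonical_dir(directory), name)


def query(inventory_lines, pattern, translate=default_translate):
    """Map package name -> canonical filenames that match pattern.

    Assumed layout per line: seven space-separated fields, then the
    filename (which may itself contain spaces).
    """
    regex = re.compile(pattern)
    hits = {}
    for line in inventory_lines:
        fields = line.rstrip("\n").split(" ", 7)
        if len(fields) != 8:
            continue  # skip lines that do not fit the assumed layout
        package, filename = fields[0], translate(fields[7])
        if regex.search(filename):
            hits.setdefault(package, []).append(filename)
    return hits
```

Because the pattern is matched against the translated name, a query for the real path of a file finds its package even when the inventory registered the file under a symlinked directory.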

donald added the not ready for merge label Oct 24, 2023

donald force-pushed the fix-symlink-dirs branch from ccc5348 to 766f24e Compare October 24, 2023 20:31

donald force-pushed the fix-symlink-dirs branch 2 times, most recently from 7aac3ca to 6285724 Compare May 9, 2024 08:26

donald added 3 commits May 9, 2024 10:29

beeissue: Add generated script to .gitignore

317a5c0

bee-cache: Use beeindexstr

3cfe004

donald force-pushed the fix-symlink-dirs branch from 6285724 to 75e726a Compare May 9, 2024 11:31

donald added 2 commits May 14, 2024 16:05

bee-cache: Remove unsued "beecache grep" command

a015a23

donald force-pushed the fix-symlink-dirs branch from 54d1627 to a015a23 Compare May 14, 2024 14:07

donald removed the not ready for merge label May 14, 2024

donald merged commit 79a61e6 into master May 14, 2024

donald mentioned this pull request Jun 26, 2024

Still update problems and UsrMerge? #61

Closed
