cmd/run: Optimize 'enter' and 'run' for already running containers, and turn IsToolboxContainer() into Container.IsToolbx() by debarshiray · Pull Request #1491 · containers/toolbox

debarshiray · 2024-05-16T15:23:13Z

softwarefactory-project-zuul · 2024-05-16T16:08:08Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/71dc5cb75937401896b684d41b607aa4

✔️ unit-test SUCCESS in 6m 55s
❌ unit-test-migration-path-for-coreos-toolbox FAILURE in 3m 39s
✔️ unit-test-restricted SUCCESS in 5m 41s
✔️ system-test-fedora-rawhide SUCCESS in 42m 21s
✔️ system-test-fedora-40 SUCCESS in 40m 34s
✔️ system-test-fedora-39 SUCCESS in 34m 17s
✔️ system-test-fedora-38 SUCCESS in 40m 40s

containers#1491

This makes it possible to confine the details of detecting a Toolbx container within the podman package, because it was not possible to use podman.IsToolboxContainer() when listing all the Toolbx containers. containers#1491
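A minimal sketch of what a Container.IsToolbx() method could look like, so that callers listing containers never look at the detection details themselves. The Container struct, its fields, and the two label names below are assumptions for illustration only; they are not taken from this pull request.

```go
package main

import "fmt"

// Container is a hypothetical, simplified stand-in for the type in the
// podman package; the real type carries more fields.
type Container struct {
	name   string
	labels map[string]string
}

// toolbxLabels lists the labels assumed to mark a Toolbx container.
// These names are an assumption made for this sketch.
var toolbxLabels = []string{
	"com.github.containers.toolbox",
	"com.github.debarshiray.toolbox",
}

// IsToolbx reports whether the container carries one of the assumed labels
// with the value "true". Callers do not need to know which labels matter.
func (c *Container) IsToolbx() bool {
	for _, label := range toolbxLabels {
		if c.labels[label] == "true" {
			return true
		}
	}
	return false
}

func main() {
	containers := []Container{
		{name: "fedora-toolbox-40", labels: map[string]string{"com.github.containers.toolbox": "true"}},
		{name: "postgres", labels: map[string]string{}},
	}

	// Listing only filters on the method, not on the labels themselves.
	for i := range containers {
		if containers[i].IsToolbx() {
			fmt.Println(containers[i].name)
		}
	}
}
```

With this shape, code that lists all containers only calls the method on each element, which is the kind of confinement the commit message describes.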

softwarefactory-project-zuul · 2024-05-16T19:19:37Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/a4431f5aa76b4d488edea5b135479c95

✔️ unit-test SUCCESS in 7m 06s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 27s
✔️ unit-test-restricted SUCCESS in 5m 58s
❌ system-test-fedora-rawhide FAILURE in 39m 33s
❌ system-test-fedora-40 FAILURE in 37m 26s
❌ system-test-fedora-39 FAILURE in 37m 22s
❌ system-test-fedora-38 FAILURE in 34m 09s

debarshiray · 2024-05-17T10:19:46Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/a4431f5aa76b4d488edea5b135479c95

❌ system-test-fedora-rawhide FAILURE in 39m 33s
❌ system-test-fedora-40 FAILURE in 37m 26s
❌ system-test-fedora-39 FAILURE in 37m 22s
❌ system-test-fedora-38 FAILURE in 34m 09s

TASK [Run system tests]
...
fedora-rawhide | 1..343
fedora-rawhide | # test suite: Set up
...
fedora-rawhide | not ok 179 rm: Try to remove a non-existent container in 462ms
fedora-rawhide | # (from function `assert_output' in file test/system/libs/bats-assert/src/assert.bash, line 255,
fedora-rawhide | #  in test file test/system/106-rm.bats, line 37)
fedora-rawhide | #   `assert_output "Error: failed to inspect container $container_name"' failed
fedora-rawhide | #
fedora-rawhide | # -- output differs --
fedora-rawhide | # expected : Error: failed to inspect container nonexistentcontainer
fedora-rawhide | # actual   : Error: failed to invoke podman(1)
fedora-rawhide | # --
fedora-rawhide | #

softwarefactory-project-zuul · 2024-05-17T10:20:58Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/dda1492433f7453d8757bac23d9af918

✔️ unit-test SUCCESS in 7m 05s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 30s
✔️ unit-test-restricted SUCCESS in 6m 04s
✔️ system-test-fedora-rawhide SUCCESS in 36m 45s
✔️ system-test-fedora-40 SUCCESS in 34m 25s
✔️ system-test-fedora-39 SUCCESS in 34m 41s
✔️ system-test-fedora-38 SUCCESS in 35m 21s

softwarefactory-project-zuul · 2024-05-17T11:05:47Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/9ba8497b17ed4da9892a9b40ea77b228

✔️ unit-test SUCCESS in 6m 42s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 31s
✔️ unit-test-restricted SUCCESS in 6m 02s
✔️ system-test-fedora-rawhide SUCCESS in 36m 20s
✔️ system-test-fedora-40 SUCCESS in 35m 59s
✔️ system-test-fedora-39 SUCCESS in 37m 07s
✔️ system-test-fedora-38 SUCCESS in 36m 24s

containers#1493

Fallout from 238f245 containers#1493

Currently, the 'enter' and 'run' commands always invoke 'podman start' even if the Toolbx container's entry point is already running. There's no need for that. The commands already invoke 'podman inspect' to find out if the org.freedesktop.Flatpak.SessionHelper D-Bus service needs to be started. Thus, they already have what is needed to find out if the container is stopped and 'podman start' is necessary before it can be used with 'podman exec', or if it's already running.

The unconditional 'podman start' invocation was followed by a second 'podman inspect' invocation to find out if the 'podman start' managed to start the container's entry point. There's no need for this second 'podman inspect' either, just like the 'podman start', when it's already known from the first 'podman inspect' that the container is running.

The extra 'podman start' and 'podman inspect' invocations are sufficiently expensive to add a noticeable overhead to the 'enter' and 'run' commands. It's common to use a container that's already running, just like having multiple terminals within the same working directory, and terminal emulation applications like Ptyxis try to make it easier to do so [1]. Therefore, it's worth optimizing this code path.

[1] https://gitlab.gnome.org/chergert/ptyxis
    https://flathub.org/apps/app.devsuite.Ptyxis

containers#1070
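The control flow described in this commit message can be sketched roughly as follows. This is not the code from the pull request: the Podman interface, fakePodman, and ensureRunning are hypothetical names, and the fake only counts invocations so the saving is visible without running podman(1).

```go
package main

import "fmt"

// Podman is a hypothetical abstraction over the podman(1) invocations
// that 'enter' and 'run' make; it is not the real package API.
type Podman interface {
	Inspect(name string) (running bool, err error)
	Start(name string) error
}

// fakePodman records how often each operation is invoked.
type fakePodman struct {
	running  map[string]bool
	inspects int
	starts   int
}

func (f *fakePodman) Inspect(name string) (bool, error) {
	f.inspects++
	running, ok := f.running[name]
	if !ok {
		return false, fmt.Errorf("failed to inspect container %s", name)
	}
	return running, nil
}

func (f *fakePodman) Start(name string) error {
	f.starts++
	f.running[name] = true
	return nil
}

// ensureRunning reuses the result of the first inspect. Only when the
// container is stopped does it start the container and inspect again to
// confirm that the entry point came up.
func ensureRunning(p Podman, name string) error {
	running, err := p.Inspect(name)
	if err != nil {
		return err
	}
	if running {
		return nil // already running: no start, no second inspect
	}
	if err := p.Start(name); err != nil {
		return err
	}
	running, err = p.Inspect(name)
	if err != nil {
		return err
	}
	if !running {
		return fmt.Errorf("container %s failed to start", name)
	}
	return nil
}

func main() {
	p := &fakePodman{running: map[string]bool{"fedora-toolbox-40": true}}
	if err := ensureRunning(p, "fedora-toolbox-40"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("inspects=%d starts=%d\n", p.inspects, p.starts) // inspects=1 starts=0
}
```

For a container that is already running, the sketch makes one inspect call and no start call; for a stopped one it makes two inspect calls and one start call, matching the old unconditional path.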

softwarefactory-project-zuul · 2024-05-19T22:14:16Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/d98ee29bf25744b08623c2ef997771c6

✔️ unit-test SUCCESS in 6m 50s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 22s
✔️ unit-test-restricted SUCCESS in 4m 46s
✔️ system-test-fedora-rawhide SUCCESS in 37m 35s
✔️ system-test-fedora-40 SUCCESS in 35m 46s
✔️ system-test-fedora-39 SUCCESS in 34m 38s
✔️ system-test-fedora-38 SUCCESS in 36m 02s

softwarefactory-project-zuul · 2024-05-19T22:56:47Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/8935764a9ede4fd8a17d41a1f7a8be81

✔️ unit-test SUCCESS in 6m 28s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 4m 06s
✔️ unit-test-restricted SUCCESS in 5m 54s
✔️ system-test-fedora-rawhide SUCCESS in 41m 28s
✔️ system-test-fedora-40 SUCCESS in 36m 03s
✔️ system-test-fedora-39 SUCCESS in 36m 25s
✔️ system-test-fedora-38 SUCCESS in 35m 34s

softwarefactory-project-zuul · 2024-05-20T08:16:59Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/dc12f0bb935340f9a0c7aec0d2dc1159

✔️ unit-test SUCCESS in 7m 02s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 23s
✔️ unit-test-restricted SUCCESS in 5m 47s
✔️ system-test-fedora-rawhide SUCCESS in 36m 12s
✔️ system-test-fedora-40 SUCCESS in 33m 54s
✔️ system-test-fedora-39 SUCCESS in 34m 38s
✔️ system-test-fedora-38 SUCCESS in 34m 22s

Currently, once a Toolbx container gets started with 'podman start', as part of the 'enter' or 'run' commands, it doesn't stop unless the host is shut down or someone explicitly calls 'podman stop'. This becomes annoying if someone tries to remove the container, because commands like 'podman rm' don't work without the '--force' flag even if all active 'enter' and 'run' sessions have ended, and the lingering entry points of those containers can be considered a waste of resources.

A system of reference counting based on advisory file locks has been used to automatically exit the container's entry point once all the active sessions have ended. Two locks are used: a global lock that's common for all containers, and a local lock that's specific to each container. The initialization stamp file is conveniently used as the local lock. The 'enter' and 'run' sessions acquire shared file locks and the container's entry point acquires exclusive ones. All attempts at acquiring the locks are blocking unless otherwise noted.

The global lock is acquired at the beginning of 'enter' and 'run', before they inspect the container and negotiate the path to the local lock (i.e., the initialization stamp file) with the entry point, which creates the local lock. Once the local lock is known by 'enter' and 'run', they acquire it and only then release the global lock.

The Toolbx container's entry point tries to acquire the global lock as it creates the initialization stamp file (i.e., the local lock). This waits for the 'enter' and 'run' invocations to receive the location of the local lock, acquire it and release the global lock. Once the entry point acquires the global lock, it releases it, and waits trying to acquire the local lock. This sequence of acquiring and releasing the locks lets the entry point track the state of the 'enter' and 'run' invocations. The entry point should only try to acquire the local lock after the 'enter' and 'run' invocations have acquired it, which they do before invoking 'podman exec'.

The entry point is able to acquire the local lock after all 'enter' and 'run' sessions end and release their local locks. At this point, a new 'enter' or 'run' invocation might be in the process of starting. Both sides need to be careful not to race against each other and end up in an invalid state, e.g., a 'podman start' being invoked against a container whose entry point is just about to exit, or a 'podman exec' being invoked against a container whose entry point is about to exit or has already exited.

Therefore, the entry point makes a non-blocking attempt to acquire the global lock while holding the local lock. If it fails, then it's because a new 'enter' or 'run' was invoked that is in the process of negotiating the path to the local lock with the entry point. In this case, the entry point releases the local lock and goes back to trying to acquire the global lock, as it did when creating the initialization stamp file (i.e., the local lock). If it succeeds, then no new 'enter' or 'run' is in the process of starting, and the entry point can exit.

If this system of reference counting is simplified to just the global lock, then all the entry points of all Toolbx containers will exit only after all the 'enter' and 'run' sessions across all Toolbx containers have ended. The local lock makes it possible to do this for each container separately.

This system will not work without the global lock. Without it, there will be races if a new 'enter' or 'run' is invoked just as the last of the previous batch of sessions ends, letting the entry point acquire the local lock and prepare to exit.

Sometimes, a Toolbx container's entry point is started directly with 'podman start', without going through the 'enter' or 'run' commands, for debugging. Care was taken to detect this case by making a non-blocking attempt to acquire the global lock from the entry point before creating the initialization stamp file (i.e., the local lock). If it fails, then it's because an 'enter' or 'run' is waiting for the container to get initialized by the entry point, and things proceed as described above. If it succeeds, then it's because the entry point was started directly. In this case, the entry point releases the global lock, and adds a timeout after creating the initialization stamp file before trying to acquire any other locks, to give the user time to invoke 'enter' or 'run'. A timeout of 25 seconds is used, as is the default for D-Bus method calls [1] and when waiting for the entry point to initialize the container.

A variation of this system of reference counting could use the advisory file locks only in the 'enter' and 'run' commands, and invoke 'podman inspect --format {{.ExecIDs}} ...' after each 'podman exec' to find out if there are any remaining sessions [2]. This was not done because each podman(1) invocation is sufficiently expensive and there is a desire to keep them to a minimum in the 'enter' and 'run' commands, because these are the most frequently used commands and users expect them to be as lean as possible [3,4].

A totally different approach could be to pass an AF_UNIX socket to the Toolbx container through the NOTIFY_SOCKET environment variable and 'podman create --sdnotify container ...', and do the reference counting by sending messages from the host to the entry point before and after each 'podman exec' [2]. One downside is that the reference counting will break if the host process crashes before sending the message to decrement the count after a 'podman exec' ends. Another downside is that it becomes complicated to directly call 'podman start', without going through the 'enter' or 'run' commands, for debugging.

[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html
[2] containers/podman#26589
[3] Commit 4536e2c containers@4536e2c8c28f6c4f containers#813 containers#654
[4] Commit 74d4fcf containers@74d4fcf00c6ec3d1 containers#1491 containers#1070 containers#114

debarshiray requested a review from martymichal as a code owner May 16, 2024 15:23

debarshiray changed the title ~~cmd/run: Optimize 'enter' and 'run' for already running containers, and turn IsToolboxContainer() into Container.IsToolbx()~~ [WIP] cmd/run: Optimize 'enter' and 'run' for already running containers, and turn IsToolboxContainer() into Container.IsToolbx() May 16, 2024

debarshiray marked this pull request as draft May 16, 2024 15:24

debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 16, 2024

cmd/create: Style fixes

47be32d

containers#1491

debarshiray force-pushed the wip/rishi/cmd-pkg-podman-optimize-enter-run-is-toolbx branch from a33e656 to 47be32d Compare May 16, 2024 15:24

debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 16, 2024

cmd/create: Style fixes

4578161

containers#1491

debarshiray force-pushed the wip/rishi/cmd-pkg-podman-optimize-enter-run-is-toolbx branch from 47be32d to 4ed3c72 Compare May 16, 2024 18:17

debarshiray force-pushed the wip/rishi/cmd-pkg-podman-optimize-enter-run-is-toolbx branch from 4ed3c72 to 13627cc Compare May 17, 2024 09:43

debarshiray changed the title ~~[WIP] cmd/run: Optimize 'enter' and 'run' for already running containers, and turn IsToolboxContainer() into Container.IsToolbx()~~ cmd/run: Optimize 'enter' and 'run' for already running containers, and turn IsToolboxContainer() into Container.IsToolbx() May 17, 2024

debarshiray marked this pull request as ready for review May 17, 2024 10:28

debarshiray mentioned this pull request May 17, 2024

Reduce toolbox enter start time #1070

Closed

debarshiray added 5 commits May 19, 2024 22:53

test/system: Test that old unsupported containers are correctly detected

defd838

containers#1493

cmd/run, test/system: Fix typo

af56286

Fallout from 238f245 containers#1493

cmd/create: Style fixes

c1d30f4

containers#1491

debarshiray force-pushed the wip/rishi/cmd-pkg-podman-optimize-enter-run-is-toolbx branch from a120b68 to c1d30f4 Compare May 19, 2024 21:35

debarshiray closed this in c1d30f4 May 20, 2024

debarshiray merged commit 74d4fcf into containers:main May 20, 2024

debarshiray deleted the wip/rishi/cmd-pkg-podman-optimize-enter-run-is-toolbx branch May 20, 2024 09:27

debarshiray merged 5 commits into containers:main from debarshiray:wip/rishi/cmd-pkg-podman-optimize-enter-run-is-toolbx
