cmd/run: Optimize 'enter' and 'run' for already running containers, and turn IsToolboxContainer() into Container.IsToolbx()#1491
This makes it possible to confine the details of detecting a Toolbx container within the podman package, because it was not possible to use podman.IsToolboxContainer() when listing all the Toolbx containers. containers#1491
Fallout from 238f245 containers#1493
Currently, the 'enter' and 'run' commands always invoke 'podman start' even if the Toolbx container's entry point is already running. There's no need for that. The commands already invoke 'podman inspect' to find out if the org.freedesktop.Flatpak.SessionHelper D-Bus service needs to be started. Thus, they already have what is needed to find out if the container is stopped and 'podman start' is necessary before it can be used with 'podman exec', or if it's already running.
The unconditional 'podman start' invocation was followed by a second 'podman inspect' invocation to find out if the 'podman start' managed to start the container's entry point. There's no need for this second 'podman inspect' either, just like the 'podman start', when it's already known from the first 'podman inspect' that the container is running.
The extra 'podman start' and 'podman inspect' invocations are sufficiently expensive to add a noticeable overhead to the 'enter' and 'run' commands. It's common to use a container that's already running, just like having multiple terminals within the same working directory, and terminal emulation applications like Ptyxis try to make it easier to do so [1]. Therefore, it's worth optimizing this code path.
[1] https://gitlab.gnome.org/chergert/ptyxis
https://flathub.org/apps/app.devsuite.Ptyxis
containers#1070
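The decision described above can be sketched in Go as follows. This is only an illustration, not the code from this pull request: the `inspectResult` type, its `Running` field and the `podmanSteps` function are hypothetical names standing in for whatever the first 'podman inspect' actually yields.

```go
package main

import "fmt"

// inspectResult is a hypothetical, trimmed-down stand-in for the data that
// the first 'podman inspect' yields; only the running state matters here.
type inspectResult struct {
	Running bool
}

// podmanSteps lists the podman subcommands that are still needed after the
// first 'podman inspect'. A running container goes straight to 'exec';
// a stopped one needs 'start' and a second 'inspect' to confirm that the
// entry point came up.
func podmanSteps(c inspectResult) []string {
	if c.Running {
		return []string{"exec"}
	}
	return []string{"start", "inspect", "exec"}
}

func main() {
	fmt.Println(podmanSteps(inspectResult{Running: true}))
	fmt.Println(podmanSteps(inspectResult{Running: false}))
}
```

In this sketch, the already-running case saves two podman invocations, which is the overhead the commit message refers to.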
Currently, once a Toolbx container gets started with 'podman start', as
part of the 'enter' or 'run' commands, it doesn't stop unless the host
is shut down or someone explicitly calls 'podman stop'. This becomes
annoying if someone tries to remove the container because commands like
'podman rm' and such don't work without the '--force' flag, even if all
active 'enter' and 'run' sessions have ended, and the lingering entry
points of those containers can be considered a waste of resources.
A system of reference counting based on advisory file locks has been
used to automatically exit the container's entry point once all the
active sessions have ended. Two locks are used - a global lock that's
common for all containers, and a local lock that's specific to each
container. The initialization stamp file is conveniently used as the
local lock.
The 'enter' and 'run' sessions acquire shared file locks and the
container's entry point acquires ones that are exclusive. All attempts
at acquiring the locks are blocking unless otherwise noted.
The global lock is acquired at the beginning of 'enter' and 'run' before
they inspect the container, negotiate the path to the local lock (i.e.,
the initialization stamp file) with the entry point, and the local lock
is created by the entry point. Once the local lock is known by 'enter'
and 'run', they acquire it and only then release the global.
The Toolbx container's entry point tries to acquire the global lock as
it creates the initialization stamp file (i.e., the local lock). This
waits for the 'enter' and 'run' invocations to receive the location of
the local lock, acquire it and release the global. Once the entry point
acquires the global lock, it releases it, and waits trying to acquire
the local lock.
This sequence of acquiring and releasing the locks lets the entry point
track the state of the 'enter' and 'run' invocations. It should only
try to acquire the local lock after the 'enter' and 'run' invocations
have acquired it before invoking 'podman exec'.
The entry point is able to acquire the local lock after all 'enter' and
'run' sessions end and release their local locks.
At this point, a new 'enter' or 'run' invocation might be in the process
of starting. Both sides need to be careful not to race against each
other and end up in an invalid state, e.g., a 'podman start' being invoked
against a container whose entry point is just about to exit, or a
'podman exec' being invoked against a container whose entry point is
about to exit or has already exited.
Therefore, the entry point makes a non-blocking attempt to acquire the
global lock while holding the local. If it fails, then it's because a
new 'enter' or 'run' was invoked that is in the process of negotiating
the path to the local lock with the entry point. In this case, the
entry point releases the local lock and goes back trying to acquire the
global lock, as it did when creating the initialization stamp file (i.e.,
the local lock). If it succeeds, then no new 'enter' or 'run' is in the
process of starting, and the entry point can exit.
If this system of reference counting is simplified to just the global
lock, then all the entry points of all Toolbx containers will exit only
after all the 'enter' and 'run' sessions across all Toolbx containers
have ended. The local lock makes it possible to do this for each
container separately.
This system will not work without the global lock. It will cause a few
races if a new 'enter' or 'run' is invoked, just as the last of the
previous batch of sessions ends, letting the entry point acquire the
local lock and prepare to exit.
Sometimes, a Toolbx container's entry point is started directly with
'podman start', without going through the 'enter' or 'run' commands, for
debugging. Care was taken to detect this case by making a non-blocking
attempt to acquire the global lock from the entry point before creating
the initialization stamp file (i.e., the local lock).
If it fails, then it's because an 'enter' or 'run' is waiting for the
container to get initialized by the entry point, and things proceed as
described above. If it succeeds, then it's because the entry point was
started directly. In this case, the entry point releases the global
lock, and adds a timeout after creating the initialization stamp file
before trying to acquire any other locks to give the user time to invoke
'enter' or 'run'. A timeout of 25 seconds is used, as is the default
for D-Bus method calls [1] and when waiting for the entry point to
initialize the container.
A variation of this system of reference counting could use the
advisory file locks only in the 'enter' and 'run' commands, and invoke
'podman inspect --format {{.ExecIDs}} ...' after each 'podman exec' to
find out if there are any remaining sessions [2]. This was not done
because each podman(1) invocation is sufficiently expensive and there is
a desire to keep them to a minimum in the 'enter' and 'run' commands,
because these are the most frequently used commands and users expect
them to be as lean as possible [3,4].
A totally different approach could be to pass an AF_UNIX socket to the
Toolbx container through the NOTIFY_SOCKET environment variable and
'podman create --sdnotify container ...', and do the reference counting
by sending messages from the host to the entry point before and after
each 'podman exec' [2]. One downside is that the reference counting
will break if the host process crashes before sending the message to
decrement the count after a 'podman exec' ends. Another downside is that
it becomes complicated to directly call 'podman start', without going
through the 'enter' or 'run' commands, for debugging.
[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html
[2] containers/podman#26589
[3] Commit 4536e2c
containers@4536e2c8c28f6c4f
containers#813
containers#654
[4] Commit 74d4fcf
containers@74d4fcf00c6ec3d1
containers#1491
containers#1070
containers#114