[PATCH 0/3] installer: Run the installation inside a container.

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Mathieu Othacehe
Severity
normal
M
M
Mathieu Othacehe wrote on 13 Aug 2020 14:23
(address . guix-patches@gnu.org)(name . Mathieu Othacehe)(address . othacehe@gnu.org)
20200813122323.262805-1-othacehe@gnu.org
Hello,

When the store overlay is mounted, other processes such as kmscon, udev
and guix-daemon may open files from the store, preventing the
underlying install support from being umounted. See:

To avoid this situation, mount the store overlay inside a container,
and run the installation from within that container.

Thanks,

Mathieu

Mathieu Othacehe (3):
install: Factorize cow-store procedure.
linux-container: Add a jail? argument.
installer: Run the installation inside a container.

gnu/build/install.scm | 44 +++++++++++-
gnu/build/linux-container.scm | 20 +++---
gnu/installer/final.scm | 125 +++++++++++++++++-----------------
gnu/installer/newt/final.scm | 7 --
gnu/services/base.scm | 60 +++++++++-------
gnu/system/install.scm | 52 ++++----------
6 files changed, 166 insertions(+), 142 deletions(-)

--
2.28.0
M
M
Mathieu Othacehe wrote on 13 Aug 2020 14:34
[PATCH 1/3] install: Factorize cow-store procedure.
(address . 42849@debbugs.gnu.org)(name . Mathieu Othacehe)(address . othacehe@gnu.org)
20200813123419.263639-1-othacehe@gnu.org
Move the cow-store procedure from the service declaration in (gnu system
install) to (gnu build install), so that it can be called from within a
different context than Shepherd.

* gnu/build/install.scm (mount-cow-store, umount-cow-store): New procedures.
* gnu/system/install.scm (make-cow-store): Remove it,
(cow-store-service-type): adapt it accordingly.
---
gnu/build/install.scm | 44 ++++++++++++++++++++++++++++++++++-
gnu/system/install.scm | 52 ++++++++++--------------------------------
2 files changed, 55 insertions(+), 41 deletions(-)

Toggle diff (139 lines)
diff --git a/gnu/build/install.scm b/gnu/build/install.scm
index 87aa5d68da..91c7225c87 100644
--- a/gnu/build/install.scm
+++ b/gnu/build/install.scm
@@ -18,6 +18,7 @@
;;; along with GNU Guix. If not, see <http://www.gnu.org/licenses/>.
(define-module (gnu build install)
+ #:use-module (guix build syscalls)
#:use-module (guix build utils)
#:use-module (guix build store-copy)
#:use-module (srfi srfi-26)
@@ -26,7 +27,9 @@
evaluate-populate-directive
populate-root-file-system
install-database-and-gc-roots
- populate-single-profile-directory))
+ populate-single-profile-directory
+ mount-cow-store
+ umount-cow-store))
;;; Commentary:
;;;
@@ -229,4 +232,43 @@ This is used to create the self-contained tarballs with 'guix pack'."
(_
#t)))
+(define (mount-cow-store target backing-directory)
+ "Make the store copy-on-write, using TARGET as the backing store. This is
+useful when TARGET is on a hard disk, whereas the current store is on a RAM
+disk."
+ (define (set-store-permissions directory)
+ "Set the right perms on DIRECTORY to use it as the store."
+ (chown directory 0 30000) ;use the fixed 'guixbuild' GID
+ (chmod directory #o1775))
+
+ (let ((tmpdir (string-append target "/tmp")))
+ (mkdir-p tmpdir)
+ (mount tmpdir "/tmp" "none" MS_BIND))
+
+ (let* ((rw-dir (string-append target backing-directory))
+ (work-dir (string-append rw-dir "/../.overlayfs-workdir")))
+ (mkdir-p rw-dir)
+ (mkdir-p work-dir)
+ (mkdir-p "/.rw-store")
+ (set-store-permissions rw-dir)
+ (set-store-permissions "/.rw-store")
+
+ ;; Mount the overlay, then atomically make it the store.
+ (mount "none" "/.rw-store" "overlay" 0
+ (string-append "lowerdir=" (%store-directory) ","
+ "upperdir=" rw-dir ","
+ "workdir=" work-dir))
+ (mount "/.rw-store" (%store-directory) "" MS_MOVE)
+ (rmdir "/.rw-store")))
+
+(define (umount-cow-store target backing-directory)
+ "Umount copy-on-write store."
+ (let ((tmp-dir "/remove"))
+ (mkdir-p tmp-dir)
+ (mount (%store-directory) tmp-dir "" MS_MOVE)
+ (umount tmp-dir)
+ (rmdir tmp-dir)
+ (delete-file-recursively
+ (string-append target backing-directory))))
+
;;; install.scm ends here
diff --git a/gnu/system/install.scm b/gnu/system/install.scm
index a87c2f4207..be5a678cec 100644
--- a/gnu/system/install.scm
+++ b/gnu/system/install.scm
@@ -175,39 +175,6 @@ manual."
;; Sub-directory used as the backing store for copy-on-write.
"/tmp/guix-inst")
-(define (make-cow-store target)
- "Return a gexp that makes the store copy-on-write, using TARGET as the
-backing store. This is useful when TARGET is on a hard disk, whereas the
-current store is on a RAM disk."
-
- (define (set-store-permissions directory)
- ;; Set the right perms on DIRECTORY to use it as the store.
- #~(begin
- (chown #$directory 0 30000) ;use the fixed 'guixbuild' GID
- (chmod #$directory #o1775)))
-
- #~(begin
- ;; Bind-mount TARGET's /tmp in case we need space to build things.
- (let ((tmpdir (string-append #$target "/tmp")))
- (mkdir-p tmpdir)
- (mount tmpdir "/tmp" "none" MS_BIND))
-
- (let* ((rw-dir (string-append target #$%backing-directory))
- (work-dir (string-append rw-dir "/../.overlayfs-workdir")))
- (mkdir-p rw-dir)
- (mkdir-p work-dir)
- (mkdir-p "/.rw-store")
- #$(set-store-permissions #~rw-dir)
- #$(set-store-permissions "/.rw-store")
-
- ;; Mount the overlay, then atomically make it the store.
- (mount "none" "/.rw-store" "overlay" 0
- (string-append "lowerdir=" #$(%store-prefix) ","
- "upperdir=" rw-dir ","
- "workdir=" work-dir))
- (mount "/.rw-store" #$(%store-prefix) "" MS_MOVE)
- (rmdir "/.rw-store"))))
-
(define cow-store-service-type
(shepherd-service-type
'cow-store
@@ -222,13 +189,18 @@ the given target.")
;; This is meant to be explicitly started by the user.
(auto-start? #f)
- (start #~(case-lambda
- ((target)
- #$(make-cow-store #~target)
- target)
- (else
- ;; Do nothing, and mark the service as stopped.
- #f)))
+ (modules `((gnu build install)
+ ,@%default-modules))
+ (start
+ (with-imported-modules (source-module-closure
+ '((gnu build install)))
+ #~(case-lambda
+ ((target)
+ (mount-cow-store target #$%backing-directory)
+ target)
+ (else
+ ;; Do nothing, and mark the service as stopped.
+ #f))))
(stop #~(lambda (target)
;; Delete the temporary directory, but leave everything
;; mounted as there may still be processes using it since
--
2.28.0
M
M
Mathieu Othacehe wrote on 13 Aug 2020 14:34
[PATCH 2/3] linux-container: Add a jail? argument.
(address . 42849@debbugs.gnu.org)(name . Mathieu Othacehe)(address . othacehe@gnu.org)
20200813123419.263639-2-othacehe@gnu.org
We may want to run a container inside the MNT namespace, without jailing the
container. Add a "jail?" argument to "run-container" and "call-with-container"
methods.

* gnu/build/linux-container.scm (run-container): Add a "jail?" argument and
honor it,
(call-with-container): ditto, and pass the argument to "run-container".
---
gnu/build/linux-container.scm | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

Toggle diff (61 lines)
diff --git a/gnu/build/linux-container.scm b/gnu/build/linux-container.scm
index 87695c98fd..bb9fb0d799 100644
--- a/gnu/build/linux-container.scm
+++ b/gnu/build/linux-container.scm
@@ -218,12 +218,13 @@ corresponds to the symbols in NAMESPACES."
namespaces)))
(define* (run-container root mounts namespaces host-uids thunk
- #:key (guest-uid 0) (guest-gid 0))
+ #:key (guest-uid 0) (guest-gid 0) (jail? #t))
"Run THUNK in a new container process and return its PID. ROOT specifies
the root directory for the container. MOUNTS is a list of <file-system>
objects that specify file systems to mount inside the container. NAMESPACES
is a list of symbols that correspond to the possible Linux namespaces: mnt,
-ipc, uts, user, and net.
+ipc, uts, user, and net. If JAIL? is false, MOUNTS list is ignored and the
+container is not jailed.
HOST-UIDS specifies the number of host user identifiers to map into the user
namespace. GUEST-UID and GUEST-GID specify the first UID (respectively GID)
@@ -243,7 +244,7 @@ that host UIDs (respectively GIDs) map to in the namespace."
(match (read child)
('ready
(purify-environment)
- (when (memq 'mnt namespaces)
+ (when (and jail? (memq 'mnt namespaces))
(catch #t
(lambda ()
(mount-file-systems root mounts
@@ -300,13 +301,15 @@ delete it when leaving the dynamic extent of this call."
(define* (call-with-container mounts thunk #:key (namespaces %namespaces)
(host-uids 1) (guest-uid 0) (guest-gid 0)
- (process-spawned-hook (const #t)))
+ (process-spawned-hook (const #t))
+ (jail? #f))
"Run THUNK in a new container process and return its exit status; call
PROCESS-SPAWNED-HOOK with the PID of the new process that has been spawned.
MOUNTS is a list of <file-system> objects that specify file systems to mount
-inside the container. NAMESPACES is a list of symbols corresponding to
-the identifiers for Linux namespaces: mnt, ipc, uts, pid, user, and net. By
-default, all namespaces are used.
+inside the container. NAMESPACES is a list of symbols corresponding to the
+identifiers for Linux namespaces: mnt, ipc, uts, pid, user, and net. By
+default, all namespaces are used. If JAIL? is false, the MOUNTS list is
+ignored and the container is not jailed.
HOST-UIDS is the number of host user identifiers to map into the container's
user namespace, if there is one. By default, only a single uid/gid, that of
@@ -324,7 +327,8 @@ load path must be adjusted as needed."
(lambda (root)
(let ((pid (run-container root mounts namespaces host-uids thunk
#:guest-uid guest-uid
- #:guest-gid guest-gid)))
+ #:guest-gid guest-gid
+ #:jail? jail?)))
;; Catch SIGINT and kill the container process.
(sigaction SIGINT
(lambda (signum)
--
2.28.0
M
M
Mathieu Othacehe wrote on 13 Aug 2020 14:34
[PATCH 3/3] installer: Run the installation inside a container.
(address . 42849@debbugs.gnu.org)(name . Mathieu Othacehe)(address . othacehe@gnu.org)
20200813123419.263639-3-othacehe@gnu.org
When the store overlay is mounted, other processes such as kmscon, udev
and guix-daemon may open files from the store, preventing the
underlying install support from being umounted. See:

To avoid this situation, mount the store overlay inside a container,
and run the installation from within that container.

* gnu/services/base.scm (guix-shepherd-service): Support an optional PID
argument passed to the "start" method. If that argument is passed, ensure that
guix-daemon enters the given PID MNT namespace.
* gnu/installer/final.scm (umount-cow-store): Remove it,
(install-system): run the installation from within a container.
* gnu/installer/newt/final.scm (run-install-shell): Remove the display hack.
---
gnu/installer/final.scm | 125 +++++++++++++++++------------------
gnu/installer/newt/final.scm | 7 --
gnu/services/base.scm | 60 ++++++++++-------
3 files changed, 99 insertions(+), 93 deletions(-)

Toggle diff (257 lines)
diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index 685aa81d89..a19018dc85 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -26,6 +26,8 @@
#:use-module (guix build syscalls)
#:use-module (guix build utils)
#:use-module (gnu build accounts)
+ #:use-module (gnu build install)
+ #:use-module (gnu build linux-container)
#:use-module ((gnu system shadow) #:prefix sys:)
#:use-module (rnrs io ports)
#:use-module (srfi srfi-1)
@@ -133,49 +135,18 @@ USERS."
(_ #f))))))
pids)))
-(define (umount-cow-store)
- "Remove the store overlay and the bind-mount on /tmp created by the
-cow-store service. This procedure is very fragile and a better approach would
-be much appreciated."
- (catch #t
- (lambda ()
- (let ((tmp-dir "/remove"))
- (syslog "Unmounting cow-store.~%")
-
- (mkdir-p tmp-dir)
- (mount (%store-directory) tmp-dir "" MS_MOVE)
-
- ;; The guix-daemon has possibly opened files from the cow-store,
- ;; restart it.
- (restart-service 'guix-daemon)
-
- (syslog "Killing cow users.")
-
- ;; Kill all processes started while the cow-store was active (logins
- ;; on other TTYs for instance).
- (kill-cow-users tmp-dir)
-
- ;; Try to umount the store overlay. Some process such as udevd
- ;; workers might still be active, so do some retries.
- (let loop ((try 5))
- (syslog "Umount try ~a~%" (- 5 try))
- (sleep 1)
- (let ((umounted? (false-if-exception (umount tmp-dir))))
- (if (and (not umounted?) (> try 0))
- (loop (- try 1))
- (if umounted?
- (syslog "Umounted ~a successfully.~%" tmp-dir)
- (syslog "Failed to umount ~a.~%" tmp-dir)))))
-
- (umount "/tmp")))
- (lambda args
- (syslog "~a~%" args))))
-
(define* (install-system locale #:key (users '()))
"Create /etc/shadow and /etc/passwd on the installation target for USERS.
Start COW-STORE service on target directory and launch guix install command in
a subshell. LOCALE must be the locale name under which that command will run,
or #f. Return #t on success and #f on failure."
+ (define backing-directory
+ ;; Sub-directory used as the backing store for copy-on-write.
+ "/tmp/guix-inst")
+
+ (define (assert-exit x)
+ (primitive-exit (if x 0 1)))
+
(let* ((options (catch 'system-error
(lambda ()
;; If this file exists, it can provide
@@ -188,7 +159,11 @@ or #f. Return #t on success and #f on failure."
"--fallback")
options
(list (%installer-configuration-file)
- (%installer-target-dir)))))
+ (%installer-target-dir))))
+ (database-dir "/var/guix/db")
+ (database-file (string-append database-dir "/db.sqlite"))
+ (saved-database (string-append database-dir "/db.save"))
+ (ret #f))
(mkdir-p (%installer-target-dir))
;; We want to initialize user passwords but we don't want to store them in
@@ -198,27 +173,51 @@ or #f. Return #t on success and #f on failure."
;; passwords that we've put in there.
(create-user-database users (%installer-target-dir))
- (dynamic-wind
- (lambda ()
- (start-service 'cow-store (list (%installer-target-dir))))
- (lambda ()
- ;; If there are any connected clients, assume that we are running
- ;; installation tests. In that case, dump the standard and error
- ;; outputs to syslog.
- (if (not (null? (current-clients)))
- (with-output-to-file "/dev/console"
- (lambda ()
- (with-error-to-file "/dev/console"
- (lambda ()
- (setvbuf (current-output-port) 'none)
- (setvbuf (current-error-port) 'none)
- (run-command install-command #:locale locale)))))
- (run-command install-command #:locale locale)))
- (lambda ()
- (stop-service 'cow-store)
- ;; Remove the store overlay created at cow-store service start.
- ;; Failing to do that will result in further umount calls to fail
- ;; because the target device is seen as busy. See:
- ;; https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00161.html.
- (umount-cow-store)
- #f))))
+ ;; When the store overlay is mounted, other processes such as kmscon, udev
+ ;; and guix-daemon may open files from the store, preventing the
+ ;; underlying install support from being umounted. See:
+ ;; https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00161.html.
+ ;;
+ ;; To avoid this situation, mount the store overlay inside a container,
+ ;; and run the installation from within that container.
+ (zero?
+ (call-with-container '()
+ (lambda ()
+ (dynamic-wind
+ (lambda ()
+ ;; Save the database, so that it can be restored once the
+ ;; cow-store is umounted.
+ (copy-file database-file saved-database)
+ (mount-cow-store (%installer-target-dir) backing-directory))
+ (lambda ()
+ ;; We need to drag the guix-daemon to the container MNT
+ ;; namespace, so that it can operate on the cow-store.
+ (stop-service 'guix-daemon)
+ (start-service 'guix-daemon (list (number->string (getpid))))
+
+ (setvbuf (current-output-port) 'none)
+ (setvbuf (current-error-port) 'none)
+
+ ;; If there are any connected clients, assume that we are running
+ ;; installation tests. In that case, dump the standard and error
+ ;; outputs to syslog.
+ (set! ret
+ (if (not (null? (current-clients)))
+ (with-output-to-file "/dev/console"
+ (lambda ()
+ (with-error-to-file "/dev/console"
+ (lambda ()
+ (run-command install-command
+ #:locale locale)))))
+ (run-command install-command #:locale locale))))
+ (lambda ()
+ ;; Restart guix-daemon so that it does no keep the MNT namespace
+ ;; alive.
+ (restart-service 'guix-daemon)
+ (copy-file saved-database database-file)
+
+ ;; Finally umount the cow-store and exit the container.
+ (umount-cow-store (%installer-target-dir) backing-directory)
+ (assert-exit ret))))
+ #:namespaces '(mnt)
+ #:jail? #f))))
diff --git a/gnu/installer/newt/final.scm b/gnu/installer/newt/final.scm
index fa8d6fea71..89684c4d8a 100644
--- a/gnu/installer/newt/final.scm
+++ b/gnu/installer/newt/final.scm
@@ -102,13 +102,6 @@ a specific step, or restart the installer."))
#:key (users '()))
(clear-screen)
(newt-suspend)
- ;; XXX: Force loading 'bold' font files before mouting the
- ;; cow-store. Otherwise, if the file is loaded by kmscon after the cow-store
- ;; in mounted, it will be necessary to kill kmscon to umount to cow-store.
- (display
- (colorize-string
- (format #f (G_ "Installing Guix System ...~%"))
- (color BOLD)))
(let ((install-ok? (install-system locale #:users users)))
(newt-resume)
install-ok?))
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index 491f35702a..f62fd861ca 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -1558,36 +1558,50 @@ proxy of 'guix-daemon'...~%")
(provision '(guix-daemon))
(requirement '(user-processes))
(actions (list shepherd-set-http-proxy-action))
- (modules '((srfi srfi-1)))
+ (modules '((srfi srfi-1)
+ (ice-9 match)))
(start
- #~(lambda _
+ #~(lambda args
(define proxy
;; HTTP/HTTPS proxy. The 'http_proxy' variable is set by
;; the 'set-http-proxy' action.
(or (getenv "http_proxy") #$http-proxy))
(fork+exec-command
- (cons* #$(file-append guix "/bin/guix-daemon")
- "--build-users-group" #$build-group
- "--max-silent-time" #$(number->string max-silent-time)
- "--timeout" #$(number->string timeout)
- "--log-compression" #$(symbol->string log-compression)
- #$@(if use-substitutes?
- '()
- '("--no-substitutes"))
- "--substitute-urls" #$(string-join substitute-urls)
- #$@extra-options
-
- ;; Add CHROOT-DIRECTORIES and all their dependencies
- ;; (if these are store items) to the chroot.
- (append-map (lambda (file)
- (append-map (lambda (directory)
- (list "--chroot-directory"
- directory))
- (call-with-input-file file
- read)))
- '#$(map references-file
- chroot-directories)))
+ ;; When running the installer, we need guix-daemon to operate
+ ;; from within the same MNT namespace as the installation
+ ;; container. In that case only, enter the namespace of the
+ ;; process PID passed as start argument.
+ (append
+ (match args
+ ((pid)
+ (list #$(file-append util-linux "/bin/nsenter")
+ "-t" pid "-m"))
+ (else '()))
+ (cons* #$(file-append guix "/bin/guix-daemon")
+ "--build-users-group" #$build-group
+ "--max-silent-time"
+ #$(number->string max-silent-time)
+ "--timeout" #$(number->string timeout)
+ "--log-compression"
+ #$(symbol->string log-compression)
+ #$@(if use-substitutes?
+ '()
+ '("--no-substitutes"))
+ "--substitute-urls" #$(string-join substitute-urls)
+ #$@extra-options
+
+ ;; Add CHROOT-DIRECTORIES and all their dependencies
+ ;; (if these are store items) to the chroot.
+ (append-map
+ (lambda (file)
+ (append-map (lambda (directory)
+ (list "--chroot-directory"
+ directory))
+ (call-with-input-file file
+ read)))
+ '#$(map references-file
+ chroot-directories))))
#:environment-variables
(append (list #$@(if tmpdir
--
2.28.0
L
L
Ludovic Courtès wrote on 30 Aug 2020 21:51
Re: [bug#42849] [PATCH 1/3] install: Factorize cow-store procedure.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
87wo1fhpyn.fsf@gnu.org
Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (8 lines)
> Move the cow-store procedure from the service declaration in (gnu system
> install) to (gnu build install), so that it can be called from within a
> different context than Shepherd.
>
> * gnu/build/install.scm (mount-cow-store, umount-cow-store): New procedures.
> * gnu/system/install.scm (make-cow-store): Remove it,
> (cow-store-service-type): adapt it accordingly.

Nitpick: Unlike K&R back then, I think we can afford the ‘n’ in
‘unmount’. :-)

Other than that LGTM!

Ludo’.
L
L
Ludovic Courtès wrote on 30 Aug 2020 21:53
Re: [bug#42849] [PATCH 2/3] linux-container: Add a jail? argument.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
87sgc3hpvp.fsf@gnu.org
Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (29 lines)
> We may want to run a container inside the MNT namespace, without jailing the
> container. Add a "jail?" argument to "run-container" and "call-with-container"
> methods.
>
> * gnu/build/linux-container.scm (run-container): Add a "jail?" argument and
> honor it,
> (call-with-container): ditto, and pass the argument to "run-container".
> ---
> gnu/build/linux-container.scm | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/gnu/build/linux-container.scm b/gnu/build/linux-container.scm
> index 87695c98fd..bb9fb0d799 100644
> --- a/gnu/build/linux-container.scm
> +++ b/gnu/build/linux-container.scm
> @@ -218,12 +218,13 @@ corresponds to the symbols in NAMESPACES."
> namespaces)))
>
> (define* (run-container root mounts namespaces host-uids thunk
> - #:key (guest-uid 0) (guest-gid 0))
> + #:key (guest-uid 0) (guest-gid 0) (jail? #t))
> "Run THUNK in a new container process and return its PID. ROOT specifies
> the root directory for the container. MOUNTS is a list of <file-system>
> objects that specify file systems to mount inside the container. NAMESPACES
> is a list of symbols that correspond to the possible Linux namespaces: mnt,
> -ipc, uts, user, and net.
> +ipc, uts, user, and net. If JAIL? is false, MOUNTS list is ignored and the
> +container is not jailed.

Why not just change the caller to pass #:mounts '() then? Am I missing
something?

I’m reluctant to introducing “jail” because that’s undefined in this
context (reminds me of FreeBSD).

Ludo’.
L
L
Ludovic Courtès wrote on 30 Aug 2020 22:40
Re: [bug#42849] [PATCH 3/3] installer: Run the installation inside a container.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
87eennhnpz.fsf@gnu.org
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (15 lines)
> When the store overlay is mounted, other processes such as kmscon, udev
> and guix-daemon may open files from the store, preventing the
> underlying install support from being umounted. See:
> https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00161.html.
>
> To avoid this situation, mount the store overlay inside a container,
> and run the installation from within that container.
>
> * gnu/services/base.scm (guix-shepherd-service): Support an optional PID
> argument passed to the "start" method. If that argument is passed, ensure that
> guix-daemon enters the given PID MNT namespace.
> * gnu/installer/final.scm (umount-cow-store): Remove it,
> (install-system): run the installation from within a container.
> * gnu/installer/newt/final.scm (run-install-shell): Remove the display hack.

Smart!

Toggle quote (47 lines)
> + ;; When the store overlay is mounted, other processes such as kmscon, udev
> + ;; and guix-daemon may open files from the store, preventing the
> + ;; underlying install support from being umounted. See:
> + ;; https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00161.html.
> + ;;
> + ;; To avoid this situation, mount the store overlay inside a container,
> + ;; and run the installation from within that container.
> + (zero?
> + (call-with-container '()
> + (lambda ()
> + (dynamic-wind
> + (lambda ()
> + ;; Save the database, so that it can be restored once the
> + ;; cow-store is umounted.
> + (copy-file database-file saved-database)
> + (mount-cow-store (%installer-target-dir) backing-directory))
> + (lambda ()
> + ;; We need to drag the guix-daemon to the container MNT
> + ;; namespace, so that it can operate on the cow-store.
> + (stop-service 'guix-daemon)
> + (start-service 'guix-daemon (list (number->string (getpid))))
> +
> + (setvbuf (current-output-port) 'none)
> + (setvbuf (current-error-port) 'none)
> +
> + ;; If there are any connected clients, assume that we are running
> + ;; installation tests. In that case, dump the standard and error
> + ;; outputs to syslog.
> + (set! ret
> + (if (not (null? (current-clients)))
> + (with-output-to-file "/dev/console"
> + (lambda ()
> + (with-error-to-file "/dev/console"
> + (lambda ()
> + (run-command install-command
> + #:locale locale)))))
> + (run-command install-command #:locale locale))))
> + (lambda ()
> + ;; Restart guix-daemon so that it does no keep the MNT namespace
> + ;; alive.
> + (restart-service 'guix-daemon)
> + (copy-file saved-database database-file)
> +
> + ;; Finally umount the cow-store and exit the container.
> + (umount-cow-store (%installer-target-dir) backing-directory)
> + (assert-exit ret))))

Should ‘mount-cow-store’ also make an overlay for /var/guix/db? That
way, changes to that directory would go to /mnt/var/guix/db and the
original database would remain unchanged.

Toggle quote (49 lines)
> --- a/gnu/services/base.scm
> +++ b/gnu/services/base.scm
> @@ -1558,36 +1558,50 @@ proxy of 'guix-daemon'...~%")
> (provision '(guix-daemon))
> (requirement '(user-processes))
> (actions (list shepherd-set-http-proxy-action))
> - (modules '((srfi srfi-1)))
> + (modules '((srfi srfi-1)
> + (ice-9 match)))
> (start
> - #~(lambda _
> + #~(lambda args
> (define proxy
> ;; HTTP/HTTPS proxy. The 'http_proxy' variable is set by
> ;; the 'set-http-proxy' action.
> (or (getenv "http_proxy") #$http-proxy))
>
> (fork+exec-command
> - (cons* #$(file-append guix "/bin/guix-daemon")
> - "--build-users-group" #$build-group
> - "--max-silent-time" #$(number->string max-silent-time)
> - "--timeout" #$(number->string timeout)
> - "--log-compression" #$(symbol->string log-compression)
> - #$@(if use-substitutes?
> - '()
> - '("--no-substitutes"))
> - "--substitute-urls" #$(string-join substitute-urls)
> - #$@extra-options
> -
> - ;; Add CHROOT-DIRECTORIES and all their dependencies
> - ;; (if these are store items) to the chroot.
> - (append-map (lambda (file)
> - (append-map (lambda (directory)
> - (list "--chroot-directory"
> - directory))
> - (call-with-input-file file
> - read)))
> - '#$(map references-file
> - chroot-directories)))
> + ;; When running the installer, we need guix-daemon to operate
> + ;; from within the same MNT namespace as the installation
> + ;; container. In that case only, enter the namespace of the
> + ;; process PID passed as start argument.
> + (append
> + (match args
> + ((pid)
> + (list #$(file-append util-linux "/bin/nsenter")
> + "-t" pid "-m"))

We should use ‘container-excursion’ instead of nsenter.

Toggle quote (25 lines)
> + (else '()))
> + (cons* #$(file-append guix "/bin/guix-daemon")
> + "--build-users-group" #$build-group
> + "--max-silent-time"
> + #$(number->string max-silent-time)
> + "--timeout" #$(number->string timeout)
> + "--log-compression"
> + #$(symbol->string log-compression)
> + #$@(if use-substitutes?
> + '()
> + '("--no-substitutes"))
> + "--substitute-urls" #$(string-join substitute-urls)
> + #$@extra-options
> +
> + ;; Add CHROOT-DIRECTORIES and all their dependencies
> + ;; (if these are store items) to the chroot.
> + (append-map
> + (lambda (file)
> + (append-map (lambda (directory)
> + (list "--chroot-directory"
> + directory))
> + (call-with-input-file file
> + read)))
> + '#$(map references-file

Hmm, that seems quite complex, and it’s not great that we have to tweak
guix-daemon-service “just” for this.

*scratches head*

Is there a way we can identify processes that have open overlay files,
so we could terminate them?

Alternately, something that might simplify the code would be to always
run guix-daemon in a separate mount namespace. We could add a
‘fork+exec-command/container’ procedure in (gnu build shepherd) to help
with that.

That way, all we’d need to do is to run ‘guix system init’ in that same
mount namespace, which can be achieved using ‘container-excursion’.

Too bad we can’t use setns for a process other than the calling process.
:-/

Thoughts?

Ludo’.
M
M
Mathieu Othacehe wrote on 31 Aug 2020 08:27
Re: [bug#42849] [PATCH 2/3] linux-container: Add a jail? argument.
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 42849@debbugs.gnu.org)
87k0xfib4b.fsf@gnu.org
Hey Ludo,

Toggle quote (6 lines)
> Why not just change the caller to pass #:mounts '() then? Am I missing
> something?
>
> I’m reluctant to introducing “jail” because that’s undefined in this
> context (reminds me of FreeBSD).

The purpose here is to avoid the "pivot-root" call that is done
unconditionally in "mount-file-systems". This way containerized process
can share the parent root file-system.

Maybe something like that would make more sense:

Toggle snippet (8 lines)
(lambda ()
(unless (null? mounts)
(mount-file-systems root mounts
#:mount-/proc? (memq 'pid namespaces)
#:mount-/sys? (memq 'net
namespaces))))

Thanks,

Mathieu
M
M
Mathieu Othacehe wrote on 31 Aug 2020 08:44
Re: [bug#42849] [PATCH 3/3] installer: Run the installation inside a container.
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 42849@debbugs.gnu.org)
87a6ybiab6.fsf@gnu.org
Toggle quote (4 lines)
> Should ‘mount-cow-store’ also make an overlay for /var/guix/db? That
> way, changes to that directory would go to /mnt/var/guix/db and the
> original database would remain unchanged.

I took the lazy path because it's just one file that keeps reasonably
small. Adding an extra overlay for /var/guix/db would make
sense here.

Toggle quote (4 lines)
>
> Hmm, that seems quite complex, and it’s not great that we have to tweak
> guix-daemon-service “just” for this.

Yes I can't say I'm satisfied with all of this but I'm trying different
angles for this problem since months, with no proper outcome.

Toggle quote (3 lines)
> Is there a way we can identify processes that have open overlay files,
> so we could terminate them?

That's the current approach but it breaks very ofter because kmscon,
udev or any other processes that can't be killed, opens an overlay file.
I'd really like to avoid relying on this kind of solution.

Toggle quote (8 lines)
> Alternately, something that might simplify the code would be to always
> run guix-daemon in a separate mount namespace. We could add a
> ‘fork+exec-command/container’ procedure in (gnu build shepherd) to help
> with that.
>
> That way, all we’d need to do is to run ‘guix system init’ in that same
> mount namespace, which can be achieved using ‘container-excursion’.

Yes I tried that at first but there's a catch. While running guix-daemon
in it's own mount namespace, it won't 'see' the mounted file-systems
such as /mnt.

So that would mean that we would have to do spawn a containerized
process that would:

* Join guix-daemon mnt namespace
* Call "with-mounted-partitions"
* Mount the cow-store
* Run 'guix system init'

In this is end it still seem overly complex, but I can give it another
try. WDYT?

Thanks a lot for reviewing this!

Mathieu
L
L
Ludovic Courtès wrote on 31 Aug 2020 15:36
Re: [bug#42849] [PATCH 2/3] linux-container: Add a jail? argument.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
87h7sjc4zd.fsf@gnu.org
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (10 lines)
>> Why not just change the caller to pass #:mounts '() then? Am I missing
>> something?
>>
>> I’m reluctant to introducing “jail” because that’s undefined in this
>> context (reminds me of FreeBSD).
>
> The purpose here is to avoid the "pivot-root" call that is done
> unconditionally in "mount-file-systems". This way containerized process
> can share the parent root file-system.

Oh, I see.

Toggle quote (9 lines)
> Maybe something like that would make more sense:
>
> (lambda ()
> (unless (null? mounts)
> (mount-file-systems root mounts
> #:mount-/proc? (memq 'pid namespaces)
> #:mount-/sys? (memq 'net
> namespaces))))

Should it be (and (null? mounts) (null? namespaces)) or…?

Ludo’.
L
L
Ludovic Courtès wrote on 1 Sep 2020 10:48
Re: [bug#42849] [PATCH 3/3] installer: Run the installation inside a container.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
87k0xdanng.fsf@gnu.org
Hi Mathieu!

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (8 lines)
>> Should ‘mount-cow-store’ also make an overlay for /var/guix/db? That
>> way, changes to that directory would go to /mnt/var/guix/db and the
>> original database would remain unchanged.
>
> I took the lazy path because it's just one file that keeps reasonably
> small. Adding an extra overlay for /var/guix/db would make
> sense here.

Yeah, no big deal.

Toggle quote (6 lines)
>> Hmm, that seems quite complex, and it’s not great that we have to tweak
>> guix-daemon-service “just” for this.
>
> Yes I can't say I'm satisfied with all of this but I'm trying different
> angles for this problem since months, with no proper outcome.

Yeah… (I must say I really appreciate your commitment tackling hard
problems like this one, kudos!)

Toggle quote (7 lines)
>> Is there a way we can identify processes that have open overlay files,
>> so we could terminate them?
>
> That's the current approach but it breaks very ofter because kmscon,
> udev or any other processes that can't be killed, opens an overlay file.
> I'd really like to avoid relying on this kind of solution.

OK, makes sense!

Toggle quote (23 lines)
>> Alternately, something that might simplify the code would be to always
>> run guix-daemon in a separate mount namespace. We could add a
>> ‘fork+exec-command/container’ procedure in (gnu build shepherd) to help
>> with that.
>>
>> That way, all we’d need to do is to run ‘guix system init’ in that same
>> mount namespace, which can be achieved using ‘container-excursion’.
>
> Yes I tried that at first but there's a catch. While running guix-daemon
> in it's own mount namespace, it won't 'see' the mounted file-systems
> such as /mnt.
>
> So that would mean that we would have to do spawn a containerized
> process that would:
>
> * Join guix-daemon mnt namespace
> * Call "with-mounted-partitions"
> * Mount the cow-store
> * Run 'guix system init'
>
> In this is end it still seem overly complex, but I can give it another
> try. WDYT?

It does seem complex indeed.

So perhaps we can settle on the solution you sent, but let’s see if we
can move complexity out of sight. For example, if we can arrange to
have a ‘fork+exec-command/container’ procedure that can be passed the
PID of a namespace, such that the ‘start’ method of guix-daemon is just
a few more lines its current definition, I’ll be happy.

How does that sound?

Thank you!

Ludo’.
M
M
Mathieu Othacehe wrote on 2 Sep 2020 17:15
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 42849-done@debbugs.gnu.org)
87imcwtdkd.fsf@gnu.org
Hey Ludo,

Toggle quote (8 lines)
> So perhaps we can settle on the solution you sent, but let’s see if we
> can move complexity out of sight. For example, if we can arrange to
> have a ‘fork+exec-command/container’ procedure that can be passed the
> PID of a namespace, such that the ‘start’ method of guix-daemon is just
> a few more lines its current definition, I’ll be happy.
>
> How does that sound?

Sounds fine! I added a "fork+exec-command/container" in (gnu build
shepherd) module, that uses "container-excursion" to enter the
namespaces of the process passed as argument.

I also took your other remarks into account and pushed this
serie. Thanks a lot for diving into this harsh stuff :).

Locally, the installer tests behave fine, but I'll monitor the CI to see
how it goes.

Now, the only installation test failure I'm aware of is
https://issues.guix.gnu.org/41948.The good news, is that I have a Guile
patch that seem to solve it[1].

Thanks,

Mathieu

Closed
L
L
Ludovic Courtès wrote on 2 Sep 2020 22:17
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849-done@debbugs.gnu.org)
8736403pco.fsf@gnu.org
Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (15 lines)
>> So perhaps we can settle on the solution you sent, but let’s see if we
>> can move complexity out of sight. For example, if we can arrange to
>> have a ‘fork+exec-command/container’ procedure that can be passed the
>> PID of a namespace, such that the ‘start’ method of guix-daemon is just
>> a few more lines its current definition, I’ll be happy.
>>
>> How does that sound?
>
> Sounds fine! I added a "fork+exec-command/container" in (gnu build
> shepherd) module, that uses "container-excursion" to enter the
> namespaces of the process passed as argument.
>
> I also took your other remarks into account and pushed this
> serie. Thanks a lot for diving into this harsh stuff :).

Awesome!

Toggle quote (7 lines)
> Locally, the installer tests behave fine, but I'll monitor the CI to see
> how it goes.
>
> Now, the only installation test failure I'm aware of is
> https://issues.guix.gnu.org/41948. The good news, is that I have a Guile
> patch that seem to solve it[1].

Yes, it’s still on my radar (started looking into it the other day,
without any tangible result).

Ludo’.
Closed
L
L
Ludovic Courtès wrote on 2 Sep 2020 23:25
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849-done@debbugs.gnu.org)
87h7sf3m7q.fsf@gnu.org
Hey,

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (4 lines)
> Sounds fine! I added a "fork+exec-command/container" in (gnu build
> shepherd) module, that uses "container-excursion" to enter the
> namespaces of the process passed as argument.

There’s a catch that I just discovered as I reconfigured berlin:

Toggle snippet (13 lines)
root@berlin ~/maintenance/hydra# herd restart guix-daemon
Service cuirass-web has been stopped.
Service cuirass has been stopped.
Service guix-publish has been stopped.
Service guix-daemon has been stopped.
herd: exception caught while executing 'start' on service 'guix-daemon':
Unbound variable: fork+exec-command/container
root@berlin ~/maintenance/hydra# herd restart guix-daemon
Service guix-daemon is not running.
herd: exception caught while executing 'start' on service 'guix-daemon':
Unbound variable: fork+exec-command/container

The workaround I found, which works nicely, is to run:

herd eval root "(reload-module (resolve-module '(gnu build shepherd)))"

and then:

herd restart guix-daemon
herd restart guix-publish

Not sure if there’s anything we can or should do. It’s probably not too
common to restart guix-daemon after an upgrade, though.

Ludo’.
Closed
L
L
Ludovic Courtès wrote on 8 Sep 2020 00:02
Re: [bug#42849] [PATCH 2/3] linux-container: Add a jail? argument.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
878sdlfdpj.fsf@gnu.org
Hi!

Commit 5316dfc0f125b658e4a2acf7f00f49501663d943 breaks two tests in
tests/containers.scm, related to ‘container-excursion’ because the
#:namespaces argument is no longer really honored.

Thoughts on how to fix it?

Thanks,
Ludo’.
M
M
Mathieu Othacehe wrote on 10 Sep 2020 09:46
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 42849@debbugs.gnu.org)
871rja5b26.fsf@gnu.org
Hey Ludo,

Toggle quote (6 lines)
> Commit 5316dfc0f125b658e4a2acf7f00f49501663d943 breaks two tests in
> tests/containers.scm, related to ‘container-excursion’ because the
> #:namespaces argument is no longer really honored.
>
> Thoughts on how to fix it?

Oops, sorry about that. After more thoughts, the only case we do not
want to "jail" the container is when the requested root is already
"/". I fixed it with b3a83f1ece4b6c8bfcc2a9875df51142c0e39904.

Thanks,

--
Mathieu
L
L
Ludovic Courtès wrote on 11 Sep 2020 17:07
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 42849@debbugs.gnu.org)
87a6xwtkrv.fsf@gnu.org
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (10 lines)
>> Commit 5316dfc0f125b658e4a2acf7f00f49501663d943 breaks two tests in
>> tests/containers.scm, related to ‘container-excursion’ because the
>> #:namespaces argument is no longer really honored.
>>
>> Thoughts on how to fix it?
>
> Oops, sorry about that. After more thoughts, the only case we do not
> want to "jail" the container is when the requested root is already
> "/". I fixed it with b3a83f1ece4b6c8bfcc2a9875df51142c0e39904.

Great, thanks!

Ludo’.
?