cryptsetup-static aborts opening LUKS2 volume with Argon2i PBKDF

OpenSubmitted by Simon South.
Details
4 participants
  • angry rectangle
  • Leo Famulari
  • Simon South
  • Tom Fitzhenry
Owner
unassigned
Severity
serious
S
S
Simon South wrote on 4 Jan 15:36 +0100
(address . bug-guix@gnu.org)
87v8yz1sae.fsf@simonsouth.net
Currently cryptsetup from the "cryptsetup-static" package is unable to
open LUKS2 encrypted volumes that use the Argon2i key-derivation
algorithm, the default for LUKS2. It catches SIGABRT and exits without
opening the volume.

This appears to be a regression following the merge of the
core-updates-frozen branch and because of it, I'm unable to boot into an
up-to-date system as there is no way to get past the "Enter passphrase"
prompt at startup.

I've verified this on both AArch64 and x86-64. To reproduce:

1. Ensure the "cryptsetup" package is installed in your profile and that
"cryptsetup-static", the statically-linked equivalent added to the
initrd and used during startup, is available on your system:

guix install cryptsetup
guix build --verbosity=2 cryptsetup-static

2. Create a file containing a dummy LUKS2 volume:

truncate -s 32M ./dummy-luks-volume
cryptsetup luksFormat --type luks2 ./dummy-luks-volume

Make sure the Argon2i PBKDF algorithm was selected during formatting:

cryptsetup luksDump ./dummy-luks-volume | grep argon

This should output "PBKDF: argon2i".

3. Verify the volume can be opened using the regular cryptsetup tool:

sudo cryptsetup open --type luks ./dummy-luks-volume dummy-volume
ls /dev/mapper/dummy-volume
sudo cryptsetup close /dev/mapper/dummy-volume

4. Now try opening the volume using the statically-linked cryptsetup:

sudo `guix build cryptsetup-static`/sbin/cryptsetup open \
--type luks ./dummy-luks-volume dummy-volume
ls /dev/mapper/dummy-volume

You should find (on most runs, at least) after you enter the passphrase
the tool exits with "Aborted" and with no entry added beneath
/dev/mapper.

--
Simon South
simon@simonsouth.net
S
S
Simon South wrote on 4 Jan 15:44 +0100
(address . 53005@debbugs.gnu.org)
87sfu31rx7.fsf@simonsouth.net
This seems to be caused by the symbol "__pthread_key_create" being
absent from cryptsetup's symbol table during linking, even though it is
meant to be exported explicitly by glibc.

The fundamental issue is that cryptsetup's Argon2i implementation (and
thus cryptsetup itself) is multithreaded, but libgcc[0], a low-level
runtime library used by the compiler, is unaware of this fact.
Consequently libgcc's stack-unwinding code skips the use of a mutex that
would normally synchronize data access among multiple threads, allowing
a race condition to develop when two or more threads try to exit
simultaneously.

One thread will fail to locate its frame-descriptor entry, causing NULL
to be returned by libgcc's _Unwind_Find_FDE() function
(libgcc/unwind-dw2-fde.c:1029), leading to an assertion failing (in
uw_init_context_1() at libgcc/unwind-dw2.c:1593) and abort() being
called, terminating the process.

The underlying failure is indicated in a comment in gcc's code, at
libgcc/gthr-posix.h:215. libgcc uses the presence of specific symbols
in a program's symbol table to infer whether or not the program is
multithreaded:

For a program to be multi-threaded the only thing that it certainly
must be using is pthread_create. However, there may be other
libraries that intercept pthread_create with their own definitions to
wrap pthreads functionality for some purpose. In those cases,
pthread_create being defined might not necessarily mean that
libpthread is actually linked in.

For the GNU C library, we can use a known internal name. This is
always available in the ABI, but no other library would define it.
That is ideal, since any public pthread function might be intercepted
just as pthread_create might be. __pthread_key_create is an
"internal" implementation symbol, but it is part of the public
exported ABI. Also, it's among the symbols that the static
libpthread.a always links in whenever pthread_create is used, so
there is no danger of a false negative result in any
statically-linked, multi-threaded program.

It seems the final sentence is no longer true, at least in recent
versions of Guix.

Building the "cryptsetup-static" package with "#:strip-binaries? #f" and
dumping the resulting binary's symbol table with "objdump -t" shows
"pthread_create" is present but not "__pthread_key_create". This seems
to be why libgcc's __gthread_active_p() always returns false, turning
wrapper functions like __gthread_mutex_lock() into no-ops and
effectively disabling the use of the mutex in _Unwind_Find_FDE().

The question then is why this symbol either is no longer being exported
by glibc or is being dropped at some point during cryptsetup's
compilation. (Other packages may be affected as well: Even the regular,
dynamically-linked cryptsetup shows this problem, but avoids a crash by
not invoking libgcc's stack-unwinding routines.)

I'll keep working on this but having gotten this far, I'm hoping someone
else has some insight.


--
Simon South
simon@simonsouth.net
S
S
Simon South wrote on 8 Jan 02:52 +0100
(address . 53005@debbugs.gnu.org)
87ee5jt2n4.fsf@simonsouth.net
After some testing I've found the regression was introduced in commits
f32a6055a5 and e0f31baacc, "build-system/gnu: strip with
--strip-unneeded", which replace "--strip-debug" with "--strip-unneeded"
for packages that use the GNU build system. It seems this is now
stripping a bit too much.

The solution may be as simple as undoing this change in (at least) the
"static-package" function (guix/build-system/gnu.scm:211).
Alternatively we may need to add a "--keep-symbol" flag in a few places.

I'm continuing to investigate.

--
Simon South
simon@simonsouth.net
L
L
Leo Famulari wrote on 8 Jan 04:14 +0100
(no subject)
(address . control@debbugs.gnu.org)
YdkBjOXsKQxBfhAG@jasmine.lan
severity 53005 serious
S
S
Simon South wrote on 11 Jan 00:34 +0100
[PATCH 1/1] gnu: glibc: Preserve "__pthread_key_create" symbol.
(address . 53005@debbugs.gnu.org)
7e85b4a230a68240759120e8440ea77cf1d6e927.1641856285.git.simon@simonsouth.net
Avoid a potential crash in multithreaded applications by preserving the
pthread library's "__pthread_key_create" symbol, used by libgcc to detect the
use of threads in an application.


* gnu/packages/base.scm (glibc)[arguments]: Add "#:strip-flags" with
"--keep-symbol=__pthread_key_create" appended to the default set.
---
gnu/packages/base.scm | 10 ++++++++++
1 file changed, 10 insertions(+)

Toggle diff (23 lines)
diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index 12e4de52d4..68c85dcdd5 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -795,6 +795,16 @@ (define-public glibc
                   '()))
 
       #:tests? #f                                 ; XXX
+
+      #:strip-flags '("--strip-unneeded"
+                      "--enable-deterministic-archives"
+
+                      ;; Preserve the symbol "__pthread_key_create" in the
+                      ;; pthread library as this is used by libgcc to detect
+                      ;; the use of threads in an application.
+                      ;; See https://issues.guix.gnu.org/53005.
+                      "--keep-symbol=__pthread_key_create")
+
       #:phases (modify-phases %standard-phases
                  (add-before
                   'configure 'pre-configure
-- 
2.34.0
S
S
Simon South wrote on 11 Jan 00:34 +0100
Re: bug#53005: cryptsetup-static aborts opening LUKS2 volume with Argon2i PBKDF
(address . 53005@debbugs.gnu.org)
cover.1641856285.git.simon@simonsouth.net
Here's a patch that fixes this problem by modifying the glibc package so the
"__pthread_key_create" symbol in its pthread library is preserved, as opposed
to being stripped off as it is today.

This tests fine for me on both AArch64 and x86-64: Stepping through the code
in gdb I can see libgcc's __gthread_active_p() is now returning true, and both
cryptsetup tools now open a LUKS2 volume without issue. So far nothing else
seems to be affected.

This is the smallest and least-intrusive fix I can think of though I expect it
will still result in a large number of packages being rebuilt.

--
Simon South
simon@simonsouth.net


Simon South (1):
gnu: glibc: Preserve "__pthread_key_create" symbol.

gnu/packages/base.scm | 10 ++++++++++
1 file changed, 10 insertions(+)


base-commit: e2d8125a5c6d4338749e6bf8882f220395b25275
--
2.34.0
S
S
Simon South wrote on 11 Jan 00:39 +0100
control message for bug #53005
(address . control@debbugs.gnu.org)
87fspvduui.fsf@simonsouth.net
tags 53005 + patch
quit
L
L
Leo Famulari wrote on 12 Jan 20:21 +0100
Re: bug#53005: [PATCH 1/1] gnu: glibc: Preserve "__pthread_key_create" symbol.
(name . Simon South)(address . simon@simonsouth.net)(address . 53005@debbugs.gnu.org)
Yd8qUwGKBIZPG0P9@jasmine.lan
On Mon, Jan 10, 2022 at 06:34:26PM -0500, Simon South wrote:
Toggle quote (9 lines)
> Avoid a potential crash in multithreaded applications by preserving the
> pthread library's "__pthread_key_create" symbol, used by libgcc to detect the
> use of threads in an application.
>
> Fixes <https://issues.guix.gnu.org/53005>.
>
> * gnu/packages/base.scm (glibc)[arguments]: Add "#:strip-flags" with
> "--keep-symbol=__pthread_key_create" appended to the default set.

Thanks for analysing this bug and sending a patch.

Because the proposed fix changes glibc, it will require rebuilding the
entire distro. That's expensive, so, we need to think about it some more
before deciding what to do.

First, how was the LUKS2 volume created? Was it created by Guix System?
Is it the default type of LUKS volume created by Guix? I see that our
cryptsetup package has "with-default-luks-format=LUKS1". I'm trying to
understand how many users will be affected by this bug.

Second, do other distros have to apply the same workaround with
'--keep-symbol'? Like, is this problem widespread? Is Guix doing
something wrong that requires the workaround?

Sorry if you already answered these questions in your previous messages.
S
S
Simon South wrote on 12 Jan 22:22 +0100
(name . Leo Famulari)(address . leo@famulari.name)(address . 53005@debbugs.gnu.org)
878rvkodik.fsf@simonsouth.net
Leo Famulari <leo@famulari.name> writes:
Toggle quote (3 lines)
> First, how was the LUKS2 volume created? Was it created by Guix
> System?

No, this would've been a volume I created myself; I expect only users
who are partitioning their drives manually or replacing an existing
system would encounter this. The Guix manual actually instructs users
to select a different PBKDF algorithm[0] for compatibility with GRUB,
one that by coincidence does not appear to be affected[1].

However, remember this problem potentially affects _all_ packages that
use threads in C or C++. It appears that dynamically-linked executables
(i.e. the vast majority) generally sidestep the problem by avoiding any
"dangerous" methods in libgcc, but there could still be crashing bugs
elsewhere waiting to be discovered.

Toggle quote (2 lines)
> Is Guix doing something wrong that requires the workaround?

This is all a consequence of stripping libraries with the
"--strip-unneeded" option instead of "--strip-debug"[2], in the interest
of reducing store sizes. The man page for "strip" describes it like
this:

--strip-unneeded
Remove all symbols that are not needed for relocation processing.

My personal opinion is that this option makes sense for executables but
is too aggressive to use on libraries.

- Unlike executables, we generally want to do more with a library than
just relocate it.

- Providing a rich set of symbols is normally a desirable feature of a
library and not a bug.

- Only at link time is it possible to say which of a library's symbols
are truly relevant; therefore, the selection of symbols to retain
should logically be performed by the linker, not an automated step
during the library's packaging (outside of any link context).

Specifically, it's impossible for us to predict cases like this one,
where a symbol not obviously needed to use a library is nonetheless
relied on by an application.

However, Guix's gnu-build-system provides only a single "#:strip-flags"
argument that is applied to all of a package's binaries. So, as an
alternative solution, we could either extend that mechanism to allow
different sets of flags to be applied to different types of binaries, or
revert the changes (commits f32a6055a5 and e0f31baacc) altogether. I
didn't expect either of those options would be popular, and truthfully,
everything does appear to work okay (for now) with only this one change
to glibc.

[1] But then, neither GRUB nor the Guix installer are commonly used on
non-Intel systems.
[2] Originally proposed in https://issues.guix.gnu.org/42555

--
Simon South
simon@simonsouth.net
T
T
Tom Fitzhenry wrote on 1 May 11:49 +0200
(address . bug-guix@gnu.org)
e09677c4-78de-c095-55c8-9ff3c974885f@tom-fitzhenry.me.uk
On 13/1/22 08:22, Simon South wrote:
Toggle quote (4 lines)
> No, this would've been a volume I created myself; I expect only users
> who are partitioning their drives manually or replacing an existing
> system would encounter this.

+1. I encountered this while installing Guix on my pre-existing LUKS
volume that I share with another distro.

As a workaround to the immediate issue, I replaced my LUKS volume's
argon2 key with a pbkdf2 key, via `cryptsetup luksAddKey --pbkdf pbkdf2
$DEV` and `cryptsetup luksRemoveKey $DEV`. Changing cryptographic
algorithms introduces risk, which might not be right for everyone, however.
A
A
angry rectangle wrote on 19 Aug 05:31 +0200
cryptsetup-static aborts opening LUKS2 volume with Argon2i PBKDF
(address . 53005@debbugs.gnu.org)
87zgg0q4xq.fsf@cock.li
I too have been affected by this bug for a long time and hope it will be fixed one day.
For end-users, attached is a drop-in workaround you can use to make guix utilize the old cryptsetup build. It's sloppy, but at least the system can boot again.
Thanks for the research, Simon.
;; workaround for https://issues.guix.gnu.org/53005 ;; ;; How to use: ;; Run this code in your operating system config file. ;; Then replace `luks-device-mapping' with `alt-luks-device-mapping'. ;; ;; i may have forgotten some use-modules. use your brain if necessary. ;; to-do: it would be better to use package transformation procedures to fix cryptsetup from the current guix (use-modules (guix inferior) (guix channels) (srfi srfi-1)) (define channels (list (channel (name 'guix) (url "https://git.savannah.gnu.org/git/guix.git") (commit "0996fcc657593955845c2761d7eb0f656149fe11")))) (define inferior (inferior-for-channels channels)) (define old-cryptsetup-static (first (lookup-inferior-packages inferior "cryptsetup-static"))) (use-modules (gnu system uuid)) (use-modules (ice-9 match)) (use-modules (guix modules)) ;; copied from guix. ;; the whole point is to edit the `file-append' line. ;; (if i knew a way to modify gexp, this could simply modify the output of the old procedure.) (define (my-open-luks-device source targets) "Return a gexp that maps SOURCE to TARGET as a LUKS device, using 'cryptsetup'." (with-imported-modules (source-module-closure '((gnu build file-systems) (guix build utils))) ;; For mkdir-p (match targets ((target) #~(let ((source #$(if (uuid? source) (uuid-bytevector source) source))) ;; XXX: 'use-modules' should be at the top level. (use-modules (rnrs bytevectors) ;bytevector? ((gnu build file-systems) #:select (find-partition-by-luks-uuid system*/tty)) ((guix build utils) #:select (mkdir-p))) (mkdir-p "/run/cryptsetup/") (zero? (system*/tty #$(file-append old-cryptsetup-static "/sbin/cryptsetup") "open" "--type" "luks" (if (bytevector? source) (or (let loop ((tries-left 10)) (and (positive? tries-left) (or (find-partition-by-luks-uuid source) (begin (sleep 1) (loop (- tries-left 1)))))) (error "LUKS partition not found" source)) source) #$target))))))) (define alt-luks-device-mapping (mapped-device-kind (inherit luks-device-mapping) (open my-open-luks-device)))
?