ant-bootstrap@1.7.1 sometimes fails to build on core-updates

  • Done
Details
4 participants
  • Björn Höfling
  • Gábor Boskovits
  • Chris Marusich
  • Ricardo Wurmus
Owner
unassigned
Submitted by
Chris Marusich
Severity
normal
Chris Marusich wrote on 14 Jan 2018 07:58
(address . bug-guix@gnu.org)
87o9lxapul.fsf@gmail.com
Hi,

At commit 1b321229f4653c5daa873813e24910789c0b2918 (i.e., the current
tip of the core-updates branch), ant-bootstrap@1.7.1 sometimes fails to
build. This package is defined in gnu/packages/java.scm, but it is not
exported (i.e., it is used privately within the module). Note that
according to 'guix refresh', currently 215 packages depend on this
package.

I tried to build this package 147 times. It failed 5 times, and it
succeeded 142 times. That's about a 3% failure rate. All 5 failures
produced the same log output, which I've attached. My machine is an
x86-64 machine.
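
For reference, the rate quoted above can be recomputed from the two counts. This is only the arithmetic from this message, sketched as a small shell function; the name failure_rate is made up for illustration.

```shell
# failure_rate FAILURES TOTAL
# Prints the failure rate as a percentage with one decimal place.
failure_rate() {
    awk -v f="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", f / t * 100 }'
}

# Counts reported in this message: 5 failures out of 147 builds.
failure_rate 5 147
```

With the counts above this prints 3.4%, which matches the "about 3%" figure.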

--
Chris

Gábor Boskovits wrote on 14 Jan 2018 10:32
(name . Chris Marusich)(address . cmmarusich@gmail.com)(address . 30107@debbugs.gnu.org)
CAE4v=pg9Jk+xTj74zW1v1egzK_=w-JNkyXQoWyDXAdXB90No4w@mail.gmail.com
I've also seen this once. No idea so far.
I've tried to look around, but found no other mention of this issue.

Attachment: file
Gábor Boskovits wrote on 18 Jan 2018 10:02
Another kind of failure
(address . 30107@debbugs.gnu.org)
CAE4v=pj1W=D-q4PEunrET2xPGZ-gZjHwQfC__QPJ2gHmBkb0SA@mail.gmail.com
I'm now on 6d49ca16be22e3fb95823ac1780ad9460a18b180.
I also observe another kind of failure now.

After printing
Buildfile: build.xml

the build process hangs.

This is also nondeterministic, but the failure rate is harder to
quantify here...
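
One way to make a hang countable like any other failure is to run each build under a deadline. The following is a minimal sketch, assuming coreutils timeout is available; the function name classify_run and the idea of using a deadline are illustrative assumptions, not something used in this report.

```shell
# classify_run SECONDS COMMAND [ARGS...]
# Runs COMMAND under coreutils `timeout` and prints one word:
#   ok    - the command exited with status 0
#   hang  - the command was still running after SECONDS and was killed
#           (timeout reports this with exit status 124)
#   fail  - the command exited with any other non-zero status
classify_run() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        0)   echo ok ;;
        124) echo hang ;;
        *)   echo fail ;;
    esac
}
```

For example, classify_run 3600 ./pre-inst-env guix build -e '(@@ (gnu packages java) ant-bootstrap)' --check would report "hang" if the build did not finish within an hour; the one-hour limit is an arbitrary choice. guix build also accepts --max-silent-time, which may be a better fit for a build that stops producing output.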
Attachment: file
Björn Höfling wrote on 26 Jan 2018 11:30
Backtrace
(address . 30107@debbugs.gnu.org)
20180126113029.16199095@alma-ubu
I managed to get a coredump and backtrace, but I'm not able to
extract any useful information. I never went that deep into C
programming. If anyone can get more out of this, attached is the
backtrace, register state and some disassembly.

Björn
Attachment: gdb.txt
Björn Höfling wrote on 3 Feb 2018 09:36
How I got the core dump
(address . 30107@debbugs.gnu.org)
20180203093626.6c927477@alma-ubu
As requested, here is how I got that core dump:

My first step was to investigate the build.sh. I patched it to print
the full command and stripped out the rest:

diff --git a/bootstrap.sh b/bootstrap.x.sh
index bc54db4..f8c0720 100755
--- a/bootstrap.sh
+++ b/bootstrap.x.sh
@@ -151,18 +151,7 @@ cp src/script/antRun bin
 chmod +x bin/antRun
 echo ... Building Ant Distribution
-
-"${JAVACMD}" -classpath "${CLASSPATH}" -Dant.home=. $ANT_OPTS org.apache.tools.ant.Main -emacs "$@" bootstrap
-ret=$?
-if [ $ret != 0 ]; then
- echo ... Failed Building Ant Distribution !
- exit $ret
-fi
-
-
-echo ... Cleaning Up Build Directories
-
-rm -rf ${CLASSDIR}
-rm -rf bin
+echo I would do:
+echo "${JAVACMD}" -classpath "${CLASSPATH}" -Dant.home=. $ANT_OPTS org.apache.tools.ant.Main -emacs "$@" bootstrap
 echo ... Done Bootstrapping Ant Distribution

I added the patch into the package definition.

As I learned yesterday, I could instead just repack the sources and use

guix build --with-source=modified-ant.tar.gz ...

Anyway, I found out it calls:

/gnu/store/088bg6n5llvqn9j7d2740hhhilbqai4a-sablevm-1.13/bin/java-sablevm
-classpath build/classes:src/main:lib/xercesImpl.jar:lib/xml-apis.jar:
-Dant.home=. -Dbuild.compiler=jikes org.apache.tools.ant.Main -emacs
-Ddist.dir=/gnu/store/dxdsdsj4nz7fig92b2xjb7jf7swm5rni-ant-bootstrap-1.7.1
bootstrap

Next, I realized that my Ubuntu+Guix-on-top is eating up my core dumps:

$> cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P
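
A leading '|' in core_pattern means the kernel pipes the dump to the named program (here apport) instead of writing a core file into the working directory. A minimal sketch of that check follows; the function name core_handler is made up for illustration.

```shell
# core_handler PATTERN
# Prints "pipe" when PATTERN starts with '|' (the kernel hands the dump
# to a helper program), otherwise "file" (the kernel writes a core file).
core_handler() {
    case $1 in
        '|'*) echo pipe ;;
        *)    echo file ;;
    esac
}

# Inspect the running kernel's setting when it is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    core_handler "$(cat /proc/sys/kernel/core_pattern)"
fi
```

On the Ubuntu host described above this would print "pipe", which is why no core file showed up in the build directory.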

So instead I went into my QEMU machine and continued there.

Set the core file size limit to unlimited:

ulimit -c unlimited

sablevm needs to keep its debugging information, so add this to the
#:arguments of its package definition:

#:strip-binaries? #f

Rebuild it:

./pre-inst-env guix build -e '(@@ (gnu packages java) sablevm)' \
  --no-grafts --no-substitutes -K > sablevm.log 2>&1

Remove any /tmp/guix-build-* directories left over from earlier failed
builds.

Then I looped over the build with this little shell script:

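#!/bin/sh
# Rebuild ant-bootstrap repeatedly, keeping one log file per round.

ROUNDS=100

for i in $(seq -w 0 "$ROUNDS"); do
    # Print the round number as a progress indicator.
    printf '%s..' "$i"
    # Keep stdout and stderr together in the per-round log.
    ./pre-inst-env guix build -e '(@@ (gnu packages java) ant-bootstrap)' \
        --no-grafts --no-substitutes --check -k -K > "log-$i.log" 2>&1
done
echo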

Then search in the logs:

grep Segmentation log-*.log

Hopefully it finds one. Otherwise, repeat the step above.
Check that the log not only shows a segfault, but also has "(core
dumped)" after it. Otherwise, check your ulimit and core file settings.
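
The log check described above can be sketched as a small shell function. It assumes the failure shows up in the log as the usual "Segmentation fault" message, optionally followed by "(core dumped)"; the function name segfault_logs is made up for illustration.

```shell
# segfault_logs FILE...
# For each log that mentions a segmentation fault, print the file name
# and whether "(core dumped)" also appears in that log.
segfault_logs() {
    for log in "$@"; do
        # Skip arguments that are not regular files (e.g. unmatched globs).
        [ -f "$log" ] || continue
        if grep -q 'Segmentation fault' "$log"; then
            if grep -q '(core dumped)' "$log"; then
                echo "$log: core dumped"
            else
                echo "$log: no core"
            fi
        fi
    done
}
```

Running segfault_logs log-*.log lists only the rounds that crashed; a "no core" entry points at the ulimit or core_pattern settings.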

The core dump is in the /tmp/guix-build-ant..-n directory, where n
corresponds to your log file number.

Finally, export the stack trace from within gdb:

set logging on
set logging file backtrace.log
show logging
bt
info reg
quit
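
The same information can also be collected without an interactive session. This is a hedged sketch using gdb's -batch and -ex options; the function name dump_backtrace and its three arguments (the program that crashed, the core file, and an output file) are assumptions for illustration, since the exact paths are not given in this message.

```shell
# dump_backtrace PROGRAM CORE OUTPUT
# Runs gdb non-interactively on PROGRAM and CORE and writes the
# backtrace and register state to OUTPUT. Returns 2 if either input
# file is missing.
dump_backtrace() {
    program=$1
    core=$2
    output=$3
    if [ ! -f "$program" ] || [ ! -f "$core" ]; then
        echo "dump_backtrace: need an existing program and core file" >&2
        return 2
    fi
    # -batch exits after the commands given with -ex have run.
    gdb -batch -ex 'bt' -ex 'info registers' "$program" "$core" > "$output" 2>&1
}
```

PROGRAM should be the unstripped binary that produced the core, so that gdb can resolve symbols.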


That's it.

Björn
Ricardo Wurmus wrote on 12 Feb 2018 16:15
control message for bug #30107
(address . control@debbugs.gnu.org)
E1elFpg-0005e3-Do@debbugs.gnu.org
tags 30107 fixed
close 30107