[PATCH] gnu: libtorrent-rasterbar: Work around hang in test_ssl.

Status: Open
Participants (2): Ludovic Courtès, Tomas Volf
Owner: unassigned
Submitted by: Tomas Volf
Severity: normal
Tomas Volf wrote on 9 Dec 2023 01:31
[PATCH] gnu: libtorrent-rasterbar: Remove timeout for tests.
(address . guix-patches@gnu.org)(name . Tomas Volf)(address . ~@wolfsden.cz)
d60c351dfffa2240ebe7efbf392f5aafe97fd9a0.1702081885.git.~@wolfsden.cz
The timeout is still enforced by the build farm for the build as a whole, so
it should not cause any builds to be permanently stuck.

* gnu/packages/bittorrent.scm
(libtorrent-rasterbar)[arguments]<#:phases>['check]: Remove test timeout.

Change-Id: I535c72fec24658a4b2151d2e8794319055c9a278
---
gnu/packages/bittorrent.scm | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/gnu/packages/bittorrent.scm b/gnu/packages/bittorrent.scm
index 8c032940d4..5d7d05178b 100644
--- a/gnu/packages/bittorrent.scm
+++ b/gnu/packages/bittorrent.scm
@@ -470,7 +470,6 @@ (define-public libtorrent-rasterbar
(exclude-regex (string-append "^("
(string-join disabled-tests "|")
")$"))
- (timeout "600")
(jobs (if parallel-tests?
(number->string (parallel-job-count))
"1")))
@@ -478,7 +477,6 @@ (define-public libtorrent-rasterbar
(invoke "ctest"
"-E" exclude-regex
"-j" jobs
- "--timeout" timeout
"--output-on-failure")
;; test_ssl relies on bundled TLS certificates with a fixed
;; expiry date. To ensure succesful builds in the future,
@@ -488,16 +486,11 @@ (define-public libtorrent-rasterbar
;; test_fast_extension, test_privacy and test_resolve_links
;; to hang, even with FAKETIME_ONLY_CMDS. Not sure why. So
;; execute only test_ssl under faketime.
- ;;
- ;; Note: The test_ssl test times out in the ci.
- ;; Temporarily disable it until that is resolved.
- ;; (invoke "faketime" "2022-10-24"
- ;; "ctest"
- ;; "-R" "^test_ssl$"
- ;; "-j" jobs
- ;; "--timeout" timeout
- ;; "--output-on-failure")
- )))))))
+ (invoke "faketime" "2022-10-24"
+ "ctest"
+ "-R" "^test_ssl$"
+ "-j" jobs
+ "--output-on-failure"))))))))
(inputs (list boost openssl))
(native-inputs `(("libfaketime" ,libfaketime)
("python-wrapper" ,python-wrapper)

base-commit: 5e4c31518aba62b2cca7c346bcc56cfa9a4d10d0
--
2.41.0
Tomas Volf wrote on 9 Dec 2023 20:34
[PATCH v2] gnu: libtorrent-rasterbar: Remove timeout for tests.
(address . 67722@debbugs.gnu.org)(name . Tomas Volf)(address . ~@wolfsden.cz)
efda402a30daf66e5987b28bd124dbf6f552055e.1702150495.git.~@wolfsden.cz
The timeout is still enforced by the build farm for the build as a whole, so
it should not cause any builds to be permanently stuck.

* gnu/packages/bittorrent.scm
(libtorrent-rasterbar)[arguments]<#:phases>['check]: Remove test timeout.

Change-Id: I535c72fec24658a4b2151d2e8794319055c9a278
---
gnu/packages/bittorrent.scm | 3 ---
1 file changed, 3 deletions(-)

diff --git a/gnu/packages/bittorrent.scm b/gnu/packages/bittorrent.scm
index 731c8e1c20..5d7d05178b 100644
--- a/gnu/packages/bittorrent.scm
+++ b/gnu/packages/bittorrent.scm
@@ -470,7 +470,6 @@ (define-public libtorrent-rasterbar
(exclude-regex (string-append "^("
(string-join disabled-tests "|")
")$"))
- (timeout "600")
(jobs (if parallel-tests?
(number->string (parallel-job-count))
"1")))
@@ -478,7 +477,6 @@ (define-public libtorrent-rasterbar
(invoke "ctest"
"-E" exclude-regex
"-j" jobs
- "--timeout" timeout
"--output-on-failure")
;; test_ssl relies on bundled TLS certificates with a fixed
;; expiry date. To ensure succesful builds in the future,
@@ -492,7 +490,6 @@ (define-public libtorrent-rasterbar
"ctest"
"-R" "^test_ssl$"
"-j" jobs
- "--timeout" timeout
"--output-on-failure"))))))))
(inputs (list boost openssl))
(native-inputs `(("libfaketime" ,libfaketime)

base-commit: 61f2d84e75c340c2ba528d392f522c51b8843f34
--
2.41.0
Ludovic Courtès wrote on 12 Dec 2023 09:11
(name . Tomas Volf)(address . ~@wolfsden.cz)(address . 67722@debbugs.gnu.org)
87fs07263o.fsf@gnu.org
Hello,

Tomas Volf <~@wolfsden.cz> writes:

> The timeout is still enforced by the build farm for the build as a whole, so
> it should not cause any builds to be permanently stuck.
>
> * gnu/packages/bittorrent.scm
> (libtorrent-rasterbar)[arguments]<#:phases>['check]: Remove test timeout.
>
> Change-Id: I535c72fec24658a4b2151d2e8794319055c9a278

[...]

> - (timeout "600")
> (jobs (if parallel-tests?
> (number->string (parallel-job-count))
> "1")))
> @@ -478,7 +477,6 @@ (define-public libtorrent-rasterbar
> (invoke "ctest"
> "-E" exclude-regex
> "-j" jobs
> - "--timeout" timeout

What’s the rationale though?

If we know that tests, individually, are meant to take less than 10mn,
it still seems nicer to stop at 10mn rather than wait for the 1h
max-silent timeout, no?

Thanks,
Ludo’.
Tomas Volf wrote on 13 Dec 2023 00:03
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 67722@debbugs.gnu.org)
ZXjmqSjfm_ScMjuj@ws
On 2023-12-12 09:11:07 +0100, Ludovic Courtès wrote:
> What’s the rationale though?
>
> If we know that tests, individually, are meant to take less than 10mn,
> it still seems nicer to stop at 10mn rather than wait for the 1h
> max-silent timeout, no?

Originally the rationale was just to try it out and see whether it works in the
CI. The timeout was not there originally; I added it while fixing the tests, so
I wondered whether it was a mistake. Since the QA is still in "Pending", the
jury is still out on that one.

I do not know enough about the architecture and utilization of the build
machines to be sure, so one of my hypotheses was that the machine might have
been overloaded during the test run, causing the timeout. So I wanted to test
that. (I sent this as #67693 at first, but I think the CI got confused by the
WIP prefix. I am not sure how to mark patches that are intended to exercise the
CI but not necessarily to be merged. Sorry you wasted time reviewing this; I
did not expect anyone to look until the CI passes.)

In the meantime I was running the build locally to see whether I could
reproduce the hang, and I can. After running for a couple of hours in a loop,
the build failed with the timeout (I am not sure in which round; guix build
--rounds does not report that).

So it seems that test_ssl is just prone to sporadic failures. Since we both
succeeded in building locally, I assume we just got unlucky (or lucky?) in the
CI.

I am currently testing a v3, and once it passes --rounds=64 (which will take a
while) I will send it as an updated patch.

Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Tomas Volf wrote on 13 Dec 2023 17:38
[PATCH v3] gnu: libtorrent-rasterbar: Work around hang in test_ssl.
(address . 67722@debbugs.gnu.org)(name . Tomas Volf)(address . ~@wolfsden.cz)
b0445de56c3dea2dfc3ca128d812a8e1bace3caa.1702485537.git.~@wolfsden.cz
test_ssl sometimes hangs (at least when executed under faketime). The hang is
fairly rare: on my machine it took a build with --rounds=32 to reproduce it.

The workaround is to set a somewhat lower timeout of 240s (expected test
duration * 5, rounded up to whole minutes) and to retry a few times on failure.
With this change, --rounds=64 finished successfully (on my machine).

At the same time, remove the timeout from the other tests, since it is not
necessary (they do not hang), and one of them runs for ~270s (almost half the
original timeout), so it could pose a problem on a slow or overloaded machine.

* gnu/packages/bittorrent.scm
(libtorrent-rasterbar)[arguments]<#:phases>['check]: Remove test timeout for
most tests. Lower the timeout for test_ssl. Retry test_ssl on failure.

Change-Id: I535c72fec24658a4b2151d2e8794319055c9a278
---
gnu/packages/bittorrent.scm | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gnu/packages/bittorrent.scm b/gnu/packages/bittorrent.scm
index 731c8e1c20..4585c3b088 100644
--- a/gnu/packages/bittorrent.scm
+++ b/gnu/packages/bittorrent.scm
@@ -470,7 +470,6 @@ (define-public libtorrent-rasterbar
(exclude-regex (string-append "^("
(string-join disabled-tests "|")
")$"))
- (timeout "600")
(jobs (if parallel-tests?
(number->string (parallel-job-count))
"1")))
@@ -478,7 +477,6 @@ (define-public libtorrent-rasterbar
(invoke "ctest"
"-E" exclude-regex
"-j" jobs
- "--timeout" timeout
"--output-on-failure")
;; test_ssl relies on bundled TLS certificates with a fixed
;; expiry date. To ensure succesful builds in the future,
@@ -492,7 +490,11 @@ (define-public libtorrent-rasterbar
"ctest"
"-R" "^test_ssl$"
"-j" jobs
- "--timeout" timeout
+ ;; test_ssl sometimes hangs (at least when run under
+ ;; faketime), therefore set a time limit and retry
+ ;; few times on failure.
+ "--timeout" "240"
+ "--repeat" "until-pass:5"
"--output-on-failure"))))))))
(inputs (list boost openssl))
(native-inputs `(("libfaketime" ,libfaketime)

base-commit: 1b2505217cf222d98cc960b8510660976a01cfa1
--
2.41.0
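
The 240s figure in the v3 commit message follows the stated rule: expected test
duration times 5, rounded up to whole minutes. A minimal Python sketch of that
arithmetic follows. The 45s expected duration is an assumed value for
illustration only (the thread does not state the measured duration), and
`timeout_for` is a hypothetical helper, not part of Guix or CTest.

```python
import math


def timeout_for(expected_seconds, factor=5):
    """Return expected_seconds * factor, rounded up to a whole minute."""
    return math.ceil(expected_seconds * factor / 60) * 60


# Assumed duration of one test_ssl run; not a figure from the thread.
assumed_test_ssl_seconds = 45
print(timeout_for(assumed_test_ssl_seconds))  # 240
```

Under this rule, any expected duration above 36s and up to 48s yields 240s.
Applying the same rule to the ~270s test mentioned in the message would give
1380s, more than twice the original 600s limit.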
Tomas Volf wrote on 15 Dec 2023 12:32
[PATCH v4] gnu: libtorrent-rasterbar: Work around hang in test_ssl.
(address . 67722@debbugs.gnu.org)(name . Tomas Volf)(address . ~@wolfsden.cz)
50226d59dc4fa9deb686733dbbb6622c6194dcc1.1702639938.git.~@wolfsden.cz
test_ssl sometimes hangs (at least when executed under faketime). The hang is
fairly rare: on my machine it took a build with --rounds=32 to reproduce it.

The workaround is to set a somewhat lower timeout of 240s (expected test
duration * 5, rounded up to whole minutes) and to retry a few times on failure.
With this change, --rounds=64 finished successfully (on my machine).

At the same time, remove the timeout from the other tests, since it is not
necessary (they do not hang), and one of them runs for ~270s (almost half the
original timeout), so it could pose a problem on a slow or overloaded machine.

* gnu/packages/bittorrent.scm
(libtorrent-rasterbar)[arguments]<#:phases>['check]: Remove test timeout for
most tests. Lower the timeout for test_ssl. Retry test_ssl on failure.

Change-Id: I535c72fec24658a4b2151d2e8794319055c9a278
---
No changes, just rebase, resolving a merge conflict.

gnu/packages/bittorrent.scm | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gnu/packages/bittorrent.scm b/gnu/packages/bittorrent.scm
index 8c032940d4..4585c3b088 100644
--- a/gnu/packages/bittorrent.scm
+++ b/gnu/packages/bittorrent.scm
@@ -470,7 +470,6 @@ (define-public libtorrent-rasterbar
(exclude-regex (string-append "^("
(string-join disabled-tests "|")
")$"))
- (timeout "600")
(jobs (if parallel-tests?
(number->string (parallel-job-count))
"1")))
@@ -478,7 +477,6 @@ (define-public libtorrent-rasterbar
(invoke "ctest"
"-E" exclude-regex
"-j" jobs
- "--timeout" timeout
"--output-on-failure")
;; test_ssl relies on bundled TLS certificates with a fixed
;; expiry date. To ensure succesful builds in the future,
@@ -488,16 +486,16 @@ (define-public libtorrent-rasterbar
;; test_fast_extension, test_privacy and test_resolve_links
;; to hang, even with FAKETIME_ONLY_CMDS. Not sure why. So
;; execute only test_ssl under faketime.
- ;;
- ;; Note: The test_ssl test times out in the ci.
- ;; Temporarily disable it until that is resolved.
- ;; (invoke "faketime" "2022-10-24"
- ;; "ctest"
- ;; "-R" "^test_ssl$"
- ;; "-j" jobs
- ;; "--timeout" timeout
- ;; "--output-on-failure")
- )))))))
+ (invoke "faketime" "2022-10-24"
+ "ctest"
+ "-R" "^test_ssl$"
+ "-j" jobs
+ ;; test_ssl sometimes hangs (at least when run under
+ ;; faketime), therefore set a time limit and retry
+ ;; few times on failure.
+ "--timeout" "240"
+ "--repeat" "until-pass:5"
+ "--output-on-failure"))))))))
(inputs (list boost openssl))
(native-inputs `(("libfaketime" ,libfaketime)
("python-wrapper" ,python-wrapper)

base-commit: b681e339fa37f2a26763458ee56b31af1d6a7ec5
--
2.41.0
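
Together, `--timeout 240` and `--repeat until-pass:5` bound how long test_ssl
can keep the build busy. A rough Python sketch of the retry behaviour and the
worst-case arithmetic follows. `run_until_pass`, `worst_case_seconds` and the
simulated test are hypothetical stand-ins, not CTest internals, and the sketch
assumes that a run hitting its time limit simply counts as a failed attempt.

```python
def run_until_pass(run_once, max_attempts):
    """Call run_once() until it returns True, at most max_attempts times.

    Returns the 1-based number of the attempt that passed, or None if every
    attempt failed. A timed-out run is assumed to count as a failure.
    """
    for attempt in range(1, max_attempts + 1):
        if run_once():
            return attempt
    return None


def worst_case_seconds(timeout_seconds, max_attempts):
    """Upper bound on time spent if every attempt runs into the timeout."""
    return timeout_seconds * max_attempts


# Simulated test that hangs (fails) twice and then passes.
outcomes = iter([False, False, True])
print(run_until_pass(lambda: next(outcomes), 5))  # 3
print(worst_case_seconds(240, 5))                 # 1200
```

In the worst case the five attempts take 1200s (20 minutes) in total, which is
well under the 1h max-silent timeout mentioned earlier in the thread.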
Tomas Volf wrote on 16 Jan 14:25 +0100
control message for bug #67722
(address . control@debbugs.gnu.org)
0ea26b76803477f9289e853c97730d23@wolfsden.cz
retitle 67722 [PATCH] gnu: libtorrent-rasterbar: Work around hang in test_ssl.
quit
Tomas Volf wrote on 16 Jan 14:27 +0100
Re: [bug#67722] [PATCH v4] gnu: libtorrent-rasterbar: Work around hang in test_ssl.
(address . 67722@debbugs.gnu.org)
87ply1qugv.fsf@wolfsden.cz
Polite ping. Would anyone have time to look into this?