nnpack is not reproducible

Open. Submitted by Ludovic Courtès.
Details
3 participants
  • Ludovic Courtès
  • Ludovic Courtès
  • zimoun
Owner
unassigned
Severity
normal
Ludovic Courtès wrote on 19 Sep 11:57 +0200
python-pytorch is not reproducible
(address . bug-guix@gnu.org)
875yuwsxad.fsf@inria.fr
Bad news!
    $ guix challenge python-pytorch
    /gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0 contents differ:
      no local build for '/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0'
      https://ci.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 0i55iwy3z4da4lhn93dnrmz775s9ga5kyfli6cmrchacacf9xfpq
      https://bordeaux.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 1fl2v4pd0gcw7wp5k662q0zd4lvvzsggcm5ii8b4kq4v6synhkic
      differing file:
        /lib/python3.8/site-packages/torch/lib/libtorch_cpu.so

    1 store items were analyzed:
      - 0 (0.0%) were identical
      - 1 (100.0%) differed
      - 0 (0.0%) were inconclusive
    $ guix describe
    Generacio 189 Aug 30 2021 12:09:27 (nuna)
      guix f91ae94
        repository URL: https://git.savannah.gnu.org/git/guix.git
        branch: master
        commit: f91ae9425bb385b60396a544afe27933896b8fa3
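At the file level, "differing file" boils down to this: given two unpacked copies of the same store item, hash every regular file and report those whose contents (or presence) differ. The Python sketch below is a minimal local stand-in for that comparison, assuming you already have both copies on disk; it is not how `guix challenge` works internally (as the output above shows, it compares the hashes reported by each substitute server). The names `sha256_of` and `differing_files` are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def _regular_files(root: Path) -> dict[str, Path]:
    """Map each regular file's path relative to `root` to its full path.

    Symlinks are skipped to keep the sketch short.
    """
    return {
        p.relative_to(root).as_posix(): p
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    }


def differing_files(tree_a: Path, tree_b: Path) -> list[str]:
    """Relative paths whose contents differ or that exist in only one tree."""
    a, b = _regular_files(tree_a), _regular_files(tree_b)
    return [
        rel
        for rel in sorted(a.keys() | b.keys())
        if rel not in a or rel not in b or sha256_of(a[rel]) != sha256_of(b[rel])
    ]
```

Calling `differing_files(Path("copy-a"), Path("copy-b"))` returns the relative paths, sorted, that deserve a closer look; everything else is byte-for-byte identical.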
The file is 165 MiB and Diffoscope (which reads the output of ‘objdump’) takes forever on it.
However, by comparing the output of ‘strings’ on each file, we get a hint:
    diff -ubBr --show-c-function /tmp/str2 /tmp/str1
    --- /tmp/str2 2021-09-19 11:14:47.806798779 +0200
    +++ /tmp/str1 2021-09-19 11:14:41.962761127 +0200
    @@ -1100584,472 +1100584,472 @@ compute_fast_convolution_input_gradient
     compute_grad_kernel_transform
     compute_fast_convolution_kernel_gradient.isra.0
     compute_fast_convolution_output
    -nnp_fft8x8_with_offset_and_stream__avx2.__local0
    -nnp_fft8x8_with_offset_and_stream__avx2.__local13
    -nnp_fft8x8_with_offset_and_stream__avx2.__local18
    -nnp_fft8x8_with_offset_and_stream__avx2.__local1
    +nnp_fft8x8_with_offset_and_stream__avx2.__local5
     nnp_fft8x8_with_offset_and_stream__avx2.__local16
    +nnp_fft8x8_with_offset_and_stream__avx2.__local6
    +nnp_fft8x8_with_offset_and_stream__avx2.__local11
    +nnp_fft8x8_with_offset_and_stream__avx2.__local0
     nnp_fft8x8_with_offset_and_stream__avx2.__local2
     nnp_fft8x8_with_offset_and_stream__avx2.__local7
    -nnp_fft8x8_with_offset_and_stream__avx2.__local17
    -nnp_fft8x8_with_offset_and_stream__avx2.__local10
    -nnp_fft8x8_with_offset_and_stream__avx2.__local8
     nnp_fft8x8_with_offset_and_stream__avx2.__local15
    +nnp_fft8x8_with_offset_and_stream__avx2.__local8
     nnp_fft8x8_with_offset_and_stream__avx2.__local3
    -nnp_fft8x8_with_offset_and_stream__avx2.__local6
    -nnp_fft8x8_with_offset_and_stream__avx2.__local14
    -nnp_fft8x8_with_offset_and_stream__avx2.__local9
    +nnp_fft8x8_with_offset_and_stream__avx2.__local1
     nnp_fft8x8_with_offset_and_stream__avx2.__local4
    […]
     nnp_shdotxf8__avx2.__local13
    -nnp_shdotxf8__avx2.__local15
     nnp_shdotxf8__avx2.__local0
    +nnp_shdotxf8__avx2.__local9
    +nnp_shdotxf8__avx2.__local10
    +nnp_shdotxf8__avx2.__local11
    +nnp_shdotxf8__avx2.__local12
    +nnp_shdotxf8__avx2.__local2
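The ‘strings’-then-‘diff’ approach used above can be approximated in pure Python, which may help when scripting the comparison over several builds. This minimal sketch assumes that a run of four or more printable ASCII characters (or tabs) counts as a string, which is close to, but not guaranteed identical to, what GNU ‘strings’ prints by default; `extract_strings` and `diff_strings` are hypothetical helper names.

```python
from __future__ import annotations

import difflib
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Runs of at least `min_len` printable ASCII bytes (space to '~', or tab).

    Roughly what `strings` prints by default; not guaranteed identical.
    """
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def diff_strings(old: bytes, new: bytes,
                 old_name: str = "old", new_name: str = "new") -> list[str]:
    """Unified diff of the strings found in two binaries, like `diff -u`."""
    return list(difflib.unified_diff(
        extract_strings(old), extract_strings(new),
        fromfile=old_name, tofile=new_name, lineterm="",
    ))
```

For libtorch_cpu.so one would pass the bytes of each copy (for example via `Path.read_bytes()`); holding two 165 MiB files in memory is usually fine, though a streaming variant would be leaner.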
This appears to come from NNPACK, one of the libraries that are still bundled. These functions seem to be generated by Python scripts that use PeachPy, such as NNPACK/src/x86_64-fma/2d-fourier-8x8.py:
    for post_operation in ["stream", "store"]:
        fft8x8_arguments = (arg_t_pointer, arg_f_pointer, arg_t_stride, arg_f_stride,
                            arg_row_count, arg_column_count, arg_row_offset, arg_column_offset)
        with Function("nnp_fft8x8_with_offset_and_{post_operation}__avx2".format(post_operation=post_operation),
                      fft8x8_arguments,
                      target=uarch.default + isa.fma3 + isa.avx2):
    […]

The ‘__local’ bit in the name comes from PeachPy, in peachpy/name.py:
        suffixed_name = "__local" + str(suffix)
        for name_object in iter(unnamed_objects):
            # Generate a non-conflicting name by appending a suffix
            while suffixed_name in self.names:
                suffix += 1
                suffixed_name = "__local" + str(suffix)
So the problem may be that these things get generated in parallel, and thus numbering is non-deterministic.
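To make the hypothesis concrete: in the suffix loop quoted above, whichever object is visited first gets the lowest free `__localN`, so the mapping from objects to names depends only on iteration order. The sketch below is a simplified model of that loop, not PeachPy's actual code; `assign_local_names` is a hypothetical name, the objects are plain strings, and it assumes each assigned name is recorded as taken (the quoted excerpt elides that step). It shows that visiting the same objects in a different order yields different names, whatever causes the order to vary, and that sorting them by a stable key first restores a deterministic assignment.

```python
def assign_local_names(unnamed_objects, taken_names=()):
    """Simplified model of the quoted suffix loop (not PeachPy's actual code).

    Each object, in iteration order, gets the lowest "__localN" not yet taken.
    """
    names = set(taken_names)
    assigned = {}
    suffix = 0
    suffixed_name = "__local" + str(suffix)
    for obj in unnamed_objects:
        # Skip over suffixes that are already in use.
        while suffixed_name in names:
            suffix += 1
            suffixed_name = "__local" + str(suffix)
        # Assumption: the real code also records the name as taken.
        names.add(suffixed_name)
        assigned[obj] = suffixed_name
    return assigned


in_order = assign_local_names(["scope_a", "scope_b", "scope_c"])
reordered = assign_local_names(["scope_c", "scope_a", "scope_b"])
print(in_order)   # {'scope_a': '__local0', 'scope_b': '__local1', 'scope_c': '__local2'}
print(reordered)  # {'scope_c': '__local0', 'scope_a': '__local1', 'scope_b': '__local2'}
# Sorting first makes the result independent of the incoming order.
print(assign_local_names(sorted(["scope_c", "scope_a", "scope_b"])) == in_order)  # True
```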
NNPACK/CMakeLists.txt has this bit to generate targets to build all that:
      ADD_CUSTOM_COMMAND(
        OUTPUT ${obj}
        COMMAND "PYTHONPATH=${PEACHPY_PYTHONPATH}"
          ${PYTHON_EXECUTABLE} -m peachpy.x86_64
          -mabi=sysv -g4 -mimage-format=${PEACHPY_IMAGE_FORMAT}
          "-I${PROJECT_SOURCE_DIR}/src" "-I${PROJECT_SOURCE_DIR}/src/x86_64-fma" "-I${FP16_SOURCE_DIR}/include"
          -o ${obj} "${PROJECT_SOURCE_DIR}/${src}"
        DEPENDS ${NNPACK_BACKEND_PEACHPY_OBJS})
It might be that building just those targets sequentially would solve the problem.
To be continued…
Ludo’.
Ludovic Courtès wrote on 21 Sep 17:17 +0200
(address . 50672@debbugs.gnu.org)
87a6k6j6uu.fsf@gnu.org
Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> $ guix challenge python-pytorch
> /gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0 contents differ:
>   no local build for '/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0'
>   https://ci.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 0i55iwy3z4da4lhn93dnrmz775s9ga5kyfli6cmrchacacf9xfpq
>   https://bordeaux.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 1fl2v4pd0gcw7wp5k662q0zd4lvvzsggcm5ii8b4kq4v6synhkic
>   differing file:
>     /lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
>
> 1 store items were analyzed:
>   - 0 (0.0%) were identical
>   - 1 (100.0%) differed
>   - 0 (0.0%) were inconclusive
> $ guix describe
> Generacio 189 Aug 30 2021 12:09:27 (nuna)
>   guix f91ae94
>     repository URL: https://git.savannah.gnu.org/git/guix.git
>     branch: master
>     commit: f91ae9425bb385b60396a544afe27933896b8fa3
Ludovic Courtès wrote on 24 Sep 16:04 +0200
(address . 50672@debbugs.gnu.org)
87tuiam5ni.fsf@gnu.org
Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> Reported upstream: https://github.com/pytorch/pytorch/issues/65404.
PyTorch upstream noted that the problem is in NNPACK, not PyTorch proper.
Having unbundled NNPACK in d326dec8115cf5e2cac9497633dc11ecc970361b, I can confirm that PyTorch itself is now reproducible, but NNPACK isn’t.
Reported at https://github.com/Maratyszcza/NNPACK/issues/206.
Ludo’.
Ludovic Courtès wrote on 26 Sep 22:27 +0200
control message for bug #50672
(address . control@debbugs.gnu.org)
87h7e7krpx.fsf@gnu.org
retitle 50672 nnpack is not reproducible
quit
zimoun wrote on 27 Sep 15:25 +0200
Re: bug#50672: python-pytorch is not reproducible
(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)(address . 50672@debbugs.gnu.org)
CAJ3okZ203LfGLBsymxRTiZpJ2FQrQAFa-di4sZUytyjnz0KTsA@mail.gmail.com
Hi,
On Fri, 24 Sept 2021 at 16:11, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> Having unbundled NNPACK in d326dec8115cf5e2cac9497633dc11ecc970361b, I
> can confirm that PyTorch itself is now reproducible, but NNPACK isn’t.
I reproduce: "guix build nnpack --no-grafts --check" differs. Pytorch, not.
> PyTorch upstream noted that the problem is in NNPACK, not PyTorch
> proper.
Closing this report?
However, I notice 2 things:
1- Unbundled dependencies are still fetched
2- Does the Git submodule mechanism work with the SWH fallback?
    Initialized empty Git repository in /gnu/store/…-python-pytorch-1.9.0-checkout/.git/
    From https://github.com/pytorch/pytorch
     * tag v1.9.0 -> FETCH_HEAD
    [...]
    HEAD is now at d69c22d [docs] Add torch.package documentation for beta release (#59886)
    /gnu/store/…-bash-minimal-5.0.16/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.utf8)
    Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
    Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
    Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
    Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
    Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK'
    Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
    [...]
    Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
    Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
    [...]
    Cloning into '/gnu/store/…-python-pytorch-1.9.0-checkout/third_party/NNPACK'...
    Cloning into '/gnu/store/…-python-pytorch-1.9.0-checkout/third_party/QNNPACK'...
    Cloning into '/gnu/store/…-python-pytorch-1.9.0-checkout/third_party/XNNPACK'...
    [...]
    Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
    Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c'
    Submodule path 'third_party/XNNPACK': checked out '55d53a4e7079d38e90acd75dd9e4f9e781d2da35'
    [...]

Cheers,
simon
Ludovic Courtès wrote on 28 Sep 11:24 +0200
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 50672@debbugs.gnu.org)
87ilylf3yn.fsf@inria.fr
Hi,
zimoun <zimon.toutoune@gmail.com> wrote:
> On Fri, 24 Sept 2021 at 16:11, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
>
>> Having unbundled NNPACK in d326dec8115cf5e2cac9497633dc11ecc970361b, I
>> can confirm that PyTorch itself is now reproducible, but NNPACK isn’t.
>
> I reproduce: "guix build nnpack --no-grafts --check" differs. Pytorch, not.
>
>> PyTorch upstream noted that the problem is in NNPACK, not PyTorch
>> proper.
>
> Closing this report?
No, I’ve retitled it. Now looking at PeachPy:
https://github.com/Maratyszcza/PeachPy/issues/88
> However, I notice 2 things:
>
> 1- Unbundled dependencies are still fetched
Yes, but the snippet wipes them right after.
> 2- Does the Git submodule mechanism work with the SWH fallback?
No, not yet; there’s a comment in (guix git-download). Fixing it should be doable.
Thanks,
Ludo’.
Ludovic Courtès wrote 21 minutes ago
Re: bug#50672: nnpack is not reproducible
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 50672@debbugs.gnu.org)
87pmrxt6tf.fsf_-_@gnu.org
Hi,
Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> No, I’ve retitled it. Now looking at PeachPy:
>
> https://github.com/Maratyszcza/PeachPy/issues/88
For the record, I tried the attached patch in an attempt to sort things as discussed in the issue above, but it doesn’t have the intended effect. There must be other unsorted dictionaries elsewhere.
Suggestions welcome!
Ludo’.
    diff --git a/peachpy/name.py b/peachpy/name.py
    index b6a03dc..c069fc2 100644
    --- a/peachpy/name.py
    +++ b/peachpy/name.py
    @@ -95,6 +95,10 @@ class Namespace:
                     self.prenames[scope_name.prename].add(scope)
     
         def assign_names(self):
    +        # Step 0: sort the dictionary for deterministic output
    +        self.prenames = dict(sorted(self.prenames.items(),
    +                                    key=lambda item: "" if item[0] == None else item[0]))
    +
             # Step 1: assign names to symbols with prenames with no conflicts
             for prename in six.iterkeys(self.prenames):
                 if prename is not None:
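One hypothesis, not verified against PeachPy: the context line `self.prenames[scope_name.prename].add(scope)` in the patch suggests that the values of `self.prenames` are sets, and sorting the dictionary's keys leaves the iteration order of those sets untouched. In CPython, iterating over a set of strings follows hash order, and string hashes are randomized per process unless `PYTHONHASHSEED` is fixed; objects using the default identity-based hash follow memory addresses instead. The minimal sketch below demonstrates only the string case, by running the same snippet in child interpreters under a few seed values; `orders_with_seed` is an illustrative name.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Program run in each child interpreter: build a set of 32 strings, then
# print its iteration order on one line and its sorted order on the next.
CHILD = (
    'names = {"nnp_fft8x8__local%d" % i for i in range(32)}\n'
    'print(",".join(names))\n'
    'print(",".join(sorted(names)))\n'
)


def orders_with_seed(seed: int) -> tuple[str, str]:
    """Return (set iteration order, sorted order) under a given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    lines = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    ).stdout.splitlines()
    return lines[0], lines[1]


if __name__ == "__main__":
    for seed in range(4):
        raw, stable = orders_with_seed(seed)
        # `raw` typically changes from seed to seed; `stable` never does.
        print(seed, raw[:60], "...")
```

If a similar experiment on the PeachPy generator itself showed the symbol order changing with the seed, that would point at hash-ordered containers rather than at parallelism.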
To comment on this conversation send email to 50672@debbugs.gnu.org