[PATCH 00/10] Tuning packages for CPU micro-architectures

Done. Submitted by Ludovic Courtès.
Details
5 participants
  • Thiago Jung Bauermann
  • Ludovic Courtès
  • Ludovic Courtès
  • Mathieu Othacehe
  • zimoun
Owner
unassigned
Severity
normal
Ludovic Courtès wrote on 4 Dec 2021 21:34
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204203447.15200-1-ludo@gnu.org
Hello Guix!

This patch series is an attempt to allow users to build or
substitute packages for the very CPU they are using, as opposed
to using a generic binary that targets the baseline
architecture—e.g., x86_64 without AVX extensions.

As a reminder, my take on this is that The Right Thing is for
code to select optimized implementations for the host CPU at
load time, using (possibly hand-crafted) “function multi-versioning”:


Now, there’s at least one situation where developers don’t do
“the right thing”: C++ header-only libraries. It turns out
header-only libraries with #ifdef’d SIMD code are quite common:
Eigen, xsimd, xtensor, etc. Every user of those libs has to be
compiled with ‘-march=native’ to take advantage of those
SIMD-optimized routines and there’s little hope of seeing those
libraries implement load-time or run-time selection¹.

This patch set implements “package multi-versioning”, where a package
can have different variants users may choose from: baseline, haswell,
skylake, etc. This is implemented as a package transformation option,
‘--tune’. Without any argument, ‘--tune’ grafts tuned package variants
for each package that has the ‘tunable?’ property. For example:

guix shell eigen-benchmarks --tune -- benchBlasGemm 16 16 16 100 100

runs one of the Eigen benchmarks tuned for the host CPU, because
‘eigen-benchmarks’ is marked as “tunable”.

This is achieved not by passing ‘-march=native’, because the daemon
might be running on a separate machine with a different CPU, but by
identifying the ‘-march’ value corresponding to the host CPU and
passing ‘-march’ to the compiler, via a wrapper.
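
As a rough illustration of that idea (a sketch under stated assumptions, not the code from this series): the client looks up the host CPU model in a table and produces an explicit flag, so that a build daemon running on a different machine compiles for the same target. The table below is a small excerpt of family 6 model numbers, and the names `%intel-models`, `model->march` and `march-flag` are hypothetical.

```scheme
;; Hypothetical sketch: map a CPU model number to an explicit '-march' flag.
(use-modules (srfi srfi-1))

(define %intel-models
  ;; Excerpt only: family 6 model numbers for three micro-architectures.
  '(("haswell"   #x3c #x3f #x45 #x46)
    ("broadwell" #x3d #x47 #x4f #x56)
    ("skylake"   #x4e #x5e #x8e #x9e)))

(define (model->march model)
  "Return the '-march' name for MODEL, or #f if MODEL is not in the table."
  (any (lambda (entry)
         (and (memv model (cdr entry))
              (car entry)))
       %intel-models))

(define (march-flag model)
  "Return an explicit compiler flag for MODEL, or #f when MODEL is unknown."
  (let ((name (model->march model)))
    (and name (string-append "-march=" name))))
```

Unlike '-march=native', the resulting string names the target explicitly, so it means the same thing on whichever machine ends up running the compiler.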

On my skylake laptop, that gives a noticeable difference on the GEMM
benchmark of Eigen and good results on the xtensor benchmarks too,
unsurprisingly. I don’t have figures for higher-level applications,
but it’d be nice to benchmark some of Eigen’s dependents for instance,
as shown by:

guix graph -M2 -t reverse-package eigen | xdot -f fdp -

If you could run such benchmarks, that’d be great! :-)
Things like Fenics may benefit from it.

Nix people chose to introduce separate system types for the various
x86_64 micro-architecture levels: x86_64-linux-v1, x86_64-linux-v2,
etc.² I think this is somewhat wasteful and impractical though.
It’s also unclear whether those levels, defined in the new x86_64
psABI³, are a viable abstraction: vendors seem to be mixing features
rather than really following the cumulative pattern that those
levels imply.

Thoughts?

Ludo’.


Ludovic Courtès (10):
Add (guix cpu).
transformations: Add '--tune'.
ci: Add extra jobs for tunable packages.
gnu: Add eigen-benchmarks.
gnu: Add xsimd-benchmark.
gnu: Add xtensor-benchmark.
gnu: ceres-solver: Mark as tunable.
gnu: Add ceres-solver-benchmarks.
gnu: libfive: Mark as tunable.
gnu: prusa-slicer: Mark as tunable.

Makefile.am | 1 +
doc/guix.texi | 54 ++++++++++++++
gnu/ci.scm | 43 ++++++++---
gnu/packages/algebra.scm | 79 ++++++++++++++++++++
gnu/packages/cpp.scm | 23 ++++++
gnu/packages/engineering.scm | 10 ++-
gnu/packages/maths.scm | 49 ++++++++++++-
guix/cpu.scm | 137 +++++++++++++++++++++++++++++++++++
guix/transformations.scm | 134 ++++++++++++++++++++++++++++++++++
tests/transformations.scm | 20 +++++
10 files changed, 538 insertions(+), 12 deletions(-)
create mode 100644 guix/cpu.scm


base-commit: 052f56e5a614854636563278ee5a2248b3609d87
prerequisite-patch-id: 7e5c2bb5942496daf01a7f6dfc1b0b5b214f1584
--
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 01/10] Add (guix cpu).
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-1-ludo@gnu.org
* guix/cpu.scm: New file.
* Makefile.am (MODULES): Add it.
---
Makefile.am | 1 +
guix/cpu.scm | 137 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 138 insertions(+)
create mode 100644 guix/cpu.scm

diff --git a/Makefile.am b/Makefile.am
index f7e7b5184f..0846818cc2 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -81,6 +81,7 @@ MODULES =					\
   guix/base64.scm				\
   guix/ci.scm					\
   guix/cpio.scm					\
+  guix/cpu.scm					\
   guix/deprecation.scm				\
   guix/docker.scm	   			\
   guix/records.scm				\
diff --git a/guix/cpu.scm b/guix/cpu.scm
new file mode 100644
index 0000000000..77efac92a2
--- /dev/null
+++ b/guix/cpu.scm
@@ -0,0 +1,137 @@
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2021 Ludovic Courtès <ludo@gnu.org>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (guix cpu)
+  #:use-module (guix sets)
+  #:use-module (guix memoization)
+  #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-9)
+  #:use-module (ice-9 match)
+  #:use-module (ice-9 rdelim)
+  #:export (current-cpu
+            cpu?
+            cpu-architecture
+            cpu-family
+            cpu-model
+            cpu-flags
+
+            cpu->gcc-architecture))
+
+;;; Commentary:
+;;;
+;;; This module provides tools to determine the micro-architecture supported
+;;; by the CPU and to map it to a name known to GCC's '-march'.
+;;;
+;;; Code:
+
+;; CPU description.
+(define-record-type <cpu>
+  (cpu architecture family model flags)
+  cpu?
+  (architecture cpu-architecture)                 ;string, from 'uname'
+  (family       cpu-family)                       ;integer
+  (model        cpu-model)                        ;integer
+  (flags        cpu-flags))                       ;set of strings
+
+(define current-cpu
+  (mlambda ()
+    "Return a <cpu> record representing the host CPU."
+    (define (prefix? prefix)
+      (lambda (str)
+        (string-prefix? prefix str)))
+
+    (call-with-input-file "/proc/cpuinfo"
+      (lambda (port)
+        (let loop ((family #f)
+                   (model #f))
+          (match (read-line port)
+            ((? eof-object?)
+             #f)
+            ((? (prefix? "cpu family") str)
+             (match (string-tokenize str)
+               (("cpu" "family" ":" family)
+                (loop (string->number family) model))))
+            ((? (prefix? "model") str)
+             (match (string-tokenize str)
+               (("model" ":" model)
+                (loop family (string->number model)))
+               (_
+                (loop family model))))
+            ((? (prefix? "flags") str)
+             (match (string-tokenize str)
+               (("flags" ":" flags ...)
+                (cpu (utsname:machine (uname))
+                     family model (list->set flags)))))
+            (_
+             (loop family model))))))))
+
+(define (cpu->gcc-architecture cpu)
+  "Return the architecture name, suitable for GCC's '-march' flag, that
+corresponds to CPU, a record as returned by 'current-cpu'."
+  (match (cpu-architecture cpu)
+    ("x86_64"
+     ;; Transcribed from GCC's 'host_detect_local_cpu' in driver-i386.c.
+     (or (and (= 6 (cpu-family cpu))              ;the "Pentium Pro" family
+              (letrec-syntax ((model (syntax-rules (=>)
+                                       ((_) #f)
+                                       ((_ (candidate => integers ...) rest
+                                           ...)
+                                        (or (and (= (cpu-model cpu) integers)
+                                                 candidate)
+                                            ...
+                                            (model rest ...))))))
+                (model ("bonnell" => #x1c #x26)
+                       ("silvermont" => #x37 #x4a #x4d #x5a #x5d)
+                       ("core2" => #x0f #x17 #x1d)
+                       ("nehalem" => #x1a #x1e #x1f #x2e)
+                       ("westmere" => #x25 #x2c #x2f)
+                       ("sandybridge" => #x2a #x2d)
+                       ("ivybridge" => #x3a #x3e)
+                       ("haswell" => #x3c #x3f #x45 #x46)
+                       ("broadwell" => #x3d #x47 #x4f #x56)
+                       ("skylake" => #x4e #x5e #x8e #x9e)
+                       ("skylake-avx512" => #x55) ;TODO: cascadelake
+                       ("knl" => #x57)
+                       ("cannonlake" => #x66)
+                       ("knm" => #x85))))
+
+         ;; Fallback case for non-Intel processors or for Intel processors not
+         ;; recognized above.
+         (letrec-syntax ((if-flags (syntax-rules (=>)
+                                     ((_)
+                                      #f)
+                                     ((_ (flags ... => name) rest ...)
+                                      (if (every (lambda (flag)
+                                                   (set-contains? (cpu-flags cpu)
+                                                                  flag))
+                                                 '(flags ...))
+                                          name
+                                          (if-flags rest ...))))))
+           (if-flags ("avx512f" => "knl")
+                     ("adx" => "broadwell")
+                     ("avx2" => "haswell")
+                     ;; TODO: tigerlake, cooperlake, etc.
+                     ("avx" => "sandybridge")
+                     ("sse4_2" "movbe" => "silvermont")
+                     ("sse4_2" => "nehalem")
+                     ("ssse3" "movbe" => "bonnell")
+                     ("ssse3" => "core2")))
+         "x86_64"))
+    (architecture
+     ;; TODO: AArch64.
+     architecture)))
-- 
2.33.0
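
The parsing loop in 'current-cpu' can be tried without access to /proc/cpuinfo by reading from a string port. The following is a simplified sketch, not part of the patch: it returns a plain list instead of a <cpu> record, and 'parse-cpuinfo' is a hypothetical name.

```scheme
;; Hypothetical sketch: parse /proc/cpuinfo-style text from any input port.
(use-modules (ice-9 match)
             (ice-9 rdelim))

(define (parse-cpuinfo port)
  "Read /proc/cpuinfo-style text from PORT.  Return a list (FAMILY MODEL
FLAGS) describing the first processor entry, or #f if no 'flags' line is
found."
  (let loop ((family #f)
             (model #f))
    (let ((line (read-line port)))
      (if (eof-object? line)
          #f
          ;; 'string-tokenize' splits on whitespace by default.
          (match (string-tokenize line)
            (("cpu" "family" ":" value)
             (loop (string->number value) model))
            (("model" ":" value)              ;does not match "model name"
             (loop family (string->number value)))
            (("flags" ":" flags ...)
             (list family model flags))
            (_
             (loop family model)))))))

;; Example input mimicking the relevant lines of /proc/cpuinfo.
(define sample
  "cpu family\t: 6\nmodel\t\t: 142\nmodel name\t: Example CPU\nflags\t\t: fpu sse4_2 avx2\n")

(define parsed
  (call-with-input-string sample parse-cpuinfo))
```

Here 'parsed' holds the family, the model number, and the list of flag strings, in that order.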
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 03/10] ci: Add extra jobs for tunable packages.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-3-ludo@gnu.org
This allows us to provide substitutes for tuned package variants.

* gnu/ci.scm (package-job): Add #:suffix and honor it.
(package->job): Add #:suffix and honor it.
(%x86-64-micro-architectures): New variable.
(tuned-package-jobs): New procedure.
(cuirass-jobs): Add jobs for tunable packages.
---
gnu/ci.scm | 43 ++++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/gnu/ci.scm b/gnu/ci.scm
index e1011355db..2f56554d93 100644
--- a/gnu/ci.scm
+++ b/gnu/ci.scm
@@ -28,6 +28,7 @@ (define-module (gnu ci)
   #:use-module (guix grafts)
   #:use-module (guix profiles)
   #:use-module (guix packages)
+  #:autoload   (guix transformations) (tunable-package? tuned-package)
   #:use-module (guix channels)
   #:use-module (guix config)
   #:use-module (guix derivations)
@@ -108,9 +109,9 @@ (define* (derivation->job name drv
     (#:timeout . ,timeout)))
 
 (define* (package-job store job-name package system
-                      #:key cross? target)
+                      #:key cross? target (suffix ""))
   "Return a job called JOB-NAME that builds PACKAGE on SYSTEM."
-  (let ((job-name (string-append job-name "." system)))
+  (let ((job-name (string-append job-name "." system suffix)))
     (parameterize ((%graft? #f))
       (let* ((drv (if cross?
                       (package-cross-derivation store package target system
@@ -395,21 +396,39 @@ (define package->job
                            (((_ inputs _ ...) ...)
                             inputs))))
                       (%final-inputs)))))
-    (lambda (store package system)
+    (lambda* (store package system #:key (suffix ""))
       "Return a job for PACKAGE on SYSTEM, or #f if this combination is not
-valid."
+valid.  Append SUFFIX to the job name."
       (cond ((member package base-packages)
              (package-job store (string-append "base." (job-name package))
-                          package system))
+                          package system #:suffix suffix))
             ((supported-package? package system)
              (let ((drv (package-derivation store package system
                                             #:graft? #f)))
                (and (substitutable-derivation? drv)
                     (package-job store (job-name package)
-                                 package system))))
+                                 package system #:suffix suffix))))
             (else
              #f)))))
 
+(define %x86-64-micro-architectures
+  ;; Micro-architectures for which we build tuned variants.
+  '("westmere" "ivybridge" "haswell" "skylake" "skylake-avx512"))
+
+(define (tuned-package-jobs store package system)
+  "Return a list of jobs for PACKAGE tuned for SYSTEM's micro-architectures."
+  (filter-map (lambda (micro-architecture)
+                (define suffix
+                  (string-append "." micro-architecture))
+
+                (package->job store
+                              (tuned-package package micro-architecture)
+                              system
+                              #:suffix suffix))
+              (match system
+                ("x86_64-linux" %x86-64-micro-architectures)
+                (_ '()))))
+
 (define (all-packages)
   "Return the list of packages to build."
   (define (adjust package result)
@@ -527,10 +546,16 @@ (define source
          ('all
           ;; Build everything, including replacements.
           (let ((all (all-packages))
-                (job (lambda (package)
-                       (package->job store package system))))
+                (jobs (lambda (package)
+                        (match (package->job store package system)
+                          (#f '())
+                          (main-job
+                           (cons main-job
+                                 (if (tunable-package? package)
+                                     (tuned-package-jobs store package system)
+                                     '())))))))
             (append
-             (filter-map job all)
+             (append-map jobs all)
              (cross-jobs store system))))
          ('core
           ;; Build core packages only.
-- 
2.33.0
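
To make the effect of #:suffix concrete, here is a minimal sketch of the job names that result for one tunable package. It assumes the NAME argument stands in for whatever 'job-name' returns in gnu/ci.scm; 'job-names' itself is a hypothetical helper, not part of the patch.

```scheme
;; Hypothetical sketch: names of the baseline job and of the tuned jobs.
(define %micro-architectures
  ;; Same list as %x86-64-micro-architectures in the patch.
  '("westmere" "ivybridge" "haswell" "skylake" "skylake-avx512"))

(define (job-names name system)
  "Return the baseline job name for NAME on SYSTEM, followed by one name per
tuned variant.  Only x86_64-linux gets tuned variants."
  (let ((base (string-append name "." system)))
    (cons base
          (map (lambda (micro-architecture)
                 (string-append base "." micro-architecture))
               (if (string=? system "x86_64-linux")
                   %micro-architectures
                   '())))))
```

For instance, (job-names "eigen-benchmarks" "x86_64-linux") yields six names, the second being "eigen-benchmarks.x86_64-linux.westmere", whereas on any other system only the baseline name is returned.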
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 02/10] transformations: Add '--tune'.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
20211204204924.15581-2-ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* guix/transformations.scm (tuning-compiler)
(tuned-package, tunable-package?, package-tuning)
(transform-package-tuning): New procedures.
(%transformations): Add 'tune'.
(%transformation-options): Add "--tune".
* tests/transformations.scm ("options->transformation, tune"): New
test.
* doc/guix.texi (Package Transformation Options): Document '--tune'.
---
doc/guix.texi | 54 +++++++++++++++
guix/transformations.scm | 134 ++++++++++++++++++++++++++++++++++++++
tests/transformations.scm | 20 ++++++
3 files changed, 208 insertions(+)

diff --git a/doc/guix.texi b/doc/guix.texi
index a675631b79..e3aca8fd3b 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -10906,6 +10906,60 @@ available options and a synopsis (these options are not shown in the
 
 @table @code
 
+@cindex performance, tuning code
+@cindex optimization, of package code
+@cindex tuning, of package code
+@cindex SIMD support
+@cindex tunable packages
+@cindex package multi-versioning
+@item --tune[=@var{cpu}]
+Use versions of the packages marked as ``tunable'' optimized for
+@var{cpu}.  When @var{cpu} is @code{native}, or when it is omitted, tune
+for the CPU on which the @command{guix} command is running.
+
+Valid @var{cpu} names are those recognized by GCC, the GNU Compiler
+Collection.  On x86_64 processors, this includes CPU names such as
+@code{nehalem}, @code{haswell}, and @code{skylake} (@pxref{x86 Options,
+@code{-march},, gcc, Using the GNU Compiler Collection (GCC)}).
+
+As new generations of CPUs come out, they augment the standard
+instruction set architecture (ISA) with additional instructions, in
+particular instructions for single-instruction/multiple-data (SIMD)
+parallel processing.  For example, while Core2 and Skylake CPUs both
+implement the x86_64 ISA, only the latter supports AVX2 SIMD
+instructions.
+
+The primary gain one can expect from @option{--tune} is for programs
+that can make use of those SIMD capabilities @emph{and} that do not
+already have a mechanism to select the right optimized code at run time.
+Packages that have the @code{tunable?} property set are considered
+@dfn{tunable packages} by the @option{--tune} option; a package
+definition with the property set looks like this:
+
+@lisp
+(package
+  (name "hello-simd")
+  ;; ...
+
+  ;; This package may benefit from SIMD extensions so
+  ;; mark it as "tunable".
+  (properties '((tunable? . #t))))
+@end lisp
+
+Other packages are not considered tunable.  This allows Guix to use
+generic binaries in the cases where tuning for a specific CPU is
+unlikely to provide any gain.
+
+Tuned packages are @emph{grafted} onto packages that depend on them
+(@pxref{Security Updates, grafts}).  Thus, using @option{--no-grafts}
+annihilates the effect of @option{--tune}.
+
+We call this technique @dfn{package multi-versioning}: several variants
+of tunable packages may be built, one for each CPU variant.  It is the
+coarse-grain counterpart of @dfn{function multi-versioning} as
+implemented by the GNU tool chain (@pxref{Function Multiversioning,,,
+gcc, Using the GNU Compiler Collection (GCC)}).
+
 @item --with-source=@var{source}
 @itemx --with-source=@var{package}=@var{source}
 @itemx --with-source=@var{package}@@@var{version}=@var{source}
diff --git a/guix/transformations.scm b/guix/transformations.scm
index 5ae1977cb2..3be02179ef 100644
--- a/guix/transformations.scm
+++ b/guix/transformations.scm
@@ -29,6 +29,7 @@ (define-module (guix transformations)
   #:autoload   (guix upstream) (package-latest-release
                                 upstream-source-version
                                 upstream-source-signature-urls)
+  #:autoload   (guix cpu) (current-cpu cpu->gcc-architecture)
   #:use-module (guix utils)
   #:use-module (guix memoization)
   #:use-module (guix gexp)
@@ -49,6 +50,9 @@ (define-module (guix transformations)
   #:export (options->transformation
             manifest-entry-with-transformations
 
+            tunable-package?
+            tuned-package
+
             show-transformation-options-help
             %transformation-options))
 
@@ -419,6 +423,120 @@ (define replacements
             obj)
         obj)))
 
+(define tuning-compiler
+  (mlambda (micro-architecture)
+    "Return a compiler wrapper that passes '-march=MICRO-ARCHITECTURE' to the
+actual compiler."
+    (define wrapper
+      #~(begin
+          (use-modules (ice-9 match))
+
+          (define* (search-next command
+                                #:optional
+                                (path (string-split (getenv "PATH")
+                                                    #\:)))
+            ;; Search the next COMMAND on PATH, a list of
+            ;; directories representing the executable search path.
+            (define this
+              (stat (car (command-line))))
+
+            (let loop ((path path))
+              (match path
+                (()
+                 (match command
+                   ("cc" (search-next "gcc"))
+                   (_ #f)))
+                ((directory rest ...)
+                 (let* ((file (string-append
+                               directory "/" command))
+                        (st   (stat file #f)))
+                   (if (and st (not (equal? this st)))
+                       file
+                       (loop rest)))))))
+
+          (match (command-line)
+            ((command arguments ...)
+             (match (search-next (basename command))
+               (#f (exit 127))
+               (next
+                (apply execl next
+                       (append (cons next arguments)
+                           (list (string-append "-march="
+                                                #$micro-architecture))))))))))
+
+    (define program
+      (program-file (string-append "tuning-compiler-wrapper-" micro-architecture)
+                    wrapper))
+
+    (computed-file (string-append "tuning-compiler-" micro-architecture)
+                   (with-imported-modules '((guix build utils))
+                     #~(begin
+                         (use-modules (guix build utils))
+
+                         (define bin (string-append #$output "/bin"))
+                         (mkdir-p bin)
+
+                         (for-each (lambda (program)
+                                     (symlink #$program
+                                              (string-append bin "/" program)))
+                                   '("cc" "gcc" "clang" "g++" "c++" "clang++")))))))
+
+(define (tuned-package p micro-architecture)
+  "Return package P tuned for MICRO-ARCHITECTURE."
+  (define compiler
+    (tuning-compiler micro-architecture))
+
+  (package
+    (inherit p)
+    (native-inputs
+     ;; Arrange so that COMPILER comes first in $PATH.
+     `(("tuning-compiler" ,compiler)
+       ,@(package-native-inputs p)))
+    (arguments
+     (substitute-keyword-arguments (package-arguments p)
+       ((#:tests? _ #f) #f)))
+    (properties
+     `((cpu-tuning . ,micro-architecture)
+       ,@(package-properties p)))))
+
+(define (tunable-package? package)
+  "Return true if package PACKAGE is \"tunable\"--i.e., if tuning it for the
+host CPU is worthwhile."
+  (assq 'tunable? (package-properties package)))
+
+(define package-tuning
+  (mlambda (micro-architecture)
+    "Return a procedure that maps the given package to its counterpart tuned
+for MICRO-ARCHITECTURE, a string suitable for GCC's '-march'."
+    (define rewriting-property
+      (gensym " package-tuning"))
+
+    (package-mapping (lambda (p)
+                       (cond ((assq rewriting-property (package-properties p))
+                              p)
+                             ((assq 'tunable? (package-properties p))
+                              (package/inherit p
+                                (replacement (tuned-package p micro-architecture))
+                                (properties `((,rewriting-property . #t)
+                                              ,@(package-properties p)))))
+                             (else
+                              p)))
+                     (lambda (p)
+                       (assq rewriting-property (package-properties p)))
+                     #:deep? #t)))
+
+(define (transform-package-tuning micro-architectures)
+  "Return a procedure that, when passed a package, tunes it for the first of MICRO-ARCHITECTURES."
+  (match micro-architectures
+    ((micro-architecture _ ...)
+     (info (G_ "tuning for CPU micro-architecture ~a~%")
+           micro-architecture)
+     (let ((rewrite (package-tuning micro-architecture)))
+       (lambda (obj)
+         (if (package? obj)
+             (rewrite obj)
+             obj))))))
+
 (define (transform-package-with-debug-info specs)
   "Return a procedure that, when passed a package, set its 'replacement' field
 to the same package but with #:strip-binaries? #f in its 'arguments' field."
@@ -601,6 +719,7 @@ (define %transformations
     (with-commit . ,transform-package-source-commit)
     (with-git-url . ,transform-package-source-git-url)
     (with-c-toolchain . ,transform-package-toolchain)
+    (tune . ,transform-package-tuning)
     (with-debug-info . ,transform-package-with-debug-info)
     (without-tests . ,transform-package-tests)
     (with-patch  . ,transform-package-patches)
@@ -640,6 +759,21 @@ (define %transformation-options
                   (parser 'with-git-url))
           (option '("with-c-toolchain") #t #f
                   (parser 'with-c-toolchain))
+          (option '("tune") #f #t
+                  (lambda (opt name arg result . rest)
+                    (define micro-architecture
+                      (match arg
+                        ((or #f "native")
+                         (cpu->gcc-architecture (current-cpu)))
+                        ("generic" #f)
+                        (_ arg)))
+
+                    (apply values
+                           (if micro-architecture
+                               (alist-cons 'tune micro-architecture
+                                           result)
+                               (alist-delete 'tune result))
+                           rest)))
           (option '("with-debug-info") #t #f
                   (parser 'with-debug-info))
           (option '("without-tests") #t #f
diff --git a/tests/transformations.scm b/tests/transformations.scm
index 09839dc1c5..760b523e6e 100644
--- a/tests/transformations.scm
+++ b/tests/transformations.scm
@@ -465,6 +465,26 @@ (define (package-name* obj)
                    `((with-latest . "foo")))))
           (package-version (t p)))))
 
+(test-equal "options->transformation, tune"
+  '(cpu-tuning . "superfast")
+  (let* ((p0 (dummy-package "p0"))
+         (p1 (dummy-package "p1"
+               (inputs `(("p0" ,p0)))
+               (properties '((tunable? . #t)))))
+         (p2 (dummy-package "p2"
+               (inputs `(("p1" ,p1)))))
+         (t  (options->transformation '((tune . "superfast"))))
+         (p3 (t p2)))
+    (and (not (package-replacement p3))
+         (match (package-inputs p3)
+           ((("p1" tuned))
+            (match (package-inputs tuned)
+              ((("p0" p0))
+               (and (not (package-replacement p0))
+                    (assq 'cpu-tuning
+                          (package-properties
+                           (package-replacement tuned)))))))))))
+
 (test-equal "options->transformation + package->manifest-entry"
   '((transformations . ((without-tests . "foo"))))
   (let* ((p (dummy-package "foo"))
-- 
2.33.0
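
The handling of the '--tune' argument can be isolated from the option machinery. The sketch below mirrors the 'match' in the option handler above, but takes the host micro-architecture as a parameter instead of calling (guix cpu); both procedure names are hypothetical.

```scheme
;; Hypothetical sketch of how the argument of '--tune' is interpreted.
(use-modules (ice-9 match)
             (srfi srfi-1))

(define (tune-argument->micro-architecture arg host)
  "Map ARG, the argument of '--tune' (#f when omitted), to a
micro-architecture name.  HOST is the name detected for the host CPU."
  (match arg
    ((or #f "native") host)       ;no argument or "native": tune for the host
    ("generic" #f)                ;"generic": no tuning at all
    (_ arg)))                     ;anything else is passed through as is

(define (record-tuning arg host result)
  "Return RESULT, an alist of options, updated according to ARG."
  (let ((micro-architecture (tune-argument->micro-architecture arg host)))
    (if micro-architecture
        (alist-cons 'tune micro-architecture result)
        (alist-delete 'tune result))))
```

With this sketch, an omitted argument on a host detected as "skylake" adds (tune . "skylake") to the options, while "generic" removes any 'tune' entry recorded so far.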
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 04/10] gnu: Add eigen-benchmarks.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-4-ludo@gnu.org
* gnu/packages/algebra.scm (eigen-benchmarks): New variable.
---
gnu/packages/algebra.scm | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/gnu/packages/algebra.scm b/gnu/packages/algebra.scm
index b704d98dde..a782f8b1be 100644
--- a/gnu/packages/algebra.scm
+++ b/gnu/packages/algebra.scm
@@ -1074,6 +1074,45 @@ (define-public eigen
     ;; See 'COPYING.README' for details.
     (license license:mpl2.0)))
 
+(define-public eigen-benchmarks
+  (package
+    (inherit eigen)
+    (name "eigen-benchmarks")
+    (arguments
+     '(#:phases (modify-phases %standard-phases
+                  (delete 'configure)
+                  (replace 'build
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      (let* ((out (assoc-ref outputs "out"))
+                             (bin (string-append out "/bin")))
+                        (define (compile file)
+                          (format #t "compiling '~a'...~%" file)
+                          (let ((target
+                                 (string-append bin "/"
+                                                (basename file ".cpp"))))
+                            (invoke "g++" "-o" target file
+                                    "-I" ".." "-O2" "-g"
+                                    "-lopenblas" "-Wl,--as-needed")))
+
+                        (mkdir-p bin)
+                        (with-directory-excursion "bench"
+                          ;; There are more benchmarks, of varying quality.
+                          ;; Here we pick some that appear to be useful.
+                          (for-each compile
+                                    '("benchBlasGemm.cpp"
+                                      "benchCholesky.cpp"
+                                      ;;"benchEigenSolver.cpp"
+                                      "benchFFT.cpp"
+                                      "benchmark-blocking-sizes.cpp"))))))
+                  (delete 'install))))
+    (inputs `(("boost" ,boost)
+              ("openblas" ,openblas)))
+
+    ;; Mark as tunable to take advantage of SIMD code in Eigen.
+    (properties '((tunable? . #t)))
+
+    (synopsis "Micro-benchmarks of the Eigen linear algebra library")))
+
 (define-public eigen-for-tensorflow
   (let ((changeset "fd6845384b86")
         (revision "1"))
-- 
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 05/10] gnu: Add xsimd-benchmark.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-5-ludo@gnu.org
* gnu/packages/cpp.scm (xsimd-benchmark): New variable.
---
gnu/packages/cpp.scm | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/gnu/packages/cpp.scm b/gnu/packages/cpp.scm
index e2f2279418..0bf65ed364 100644
--- a/gnu/packages/cpp.scm
+++ b/gnu/packages/cpp.scm
@@ -300,6 +300,29 @@ (define-public xsimd
 operating on batches.")
     (license license:bsd-3)))
 
+(define-public xsimd-benchmark
+  (package
+    (inherit xsimd)
+    (name "xsimd-benchmark")
+    (arguments
+     `(#:configure-flags (list "-DBUILD_BENCHMARK=ON")
+       #:tests? #f
+       #:phases (modify-phases %standard-phases
+                  (add-after 'unpack 'remove-march=native
+                    (lambda _
+                      (substitute* "benchmark/CMakeLists.txt"
+                        (("-march=native") ""))))
+                  (replace 'install
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      ;; Install nothing but the executable.
+                      (let ((out (assoc-ref outputs "out")))
+                        (install-file "benchmark/benchmark_xsimd"
+                                      (string-append out "/bin"))))))))
+    (synopsis "Benchmark of the xsimd library")
+
+    ;; Mark as tunable to take advantage of SIMD code in xsimd/xtensor.
+    (properties '((tunable? . #t)))))
+
 (define-public chaiscript
   (package
     (name "chaiscript")
-- 
2.33.0
L
[PATCH 06/10] gnu: Add xtensor-benchmark.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-6-ludo@gnu.org
* gnu/packages/algebra.scm (xtensor-benchmark): New variable.
---
gnu/packages/algebra.scm | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/gnu/packages/algebra.scm b/gnu/packages/algebra.scm
index a782f8b1be..4e3f8298a2 100644
--- a/gnu/packages/algebra.scm
+++ b/gnu/packages/algebra.scm
@@ -1200,6 +1200,46 @@ (define-public xtensor
 @end itemize")
     (license license:bsd-3)))
 
+(define-public xtensor-benchmark
+  (package
+    (inherit xtensor)
+    (name "xtensor-benchmark")
+    (arguments
+     `(#:configure-flags (list "-DBUILD_BENCHMARK=ON"
+                               "-DDOWNLOAD_GBENCHMARK=OFF")
+       #:tests? #f
+       #:phases (modify-phases %standard-phases
+                  (add-after 'unpack 'remove-march=native
+                    (lambda _
+                      (substitute* "benchmark/CMakeLists.txt"
+                        (("-march=native") ""))))
+                  (add-after 'unpack 'link-with-googlebenchmark
+                    (lambda _
+                      (substitute* "benchmark/CMakeLists.txt"
+                        (("find_package\\(benchmark.*" all)
+                         (string-append
+                          all "\n"
+                          "set(GBENCHMARK_LIBRARIES benchmark)\n")))))
+                  (replace 'build
+                    (lambda _
+                      (invoke "make" "benchmark_xtensor" "-j"
+                              (number->string (parallel-job-count)))))
+                  (replace 'install
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      ;; Install nothing but the executable.
+                      (let ((out (assoc-ref outputs "out")))
+                        (install-file "benchmark/benchmark_xtensor"
+                                      (string-append out "/bin"))))))))
+    (synopsis "Benchmarks of the xtensor library")
+    (native-inputs '())
+    (inputs
+     `(("googlebenchmark" ,googlebenchmark)
+       ("xsimd" ,xsimd)
+       ,@(package-native-inputs xtensor)))
+
+    ;; Mark as tunable to take advantage of SIMD code in xsimd/xtensor.
+    (properties '((tunable? . #t)))))
+
 (define-public gap
   (package
     (name "gap")
-- 
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 07/10] gnu: ceres-solver: Mark as tunable.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-7-ludo@gnu.org
* gnu/packages/maths.scm (ceres)[properties]: New field.
---
gnu/packages/maths.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/maths.scm b/gnu/packages/maths.scm
index 83f31c1396..19dab598c6 100644
--- a/gnu/packages/maths.scm
+++ b/gnu/packages/maths.scm
@@ -2396,7 +2396,10 @@ (define-public ceres
 @item non-linear least squares problems with bounds constraints;
 @item general unconstrained optimization problems.
 @end enumerate\n")
-    (license license:bsd-3)))
+    (license license:bsd-3)
+
+    ;; Mark as tunable to take advantage of SIMD code in Eigen.
+    (properties `((tunable? . #t)))))
 
 ;; For a fully featured Octave, users are strongly recommended also to install
 ;; the following packages: less, ghostscript, gnuplot.
-- 
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 08/10] gnu: Add ceres-solver-benchmarks.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-8-ludo@gnu.org
* gnu/packages/maths.scm (ceres-solver-benchmarks): New variable.
---
gnu/packages/maths.scm | 44 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/gnu/packages/maths.scm b/gnu/packages/maths.scm
index 19dab598c6..06bdb8ed92 100644
--- a/gnu/packages/maths.scm
+++ b/gnu/packages/maths.scm
@@ -2401,6 +2401,50 @@ (define-public ceres
     ;; Mark as tunable to take advantage of SIMD code in Eigen.
     (properties `((tunable? . #t)))))
 
+(define-public ceres-solver-benchmarks
+  (package
+    (inherit ceres)
+    (name "ceres-solver-benchmarks")
+    (arguments
+     '(#:modules ((ice-9 popen)
+                  (ice-9 rdelim)
+                  (guix build utils)
+                  (guix build cmake-build-system))
+
+       #:phases (modify-phases %standard-phases
+                  (delete 'configure)
+                  (replace 'build
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      (let* ((out (assoc-ref outputs "out"))
+                             (bin (string-append out "/bin")))
+                        (define flags
+                          (string-tokenize
+                           (read-line (open-pipe* OPEN_READ
+                                                  "pkg-config" "eigen3"
+                                                  "--cflags"))))
+
+                        (define (compile-file file)
+                          (let ((source (string-append file ".cc")))
+                            (format #t "building '~a'...~%" file)
+                            (apply invoke "c++" "-fopenmp" "-O2" "-g" "-DNDEBUG"
+                                   source "-lceres" "-lbenchmark" "-lglog"
+                                   "-pthread"
+                                   "-o" (string-append bin "/" file)
+                                   "-I" ".." flags)))
+
+                        (mkdir-p bin)
+                        (with-directory-excursion "internal/ceres"
+                          (for-each compile-file
+                                    '("small_blas_gemm_benchmark"
+                                      "small_blas_gemv_benchmark"
+                                      "autodiff_cost_function_benchmark"))))))
+                  (delete 'check)
+                  (delete 'install))))
+    (inputs `(("googlebenchmark" ,googlebenchmark)
+              ("ceres-solver" ,ceres)
+              ,@(package-inputs ceres)))
+    (synopsis "Benchmarks of the Ceres optimization problem solver")))
+
 ;; For a fully featured Octave, users are strongly recommended also to install
 ;; the following packages: less, ghostscript, gnuplot.
 (define-public octave-cli
-- 
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 09/10] gnu: libfive: Mark as tunable.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-9-ludo@gnu.org
* gnu/packages/engineering.scm (libfive)[properties]: New field.
---
gnu/packages/engineering.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/engineering.scm b/gnu/packages/engineering.scm
index 50f265f085..4a749f5ecf 100644
--- a/gnu/packages/engineering.scm
+++ b/gnu/packages/engineering.scm
@@ -837,7 +837,10 @@ (define-public libfive
 Even fundamental, primitive shapes are represented as code in the user-level
 language.")
       (license (list license:mpl2.0               ;library
-                     license:gpl2+)))))           ;Guile bindings and GUI
+                     license:gpl2+))              ;Guile bindings and GUI
+
+      ;; Mark as tunable to take advantage of SIMD code in Eigen.
+      (properties '((tunable? . #t))))))
 
 (define-public inspekt3d
   (let ((commit "703f52ccbfedad2bf5240bf8183d1b573c9d54ef")
-- 
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 21:49
[PATCH 10/10] gnu: prusa-slicer: Mark as tunable.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211204204924.15581-10-ludo@gnu.org
* gnu/packages/engineering.scm (prusa-slicer)[properties]: New field.
---
gnu/packages/engineering.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/engineering.scm b/gnu/packages/engineering.scm
index 4a749f5ecf..7e8c042653 100644
--- a/gnu/packages/engineering.scm
+++ b/gnu/packages/engineering.scm
@@ -3070,4 +3070,7 @@ (define-public prusa-slicer
     (synopsis "G-code generator for 3D printers (RepRap, Makerbot, Ultimaker etc.)")
     (description "PrusaSlicer takes 3D models (STL, OBJ, AMF) and converts them into
 G-code instructions for FFF printers or PNG layers for mSLA 3D printers.")
-    (license license:agpl3)))
+    (license license:agpl3)
+
+    ;; Mark as tunable to take advantage of SIMD code in Eigen and in libigl.
+    (properties '((tunable? . #t)))))
-- 
2.33.0
Ludovic Courtès wrote on 4 Dec 2021 22:11
Re: bug#52283: [PATCH 00/10] Tuning packages for CPU micro-architectures
(address . 52283@debbugs.gnu.org)
87tufoulbe.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> This is achieved not by passing ‘-march=native’, because the daemon
> might be running on a separate machine with a different CPU, but by
> identifying the ‘-march’ value corresponding to the host CPU and
> passing ‘-march’ to the compiler, via a wrapper.

Another argument in favor of this approach is verifiability, because
manifests record the argument to the ‘tune’ transformation option:

$ ./pre-inst-env guix shell eigen-benchmarks --tune
guix shell: tuning for CPU micro-architecture skylake
[env]$ guix package --export-manifest -p $GUIX_ENVIRONMENT
;; This "manifest" file can be passed to 'guix package -m' to reproduce
;; the content of your profile. This is "symbolic": it only specifies
;; package names. To reproduce the exact same profile, you also need to
;; capture the channels being used, as returned by "guix describe".
;; See the "Replicating Guix" section in the manual.

(use-modules (guix transformations))

(define transform1
  (options->transformation '((tune . "skylake"))))

(packages->manifest
  (list (transform1
          (specification->package "eigen-benchmarks"))))

Ludo’.
Mathieu Othacehe wrote on 5 Dec 2021 09:37
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 52283@debbugs.gnu.org)
87czmb1m8a.fsf_-_@gnu.org
Hey Ludo,

Wooh, nice addition!

> +(define-record-type <cpu>
> + (cpu architecture family model flags)
> + cpu?
> + (architecture cpu-architecture) ;string, from 'uname'
> + (family cpu-family) ;integer
> + (model cpu-model) ;integer
> + (flags cpu-flags)) ;set of strings

When using the "--tune" transformation option with "native", we can
expect the current-cpu method to fill the <cpu> record correctly.

However, when the user is passing a custom cpu name, it might be
incorrect. I think we should check the user input against a list of
valid/supported cpu architectures.

That's something we should also enforce for the system and target
fields. Currently, this command "guix build -s arch64-linux hello" is
failing with an unpleasant backtrace, while it could warn that the
given system is not supported.

Maybe the (guix cpu) and (gnu platform) modules should be merged somehow
to define the supported CPU micro-architectures:

(define armv7-linux
  (platform
    (target "arm-linux-gnueabihf")
    (system "armhf-linux")
    (linux-architecture "arm")
    (supported-march '("armv7" "armv7-a" "armv7ve"))))

we could then use those platform records in the (gnu ci) module to build
packages against all the supported micro architectures and remove the
"%x86-64-micro-architecture" variable you propose to introduce there.

WDYT?

Thanks,

Mathieu
Ludovic Courtès wrote on 6 Dec 2021 11:38
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 52283@debbugs.gnu.org)
87zgpeqaq5.fsf_-_@gnu.org
Hello!

Mathieu Othacehe <othacehe@gnu.org> skribis:

> Wooh, nice addition!

Glad you like it. :-)

>> +(define-record-type <cpu>
>> + (cpu architecture family model flags)
>> + cpu?
>> + (architecture cpu-architecture) ;string, from 'uname'
>> + (family cpu-family) ;integer
>> + (model cpu-model) ;integer
>> + (flags cpu-flags)) ;set of strings
>
> When using the "--tune" transformation option with "native", we can
> expect the current-cpu method to fill the <cpu> record correctly.
>
> However, when the user is passing a custom cpu name, it might be
> incorrect. I think we should check the user input against a list of
> valid/supported cpu architectures.
>
> That's something we should also enforce for the system and target
> fields. Currently, this command "guix build -s arch64-linux hello" is
> failing with an unpleasant backtrace, while it could warn that the
> given system is not supported.

Right. I’m a bit torn because I agree with the usability issue and
solution you propose, but at the same time I know that maintaining a
list of existing CPU names will be tedious and it’ll be annoying for
users if they can’t just specify their CPU name (which they might want
to do precisely when ‘--tune=native’ doesn’t determine the right name
because it doesn’t know about it yet.)

Maybe it’s an acceptable limitation though.

I’ll see how I can tweak the code so that the CPU detection code and the
micro-architecture name validation code can share a single list of
names.
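
A minimal sketch of what sharing a single list could look like follows. It is only an illustration under stated assumptions, not code from this series: '%x86-64-models', 'model->micro-architecture' and 'valid-micro-architecture?' are hypothetical names, and the table is a small subset of the family-6 model numbers found in 'cpu->gcc-architecture'.

```scheme
(use-modules (srfi srfi-1)
             (ice-9 match))

;; Hypothetical shared table: each entry is a micro-architecture name
;; followed by the family-6 model numbers that map to it.
(define %x86-64-models
  '(("nehalem" #x1a #x1e #x1f #x2e)
    ("haswell" #x3c #x3f #x45 #x46)
    ("skylake" #x4e #x5e #x8e #x9e)))

(define (model->micro-architecture model)
  "Return the micro-architecture name for MODEL, an integer, or #f if
MODEL is not in the table.  This is the detection side."
  (any (match-lambda
         ((name . models)
          (and (memv model models) name)))
       %x86-64-models))

(define (valid-micro-architecture? name)
  "Return #t if NAME, a string, is listed in the table.  This is the
validation side, using the same table."
  (and (assoc name %x86-64-models) #t))
```

With such a table, a name given by the user can be rejected with a proper error message before any build starts, at the cost of keeping the table up to date.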

> Maybe the (guix cpu) and (gnu platform) modules should be merged somehow
> to define the supported CPU micro-architectures:
>
> (define armv7-linux
> (platform
> (target "arm-linux-gnueabihf")
> (system "armhf-linux")
> (linux-architecture "arm")
> (supported-march '("armv7" "armv7-a" "armv7ve"))
>
> we could then use those platform records in the (gnu ci) module to build
> packages against all the supported micro architectures and remove the
> "%x86-64-micro-architecture" variable you propose to introduce there.

Hmm yeah, but it should be (guix platforms) then…

Maybe that’s a broader refactoring we can keep for later? I agree it
would be logical but I’m not sure how to nicely factorize things.

Thanks,
Ludo’.
zimoun wrote on 6 Dec 2021 13:47
Re: [bug#52283] [PATCH 00/10] Tuning packages for CPU micro-architectures
(address . 52283@debbugs.gnu.org)
86zgpdoq7n.fsf@gmail.com
Hi Ludo,

Really cool! Thanks!

On Mon, 06 Dec 2021 at 11:38, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> Mathieu Othacehe <othacehe@gnu.org> skribis:

>>> +(define-record-type <cpu>
>>> + (cpu architecture family model flags)
>>> + cpu?
>>> + (architecture cpu-architecture) ;string, from 'uname'
>>> + (family cpu-family) ;integer
>>> + (model cpu-model) ;integer
>>> + (flags cpu-flags)) ;set of strings
>>
>> When using the "--tune" transformation option with "native", we can
>> expect the current-cpu method to fill the <cpu> record correctly.
>>
>> However, when the user is passing a custom cpu name, it might be
>> incorrect. I think we should check the user input against a list of
>> valid/supported cpu architectures.

[...]

> Right. I’m a bit torn because I agree with the usability issue and
> solution you propose, but at the same time I know that maintaining a
> list of existing CPU names will be tedious and it’ll be annoying for
> users if they can’t just specify their CPU name (which they might want
> to do precisely when ‘--tune=native’ doesn’t determine the right name
> because it doesn’t know about it yet.)

I have not looked at all the details, but isn’t this list of existing CPU
names already maintained somehow?

+(define (cpu->gcc-architecture cpu)
+  "Return the architecture name, suitable for GCC's '-march' flag, that
+corresponds to CPU, a record as returned by 'current-cpu'."
+  (match (cpu-architecture cpu)
+    ("x86_64"
+     ;; Transcribed from GCC's 'host_detect_local_cpu' in driver-i386.c.
+     (or (and (= 6 (cpu-family cpu))             ;the "Pentium Pro" family
+              (letrec-syntax ((model (syntax-rules (=>)
+                                       ((_) #f)
+                                       ((_ (candidate => integers ...) rest
+                                           ...)
+                                        (or (and (= (cpu-model cpu) integers)
+                                                 candidate)
+                                            ...
+                                            (model rest ...))))))
+                (model ("bonnell" => #x1c #x26)
+                       ("silvermont" => #x37 #x4a #x4d #x5a #x5d)
+                       ("core2" => #x0f #x17 #x1d)
+                       ("nehalem" => #x1a #x1e #x1f #x2e)
+                       ("westmere" => #x25 #x2c #x2f)
+                       ("sandybridge" => #x2a #x2d)
+                       ("ivybridge" => #x3a #x3e)
+                       ("haswell" => #x3c #x3f #x45 #x46)
+                       ("broadwell" => #x3d #x47 #x4f #x56)
+                       ("skylake" => #x4e #x5e #x8e #x9e)
+                       ("skylake-avx512" => #x55) ;TODO: cascadelake
+                       ("knl" => #x57)
+                       ("cannonlake" => #x66)
+                       ("knm" => #x85))))


>> Maybe the (guix cpu) and (gnu platform) modules should be merged somehow
>> to define the supported CPU micro-architectures:
>>
>> (define armv7-linux
>> (platform
>> (target "arm-linux-gnueabihf")
>> (system "armhf-linux")
>> (linux-architecture "arm")
>> (supported-march '("armv7" "armv7-a" "armv7ve"))
>>
>> we could then use those platform records in the (gnu ci) module to build
>> packages against all the supported micro architectures and remove the
>> "%x86-64-micro-architecture" variable you propose to introduce there.
>
> Hmm yeah, but it should be (guix platforms) then…
>
> Maybe that’s a broader refactoring we can keep for later? I agree it
> would be logical but I’m not sure how to nicely factorize things.

Yeah, I am always annoyed by the arguments of ‘-s’ vs ‘-t’, aside from the
ugly backtrace. :-) The same approach as elsewhere would be to have
options ‘--list-systems’ and ‘--list-targets’ and to handle incorrect
values, similar to “guix lint” for checkers or “guix graph” for types or
backends, etc. With potentially some hints. :-)

I also agree that’s unrelated to the current series. :-) This
refactoring could happen later, IMHO.


Cheers,
simon
Ludovic Courtès wrote on 6 Dec 2021 17:48
Re: bug#52283: [PATCH 00/10] Tuning packages for CPU micro-architectures
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 52283@debbugs.gnu.org)
87o85tllw8.fsf_-_@gnu.org
Mathieu Othacehe <othacehe@gnu.org> skribis:

> However, when the user is passing a custom cpu name, it might be
> incorrect. I think we should check the user input against a list of
> valid/supported cpu architectures.

BTW, there’s another constraint: the list of valid names depends on the
compiler used. GCC 11 recognizes ‘x86-64-v[1234]’ for instance, but
earlier versions do not.

We could hard-code the list of known identifiers for the default GCC,
but if users resort to ‘--with-c-toolchain’ to get a newer toolchain,
they won’t be able to use the newer CPU identifiers.

Maybe an acceptable drawback.
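
To make the constraint concrete, here is a minimal sketch of a per-compiler check. It is an illustration, not part of this series: '%gcc-minimum-version' and 'gcc-supports-micro-architecture?' are hypothetical names, the table only records the ‘x86-64-v*’ levels that GCC 11 introduced, and treating unlisted names as supported is an assumption of the sketch.

```scheme
(use-modules (ice-9 match))

;; Hypothetical table: '-march' value -> first GCC major version that
;; recognizes it.  Only the entries mentioned in this message are listed.
(define %gcc-minimum-version
  '(("x86-64-v2" . 11)
    ("x86-64-v3" . 11)
    ("x86-64-v4" . 11)))

(define (gcc-supports-micro-architecture? name gcc-major)
  "Return #t if GCC major version GCC-MAJOR is expected to accept
'-march=NAME'.  Names absent from the table are assumed to be supported."
  (match (assoc-ref %gcc-minimum-version name)
    (#f #t)
    (minimum (>= gcc-major minimum))))
```

A check like this only helps when the version of the compiler in use is known, which is exactly what changes when ‘--with-c-toolchain’ is involved.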

Ludo’.
Thiago Jung Bauermann wrote on 7 Dec 2021 00:18
Re: [bug#52283] [PATCH 02/10] transformations: Add '--tune'.
7364829.rrRS6mQaNJ@popigai
Hello Ludo,

Awesome series! I only have comments about this patch, and then only minor
ones:

Em sábado, 4 de dezembro de 2021, às 17:49:16 -03, Ludovic Courtès
escreveu:
> +Tuned packages are @emph{grafted} onto packages that depend on them
> +(@pxref{Security Updates, grafts}). Thus, using @option{--no-grafts}
> +annihilates the effect of @option{--tune}.

Perhaps this is because English isn’t my first language, but annihilation
seems like a violent and dramatic effect in a package transformation. :-)

Perhaps reword as “cancels”, “invalidates” or "nullifies"?

> +(define (tuned-package p micro-architecture)
> + "Return package P tuned for MICRO-ARCHITECTURE."
> + (define compiler
> + (tuning-compiler micro-architecture))
> +
> + (package
> + (inherit p)
> + (native-inputs
> + ;; Arrange so that COMPILER comes first in $PATH.
> + `(("tuning-compiler" ,compiler)
> + ,@(package-native-inputs p)))
> + (arguments
> + (substitute-keyword-arguments (package-arguments p)
> + ((#:tests? _ #f) #f)))

Perhaps I’m reading this wrong, but it looks like tuned packages don’t run
their testsuites? If so, this is a surprising side-effect and thus it would
be nice to have it mentioned in the manual, possibly also in a comment
here. It would be nice to also mention the rationale for disabling the
tests (not sure whether only in a comment here or if in the manual as
well). I assume it’s for convenience, but I’m not sure.

--
Thanks,
Thiago
Ludovic Courtès wrote on 7 Dec 2021 09:04
(name . Thiago Jung Bauermann)(address . bauermann@kolabnow.com)(address . 52283@debbugs.gnu.org)
871r2olu1j.fsf@inria.fr
Hi Thiago,

Thiago Jung Bauermann <bauermann@kolabnow.com> skribis:

> Em sábado, 4 de dezembro de 2021, às 17:49:16 -03, Ludovic Courtès
> escreveu:
>> +Tuned packages are @emph{grafted} onto packages that depend on them
>> +(@pxref{Security Updates, grafts}). Thus, using @option{--no-grafts}
>> +annihilates the effect of @option{--tune}.
>
> Perhaps this is because English isn’t my first language, but annihilation
> seems like a violent and dramatic effect in a package transformation. :-)
>
> Perhaps reword as “cancels”, “invalidates” or "nullifies"?

Not a native speaker either but yes, “cancels” sounds better; I’ll
change that.

>> +(define (tuned-package p micro-architecture)
>> + "Return package P tuned for MICRO-ARCHITECTURE."
>> + (define compiler
>> + (tuning-compiler micro-architecture))
>> +
>> + (package
>> + (inherit p)
>> + (native-inputs
>> + ;; Arrange so that COMPILER comes first in $PATH.
>> + `(("tuning-compiler" ,compiler)
>> + ,@(package-native-inputs p)))
>> + (arguments
>> + (substitute-keyword-arguments (package-arguments p)
>> + ((#:tests? _ #f) #f)))
>
> Perhaps I’m reading this wrong, but it looks like tuned packages don’t run
> their testsuites? If so, this is a surprising side-effect and thus it would
> be nice to have it mentioned in the manual, possibly also in a comment
> here. It would be nice to also mention the rationale for disabling the
> tests (not sure whether only in a comment here or if in the manual as
> well). I assume it’s for convenience, but I’m not sure.

I agree, a comment and maybe a sentence in the manual would be welcome.

The reason the test suite is skipped is because we cannot know for sure
whether the machine that hosts the daemon is able to run code for this
specific micro-architecture.

The test suite runs in the “baseline” package build anyway, so assuming
the compiler works fine, skipping the test suite on tuned builds is
okay.

Thanks for your feedback!

Ludo’.
Mathieu Othacehe wrote on 7 Dec 2021 09:39
Re: bug#52283: [PATCH 00/10] Tuning packages for CPU micro-architectures
(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)(address . 52283@debbugs.gnu.org)
87lf0wst9i.fsf_-_@gnu.org
Hey,

> Hmm yeah, but it should be (guix platforms) then…
>
> Maybe that’s a broader refactoring we can keep for later? I agree it
> would be logical but I’m not sure how to nicely factorize things.

Yes sure, I agree that this refactoring can be done later; it is just
something that we can keep in mind. Having a look at Nix, it looks like
they are also maintaining some kind of architecture list:

https://github.com/NixOS/nixpkgs/blob/master/lib/systems/architectures.nix

Thanks,

Mathieu
Ludovic Courtès wrote on 7 Dec 2021 10:02
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 52283@debbugs.gnu.org)
87sfv4kcsp.fsf@inria.fr
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> Yes sure, I agree that this refactoring can be done later; it is just
> something that we can keep in mind. Having a look at Nix, it looks like
> they are also maintaining some kind of architecture list:
>
> https://github.com/NixOS/nixpkgs/blob/master/lib/systems/architectures.nix

Interesting. The list of features might be an idealized view, compared
to what I’ve seen in GCC.

I wonder how this is supposed to be used. Their compiler wrapper
(build-support/cc-wrapper/default.nix) passes
‘-march=${targetPlatform.gcc.arch}’ so maybe users can somehow override
that ‘gcc.arch’ attribute? Any Nix-savvy person here?

It also has a nice compatibility list:

# older compilers (for example bootstrap's GCC 5) fail with -march=too-modern-cpu
isGccArchSupported = arch:
  if isGNU then
    { # Intel
      skylake        = versionAtLeast ccVersion "6.0";
      skylake-avx512 = versionAtLeast ccVersion "6.0";
      cannonlake     = versionAtLeast ccVersion "8.0";
      icelake-client = versionAtLeast ccVersion "8.0";
      icelake-server = versionAtLeast ccVersion "8.0";
      cascadelake    = versionAtLeast ccVersion "9.0";
      cooperlake     = versionAtLeast ccVersion "10.0";
      tigerlake      = versionAtLeast ccVersion "10.0";
      knm            = versionAtLeast ccVersion "8.0";
      # AMD
      znver1         = versionAtLeast ccVersion "6.0";
      znver2         = versionAtLeast ccVersion "9.0";
      znver3         = versionAtLeast ccVersion "11.0";
    }.${arch} or true
  else if isClang then
    { # Intel
      cannonlake     = versionAtLeast ccVersion "5.0";
      icelake-client = versionAtLeast ccVersion "7.0";
      icelake-server = versionAtLeast ccVersion "7.0";
      knm            = versionAtLeast ccVersion "7.0";
      # AMD
      znver1         = versionAtLeast ccVersion "4.0";
      znver2         = versionAtLeast ccVersion "9.0";
    }.${arch} or true
  else
    false;

The compiler wrapper in this patch series doesn’t know what compiler
it’s wrapping (it’s just calling the next one in $PATH), so it can’t
really do this sort of thing.

We could do it differently but I liked the simplicity of just dropping
the wrapper in front of $PATH.
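
For readers unfamiliar with the technique, here is a minimal sketch of a wrapper that calls the next compiler in $PATH. It is not the code from this series: 'next-in-path' and 'tuned-arguments' are hypothetical helpers, and appending the flag at the end of the command line is a choice made for this sketch only.

```scheme
(use-modules (srfi srfi-1))

(define (next-in-path program self-directory path)
  "Return the file name of the first executable called PROGRAM found in
PATH, a colon-separated list of directories, skipping SELF-DIRECTORY (the
directory that contains the wrapper itself).  Return #f if none is found."
  (any (lambda (directory)
         (let ((file (string-append directory "/" program)))
           (and (not (string=? directory self-directory))
                (access? file X_OK)
                file)))
       (string-split path #\:)))

(define (tuned-arguments micro-architecture arguments)
  "Return ARGUMENTS, a list of strings, with '-march=MICRO-ARCHITECTURE'
appended."
  (append arguments
          (list (string-append "-march=" micro-architecture))))

;; Hypothetical use from a wrapper installed as /path/to/wrapper/bin/gcc:
;;
;;   (let ((next (next-in-path "gcc" "/path/to/wrapper/bin"
;;                             (getenv "PATH"))))
;;     (apply execl next next
;;            (tuned-arguments "skylake" (cdr (command-line)))))
```

Because the wrapper only looks for a program with the same name further down $PATH, it has no information about which compiler or version it ends up running.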

Thanks,
Ludo’.
Ludovic Courtès wrote on 7 Dec 2021 10:13
(address . 52283@debbugs.gnu.org)
87lf0wkcam.fsf@gnu.org
Hi!

To make it easier to test, I pushed this v1 as ‘wip-cpu-tuning’ so one
can run, say:

guix time-machine --branch=wip-cpu-tuning -- \
  shell eigen-benchmarks --tune -- \
  benchBlasGemm 16 16 16 100 100

Ludo’.
zimoun wrote on 7 Dec 2021 11:32
Re: [bug#52283] [PATCH 02/10] transformations: Add '--tune'.
(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
CAJ3okZ1ZH0zQV=cp3dRRL3AfLTmMccJHGL_sHyN4nEi6=pjRUA@mail.gmail.com
Hi,

On Tue, 7 Dec 2021 at 09:06, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> The reason the test suite is skipped is because we cannot know for sure
> whether the machine that hosts the daemon is able to run code for this
> specific micro-architecture.

Naive question: is it possible to effectively run it via emulation?

> The test suite runs in the “baseline” package build anyway, so assuming
> the compiler works fine, skipping the test suite on tuned builds is
> okay.

Is the test suite actually run somewhere? And "baseline"
package build means the package built for generic architecture, right?


Cheers,
simon

PS:
My questions come with Julia packages in mind, where the test
suite is the only way to know all is fine. Adding a System Image for
Julia has been discussed many times, and basically this System Image is
precompilation (generic or specialized for a micro-architecture).
Therefore, maybe this new 'tune' transformation would fit the bill.
:-)

https://docs.julialang.org/en/v1/devdocs/sysimg/

Ludovic Courtès wrote on 7 Dec 2021 15:52
(name . zimoun)(address . zimon.toutoune@gmail.com)
875ys0jwlf.fsf@inria.fr
zimoun <zimon.toutoune@gmail.com> skribis:

> On Tue, 7 Dec 2021 at 09:06, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
>
>> The reason the test suite is skipped is because we cannot know for sure
>> whether the machine that hosts the daemon is able to run code for this
>> specific micro-architecture.
>
> Naive question: is it possible to effectively run it via emulation?

Not to my knowledge.

>> The test suite runs in the “baseline” package build anyway, so assuming
>> the compiler works fine, skipping the test suite on tuned builds is
>> okay.
>
> Is the test suite actually run somewhere?

Yes, for the default/generic/baseline package, when not using ‘--tune’.

> And "baseline" package build means the package built for generic
> architecture, right?

Correct.

> My questions come with Julia packages in mind, where the test
> suite is the only way to know all is fine. Adding a System Image for
> Julia has been discussed many times, and basically this System Image is
> precompilation (generic or specialized for a micro-architecture).
> Therefore, maybe this new 'tune' transformation would fit the bill.
> :-)
>
> https://docs.julialang.org/en/v1/devdocs/sysimg/

According to this page, ‘--tune’ won’t be necessary here because Julia
supports function multi-versioning for its “system image”:

The system image can be compiled simultaneously for multiple CPU
microarchitectures under the same instruction set architecture (ISA).
Multiple versions of the same function may be created with minimum
dispatch point inserted into shared functions in order to take
advantage of different ISA extensions or other microarchitecture
features. The version that offers the best performance will be
selected automatically at runtime based on available CPU features.

I guess we should follow the instructions at
to build a system image that contains multiple versions of each
function.

Thanks,
Ludo’.
zimoun wrote on 7 Dec 2021 16:52
(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
CAJ3okZ3KFgwvSQ-=vg=RSRLKKLJYgggB+33uoc4uL+MdPBFiMA@mail.gmail.com
Hi,

On Tue, 7 Dec 2021 at 15:52, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:

> >> The test suite runs in the “baseline” package build anyway, so assuming
> >> the compiler works fine, skipping the test suite on tuned builds is
> >> okay.
> >
> > Is the test suite actually run somewhere?
>
> Yes, for the default/generic/baseline package, when not using ‘--tune’.

Assuming the default/generic/baseline package is actually built. :-)

I imagine the scenario: I develop a new simulation tool, I package it
for Guix, I share it; usually I run "guix shell -D" and loop over
"make" and "make check", then deploy using "guix build --tune". My
colleague fetches it and wants to run it on another cluster, i.e., they
run "guix build --tune". The test suite for the generic/baseline is
never run inside a clean environment. And as we know, this isolation
allows us to detect many common issues, which are often the source of
"it works for me, why does it not work for you?". ;-)


> > My questions come with Julia packages in mind, where the test
> > suite is the only way to know all is fine. Adding a System Image for
> > Julia has been discussed many times, and basically this System Image is
> > precompilation (generic or specialized for a micro-architecture).
> > Therefore, maybe this new 'tune' transformation would fit the bill.
> > :-)
> >
> > https://docs.julialang.org/en/v1/devdocs/sysimg/
>
> According to this page, ‘--tune’ won’t be necessary here because Julia
> supports function multi-versioning for its “system image”:

Yes, but from my understanding, the "baseline" cannot provide an image
for all the micro-architectures, but only 'generic'. Moreover, as you
described elsewhere, we cannot know for sure whether the machine that
hosts the daemon is able to run code for this specific
micro-architecture. Anyway. That's off topic. ;-) Thanks for
explaining, and let's discuss this Julia machinery elsewhere. :-)


Cheers,
simon
Ludovic Courtès wrote on 9 Dec 2021 10:19
(name . zimoun)(address . zimon.toutoune@gmail.com)
8735n2gmph.fsf@inria.fr
zimoun <zimon.toutoune@gmail.com> skribis:

> On Tue, 7 Dec 2021 at 15:52, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
>> zimoun <zimon.toutoune@gmail.com> skribis:

[...]

>> > Is the test suite actually run somewhere?
>>
>> Yes, for the default/generic/baseline package, when not using ‘--tune’.
>
> Assuming the default/generic/baseline package is actually built. :-)
>
> I imagine the scenario: I develop a new simulation tool, I package it
> for Guix, I share it; usually I run "guix shell -D" and loop over
> "make" and "make check", then deploy using "guix build --tune". My
> colleague fetches it and wants to run it on another cluster, i.e., they
> run "guix build --tune". The test suite for the generic/baseline is
> never run inside a clean environment. And as we know, this isolation
> allows us to detect many common issues, which are often the source of
> "it works for me, why does it not work for you?". ;-)

Sure, we can always come up with such scenarios.

>> > My questions come with Julia packages in mind, where the test
>> > suite is the only way to know all is fine. Adding a System Image for
>> > Julia has been discussed many times, and basically this System Image is
>> > precompilation (generic or specialized for a micro-architecture).
>> > Therefore, maybe this new 'tune' transformation would fit the bill.
>> > :-)
>> >
>> > https://docs.julialang.org/en/v1/devdocs/sysimg/
>>
>> According to this page, ‘--tune’ won’t be necessary here because Julia
>> supports function multi-versioning for its “system image”:
>
> Yes, but from my understanding, the "baseline" cannot provide an image
> for all the micro-architectures, but only 'generic'. Moreover, as you
> described elsewhere, we cannot know for sure whether the machine that
> hosts the daemon is able to run code for this specific
> micro-architecture.

With multi-versioning, the system image (AIUI) provides several versions
of the relevant code, one for each useful micro-architecture. Such a
system image can be used anywhere because the right version of the code
will be picked up at run-time depending on the host CPU.

It’s The Right Thing, so no worries here! We can take advantage of that
feature in our Julia package.
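
To make the idea concrete, here is a minimal sketch, written in Guile rather than Julia, of what run-time selection amounts to. The variant table, the flag names attached to each variant, and the 'select-variant' procedure are all hypothetical; this is not how Julia implements it, only an illustration of picking the most specific version that the host CPU can run.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical table of code variants, most specific first.  Each entry
;; is a variant name followed by the CPU flags it requires; the last
;; entry requires none, so it always matches.
(define %variants
  '(("haswell"     "avx2" "fma")
    ("sandybridge" "avx")
    ("generic")))

(define (select-variant host-flags)
  "Return the name of the first variant whose required flags are all in
HOST-FLAGS, a list of strings."
  (any (lambda (variant)
         (and (every (lambda (flag) (member flag host-flags))
                     (cdr variant))
              (car variant)))
       %variants))

;; A host with AVX but without AVX2 gets the "sandybridge" variant.
(display (select-variant '("sse2" "avx")))
(newline)
```

The single image carries every variant, so the choice is made on the machine that runs the code and not on the machine that built it.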

Thanks,
Ludo’.
zimoun wrote on 9 Dec 2021 11:35
(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
86pmq6oykq.fsf@gmail.com
Hi,

On Thu, 09 Dec 2021 at 10:19, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:

>> I imagine the scenario: I develop a new simulation tool, I package it
>> for Guix, I share it; usually I run "guix shell -D" and do loop over
>> "make" and "make check", then deploy using "guix build --tune". My
>> colleague fetches it and want to run it on another cluster, i.e., they
>> run "guix build --tune". The test suite for the generic/baseline is
>> never run inside a clean environment. And as we know, this isolated
>> part allows to detect many common issues; which are often source of
>> "it works for me, why does it not work for you?". ;-)
>
> Sure, we can always come up with such scenarios.

Turning off the tests is the general approach, to cover various use cases.

Does it make sense to turn them off conditionally? Say, the default for
’tune’ is #f, but it is #t when the requested host micro-architecture is
the same as the daemon's. Well, maybe it is overcomplicated for a few
corner cases. :-)


>>> According to this page, ‘--tune’ won’t be necessary here because Julia
>>> supports function multi-versioning for its “system image”:
>>
>> Yes, but from my understanding, the "baseline" cannot provide an image
>> for all the micro-architectures, but only 'generic'. Moreover, as you
>> described elsewhere, we cannot know for sure whether the machine that
>> hosts the daemon is able to run code for this specific
>> micro-architecture.
>
> With multi-versioning, the system image (AIUI) provides several versions
> of the relevant code, one for each useful micro-architecture. Such a
> system image can be used anywhere because the right version of the code
> will be picked up at run-time depending on the host CPU.

Thanks for explaining. Indeed, the “baseline” could provide an image
for all the micro-architectures, if that is not already the case*. The
blog post [1] refers to an LWN article [2], which discusses the impact on
the resulting image size; it should be minimal. A benchmark is required
for Julia. :-)



Cheers,
simon

*not already the case: «As an example, at the time of this writing, the
following string is used in the creation of the official x86_64 Julia
binaries downloadable from julialang.org:»

generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)


And I do not know exactly whether the current situation for the
precompiled .ji is optimal; that is another story. Indeed, this tune
transformation is not useful for Julia. :-) Thanks for the patient
explanations.
Ludovic Courtès wrote on 10 Dec 2021 09:49
(name . zimoun)(address . zimon.toutoune@gmail.com)
87ee6kdeui.fsf@inria.fr
Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

> On Thu, 09 Dec 2021 at 10:19, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
>> zimoun <zimon.toutoune@gmail.com> skribis:
>
>>> I imagine the scenario: I develop a new simulation tool, I package it
>>> for Guix, I share it; usually I run "guix shell -D" and do loop over
>>> "make" and "make check", then deploy using "guix build --tune". My
>>> colleague fetches it and want to run it on another cluster, i.e., they
>>> run "guix build --tune". The test suite for the generic/baseline is
>>> never run inside a clean environment. And as we know, this isolated
>>> part allows to detect many common issues; which are often source of
>>> "it works for me, why does it not work for you?". ;-)
>>
>> Sure, we can always come up with such scenarios.
>
> Turning off the test is the general case to cover various use case.
>
> Does it make sense to conditionally turn off? Say, the default for
> ’tune’ is #f, but it is #t when the requested host micro-architecture is
> the same than the daemon one. Well, maybe it is overcomplicated for few
> corner cases. :-)

Yeah, there’s currently no way to know whether the build machine would
be able to run that code. Knowing what machine the daemon runs on is
not enough because there could be offloading.

Ludo’.
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 00/12] Tuning packages for CPU micro-architectures
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-1-ludo@gnu.org
Hello!

Here is v2 of the patch set implementing the ‘--tune’ package
transformation option. Changes since v1:

• Compiler packages (gcc, clang, gcc-toolchain, clang-toolchain)
now declare in a package property the supported CPU names;
‘--tune’ verifies, when the package is lowered to a bag, whether
the target CPU is supported by the compiler and errors out
if not.

In theory, ‘--tune’ (with no argument) could detect a CPU
that the compiler does not support, though that’s unlikely
since (guix cpu) currently corresponds to what GCC 10 supports.
I considered doing something fancy that would somehow fall
back to a less accurate but supported CPU name, but gave up
out of laziness and fear of complexity.

• Guix now prints which package is being tuned, like so:

$ ./pre-inst-env guix shell --tune inspekt3d -- Studio
guix shell: tuning libfive@0-4.8ca1b86 for CPU skylake

• Documentation reworded as suggested by Josselin. It also
clarifies that a compiler wrapper is used and that tests are
skipped.

• Inputs of the new packages were simplified. \o/
Something left as future work is AMD processor identification
in (guix cpu).
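
The compiler check described in the first item can be sketched roughly as follows. This is a minimal sketch and not the code from the patch: the property list mimics the shape of the 'compiler-cpu-architectures' property added to gcc and clang in this series, while '%compiler-properties' and 'supported-micro-architecture?' are hypothetical names used only for illustration.

```scheme
;; Hypothetical property list, shaped like the 'properties' field that
;; the compiler packages declare in this series (shortened here).
(define %compiler-properties
  '((compiler-cpu-architectures
     ("x86_64" "core2" "nehalem" "haswell" "skylake"))))

(define (supported-micro-architecture? properties system-cpu name)
  "Return #t if NAME is among the micro-architectures that PROPERTIES
lists for SYSTEM-CPU, and #f otherwise."
  (let* ((table (or (assq-ref properties 'compiler-cpu-architectures)
                    '()))
         (names (or (assoc-ref table system-cpu) '())))
    (and (member name names) #t)))

;; A compiler that does not list the requested CPU leads to an error in
;; the real code; here we merely report the outcome.
(display (supported-micro-architecture? %compiler-properties
                                        "x86_64" "skylake"))
(newline)
```

A compiler package without the property, or without an entry for the system's architecture, is treated here as supporting nothing, which is one possible reading of "errors out if not".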

Those interested in compiler optimizations can use it to compare
the job done by different compilers:

guix shell --with-c-toolchain=xtensor-benchmark=clang-toolchain \
--tune xtensor-benchmark -- benchmark_xtensor

Fun fact:

guix shell --tune eigen-benchmarks -- benchBlasGemm 240 240 240

now gives me 45 Gflops/s on my CORE i7 (skylake), when pre-merge it
would give 36 Gflops/s. Same result with:

--with-c-toolchain=eigen-benchmarks=gcc-toolchain@7

Go figure!

I re-pushed the ‘wip-cpu-tuning’ branch so people can give it a try:

guix time-machine --branch=wip-cpu-tuning -- \
shell eigen-benchmarks --tune -- \
benchBlasGemm 240 240 240

Thoughts?

Ludo’.

Ludovic Courtès (12):
Add (guix cpu).
gnu: gcc: Add 'compiler-cpu-architectures' property.
gnu: clang: Add 'compiler-cpu-architectures' property.
transformations: Add '--tune'.
ci: Add extra jobs for tunable packages.
gnu: Add eigen-benchmarks.
gnu: Add xsimd-benchmark.
gnu: Add xtensor-benchmark.
gnu: ceres-solver: Mark as tunable.
gnu: Add ceres-solver-benchmarks.
gnu: libfive: Mark as tunable.
gnu: prusa-slicer: Mark as tunable.

Makefile.am | 1 +
doc/guix.texi | 61 ++++++++++
gnu/ci.scm | 43 +++++--
gnu/packages/algebra.scm | 77 +++++++++++++
gnu/packages/commencement.scm | 1 +
gnu/packages/cpp.scm | 23 ++++
gnu/packages/engineering.scm | 10 +-
gnu/packages/gcc.scm | 31 +++++-
gnu/packages/llvm.scm | 71 +++++++++++-
gnu/packages/maths.scm | 48 +++++++-
guix/cpu.scm | 143 ++++++++++++++++++++++++
guix/transformations.scm | 204 ++++++++++++++++++++++++++++++++++
tests/transformations.scm | 35 ++++++
13 files changed, 733 insertions(+), 15 deletions(-)
create mode 100644 guix/cpu.scm


base-commit: e642378df3b0d218e463397883e7bf331f528c6a
--
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 01/12] Add (guix cpu).
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-2-ludo@gnu.org
* guix/cpu.scm: New file.
* Makefile.am (MODULES): Add it.
---
Makefile.am | 1 +
guix/cpu.scm | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 144 insertions(+)
create mode 100644 guix/cpu.scm

diff --git a/Makefile.am b/Makefile.am
index c4ccee65f1..dba9f4da82 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -81,6 +81,7 @@ MODULES =					\
   guix/base64.scm				\
   guix/ci.scm					\
   guix/cpio.scm					\
+  guix/cpu.scm					\
   guix/deprecation.scm				\
   guix/docker.scm	   			\
   guix/records.scm				\
diff --git a/guix/cpu.scm b/guix/cpu.scm
new file mode 100644
index 0000000000..e1911f52a8
--- /dev/null
+++ b/guix/cpu.scm
@@ -0,0 +1,143 @@
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2021 Ludovic Courtès <ludo@gnu.org>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (guix cpu)
+  #:use-module (guix sets)
+  #:use-module (guix memoization)
+  #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-9)
+  #:use-module (ice-9 match)
+  #:use-module (ice-9 rdelim)
+  #:export (current-cpu
+            cpu?
+            cpu-architecture
+            cpu-family
+            cpu-model
+            cpu-flags
+
+            cpu->gcc-architecture))
+
+;;; Commentary:
+;;;
+;;; This module provides tools to determine the micro-architecture supported
+;;; by the CPU and to map it to a name known to GCC's '-march'.
+;;;
+;;; Code:
+
+;; CPU description.
+(define-record-type <cpu>
+  (cpu architecture family model flags)
+  cpu?
+  (architecture cpu-architecture)                 ;string, from 'uname'
+  (family       cpu-family)                       ;integer
+  (model        cpu-model)                        ;integer
+  (flags        cpu-flags))                       ;set of strings
+
+(define current-cpu
+  (mlambda ()
+    "Return a <cpu> record representing the host CPU."
+    (define (prefix? prefix)
+      (lambda (str)
+        (string-prefix? prefix str)))
+
+    (call-with-input-file "/proc/cpuinfo"
+      (lambda (port)
+        (let loop ((family #f)
+                   (model #f))
+          (match (read-line port)
+            ((? eof-object?)
+             #f)
+            ((? (prefix? "cpu family") str)
+             (match (string-tokenize str)
+               (("cpu" "family" ":" family)
+                (loop (string->number family) model))))
+            ((? (prefix? "model") str)
+             (match (string-tokenize str)
+               (("model" ":" model)
+                (loop family (string->number model)))
+               (_
+                (loop family model))))
+            ((? (prefix? "flags") str)
+             (match (string-tokenize str)
+               (("flags" ":" flags ...)
+                (cpu (utsname:machine (uname))
+                     family model (list->set flags)))))
+            (_
+             (loop family model))))))))
+
+(define (cpu->gcc-architecture cpu)
+  "Return the architecture name, suitable for GCC's '-march' flag, that
+corresponds to CPU, a record as returned by 'current-cpu'."
+  (match (cpu-architecture cpu)
+    ("x86_64"
+     ;; Transcribed from GCC's 'host_detect_local_cpu' in driver-i386.c.
+     (or (and (= 6 (cpu-family cpu))              ;the "Pentium Pro" family
+              (letrec-syntax ((model (syntax-rules (=>)
+                                       ((_) #f)
+                                       ((_ (candidate => integers ...) rest
+                                           ...)
+                                        (or (and (= (cpu-model cpu) integers)
+                                                 candidate)
+                                            ...
+                                            (model rest ...))))))
+                (model ("bonnell" => #x1c #x26)
+                       ("silvermont" => #x37 #x4a #x4d #x5a #x5d)
+                       ("core2" => #x0f #x17 #x1d)
+                       ("nehalem" => #x1a #x1e #x1f #x2e)
+                       ("westmere" => #x25 #x2c #x2f)
+                       ("sandybridge" => #x2a #x2d)
+                       ("ivybridge" => #x3a #x3e)
+                       ("haswell" => #x3c #x3f #x45 #x46)
+                       ("broadwell" => #x3d #x47 #x4f #x56)
+                       ("skylake" => #x4e #x5e #x8e #x9e)
+                       ("skylake-avx512" => #x55) ;TODO: cascadelake
+                       ("knl" => #x57)
+                       ("cannonlake" => #x66)
+                       ("knm" => #x85))))
+
+         ;; Fallback case for non-Intel processors or for Intel processors not
+         ;; recognized above.
+         (letrec-syntax ((if-flags (syntax-rules (=>)
+                                     ((_)
+                                      #f)
+                                     ((_ (flags ... => name) rest ...)
+                                      (if (every (lambda (flag)
+                                                   (set-contains? (cpu-flags cpu)
+                                                                  flag))
+                                                 '(flags ...))
+                                          name
+                                          (if-flags rest ...))))))
+           (if-flags ("avx512" => "knl")
+                     ("adx" => "broadwell")
+                     ("avx2" => "haswell")
+                     ;; TODO: tigerlake, cooperlake, etc.
+                     ("avx" => "sandybridge")
+                     ("sse4_2" "gfni" => "tremont")
+                     ("sse4_2" "sgx" => "goldmont-plus")
+                     ("sse4_2" "xsave" => "goldmont")
+                     ("sse4_2" "movbe" => "silvermont")
+                     ("sse4_2" => "nehalem")
+                     ("ssse3" "movbe" => "bonnell")
+                     ("ssse3" => "core2")))
+
+         ;; TODO: Recognize AMD models (bdver*, znver*, etc.)?
+
+         "x86_64"))
+    (architecture
+     ;; TODO: AArch64.
+     architecture)))
-- 
2.33.0
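
For readers who want to try the detection logic outside of Guix, here is a minimal, self-contained sketch of the same two steps: parsing the 'cpu family', 'model' and 'flags' lines, then mapping the model number to a name. It reads from a string instead of /proc/cpuinfo so that it runs anywhere. The sample text, '%models' (a two-entry excerpt of the table in the patch), 'parse-cpuinfo' and 'model->name' are assumptions made for this example and are not part of (guix cpu).

```scheme
(use-modules (srfi srfi-1)
             (ice-9 match)
             (ice-9 rdelim))

;; Hypothetical excerpt of /proc/cpuinfo.
(define %sample-cpuinfo
  "cpu family\t: 6
model\t\t: 94
model name\t: Example CPU
flags\t\t: fpu sse2 avx avx2
")

(define (parse-cpuinfo port)
  "Read PORT and return a list (FAMILY MODEL FLAGS), or #f if no 'flags'
line is found."
  (let loop ((family #f) (model #f))
    (let ((line (read-line port)))
      (if (eof-object? line)
          #f
          (match (string-tokenize line)
            (("cpu" "family" ":" n)  (loop (string->number n) model))
            (("model" ":" n)         (loop family (string->number n)))
            (("flags" ":" flags ...) (list family model flags))
            ;; Anything else, including "model name", is skipped.
            (_                       (loop family model)))))))

;; Two entries taken from the model table in the patch.
(define %models
  '(("haswell" #x3c #x3f #x45 #x46)
    ("skylake" #x4e #x5e #x8e #x9e)))

(define (model->name model)
  "Return the name associated with MODEL in %MODELS, or #f."
  (any (lambda (entry)
         (and (memv model (cdr entry)) (car entry)))
       %models))

(match (call-with-input-string %sample-cpuinfo parse-cpuinfo)
  ((family model flags)
   (display (model->name model))          ;model 94 is #x5e
   (newline)))
```

Matching on the tokenized line is what lets the "model name" line fall through to the catch-all clause, which is also why the patch has an explicit fallback in its "model" branch.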
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 02/12] gnu: gcc: Add 'compiler-cpu-architectures' property.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-3-ludo@gnu.org
* gnu/packages/gcc.scm (%gcc-7.5-x86_64-micro-architectures)
(%gcc-10-x86_64-micro-architectures): New variables.
(gcc-7, gcc-10): Add 'properties' field.
* gnu/packages/commencement.scm (make-gcc-toolchain): Likewise.
---
gnu/packages/commencement.scm | 1 +
gnu/packages/gcc.scm | 31 +++++++++++++++++++++++++++++--
2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index e570a95b04..8c81098bc0 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -3768,6 +3768,7 @@ (define* (make-gcc-toolchain gcc
        (append (package-search-paths gcc)
                (package-search-paths libc)))
 
+      (properties (package-properties gcc))  ;for 'compiler-cpu-architectures'
       (license (package-license gcc))
       (synopsis "Complete GCC tool chain for C/C++ development")
       (description
diff --git a/gnu/packages/gcc.scm b/gnu/packages/gcc.scm
index f526680f56..efa0baeaa1 100644
--- a/gnu/packages/gcc.scm
+++ b/gnu/packages/gcc.scm
@@ -525,6 +525,27 @@ (define-public gcc-6
 
        ,@(package-inputs gcc-4.7)))))
 
+(define %gcc-7.5-x86_64-micro-architectures
+  ;; Suitable '-march' values for GCC 7.5 (info "(gcc) x86 Options").
+  '("core2" "nehalem" "westmere" "sandybridge" "ivybridge"
+    "haswell" "broadwell" "skylake" "bonnell" "silvermont"
+    "knl" "skylake-avx512"
+
+    "k8" "k8-sse3" "barcelona"
+    "bdver1" "bdver2" "bdver3" "bdver4"
+    "znver1"
+    "btver1" "btver2" "geode"))
+
+(define %gcc-10-x86_64-micro-architectures
+  ;; Suitable '-march' values for GCC 10.
+  (append %gcc-7.5-x86_64-micro-architectures
+      '("goldmont" "goldmont-plus" "tremont"
+        "knm" "cannonlake" "icelake-client" "icelake-server"
+        "cascadelake" "cooperlake" "tigerlake"
+
+        "znver2" "znver3")))
+
+
 (define-public gcc-7
   (package
     (inherit gcc-6)
@@ -542,7 +563,10 @@ (define-public gcc-7
     (description
      "GCC is the GNU Compiler Collection.  It provides compiler front-ends
 for several languages, including C, C++, Objective-C, Fortran, Ada, and Go.
-It also includes runtime support libraries for these languages.")))
+It also includes runtime support libraries for these languages.")
+    (properties
+     `((compiler-cpu-architectures
+        ("x86_64" ,@%gcc-7.5-x86_64-micro-architectures))))))
 
 (define-public gcc-8
   (package
@@ -592,7 +616,10 @@ (define-public gcc-10
             (patches (search-patches "gcc-9-strmov-store-file-names.patch"
                                      "gcc-5.0-libvtv-runpath.patch"))
             (modules '((guix build utils)))
-            (snippet gcc-canadian-cross-objdump-snippet)))))
+            (snippet gcc-canadian-cross-objdump-snippet)))
+   (properties
+    `((compiler-cpu-architectures
+       ("x86_64" ,@%gcc-10-x86_64-micro-architectures))))))
 
 (define-public gcc-11
   (package
-- 
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 03/12] gnu: clang: Add 'compiler-cpu-architectures' property.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-4-ludo@gnu.org
* gnu/packages/llvm.scm (clang-from-llvm): Add #:properties and honor it.
(clang-properties): New procedure.
(make-clang-toolchain): Set 'properties' field.
---
gnu/packages/llvm.scm | 71 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/llvm.scm b/gnu/packages/llvm.scm
index 051bbfeab5..d87ab570ff 100644
--- a/gnu/packages/llvm.scm
+++ b/gnu/packages/llvm.scm
@@ -155,7 +155,9 @@ (define* (clang-runtime-from-llvm llvm hash
     (supported-systems (delete "mips64el-linux" %supported-systems))))
 
 (define* (clang-from-llvm llvm clang-runtime hash
-                          #:key (patches '()) tools-extra)
+                          #:key (patches '()) tools-extra
+                          (properties
+                           (clang-properties (package-version llvm))))
   "Produce Clang with dependencies on LLVM and CLANG-RUNTIME, and applying the
 given PATCHES.  When TOOLS-EXTRA is given, it must point to the
 'clang-tools-extra' tarball, which contains code for 'clang-tidy', 'pp-trace',
@@ -426,10 +428,76 @@ (define (move program)
 Objective-C++ programming languages.  It uses LLVM as its back end.  The Clang
 project includes the Clang front end, the Clang static analyzer, and several
 code analysis tools.")
+    (properties properties)
     (license (if (version>=? version "9.0")
                  license:asl2.0         ;with LLVM exceptions
                  license:ncsa))))
 
+(define (clang-properties version)
+  "Return package properties for Clang VERSION."
+  `((compiler-cpu-architectures
+     ("x86_64"
+      ;; This list was obtained by running:
+      ;;
+      ;;   guix shell clang -- llc -march=x86-64 -mattr=help
+      ;;
+      ;; filtered from uninteresting entries such as "i686" and "pentium".
+      ,@(if (version>=? version "10.0")           ;TODO: refine
+            '("atom"
+              "barcelona"
+              "bdver1"
+              "bdver2"
+              "bdver3"
+              "bdver4"
+              "bonnell"
+              "broadwell"
+              "btver1"
+              "btver2"
+              "c3"
+              "c3-2"
+              "cannonlake"
+              "cascadelake"
+              "cooperlake"
+              "core-avx-i"
+              "core-avx2"
+              "core2"
+              "corei7"
+              "corei7-avx"
+              "generic"
+              "geode"
+              "goldmont"
+              "goldmont-plus"
+              "haswell"
+              "icelake-client"
+              "icelake-server"
+              "ivybridge"
+              "k8"
+              "k8-sse3"
+              "knl"
+              "knm"
+              "lakemont"
+              "nehalem"
+              "nocona"
+              "opteron"
+              "opteron-sse3"
+              "sandybridge"
+              "silvermont"
+              "skx"
+              "skylake"
+              "skylake-avx512"
+              "slm"
+              "tigerlake"
+              "tremont"
+              "westmere"
+              "x86-64"
+              "x86-64-v2"
+              "x86-64-v3"
+              "x86-64-v4"
+              "znver1"
+              "znver2"
+              "znver3")
+            '())))))
+
 (define (make-clang-toolchain clang)
   (package
     (name (string-append (package-name clang) "-toolchain"))
@@ -471,6 +539,7 @@ (define (make-clang-toolchain clang)
     (search-paths (package-search-paths clang))
 
     (license (package-license clang))
+    (properties (package-properties clang))  ;for 'compiler-cpu-architectures'
     (home-page "https://clang.llvm.org")
     (synopsis "Complete Clang toolchain for C/C++ development")
     (description "This package provides a complete Clang toolchain for C/C++
-- 
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 05/12] ci: Add extra jobs for tunable packages.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-6-ludo@gnu.org
This allows us to provide substitutes for tuned package variants.

* gnu/ci.scm (package-job): Add #:suffix and honor it.
(package->job): Add #:suffix and honor it.
(%x86-64-micro-architectures): New variable.
(tuned-package-jobs): New procedure.
(cuirass-jobs): Add jobs for tunable packages.
---
gnu/ci.scm | 43 ++++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/gnu/ci.scm b/gnu/ci.scm
index 6039af8f07..35fd583f75 100644
--- a/gnu/ci.scm
+++ b/gnu/ci.scm
@@ -28,6 +28,7 @@ (define-module (gnu ci)
   #:use-module (guix grafts)
   #:use-module (guix profiles)
   #:use-module (guix packages)
+  #:autoload   (guix transformations) (tunable-package? tuned-package)
   #:use-module (guix channels)
   #:use-module (guix config)
   #:use-module (guix derivations)
@@ -107,9 +108,9 @@ (define* (derivation->job name drv
     (#:timeout . ,timeout)))
 
 (define* (package-job store job-name package system
-                      #:key cross? target)
+                      #:key cross? target (suffix ""))
   "Return a job called JOB-NAME that builds PACKAGE on SYSTEM."
-  (let ((job-name (string-append job-name "." system)))
+  (let ((job-name (string-append job-name "." system suffix)))
     (parameterize ((%graft? #f))
       (let* ((drv (if cross?
                       (package-cross-derivation store package target system
@@ -395,21 +396,39 @@ (define package->job
                            (((_ inputs _ ...) ...)
                             inputs))))
                       (%final-inputs)))))
-    (lambda (store package system)
+    (lambda* (store package system #:key (suffix ""))
       "Return a job for PACKAGE on SYSTEM, or #f if this combination is not
-valid."
+valid.  Append SUFFIX to the job name."
       (cond ((member package base-packages)
              (package-job store (string-append "base." (job-name package))
-                          package system))
+                          package system #:suffix suffix))
             ((supported-package? package system)
              (let ((drv (package-derivation store package system
                                             #:graft? #f)))
                (and (substitutable-derivation? drv)
                     (package-job store (job-name package)
-                                 package system))))
+                                 package system #:suffix suffix))))
             (else
              #f)))))
 
+(define %x86-64-micro-architectures
+  ;; Micro-architectures for which we build tuned variants.
+  '("westmere" "ivybridge" "haswell" "skylake" "skylake-avx512"))
+
+(define (tuned-package-jobs store package system)
+  "Return a list of jobs for PACKAGE tuned for SYSTEM's micro-architectures."
+  (filter-map (lambda (micro-architecture)
+                (define suffix
+                  (string-append "." micro-architecture))
+
+                (package->job store
+                              (tuned-package package micro-architecture)
+                              system
+                              #:suffix suffix))
+              (match system
+                ("x86_64-linux" %x86-64-micro-architectures)
+                (_ '()))))
+
 (define (all-packages)
   "Return the list of packages to build."
   (define (adjust package result)
@@ -527,10 +546,16 @@ (define source
          ('all
           ;; Build everything, including replacements.
           (let ((all (all-packages))
-                (job (lambda (package)
-                       (package->job store package system))))
+                (jobs (lambda (package)
+                        (match (package->job store package system)
+                          (#f '())
+                          (main-job
+                           (cons main-job
+                                 (if (tunable-package? package)
+                                     (tuned-package-jobs store package system)
+                                     '())))))))
             (append
-             (filter-map job all)
+             (append-map jobs all)
              (cross-jobs store system))))
          ('core
           ;; Build core packages only.
-- 
2.33.0
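
As an illustration of the naming scheme this patch introduces, the sketch below computes the names of the extra jobs for one tunable package. It assumes job names are built as NAME "." SYSTEM SUFFIX, as in 'package-job' above; 'tuned-job-names' is a hypothetical helper, and the real 'job-name' procedure in gnu/ci.scm (not shown in this diff) may produce a different NAME part.

```scheme
;; Micro-architectures for which tuned variants are built, copied from
;; '%x86-64-micro-architectures' in the patch.
(define %micro-architectures
  '("westmere" "ivybridge" "haswell" "skylake" "skylake-avx512"))

(define (tuned-job-names name system)
  "Return the list of job names for the tuned variants of NAME on SYSTEM;
the list is empty for systems other than x86_64-linux."
  (if (string=? system "x86_64-linux")
      (map (lambda (micro-architecture)
             (string-append name "." system "." micro-architecture))
           %micro-architectures)
      '()))

(for-each (lambda (job)
            (display job)
            (newline))
          (tuned-job-names "eigen-benchmarks" "x86_64-linux"))
```

Each tunable package thus yields one baseline job plus five tuned jobs on x86_64-linux, and only the baseline job elsewhere.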
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 04/12] transformations: Add '--tune'.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
20211216175827.2077-5-ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* guix/transformations.scm (tuning-compiler)
(tuned-package, tunable-package?, package-tuning)
(transform-package-tuning)
(build-system-with-tuning-compiler): New procedures.
(%transformations): Add 'tune'.
(%transformation-options): Add "--tune".
* tests/transformations.scm ("options->transformation, tune")
("options->transformations, tune, wrong micro-architecture"): New
tests.
* doc/guix.texi (Package Transformation Options): Document '--tune'.
---
doc/guix.texi | 61 ++++++++++++
guix/transformations.scm | 204 ++++++++++++++++++++++++++++++++++++++
tests/transformations.scm | 35 +++++++
3 files changed, 300 insertions(+)

diff --git a/doc/guix.texi b/doc/guix.texi
index 7b1a64deb9..b3207e125a 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -11014,6 +11014,67 @@ available options and a synopsis (these options are not shown in the
 
 @table @code
 
+@cindex performance, tuning code
+@cindex optimization, of package code
+@cindex tuning, of package code
+@cindex SIMD support
+@cindex tunable packages
+@cindex package multi-versioning
+@item --tune[=@var{cpu}]
+Use versions of the packages marked as ``tunable'' optimized for
+@var{cpu}.  When @var{cpu} is @code{native}, or when it is omitted, tune
+for the CPU on which the @command{guix} command is running.
+
+Valid @var{cpu} names are those recognized by the underlying compiler,
+by default the GNU Compiler Collection.  On x86_64 processors, this
+includes CPU names such as @code{nehalem}, @code{haswell}, and
+@code{skylake} (@pxref{x86 Options, @code{-march},, gcc, Using the GNU
+Compiler Collection (GCC)}).
+
+As new generations of CPUs come out, they augment the standard
+instruction set architecture (ISA) with additional instructions, in
+particular instructions for single-instruction/multiple-data (SIMD)
+parallel processing.  For example, while Core2 and Skylake CPUs both
+implement the x86_64 ISA, only the latter supports AVX2 SIMD
+instructions.
+
+The primary gain one can expect from @option{--tune} is for programs
+that can make use of those SIMD capabilities @emph{and} that do not
+already have a mechanism to select the right optimized code at run time.
+Packages that have the @code{tunable?} property set are considered
+@dfn{tunable packages} by the @option{--tune} option; a package
+definition with the property set looks like this:
+
+@lisp
+(package
+  (name "hello-simd")
+  ;; ...
+
+  ;; This package may benefit from SIMD extensions so
+  ;; mark it as "tunable".
+  (properties '((tunable? . #t))))
+@end lisp
+
+Other packages are not considered tunable.  This allows Guix to use
+generic binaries in the cases where tuning for a specific CPU is
+unlikely to provide any gain.
+
+Tuned packages are built with @code{-march=@var{CPU}}; under the hood,
+the @option{-march} option is passed to the actual compiler by a compiler
+wrapper.  Since the build machine may not be able to run code for the
+target CPU micro-architecture, the test suite is not run when building a
+tuned package.
+
+To reduce rebuilds to the minimum, tuned packages are @emph{grafted}
+onto packages that depend on them (@pxref{Security Updates, grafts}).
+Thus, using @option{--no-grafts} cancels the effect of @option{--tune}.
+
+We call this technique @dfn{package multi-versioning}: several variants
+of tunable packages may be built, one for each CPU variant.  It is the
+coarse-grain counterpart of @dfn{function multi-versioning} as
+implemented by the GNU tool chain (@pxref{Function Multiversioning,,,
+gcc, Using the GNU Compiler Collection (GCC)}).
+
 @item --with-source=@var{source}
 @itemx --with-source=@var{package}=@var{source}
 @itemx --with-source=@var{package}@@@var{version}=@var{source}
diff --git a/guix/transformations.scm b/guix/transformations.scm
index 5ae1977cb2..c43c00cdd3 100644
--- a/guix/transformations.scm
+++ b/guix/transformations.scm
@@ -18,9 +18,11 @@
 ;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
 
 (define-module (guix transformations)
+  #:use-module ((guix config) #:select (%system))
   #:use-module (guix i18n)
   #:use-module (guix store)
   #:use-module (guix packages)
+  #:use-module (guix build-system)
   #:use-module (guix profiles)
   #:use-module (guix diagnostics)
   #:autoload   (guix download) (download-to-store)
@@ -29,6 +31,7 @@ (define-module (guix transformations)
   #:autoload   (guix upstream) (package-latest-release
                                 upstream-source-version
                                 upstream-source-signature-urls)
+  #:autoload   (guix cpu) (current-cpu cpu->gcc-architecture)
   #:use-module (guix utils)
   #:use-module (guix memoization)
   #:use-module (guix gexp)
@@ -49,6 +52,9 @@ (define-module (guix transformations)
   #:export (options->transformation
             manifest-entry-with-transformations
 
+            tunable-package?
+            tuned-package
+
             show-transformation-options-help
             %transformation-options))
 
@@ -419,6 +425,181 @@ (define replacements
             obj)
         obj)))
 
+(define tuning-compiler
+  (mlambda (micro-architecture)
+    "Return a compiler wrapper that passes '-march=MICRO-ARCHITECTURE' to the
+actual compiler."
+    (define wrapper
+      #~(begin
+          (use-modules (ice-9 match))
+
+          (define* (search-next command
+                                #:optional
+                                (path (string-split (getenv "PATH")
+                                                    #\:)))
+            ;; Search the next COMMAND on PATH, a list of
+            ;; directories representing the executable search path.
+            (define this
+              (stat (car (command-line))))
+
+            (let loop ((path path))
+              (match path
+                (()
+                 (match command
+                   ("cc" (search-next "gcc"))
+                   (_ #f)))
+                ((directory rest ...)
+                 (let* ((file (string-append
+                               directory "/" command))
+                        (st   (stat file #f)))
+                   (if (and st (not (equal? this st)))
+                       file
+                       (loop rest)))))))
+
+          (match (command-line)
+            ((command arguments ...)
+             (match (search-next (basename command))
+               (#f (exit 127))
+               (next
+                (apply execl next
+                       (append (cons next arguments)
+                           (list (string-append "-march="
+                                                #$micro-architecture))))))))))
+
+    (define program
+      (program-file (string-append "tuning-compiler-wrapper-" micro-architecture)
+                    wrapper))
+
+    (computed-file (string-append "tuning-compiler-" micro-architecture)
+                   (with-imported-modules '((guix build utils))
+                     #~(begin
+                         (use-modules (guix build utils))
+
+                         (define bin (string-append #$output "/bin"))
+                         (mkdir-p bin)
+
+                         (for-each (lambda (program)
+                                     (symlink #$program
+                                              (string-append bin "/" program)))
+                                   '("cc" "gcc" "clang" "g++" "c++" "clang++")))))))
+
+(define (build-system-with-tuning-compiler bs micro-architecture)
+  "Return a variant of BS, a build system, that ensures that the compiler that
+BS uses (usually an implicit input) can generate code for MICRO-ARCHITECTURE,
+which names a specific CPU of the target architecture--e.g., when targeting
+x86_64, MICRO-ARCHITECTURE might be \"skylake\".  If it does, return a build
+system that builds code for MICRO-ARCHITECTURE; otherwise raise an error."
+  (define %not-hyphen
+    (char-set-complement (char-set #\-)))
+
+  (define lower
+    (build-system-lower bs))
+
+  (define (lower* . args)
+    ;; The list of CPU names supported by the '-march' option of C/C++
+    ;; compilers is specific to each compiler and version thereof.  Rather
+    ;; than pass '-march=MICRO-ARCHITECTURE' as is to the compiler, possibly
+    ;; leading to an obscure build error, check whether the compiler is known
+    ;; to support MICRO-ARCHITECTURE.  If not, bail out.
+    (let* ((lowered      (apply lower args))
+           (architecture (match (string-tokenize (bag-system lowered)
+                                                 %not-hyphen)
+                           ((arch _ ...) arch)))
+           (compiler     (any (match-lambda
+                                ((label (? package? p) . _)
+                                 (and (assoc-ref (package-properties p)
+                                                 'compiler-cpu-architectures)
+                                      p))
+                                (_ #f))
+                              (bag-build-inputs lowered))))
+      (unless compiler
+        (raise (formatted-message
+                (G_ "failed to determine which compiler is used"))))
+
+      (let ((lst (assoc-ref (package-properties compiler)
+                            'compiler-cpu-architectures)))
+        (unless lst
+          (raise (formatted-message
+                  (G_ "failed to determine whether ~a supports ~a")
+                  (package-full-name compiler)
+                  micro-architecture)))
+        (unless (member micro-architecture
+                        (or (assoc-ref lst architecture) '()))
+          (raise (formatted-message
+                  (G_ "compiler ~a does not support micro-architecture ~a")
+                  (package-full-name compiler)
+                  micro-architecture))))
+
+      (bag
+        (inherit lowered)
+        (build-inputs
+         ;; Arrange so that the compiler wrapper comes first in $PATH.
+         `(("tuning-compiler" ,(tuning-compiler micro-architecture))
+           ,@(bag-build-inputs lowered))))))
+
+  (build-system
+    (inherit bs)
+    (lower lower*)))
+
+(define (tuned-package p micro-architecture)
+  "Return package P tuned for MICRO-ARCHITECTURE."
+  (package
+    (inherit p)
+    (build-system
+      (build-system-with-tuning-compiler (package-build-system p)
+                                         micro-architecture))
+    (arguments
+     ;; The machine building this package may or may not be able to run code
+     ;; for MICRO-ARCHITECTURE.  Because of that, skip tests; they are run for
+     ;; the "baseline" variant anyway.
+     (substitute-keyword-arguments (package-arguments p)
+       ((#:tests? _ #f) #f)))
+
+    (properties
+     `((cpu-tuning . ,micro-architecture)
+
+       ;; Remove the 'tunable?' property so that 'package-tuning' does not
+       ;; call 'tuned-package' again on this one.
+       ,@(alist-delete 'tunable? (package-properties p))))))
+
+(define (tunable-package? package)
+  "Return true if package PACKAGE is \"tunable\"--i.e., if tuning it for the
+host CPU is worthwhile."
+  (assq 'tunable? (package-properties package)))
+
+(define package-tuning
+  (mlambda (micro-architecture)
+    "Return a procedure that maps the given package to its counterpart tuned
+for MICRO-ARCHITECTURE, a string suitable for GCC's '-march'."
+    (define rewriting-property
+      (gensym " package-tuning"))
+
+    (package-mapping (lambda (p)
+                       (cond ((assq rewriting-property (package-properties p))
+                              p)
+                             ((assq 'tunable? (package-properties p))
+                              (info (G_ "tuning ~a for CPU ~a~%")
+                                    (package-full-name p) micro-architecture)
+                              (package/inherit p
+                                (replacement (tuned-package p micro-architecture))
+                                (properties `((,rewriting-property . #t)
+                                              ,@(package-properties p)))))
+                             (else
+                              p)))
+                     (lambda (p)
+                       (assq rewriting-property (package-properties p)))
+                     #:deep? #t)))
+
+(define (transform-package-tuning micro-architectures)
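+  "Return a procedure that, when passed a package, grafts tuned variants of the tunable packages in its dependency graph, using the first of MICRO-ARCHITECTURES."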
+  (match micro-architectures
+    ((micro-architecture _ ...)
+     (let ((rewrite (package-tuning micro-architecture)))
+       (lambda (obj)
+         (if (package? obj)
+             (rewrite obj)
+             obj))))))
+
 (define (transform-package-with-debug-info specs)
   "Return a procedure that, when passed a package, set its 'replacement' field
 to the same package but with #:strip-binaries? #f in its 'arguments' field."
@@ -601,6 +782,7 @@ (define %transformations
     (with-commit . ,transform-package-source-commit)
     (with-git-url . ,transform-package-source-git-url)
     (with-c-toolchain . ,transform-package-toolchain)
+    (tune . ,transform-package-tuning)
     (with-debug-info . ,transform-package-with-debug-info)
     (without-tests . ,transform-package-tests)
     (with-patch  . ,transform-package-patches)
@@ -640,6 +822,28 @@ (define %transformation-options
                   (parser 'with-git-url))
           (option '("with-c-toolchain") #t #f
                   (parser 'with-c-toolchain))
+          (option '("tune") #f #t
+                  (lambda (opt name arg result . rest)
+                    (define micro-architecture
+                      (match arg
+                        ((or #f "native")
+                         (unless (string=? (or (assoc-ref result 'system)
+                                               (%current-system))
+                                           %system)
+                           (leave (G_ "\
+building for ~a instead of ~a, so tuning cannot be guessed~%")
+                                  (assoc-ref result 'system) %system))
+
+                         (cpu->gcc-architecture (current-cpu)))
+                        ("generic" #f)
+                        (_ arg)))
+
+                    (apply values
+                           (if micro-architecture
+                               (alist-cons 'tune micro-architecture
+                                           result)
+                               (alist-delete 'tune result))
+                           rest)))
           (option '("with-debug-info") #t #f
                   (parser 'with-debug-info))
           (option '("without-tests") #t #f
diff --git a/tests/transformations.scm b/tests/transformations.scm
index 09839dc1c5..8db85b4305 100644
--- a/tests/transformations.scm
+++ b/tests/transformations.scm
@@ -38,12 +38,14 @@ (define-module (test-transformations)
   #:use-module (guix utils)
   #:use-module (guix git)
   #:use-module (guix upstream)
+  #:use-module (guix diagnostics)
   #:use-module (gnu packages)
   #:use-module (gnu packages base)
   #:use-module (gnu packages busybox)
   #:use-module (ice-9 match)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
+  #:use-module (srfi srfi-34)
   #:use-module (srfi srfi-64))
 
 
@@ -465,6 +467,39 @@ (define (package-name* obj)
                    `((with-latest . "foo")))))
           (package-version (t p)))))
 
+(test-equal "options->transformation, tune"
+  '(cpu-tuning . "superfast")
+  (let* ((p0 (dummy-package "p0"))
+         (p1 (dummy-package "p1"
+               (inputs `(("p0" ,p0)))
+               (properties '((tunable? . #t)))))
+         (p2 (dummy-package "p2"
+               (inputs `(("p1" ,p1)))))
+         (t  (options->transformation '((tune . "superfast"))))
+         (p3 (t p2)))
+    (and (not (package-replacement p3))
+         (match (package-inputs p3)
+           ((("p1" tuned))
+            (match (package-inputs tuned)
+              ((("p0" p0))
+               (and (not (package-replacement p0))
+                    (assq 'cpu-tuning
+                          (package-properties
+                           (package-replacement tuned)))))))))))
+
+(test-assert "options->transformation, tune, wrong micro-architecture"
+  (let ((p (dummy-package "tunable"
+             (properties '((tunable? . #t)))))
+        (t (options->transformation '((tune . "nonexistent-superfast")))))
+    ;; Because GCC used by P's build system does not support
+    ;; '-march=nonexistent-superfast', we should see an error when lowering
+    ;; the tuned package.
+    (guard (c ((formatted-message? c)
+               (member "nonexistent-superfast"
+                       (formatted-message-arguments c))))
+      (package->bag (t p))
+      #f)))
+
 (test-equal "options->transformation + package->manifest-entry"
   '((transformations . ((without-tests . "foo"))))
   (let* ((p (dummy-package "foo"))
-- 
2.33.0
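
For readers who want to experiment outside of Guix, here is a minimal sketch in plain Guile of two decisions made by the patch above: how the argument of '--tune' is resolved, and how the requested micro-architecture is checked against what the compiler is known to support.  The names '%example-cpu-architectures', 'system->architecture', 'supported-micro-architecture?' and 'resolve-tune-argument' are hypothetical, and the table is illustrative rather than copied from any compiler package; the patch itself reads the 'compiler-cpu-architectures' package property and calls 'cpu->gcc-architecture'.

```scheme
(use-modules (ice-9 match))

;; Hypothetical stand-in for a compiler's 'compiler-cpu-architectures'
;; property: each entry maps an architecture to accepted '-march' values.
(define %example-cpu-architectures
  '(("x86_64" "haswell" "skylake")
    ("aarch64" "armv8-a" "armv8.2-a")))

(define (system->architecture system)
  ;; Keep the part before the first hyphen: "x86_64-linux" -> "x86_64".
  (match (string-tokenize system (char-set-complement (char-set #\-)))
    ((arch _ ...) arch)))

(define (supported-micro-architecture? system micro-architecture)
  ;; Return #t if MICRO-ARCHITECTURE is listed for SYSTEM's architecture.
  (let ((supported (or (assoc-ref %example-cpu-architectures
                                  (system->architecture system))
                       '())))
    (and (member micro-architecture supported) #t)))

(define (resolve-tune-argument arg detect-native)
  ;; ARG is the value given to '--tune', or #f when there is none.
  ;; DETECT-NATIVE is a thunk standing in for host CPU detection.
  (match arg
    ((or #f "native") (detect-native))   ;guess from the host CPU
    ("generic" #f)                       ;no tuning at all
    (_ arg)))                            ;explicit micro-architecture
```

With this table, an unknown name such as "nonexistent-superfast" is rejected before any compiler runs, which is the situation the second test in the patch exercises.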
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 06/12] gnu: Add eigen-benchmarks.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-7-ludo@gnu.org
* gnu/packages/algebra.scm (eigen-benchmarks): New variable.
---
gnu/packages/algebra.scm | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

diff --git a/gnu/packages/algebra.scm b/gnu/packages/algebra.scm
index 79785bd463..e92ef4bf3f 100644
--- a/gnu/packages/algebra.scm
+++ b/gnu/packages/algebra.scm
@@ -1044,6 +1044,44 @@ (define-public eigen
     ;; See 'COPYING.README' for details.
     (license license:mpl2.0)))
 
+(define-public eigen-benchmarks
+  (package
+    (inherit eigen)
+    (name "eigen-benchmarks")
+    (arguments
+     '(#:phases (modify-phases %standard-phases
+                  (delete 'configure)
+                  (replace 'build
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      (let* ((out (assoc-ref outputs "out"))
+                             (bin (string-append out "/bin")))
+                        (define (compile file)
+                          (format #t "compiling '~a'...~%" file)
+                          (let ((target
+                                 (string-append bin "/"
+                                                (basename file ".cpp"))))
+                            (invoke "c++" "-o" target file
+                                    "-I" ".." "-O2" "-g"
+                                    "-lopenblas" "-Wl,--as-needed")))
+
+                        (mkdir-p bin)
+                        (with-directory-excursion "bench"
+                          ;; There are more benchmarks, of varying quality.
+                          ;; Here we pick some that appear to be useful.
+                          (for-each compile
+                                    '("benchBlasGemm.cpp"
+                                      "benchCholesky.cpp"
+                                      ;;"benchEigenSolver.cpp"
+                                      "benchFFT.cpp"
+                                      "benchmark-blocking-sizes.cpp"))))))
+                  (delete 'install))))
+    (inputs (list boost openblas))
+
+    ;; Mark as tunable to take advantage of SIMD code in Eigen.
+    (properties '((tunable? . #t)))
+
+    (synopsis "Micro-benchmarks of the Eigen linear algebra library")))
+
 (define-public eigen-for-tensorflow
   (let ((changeset "fd6845384b86")
         (revision "1"))
-- 
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 07/12] gnu: Add xsimd-benchmark.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-8-ludo@gnu.org
* gnu/packages/cpp.scm (xsimd-benchmark): New variable.
---
gnu/packages/cpp.scm | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/gnu/packages/cpp.scm b/gnu/packages/cpp.scm
index 718fb20652..da26a4e346 100644
--- a/gnu/packages/cpp.scm
+++ b/gnu/packages/cpp.scm
@@ -300,6 +300,29 @@ (define-public xsimd
 operating on batches.")
     (license license:bsd-3)))
 
+(define-public xsimd-benchmark
+  (package
+    (inherit xsimd)
+    (name "xsimd-benchmark")
+    (arguments
+     `(#:configure-flags (list "-DBUILD_BENCHMARK=ON")
+       #:tests? #f
+       #:phases (modify-phases %standard-phases
+                  (add-after 'unpack 'remove-march=native
+                    (lambda _
+                      (substitute* "benchmark/CMakeLists.txt"
+                        (("-march=native") ""))))
+                  (replace 'install
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      ;; Install nothing but the executable.
+                      (let ((out (assoc-ref outputs "out")))
+                        (install-file "benchmark/benchmark_xsimd"
+                                      (string-append out "/bin"))))))))
+    (synopsis "Benchmark of the xsimd library")
+
+    ;; Mark as tunable to take advantage of SIMD code in xsimd/xtensor.
+    (properties '((tunable? . #t)))))
+
 (define-public chaiscript
   (package
     (name "chaiscript")
-- 
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 08/12] gnu: Add xtensor-benchmark.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-9-ludo@gnu.org
* gnu/packages/algebra.scm (xtensor-benchmark): New variable.
---
gnu/packages/algebra.scm | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/gnu/packages/algebra.scm b/gnu/packages/algebra.scm
index e92ef4bf3f..c129e0f4e0 100644
--- a/gnu/packages/algebra.scm
+++ b/gnu/packages/algebra.scm
@@ -1169,6 +1169,45 @@ (define-public xtensor
 @end itemize")
     (license license:bsd-3)))
 
+(define-public xtensor-benchmark
+  (package
+    (inherit xtensor)
+    (name "xtensor-benchmark")
+    (arguments
+     `(#:configure-flags (list "-DBUILD_BENCHMARK=ON"
+                               "-DDOWNLOAD_GBENCHMARK=OFF")
+       #:tests? #f
+       #:phases (modify-phases %standard-phases
+                  (add-after 'unpack 'remove-march=native
+                    (lambda _
+                      (substitute* "benchmark/CMakeLists.txt"
+                        (("-march=native") ""))))
+                  (add-after 'unpack 'link-with-googlebenchmark
+                    (lambda _
+                      (substitute* "benchmark/CMakeLists.txt"
+                        (("find_package\\(benchmark.*" all)
+                         (string-append
+                          all "\n"
+                          "set(GBENCHMARK_LIBRARIES benchmark)\n")))))
+                  (replace 'build
+                    (lambda _
+                      (invoke "make" "benchmark_xtensor" "-j"
+                              (number->string (parallel-job-count)))))
+                  (replace 'install
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      ;; Install nothing but the executable.
+                      (let ((out (assoc-ref outputs "out")))
+                        (install-file "benchmark/benchmark_xtensor"
+                                      (string-append out "/bin"))))))))
+    (synopsis "Benchmarks of the xtensor library")
+    (native-inputs '())
+    (inputs
+     (modify-inputs (package-native-inputs xtensor)
+       (prepend googlebenchmark xsimd)))
+
+    ;; Mark as tunable to take advantage of SIMD code in xsimd/xtensor.
+    (properties '((tunable? . #t)))))
+
 (define-public gap
   (package
     (name "gap")
-- 
2.33.0
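
Both benchmark packages strip '-march=native' from the upstream CMake file, since that flag would tie the result to whichever machine runs the build, and the xtensor one also appends a line after the 'find_package' call.  The following is a rough sketch of those two textual rewrites in plain Guile, applied to a single line; 'strip-march-native' and 'add-gbenchmark-libraries' are hypothetical helpers, whereas the patches use 'substitute*' from (guix build utils) on the whole file.

```scheme
(use-modules (ice-9 regex))

(define (strip-march-native line)
  ;; Drop every occurrence of "-march=native" from LINE.
  (regexp-substitute/global #f "-march=native" line
                            'pre "" 'post))

(define (add-gbenchmark-libraries line)
  ;; Keep the matched 'find_package' text and append a 'set' command
  ;; after it, as the 'link-with-googlebenchmark' phase does.
  (regexp-substitute/global
   #f "find_package\\(benchmark.*" line
   'pre
   (lambda (m)
     (string-append (match:substring m) "\n"
                    "set(GBENCHMARK_LIBRARIES benchmark)\n"))
   'post))
```

Lines that do not match either pattern come back unchanged, so both helpers can be mapped over a file's lines.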
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 09/12] gnu: ceres-solver: Mark as tunable.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-10-ludo@gnu.org
* gnu/packages/maths.scm (ceres)[properties]: New field.
---
gnu/packages/maths.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/maths.scm b/gnu/packages/maths.scm
index 3bac086666..256b1c4421 100644
--- a/gnu/packages/maths.scm
+++ b/gnu/packages/maths.scm
@@ -2411,7 +2411,10 @@ (define-public ceres
 @item non-linear least squares problems with bounds constraints;
 @item general unconstrained optimization problems.
 @end enumerate\n")
-    (license license:bsd-3)))
+    (license license:bsd-3)
+
+    ;; Mark as tunable to take advantage of SIMD code in Eigen.
+    (properties `((tunable? . #t)))))
 
 ;; For a fully featured Octave, users are strongly recommended also to install
 ;; the following packages: less, ghostscript, gnuplot.
-- 
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 10/12] gnu: Add ceres-solver-benchmarks.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-11-ludo@gnu.org
* gnu/packages/maths.scm (ceres-solver-benchmarks): New variable.
---
gnu/packages/maths.scm | 43 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)

diff --git a/gnu/packages/maths.scm b/gnu/packages/maths.scm
index 256b1c4421..7f2994d10b 100644
--- a/gnu/packages/maths.scm
+++ b/gnu/packages/maths.scm
@@ -2416,6 +2416,49 @@ (define-public ceres
     ;; Mark as tunable to take advantage of SIMD code in Eigen.
     (properties `((tunable? . #t)))))
 
+(define-public ceres-solver-benchmarks
+  (package
+    (inherit ceres)
+    (name "ceres-solver-benchmarks")
+    (arguments
+     '(#:modules ((ice-9 popen)
+                  (ice-9 rdelim)
+                  (guix build utils)
+                  (guix build cmake-build-system))
+
+       #:phases (modify-phases %standard-phases
+                  (delete 'configure)
+                  (replace 'build
+                    (lambda* (#:key outputs #:allow-other-keys)
+                      (let* ((out (assoc-ref outputs "out"))
+                             (bin (string-append out "/bin")))
+                        (define flags
+                          (string-tokenize
+                           (read-line (open-pipe* OPEN_READ
+                                                  "pkg-config" "eigen3"
+                                                  "--cflags"))))
+
+                        (define (compile-file file)
+                          (let ((source (string-append file ".cc")))
+                            (format #t "building '~a'...~%" file)
+                            (apply invoke "c++" "-fopenmp" "-O2" "-g" "-DNDEBUG"
+                                   source "-lceres" "-lbenchmark" "-lglog"
+                                   "-pthread"
+                                   "-o" (string-append bin "/" file)
+                                   "-I" ".." flags)))
+
+                        (mkdir-p bin)
+                        (with-directory-excursion "internal/ceres"
+                          (for-each compile-file
+                                    '("small_blas_gemm_benchmark"
+                                      "small_blas_gemv_benchmark"
+                                      "autodiff_cost_function_benchmark"))))))
+                  (delete 'check)
+                  (delete 'install))))
+    (inputs (modify-inputs (package-inputs ceres)
+              (prepend googlebenchmark ceres)))
+    (synopsis "Benchmarks of the Ceres optimization problem solver")))
+
 ;; For a fully featured Octave, users are strongly recommended also to install
 ;; the following packages: less, ghostscript, gnuplot.
 (define-public octave-cli
-- 
2.33.0
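
The 'build' phase above obtains the Eigen compiler flags by reading one line from a 'pkg-config' pipe and splitting it on whitespace.  Here is a minimal sketch of that pattern in plain Guile, assuming only that 'echo' is available; 'command-output-tokens' is a hypothetical helper, and 'echo' stands in for "pkg-config eigen3 --cflags".  Unlike the phase in the patch, this sketch closes the pipe and tolerates empty output.

```scheme
(use-modules (ice-9 popen)
             (ice-9 rdelim))

(define (command-output-tokens program . args)
  ;; Run PROGRAM with ARGS, read the first line of its standard output,
  ;; and return the whitespace-separated tokens of that line.
  (let* ((port (apply open-pipe* OPEN_READ program args))
         (line (read-line port)))
    (close-pipe port)
    (if (eof-object? line)
        '()                             ;the command printed nothing
        (string-tokenize line))))

;; 'echo' plays the role of 'pkg-config' here.
(define example-flags
  (command-output-tokens "echo" "-I/usr/include/eigen3 -DEXAMPLE"))
```

The resulting list can then be passed to the compiler with 'apply', as the phase does with 'invoke'.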
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 11/12] gnu: libfive: Mark as tunable.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-12-ludo@gnu.org
* gnu/packages/engineering.scm (libfive)[properties]: New field.
---
gnu/packages/engineering.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/engineering.scm b/gnu/packages/engineering.scm
index edc0f51d8d..709b6d2864 100644
--- a/gnu/packages/engineering.scm
+++ b/gnu/packages/engineering.scm
@@ -829,7 +829,10 @@ (define-public libfive
 Even fundamental, primitive shapes are represented as code in the user-level
 language.")
       (license (list license:mpl2.0               ;library
-                     license:gpl2+)))))           ;Guile bindings and GUI
+                     license:gpl2+))              ;Guile bindings and GUI
+
+      ;; Mark as tunable to take advantage of SIMD code in Eigen.
+      (properties '((tunable? . #t))))))
 
 (define-public inspekt3d
   (let ((commit "703f52ccbfedad2bf5240bf8183d1b573c9d54ef")
-- 
2.33.0
Ludovic Courtès wrote on 16 Dec 2021 18:58
[PATCH v2 12/12] gnu: prusa-slicer: Mark as tunable.
(address . 52283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211216175827.2077-13-ludo@gnu.org
* gnu/packages/engineering.scm (prusa-slicer)[properties]: New field.
---
gnu/packages/engineering.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/engineering.scm b/gnu/packages/engineering.scm
index 709b6d2864..fa82448736 100644
--- a/gnu/packages/engineering.scm
+++ b/gnu/packages/engineering.scm
@@ -3121,4 +3121,7 @@ (define-public prusa-slicer
     (synopsis "G-code generator for 3D printers (RepRap, Makerbot, Ultimaker etc.)")
     (description "PrusaSlicer takes 3D models (STL, OBJ, AMF) and converts them into
 G-code instructions for FFF printers or PNG layers for mSLA 3D printers.")
-    (license license:agpl3)))
+    (license license:agpl3)
+
+    ;; Mark as tunable to take advantage of SIMD code in Eigen and in libigl.
+    (properties '((tunable? . #t)))))
-- 
2.33.0
Ludovic Courtès wrote on 1 Jan 15:59 +0100
Re: bug#52283: [PATCH 00/10] Tuning packages for CPU micro-architectures
(address . 52283-done@debbugs.gnu.org)
871r1rpknf.fsf_-_@gnu.org
Hello!

Ludovic Courtès <ludo@gnu.org> skribis:

> Here is v2 of the patch set implementing the ‘--tune’ package
> transformation option. Changes since v1:

Pushed!

40662f7da8 news: Add entry about '--tune'.
4cd0b37f6b gnu: gsl: Add 'tunable?' property.
1fcb98ca54 gnu: prusa-slicer: Mark as tunable.
6554294754 gnu: libfive: Mark as tunable.
6b70412370 gnu: Add ceres-solver-benchmarks.
24667081ad gnu: ceres-solver: Mark as tunable.
182b97dac0 gnu: Add xtensor-benchmark.
f5873949f3 gnu: Add xsimd-benchmark.
6542e5713a gnu: Add eigen-benchmarks.
6756c64a8f ci: Add extra jobs for tunable packages.
d090e9c37d transformations: Add '--tune'.
0a767f02d4 gnu: clang: Add 'compiler-cpu-architectures' property.
2576e2019d gnu: gcc: Add 'compiler-cpu-architectures' property.
a644f88d28 Add (guix cpu).

Ludo’.
Closed
This issue is archived.

To comment on this conversation send email to 52283@debbugs.gnu.org