[PATCH 0/2] Package some dependencies for Argos Translate

Status: open
Owner: unassigned
Submitted by: Nguyễn Gia Phong (sole participant)
Severity: normal
Nguyễn Gia Phong wrote on 14 Mar 09:29 +0100
(address . guix-patches@gnu.org)(name . Nguyễn Gia Phong)(address . mcsinyx@disroot.org)
cover.1710404630.git.mcsinyx@disroot.org
Argos Translate is an offline translation library based on OpenNMT.

Below are some of its dependencies that are trivial to package.
After these, the only dependency still missing is CTranslate2
(https://opennmt.net/CTranslate2).
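
To try both new packages together once this series is applied to a Guix
checkout, a minimal manifest along these lines should work, for example with
`./pre-inst-env guix shell -m manifest.scm -- python3 -c "import sacremoses, stanza"`.
This is only a sketch: the file name manifest.scm is hypothetical, and it
assumes both patches have been applied so the package names resolve.

```scheme
;; manifest.scm (hypothetical file name).
;; Assumes this patch series is applied, so both packages are defined.
(specifications->manifest
 (list "python"              ; interpreter to import the modules with
       "python-sacremoses"   ; added by patch 1/2
       "python-stanza"))     ; added by patch 2/2
```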

Nguyễn Gia Phong (2):
gnu: Add python-sacremoses.
gnu: Add python-stanza.

gnu/packages/machine-learning.scm | 30 +++++++++++++++++++++++++++
gnu/packages/python-xyz.scm | 34 +++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+)


base-commit: 76a3414a1bc500626a9feca013673f994eb51a34
--
2.41.0
Nguyễn Gia Phong wrote on 14 Mar 09:32 +0100
[PATCH 1/2] gnu: Add python-sacremoses.
(address . guix-patches@gnu.org)(name . Nguyễn Gia Phong)(address . mcsinyx@disroot.org)
03cb7e5cac1e4af60d9e655285b76bfd8dbf76c9.1710404630.git.mcsinyx@disroot.org
* gnu/packages/python-xyz.scm (python-sacremoses): New variable.

Change-Id: I2c2cd94c054d7e952ffb4b3afdedd2ee8ce905bf
---
gnu/packages/python-xyz.scm | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/gnu/packages/python-xyz.scm b/gnu/packages/python-xyz.scm
index 232b5d69993c..ad33d98db142 100644
--- a/gnu/packages/python-xyz.scm
+++ b/gnu/packages/python-xyz.scm
@@ -149,6 +149,7 @@
;;; Copyright © 2024 Timothee Mathieu <timothee.mathieu@inria.fr>
;;; Copyright © 2024 Ian Eure <ian@retrospec.tv>
;;; Copyright © 2024 Adriel Dumas--Jondeau <leirda@disroot.org>
+;;; Copyright © 2024 Nguyễn Gia Phong <mcsinyx@disroot.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -21897,6 +21898,39 @@ (define-public python-nltk
reasoning, wrappers for natural language processing libraries.")
(license license:asl2.0)))
+(define-public python-sacremoses
+  (package
+    (name "python-sacremoses")
+    (version "0.1.0")
+    (source (origin
+              (method git-fetch)
+              (uri (git-reference
+                    (url "https://github.com/hplt-project/sacremoses")
+                    (commit version)))
+              (sha256
+               (base32
+                "0g70vchfniknp65n4wnx7chg6g49d4xrz1wagv7f7ir2swdzyn9b"))))
+    (build-system python-build-system)
+    (arguments
+     '(#:phases
+       (modify-phases %standard-phases
+         (replace 'check
+           (lambda* (#:key tests? #:allow-other-keys)
+             (when tests?
+               ;; Skip truecaser tests which fetch https://norvig.com/big.txt
+               (invoke "python" "-m" "unittest"
+                       "sacremoses/test/test_corpus.py"
+                       "sacremoses/test/test_no_redos_has_numeric_only.py"
+                       "sacremoses/test/test_normalizer.py"
+                       "sacremoses/test/test_tokenizer.py")))))))
+    (propagated-inputs
+     (list python-click-7 python-joblib python-regex python-tqdm))
+    (home-page "https://github.com/hplt-project/sacremoses")
+    (synopsis "Natural language tokenizer, truecaser and normalizer")
+    (description "SacreMoses is a Python port of Moses'
+tokenizer, detokenizer, truecaser and punctuation normalizer.")
+    (license license:expat)))
+
(define-public python-pymongo
(package
(name "python-pymongo")
--
2.41.0
Nguyễn Gia Phong wrote on 14 Mar 09:32 +0100
[PATCH 2/2] gnu: Add python-stanza.
(address . guix-patches@gnu.org)(name . Nguyễn Gia Phong)(address . mcsinyx@disroot.org)
d45e620b075a501f144a561a5416ccdeba3a6136.1710404630.git.mcsinyx@disroot.org
* gnu/packages/machine-learning.scm (python-stanza): New variable.

Change-Id: Ibde67dcb8a015b91554f6a1e36dbf5eef0b73f36
---
gnu/packages/machine-learning.scm | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 5c18a2e9d57d..5e403d905c49 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -27,6 +27,7 @@
;;; Copyright © 2024 David Pflug <david@pflug.io>
;;; Copyright © 2024 Timothee Mathieu <timothee.mathieu@inria.fr>
;;; Copyright © 2024 Spencer King <spencer.king@geneoscopy.com>
+;;; Copyright © 2024 Nguyễn Gia Phong <mcsinyx@disroot.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -1127,6 +1128,35 @@ (define-public python-spacy
model packaging, deployment and workflow management.")
(license license:expat)))
+(define-public python-stanza
+  (package
+    (name "python-stanza")
+    (version "1.8.1")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (pypi-uri "stanza" version))
+       (sha256
+        (base32 "1drq9wyafisnf44jgby1sh45svp0pj2svb01v397i9h0bczc5i08"))))
+    (build-system python-build-system)
+    (propagated-inputs (list python-emoji
+                             python-numpy
+                             python-protobuf
+                             python-requests
+                             python-networkx
+                             python-toml
+                             python-pytorch
+                             python-tqdm))
+    ;; Tests require downloading datasets.
+    (arguments (list #:tests? #false))
+    (home-page "https://stanfordnlp.github.io/stanza")
+    (synopsis "Stanford NLP Python library for many human languages")
+    (description "Stanza is a collection of accurate and efficient tools
+for the linguistic analysis of many human languages.  Starting from raw text,
+Stanza divides it into sentences and words, and then can recognize
+parts of speech and entities, do syntactic analysis, and more.")
+    (license license:asl2.0)))
+
(define-public shogun
(package
(name "shogun")
--
2.41.0