Hi Mike,
On Thu, 27 Feb 2020 at 02:23, Mike Gerwitz <mtg@gnu.org> wrote:
Thank you for pointing out the issue.
My remark is *not* about the rename, which seems fine, for the very
same reason that the "git-annex" software is named 'git-annex' and not
'ghc-git-annex'.
Well, your comment points out that a) the description is badly
written and b) the 'relevance' score is too rough.
The command "guix search pandoc" returns as the highest ranked
package: ghc-pandoc-citeproc, with a relevance score of 17. The
package of interest, 'ghc-pandoc', appears at the 6th position with a
relevance score of 8. (That is, after emacs-pandoc-mode,
ghc-pandoc-types, emacs-ox-pandoc and python-pandocfilters; all less
relevant packages, IMO.)
Why? Because of the number of occurrences of the term 'pandoc' in
synopsis+description+name:
ghc-pandoc-citeproc: 1+5+1
ghc-pandoc: 0+2+1
To be precise, the score uses weights and so it reads:
ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
ghc-pandoc: 3*0 + 2*2 + 4*1 = 8
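
To make the arithmetic concrete, here is a minimal sketch in Python
(not the actual Guix code, which is written in Scheme). The weights 3,
2 and 4 are the ones from the sums above; the function name and the
dictionary layout are hypothetical, chosen only for illustration.

```python
# Per-field weights, taken from the sums above (synopsis, description, name).
WEIGHTS = {"synopsis": 3, "description": 2, "name": 4}


def relevance(counts):
    """Weighted sum of how often the search term occurs in each field."""
    return sum(WEIGHTS[field] * n for field, n in counts.items())


# Occurrences of 'pandoc' per field, as counted above.
citeproc = {"synopsis": 1, "description": 5, "name": 1}
pandoc = {"synopsis": 0, "description": 2, "name": 1}

print(relevance(citeproc))  # 17
print(relevance(pandoc))    # 8
```
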
And the rename bumps the score because there is an additional weight
(5) for an exact match (which normally happens only for the 'name'
field):
ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
pandoc: 3*0 + 2*2 + 4*1*5 = 24
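
The same computation with the exact-match factor can be sketched as
follows. Again this is only an illustrative Python sketch under the
assumption that the factor simply multiplies the field's contribution,
as in the sum above; the names EXACT_MATCH and exact_fields are
hypothetical.

```python
WEIGHTS = {"synopsis": 3, "description": 2, "name": 4}
EXACT_MATCH = 5  # additional weight when a field is exactly the search term


def relevance(counts, exact_fields=()):
    """Weighted sum; fields listed in exact_fields get the extra factor."""
    score = 0
    for field, n in counts.items():
        factor = EXACT_MATCH if field in exact_fields else 1
        score += WEIGHTS[field] * n * factor
    return score


counts = {"synopsis": 0, "description": 2, "name": 1}

print(relevance(counts))                         # 8  (named 'ghc-pandoc')
print(relevance(counts, exact_fields={"name"}))  # 24 (named 'pandoc')
```
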
It apparently fixes the issue, and now the package named 'pandoc' will
show up first. But it is an artefact, because it is easy* to find other
weights that invalidate this expected ranking; the current weights
are a working rule of thumb and not deeply thought out, AFAIK.
*For example, instead of 5, let us choose 2; then the score becomes
3*0+2*2+4*1*2=12, which is less than 17. Well, not so easy, because 2
is the same as the weight for 'description' and that seems less
natural; i.e., it appears more natural to have a high weight for an
exact match. But the point is: it is possible to find another working
rule of thumb which will not return the expected result for all the
packages.
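
A quick way to see how fragile the ranking is: vary the exact-match
weight and compare against the 17 of ghc-pandoc-citeproc. This is a
small Python sketch of that check; the function name and the range of
weights tried are my own choices, not anything from Guix.

```python
# Score of ghc-pandoc-citeproc, unaffected by the rename.
CITEPROC = 3 * 1 + 2 * 5 + 4 * 1


def renamed_score(exact_weight):
    """Score of the package renamed to 'pandoc' for a given exact-match weight."""
    return 3 * 0 + 2 * 2 + 4 * 1 * exact_weight


for w in range(1, 6):
    s = renamed_score(w)
    print(w, s, "pandoc first" if s > CITEPROC else "citeproc first")
```

With these counts, the renamed package only comes first when the
exact-match weight is 4 or more.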
The real problem is not the non-obvious name (ghc-pandoc instead of
simply pandoc); it is that a) some descriptions are badly written and
b) the 'relevance' scoring function is not "smart" enough to detect
them.
All the best,
simon