Hi,

Arun Isaac skribis:

>>> It turns out that most of the time is spent in printing and texinfo
>>> rendering of the search results.
>
> Also, when we put all package metadata into the Xapian index, we don't
> have to look up any of the package variables in (gnu packages *) during
> `guix search` time. This also contributes substantially to the speedup.

Yup.

>> In general, pre-rendering doesn’t seem practical to me: the output of
>> ‘guix search’ is locale-dependent (it speaks the user’s language) and
>
> Note that we already need to index package synopses and descriptions in
> all languages. I still haven't implemented this, though.

Oh, right.  Tricky!

>> adjusts to the terminal width (well, this is temporarily broken on
>> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).
>
> This could be accomplished even with pre-rendering. Xapian provides
> "slots" to store arbitrary strings with a document. Instead of storing
> the pre-rendered document as a whole, we could store pre-rendered fields
> in separate slots. Then, during `guix search` time, we can assemble the
> result from these pre-rendered fields.

I’m not sure I understand.  The index wouldn’t store pre-rendered
strings for every possible terminal width, right?

>> Also, if the 12K+ descriptions need to be rendered at the time the user
>> runs ‘guix pull’, the experience may not be great, because it could take
>> a bit of time.
>
> This is a problem, but I would see it as a necessary "compilation"
> step. :-P In fact, this whole patchset speeds up `guix search` by doing
> part of the work of `guix search` ahead of time. So, some such cost is
> unavoidable.

Yeah.  I think we need to take the whole user experience into account,
not just ‘guix search’.  ‘guix pull’ already feels very slow, and it’s a
fairly common operation.
Conversely, ‘guix search’ takes roughly between 0.5 and 2 seconds and is
an uncommon operation on a “slow path” (in the sense that when you’re
searching for software, you’ll probably have to spend more than a couple
of seconds to find what you’re looking for).

>> What I like about the recutils format in this context is that it’s both
>> human- and machine-readable. The examples in the manual show how it can
>> be useful to select the information displayed or to refine the search
>> (info "(guix) Invoking guix package").
>
> Xapian's query language is much more natural (as in natural language)
> than the regexp based techniques we need to use with recutils. I have
> hardly ever used the regexp based search and I suspect many others
> haven't either. Also, refining the search query should be easier to do
> with Xapian. We could even use Xapian's query expansion feature to
> suggest improved queries to the user.

I’m not sufficiently familiar with Xapian’s query language.  The
examples I had in mind were:

  guix search malloc | recsel -p name,version,relevance

  guix search | recsel -p name -e 'license ~ "LGPL 3"'

  guix search crypto library | \
    recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis

It’s not so much about regexps as it is about selecting individual
fields.

>> Were you able to measure the cost of rendering specifically?
>
> generate-package-search-index takes around 50 seconds. If I modify
> generate-package-search-index to not pre-render but simply store the
> package description alone, it takes around 20 seconds. That gives us a
> rough idea of the cost of pre-rendering.

To me, adding 20–50 seconds on ‘guix pull’ would be undesirable.  :-/

>> I think we should look at a profile of ‘package->recutils’, there’s
>> probably room for improvement there.
>
> On quick inspection, most of the time in package->recutils is spent in
> texinfo rendering the description.
> Unless we use the simplified search
> results format as discussed above, we cannot avoid it.

What I meant was that we could use (statprof) to see whether/how Texinfo
rendering/parsing can be optimized.

Thanks,
Ludo’.