From: Arun Isaac
To: Ludovic Courtès
Cc: mail@ambrevar.xyz, 39258@debbugs.gnu.org, zimon.toutoune@gmail.com
Subject: Re: [PATCH v2 0/3] Xapian for Guix package search
Date: Mon, 09 Mar 2020 01:57:40 +0530

>> This could be accomplished even with pre-rendering. Xapian provides
>> "slots" to store arbitrary strings with a document. Instead of storing
>> the pre-rendered document as a whole, we could store pre-rendered fields
>> in separate slots. Then, during `guix search` time, we can assemble the
>> result from these pre-rendered fields.
>
> I’m not sure I understand. The index wouldn’t store pre-rendered
> strings for every possible terminal width, right?

No, it wouldn't. It would store a partially pre-rendered string, that is
without fill-paragraph. We run fill-paragraph at `guix search` time to
complete the rendering.

> I think we need to take the whole user experience into account, not
> just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a
> fairly common operation. Conversely, ‘guix search’ takes roughly
> between 0.5 and 2 seconds and is an uncommon operation on a “slow
> path” (in the sense that when you’re searching for software, you’ll
> probably have to spend more than a couple of seconds to find what
> you’re looking for.)

I agree we can't compromise too much on `guix pull` performance.

> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/

Maybe I'm missing something here. guix pull takes around 40 minutes on my
machine. In comparison to that, is another 20-50 seconds (roughly 1
minute) a big deal? How much time would it be acceptable to spend on
building the Xapian index?

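To make the partial pre-rendering idea concrete, here is a rough sketch in
Python rather than Guile. It is only an illustration: a plain dict stands in
for the per-document Xapian slots, and the slot numbers, the function names
and the recutils-style "+ " continuation prefix are assumptions of the
sketch, not what the patch does. Fields are stored unfilled at index time
and only wrapped to the terminal width at search time.

```python
import textwrap

# Hypothetical slot numbers; a dict stands in for a document's Xapian slots.
SLOT_NAME, SLOT_SYNOPSIS, SLOT_DESCRIPTION = 0, 1, 2


def index_package(name, synopsis, description):
    """Index time: store each field as plain text, but NOT filled."""
    return {
        SLOT_NAME: name,
        SLOT_SYNOPSIS: synopsis,
        # Collapse whitespace so later filling ignores the source layout.
        SLOT_DESCRIPTION: " ".join(description.split()),
    }


def render_result(slots, width):
    """Search time: assemble the record and fill the description to WIDTH."""
    description = textwrap.fill(
        slots[SLOT_DESCRIPTION],
        width=width,
        initial_indent="description: ",
        subsequent_indent="+ ",
    )
    return "\n".join([
        "name: " + slots[SLOT_NAME],
        "synopsis: " + slots[SLOT_SYNOPSIS],
        description,
    ])
```

With this split, the same stored fields can be rendered for any terminal
width, for example `render_result(slots, 40)` and `render_result(slots, 80)`,
without storing one string per width in the index.
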
Also, is it possible to somehow provide substitutes for the Xapian index
so that the user does not have to actually build it locally during `guix
pull` time?

> I’m not sufficiently familiar with Xapian’s query language. The
> examples I had in mind were:

> It’s not so much about regexps than it is about selecting individual
> fields.

I have totally not tested this, but I imagine that equivalent Xapian
queries might look something like:

> guix search | recsel -p name -e 'license ~ "LGPL 3"'

guix search license:LGPL3

> guix search crypto library | \
>   recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis

guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python) AND (NOT ruby)

> What I meant was that we could use (statprof) to see whether/how Texinfo
> rendering/parsing can be optimized.

Oh, ok. I'll try this if we decide not to pre-render.
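
For illustration, the query strings imagined above could be assembled
mechanically from search words, field filters and exclusions. The sketch
below is Python and purely hypothetical: `build_query` is a made-up helper,
and the `license:` prefix and the `AND (NOT ...)` form are taken from the
untested examples in this message, not checked against Xapian's query
parser. Whether a bare `(NOT x)` clause is accepted may depend on how the
parser is configured. Note also that excluding the word `ghc` is looser than
the recsel filter, which only rejects names that start with `ghc`.

```python
def build_query(terms, exclude=(), fields=None):
    """Build a boolean query string in the style imagined in this message.

    terms   -- free-text words to search for, e.g. ["crypto", "library"]
    exclude -- words whose matches should be dropped, e.g. ["ghc", "perl"]
    fields  -- optional mapping of field prefix to value,
               e.g. {"license": "LGPL3"} (prefix names are assumptions)
    """
    parts = []
    if terms:
        parts.append(" ".join(terms))
    for prefix, value in (fields or {}).items():
        parts.append(f"{prefix}:{value}")
    # One negated clause per excluded word, joined with AND like the rest.
    parts.extend(f"(NOT {word})" for word in exclude)
    return " AND ".join(parts)
```

For example, `build_query(["crypto", "library"], exclude=["ghc", "perl",
"python", "ruby"])` yields the second query string shown above, and
`build_query([], fields={"license": "LGPL3"})` yields `license:LGPL3`.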