From: zimoun
Date: Mon, 9 Mar 2020 14:03:06 +0100
Subject: Re: [PATCH v2 0/3] Xapian for Guix package search
To: Ludovic Courtès
Cc: Arun Isaac, Pierre Neidhardt, 39258@debbugs.gnu.org

On Mon, 9 Mar 2020 at 11:29, Ludovic Courtès wrote:

> > Back to the topic: I believe that Xapian is a huge win both for the
> > shell and the future GUI :)
>
> It could be, but we need to consider all the aspects of the story,
> including the maintenance cost and overhead moved to ‘guix pull’.  So
> it’s not so much about “beliefs” at this point, but rather about
> demonstrating what can be done, and I’m glad Arun is exploring that
> space!

I agree.

What is currently tested with Xapian is:

 1- speeding up (or not) using an inverted index
 2- the accuracy using the state of the art of information retrieval (BM25)

About 1- I do not have a strong opinion, even though I find "guix search" terribly slow, as I mentioned earlier (one year ago ;-)).

About 2- as I mentioned earlier, the 'relevance' function could be improved.
Currently, the score is computed by considering only the package itself and not the other packages (the words they use, their number, etc.). BM25 is the state of the art and builds on what I tried to explain some time ago when I showed TF-IDF as an example.

So the question is what the best move is to improve the accuracy. Any such improvement necessarily uses a global index (of terms, at least). On the other hand, the improvement might not pay off if the complexity and burden it adds outweigh the gain itself. Without testing, we cannot say.

Thank you Arun for pushing forward.

All the best,
simon
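
PS: To make the point about the global index concrete, here is a minimal sketch of BM25 in plain Python. It is neither the code from the patch nor Xapian's implementation; the package descriptions are made up for illustration, the function names (tokenize, build_index, bm25) are hypothetical, and k1=1.2, b=0.75 are just commonly used defaults. The thing to notice is that the score of one package depends on statistics collected over all packages (document frequency of each term, average description length), which is why some global index is needed.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Collect the global statistics BM25 needs.

    docs maps a package name to its description (hypothetical data).
    Returns per-package term counts, per-term document frequency,
    and the average description length in tokens.
    """
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each package counts once per term
    avg_len = sum(sum(c.values()) for c in term_counts.values()) / len(docs)
    return term_counts, doc_freq, avg_len


def bm25(query, name, term_counts, doc_freq, avg_len, k1=1.2, b=0.75):
    """BM25 score of package `name` for `query`."""
    counts = term_counts[name]
    length = sum(counts.values())
    n_docs = len(term_counts)
    score = 0.0
    for term in tokenize(query):
        tf = counts[term]
        if tf == 0:
            continue
        df = doc_freq[term]
        # Rare terms (small df) weigh more: this is the global part.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Term frequency saturates and is normalised by description length.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
    return score


# Made-up descriptions, for illustration only.
packages = {
    "xapian": "search engine library",
    "guile-xapian": "guile bindings for the xapian search engine library",
    "emacs": "extensible text editor",
    "recoll": "desktop full text search tool",
}

index = build_index(packages)
ranked = sorted(packages, key=lambda n: bm25("text editor", n, *index), reverse=True)
for name in ranked:
    print(name, round(bm25("text editor", name, *index), 3))
```

With these made-up descriptions, "editor" appears in one package and "text" in two, so "editor" contributes more to the score. A score computed from the package alone cannot make that distinction.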