From debbugs-submit-bounces@debbugs.gnu.org Mon Jun 01 18:24:25 2020 Received: (at 39258) by debbugs.gnu.org; 1 Jun 2020 22:24:25 +0000 Received: from localhost ([127.0.0.1]:37268 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jfsr2-0003Nf-UX for submit@debbugs.gnu.org; Mon, 01 Jun 2020 18:24:25 -0400 Received: from wout2-smtp.messagingengine.com ([64.147.123.25]:54225) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jfsr1-0003NT-6y for 39258@debbugs.gnu.org; Mon, 01 Jun 2020 18:24:23 -0400 Received: from compute2.internal (compute2.nyi.internal [10.202.2.42]) by mailout.west.internal (Postfix) with ESMTP id 55B4E9CD; Mon, 1 Jun 2020 18:24:17 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Mon, 01 Jun 2020 18:24:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=famulari.name; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=mesmtp; bh=rBTGbzaqNCLwuyZUt/axayMN HtAcCyX93g9xiJ3WoI4=; b=M701//WVMnOPSw/SxpKkPiXc9kawGh2u13zFz2Ci Zks3puqNz++L3YQCiUtIG479gD9sjpeHKCGk3CXppuUx3I418mug1Zp2XxfqM/ll RDV7Yh17qLgSjlS21MKRoTvEFu7loy9MCOd2mX3LjvnVgPW2JT6rrrYYdPztwwXc vhQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; bh=rBTGbz aqNCLwuyZUt/axayMNHtAcCyX93g9xiJ3WoI4=; b=NxKDv9XSSkWWgOhlhBAK31 Zkwa0WNeUpIj0XB3XCUtRQKNjiFEWQntqiwwV32PmRHh7Bqn2x/AgOXz5yqGbYHD YKYdDm6qBQkK/g20s/CZNyBAkQb9gK7+3q14GcJm/WWfF4ygOf8HCXkujYuYWztJ OigzcDJx3fvGLkuXX+88KGyoKo5W/VypPhlD7c0U49fi36L3eu5qsiqJToOuw/u+ F2XzY+Res265f9nBtGvQXxct5jacM8Zpd520uvfOB8A4q8ScgTsPCfHPni1Y/2NB p+VzBggOwgm5NBr00Gg72nNdIcbYAYwq711KrMGWn/RKQzTTg5OcztdUGA7/b6XQ == X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduhedrudefiedgtdelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehttdertddttddvnecuhfhrohhmpefnvghoucfh rghmuhhlrghrihcuoehlvghosehfrghmuhhlrghrihdrnhgrmhgvqeenucggtffrrghtth gvrhhnpeektdfhtdeigeejleeiffeuuedtiedttddvhfdvhfejgfdujeeiveeifefhtdej veenucffohhmrghinhepfihikhhiphgvughirgdrohhrghenucfkphepjeeirdduvdegrd dufeekrdeifeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhr ohhmpehlvghosehfrghmuhhlrghrihdrnhgrmhgv X-ME-Proxy: Received: from localhost (c-76-124-138-63.hsd1.pa.comcast.net [76.124.138.63]) by mail.messagingengine.com (Postfix) with ESMTPA id 64B2B30624CC; Mon, 1 Jun 2020 18:24:16 -0400 (EDT) Date: Mon, 1 Jun 2020 18:24:14 -0400 From: Leo Famulari To: zimoun Subject: Re: [bug#39258] KMP string search algorithm? Message-ID: <20200601222414.GA30829@jasmine.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 39258 Cc: Arun Isaac , Ludovic =?iso-8859-1?Q?Court=E8s?= , 39258@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) On Mon, Jun 01, 2020 at 12:11:52PM +0200, zimoun wrote: > Dear, > > > > Often our search strings are only literal strings. So, we can save some time > > > by using string-contains instead of invoking the regexp engine. Patch 2 does > > > this. In addition, guile's string-contains uses a naive O(n^2) string search > > > algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In > > > fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code > > > mentions this. If implemented, the KMP algorithm could speed up guix search > > > further. > > > > > > [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm > > It could improve. > Well, I will try to do some back-to-envelop computations because I am > not convinced that the mean value of 'n' (length of description, > isn't) is large enough to really see an improvement for the end-user; > the visible bottleneck is I/O. > > All the best, > simon > > ps; > To be honest, I thought this kind of algorithm was the default. :-) I also recommend taking a look at the Boyer Moore string search implementation in (guix build grafts). It would be great to generalize it and make it accessible to other parts of Guix.