27.0.60; next-single-char-property-change hangs on bad argument

  • Done
  • quality assurance status badge
Details
3 participants
  • Yuan Fu
  • Eli Zaretskii
  • Federico Tedin
Owner
unassigned
Submitted by
Yuan Fu
Severity
normal
Y
Y
Yuan Fu wrote on 9 Mar 2020 16:40
(name . Bug Report Emacs)(address . bug-gnu-emacs@gnu.org)
D026EA3A-00F6-49ED-A25F-35C33582FE43@gmail.com
If I pass a LIMIT > point-max to next-single-char-property-change, Emacs hangs. Of course, I shouldn’t pass such a bad argument, but next-single-char-property-change should probably error out instead of hanging in a infinite loop IMHO.

Here is the relevant C code:

while (true)
{
position = Fnext_char_property_change (position, limit);
if (XFIXNAT (position) >= XFIXNAT (limit))
{
position = limit;
break;
}

value = Fget_char_property (position, prop, object);
if (!EQ (value, initial_value))
break;
}

If it gets a LIMIT larger than point-max, position can never == limit, so it will loop in the while loop infinitely. I would add a check in the beginning of the function, to signal an out-of-range error. Or maybe set limit to poin-max quietly. Other similar functions could have the same problem, previous-single-char-property-change comes to my mind.

Yuan

In GNU Emacs 27.0.60 (build 1, x86_64-apple-darwin19.3.0, NS appkit-1894.30 Version 10.15.3 (Build 19D76))
of 2020-02-25 built on missSilver
Repository revision: f27187f963e9e36435b508e29256e048799e0ff2
Repository branch: emacs-27
Windowing system distributor 'Apple', version 10.3.1894
System Description: Mac OS X 10.15.3

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
'configure --with-modules --with-pdumper=yes
--oldincludedir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/libxml2/'

Configured features:
RSVG GLIB NOTIFY KQUEUE ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS XIM
NS MODULES THREADS PDUMPER LCMS2

Important settings:
value of $LC_CTYPE: UTF-8
value of $LANG: en_CN.UTF-8
locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
buffer-read-only: t
line-number-mode: t
transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/ns-win ns-win ucs-normalize mule-util term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
tab-bar menu-bar rfn-eshadow isearch timer select scroll-bar mouse
jit-lock font-lock syntax facemenu font-core term/tty-colors frame
minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite charscript charprop case-table epa-hook jka-cmpr-hook help
simple abbrev obarray cl-preloaded nadvice loaddefs button faces
cus-face macroexp files text-properties overlay sha1 md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
threads kqueue cocoa ns lcms2 multi-tty make-network-process emacs)

Memory information:
((conses 16 44008 8151)
(symbols 48 5908 1)
(strings 32 15290 1605)
(string-bytes 1 499152)
(vectors 16 9324)
(vector-slots 8 119382 11662)
(floats 8 19 25)
(intervals 56 177 0)
(buffers 1000 12))
Attachment: file
F
F
Federico Tedin wrote on 13 Apr 2020 15:52
(name . Yuan Fu)(address . casouri@gmail.com)(address . 40000@debbugs.gnu.org)
87d08b333i.fsf@gmail.com
(congratulations on bug #40000! :-)

I'm attaching a patch which solves the issue. Now, all the
`next-*-property-change' functions return LIMIT when it has been
specified and no properties where found up to the buffer's end.

On the other hand, the `previous-*-property-change' functions are a bit
inconsistent for negative values of LIMIT:

(previous-single-char-property-change 2 'test nil -1) -> `args-out-of-range'
(previous-single-property-change 2 'test nil -1) -> -1
(previous-char-property-change 2 -1) -> 1
(previous-property-change 2 nil -1) -> -1

For positive values of LIMIT I didn't find any problems. Although a
negative value for LIMIT doesn't make any sense, only the documentation
for `previous-char-property-change' mentions that "LIMIT is a no-op if
it is less than (point-min)". So maybe this behavior could be added to
the other 3 functions as well.
From 3d82dbc65c26d64f4dbef87e4dd3355eaf093ec0 Mon Sep 17 00:00:00 2001
From: Federico Tedin <federicotedin@gmail.com>
Date: Mon, 13 Apr 2020 15:30:58 +0200
Subject: [PATCH] Prevent hanging in next-single-char-property-change

* src/textprop.c (Fnext_single_char_property_change): Prevent Emacs
from hanging when the value of LIMIT is greater than the buffer's end
position. (Bug#40000)
---
src/textprop.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Toggle diff (15 lines)
diff --git a/src/textprop.c b/src/textprop.c
index 960dba3f8d..146cb2c76b 100644
--- a/src/textprop.c
+++ b/src/textprop.c
@@ -823,7 +823,7 @@ DEFUN ("next-single-char-property-change", Fnext_single_char_property_change,
while (true)
{
position = Fnext_char_property_change (position, limit);
- if (XFIXNAT (position) >= XFIXNAT (limit))
+ if (XFIXNAT (position) >= XFIXNAT (limit) || XFIXNAT (position) >= ZV)
{
position = limit;
break;
--
2.17.1
E
E
Eli Zaretskii wrote on 13 Apr 2020 16:04
(name . Federico Tedin)(address . federicotedin@gmail.com)
83tv1niira.fsf@gnu.org
Toggle quote (8 lines)
> From: Federico Tedin <federicotedin@gmail.com>
> Date: Mon, 13 Apr 2020 15:52:17 +0200
> Cc: 40000@debbugs.gnu.org
>
> I'm attaching a patch which solves the issue. Now, all the
> `next-*-property-change' functions return LIMIT when it has been
> specified and no properties where found up to the buffer's end.

Doesn't that contradict the documentation? It says:

If the property is constant all the way to the end of OBJECT, return the
last valid position in OBJECT.

Your change means this will no longer be true when LIMIT > EOB.

Thanks.
F
F
Federico Tedin wrote on 13 Apr 2020 16:20
(name . Eli Zaretskii)(address . eliz@gnu.org)
878siz31ta.fsf@gmail.com
Eli Zaretskii <eliz@gnu.org> writes:

Toggle quote (17 lines)
>> From: Federico Tedin <federicotedin@gmail.com>
>> Date: Mon, 13 Apr 2020 15:52:17 +0200
>> Cc: 40000@debbugs.gnu.org
>>
>> I'm attaching a patch which solves the issue. Now, all the
>> `next-*-property-change' functions return LIMIT when it has been
>> specified and no properties where found up to the buffer's end.
>
> Doesn't that contradict the documentation? It says:
>
> If the property is constant all the way to the end of OBJECT, return the
> last valid position in OBJECT.
>
> Your change means this will no longer be true when LIMIT > EOB.
>
> Thanks.

Aah OK, I missed that part at the end. But if I evaluate:

(next-single-char-property-change 0 'test "hello" 10000)

It still returns 10000. Is it possible that "return LIMIT if nothing is
found before LIMIT" is meant to take precedence over "return the last
valid position in OBJECT"?
E
E
Eli Zaretskii wrote on 13 Apr 2020 16:31
(name . Federico Tedin)(address . federicotedin@gmail.com)
83sgh7ihj2.fsf@gnu.org
Toggle quote (21 lines)
> From: Federico Tedin <federicotedin@gmail.com>
> Cc: casouri@gmail.com, 40000@debbugs.gnu.org
> Date: Mon, 13 Apr 2020 16:20:01 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > If the property is constant all the way to the end of OBJECT, return the
> > last valid position in OBJECT.
> >
> > Your change means this will no longer be true when LIMIT > EOB.
> >
> > Thanks.
>
> Aah OK, I missed that part at the end. But if I evaluate:
>
> (next-single-char-property-change 0 'test "hello" 10000)
>
> It still returns 10000. Is it possible that "return LIMIT if nothing is
> found before LIMIT" is meant to take precedence over "return the last
> valid position in OBJECT"?

I think this means you are dealing with "undefined behavior". Since
you want to make it well-defined, the results must be consistent,
where they weren't before.

Note that the doc string also says this:

In a string, scan runs to the end of the string, unless LIMIT is non-nil.
In a buffer, if LIMIT is nil or omitted, it runs to (point-max), and the
value cannot exceed that.

So it's quite clear to me that the value should never be more than the
last valid position, both for strings and for buffers. No doubt, no
one called this function with LIMIT beyond the last valid position,
because that would produce an infloop. So a change that will return
the maximum valid position in this case cannot be
backward-incompatible.
F
F
Federico Tedin wrote on 13 Apr 2020 17:27
(name . Eli Zaretskii)(address . eliz@gnu.org)
874ktn2yph.fsf@gmail.com
Toggle quote (17 lines)
> I think this means you are dealing with "undefined behavior". Since
> you want to make it well-defined, the results must be consistent,
> where they weren't before.
>
> Note that the doc string also says this:
>
> In a string, scan runs to the end of the string, unless LIMIT is non-nil.
> In a buffer, if LIMIT is nil or omitted, it runs to (point-max), and the
> value cannot exceed that.
>
> So it's quite clear to me that the value should never be more than the
> last valid position, both for strings and for buffers. No doubt, no
> one called this function with LIMIT beyond the last valid position,
> because that would produce an infloop. So a change that will return
> the maximum valid position in this case cannot be
> backward-incompatible.

I agree that the function should always return equal or smaller than the
last valid position. In case of buffers, fixing LIMIT > (point-max) is
OK because the function wasn't defined for that case anyways, as you
mentioned. However I'm not sure if my patch should fix the case for
strings where LIMIT > (length object), since that would mean that the
returned value would now be (length object) instead of LIMIT. And calls
to this function with these types of values could already exist, since
in this case the function did not hang.
E
E
Eli Zaretskii wrote on 13 Apr 2020 17:54
(name . Federico Tedin)(address . federicotedin@gmail.com)
83pncbidov.fsf@gnu.org
Toggle quote (13 lines)
> From: Federico Tedin <federicotedin@gmail.com>
> Cc: casouri@gmail.com, 40000@debbugs.gnu.org
> Date: Mon, 13 Apr 2020 17:27:06 +0200
>
> I agree that the function should always return equal or smaller than the
> last valid position. In case of buffers, fixing LIMIT > (point-max) is
> OK because the function wasn't defined for that case anyways, as you
> mentioned. However I'm not sure if my patch should fix the case for
> strings where LIMIT > (length object), since that would mean that the
> returned value would now be (length object) instead of LIMIT. And calls
> to this function with these types of values could already exist, since
> in this case the function did not hang.

Then we'd need to clearly document the behavior in each case, I guess.
F
F
Federico Tedin wrote on 14 Apr 2020 23:05
(name . Eli Zaretskii)(address . eliz@gnu.org)
87v9m1dbhc.fsf@gmail.com
Toggle quote (2 lines)
> Then we'd need to clearly document the behavior in each case, I guess.

Ok, I've taken another shot at it. I updated the function's
documentation to reflect what happens with LIMIT when working on a
string and on a buffer, hopefully it's clear enough.
From 972c1d0f2b479ba9f1ad5643ee9ce5d268077ace Mon Sep 17 00:00:00 2001
From: Federico Tedin <federicotedin@gmail.com>
Date: Tue, 14 Apr 2020 23:02:44 +0200
Subject: [PATCH] Prevent hanging in next-single-char-property-change

* src/textprop.c (Fnext_single_char_property_change): Prevent Emacs
from hanging when the value of LIMIT is greater than the buffer's end
position. (Bug#40000)
---
src/textprop.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

Toggle diff (43 lines)
diff --git a/src/textprop.c b/src/textprop.c
index 960dba3f8d..378c158875 100644
--- a/src/textprop.c
+++ b/src/textprop.c
@@ -765,15 +765,15 @@ DEFUN ("next-single-char-property-change", Fnext_single_char_property_change,
the current buffer), POSITION is a buffer position (integer or marker).
If OBJECT is a string, POSITION is a 0-based index into it.
-In a string, scan runs to the end of the string, unless LIMIT is non-nil.
-In a buffer, if LIMIT is nil or omitted, it runs to (point-max), and the
-value cannot exceed that.
-If the optional fourth argument LIMIT is non-nil, don't search
-past position LIMIT; return LIMIT if nothing is found before LIMIT.
+In a string, scan runs to the end of the string, unless LIMIT is
+non-nil, in which case its value is returned if nothing is found
+before it.
-The property values are compared with `eq'.
-If the property is constant all the way to the end of OBJECT, return the
-last valid position in OBJECT. */)
+In a buffer, if LIMIT is nil or omitted, it runs to (point-max). If
+LIMIT is non-nil, scan does not go past it, and the smallest of
+(point-max) and LIMIT is returned.
+
+The property values are compared with `eq'. */)
(Lisp_Object position, Lisp_Object prop, Lisp_Object object, Lisp_Object limit)
{
if (STRINGP (object))
@@ -832,6 +832,12 @@ DEFUN ("next-single-char-property-change", Fnext_single_char_property_change,
value = Fget_char_property (position, prop, object);
if (!EQ (value, initial_value))
break;
+
+ if (XFIXNAT (position) >= ZV)
+ {
+ XSETFASTINT (position, ZV);
+ break;
+ }
}
position = unbind_to (count, position);
--
2.17.1
E
E
Eli Zaretskii wrote on 25 Apr 2020 14:23
(name . Federico Tedin)(address . federicotedin@gmail.com)
831rob92jt.fsf@gnu.org
Toggle quote (10 lines)
> From: Federico Tedin <federicotedin@gmail.com>
> Cc: casouri@gmail.com, 40000@debbugs.gnu.org
> Date: Tue, 14 Apr 2020 23:05:35 +0200
>
> > Then we'd need to clearly document the behavior in each case, I guess.
>
> Ok, I've taken another shot at it. I updated the function's
> documentation to reflect what happens with LIMIT when working on a
> string and on a buffer, hopefully it's clear enough.

Thanks. I have a few minor comments:

Toggle quote (6 lines)
> Subject: [PATCH] Prevent hanging in next-single-char-property-change
>
> * src/textprop.c (Fnext_single_char_property_change): Prevent Emacs
> from hanging when the value of LIMIT is greater than the buffer's end
> position. (Bug#40000)

What you used as the description of the change is actually the
explanation why the change was made. The description of the change
would be something like this:

* src/textprop.c (Fnext_single_char_property_change): Clarify in
the doc string the behavior when LIMIT is past the end of OBJECT.
Stop the search when position gets to end of buffer, for when LIMIT
is beyond that. (Bug#40000)

The explanation should ideally be in the comments to the code. In
this case, I'm not even sure we need an explanation, since there's a
reference to the bug report, and the explanation and the fix are quite
simple.

Toggle quote (18 lines)
> -In a string, scan runs to the end of the string, unless LIMIT is non-nil.
> -In a buffer, if LIMIT is nil or omitted, it runs to (point-max), and the
> -value cannot exceed that.
> -If the optional fourth argument LIMIT is non-nil, don't search
> -past position LIMIT; return LIMIT if nothing is found before LIMIT.
> +In a string, scan runs to the end of the string, unless LIMIT is
> +non-nil, in which case its value is returned if nothing is found
> +before it.
>
> -The property values are compared with `eq'.
> -If the property is constant all the way to the end of OBJECT, return the
> -last valid position in OBJECT. */)
> +In a buffer, if LIMIT is nil or omitted, it runs to (point-max). If
> +LIMIT is non-nil, scan does not go past it, and the smallest of
> +(point-max) and LIMIT is returned.
> +
> +The property values are compared with `eq'. */)

Try to avoid passive tense in documentation, it makes the text longer
and sometimes harder to understand. Here's how I'd rephrase this:

In a string, scan runs to the end of the string, unless LIMIT is non-nil.
In a buffer, scan runs to end of buffer, unless LIMIT is non-nil.
If the optional fourth argument LIMIT is non-nil, don't search
past position LIMIT; return LIMIT if nothing is found before LIMIT.
However, if OBJECT is a buffer and LIMIT is beyond the end of the
buffer, this function returns `point-max', not LIMIT.

The property values are compared with `eq'.

Toggle quote (6 lines)
> + if (XFIXNAT (position) >= ZV)
> + {
> + XSETFASTINT (position, ZV);
> + break;
> + }

Here we expect 'position' to have the value of ZV already, right.
Then there's no need to use XSETFASTINT; if you want to make sure we
never get here with value >= ZV, you can add an assertion.
F
F
Federico Tedin wrote on 3 May 2020 16:04
(name . Eli Zaretskii)(address . eliz@gnu.org)
877dxtglml.fsf@gmail.com
Hey Eli, thanks for the detailed feedback.

Toggle quote (14 lines)
> What you used as the description of the change is actually the
> explanation why the change was made. The description of the change
> would be something like this:
>
> * src/textprop.c (Fnext_single_char_property_change): Clarify in
> the doc string the behavior when LIMIT is past the end of OBJECT.
> Stop the search when position gets to end of buffer, for when LIMIT
> is beyond that. (Bug#40000)
>
> The explanation should ideally be in the comments to the code. In
> this case, I'm not even sure we need an explanation, since there's a
> reference to the bug report, and the explanation and the fix are quite
> simple.

Aah OK, that makes sense. I agree that the bug number should be enough
if someone wants to find an explanation for the changes; I've changed
the commit message to the one you proposed.

Toggle quote (28 lines)
>> -In a string, scan runs to the end of the string, unless LIMIT is non-nil.
>> -In a buffer, if LIMIT is nil or omitted, it runs to (point-max), and the
>> -value cannot exceed that.
>> -If the optional fourth argument LIMIT is non-nil, don't search
>> -past position LIMIT; return LIMIT if nothing is found before LIMIT.
>> +In a string, scan runs to the end of the string, unless LIMIT is
>> +non-nil, in which case its value is returned if nothing is found
>> +before it.
>>
>> -The property values are compared with `eq'.
>> -If the property is constant all the way to the end of OBJECT, return the
>> -last valid position in OBJECT. */)
>> +In a buffer, if LIMIT is nil or omitted, it runs to (point-max). If
>> +LIMIT is non-nil, scan does not go past it, and the smallest of
>> +(point-max) and LIMIT is returned.
>> +
>> +The property values are compared with `eq'. */)
>
> Try to avoid passive tense in documentation, it makes the text longer
> and sometimes harder to understand. Here's how I'd rephrase this:
>
> In a string, scan runs to the end of the string, unless LIMIT is non-nil.
> In a buffer, scan runs to end of buffer, unless LIMIT is non-nil.
> If the optional fourth argument LIMIT is non-nil, don't search
> past position LIMIT; return LIMIT if nothing is found before LIMIT.
> However, if OBJECT is a buffer and LIMIT is beyond the end of the
> buffer, this function returns `point-max', not LIMIT.

Good point, for some reason I am biased towards using the passive
tense. I've changed the documentation string.

Toggle quote (12 lines)
> The property values are compared with `eq'.
>
>> + if (XFIXNAT (position) >= ZV)
>> + {
>> + XSETFASTINT (position, ZV);
>> + break;
>> + }
>
> Here we expect 'position' to have the value of ZV already, right.
> Then there's no need to use XSETFASTINT; if you want to make sure we
> never get here with value >= ZV, you can add an assertion.

Aah right, because `next-char-property-change' never returns
values larger than `point-max'. So in the last iteration, `position'
will be `ZV'. I don't feel like it's necessary to add an assertion
though, unless the code inside the loop somehow was modified to be
longer/more complex. I'm attaching an updated patch.

- Fede
From 6adaeec2a5681bb290a3a70968952780146abf66 Mon Sep 17 00:00:00 2001
From: Federico Tedin <federicotedin@gmail.com>
Date: Sun, 3 May 2020 15:47:56 +0200
Subject: [PATCH] Prevent hanging in next-single-char-property-change

* src/textprop.c (Fnext_single_char_property_change): Clarify in
the doc string the behavior when LIMIT is past the end of OBJECT.
Stop the search when position gets to end of buffer, for when LIMIT
is beyond that. (Bug#40000)
---
src/textprop.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

Toggle diff (35 lines)
diff --git a/src/textprop.c b/src/textprop.c
index 960dba3f8d..0876badc87 100644
--- a/src/textprop.c
+++ b/src/textprop.c
@@ -766,14 +766,13 @@ DEFUN ("next-single-char-property-change", Fnext_single_char_property_change,
If OBJECT is a string, POSITION is a 0-based index into it.
In a string, scan runs to the end of the string, unless LIMIT is non-nil.
-In a buffer, if LIMIT is nil or omitted, it runs to (point-max), and the
-value cannot exceed that.
+In a buffer, scan runs to end of buffer, unless LIMIT is non-nil.
If the optional fourth argument LIMIT is non-nil, don't search
past position LIMIT; return LIMIT if nothing is found before LIMIT.
+However, if OBJECT is a buffer and LIMIT is beyond the end of the
+buffer, this function returns `point-max', not LIMIT.
-The property values are compared with `eq'.
-If the property is constant all the way to the end of OBJECT, return the
-last valid position in OBJECT. */)
+The property values are compared with `eq'. */)
(Lisp_Object position, Lisp_Object prop, Lisp_Object object, Lisp_Object limit)
{
if (STRINGP (object))
@@ -832,6 +831,9 @@ DEFUN ("next-single-char-property-change", Fnext_single_char_property_change,
value = Fget_char_property (position, prop, object);
if (!EQ (value, initial_value))
break;
+
+ if (XFIXNAT (position) >= ZV)
+ break;
}
position = unbind_to (count, position);
--
2.17.1
E
E
Eli Zaretskii wrote on 9 May 2020 09:29
(name . Federico Tedin)(address . federicotedin@gmail.com)
83d07dh8hf.fsf@gnu.org
Toggle quote (6 lines)
> From: Federico Tedin <federicotedin@gmail.com>
> Cc: casouri@gmail.com, 40000@debbugs.gnu.org
> Date: Sun, 03 May 2020 16:04:50 +0200
>
> I'm attaching an updated patch.

Thanks, pushed to the master branch, and closing the bug.
Closed
?