Re: sscanf: implement basic character sets

From: Jessica Yu
Date: Wed Feb 24 2016 - 00:40:37 EST


+++ Rasmus Villemoes [23/02/16 23:47 +0100]:
On Tue, Feb 23 2016, Jessica Yu <jeyu@xxxxxxxxxx> wrote:

Implement basic character sets for the '%[]' conversion specifier.


lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 525c8e1..983358a 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
num++;
}
continue;
+ case '[':
+ {
+ char *s = (char *)va_arg(args, char *);
+ char *set;
+ size_t (*op)(const char *str, const char *set);
+ size_t len = 0;
+ bool negate = (*(fmt) == '^');
+
+ if (field_width == -1)
+ field_width = SHRT_MAX;
+
+ op = negate ? &strcspn : &strspn;
+ if (negate)
+ fmt++;
+
+ len = strcspn(fmt, "]");
+ /* invalid format; stop here */
+ if (!len)
+ return num;
+
+ set = kstrndup(fmt, len, GFP_KERNEL);
+ if (!set)
+ return num;
+
+ /* advance fmt past ']' */
+ fmt += len + 1;
+
+ len = op(str, set);
+ /* no matches */
+ if (!len) {
+ kfree(set);
+ return num;
+ }
+
+ while (len-- && field_width--)
+ *s++ = *str++;
+ *s = '\0';
+ kfree(set);
+ num++;
+ }
+ continue;
case 'o':
base = 8;
break;

(1) How do we know that doing a memory allocation would be ok, and then
with GFP_KERNEL? vsnprintf can be called from just about any context, so
I don't think that would fly there. Sooner or later someone is going to
be calling sscanf with a spinlock held, methinks.

(2) I think a field width should be mandatory (so %[ should simply be
regarded as malformed - it should be %*[ or %n[ for some explicit
decimal n). That will allow the compiler or other static analyzers to do
sanity checking, and we'll probably be saved from a few buffer
overflows down the line.

It's a bit sad that the C standard doesn't include the terminating '\0'
in the field width, so one would sometimes have to write
'(int)sizeof(buf)-1', but there's not much to do about that. On that
note, it seems that your field width handling is off-by-one.

To get rid of the allocation, why not use a small bitmap? Something like

{
char *s = (char *)va_arg(args, char *);
DECLARE_BITMAP(map, 256) = {0};
bool negate = false;

/* a field width is required, and must provide room for at least a '\0' */
if (field_width <= 0)
return num;

if (*fmt == '^') {
negate = true;
++fmt;
}
for ( ; *fmt && *fmt != ']'; ++fmt)
set_bit((u8)*fmt, map);
if (!*fmt) // no ], so malformed input
return num;
++fmt;
if (negate) {
bitmap_complement(map, map, 256);
clear_bit(0, map); // this avoids testing *str != '\0' below
}

if (!test_bit((u8)*str, map)) // match must be non-empty
return num;
while (test_bit((u8)*str, map) && --field_width) {
*s++ = *str++;
}
*s = '\0';
++num;
}

I quite like this idea, as it avoids allocations and doesn't need
strcspn/strspn. What do other people think?