Regexp and escaped characters in Scheme
In Scheme, I have the string
"hello hellu-#\"hella.helloo,hallo#\return#\""
and I want to split it into ("hello" "hellu" "hella" "helloo" "hallo"), separating on space, hyphen, double quote, dot, comma, and return.
I tried
(regexp-split #rx"( +)|(#\-)|(#\")|(#\.)|(,)|(#\return)" string)
but #\- and #\. cause errors.
Any hints or solutions?
Thanks
It looks like you're confusing the syntax for characters (#\foo) with the syntax for strings, and you do that in both the string and the regexp. So my guess is that the string you want to split is actually:
"hello hellu-\"hella.helloo,hallo\n\""
where \" stands for a double quote character and \n for a newline. If that's the case, then (again, guessing your intention) the regexp should be:
(regexp-split #rx"( +)|(\-)|(\")|(\.)|(,)|(\n)" string)
But that doesn't work either, since \- and \. are invalid escapes in a string literal (Racket uses C-like escapes), so change that to:
(regexp-split #rx"( +)|(-)|(\")|(.)|(,)|(\n)" string)
This doesn't work either, since . has its usual "any character" meaning in a regexp, so you want to escape it with a backslash. As with many other string syntaxes, you write a backslash by escaping it with another backslash, so now we have a version that is finally close to working:
> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)" string)
'("hello" "hellu" "" "hella" "helloo" "hallo" "" "")
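To see why the extra backslash matters, here is a small sketch comparing an unescaped . with \\. on a tiny made-up input ("a.b"); the variable names are just for illustration.

```racket
#lang racket
;; An unescaped . in a regexp matches any single character.
(define any-char-matches (regexp-match* #rx"." "a.b"))
;; "\\." in a string literal is the two characters \ and .,
;; which the regexp reads as \. : a literal dot only.
(define dot-only-matches (regexp-match* #rx"\\." "a.b"))
;; The string literal "\\." really is two characters long.
(define escaped-dot-length (string-length "\\."))

(writeln any-char-matches)   ; prints ("a" "." "b")
(writeln dot-only-matches)   ; prints (".")
(writeln escaped-dot-length) ; prints 2
```

With the unescaped version, every character of the input counts as a separator, which is why the previous attempt could not work.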
First, the regexp can be improved considerably: the parens are not needed for splitting:
(regexp-split #rx" +|-|\"|\\.|,|\n" string)
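A quick check of that claim, as a sketch: both regexps below match the same separators, so regexp-split cuts the string in the same places. The names input, with-groups, and without-groups are only for illustration.

```racket
#lang racket
;; The string from the example above.
(define input "hello hellu-\"hella.helloo,hallo\n\"")

;; Every alternative wrapped in its own group.
(define with-groups
  (regexp-split #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)" input))

;; The same alternatives without any groups.
(define without-groups
  (regexp-split #rx" +|-|\"|\\.|,|\n" input))

;; The groups don't change what is matched, so the splits agree.
(writeln (equal? with-groups without-groups)) ; prints #t
```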
Then, instead of a bunch of single characters joined with |, you can just use a "character range":
(regexp-split #rx" +|[-\".,\n]" string)
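A small sketch of how a range behaves, using made-up inputs: the separator range matches each listed character on its own, and a - between two characters means "everything in between", while a leading - is just a literal hyphen.

```racket
#lang racket
;; The separator range matches each separator character individually.
(define separators
  (regexp-match* #rx"[-\".,\n]" "a-b\"c.d,e\nf"))
(writeln separators) ; prints ("-" "\"" "." "," "\n")

;; Between two characters, - denotes a range: [a-c] is a, b, or c.
(define b-in-a-to-c (regexp-match? #rx"[a-c]" "b"))      ; #t
;; As the first character, - is literal: [-ac] is only -, a, or c.
(define b-in-dash-a-c (regexp-match? #rx"[-ac]" "b"))    ; #f
(define dash-in-dash-a-c (regexp-match? #rx"[-ac]" "-")) ; #t
```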
Note that it's important that the -
is the first (or last) character in the range, so it will not have the usual meaning of a range of characters. Next, it seems that you really want any sequence of such characters to be used as a separator, which will avoid some of those empty strings in the result:
(regexp-split #rx" +|[-\".,\n]+" string)
and in this case you can just as well put the space into the range too (carefully placing it after the -, as explained above). We now get:
> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"[- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo" "")
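The original question mentions #\return, which is the carriage return character (written \r inside a string), not the newline \n used above. As a sketch, assuming a hypothetical input whose line ends with \r\n, adding \r to the range makes carriage returns separate words too:

```racket
#lang racket
;; Hypothetical input ending its line with a carriage return and a newline.
(define input "hello hellu-\"hella.helloo,hallo\r\n\"")

;; The range from above: \r is not a separator, so it sticks to "hallo".
(define without-cr (regexp-split #rx"[- \".,\n]+" input))
(writeln without-cr) ; prints ("hello" "hellu" "hella" "helloo" "hallo\r" "")

;; The same range with \r added.
(define with-cr (regexp-split #rx"[- \".,\r\n]+" input))
(writeln with-cr)    ; prints ("hello" "hellu" "hella" "helloo" "hallo" "")
```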
And finally, you'd probably want to get rid of that last empty string. Technically it should be there, since there is a sequence of separator characters right before the end of the string. An easy way around this in Racket is to use the complementary regexp-match*, which returns the list of matches rather than splitting on them:
> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[- \".,\n]+" string)
'(" " "-\"" "." "," "\n\"")
This is obviously broken, since it gives you the separators rather than what's between them. But since this regexp is a single character range, the fix is easy: negate the range, and you get what you want:
> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[^- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo")
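For reuse, the final regexp can be wrapped in a function. This is a sketch: the names split-words and split-words/filter are hypothetical, and \r is added to the range (an assumption) to also cover the #\return from the question. The second version keeps regexp-split and simply drops the empty strings, which is another way to reach the same result.

```racket
#lang racket
;; Collect the runs of non-separator characters (negated range).
(define (split-words str)
  (regexp-match* #rx"[^- \".,\r\n]+" str))

;; Alternative: split on runs of separators, then drop empty strings.
(define (split-words/filter str)
  (filter (lambda (s) (not (string=? s "")))
          (regexp-split #rx"[- \".,\r\n]+" str)))

(define input "hello hellu-\"hella.helloo,hallo\n\"")
(writeln (split-words input))        ; prints ("hello" "hellu" "hella" "helloo" "hallo")
(writeln (split-words/filter input)) ; prints ("hello" "hellu" "hella" "helloo" "hallo")
```

The filter version also removes the empty string regexp-split produces when the input starts with a separator, so both functions should agree on such inputs as well.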