O(n) string handling in Scheme
Background: I've been writing a little interpreter in Scheme (R5RS).
The reader/lexer takes a (sometimes long) string from input and tokenises it. It does this by matching the first few charac开发者_如何转开发ters of the string against some token and returning the token and the remaining unmatched part of the string.
Problem: to return the remaining portion of the string, a new string is created every time a token is read. This means the reader is O(n^2) in the number of tokens present in the string.
Possible solution: convert the string to a list, which can be done in time O(n), then pull tokens from the list instead of the string, returning the remainder of the list instead of the remainder of the string. But this seems terribly inefficient and artificial.
Question: am I imagining it, or is there just no other way to do this efficiently in Scheme due to its purely functional outlook?
Edit: in R5RS Scheme, there isn't a way to return a pointer into a string. The "substring" function is the only function which extracts an object which is itself a string. But the Scheme standard insists this be a newly allocated string. Why? Because strings are not immutable in Scheme R5RS, e.g. see the "string-set!" function!!
One solution suggested below which works is to store an index into the string. Then one can read off the characters one at a time from that index until a token is read. Too bad the regexp library I'm using for the tokenisation requires an actual string not an index into one...
Consider making a shared-substring implementation of strings (this is how Java does it, for example). So when you want to grab a substring of a given string, rather than copying the characters, simply keep a pointer to (some location in) those characters, and a length.
精彩评论