Reading tokens from file (complicated)
I have a basic tokenization structure/algorithm in place. It's pretty complicated, and I hope I can clarify it simply enough to enlighten you about the "flaw" in my design.
class ParserState
{
    // bool functions return false if getline() or stream extraction '>>' fails
    static bool nextLine(); // reads and tokenizes next line from file and puts it in m_buffer
    static bool nextToken(); // gets next token from m_buffer, via fetchToken(), and puts it in m_token
    static bool fetchToken( std::string &token ); // procures next token from file/buffer
    static size_t m_lineNumber;
    static std::ifstream m_fstream;
    static std::string m_buffer;
    static std::string m_token;
};
The reason for this setup is to be able to report the line number if a syntax error occurs. Depending on the phase/state of the parser, different things happen in my program, and subclasses of this ParserState use m_token and nextToken to continue. fetchToken calls nextLine if m_buffer is empty, and puts the next token in its argument:
istringstream stream;
do // read new line until valid token can be extracted
{
    Debug(5) << "m_buffer contains: " << m_buffer << "\n";
    stream.str( m_buffer );
    if( stream >> token )
    {
        Debug(5) << "Token extracted: " << token << "\n";
        m_token = token;
        return true; // return when token found
    }
    stream.clear();
} while( nextLine() );
// if no tokens can be extracted from the whole file, return false
return false;
The problem is that the token extracted from m_buffer isn't removed from it, so the same token gets read with every call to nextToken(). The complication is that m_buffer can be modified, hence the call to istringstream::str in the loop. But this is the cause of my issue, and as far as I can see, it can't be worked around, hence my question: how can I let stream >> token remove something from the string held internally by the stringstream? Perhaps I need to not use a stringstream, but something more elementary in this situation (like finding the first whitespace and cutting the first token from the string)?
Thanks a billion!
PS: any suggestions altering my function/class structure are OK as long as they allow line numbers to be kept track of (so no full file read into m_buffer and a class member istringstream, which is what I had before I wanted line number error reporting).
Why not simply make m_buffer a std::istringstream instead of a std::string? You would remove a temporary variable as well as get the desired effect. Whenever you change m_buffer in statements such as
m_buffer = ...
write this instead:
m_buffer.str(...);
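For what it's worth, here is a minimal sketch of that idea. The class name LineTokenizer and its non-static members are my own assumptions for illustration, not the asker's actual ParserState. One detail to watch: once an extraction runs off the end of a line, the stream's eofbit/failbit stay set, so the sketch calls clear() before loading the next line with str().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for ParserState: m_buffer is an istringstream,
// so each successful '>>' consumes the token it returns.
class LineTokenizer
{
public:
    explicit LineTokenizer( std::istream &in ) : m_in( in ), m_lineNumber( 0 ) {}

    // Puts the next token in 'token'; returns false once the input is exhausted.
    bool fetchToken( std::string &token )
    {
        while( !( m_buffer >> token ) ) // current line has no more tokens
        {
            if( !nextLine() )
                return false;
        }
        return true;
    }

    // Number of the line the most recent token came from (1-based).
    std::size_t lineNumber() const { return m_lineNumber; }

private:
    // Reads the next line into m_buffer; returns false if getline() fails.
    bool nextLine()
    {
        std::string line;
        if( !std::getline( m_in, line ) )
            return false;
        ++m_lineNumber;
        m_buffer.clear();     // reset eofbit/failbit left by the failed extraction
        m_buffer.str( line ); // replace the contents; reading restarts at the beginning
        return true;
    }

    std::istream &m_in;
    std::istringstream m_buffer;
    std::size_t m_lineNumber;
};
```

Because the stream object persists between calls, its read position persists too, which is exactly what the temporary istringstream in the question was losing.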
To avoid reading the same token multiple times, I think you have to get the position in the stream using tellg and then restore it using seekg. However, using std::istringstream looks like overkill to me here. I would rather work with m_buffer directly.
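Working on the string directly could look roughly like the sketch below. The helper name popToken and the exact whitespace set are assumptions of mine; the point is only that the token is cut out of the buffer, so the next call cannot see it again.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: removes the first whitespace-delimited token from
// 'buffer' and stores it in 'token'. Returns false (and empties the buffer)
// if only whitespace is left.
bool popToken( std::string &buffer, std::string &token )
{
    static const char *const whitespace = " \t\r\n\f\v";

    const std::size_t begin = buffer.find_first_not_of( whitespace );
    if( begin == std::string::npos )
    {
        buffer.clear();
        return false;
    }

    // 'end' is npos when the token runs to the end of the buffer;
    // substr() and erase() both clamp the count in that case.
    const std::size_t end = buffer.find_first_of( whitespace, begin );
    token = buffer.substr( begin, end - begin );
    buffer.erase( 0, end );
    return true;
}
```

fetchToken would then loop on popToken( m_buffer, token ) and call nextLine() whenever it returns false.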
The usual scheme for handling line-number reporting is to read lines one at a time, as you have, incrementing the line count, and then as your tokenizer starts to build a token, it takes a snapshot of the line number and stores it into the token data structure (typically containing the line number, token type, and token value if any).
This decouples line-reading from token building without losing the line number. It also means you can have lots of tokens, they can all have line numbers (including different ones), a token can start on one line and finish on another, etc.
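A stripped-down sketch of that scheme follows. The Token struct and the tokenize function are hypothetical names, tokens are just whitespace-separated words, and the token-type field is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token record: each token remembers the line it started on.
struct Token
{
    std::size_t line; // 1-based line number, snapshotted when the token is built
    std::string text; // token value
};

// Reads the input line by line and stamps every token with its line number.
std::vector<Token> tokenize( std::istream &in )
{
    std::vector<Token> tokens;
    std::string line;
    std::size_t lineNumber = 0;

    while( std::getline( in, line ) )
    {
        ++lineNumber;
        std::istringstream words( line );
        std::string word;
        while( words >> word )
            tokens.push_back( Token{ lineNumber, word } );
    }
    return tokens;
}
```

With the line number stored in the token, the parser can report an error location from whatever token it is looking at, no matter how far ahead the reader has already read.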