What is the end-of-line character when reading a file in C++ with get(char& c)?
My issue: this is my first attempt at writing a very basic lexical analyzer for ASCII text files. So far it reads characters and compares them against my token list properly; however, I am unable to grab the final token unless it is followed by a space or a newline (pressing Enter). I've tried adding ^Z (ASCII 26) as another delimiter before comparing the string to my token list, but that didn't work. I've also tried moving the f->eof() check below the comparison, hoping the last token would be captured before the EOF flag is checked, with no luck. Could anyone enlighten me? The code for the read method is below; m_TokenList is just a vector of type string.
```cpp
void CelestialAnalyzer::ReadInTokens(ifstream *f){
    vector<string> statement;
    vector<string> tokens;
    string token;
    char c;
    do{
        f->get(c); // Read in each character
        if(f->eof()) // Stop once get() has hit end of file
            break;
        if(c == '\n' || c == ' ' || c == '\x1A' || c == '\r'){ // '\x1A' is ASCII 26 (^Z)
            for(unsigned int i=0; i<m_TokenList.size(); i++){
                if(!token.compare(m_TokenList[i])){
                    tokens.push_back(token);
                    token.clear();
                }
            }
        } else {
            token.push_back(c); // Append it to the current token
        }
    } while (true);
    f->close();
    for(unsigned int i=0; i<tokens.size(); i++){
        cout << "Found Token: " << tokens[i].c_str() << endl;
    }
}
```
m_TokenList is initialized in the constructor:
```cpp
CelestialAnalyzer::CelestialAnalyzer(){
    m_TokenList.push_back("KEY");      // Prints data
    m_TokenList.push_back("GETINPUT"); // Grabs user data
    m_TokenList.push_back("+");        // Addition/concatenation
    m_TokenList.push_back("-");        // Subtraction
    m_TokenList.push_back("==");       // Equality comparison
    m_TokenList.push_back("=");        // Assignment
    m_TokenList.push_back(";");        // End statement
    m_TokenList.push_back(" ");        // Blank
    m_TokenList.push_back("{");        // Open grouping
    m_TokenList.push_back("}");        // Close grouping
    m_TokenList.push_back("(");        // Parameter opening
    m_TokenList.push_back(")");        // Parameter closing
    for(char d = '0'; d <= '9'; d++){  // Digits 0-9 (ASCII 48-57)
        m_TokenList.push_back(string(1, d));
    }
}
```
A simple test file for reading contains just this line:

```
1 + 2 = KEY
```

The analyzer registers every token except KEY unless a space or newline follows it.
Why don't you just delete

```cpp
if(f->eof())
    break;
```

and use

```cpp
if(f->eof() || c == '\n' || c == ' ' || c == '\x1A' || c == '\r'){
```

and then, after the token comparison, break if f->eof() is set? That way, when you hit EOF, whatever token remains will still be added.
Alternatively, you could just check whether the token is nonempty after you break out of the loop, and add it in that case.
What about a double newline? As far as I know, several messaging protocols treat \r\n\r\n as the end of a message. I think that's a reasonable convention. :)