Trying to read from a file and skip punctuation in C++, tips?
I'm trying to read from a file, and make a vector of all the words from the file. What I tried to do below is have the user input the filename, and then have the code open the file, and skip characters if they aren't alphanumeric, then input that to a file.
Right now it just closes immediately when I input the filename. Any idea what I could be doing wrong?
#include <vector>
#include <string>
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;
int main()
{
string line; //for storing words
vector<string> words; //unspecified size vector
string whichbook;
cout << "Welcome to the book analysis program. Please input the filename of the book you would like to analyze: ";
cin >> whichbook;
cout << endl;
ifstream bookread;
//could be issue
//ofstream bookoutput("results.txt");
bookread.open(whichbook.c_str());
//assert(!bookread.fail());
if(bookread.is_open()){
while(bookread.good()){
getline(bookread, line);
cout << line;
while(isalnum(bookread)){
word开发者_C百科s.push_back(bookread);
}
}
}
cout << words[];
}
I think I'd do the job a bit differently. Since you want to ignore all but alphanumeric characters, I'd start by defining a locale that treats all other characters as white space:
struct digits_only: std::ctype<char> {
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
std::fill(&rc['0'], &rc['9']+1, std::ctype_base::digit);
std::fill(&rc['a'], &rc['z']+1, std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z']+1, std::ctype_base::upper);
return &rc[0];
}
};
That makes reading words/numbers from the stream quite trivial. For example:
int main() {
char const test[] = "This is a bunch=of-words and 2@numbers#4(with)stuff to\tseparate,them, I think.";
std::istringstream infile(test);
infile.imbue(std::locale(std::locale(), new digits_only));
std::copy(std::istream_iterator<std::string>(infile),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}
For the moment, I've copied the words/numbers to standard output, but copying to a vector just means giving a different iterator to std::copy
. For real use, we'd undoubtedly want to get the data from an std::ifstream
as well, but (again) it's just a matter of supplying the correct iterator. Just open the file, imbue it with the locale, and read your words/numbers. All the punctuation, etc., will be ignored automatically.
The following would read every line, skip non-alpha numeric characters and add each line as an item to the output vector. You can adapt it so it outputs words instead of lines. I did not want to provide the entire solution, as this looks a bit like a homework problem.
#include <vector>
#include <sstream>
#include <string>
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
string line; //for storing words
vector<string> words; //unspecified size vector
string whichbook;
cout << "Welcome to the book analysis program. Please input the filename of the book you would like to analyze: ";
cin >> whichbook;
cout << endl;
ifstream bookread;
//could be issue
//ofstream bookoutput("results.txt");
bookread.open(whichbook.c_str());
//assert(!bookread.fail());
if(bookread.is_open()){
while(!(bookread.eof())){
line = "";
getline(bookread, line);
string lineToAdd = "";
for(int i = 0 ; i < line.size(); ++i)
{
if(isalnum(line[i]) || line[i] == ' ')
{
if(line[i] == ' ')
lineToAdd.append(" ");
else
{ // just add the newly read character to the string 'lineToAdd'
stringstream ss;
string s;
ss << line[i];
ss >> s;
lineToAdd.append(s);
}
}
}
words.push_back(lineToAdd);
}
}
for(int i = 0 ; i < words.size(); ++i)
cout << words[i] + " ";
return 0;
}
精彩评论