Reading formatted data with C++'s stream operator >> when data has spaces
I have data in the following format:
4:How do you do?
10:Happy birthday
1:Purple monkey dishwasher
200:The Ancestral Territorial Imperatives of the Trumpeter Swan
The number can be anywhere from 1 to 999, and the string is at most 255 characters long. I'm new to C++, and it seems a few sources recommend extracting formatted data with a stream's >> operator, but when I want to extract a string it stops at the first whitespace character. Is there a way to configure a stream to stop parsing a string only at a newline or end-of-file? I saw that there was a getline method to extract an entire line, but then I still have to split it up manually [with find_first_of], don't I?
Is there an easy way to parse data in this format using only STL?
The C++ String Toolkit Library (StrTk) has the following solution to your problem:
#include <string>
#include <deque>
#include "strtk.hpp"
int main()
{
   struct line_type
   {
      unsigned int id;
      std::string str;
   };
   std::deque<line_type> line_list;
   const std::string file_name = "data.txt";
   strtk::for_each_line(file_name,
                        [&line_list](const std::string& line)
                        {
                           line_type temp_line;
                           const bool result = strtk::parse(line,
                                                            ":",
                                                            temp_line.id,
                                                            temp_line.str);
                           if (!result) return;
                           line_list.push_back(temp_line);
                        });
   return 0;
}
More examples can be found here.
You can read the number before you use std::getline, which reads from a stream and stores into a std::string object. Something like this:
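int num;
char colon;  // consumes the ':' so it doesn't end up at the start of str
std::string str;
while (std::cin >> num >> colon && std::getline(std::cin, str)) {
    // num holds the number, str holds the rest of the line
}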
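For reference, here is a minimal self-contained sketch of this approach, assuming one record per line. The function name read_records and the choice to collect results into a std::vector of std::pair are illustrative, not part of the answer above. The extra char read after the number is what swallows the ':'; without it, every string would start with a colon.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads "number:text" records, one per line, using >>
// for the number and the ':' and std::getline for the rest of the line.
std::vector<std::pair<int, std::string>> read_records(std::istream& in) {
    std::vector<std::pair<int, std::string>> records;
    int num;
    char colon;
    std::string text;
    // >> skips leading whitespace (including the previous line's '\n'),
    // reads the number, then the single ':' character; getline then takes
    // everything up to the newline and discards the newline itself.
    while (in >> num >> colon && colon == ':' && std::getline(in, text)) {
        records.emplace_back(num, text);
    }
    return records;
}
```

To try it on the sample data, wrap a string in a std::istringstream and pass that to read_records instead of std::cin.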
You've already been told about std::getline, but they didn't mention one detail that you'll probably find useful: when you call getline, you can also pass a parameter telling it what character to treat as the end of input. To read your number, you can use:
std::string number;
std::string name;
std::getline(infile, number, ':');
std::getline(infile, name);
This will put the data up to the ':' into number, discard the ':', and read the rest of the line into name.
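Put in a loop, the two getline calls can read a whole file. This is a minimal sketch, assuming one record per line; read_with_delims is a hypothetical name, and std::stoi (C++11) is one possible way to turn the number field into an int.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: two std::getline calls per record. The first stops
// at ':' (and discards it); the second reads the rest of the line.
std::vector<std::pair<int, std::string>> read_with_delims(std::istream& infile) {
    std::vector<std::pair<int, std::string>> records;
    std::string number;
    std::string name;
    while (std::getline(infile, number, ':') && std::getline(infile, name)) {
        // std::stoi throws std::invalid_argument if the field is not numeric.
        records.emplace_back(std::stoi(number), name);
    }
    return records;
}
```

Because the second getline consumes the newline, the next ':'-delimited read starts cleanly at the following line.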
If you want to use >> to read the data, you can do that too, but it's a bit more difficult, and it delves into an area of the standard library that most people never touch. A stream has an associated locale that's used for things like formatting numbers and (importantly) determining what constitutes "white space" (specifically, the locale's ctype facet makes that decision). You can define your own locale that treats ":" and the newline as white space, and the space (" ") as not white space. Tell the stream to use that locale, and it'll let you read your data directly.
#include <locale>
#include <vector>
// A ctype facet whose classification table marks only ':' and '\n'
// as white space; every other character (including ' ') is not.
struct colonsep: std::ctype<char> {
    colonsep(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::mask());
        rc[':'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};
Now to use it, we "imbue" the stream with a locale:
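#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
#include <vector>
// One record: the number before the ':' and the text after it. This is a
// struct at global scope rather than a typedef of std::pair, so the
// operators below need not be added to namespace std (which the standard
// doesn't allow) and are still found by argument-dependent lookup.
struct data {
    int first;
    std::string second;
};
std::istream &operator>>(std::istream &is, data &d) {
    return is >> d.first >> d.second;
}
std::ostream &operator<<(std::ostream &os, data const &d) {
    return os << d.first << ":" << d.second;
}
// std::sort needs an ordering: by number, then by text.
bool operator<(data const &a, data const &b) {
    return a.first < b.first || (a.first == b.first && a.second < b.second);
}
int main() {
    std::ifstream infile("testfile.txt");
    infile.imbue(std::locale(std::locale(), new colonsep));
    std::vector<data> d;
    std::copy(std::istream_iterator<data>(infile),
              std::istream_iterator<data>(),
              std::back_inserter(d));
    // just for fun, sort the data to show we can manipulate it:
    std::sort(d.begin(), d.end());
    std::copy(d.begin(), d.end(), std::ostream_iterator<data>(std::cout, "\n"));
    return 0;
}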
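If you'd like to try the locale approach without a data file, here is a self-contained sketch along the same lines that reads from a std::istringstream. colon_newline_ctype and read_imbued are hypothetical names: the facet builds the same table as colonsep, and a plain while loop stands in for std::copy.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same table as colonsep: only ':' and '\n' count as white space, so
// >> into a std::string stops at the end of the line, not at a space.
struct colon_newline_ctype : std::ctype<char> {
    colon_newline_ctype() : std::ctype<char>(get_table()) {}
    static const std::ctype_base::mask* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc[':'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return rc.data();
    }
};

// Hypothetical helper: parses "number:text" lines from a string.
std::vector<std::pair<int, std::string>> read_imbued(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new colon_newline_ctype));
    std::vector<std::pair<int, std::string>> records;
    int number;
    std::string rest;
    // The int read stops at ':'; the string read skips the ':' as
    // "white space" and then runs until the '\n'.
    while (in >> number >> rest) {
        records.emplace_back(number, rest);
    }
    return records;
}
```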
Now you know why that part of the library is so neglected. In theory, getting the standard library to do your work for you is great, but in practice it's usually easier to do this kind of job yourself.
Just read the data one whole line at a time using getline, and then parse each line. To parse it, use find_first_of().
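A minimal sketch of that line-then-split approach, assuming one record per line and splitting at the first ':' only (so a colon inside the text is kept). read_and_split is a hypothetical name, and skipping lines without a ':' is one possible policy, not the only one.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads whole lines, then splits each at the first ':'.
// Lines without a ':' are skipped.
std::vector<std::pair<int, std::string>> read_and_split(std::istream& in) {
    std::vector<std::pair<int, std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t pos = line.find_first_of(':');
        if (pos == std::string::npos) {
            continue;  // no separator on this line
        }
        // std::stoi throws if the part before ':' is not a number.
        records.emplace_back(std::stoi(line.substr(0, pos)), line.substr(pos + 1));
    }
    return records;
}
```

Since only the character passed to find_first_of matters here, line.find(':') would work just as well.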
#include <stdio.h>
int i;
char text[256]; /* since max is 255 chars, and +1 for '\0' */
/* %255[^\n]: up to 255 chars, stopping at the newline */
if (scanf("%d:%255[^\n]", &i, text) == 2)
    printf("%s\n", text);
It's C, and it will work in C++ too. scanf gives you more control, but very little error handling (about all you get is its return value, the number of fields it converted), so use it with caution :).
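One way to keep scanf-style parsing while getting some error checking is to read each line with std::getline and hand it to std::sscanf, checking the return value. This is a sketch of a variation, not part of the answer above; read_with_sscanf is a hypothetical name.

```cpp
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: std::getline reads a line, std::sscanf splits it.
// The 255 width limits the text to the stated maximum, and the return
// value tells us whether both fields were converted. Lines whose text part
// is empty are skipped too, because %[ must match at least one character.
std::vector<std::pair<int, std::string>> read_with_sscanf(std::istream& in) {
    std::vector<std::pair<int, std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        int number = 0;
        char text[256];  // 255 characters + '\0'
        if (std::sscanf(line.c_str(), "%d:%255[^\n]", &number, text) == 2) {
            records.emplace_back(number, text);
        }
    }
    return records;
}
```

A malformed line is simply dropped here; you could just as easily report it instead.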