Using stringstream instead of `sscanf` to parse a fixed-format string
I would like to use the facilities provided by stringstream to extract values from a fixed-format string as a type-safe alternative to sscanf. How can I do this?
Consider the following specific use case. I have a std::string in the following fixed format:
YYYYMMDDHHMMSSmmm
Where:
YYYY = 4 digits representing the year
MM = 2 digits representing the month ('0' padded to 2 characters)
DD = 2 digits representing the day ('0' padded to 2 characters)
HH = 2 digits representing the hour ('0' padded to 2 characters)
MM = 2 digits representing the minute ('0' padded to 2 characters)
SS = 2 digits representing the second ('0' padded to 2 characters)
mmm = 3 digits representing the milliseconds ('0' padded to 3 characters)
Previously I was doing something along these lines:
string s = "20101220110651184";
unsigned year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0, milli = 0;
sscanf(s.c_str(), "%4u%2u%2u%2u%2u%2u%3u", &year, &month, &day, &hour, &minute, &second, &milli );
The width values are magic numbers, and that's OK. I'd like to use streams to extract these values and convert them to unsigned values in the interest of type safety. But when I try this:
stringstream ss;
ss << "20101220110651184";
ss >> setw(4) >> year;
year retains the value 0. It should be 2010.
How do I do what I'm trying to do? I can't use Boost or any other 3rd party library, nor can I use C++0x.
One not particularly efficient option would be to construct some temporary strings and use a lexical cast:
std::string s("20101220110651184");
int year = lexical_cast<int>(s.substr(0, 4));
// etc.
lexical_cast can be implemented in just a few lines of code; Herb Sutter presented the bare minimum in his article, "The String Formatters of Manor Farm."
It's not exactly what you're looking for, but it's a type-safe way to extract fixed-width fields from a string.
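lexical_cast itself is a Boost facility, which the question rules out, but a minimal stand-in can be built on std::stringstream. The sketch below is an assumption of how that might look, not Sutter's code: my_lexical_cast, Timestamp and parse_timestamp are hypothetical names, and throwing std::runtime_error on a bad conversion is a choice made here (Boost throws boost::bad_lexical_cast).

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical minimal lexical_cast stand-in (not Boost's): write the source
// into a stringstream, read it back as Target, and reject leftover characters.
template <typename Target, typename Source>
Target my_lexical_cast(const Source& arg)
{
    std::stringstream ss;
    ss << arg;
    Target result = Target();
    char extra;
    // Fail if nothing converts, or if non-whitespace characters remain.
    if (!(ss >> result) || (ss >> extra))
        throw std::runtime_error("my_lexical_cast: bad conversion");
    return result;
}

// Hypothetical container for the fields of YYYYMMDDHHMMSSmmm.
struct Timestamp {
    unsigned year, month, day, hour, minute, second, milli;
};

Timestamp parse_timestamp(const std::string& s)
{
    assert(s.size() == 17);  // precondition: exactly YYYYMMDDHHMMSSmmm
    Timestamp t;
    t.year   = my_lexical_cast<unsigned>(s.substr(0, 4));
    t.month  = my_lexical_cast<unsigned>(s.substr(4, 2));
    t.day    = my_lexical_cast<unsigned>(s.substr(6, 2));
    t.hour   = my_lexical_cast<unsigned>(s.substr(8, 2));
    t.minute = my_lexical_cast<unsigned>(s.substr(10, 2));
    t.second = my_lexical_cast<unsigned>(s.substr(12, 2));
    t.milli  = my_lexical_cast<unsigned>(s.substr(14, 3));
    return t;
}
```

Each substr call creates a temporary string, which is the inefficiency mentioned above; for a 17-character timestamp that cost is probably negligible.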
Erm, if it's fixed format, why don't you do this?
std::string sd("20101220110651184");
// insert spaces from the back
sd.insert(14, 1, ' ');
sd.insert(12, 1, ' ');
sd.insert(10, 1, ' ');
sd.insert(8, 1, ' ');
sd.insert(6, 1, ' ');
sd.insert(4, 1, ' ');
int year, month, day, hour, min, sec, ms;
std::istringstream str(sd);
str >> year >> month >> day >> hour >> min >> sec >> ms;
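The same trick can be driven by a table of field widths instead of hard-coded insert positions. This is a sketch, assuming the input contains no whitespace of its own; space_out and parse_timestamp are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: copy s, putting a space between consecutive
// fixed-width fields, so whitespace-delimited extraction can read them.
std::string space_out(const std::string& s, const int* widths, std::size_t count)
{
    std::string out;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < count && pos < s.size(); ++i) {
        assert(widths[i] > 0);  // every field must be at least one character wide
        if (i != 0)
            out += ' ';
        out += s.substr(pos, widths[i]);
        pos += widths[i];
    }
    return out;
}

// Parse YYYYMMDDHHMMSSmmm into fields[0..6]; returns false if any field fails.
bool parse_timestamp(const std::string& s, unsigned fields[7])
{
    static const int widths[7] = { 4, 2, 2, 2, 2, 2, 3 };
    std::istringstream in(space_out(s, widths, 7));
    for (int i = 0; i < 7; ++i)
        if (!(in >> fields[i]))
            return false;
    return true;
}
```

Unlike the fixed insert positions, a short input simply produces fewer tokens, and the extraction loop reports that as a failure.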
I use the following; it might be useful to you:
template<typename T> T stringTo( const std::string& s )
{
    std::istringstream iss(s);
    T x = T();  // value-initialized, so a failed read yields T() rather than an indeterminate value
    iss >> x;
    return x;
}
template<typename T> inline std::string toString( const T& x )
{
    std::ostringstream o;
    o << x;
    return o.str();
}
These templates require:
#include <sstream>
Usage
std::string s("20101220110651184");
long date = stringTo<long>( s.substr(0, 8) );  // 20101220
YMMV
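stringTo ignores the stream state, so a caller cannot tell a failed conversion from a successful one. A variant that reports failure might look like this; tryStringTo and parse_year are hypothetical names, and treating trailing non-whitespace characters as an error is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical variant of stringTo that reports failure: returns true and
// stores the value in out only if the whole string converted to T.
template<typename T>
bool tryStringTo(const std::string& s, T& out)
{
    std::istringstream iss(s);
    T x = T();
    char extra;
    if (!(iss >> x) || (iss >> extra))
        return false;  // nothing converted, or non-whitespace characters left over
    out = x;
    return true;
}

// Usage sketch: read the YYYY field of a YYYYMMDDHHMMSSmmm string.
bool parse_year(const std::string& stamp, unsigned& year)
{
    assert(stamp.size() >= 4);  // caller guarantees at least the year field
    return tryStringTo(stamp.substr(0, 4), year);
}
```

On failure the output argument is left untouched, so callers can keep a default value.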
From here, you might find this useful:
template<typename T, typename charT, typename traits>
std::basic_istream<charT, traits>&
fixedread(std::basic_istream<charT, traits>& in, T& x)
{
    if (in.width( ) == 0)
        // Not fixed size, so read normally.
        in >> x;
    else {
        // Read at most width() characters into a string (string extraction
        // honours width() and then resets it to 0), then convert to T.
        std::basic_string<charT, traits> field;
        in >> field;
        std::basic_istringstream<charT, traits> stream(field);
        if (! (stream >> x))
            in.setstate(std::ios_base::failbit);
    }
    return in;
}
setw() only affects the extraction of strings and C strings (character arrays); numeric extraction ignores it. The function above relies on this: it reads the field into a string, then converts that string to the required type. You can use it in combination with setw() or ss.width(w) to read a fixed-width field of any type.
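Applied to the question's timestamp, that combination might look like the sketch below. read_fixed_fields is a hypothetical helper, and fixedread is repeated so the example compiles on its own. Because `in >> field` skips leading whitespace, this assumes the fields contain no spaces.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// fixedread as above: honour a non-zero width() by reading into a string first.
template<typename T, typename charT, typename traits>
std::basic_istream<charT, traits>&
fixedread(std::basic_istream<charT, traits>& in, T& x)
{
    if (in.width() == 0) {
        in >> x;  // not fixed size: read normally
    } else {
        std::basic_string<charT, traits> field;
        in >> field;  // reads at most width() characters, then resets width to 0
        std::basic_istringstream<charT, traits> stream(field);
        if (!(stream >> x))
            in.setstate(std::ios_base::failbit);
    }
    return in;
}

// Hypothetical helper: read `count` fixed-width unsigned fields from `in`.
bool read_fixed_fields(std::istream& in, const int* widths, unsigned* out, int count)
{
    for (int i = 0; i < count; ++i) {
        assert(widths[i] > 0);  // width 0 would make fixedread read a whole token
        fixedread(in >> std::setw(widths[i]), out[i]);
    }
    return !in.fail();
}
```

Usage: with widths `{ 4, 2, 2, 2, 2, 2, 3 }` and an `std::istringstream` over `"20101220110651184"`, the seven fields come out as 2010, 12, 20, 11, 6, 51 and 184.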
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

template<typename T>
struct FixedRead {
    T& content;
    int size;
    FixedRead(T& content, int size) :
        content(content), size(size) {
        assert(size != 0);
    }
    template<typename charT, typename traits>
    friend std::basic_istream<charT, traits>&
    operator >>(std::basic_istream<charT, traits>& in, FixedRead<T> x) {
        std::streamsize orig_w = in.width();
        std::basic_string<charT, traits> o;
        in >> std::setw(x.size) >> o;    // read at most x.size characters as a string
        std::basic_stringstream<charT, traits> os(o);
        if (!(os >> x.content))           // convert the field to T
            in.setstate(std::ios_base::failbit);
        in.width(orig_w);                 // restore the caller's width setting
        return in;
    }
};
template<typename T>
FixedRead<T> fixed_read(T& content, int size) {
    return FixedRead<T>(content, size);
}
void test4() {
    std::stringstream ss("20101220110651184");
    int year = 0, month = 0, day = 0, hour = 0, min = 0, sec = 0, ms = 0;
    ss >> fixed_read(year, 4) >> fixed_read(month, 2) >> fixed_read(day, 2)
       >> fixed_read(hour, 2) >> fixed_read(min, 2) >> fixed_read(sec, 2)
       >> fixed_read(ms, 3);            // the milliseconds field is 3 digits wide
    std::cout << "year:" << year << "," << "month:" << month << "," << "day:" << day
              << "," << "hour:" << hour << "," << "min:" << min << "," << "sec:"
              << sec << "," << "ms:" << ms << std::endl;
}
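ps5mh's solution is really nice, but it does not work for fixed-size parsing of strings that contain whitespace, because string extraction stops at (and skips) whitespace. The following solution fixes this by reading exactly the requested number of characters: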
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

template<typename T, typename T2>
struct FixedRead
{
    T& content;
    T2& number;
    int size;
    FixedRead(T& content, int size, T2& number) :
        content(content), number(number), size(size)
    {
        assert(size > 0);
    }
    template<typename charT, typename traits>
    friend std::basic_istream<charT, traits>&
    operator >>(std::basic_istream<charT, traits>& in, FixedRead<T,T2> x)
    {
        if (!in.eof() && in.good())
        {
            // Read exactly x.size characters, whitespace included.
            std::vector<charT> buffer(x.size);
            in.read(&buffer[0], x.size);
            std::streamsize num_read = in.gcount();
            std::basic_string<charT, traits> field(&buffer[0], num_read);
            std::basic_istringstream<charT, traits> os(field);
            if (!(os >> x.content))
                in.setstate(std::ios_base::failbit);
            else
                ++x.number;  // count successful conversions, like sscanf's return value
        }
        return in;
    }
};
template<typename T, typename T2>
FixedRead<T,T2> fixedread(T& content, int size, T2& number) {
    return FixedRead<T,T2>(content, size, number);
}
This can be used as:
std::string s = "90007127       19000715790007397";
std::vector<int> ints(5);
int num_read = 0;
std::istringstream in(s);
in >> fixedread(ints[0], 8, num_read)
>> fixedread(ints[1], 8, num_read)
>> fixedread(ints[2], 8, num_read)
>> fixedread(ints[3], 8, num_read)
>> fixedread(ints[4], 8, num_read);
// output:
// num_read = 4 (like return value of sscanf)
// ints = 90007127, 1, 90007157, 90007397
// ints[4] stays 0: the fifth read finds no input