C++ binary file I/O to/from containers (other than char *) using STL algorithms
I'm attempting a simple test of binary file I/O using the STL copy algorithm to copy data to/from containers and a binary file. See below:
#include <iostream>
#include <iterator>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

typedef std::ostream_iterator<double> oi_t;
typedef std::istream_iterator<double> ii_t;

int main () {

    // generate some data to test
    std::vector<double> vd;
    for (int i = 0; i < 20; i++)
    {
        double d = rand() / 1000000.0;
        vd.push_back(d);
    }

    // perform output to a binary file
    ofstream output ("temp.bin", ios::binary);
    copy (vd.begin(), vd.end(), oi_t(output, (char *)NULL));
    output.close();

    // input from the binary file to a container
    std::vector<double> vi;
    ifstream input ("temp.bin", ios::binary);
    ii_t ii(input);
    copy (ii, ii_t(), back_inserter(vi));
    input.close();

    // output data to screen to verify/compare the results
    for (int i = 0; i < vd.size(); i++)
        printf ("%8.4f %8.4f\n", vd[i], vi[i]);

    printf ("vd.size() = %d\tvi.size() = %d\n", vd.size(), vi.size());
    return 0;
}
The resulting output is as follows and has two problems, afaik:
1804.2894 1804.2985
846.9309 0.9312
1681.6928 0.6917
1714.6369 0.6420
1957.7478 0.7542
424.2383 0.2387
719.8854 0.8852
1649.7605 0.7660
596.5166 0.5171
1189.6414 0.6410
1025.2024 0.2135
1350.4900 0.4978
783.3687 0.3691
1102.5201 0.5220
2044.8978 0.9197
1967.5139 0.5114
1365.1805 0.1815
1540.3834 0.3830
304.0892 0.0891
1303.4557 0.4600
vd.size() = 20 vi.size() = 20
1) Every double read from the binary data is missing the information before the decimal place.
2) The data is mangled at the 3rd decimal place (or earlier) and some arbitrary error is being introduced.
Any help would be appreciated. (I would love for someone to point me to a previous post about this, as I've come up short in my search.)
For the question 1) You need to specify a separator (for example a space). The non-decimal part was stuck to the decimal part of the previous number. Casting and using NULL is generally wrong in C++. Should have been a hint ;)
copy (vd.begin(), vd.end(), oi_t(output, " "));
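To see why the integer part goes missing, here is a minimal sketch that runs the same kind of copy through a string stream instead of a file. The helper name round_trip_text and the sample values in the note below are made up for illustration; they are not from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the values as text with the given delimiter,
// then reads them back with istream_iterator.
// The formatted text is handed back through `text` so it can be inspected.
std::vector<double> round_trip_text(const std::vector<double>& in,
                                    const char* delim,
                                    std::string& text)
{
    assert(delim != nullptr);  // pass "" for "no delimiter" in this sketch

    std::ostringstream out;
    std::copy(in.begin(), in.end(), std::ostream_iterator<double>(out, delim));
    text = out.str();

    std::istringstream src(text);
    std::vector<double> back;
    std::copy(std::istream_iterator<double>(src), std::istream_iterator<double>(),
              std::back_inserter(back));
    return back;
}
```

Called with 1804.289383 and 846.930886 and an empty delimiter, the text written is 1804.29846.931. Reading stops at the second decimal point, so the values come back as 1804.29846 and 0.931, which is the same pattern as in the question's output. With " " as the delimiter the text is "1804.29 846.931 " and both integer parts survive.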
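For question 2), raise the output precision. The stream default is 6 significant digits, so the values were rounded when they were written: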
#include <iomanip>
output << setprecision(9);
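As a rough illustration of how the precision setting affects a text round trip, the sketch below writes values with a chosen number of significant digits and reads them back. The helper name is hypothetical. If an exact round trip is wanted rather than nine digits, std::numeric_limits<double>::max_digits10 (17 for the usual IEEE 754 double) is the number of digits that guarantees it.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <vector>

// Hypothetical helper: text round trip with an explicit number of
// significant digits, using a space as the separator.
std::vector<double> round_trip_with_precision(const std::vector<double>& in, int digits)
{
    assert(digits > 0);

    std::stringstream s;
    s << std::setprecision(digits);
    for (double d : in)
        s << d << ' ';

    std::vector<double> back;
    double d;
    while (s >> d)          // stops at the first failed extraction (end of text)
        back.push_back(d);
    return back;
}
```

With digits set to max_digits10 the vector read back should compare equal to the original; with the default of 6 it has the same size but the values are rounded.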
To write binary data using std::copy(), I would do this:
template<typename T>
struct oi_t: public iterator<output_iterator_tag, void, void, void, void>
{
    oi_t(std::ostream& str)
        :m_str(str)
    {}
    oi_t& operator++()   {return *this;}  // increment does not do anything.
    oi_t& operator++(int){return *this;}
    oi_t& operator*()    {return *this;}  // Dereference returns a reference to this
                                          // So that when the assignment is done we
                                          // actually write the data from this class
    oi_t& operator=(T const& data)
    {
        // Write the data in a binary format
        m_str.write(reinterpret_cast<char const*>(&data),sizeof(T));
        return *this;
    }
    private:
        std::ostream& m_str;
};
Thus the call to std::copy is:
copy (vd.begin(), vd.end(), oi_t<double>(output));
The input iterator is slightly more complicated as we have to test for the end of the stream.
template<typename T>
struct ii_t: public iterator<input_iterator_tag, T, std::ptrdiff_t, T const*, T const&>
{
    ii_t(std::istream& str)
        :m_str(&str)
        ,m_value()
    {
        read();   // Read the first value up front, so that an empty stream
                  // compares equal to the end iterator straight away.
    }
    ii_t()
        :m_str(NULL)
        ,m_value()
    {}
    ii_t& operator++()   {read(); return *this;}  // increment reads the next value.
    ii_t  operator++(int){ii_t tmp(*this); read(); return tmp;}
    T const& operator*() const {return m_value;}  // De-reference returns the value already read,
                                                  // so it is safe to de-reference more than once.
    // An iterator with a NULL pointer is the end() of stream iterator.
    // An iterator becomes the end iterator as soon as a read fails.
    bool operator==(ii_t const& rhs) const {return m_str == rhs.m_str;}
    bool operator!=(ii_t const& rhs) const {return !(*this == rhs);}
    private:
        void read()
        {
            // Read the data in a binary format; a short or failed read means end of stream.
            if (m_str != NULL && !m_str->read(reinterpret_cast<char*>(&m_value),sizeof(T)))
                m_str = NULL;
        }
        std::istream* m_str;
        T             m_value;
};
The call to read the input is now:
ii_t<double> ii(input);
copy (ii, ii_t<double>(), back_inserter(vi));
You could set the precision using setprecision, as Tristram pointed out, and you do need a delimiter. See cppreference for how the ostream_iterator operator= works. No format is set by default, so you will need to set it on the output stream:
ofstream output ("temp.bin", ios::binary);
output.flags(ios_base::fixed); //or output << fixed;
copy(vd.begin(), vd.end(), oi_t(output, " "));
output.close();
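For a quick look at what the fixed flag changes, here is a small sketch that formats one value with and without it. The helper name format_double is made up for illustration, and the precision is left at its default of 6.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats one value either with the stream defaults
// or with the fixed flag set. The precision stays at the default of 6.
std::string format_double(double value, bool use_fixed)
{
    std::ostringstream out;
    if (use_fixed)
        out << std::fixed;   // same effect as setting ios_base::fixed on the stream
    out << value;
    return out.str();
}
```

With the defaults, 1804.289383 is written as 1804.29 (six significant digits). With fixed it is written as 1804.289383 (six digits after the decimal point). The flip side is that fixed drops digits of very small values, so 0.000012345 comes out as 0.000012.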
I would tend to favor using fixed to eliminate precision problems. With fixed, the precision counts digits after the decimal point rather than total significant digits. There have been many cases where someone thought "we'll never need more than 5 digits" so they hardcoded a precision everywhere. Those are costly bugs to have to correct.
I have come up with a better design for binary I/O. The fundamental approach is to have three methods: size_on_stream, load_from_buffer, and store_to_buffer. These go into an interface class so that all classes that support binary I/O inherit it.
The size_on_stream method returns the size of the data as transmitted on the stream. Generally, this does not include padding bytes. This should be recursive such that a class calls the method on all of its members.
The load_from_buffer method is passed a reference to a pointer to a buffer (unsigned char * &). The method loads the object's data members from the buffer, incrementing the pointer after every member (or incrementing once after all the members).
The store_to_buffer method stores data into the given buffer and increments the pointer.
The client calls size_on_stream to determine the size of all the data. A buffer of this size is dynamically allocated. Another pointer to this buffer is passed to store_to_buffer to store the object's members into the buffer. Finally, the client uses a binary write (fwrite or std::ostream::write) to transfer the buffer to the stream.
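The description above could be sketched roughly as follows. Only the three method names and the unsigned char * & parameter come from the text; the interface name Binary_Io, the Sample class, and the write_object / read_object helpers are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical interface class holding the three methods.
class Binary_Io
{
public:
    virtual ~Binary_Io() {}
    virtual std::size_t size_on_stream() const = 0;
    virtual void load_from_buffer(unsigned char*& buffer) = 0;
    virtual void store_to_buffer(unsigned char*& buffer) const = 0;
};

// Example class. sizeof(Sample) usually includes padding,
// but the stream image holds only the bytes of the two members.
struct Sample : Binary_Io
{
    char   tag;
    double value;

    Sample(char t = 0, double v = 0.0) : tag(t), value(v) {}

    std::size_t size_on_stream() const override { return sizeof tag + sizeof value; }

    void store_to_buffer(unsigned char*& buffer) const override
    {
        std::memcpy(buffer, &tag, sizeof tag);     buffer += sizeof tag;
        std::memcpy(buffer, &value, sizeof value); buffer += sizeof value;
    }

    void load_from_buffer(unsigned char*& buffer) override
    {
        std::memcpy(&tag, buffer, sizeof tag);     buffer += sizeof tag;
        std::memcpy(&value, buffer, sizeof value); buffer += sizeof value;
    }
};

// Client side: size the buffer, let the object pack itself, then do one block write.
void write_object(std::ostream& out, const Binary_Io& obj)
{
    std::vector<unsigned char> buffer(obj.size_on_stream());
    unsigned char* cursor = buffer.data();
    obj.store_to_buffer(cursor);
    assert(cursor == buffer.data() + buffer.size());  // object wrote exactly what it announced
    out.write(reinterpret_cast<const char*>(buffer.data()),
              static_cast<std::streamsize>(buffer.size()));
}

// Client side: one block read, then let the object unpack itself.
// Returns false if the stream did not hold enough bytes.
bool read_object(std::istream& in, Binary_Io& obj)
{
    std::vector<unsigned char> buffer(obj.size_on_stream());
    if (!in.read(reinterpret_cast<char*>(buffer.data()),
                 static_cast<std::streamsize>(buffer.size())))
        return false;
    unsigned char* cursor = buffer.data();
    obj.load_from_buffer(cursor);
    return true;
}
```

Writing a Sample to a std::stringstream with write_object and reading it back with read_object is an easy way to try this out without touching the file system.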
Some of the benefits of this technique are packing, abstraction, and block I/O. The objects pack their members into the buffer. The process for writing into the buffer is hidden from the client. The client can use block I/O functions, which are generally more efficient than transferring individual members.
This design is also more portable, as the objects can take care of the endianness. There is a simple method for this, which is left up to the reader.
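The answer does not say which method it has in mind. One common approach, shown here only as an assumption, is to fix the byte order on the stream and move integers in and out of the buffer with shifts, so the host's byte order never matters. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores a 32-bit value most significant byte first
// and advances the buffer pointer, whatever the host byte order is.
void store_u32_big_endian(unsigned char*& buffer, std::uint32_t value)
{
    assert(buffer != nullptr);
    buffer[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    buffer[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    buffer[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    buffer[3] = static_cast<unsigned char>(value & 0xFF);
    buffer += 4;
}

// Hypothetical helper: the matching load, also advancing the pointer.
std::uint32_t load_u32_big_endian(unsigned char*& buffer)
{
    assert(buffer != nullptr);
    std::uint32_t value = (static_cast<std::uint32_t>(buffer[0]) << 24)
                        | (static_cast<std::uint32_t>(buffer[1]) << 16)
                        | (static_cast<std::uint32_t>(buffer[2]) << 8)
                        |  static_cast<std::uint32_t>(buffer[3]);
    buffer += 4;
    return value;
}
```

A store_to_buffer method would call the store helper for each integer member, and load_from_buffer would call the load helper in the same order.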
I have expanded this concept to incorporate POD (Plain Old Data) types as well, which is left as an exercise for the reader.
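One way the POD extension might look, purely as a guess at what is meant, is a set of free function templates that mirror the three methods for trivially copyable types. The names reuse the method names from the text; the template form itself is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical free-function counterparts of the three methods
// for built-in and other trivially copyable types.
template <typename T>
std::size_t size_on_stream(const T&)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte copy needs a trivially copyable type");
    return sizeof(T);
}

template <typename T>
void store_to_buffer(unsigned char*& buffer, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte copy needs a trivially copyable type");
    assert(buffer != nullptr);
    std::memcpy(buffer, &value, sizeof(T));  // copy the object's bytes into the buffer
    buffer += sizeof(T);                     // advance past what was written
}

template <typename T>
void load_from_buffer(unsigned char*& buffer, T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte copy needs a trivially copyable type");
    assert(buffer != nullptr);
    std::memcpy(&value, buffer, sizeof(T));  // copy the bytes back into the object
    buffer += sizeof(T);                     // advance past what was read
}
```

Note that applying these to a whole struct copies sizeof(T) bytes, padding included, so a class that wants a packed stream image would still call them member by member.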