
HDF5 C++ interface: writing dynamic 2D arrays

I am using the HDF5 C++ API to write 2D array dataset files. The HDF Group has an example that creates an HDF5 file from a statically sized array, which I've modified to suit my needs below. However, I need a dynamic array, where both NX and NY are determined at runtime. I found another solution that creates 2D arrays with the "new" keyword to get a dynamic array. Here is what I have:

#include "StdAfx.h"
#include "H5Cpp.h"
using namespace H5;

const H5std_string FILE_NAME("C:\\SDS.h5");
const H5std_string DATASET_NAME("FloatArray");
const int NX = 5; // dataset dimensions
const int NY = 6;

int main (void)
{
    // Create a 2D array using "new" method
    double **data = new double*[NX];
    for (int j = 0; j < NX; j++)         // 0 1 2 3 4 5
    {                                    // 1 2 3 4 5 6
        data[j] = new double[NY];        // 2 3 4 5 6 7
        for (int i = 0; i < NY; i++)     // 3 4 5 6 7 8
            data[j][i] = (float)(i + j); // 4 5 6 7 8 9
    }

    // Create HDF5 file and dataset
    H5File file(FILE_NAME, H5F_ACC_TRUNC);
    hsize_t dimsf[2] = {NX, NY};
    DataSpace dataspace(2, dimsf);
    DataSet dataset = file.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE,
                                            dataspace);
    // Attempt to write data to HDF5 file
    dataset.write(data, PredType::NATIVE_DOUBLE);

    // Clean up
    for(int j = 0; j < NX; j++)
        delete [] data[j];
    delete [] data;
    return 0;
}

The resulting file, however, is not as expected (output from h5dump):

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "FloatArray" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 4.76465e-307, 4.76541e-307, -7.84591e+298, -2.53017e-098, 0,
      (0,5): 3.8981e-308,
      (1,0): 4.76454e-307, 0, 2.122e-314, -7.84591e+298, 0, 1,
      (2,0): 2, 3, 4, 5, -2.53017e-098, -2.65698e+303,
      (3,0): 0, 3.89814e-308, 4.76492e-307, 0, 2.122e-314, -7.84591e+298,
      (4,0): 1, 2, 3, 4, 5, 6
      }
   }
}
}

The problem stems from how the 2D array was created (the same example works fine with a statically sized array). As I understand from this email thread:

The HDF5 library expects a contiguous array of elements, not pointers to elements in lower dimensions

As I am rather new to C++/HDF5, I'm not sure how to create a dynamically sized array at runtime that is a contiguous array of elements. I would rather avoid the "hyperslab" method described in the email thread, as it looks overly complicated. Any help is appreciated.
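To make the quoted requirement concrete: passing `data` (a `double**`) to `write` most likely made HDF5 read the table of row pointers, and whatever memory follows it, as if it were NX*NY doubles, which would explain the garbage in the dump. A minimal sketch of one way to keep the `data[j][i]` syntax while still having a single contiguous block: allocate all NX*NY values at once and build the row-pointer table on top of that block. `Contiguous2D` is a hypothetical name, not part of HDF5.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: all nrows*ncols doubles live in ONE allocation
// (storage), and rows[j] points at the first element of row j inside it,
// so a[j][i] still works like the question's double** array.
struct Contiguous2D {
    std::vector<double> storage;  // row-major elements, contiguous
    std::vector<double*> rows;    // row-pointer table into storage

    Contiguous2D(std::size_t nrows, std::size_t ncols)
        : storage(nrows * ncols), rows(nrows) {
        for (std::size_t j = 0; j < nrows; ++j)
            rows[j] = storage.data() + j * ncols;
    }

    // Copying would leave rows pointing into the source object's storage.
    Contiguous2D(const Contiguous2D&) = delete;
    Contiguous2D& operator=(const Contiguous2D&) = delete;

    double* operator[](std::size_t j) {
        assert(j < rows.size());  // debug bounds check on the row index
        return rows[j];
    }

    // Pointer to the first element: this contiguous buffer, not the
    // row-pointer table, is what DataSet::write needs.
    double* data() { return storage.data(); }
};
```

With this layout the call in the question would become `dataset.write(a.data(), PredType::NATIVE_DOUBLE);`, i.e. the pointer to the first element rather than the row table.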


Well, I don't know anything about HDF5, but dynamic 2D arrays in C++ with a contiguous buffer can be simulated by using a 1D array of size NX * NY. For example:

Allocation:

double *data = new double[NX*NY];

Element access:

 data[j*NY + i]

(instead of data[j][i])
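A minimal sketch of the same idea using `std::vector` instead of raw `new`, so the buffer is released automatically. `flat_index` and `make_grid` are hypothetical helper names; the grid uses the question's i + j fill pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major offset of element (j, i) in an nx-by-ny array stored as one
// 1D buffer: row 0's ny elements come first, then row 1, and so on.
inline std::size_t flat_index(std::size_t j, std::size_t i, std::size_t ny) {
    assert(i < ny);  // column index must lie inside the row
    return j * ny + i;
}

// Build an nx-by-ny grid filled like the question's example (value = i + j).
// std::vector owns the memory, so no delete[] is needed.
inline std::vector<double> make_grid(std::size_t nx, std::size_t ny) {
    std::vector<double> data(nx * ny);
    for (std::size_t j = 0; j < nx; ++j)
        for (std::size_t i = 0; i < ny; ++i)
            data[flat_index(j, i, ny)] = static_cast<double>(i + j);
    return data;
}
```

After `std::vector<double> grid = make_grid(NX, NY);`, `grid.data()` points at NX*NY contiguous doubles in row-major order, which is the layout the question's `dimsf = {NX, NY}` dataspace describes, so it can be passed to `dataset.write(grid.data(), PredType::NATIVE_DOUBLE)`.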


Here is how to write N-dimensional arrays in HDF5 format.

It is much better to use the boost multi_array class. This is the equivalent of using std::vector rather than raw arrays: it does all the memory management for you, and you can access elements as efficiently as raw arrays using familiar subscripting (e.g. data[12][13] = 46).

Here is a short example:

#include <algorithm>
#include <boost/multi_array.hpp>
#include "H5Cpp.h"
using boost::multi_array;
using boost::extents;

// write_hdf5() (shown below) must be declared before main() calls it

int main()
{
    // dataset dimensions set at run time
    int NX = 5,  NY = 6,  NZ = 7;

    // allocate array using the "extents" helper.
    // This makes it easier to see how big the array is
    multi_array<double, 3>  float_data(extents[NX][NY][NZ]);

    // use resize to change size when necessary
    // float_data.resize(extents[NX + 5][NY + 4][NZ + 3]);

    // This is how you would fill the entire array with a value (e.g. 3.0)
    std::fill_n(float_data.data(), float_data.num_elements(), 3.0);

    // initialise the array to some values
    for (int ii = 0; ii != NX; ii++)
        for (int jj = 0; jj != NY; jj++)
            for (int kk = 0; kk != NZ; kk++)
                float_data[ii][jj][kk] = ii + jj + kk;

    // write to HDF5 format
    H5::H5File file("SDS.h5", H5F_ACC_TRUNC);
    write_hdf5(file, "doubleArray", float_data);
}

The last line calls a function that can write multi_arrays of any dimension and any standard number type (ints, chars, floats, etc.).

Here is code for write_hdf5().

First, we must map C++ types to HDF5 types (from the H5 C++ API):

#include <cstdint>
#include <string>
#include <vector>
#include <boost/multi_array.hpp>
#include "H5Cpp.h"

//!_______________________________________________________________________________________
//!     
//!     map types to HDF5 types
//!         
//!     
//!     \author lg (04 March 2013)
//!_______________________________________________________________________________________ 

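// The primary template is declared but not defined, so using write_hdf5 with a
// type that has no specialization below fails at compile time.
template<typename T> struct get_hdf5_data_type;

// The commented-out specializations are typically the same types as the
// <cstdint> ones below (e.g. int is int32_t), and redefining them is an error.
// Likewise, where int64_t is a typedef for long long (as with MSVC), the
// long long / unsigned long long lines must be removed.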
template<> struct get_hdf5_data_type<char>                  {   H5::IntType type    {   H5::PredType::NATIVE_CHAR       };  };
//template<> struct get_hdf5_data_type<unsigned char>       {   H5::IntType type    {   H5::PredType::NATIVE_UCHAR      };  };
//template<> struct get_hdf5_data_type<short>               {   H5::IntType type    {   H5::PredType::NATIVE_SHORT      };  };
//template<> struct get_hdf5_data_type<unsigned short>      {   H5::IntType type    {   H5::PredType::NATIVE_USHORT     };  };
//template<> struct get_hdf5_data_type<int>                 {   H5::IntType type    {   H5::PredType::NATIVE_INT        };  };
//template<> struct get_hdf5_data_type<unsigned int>        {   H5::IntType type    {   H5::PredType::NATIVE_UINT       };  };
//template<> struct get_hdf5_data_type<long>                {   H5::IntType type    {   H5::PredType::NATIVE_LONG       };  };
//template<> struct get_hdf5_data_type<unsigned long>       {   H5::IntType type    {   H5::PredType::NATIVE_ULONG      };  };
template<> struct get_hdf5_data_type<long long>             {   H5::IntType type    {   H5::PredType::NATIVE_LLONG      };  };
template<> struct get_hdf5_data_type<unsigned long long>    {   H5::IntType type    {   H5::PredType::NATIVE_ULLONG     };  };
template<> struct get_hdf5_data_type<int8_t>                {   H5::IntType type    {   H5::PredType::NATIVE_INT8       };  };
template<> struct get_hdf5_data_type<uint8_t>               {   H5::IntType type    {   H5::PredType::NATIVE_UINT8      };  };
template<> struct get_hdf5_data_type<int16_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT16      };  };
template<> struct get_hdf5_data_type<uint16_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT16     };  };
template<> struct get_hdf5_data_type<int32_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT32      };  };
template<> struct get_hdf5_data_type<uint32_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT32     };  };
template<> struct get_hdf5_data_type<int64_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT64      };  };
template<> struct get_hdf5_data_type<uint64_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT64     };  };
template<> struct get_hdf5_data_type<float>                 {   H5::FloatType type  {   H5::PredType::NATIVE_FLOAT      };  };
template<> struct get_hdf5_data_type<double>                {   H5::FloatType type  {   H5::PredType::NATIVE_DOUBLE     };  };
template<> struct get_hdf5_data_type<long double>           {   H5::FloatType type  {   H5::PredType::NATIVE_LDOUBLE    };  };

Then we can use a bit of template forwarding magic to make a function of the right type to output our data. Since this is template code, it needs to live in a header file if you are going to output HDF5 arrays from multiple source files in your programme:

//!_______________________________________________________________________________________
//!     
//!     write_hdf5 multi_array
//!         
//!     \author leo Goodstadt (04 March 2013)
//!     
//!_______________________________________________________________________________________
template<typename T, std::size_t DIMENSIONS, typename hdf5_data_type>
void do_write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data, hdf5_data_type& datatype)
{
    // Little endian for x86
    //FloatType datatype(get_hdf5_data_type<T>::type());
    datatype.setOrder(H5T_ORDER_LE);

    std::vector<hsize_t> dimensions(data.shape(), data.shape() + DIMENSIONS);
    H5::DataSpace dataspace(DIMENSIONS, dimensions.data());

    H5::DataSet dataset = file.createDataSet(data_set_name, datatype, dataspace);

    dataset.write(data.data(), datatype);
}

template<typename T, std::size_t DIMENSIONS>
void write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data )
{

    get_hdf5_data_type<T> hdf_data_type;
    do_write_hdf5(file, data_set_name, data, hdf_data_type.type);
}
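`write_hdf5` passes `data.data()` straight to `DataSet::write`. This works because a `boost::multi_array` with its default (C) storage order keeps its elements in one contiguous block with the last index varying fastest, which is the order HDF5 assumes for a dataspace of the same shape. As an illustration of that layout (not code from Boost or HDF5), a sketch of the row-major offset of a multi-index; `row_major_offset` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major (C order) offset of a multi-index into a contiguous buffer with
// the given shape: the last index varies fastest, as in a default-storage
// multi_array and in an HDF5 dataspace built from the same shape.
inline std::size_t row_major_offset(const std::vector<std::size_t>& shape,
                                    const std::vector<std::size_t>& index) {
    assert(shape.size() == index.size());
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        assert(index[d] < shape[d]);             // each index within bounds
        offset = offset * shape[d] + index[d];   // Horner-style accumulation
    }
    return offset;
}
```

An array built with `boost::fortran_storage_order()` is laid out column-major instead, so passing its `data()` to `write` would store the elements in transposed order.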


In scientific programming it's common to represent multidimensional arrays as one big 1D array and then calculate the corresponding offset from the multidimensional indices, e.g. as in the answer by Doc Brown.

Alternatively, you can overload the subscript operator (operator[]()) in order to provide an interface that allows the use of multi-dimensional indices backed by the 1D array. Or better yet, use a library which does this, such as Boost multi_array. Or in case your 2D arrays are matrices, you can use a nice C++ linear algebra library such as Eigen.
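A minimal sketch of the `operator[]` approach, assuming row-major storage in a single `std::vector`: `operator[]` returns a pointer to the start of row j, so the familiar `m[j][i]` syntax works without any per-row allocation, and `data()` exposes the contiguous buffer. The `Matrix` class is illustrative only; Boost.MultiArray and Eigen provide far more complete versions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal matrix backed by one contiguous std::vector in row-major order.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // zero-initialised

    // Pointer to the first element of row j, so m[j][i] selects column i.
    double* operator[](std::size_t j) {
        assert(j < rows_);
        return data_.data() + j * cols_;
    }
    const double* operator[](std::size_t j) const {
        assert(j < rows_);
        return data_.data() + j * cols_;
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Contiguous buffer of rows()*cols() elements, e.g. for DataSet::write.
    double* data() { return data_.data(); }
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

One caveat if you go the Eigen route: Eigen matrices are column-major by default, so their buffer matches a row-major HDF5 dataspace only if you use a row-major matrix type or write the transpose.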


Actually, the "hyperslab" method is not very complicated to implement. You only need to modify the "write" part:

dataset.write(data, PredType::NATIVE_DOUBLE);

Select a hyperslab in the data space before output:

#include "H5Cpp.h"
using namespace H5;

const H5std_string FILE_NAME("SDS.h5");
const H5std_string DATASET_NAME("FloatArray");
const int NX = 5; // dataset dimensions
const int NY = 6;

int main ()
{
    // Create a 2D array using "new" method
    double **data = new double*[NX];
    for (int j = 0; j < NX; j++)         // 0 1 2 3 4 5
    {                                    // 1 2 3 4 5 6
        data[j] = new double[NY];        // 2 3 4 5 6 7
        for (int i = 0; i < NY; i++)     // 3 4 5 6 7 8
            data[j][i] = (float)(i + j); // 4 5 6 7 8 9
    }

    // Create HDF5 file and dataset
    H5File file(FILE_NAME, H5F_ACC_TRUNC);
    hsize_t dimsf[2] = {NX, NY};
    DataSpace dataspace(2, dimsf);
    DataSet dataset = file.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE,
                                             dataspace);
    
    // The code above is the same as in the question.

    hsize_t start[2]={0, 0}, count[2]={1, NY};
    // Create memory space for one line
    DataSpace memspace(2, count);

    for(int k=0; k<NX; k++)
    {
        start[0] = k;

        // select the hyperslab for one line
        dataspace.selectHyperslab(H5S_SELECT_SET, count, start, NULL, NULL);

        // Attempt to write data to HDF5 file
        dataset.write(data[k], PredType::NATIVE_DOUBLE, memspace, dataspace);
        /*
        * memspace: dataspace specifying the size of the memory that needs to be written
        * dataspace: dataspace specifying the portion of the dataset that needs to be written
        */

        // Reset the selection for the dataspace.
        dataspace.selectNone();
    }

    // Clean up
    for(int j = 0; j < NX; j++)
        delete [] data[j];
    delete [] data;
    return 0;
}

The resulting file is correct:

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "FloatArray" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 0, 1, 2, 3, 4, 5,
      (1,0): 1, 2, 3, 4, 5, 6,
      (2,0): 2, 3, 4, 5, 6, 7,
      (3,0): 3, 4, 5, 6, 7, 8,
      (4,0): 4, 5, 6, 7, 8, 9
      }
   }
}
}
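Writing row by row costs one `write` call per row. If the data is already in the question's `double**` form, an alternative (not part of the answer above) is to copy the rows once into a contiguous buffer and write it in a single call, at the cost of a temporary NX*NY copy. A sketch; `pack_rows` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy an nx-by-ny jagged array (one new[] per row, as in the question)
// into a single row-major buffer whose data() can be written in one call.
inline std::vector<double> pack_rows(double* const* rows, std::size_t nx, std::size_t ny) {
    std::vector<double> packed;
    packed.reserve(nx * ny);
    for (std::size_t j = 0; j < nx; ++j) {
        assert(rows[j] != nullptr);                          // every row must exist
        packed.insert(packed.end(), rows[j], rows[j] + ny);  // append row j
    }
    return packed;
}
```

`std::vector<double> packed = pack_rows(data, NX, NY);` followed by `dataset.write(packed.data(), PredType::NATIVE_DOUBLE);` would then replace the loop, with no hyperslab selection needed.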


I struggled with a similar problem for some time. For various reasons I need to process a data stream in C++, but eventually I want to analyze the resulting HDF5 file in Python, using numpy and matplotlib. The solution is simpler than expected. First I declare the dataspace with whatever shape I actually need:

hsize_t dims[2] = {rows, cols};         
dataspace = new DataSpace(2, dims);
dataset = new DataSet(group->createDataSet("data", PredType::STD_U16LE, *dataspace));

Next I use a 1D dynamic array and fill it, remembering that element [i][j] is at position [i * cols + j]:

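unsigned short* hits = new unsigned short[cols * rows];
(...)
hits[i * cols + j] = foo;
(...)

Now the fun part. Since DataSet::write takes a void*, it does not care what you pass: it just reads a contiguous array of elements, and the shape is interpreted through the DataSpace definition. Because our dynamic array is contiguous, has the correct overall size, and stores the elements in the right order, you can simply write it:

dataset->write(hits, PredType::NATIVE_USHORT);

(The second argument describes the buffer in memory. NATIVE_USHORT lets HDF5 convert to the STD_U16LE file type on any platform; passing STD_U16LE here gives the same result on little-endian x86.)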

The resulting array is correctly interpreted as 2D if you read your HDF5 file later on.
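Because `write` only receives a `void*`, nothing checks that the buffer really holds rows*cols elements; a short buffer is simply read past its end. A minimal sketch of a sanity check, using `std::vector<std::uint16_t>` in place of the raw `new[]` array. `element_count`, `fits_dataspace` and `make_hits` are hypothetical names, and `std::size_t` stands in for HDF5's `hsize_t` so the sketch builds without HDF5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements a dataspace with these dimensions describes.
inline std::size_t element_count(const std::vector<std::size_t>& dims) {
    std::size_t n = 1;
    for (std::size_t d : dims) n *= d;
    return n;
}

// True when a buffer of n elements matches the dataspace exactly.
inline bool fits_dataspace(std::size_t n, const std::vector<std::size_t>& dims) {
    return n == element_count(dims);
}

// Hypothetical fill: a rows-by-cols buffer of 16-bit values, element [i][j]
// stored at position i * cols + j as in the answer (values are illustrative).
inline std::vector<std::uint16_t> make_hits(std::size_t rows, std::size_t cols) {
    std::vector<std::uint16_t> hits(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            hits[i * cols + j] = static_cast<std::uint16_t>(i * 100 + j);
    assert(fits_dataspace(hits.size(), {rows, cols}));  // debug self-check
    return hits;
}
```

In the real program the same comparison could be made against the `dims` array used to build the DataSpace, just before calling `dataset->write(hits.data(), ...)`.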
