Is it possible to make a shallow copy of very large STL strings?
Good afternoon. We are building a prototype of a deduper. We are using an array of STL strings to store the records to be deduped. The array looks like this:
std::string* StringArray = new std::string[NumberDedupeRecords];
The records are very large, as large as 160,000,000 bytes. When we try to store a std::string version of a record to be deduped in StringArray, STL makes a deep copy of the string and mallocs a new buffer of at least 160,000,000 bytes. We quickly run out of heap memory and get a std::bad_alloc exception. Is there a workaround to avoid the deep copy and the std::bad_alloc? Perhaps we should use a different data structure for storing the std::string records to be deduped, or maybe we should store auto_ptrs instead.
We show a code snippet here:
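std::string clara5(curr.getPtr());  // deep copy #1: copies the record's characters into a new buffer
char* const maryptr = (curr.getPtr() + n - curr.low());
maryptr[54] = '\x0';
StringArray[StringArrayCount] = clara5;  // deep copy #2: copy-assigns the whole record into the array slot
curr.mPtr = (char*)StringArray[StringArrayCount].c_str();  // casts away const; writing through mPtr would be undefined behavior
std::multiset<Range>::iterator miter5 = ranges_type.lower_bound(Range(n));
(*miter5).mPtr = curr.mPtr;  // std::multiset elements are const, so this compiles only if Range::mPtr is declared mutable
StringArrayCount += 1;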
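A minimal sketch of where the extra copy comes from, assuming a C++11 compiler (std::move is not available before C++11). Copying a std::string duplicates its buffer, whereas moving a long string hands the existing buffer over with mainstream standard libraries. In the snippet above, the equivalent change would be StringArray[StringArrayCount] = std::move(clara5); which is safe there because clara5 is not used afterwards (a moved-from string is valid but unspecified). The helper names store_by_copy and store_by_move are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Appends a record by copying: the string's buffer is duplicated, so peak
// memory for this record roughly doubles until the caller's string is destroyed.
void store_by_copy(std::vector<std::string>& store, const std::string& record) {
    store.push_back(record);
}

// Appends a record by moving: with mainstream standard libraries a long
// string's heap buffer is handed over, so no second large allocation is made.
void store_by_move(std::vector<std::string>& store, std::string&& record) {
    store.push_back(std::move(record));
}
```

Comparing data() before and after each call is a quick way to confirm on a given toolchain whether the buffer was duplicated or transferred.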
Thank you.
You can simply take a pointer or reference to the original std::string, including a smart pointer if you find it necessary to enforce various ownership stratagems.
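A sketch of the pointer approach, assuming C++11's std::shared_ptr (boost::shared_ptr offers a similar interface for older compilers). Each slot holds a shared, read-only handle, so copying a slot copies a pointer and bumps a reference count instead of duplicating 160 MB of characters. The names RecordHandle and make_record are hypothetical. std::auto_ptr is a poor fit here: copying one transfers ownership, so it cannot be stored safely in standard containers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle type: a shared, read-only reference to one record.
// Copying a RecordHandle never copies the record's characters.
using RecordHandle = std::shared_ptr<const std::string>;

// Creates the single owned copy of a record's text. Passing an rvalue
// moves the text in, so no extra buffer is allocated for it.
RecordHandle make_record(std::string text) {
    return std::make_shared<std::string>(std::move(text));
}
```

Every container or index that needs the record can then hold its own RecordHandle; the text is freed when the last handle goes away.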
If possible, rather than trying to use smart pointers, you may want to change your code so that you only have a few instances of std::string in memory at a time. This of course will depend on your access patterns, but you may be able to load and process one string (record) at a time rather than allocating an array for all of them at once.
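A minimal sketch of processing one record at a time, assuming the records arrive as newline-separated text on a std::istream; the record format and the for_each_record name are hypothetical. Only one record string is alive at any moment, so peak memory is bounded by the largest record rather than by the sum of all records.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Calls `process` once per newline-separated record and returns the count.
// The same std::string object is reused on every iteration, so its buffer
// can typically be recycled instead of reallocated for each record.
template <typename Fn>
std::size_t for_each_record(std::istream& in, Fn process) {
    std::string record;
    std::size_t count = 0;
    while (std::getline(in, record)) {
        process(record);
        ++count;
    }
    return count;
}
```

In practice the stream would be an std::ifstream over the input file; an std::istringstream is convenient for trying it out.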
EDIT: Given that the OP is trying to remove duplicates, this may not work very well.
I think the real answer to your problem is to use a rope (see http://www.sgi.com/tech/stl/Rope.html). std::string is not really designed to be used for very large strings.
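GCC's libstdc++ still ships the SGI rope as the non-standard extension __gnu_cxx::crope in <ext/rope>, so a sketch like the one below compiles only against libstdc++. Copying a rope shares its reference-counted tree of fragments instead of duplicating every character. The build_rope helper and the sizes used with it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ext/rope>  // libstdc++ extension (GCC); not part of standard C++

// __gnu_cxx::crope is rope<char>: a reference-counted tree of string
// fragments, so copying one is cheap regardless of its length.
using __gnu_cxx::crope;

// Builds a rope by appending `pieces` copies of a chunk of `chunk_size`
// characters, each set to `fill`.
crope build_rope(std::size_t pieces, std::size_t chunk_size, char fill) {
    crope r;
    const crope chunk(chunk_size, fill);
    for (std::size_t i = 0; i < pieces; ++i) {
        r.append(chunk);
    }
    return r;
}
```

After crope b = a; both ropes refer to the same tree, which is the kind of shallow copy the question is asking about.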