C++: Storing a list of addresses in an array to parse raw, non-terminated text?
I'm just starting out programming, but I've had a lot of ideas about how to make my life easier when parsing files by writing a program that maps the addresses of data once it has been read into memory from a file.
Note: I cut down the wall of text; here's the problem in a nutshell.
How does one parse an array of chars that has no null terminator, where the words all begin with uppercase letters so that the capital can be used as a delimiter?
Basically, I want to parse a text file that is just 'WordWordWord', send each word to its own separate string variable, and then be able to write each word to a text file with a newline added.
I wanted to do some more advanced stuff, but I was asked to cut the wall of text, so that will do for now :)
// Pointers and other values (words2, b1, the file streams fin and fout) were declared earlier.
// Note: strlen only gives a meaningful result if words2 is null-terminated.
int len = (int) strlen( words2 );
cout << "\nSize of Words2 is : " << len << " bytes\n";
// Loop through the array; each uppercase char marks the start of a word
for (int i = 0; i < len; i++)
{
    if (isupper((unsigned char) words2[i]))
    {
        // Output the uppercase char found in words2
        cout << "\n Words2 is upper : " << words2[i] << "\n";
        b1 = &words2[i];
        // Output the address held in b1 (cast to void* so cout prints an address
        // rather than a string) and the int value of words2[i]
        cout << "\nChar address is " << (const void *) b1 << " char value is " << (int) words2[i] << "\n";
        cout << "\nChar string is " << b1 << " address +1 " << (const void *) (b1 + 1) << "\n and string is " << b1 + 1 << "\n";
    }
    cout << "\nItem I is : i " << i << " and words2 is " << words2[i] << "\n";
}
fin.clear();
fin.close();
fout.close();
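Easy. Use Boost.Tokenizer with char_separator("", "ABCDEFGHIJKLMNOPQRSTUVWXYZ"). "" is the set of dropped separators, and A-Z is the set of kept separators. (If you'd used A-Z as dropped separators, you'd get ord ord ord, because you'd drop the W.) Note that each kept separator comes back as its own one-character token, so the tokens arrive as W, ord, W, ord, W, ord and each capital still has to be joined to the token that follows it.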
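To make the dropped/kept distinction concrete without installing Boost, here is a small standard-library sketch that imitates it by hand. It is not Boost's code: tokenizeKeepingCapitals and joinCapitals are names made up for this example, and the behaviour (each kept separator is returned as its own one-character token) is only modelled on how char_separator treats kept delimiters.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper imitating a tokenizer whose kept separators are A-Z:
// every uppercase letter becomes its own token, and the text between
// uppercase letters forms the remaining tokens.
std::vector<std::string> tokenizeKeepingCapitals(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        if (std::isupper(static_cast<unsigned char>(text[i])))
        {
            if (!current.empty())
            {
                tokens.push_back(current);
                current.clear();
            }
            tokens.push_back(std::string(1, text[i])); // the kept separator
        }
        else
        {
            current += text[i];
        }
    }
    if (!current.empty())
    {
        tokens.push_back(current);
    }
    return tokens;
}

// Hypothetical helper: glues each one-letter capital token onto the
// token that follows it, turning W, ord into Word.
std::vector<std::string> joinCapitals(const std::vector<std::string>& tokens)
{
    std::vector<std::string> words;
    bool pendingCapital = false;
    for (std::vector<std::string>::size_type i = 0; i < tokens.size(); ++i)
    {
        const std::string& t = tokens[i];
        const bool isCapital =
            t.size() == 1 && std::isupper(static_cast<unsigned char>(t[0]));
        if (pendingCapital && !isCapital)
        {
            words.back() += t;       // attach the tail to its capital
            pendingCapital = false;
        }
        else
        {
            words.push_back(t);
            pendingCapital = isCapital;
        }
    }
    return words;
}
```

Calling joinCapitals(tokenizeKeepingCapitals("WordWordWord")) yields the three words; the same joining step would apply to the tokens a real tokenizer hands back.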
Since you also "wanted to do some more advanced stuff", I would have a look at Boost.Regex from the get-go. It is a good library for textual manipulation.
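As a rough illustration of the regex route, here is a sketch that uses std::regex from the C++11 standard library rather than Boost.Regex (a deliberate swap so that it compiles without extra dependencies). It assumes a word is one ASCII uppercase letter followed by any run of non-uppercase characters; splitOnCapitals is a name made up for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every run that starts with an uppercase
// ASCII letter and continues until the next uppercase letter.
// Any text before the first uppercase letter is skipped.
std::vector<std::string> splitOnCapitals(const std::string& text)
{
    static const std::regex word("[A-Z][^A-Z]*");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it)
    {
        words.push_back(it->str());
    }
    return words;
}
```

The pattern is the only part that would need to change for the "more advanced stuff", for example to allow digits or to treat runs of capitals as one word.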
// Requires <vector>, <cstring> and <cctype>
vector<char *> parsedStrings;
const char * words = "HelloHelloHello"; // string literals are const
// Start of the word being scanned; NULL until the first uppercase char is seen.
// A pointer is used because casting an address to int truncates it on 64-bit systems.
const char * stringStart = NULL;
size_t len = strlen(words);
for (size_t i = 0; i <= len; i++)
{
    /* Parses word if current char is uppercase or
       if it's the end of the input and an uppercase char was previously matched */
    if (isupper((unsigned char) words[i]) || ((i == len) && (stringStart != NULL)))
    {
        // Current char is first uppercase char matched, so don't parse word
        if (stringStart == NULL)
        {
            stringStart = words + i;
            continue;
        }
        size_t newStringLength = (words + i) - stringStart;
        char * newString = new char[newStringLength + 1];
        // Copy each char from previous uppercase char up to current char
        for (size_t j = 0; j < newStringLength; j++)
        {
            newString[j] = stringStart[j];
        }
        newString[newStringLength] = '\0'; // add null-terminator to string
        parsedStrings.push_back(newString);
        stringStart = words + i; // the next word starts at the current char
    }
}
// Each entry was allocated with new[], so release it with delete[] when done