Tokenize a string using strings as delimiters
If I have a string like
"This is a string that will be split by this and that"
I would like to get the split results as
1. "is a string that will be split by"
2. "and that"
3. "This is a string"
4. "will be split by this and"
Results 1 and 2 come from splitting by "this"; results 3 and 4 come from splitting by "that".
My current solution uses a map from string to string and stores the results in another map of the same type (string to string). However, for longer, more complex text, the stored results overlap: in the example above, the substring "is a string" appears in both 1 and 3, and this redundancy produces incorrect statistical results.
Could you suggest a neater solution for tokenizing a long string whose delimiters are themselves long strings?
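string myString = "This is a string that will be split by this and that";
// String.Split matches case-sensitively, so normalize the case first.
// Note that the resulting tokens are upper-cased as well.
string foo = myString.ToUpper();
// Split on each delimiter separately...
string[] byThis = foo.Split(new string[] { "THIS" }, StringSplitOptions.RemoveEmptyEntries);
string[] byThat = foo.Split(new string[] { "THAT" }, StringSplitOptions.RemoveEmptyEntries);
// ...or on both delimiters at once.
string[] all = foo.Split(new string[] { "THAT", "THIS" }, StringSplitOptions.RemoveEmptyEntries);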
Alternatively, you can use Regex.Split, which can ignore case without changing the original string:
string[] all = System.Text.RegularExpressions.Regex.Split(myString, "your pattern", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
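As a rough sketch of the Regex approach, the pattern could be built from the delimiters themselves. The class name `DelimiterSplitter` and the helper `SplitByAny` are made up for illustration, and trimming the tokens and dropping empty ones is an assumption about the desired output, not something the answer above specifies.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Hypothetical helper: splits text on any of the given delimiter strings,
    // ignoring case, and returns trimmed, non-empty tokens in their original casing.
    public static string[] SplitByAny(string text, params string[] delimiters)
    {
        // Try longer delimiters first so a delimiter that is a prefix of
        // another one does not shadow it; escape each so it is matched literally.
        string pattern = string.Join("|",
            delimiters.OrderByDescending(d => d.Length)
                      .Select(d => Regex.Escape(d)));

        // Regex.Split keeps empty strings (e.g. when the text starts with a
        // delimiter), so filter them out after trimming.
        return Regex.Split(text, pattern, RegexOptions.IgnoreCase)
                    .Select(piece => piece.Trim())
                    .Where(piece => piece.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string myString = "This is a string that will be split by this and that";
        foreach (string token in SplitByAny(myString, "this", "that"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Calling `SplitByAny(myString, "this")` or `SplitByAny(myString, "that")` with a single delimiter should give the per-delimiter results listed in the question. Because the match is a plain substring match, a delimiter such as "this" would also match inside a longer word; wrapping each escaped delimiter in `\b` word boundaries is one way to avoid that if it matters for your text.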