Format String : Parsing
I have a parsing question. I have a paragraph which has instances of : word
. So basically it has a colon, two spaces, a word (could be anything), then two more spaces.
So when I have those instances I want to convert the string so I have
- A new line character after
:
and the word.开发者_JAVA百科 - Removed the double space after the word.
- Replace all double spaces with new line characters.
Don't know exactly how about to do this. I'm using C# to do this. Bullet point 2 above is what I'm having a hard time doing this.
Thanks
Assuming your original string is exactly in the form you described, this will do:
var newString = myString.Trim().Replace(" ", "\n");
The Trim()
removes leading and trailing whitespaces, taking care of your spaces at the end of the string.
Then, the Replace
replaces the remaining " "
two space characters, with a "\n"
new line character.
The result is assigned to the newString
variable. This is needed, as myString
will not change - as strings in .NET are immutable.
I suggest you read up on the String
class and all its methods and properties.
You can try
var str = ": first : second ";
var result = Regex.Replace(str, ":\\s{2}(?<word>[a-zA-Z0-9]+)\\s{2}",
":\n${word}\n");
Using RegularExpressions will give you exact matches on what you are looking for.
The regex match for a colon, two spaces, a word, then two more spaces is:
Dim reg as New Regex(": [a-zA-Z]* ")
[a-zA-Z]
will look for any character within the alphabetical range. Can append 0-9
on as well if you accept numbers within the word. The *
afterwards indicated that there can be 0 or more instances of the preceding value.
[a-zA-Z]*
will attempt to do a full match of any set of contiguous alpha characters.
Upon further reading, you may use [\w]
in place of [a-zA-Z0-9]
if that's what you are looking for. This will match any 'word' character.
source: http://msdn.microsoft.com/en-us/library/ms972966.aspx
You can retrieve all the matches using reg.Matches(inputString)
.
Review http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx for more information on regular expression replacements and your options from there out
edit: Before I was using \s
to search for spaces. This will match any whitespace character including tabs, new lines and other. That is not what we want, so I reverted it back to search for exact space characters.
You can use string.TrimEnd - http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx - to trim spaces at the end of the string.
The following is an example using Regular Expressions. See also this question for more info.
Basically the pattern string tells the regex to look for a colon followed by two spaces. Then we save in a capture group named "word" whatever the word is surrounded by two spaces on either side. Finally two more spaces are specified to finish the pattern.
The replace uses a lambda which says for every match, replace it with a colon, a new line, the "lone" word, and another newline.
string Paragraph = "Jackdaws love my big sphinx of quartz: fizz The quick onyx goblin jumps over the lazy dwarf. Where: buzz The crazy dogs.";
string Pattern = @": (?<word>\S*) ";
string Result = Regex.Replace(Paragraph, Pattern, m =>
{
var LoneWord = m.Groups[1].Value;
return @":" + Environment.NewLine + LoneWord + Environment.NewLine;
},
RegexOptions.IgnoreCase);
Input
Jackdaws love my big sphinx of quartz: fizz The quick onyx goblin jumps over the lazy dwarf. Where: buzz The crazy dogs.
Output
Jackdaws love my big sphinx of quartz:
fizz
The quick onyx goblin jumps over the lazy dwarf. Where:
buzz
The quick brown fox.
Note, for item 3 on your list, if you also want to replace individual occurrences of two spaces with newlines, you could do this:
Result = Result.Replace(" ", Environment.NewLine);
精彩评论