Detect particular tokens in a string. C#
I have a very large string (HTML), and in this HTML there are particular tokens, all of which start with "#" and end with "#".
Simple example:
<html>
<body>
<p>Hi #Name#, You should come and see this #PLACE# - From #SenderName#</p>
</body>
</html>
I need code that will detect these tokens and put them in a list:
0 - #Name#
1 - #PLACE#
2 - #SenderName#
I know that I can probably use Regex, but do you have any ideas on how to do that?
You can try:
// using System.Text.RegularExpressions;
// pattern = as few characters as possible (lazy match) between two '#'.
var pattern = @"#(.*?)#";
var matches = Regex.Matches(htmlString, pattern);
foreach (Match m in matches) {
Console.WriteLine(m.Groups[1]);
}
Answer inspired by this SO question.
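Since the question asks for the tokens in a list rather than printed, here is a minimal sketch of the same pattern collecting matches into a List<string>. The class name TokenFinder and the method name FindTokens are made up for illustration; only Regex.Matches and the pattern come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenFinder
{
    // Hypothetical helper: returns every #...# token in order of appearance,
    // including the '#' delimiters.
    public static List<string> FindTokens(string html)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(html, @"#(.*?)#"))
        {
            // m.Value is the whole match; m.Groups[1].Value would be the name without '#'.
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        string html = "<p>Hi #Name#, You should come and see this #PLACE# - From #SenderName#</p>";
        List<string> tokens = FindTokens(html);
        for (int i = 0; i < tokens.Count; i++)
        {
            Console.WriteLine("{0} - {1}", i, tokens[i]);
        }
    }
}
```

Note that . does not match a newline by default, so with this pattern a token cannot span two lines.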
Yes, you can use regular expressions.
string test = "Hi #Name#, You should come and see this #PLACE# - From #SenderName#";
Regex reg = new Regex(@"#\w+#");
foreach (Match match in reg.Matches(test))
{
Console.WriteLine(match.Value);
}
As you might have guessed, \w denotes any word character (letters, digits, and the underscore). The + means it must appear one or more times. You can find more info in the MSDN docs (for .NET 4; you'll find other versions there as well).
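One consequence of \w is worth seeing on a concrete input: a token containing a space or a hyphen is not matched by #\w+#. The sketch below uses invented token names (First_Name, Order2, Last Name, sender-name) purely to illustrate this; the class and method names are hypothetical as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordTokenDemo
{
    // Hypothetical helper: collects every match of #\w+# in the input.
    public static List<string> MatchWordTokens(string text)
    {
        var found = new List<string>();
        foreach (Match m in new Regex(@"#\w+#").Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Made-up tokens: two consist of word characters only, two do not.
        string text = "#First_Name# #Order2# #Last Name# #sender-name#";
        foreach (string token in MatchWordTokens(text))
        {
            Console.WriteLine(token);
        }
        // Prints #First_Name# and #Order2#; the tokens containing a space
        // or a hyphen are skipped.
    }
}
```

If your token names can contain such characters, a pattern based on [^#] may fit better than \w.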
A variant without Regex, if you like:
var splitstring = myHtmlString.Split('#');
var tokens = new List<string>();
for( int i = 1; i < splitstring.Length; i+=2){
tokens.Add(splitstring[i]);
}
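A caveat on the Split approach: it relies on every '#' in the document belonging to a token pair. Real HTML can contain a stray '#' (a CSS color, an anchor link, a numeric entity), which shifts the odd/even positions. The sketch below compares it with a regex on a made-up input; all class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitCaveatDemo
{
    // The Split-based approach: every odd-indexed part is treated as a token.
    public static List<string> TokensBySplit(string html)
    {
        string[] parts = html.Split('#');
        var tokens = new List<string>();
        for (int i = 1; i < parts.Length; i += 2)
        {
            tokens.Add(parts[i]);
        }
        return tokens;
    }

    // Regex-based comparison: only #word# pairs count as tokens.
    public static List<string> TokensByRegex(string html)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(html, @"#(\w+)#"))
        {
            tokens.Add(m.Groups[1].Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // Hypothetical input with a stray '#' inside a CSS color value.
        string html = "<p style=\"color:#333\">Hi #Name#</p>";
        Console.WriteLine(string.Join(" | ", TokensBySplit(html)));
        Console.WriteLine(string.Join(" | ", TokensByRegex(html)));
    }
}
```

With this input the Split version returns two fragments of markup and misses Name, while the regex version returns only Name. If you control the template and know it has no other '#', the Split version is fine.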
foreach (Match m in Regex.Matches(input, @"#\w+#"))
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
Try this:
var result = html.Split('#')
.Select((s, i) => new {s, i})
.Where(p => p.i%2 == 1)
.Select(t => t.s);
Explanation:
line 1 - we split the text by the character '#'
line 2 - we select a new anonymous type, which includes the string's position in the array and the string itself
line 3 - we filter the list of anonymous objects to those that have an odd index value, effectively picking every other string; these are the strings that were wrapped in hash characters, rather than those outside
line 4 - we strip away the index and return just the string from the anonymous type
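To make the four steps concrete, here is a small self-contained sketch that prints the indexed parts before filtering, using the sample paragraph from the question. The class name LinqTokenDemo and the method name OddParts are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqTokenDemo
{
    // Hypothetical helper wrapping the query above and materializing it as a list.
    public static List<string> OddParts(string html)
    {
        return html.Split('#')
            .Select((s, i) => new { s, i })   // pair each part with its index
            .Where(p => p.i % 2 == 1)         // keep the odd-indexed parts
            .Select(p => p.s)                 // drop the index again
            .ToList();
    }

    public static void Main()
    {
        string html = "<p>Hi #Name#, You should come and see this #PLACE# - From #SenderName#</p>";

        // Show what Split produces, so the odd/even pattern is visible.
        string[] parts = html.Split('#');
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine("{0}: '{1}'", i, parts[i]);
        }

        // Parts 1, 3 and 5 are the token names.
        Console.WriteLine(string.Join(", ", OddParts(html)));
    }
}
```

The last line prints Name, PLACE, SenderName. Note that the '#' characters themselves are not part of the result.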
Use:
MatchCollection matches = Regex.Matches(mytext, @"#(\w+)#");
foreach(Match m in matches)
{
Console.WriteLine(m.Groups[1].Value);
}
Naive solution:
var result = Regex
.Matches(html, @"#([^#]+)#") // one or more non-'#' characters between two '#'
.OfType<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
Linq solution:
string s = @"<p>Hi #Name#,
You should come and see this #PLACE# - From #SenderName#</p>";
var result = s.Split('#').Where((part, index) => index % 2 != 0);
Use the Regex.Matches method with a pattern something like
#[^#]+#
which is possibly the most naive way.
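If you do not want the '#' characters included in the match value, a lookaround is one option:
(?<=#)[^#]+(?=#)
(The match value for this would be 'hello', not '#hello#', so you don't have to do any trimming. Be aware, though, that because lookarounds do not consume the '#' characters, this pattern also matches the text between two tokens; a capture group, as in #([^#]+)#, avoids that.)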
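Before relying on the lookaround version, it may help to compare it with a plain capture group on an input that has more than one token. The sketch below is an illustration only; the class name, the helper name Run, and the input string are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: runs a pattern and returns the given group of each match.
    public static List<string> Run(string input, string pattern, int group)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Groups[group].Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "Hi #Name#, see #PLACE#!";

        // Lookarounds do not consume the '#', so the closing '#' of one token
        // can act as the opening '#' for the text that follows it.
        List<string> viaLookaround = Run(input, @"(?<=#)[^#]+(?=#)", 0);
        Console.WriteLine(string.Join(" | ", viaLookaround));

        // The capture group consumes both '#', so only real tokens match.
        List<string> viaGroup = Run(input, @"#([^#]+)#", 1);
        Console.WriteLine(string.Join(" | ", viaGroup));
    }
}
```

On this input the lookaround pattern yields three matches (Name, the text ", see ", and PLACE), while the capture group yields only Name and PLACE.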
This gives you a list of the tokens as requested:
var tokens = new List<string>();
var matches = new Regex("(#.*?#)").Matches(html);
foreach (Match m in matches)
tokens.Add(m.Groups[1].Value);
Edit: If you don't want the pound characters included, just move them outside the parentheses in the Regex string (see Pablo's answer).