Using String methods instead of Regex

As I am not very familiar with regex, is it possible (whether it's hard to do or not) to extract certain text in between symbols? For example:

<meta name="description" content="THIS IS THE TEXT I WANT TO EXTRACT" />


Since you give an xml example, just use an xml parser:

string s = (string) XElement.Parse(xml).Attribute("content");

XML is not a simple text format, and Regex isn't really a very good fit; using an appropriate tool will protect you from a range of evils... for example, the following is identical as XML:

<meta
    name="description"
    content=
        'THIS IS THE TEXT I WANT TO EXTRACT'
/>

It also means that when the requirement changes, you have a simple tweak to make to the code, rather than trying to unpick a regex and put it back together again (which can be tricky if you are accessing a non-trivial node). Equally, XPath might be an option; so in your data the XPath:

/meta/@content

is all you need.

If you haven't got .NET 3.5:

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
string s = doc.DocumentElement.GetAttribute("content");
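The XPath mentioned above (/meta/@content) can be evaluated with the same XmlDocument class. This is only a minimal sketch: the class and method names (XPathSketch, GetMetaContent) are made up for illustration, and it assumes the meta element is the root of the parsed fragment, as in the question.

```csharp
using System;
using System.Xml;

static class XPathSketch
{
    // Hypothetical helper: returns the content attribute of a root <meta>
    // element, or null when the XPath selects nothing.
    public static string GetMetaContent(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // SelectSingleNode returns the attribute node; Value is its text.
        XmlNode node = doc.SelectSingleNode("/meta/@content");
        return node == null ? null : node.Value;
    }

    static void Main()
    {
        string xml = "<meta name=\"description\" content=\"THIS IS THE TEXT I WANT TO EXTRACT\" />";
        Console.WriteLine(GetMetaContent(xml));
    }
}
```

Because the parser handles quoting and whitespace, the same call should also work on the multi-line, single-quoted variant shown earlier.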


Sure, you can identify the start and the end of your desired substring with string methods such as IndexOf, then get the desired Substring! In your example, you want to locate (with IndexOf) the content=" marker and then the first following ", right? And once you have those indices into the string, Substring will work fine. (Not posting C# code because I'm not entirely sure of what exactly it IS that you want, beyond IndexOf and Substring...!-)

If so, then:

const string marker = "content=\"";
int start = str.IndexOf(marker) + marker.Length;
int last = str.IndexOf("\"", start);
return str.Substring(start, last - start);

should more or less do what you want (the offset comes from marker.Length rather than a hardcoded count, which avoids an off-by-one; note there is no handling here for a marker that isn't found, in which case IndexOf returns -1), but this is the general concept. Locate the start with single-argument IndexOf, locate the end with two-args IndexOf, slice off the desired piece with Substring...!
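For completeness, here is a small self-contained sketch of the same IndexOf/Substring idea that also copes with a missing marker. The names BetweenSketch and GetBetween are hypothetical, and returning null on "not found" is just one possible convention, not something the answer above prescribes.

```csharp
using System;

static class BetweenSketch
{
    // Hypothetical helper: returns the text between the first startMarker
    // and the next endMarker after it, or null if either marker is missing.
    public static string GetBetween(string text, string startMarker, string endMarker)
    {
        int first = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (first == -1) return null;            // start marker not found
        int start = first + startMarker.Length;  // first character after the marker
        int end = text.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end == -1) return null;              // end marker not found
        return text.Substring(start, end - start);
    }

    static void Main()
    {
        string html = "<meta name=\"description\" content=\"THIS IS THE TEXT I WANT TO EXTRACT\" />";
        Console.WriteLine(GetBetween(html, "content=\"", "\""));
    }
}
```

Keep in mind this is plain text matching: it would not cope with single-quoted attributes or extra whitespace around the equals sign, which is where a parser is more robust.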


If the input is: text1/text2/text3/ (with a trailing slash, which the patterns below require)

The regex below captures the third segment, text3, in group 2:

^([^/]*/){2}([^/]*)/$


If you always need the last segment, use the pattern below instead (the segment is captured in group 1):

^.*/([^/]*)/$
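As a rough illustration of how these two patterns could be used from C#, here is a sketch with System.Text.RegularExpressions. The class and method names (RegexSketch, ThirdSegment, LastSegment) are invented for the example, and it assumes the input ends with a slash, since both patterns end in /$ and will not match otherwise.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSketch
{
    // Hypothetical helper: third slash-separated segment (group 2), or null on no match.
    public static string ThirdSegment(string input)
    {
        Match m = Regex.Match(input, "^([^/]*/){2}([^/]*)/$");
        return m.Success ? m.Groups[2].Value : null;
    }

    // Hypothetical helper: last slash-separated segment (group 1), or null on no match.
    public static string LastSegment(string input)
    {
        Match m = Regex.Match(input, "^.*/([^/]*)/$");
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(ThirdSegment("text1/text2/text3/"));
        Console.WriteLine(LastSegment("text1/text2/text3/"));
    }
}
```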


Sure, you can do it without Regex. Say you want to get the text between < and >...

string GetTextBetween(string content)
{
  int start = content.IndexOf("<");
  if (start == -1) return null; // "<" not found
  // Search for ">" only after the "<" that was found.
  int end = content.IndexOf(">", start + 1);
  if (end == -1) return null; // ">" not found
  // Skip the "<" itself and stop just before the ">".
  return content.Substring(start + 1, end - start - 1);
}