How do I parse out Author information using a Regex in C#?
I have the following text:
BATTLE HYMN OF THE TIGER MOTHER, by Amy Chua. (Penguin
Press, $25.95.) A Chinese-American mother makes a case for strict
and demanding parenting
I'd like to use a regex to parse out:
Title
Author
Publisher
MSRP (Retail Price)
Description
How do I write a regex to do this in C#?
Just saw answers were allowed again. This is my recommended regex:
^(?<title>[\w\s]*), by (?<author>[\w\s]*)\. \((?<publisher>[\w\s]*), (?<msrp>.*)\.\) (?<description>.*)$
It will give you a named capture for the fields above and can be used in C# like this:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "BATTLE HYMN OF THE TIGER MOTHER, by Amy Chua. (Penguin Press, $25.95.) A Chinese-American mother makes a case for strict and demanding parenting";
        string pattern = @"^(?<title>[\w\s]*), by (?<author>[\w\s]*)\. \((?<publisher>[\w\s]*), (?<msrp>.*)\.\) (?<description>.*)$";
        MatchCollection myMatchCollection = Regex.Matches(input, pattern);
        foreach (Match myMatch in myMatchCollection)
        {
            // Groups["name"] returns a Group; .Value is the captured text.
            string title = myMatch.Groups["title"].Value;
            string author = myMatch.Groups["author"].Value;
            string publisher = myMatch.Groups["publisher"].Value;
            string msrp = myMatch.Groups["msrp"].Value;
            string description = myMatch.Groups["description"].Value;
            Console.WriteLine("{0} | {1} | {2} | {3} | {4}", title, author, publisher, msrp, description);
        }
    }
}
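One thing to watch for: the text in the question is wrapped over three lines, while the input string here is a single line. Because the description group uses .* (which does not cross a line break by default), the wrapped form would not match as-is. A minimal sketch of one way around that, assuming it is acceptable to collapse all whitespace first; the Normalize and ParseEntry helpers are hypothetical names, not part of the answer above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WrappedEntryDemo
{
    // Same pattern as in the answer above.
    private const string Pattern =
        @"^(?<title>[\w\s]*), by (?<author>[\w\s]*)\. \((?<publisher>[\w\s]*), (?<msrp>.*)\.\) (?<description>.*)$";

    // Hypothetical helper: collapses every run of whitespace
    // (including line breaks) into a single space.
    public static string Normalize(string raw)
    {
        return Regex.Replace(raw, @"\s+", " ").Trim();
    }

    // Hypothetical helper: normalizes the text, then applies the pattern once.
    public static Match ParseEntry(string raw)
    {
        return Regex.Match(Normalize(raw), Pattern);
    }

    public static void Main()
    {
        string wrapped = "BATTLE HYMN OF THE TIGER MOTHER, by Amy Chua. (Penguin\n"
            + "Press, $25.95.) A Chinese-American mother makes a case for strict\n"
            + "and demanding parenting";

        Match m = ParseEntry(wrapped);
        if (m.Success)
        {
            Console.WriteLine(m.Groups["publisher"].Value);
            Console.WriteLine(m.Groups["msrp"].Value);
            Console.WriteLine(m.Groups["description"].Value);
        }
        else
        {
            // Always check Success; a failed match yields empty groups, not an exception.
            Console.WriteLine("no match");
        }
    }
}
```

Passing RegexOptions.Singleline instead would also let .* cross line breaks, but the line breaks would then stay inside the captured values.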
I think it might be simpler to use the string.Split() method:
- Split on "(" or ")" to get a left, middle, and right part
- Split the left part on "by" to get the title and author
- Split the middle part on ", " to get the publisher and price
- The right part is your description
This all depends, of course, on how reliable the pattern is, as the commenters above mention.
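The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the BookEntry and SplitParser names are hypothetical, the left part is split on ", by " rather than a bare "by" (so a title containing the letters "by" is not cut in two), and the trailing periods after the author and the price are trimmed off.

```csharp
using System;

// Hypothetical container for the five fields.
public sealed class BookEntry
{
    public string Title;
    public string Author;
    public string Publisher;
    public string Msrp;
    public string Description;
}

public static class SplitParser
{
    // Returns null when the text does not have the expected shape.
    public static BookEntry Parse(string input)
    {
        // Step 1: split on '(' or ')' into left, middle, and right parts.
        string[] parts = input.Split('(', ')');
        if (parts.Length != 3) return null;

        // Step 2: the left part holds "TITLE, by Author. ".
        string[] left = parts[0].Split(new[] { ", by " }, StringSplitOptions.None);

        // Step 3: the middle part holds "Publisher, $price.".
        string[] middle = parts[1].Split(new[] { ", " }, StringSplitOptions.None);

        if (left.Length != 2 || middle.Length != 2) return null;

        return new BookEntry
        {
            Title = left[0].Trim(),
            Author = left[1].Trim().TrimEnd('.'),
            Publisher = middle[0].Trim(),
            Msrp = middle[1].Trim().TrimEnd('.'),
            // Step 4: the right part is the description.
            Description = parts[2].Trim()
        };
    }

    public static void Main()
    {
        string input = "BATTLE HYMN OF THE TIGER MOTHER, by Amy Chua. (Penguin Press, $25.95.) A Chinese-American mother makes a case for strict and demanding parenting";
        BookEntry entry = Parse(input);
        if (entry == null)
        {
            Console.WriteLine("unexpected format");
            return;
        }
        Console.WriteLine(entry.Title);
        Console.WriteLine(entry.Author);
        Console.WriteLine(entry.Publisher);
        Console.WriteLine(entry.Msrp);
        Console.WriteLine(entry.Description);
    }
}
```

Note that a description containing its own parentheses, or a publisher name containing ", ", would produce extra parts and make this sketch return null, which is the reliability concern mentioned above.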
This does it:
^([ \w]+), by ([ \w]+)\. \(([ \w]+), ([$.\d]+)\.\) ([ \w-]+)$
You can add named groups to pull the fields out by name, or just read the matches by index. However, this will most likely be incredibly brittle unless your source data is very strictly formatted.
I've also only tested it against this one example. The description has a - in it, which is an example of the kind of special character that can also appear in names, so you might want to make sure those are handled as you expect.
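As a rough illustration of reading the groups by index, and of how easily a special character breaks the match, here is a small sketch. It uses a slight variation of the pattern in which both literal periods are escaped and the period before the closing parenthesis is kept out of the price group. The IndexGroupDemo and TryParse names and the second, made-up entry are assumptions for demonstration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndexGroupDemo
{
    // Variation of the pattern above with the literal periods escaped.
    private const string Pattern =
        @"^([ \w]+), by ([ \w]+)\. \(([ \w]+), ([$.\d]+)\.\) ([ \w-]+)$";

    // Returns title, author, publisher, price, description (in that order),
    // or null when the text does not match.
    public static string[] TryParse(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return null;

        // Group 0 is the whole match; the captures start at index 1.
        string[] fields = new string[5];
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = m.Groups[i + 1].Value;
        }
        return fields;
    }

    public static void Main()
    {
        string sample = "BATTLE HYMN OF THE TIGER MOTHER, by Amy Chua. (Penguin Press, $25.95.) A Chinese-American mother makes a case for strict and demanding parenting";
        string[] fields = TryParse(sample);
        Console.WriteLine(fields == null ? "no match" : string.Join(" | ", fields));

        // Made-up entry: the hyphen in the author name is not in [ \w],
        // so the whole pattern fails to match.
        string hyphenated = "SAMPLE TITLE, by Ann Smith-Jones. (Example House, $10.00.) A made-up entry";
        Console.WriteLine(TryParse(hyphenated) == null ? "no match" : "match");
    }
}
```

Widening the character classes (for example, adding - and ' to the author class) is one way to handle such names, but which characters to allow depends on your actual data.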