Regex questions

2023-02-15 08:09 Q&A author:

I'm trying to extract some text from a large text file. The text I'm looking for is:

Type:Production Color:Red

I pass the whole text to the following method to get the pairs (Type:Production, Color:Red):

  private static void FindKeys(IEnumerable<string> keywords, string source)
    {
        var found = new Dictionary<string, string>(10);
        var keys = string.Join("|", keywords.ToArray());
        var matches = Regex.Matches(source, @"(?<key>" + @"\B\s" + keys + @"\B\s" + "):",
                              RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            var key = m.Groups["key"].ToString();
            var start = m.Index + m.Length;
            var nx = m.NextMatch();
            var end = (nx.Success ? nx.Index : source.Length);
            found.Add(key, source.Substring(start, end - start));

        }

        foreach (var n in found)
        {
            Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
        }
    }
}

My problems are the following:

The search returns _Type: as well, where I only need Type:
The search returns Color:Red\n\n\n\n\n (with the rest of the text), where I only need Color:Red

So, basically:
- How can I force Regex to match Type exactly and ignore _Type?
- How can I get only the text after the : and ignore the \n\n and any other text?

I hope this is clear

Thanks,

Your regex currently looks like this:

(?<key>\B\sWord1|Word2|Word3\B\s):

I see the following issues here:

First, Word1|Word2|Word3 should be put in parentheses. Otherwise, the regex will search for \B\sWord1 or Word2 or Word3\B\s, which is not what you want (I guess).
Why \B\s? A non-boundary followed by a whitespace character? That doesn't make sense. I guess you just want \b (= word boundary). There's no need for one at the end, because the transition from the last letter of the key to the colon is already a word boundary.

So I would suggest using the following. It fixes the _Type problem, because there is no word boundary between _ and Type (since _ is considered a word character).

\b(?<key>Word1|Word2|Word3):
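To see the word-boundary behaviour concretely, here is a minimal sketch. The class name BoundaryDemo, the helper MatchKeys, and the sample input are made up for illustration; the use of Regex.Escape is an extra precaution that is not part of the pattern above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns the keys matched by \b(?<key>...): in source.
    public static string[] MatchKeys(string source, params string[] keywords)
    {
        // Escape each keyword in case it contains regex metacharacters (assumption).
        var alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        var pattern = @"\b(?<key>" + alternation + "):";

        return Regex.Matches(source, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["key"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // "_Type:" is skipped: '_' and 'T' are both word characters,
        // so there is no \b between them.
        var keys = MatchKeys("_Type:Test Type:Production Color:Red", "Type", "Color");
        Console.WriteLine(string.Join(", ", keys)); // prints "Type, Color"
    }
}
```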

If the text following the key is always just a single word, I'd match it in the regex as well. (\s* allows for whitespace after the colon; I don't know if you need this. \w+ ensures that only word characters, i.e. no line breaks etc., are matched as the value.)

\b(?<key>Word1|Word2|Word3):\s*(?<value>\w+)

Then you just need to iterate through all the matches and extract the key and value groups. No need for any string operations or index arithmetic.
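As a rough sketch of what that loop could look like, here is the method from the question reworked around the key and value groups. It returns the dictionary instead of printing inside the method, escapes the keywords with Regex.Escape, and overwrites repeated keys rather than throwing; all three are choices made for this example, not requirements, and the class name KeyFinder and the sample text are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyFinder
{
    public static Dictionary<string, string> FindKeys(IEnumerable<string> keywords, string source)
    {
        var found = new Dictionary<string, string>();
        var keys = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        var pattern = @"\b(?<key>" + keys + @"):\s*(?<value>\w+)";

        foreach (Match m in Regex.Matches(source, pattern))
        {
            // Indexer assignment keeps the last value if a key occurs twice;
            // Dictionary.Add would throw an ArgumentException instead.
            found[m.Groups["key"].Value] = m.Groups["value"].Value;
        }

        return found;
    }

    public static void Main()
    {
        // Made-up sample text containing the problem cases from the question.
        var source = "_Type:Test\nType:Production Color:Red\n\n\n\nmore text";

        foreach (var pair in FindKeys(new[] { "Type", "Color" }, source))
        {
            Console.WriteLine("Key={0}, Value={1}", pair.Key, pair.Value);
        }
    }
}
```

Note that \s* also matches line breaks, so a key followed by an empty value and a newline would pick up the first word of the next line; replace it with a narrower class such as [ \t]* if that matters for your data.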

So if I understand correctly, you have:

Pairs of key:value
Each pair is separated from the next by a space
Within each pair, the key and value are separated by ":"

Then I would not use regex at all. I would:

use String.Split(' ') to get an array of pairs
loop over all the pairs
use String.Split(':') to get the key and value from each pair
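The steps above could be sketched roughly as follows, assuming the input is a single line such as "Type:Production Color:Red". The class name PairSplitter and the method ParsePairs are invented for this example. Unlike the regex approach, this does not filter by keyword, so a token like _Type:Test would simply end up under the separate key "_Type".

```csharp
using System;
using System.Collections.Generic;

public static class PairSplitter
{
    // Hypothetical helper: splits "k1:v1 k2:v2" into a dictionary.
    public static Dictionary<string, string> ParsePairs(string line)
    {
        var result = new Dictionary<string, string>();

        // RemoveEmptyEntries tolerates repeated spaces between pairs.
        foreach (var pair in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first colon only, so a value may itself contain ':'.
            var parts = pair.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                result[parts[0]] = parts[1];
            }
            // Tokens without a colon are ignored (a choice made for this sketch).
        }

        return result;
    }

    public static void Main()
    {
        var pairs = ParsePairs("Type:Production Color:Red");
        Console.WriteLine(pairs["Type"]);  // prints "Production"
        Console.WriteLine(pairs["Color"]); // prints "Red"
    }
}
```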
