Regular expression for matching the correct string

I have a string:

Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>

It is all on a single line, so how would I extract only the information about the balls? That is, the output should be . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .

The closest I got was with [^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+, but it matches the 1 and not 1b.


First, the brackets [] create what is called a "character class", which represents a single character. Your pattern effectively says "match any run of characters other than these": ( R e c n t o v r s b p | < > / ) and the space. Because b is in that set, the match stops after the 1 in 1b.
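To see the character-class behaviour in isolation, here is a minimal sketch that runs the pattern from the question against a short fragment. The names `attempt` and `matchWithClass` are made up for this illustration.

```javascript
// The question's pattern: a negated character class, repeated.
// Inside [...] the parentheses and pipes are literal characters, not grouping or alternation.
const attempt = new RegExp("[^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+", "g");

// Hypothetical helper: return every match of the pattern in the input.
function matchWithClass(input) {
  return input.match(attempt) || [];
}

// "b" and the space are both in the excluded set, so "1b" is cut down to "1".
console.log(matchWithClass("1b 4 ."));
```

Any letter that happens to appear in "Recent overs" or in one of the tag names is excluded the same way, which is why 1b loses its b.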

You'd be better off using a regex to remove the unwanted strings, then it's easier to parse the result, like this:

JavaScript, because you didn't specify the language:

var s = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
s = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');

jsfiddle example

The resulting 's' is much easier to parse.
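As a sketch of the follow-up parsing step, the cleaned string can be split on whitespace to get one token per ball. The function name `extractBalls` is an assumption for this example, not part of the original answer.

```javascript
const sample = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Hypothetical helper: strip the label, the tags and the pipes, then split on whitespace.
function extractBalls(html) {
  const cleaned = html.replace(/(Recent overs|<[^>]+>|\|)/ig, '');
  // trim() avoids an empty first token caused by the leading space.
  return cleaned.trim().split(/\s+/);
}

const balls = extractBalls(sample);
console.log(balls.length);    // 24 balls, i.e. four overs of six
console.log(balls.join(' ')); // . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .
```

Splitting on `\s+` rather than a single space means the double spaces left behind where `<b>|</b>` was removed do not produce empty tokens.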


Try \s[\d\.][\w]* to match a digit or a dot, optionally followed by word characters, when it is preceded by a whitespace character!
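A quick way to check this pattern is to run it over the sample string. In this sketch the helper name `matchAfterSpace` is invented for illustration. Each match includes the leading whitespace, so the helper trims it.

```javascript
const sample = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Hypothetical helper: collect every "whitespace + digit-or-dot + word chars" match,
// then drop the leading whitespace that is part of each match.
function matchAfterSpace(html) {
  const matches = html.match(/\s[\d.]\w*/g) || [];
  return matches.map((m) => m.trim());
}

const found = matchAfterSpace(sample);
console.log(found.length);         // 23
console.log(found.includes("1b")); // true
```

On this particular sample the pattern keeps 1b intact but returns 23 tokens rather than 24: the very first ball follows `<tt>` directly, with no space before it, so it is not matched.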


Based solely on the example you gave, you could try something like:

/(?<=>)[a-z\d\s\.]+/g

Alternative, in case your regex engine doesn't support lookbehinds:

/>([a-z\d\s\.]+)/g     #Matches will be in the first capture group.
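The capture-group variant could be used roughly as follows. This is a sketch that assumes an engine with `String.prototype.matchAll` (modern browsers and Node.js). The name `ballsFromCaptures` is hypothetical.

```javascript
const sample = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Hypothetical helper: take the first capture group of every match
// (the text that follows a ">"), then split the pieces into tokens.
function ballsFromCaptures(html) {
  const pattern = />([a-z\d\s.]+)/g;
  const chunks = Array.from(html.matchAll(pattern), (m) => m[1]);
  // Join with a space so tokens from adjacent chunks cannot run together.
  return chunks.join(' ').trim().split(/\s+/);
}

const tokens = ballsFromCaptures(sample);
console.log(tokens.join(' ')); // . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .
```

The pipes drop out without extra handling because `|` is not in the character class, so the `>` that closes `<b>` starts no match. The leading "Recent overs" is skipped only because no `>` precedes it.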

However, it's a little hard to infer the rules of what should/should not be allowed based on the small sample you gave, and your output sample doesn't make much sense to me as a data structure. It seems like you might be better off using an HTML parser for this, since using regex to process HTML is frequently a bad idea.
