Remove text (& brackets) from a string with regex

It's easy when you understand... unfortunately, I don't! I'd deeply appreciate it if you could guide me to the answer, thanks.

I want to capture a string, using just regex, but remove any text that's within brackets. e.g.

This is a typical string...

<td class="rc_entry_alt" >Mark Anthony (IRE)</td>

I can capture "Mark Anthony (IRE)" very easily. I'm currently using...

/<td class="rc_entry(_alt)?" >.*<\/td>/

What I'd like is to remove the " (IRE)". Note the space preceding the first bracket. I want to remove this too. Also, the text between the ( and ) will vary, e.g. USA, ITY, FR, etc. It should look like this...

Mark Anthony

I've no doubt it's very simple, and yet it eludes me. Thanks for your time :)

n.b. The stuff in brackets isn't always there. Sometimes I get what I want with the original code I mentioned.


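Your regexp would look something like this. The actual syntax depends on your programming language / tool.

First you need to match the <td ...> part. Then you capture everything up to the (. Then, to be sure, match everything in brackets followed by </td>. The bracketed part is optional here, since you say it isn't always there, and the space before it is kept out of the capture.

/<td[^>]*>([^(<]*?)\s*(?:\([^)]*\))?<\/td>/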
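As a rough illustration of the capture approach, here is a minimal Python sketch. The choice of Python, the function name extract_name, and the exact pattern are assumptions made for the example (the question doesn't say which language or tool is in use); the class names come from the question's own regex.

```python
import re

# Assumed pattern: group 1 is the cell text, stopping before an optional
# " (XXX)" suffix. \( and \) match literal parentheses in Python's re syntax.
CELL = re.compile(
    r'<td class="rc_entry(?:_alt)?" >([^(<]*?)\s*(?:\([^)]*\))?</td>'
)

def extract_name(html):
    """Return the cell text without the bracketed suffix, or None if no cell matches."""
    match = CELL.search(html)
    return match.group(1) if match else None

print(extract_name('<td class="rc_entry_alt" >Mark Anthony (IRE)</td>'))
print(extract_name('<td class="rc_entry" >Mark Anthony</td>'))
```

Both calls should print just the name, because the lazy capture stops as soon as the optional whitespace, the optional bracketed part, and the closing tag can match.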

You should read the book Mastering Regular Expressions by Jeffrey Friedl.


Okay, so remove the HTML first, then do something like this to remove the (...) part:

\s+\(.*?\)

If you know the (...) part is the very last thing in the string (i.e. there's nothing after it), you can use this to check that it's at the end, too:

\s+\(.*?\)$

Just use a Regex find and replace function, find the expression above, and replace with nothing.
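For instance, a find-and-replace along these lines could look like the following in Python. This is only a sketch: the language and the helper name strip_brackets are assumptions, and it expects the HTML to have been stripped already.

```python
import re

def strip_brackets(text):
    """Remove a trailing ' (...)' group, including the whitespace before it."""
    # The $ anchor limits the match to a bracketed part at the very end;
    # strings without one are returned unchanged.
    return re.sub(r'\s+\(.*?\)$', '', text)

print(strip_brackets('Mark Anthony (IRE)'))
print(strip_brackets('Mark Anthony'))
```

Because re.sub leaves the string alone when nothing matches, the same call covers the case where the brackets are missing.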
