how to match "ABC-123" but not "XABC-123" in a regular expression

I have this egrep search:

egrep -is "(ABC-[0-9]+)"

which matches ABC-123 anywhere in a string.

I'd like it to ignore XABC-456 or YABC-789.

In other words, those examples should output "ok":

echo "ABC-123" | egrep -is "(ABC-[0-9]+)" && echo "ok"
echo "test ABC-123" | egrep -is "(ABC-[0-9]+)" && echo "ok"

But this shouldn't:

echo "XABC-123" | egrep -is "(<fill in>ABC-[0-9]+)" && echo "ok"

I tried this without any luck (no output):

echo "ABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"

(I'm running Solaris 10)

How can I do that?


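It looks like you're looking for \bABC-[0-9]+ (word boundaries).

Another option is a negative lookbehind, which gives you more control over what can and cannot come before the match: (?<![a-z])ABC-[0-9]+. Note that lookbehind needs a Perl-compatible regex engine; the POSIX extended regular expressions that egrep uses do not support it.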


This should do:

^(ABC-[0-9]+)

This tells the regex engine that the line must start with your pattern.


If \b doesn't work for you, have you tried ((^| )ABC-[0-9]+)?
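To see what the "start of line or space" alternative accepts and rejects, here is a minimal sketch. The helper name space_or_start is hypothetical, and grep -E is used as the POSIX spelling of egrep; swap in egrep if your grep lacks -E.

```shell
# Hypothetical helper: prints "ok" when the line matches, "no" otherwise.
space_or_start() {
    if printf '%s\n' "$1" | grep -Ei '(^| )ABC-[0-9]+' >/dev/null; then
        echo "ok"
    else
        echo "no"
    fi
}

space_or_start "ABC-123"       # ok: match at start of line
space_or_start "test ABC-123"  # ok: preceded by a space
space_or_start "XABC-123"      # no: preceded by a letter
space_or_start "(ABC-123)"     # no: "(" is neither line start nor a space
```

The last case is the trade-off: anything other than a space in front of the ID (punctuation, a tab) is rejected as well.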


Try the following:

echo "XABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"

A couple of the solutions propose using ^ (starts with), but those will fail on input such as " ABC-123", which you might want to catch. Word boundaries are probably what you want, unless you really are looking for lines that start with the pattern.

Here's some sample output:

tim@Ikura ~
$ echo " ABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
 ABC-123
ok

tim@Ikura ~
$ echo "ABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
ABC-123
ok

tim@Ikura ~
$ echo "XABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"

tim@Ikura ~
$

Update: Solaris issues... "Searching for a word isn't quite as simple as it at first appears. The string "the" will match the word "other". You can put spaces before and after the letters and use this regular expression: " the ". However, this does not match words at the beginning or end of the line. And it does not match the case where there is a punctuation mark after the word.

There is an easy solution. The characters "\<" and "\>" are similar to the "^" and "$" anchors, as they don't occupy a position of a character. They do "anchor" the expression between to only match if it is on a word boundary. The pattern to search for the word "the" would be "\<[tT]he\>". The character before the "t" must be either a new line character, or anything except a letter, number, or underscore. The character after the "e" must also be a character other than a number, letter, or underscore or it could be the end of line character."

tim@Ikura ~
$ echo "XABC-123" | egrep -is "(\<ABC-[0-9]+\>)" && echo "ok"

tim@Ikura ~
$ echo " ABC-123" | egrep -is "(\<ABC-[0-9]+\>)" && echo "ok"
 ABC-123
ok
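The quoted rule (no letter, number, or underscore next to the match) has a few consequences worth checking. This is a rough sketch, assuming a grep that supports the \< and \> anchors; they are an extension rather than part of POSIX ERE, so availability varies. The helper name word_match is hypothetical.

```shell
# Hypothetical helper: prints "ok" when the line matches, "no" otherwise.
word_match() {
    if printf '%s\n' "$1" | grep -Ei '\<ABC-[0-9]+\>' >/dev/null; then
        echo "ok"
    else
        echo "no"
    fi
}

word_match " ABC-123"   # ok: space before, end of line after
word_match "XABC-123"   # no: a letter directly before "ABC"
word_match "_ABC-123"   # no: underscore counts as a word character
word_match "ABC-123x"   # no: \> needs a non-word character after the digits
```

Note that the trailing \> is stricter than the original pattern: it also rejects IDs that run straight into more letters or digits.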


echo "XABC-123" | egrep -is "^ABC-[0-9]+" && echo "ok"

EDIT: To accept ABC when anything but a letter precedes it:

echo "XABC-123" | egrep -is "(^|[^A-Z])ABC-[0-9]+" && echo "ok"
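The same idea can be used as a line filter. The sketch below is an illustration under a few assumptions: filter_ids is a hypothetical name, grep -E stands in for egrep, and the class is written as [^A-Za-z] so that it excludes both cases without relying on how -i treats bracket expressions.

```shell
# Hypothetical filter: keeps only lines where ABC-<digits> is at the start
# of the line or directly preceded by a non-letter.
filter_ids() {
    grep -Ei '(^|[^A-Za-z])ABC-[0-9]+'
}

# Sample input is made up for illustration.
printf '%s\n' 'ABC-123' 'test ABC-123' 'XABC-456' 'yabc-789' '(abc-42)' | filter_ids
```

Of the five sample lines, "XABC-456" and "yabc-789" are dropped; the other three pass, including "(abc-42)", since a parenthesis is not a letter.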