How to match "ABC-123" but not "XABC-123" in a regular expression
I have this egrep search:
egrep -is "(ABC-[0-9]+)"
which matches ABC-123 anywhere in a string.
I'd like it to ignore XABC-456 or YABC-789.
In other words, these examples should output "ok":
echo "ABC-123" | egrep -is "(ABC-[0-9]+)" && echo "ok"
echo "test ABC-123" | egrep -is "(ABC-[0-9]+)" && echo "ok"
But this shouldn't:
echo "XABC-123" | egrep -is "(<fill in>ABC-[0-9]+)" && echo "ok"
I tried this without any luck (no output):
echo "ABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
(I'm running Solaris 10)
How can I do that?
It looks like you're looking for \bABC-[0-9]+ (word boundaries).
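Another option is to use a negative lookbehind, which gives you more control over what can and cannot appear before the match: (?<![a-z])ABC-[0-9]+
Note that lookbehind needs a PCRE-style engine; the POSIX extended syntax used by egrep does not support it.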
This should do it:
^(ABC-[0-9]+)
This says that the line must start with your regexp.
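To see what anchoring with ^ does to the cases from the question, here is a small sketch. The helper name starts_with_ticket is made up for illustration, and it assumes a grep that accepts -E and -q (on Solaris 10 that may mean /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Hypothetical helper: succeeds only when the line starts with ABC-<digits>.
starts_with_ticket() {
    printf '%s\n' "$1" | grep -E -i -q '^(ABC-[0-9]+)'
}

starts_with_ticket "ABC-123"      && echo "ok: ABC-123"
starts_with_ticket "XABC-123"     || echo "rejected: XABC-123"
# The question also wants this one to match, but the ^ anchor rejects it:
starts_with_ticket "test ABC-123" || echo "rejected: test ABC-123"
```

So ^ handles the XABC-123 case, at the cost of losing matches in the middle of a line.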
If \b doesn't work for you, have you tried ((^| )ABC-[0-9]+)?
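A quick sketch of how the (^| ) alternative behaves, using only POSIX extended syntax. The helper name space_anchored is hypothetical, and grep -E -q is assumed to be available.

```shell
# Hypothetical helper: ABC-<digits> must sit at the start of the line
# or directly after a single space character.
space_anchored() {
    printf '%s\n' "$1" | grep -E -i -q '((^| )ABC-[0-9]+)'
}

space_anchored "ABC-123"      && echo "ok: ABC-123"
space_anchored "test ABC-123" && echo "ok: test ABC-123"
space_anchored "XABC-123"     || echo "rejected: XABC-123"
# Anything other than a space in front is rejected too, e.g. punctuation:
space_anchored "(ABC-123)"    || echo "rejected: (ABC-123)"
```

This covers the three cases in the question, but a tab or punctuation mark before ABC would not count as a separator.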
Try the following:
echo "XABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
A couple of the solutions propose using ^ (starts with); however, they will fail on input such as " ABC-123", which you might want to catch. Word boundaries are probably what you want, unless you really are looking for a starts-with match.
Here's some sample output:
tim@Ikura ~
$ echo " ABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
ABC-123
ok
tim@Ikura ~
$ echo "ABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
ABC-123
ok
tim@Ikura ~
$ echo "XABC-123" | egrep -is "(\bABC-[0-9]+)" && echo "ok"
tim@Ikura ~
$
Update: Solaris issues... "Searching for a word isn't quite as simple as it at first appears. The string "the" will match the word "other". You can put spaces before and after the letters and use this regular expression: " the ". However, this does not match words at the beginning or end of the line. And it does not match the case where there is a punctuation mark after the word.
There is an easy solution. The characters "\<" and "\>" are similar to the "^" and "$" anchors, as they don't occupy a position of a character. They do "anchor" the expression between to only match if it is on a word boundary. The pattern to search for the word "the" would be "\<[tT]he\>". The character before the "t" must be either a new line character, or anything except a letter, number, or underscore. The character after the "e" must also be a character other than a number, letter, or underscore or it could be the end of line character."
tim@Ikura ~
$ echo "XABC-123" | egrep -is "(\<ABC-[0-9]+\>)" && echo "ok"
tim@Ikura ~
$ echo " ABC-123" | egrep -is "(\<ABC-[0-9]+\>)" && echo "ok"
ABC-123
ok
echo "XABC-123" | egrep -is "^ABC-[0-9]+" && echo "ok"
EDIT: To accept ABC when anything but a letter precedes it:
echo "XABC-123" | egrep -is "(^|[^A-Z])ABC-[0-9]+" && echo "ok"
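The same idea can be wrapped in a small function and run over all the inputs from the question. This is only a sketch: has_ticket is a made-up name, [^[:alpha:]] is swapped in for [^A-Z] so the result does not depend on how -i treats ranges, and a grep with -E, -q and POSIX character classes is assumed.

```shell
# Hypothetical helper: succeeds when the input contains ABC-<digits>
# at the start of the line or right after any non-letter character.
has_ticket() {
    printf '%s\n' "$1" | grep -E -i -q '(^|[^[:alpha:]])ABC-[0-9]+'
}

for s in "ABC-123" "test ABC-123" "(ABC-123)" "XABC-123" "YABC-789"; do
    if has_ticket "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Unlike \b or \<, this does not rely on word-boundary extensions, which is why it may be a safer bet on older egrep implementations.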