Bash Regular Expression Condition
I have a regular expression that I need to verify. The regular expression has double quotes in it, but I can't seem to figure out how to properly escape them.
My first attempt doesn't work because the quotes are not escaped:
while read line
do
if [[ $line =~ "<a href="(.+)">HTTP</a>" ]]; then
SOURCE=${BASH_REMATCH[1]}
break
fi
done < tmp/source.html
echo "{$SOURCE}" #output = {"link.html"} (with double quotes)
How can I properly run this so the output is link.html, without the double quotes?
I have tried...
while read line
do
if [[ $line =~ "<a href=/"(.+)/">HTTP</a>" ]]; then
SOURCE=${BASH_REMATCH[1]}
break
fi
done < tmp/source.html
echo "{$SOURCE}" #output = {}
Without luck. Can someone please help me so I can stop beating my head on my desk? I am not great with Bash. Thank you!
It's always best to put your regex in a variable.
pattern='<a href="(.+)">HTTP</a>'
while read line
do
if [[ $line =~ $pattern ]]; then
SOURCE=${BASH_REMATCH[1]}
break
fi
done < tmp/source.html
echo "{$SOURCE}" #output = {link.html} (without double quotes)
If you quote the right-hand side (the pattern), bash matches the quoted text as a literal string instead of a regex (=~ effectively becomes a literal substring test, closer to == than to a regex match).
As a side note, escaping is done with backslashes (\) rather than slashes (/), but that would not help in your situation because of the outer quotes, as mentioned in the previous paragraph:
$line =~ "<a href=\"(.+)\">HTTP</a>"
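To see the difference, here is a minimal sketch that runs one line against the quoted pattern and against the same pattern held in a variable. The sample line and the names quoted_result and unquoted_result are made up for illustration, and it assumes bash 3.2 or later, where a quoted right-hand side is matched literally.

```bash
#!/bin/bash
line='<a href="link.html">HTTP</a>'   # hypothetical sample input
pattern='<a href="(.+)">HTTP</a>'

# Quoted right-hand side: the parentheses, dot, and plus are literal text,
# so this looks for the exact characters (.+) in the line and fails.
if [[ $line =~ "<a href=\"(.+)\">HTTP</a>" ]]; then
    quoted_result=${BASH_REMATCH[1]}
else
    quoted_result='no match'
fi

# Unquoted variable: its contents are interpreted as a regex.
if [[ $line =~ $pattern ]]; then
    unquoted_result=${BASH_REMATCH[1]}
else
    unquoted_result='no match'
fi

echo "quoted:   $quoted_result"     # quoted:   no match
echo "unquoted: $unquoted_result"   # unquoted: link.html
```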
I recommend always using a variable when specifying the regex:
#!/bin/bash
SOURCE=
url_re='<a href="(.+)">HTTP</a>'
while read line
do
if [[ "$line" =~ $url_re ]]; then
SOURCE=${BASH_REMATCH[1]}
break
fi
done < test.txt
echo $SOURCE # http://example.com/
# test.txt contents:
# <a href="http://example.com/">HTTP</a>
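One thing to watch with (.+) is that it is greedy, so it can capture more than the URL when a line holds several links. The sketch below is only an illustration under that assumption: the two-link sample line and the names greedy_re and strict_re are hypothetical, and [^"]+ is a suggested tightening, not something from the script above.

```bash
#!/bin/bash
# Hypothetical line containing two links.
line='<a href="a.html">HTTP</a> <a href="b.html">HTTP</a>'

greedy_re='<a href="(.+)">HTTP</a>'      # .+ may run across quotes
strict_re='<a href="([^"]+)">HTTP</a>'   # [^"]+ stops at the closing quote

greedy=
strict=
if [[ $line =~ $greedy_re ]]; then
    greedy=${BASH_REMATCH[1]}
fi
if [[ $line =~ $strict_re ]]; then
    strict=${BASH_REMATCH[1]}
fi

echo "greedy: $greedy"   # greedy: a.html">HTTP</a> <a href="b.html
echo "strict: $strict"   # strict: a.html
```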
Try this "<a href="""(.+)""">HTTP</a>"
Edit, well try this
"<a href="\""(.+)"\"">HTTP</a>"
or
'<a href="(.+)">HTTP</a>'
or
'<a href='\"'(.+)'\"'>HTTP</a>'
<-- this gives valid quoting syntax in Bash; as for the regex (.+), I don't know how that will behave.
Edit: what do you get when you use this regex: "<a href=(.+)>HTTP</a>"?
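If the pattern has to be written directly after =~, one approach that should work is to quote only the literal pieces and leave the capture group outside the quotes, so bash still treats (.+) as a regex. This is a sketch with a made-up sample line, assuming bash 3.2 or later; it is not one of the variants listed above, all of which either quote (.+) or leave the quote characters out of the literal parts.

```bash
#!/bin/bash
line='<a href="link.html">HTTP</a>'   # hypothetical sample input
SOURCE=

# Literal pieces are single-quoted; the capture group (.+) stays unquoted
# so that it is still interpreted as a regex.
if [[ $line =~ '<a href="'(.+)'">HTTP</a>' ]]; then
    SOURCE=${BASH_REMATCH[1]}
fi

echo "{$SOURCE}"   # {link.html}
```

Keeping the pattern in a variable is still easier to read; this form mainly shows where the quotes may and may not go.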
Without an intermediate variable (i.e. with the regex written directly after =~), it works only if you leave the regex unquoted and it contains no characters that are special to the shell (space, <, >, etc.), or if the regex is a plain alphanumeric string, in which case quoting makes no difference:
$ x='Hello'
$ [[ $x =~ ^H ]] && echo OK
OK
$ [[ $x =~ 'H' ]] && echo OK
OK
$ [[ $x =~ H ]] && echo OK
OK
I stumbled across this page while looking for an explanation on the design of bash that generally doesn't allow you to use regex directly after =~. For example
$ re='^H'
$ [[ $x =~ $re ]] && echo OK
OK
works as expected, while
$ [[ $x =~ '^H' ]] && echo OK
does not. I personally always put the regex in a variable first. But I still wonder why bash is designed this way. You can argue assigning the regex to a variable first would overall make the code look neater. Any other reason? If a regex is not supposed to be interpreted as a string, bash could use other ways to represent it. For example, Perl uses slashes, /regex/, or more explicitly m/regex/.
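The behavior becomes easier to follow once you see that a quoted pattern is not ignored but searched for as literal text. A small sketch, assuming bash 3.2 or later; the variable y and the r1 to r4 result names are made up for illustration:

```bash
#!/bin/bash
x='Hello'
y='^Hello'   # hypothetical string that contains a literal caret

[[ $x =~ ^H ]]    && r1='match' || r1='no match'   # regex: H at the start
[[ $x =~ '^H' ]]  && r2='match' || r2='no match'   # literal ^H: not in Hello
[[ $y =~ '^H' ]]  && r3='match' || r3='no match'   # literal ^H: found in ^Hello
[[ $x =~ 'ell' ]] && r4='match' || r4='no match'   # literal substring of Hello

echo "$r1 / $r2 / $r3 / $r4"   # match / no match / match / match
```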