
Bash Regular Expression Condition

I have a regular expression that I need to verify. The regular expression has double quotes in it, but I can't seem to figure out how to properly escape them.

First attempt, doesn't work as the quotes are not escaped.

while read line
do
  if [[ $line =~ "<a href="(.+)">HTTP</a>" ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {"link.html"} (with double quotes)

How can I run this properly so that the output is link.html, without the double quotes?

I have tried...

while read line
do
  if [[ $line =~ "<a href=/"(.+)/">HTTP</a>" ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {}

Without luck. Can someone please help me so I can stop beating my head on my desk? I am not great with Bash. Thank you!


It's always best to put your regex in a variable.

pattern='<a href="(.+)">HTTP</a>'
while read line
do
  if [[ $line =~ $pattern ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {link.html} (without double quotes)

If you quote the right-hand side (the pattern), bash 3.2 and later match the quoted text as a literal string instead of as a regex (=~ effectively becomes a plain substring test rather than a pattern match).

As a side note, escaping is done with backslashes (\) rather than slashes (/), but that would not help your situation because of the outer quotes as mentioned in my previous paragraph.
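To make the quoting rule concrete, here is a minimal sketch that runs the same pattern twice, once expanded unquoted and once quoted. The sample line is a hypothetical stand-in for a line of tmp/source.html, and the behaviour shown assumes bash 3.2 or later with default compatibility settings.

```shell
#!/usr/bin/env bash
# Hypothetical sample line standing in for a line of tmp/source.html.
line='<a href="link.html">HTTP</a>'
pattern='<a href="(.+)">HTTP</a>'

# Unquoted expansion: the variable's value is used as a regex.
if [[ $line =~ $pattern ]]; then
  unquoted_result=${BASH_REMATCH[1]}
else
  unquoted_result='no match'
fi

# Quoted expansion: the value is matched as a literal string, so the
# text "(.+)" would have to appear in the line itself.
if [[ $line =~ "$pattern" ]]; then
  quoted_result='match'
else
  quoted_result='no match'
fi

printf 'unquoted: %s\n' "$unquoted_result"
printf 'quoted:   %s\n' "$quoted_result"
```

With this sample line the unquoted form should capture link.html, while the quoted form should report no match.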


$line =~ "<a href=\"(.+)\">HTTP</a>" 
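On bash 3.2 and later, anything inside quotes on the right-hand side is matched literally, so an inline pattern only keeps its regex meaning if the metacharacters sit outside the quotes. A sketch of that mixed quoting, assuming a recent bash and a hypothetical sample line:

```shell
#!/usr/bin/env bash
# Hypothetical sample line; in the question this would come from the read loop.
line='<a href="link.html">HTTP</a>'

# The literal pieces are quoted (with \" for the embedded double quotes);
# the group (.+) is left outside the quotes so it is still treated as regex.
if [[ $line =~ "<a href=\""(.+)"\">HTTP</a>" ]]; then
  inline_result=${BASH_REMATCH[1]}
else
  inline_result='no match'
fi

printf '{%s}\n' "$inline_result"
```

This works, but it is harder to read than the variable form, which is one reason the variable is usually recommended.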


I recommend always using a variable when specifying the regex:

#!/bin/bash

SOURCE=
url_re='<a href="(.+)">HTTP</a>'
while read line
do
    if [[ "$line" =~ $url_re ]]; then
        SOURCE=${BASH_REMATCH[1]}
        break
    fi
done < test.txt

echo "$SOURCE" # http://example.com/

# test.txt contents:
# <a href="http://example.com/">HTTP</a>


Try this "<a href="""(.+)""">HTTP</a>"

Edit, well try this

"<a href="\""(.+)"\"">HTTP</a>"

or

'<a href="(.+)">HTTP</a>'

or

'<a href='\"'(.+)'\"'>HTTP</a>' <-- this will give the right syntax in Bash, as for the regex (.+), don't know how that will play

Edit, what do you get when you use this regex "<a href=(.+)>HTTP</a>" ??
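For what it's worth, the last two candidates can be compared side by side. The sketch below stores each one in a variable (the variable names and the sample line are made up for illustration) and prints what the group captures.

```shell
#!/usr/bin/env bash
# Hypothetical sample line.
line='<a href="link.html">HTTP</a>'

with_quotes='<a href="(.+)">HTTP</a>'    # the quotes are part of the literal text
without_quotes='<a href=(.+)>HTTP</a>'   # the quotes fall inside the group

capture_a='no match'
capture_b='no match'
[[ $line =~ $with_quotes ]] && capture_a=${BASH_REMATCH[1]}
[[ $line =~ $without_quotes ]] && capture_b=${BASH_REMATCH[1]}

# printf reuses the format for each argument, so this prints two lines.
printf '{%s}\n' "$capture_a" "$capture_b"
```

The first pattern should give {link.html} and the second {"link.html"}, which matches the output reported in the question. Note that .+ is greedy, so on a line with more attributes after href a narrower group such as [^"]+ may be a better fit.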


Without an intermediate variable (i.e. with the regex written directly after =~), it works only if you remove the quotes around the regex and the pattern contains no characters that are special to the shell (space, < or >, etc.), or if the regex is a plain alphanumeric string:

$ x='Hello'
$ [[ $x =~ ^H ]] && echo OK
OK
$ [[ $x =~ 'H' ]] && echo OK
OK
$ [[ $x =~ H ]] && echo OK
OK
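As a small illustration of the special-character case, here is a sketch with a pattern that contains a space. The string and variable names are made up for the example. Written inline, the space has to be backslash-escaped; in a variable it needs no extra escaping.

```shell
#!/usr/bin/env bash
x='Hello World'

# Inline: the space is backslash-escaped so the pattern stays one word.
if [[ $x =~ ^Hello\ W ]]; then
  inline_space='OK'
else
  inline_space='no match'
fi

# Variable: single quotes in the assignment protect the space, and the
# unquoted expansion inside [[ ]] is not word-split.
re='^Hello W'
if [[ $x =~ $re ]]; then
  var_space='OK'
else
  var_space='no match'
fi

printf 'inline: %s\nvariable: %s\n' "$inline_space" "$var_space"
```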

I stumbled across this page while looking for an explanation on the design of bash that generally doesn't allow you to use regex directly after =~. For example

$ re='^H'
$ [[ $x =~ $re ]] && echo OK
OK

works as expected, while

$ [[ $x =~ '^H' ]] && echo OK

does not. I personally always put the regex in a variable first. But I still wonder why bash is designed this way. You can argue assigning the regex to a variable first would overall make the code look neater. Any other reason? If a regex is not supposed to be interpreted as a string, bash could use other ways to represent it. For example, Perl uses slashes, /regex/, or more explicitly m/regex/.
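One way to see what the quoted form is actually doing: it still matches, but only against the literal characters. In the sketch below (assuming bash 3.2 or later; the second test string is invented for the demonstration), '^H' fails against Hello but succeeds against a string that really contains a caret followed by H.

```shell
#!/usr/bin/env bash
x='Hello'
y='^Hello'   # hypothetical string that contains a literal caret

# Quoted pattern against 'Hello': there is no literal "^H" in it.
if [[ $x =~ '^H' ]]; then plain_result='OK'; else plain_result='no match'; fi

# Quoted pattern against '^Hello': the literal text "^H" is present.
if [[ $y =~ '^H' ]]; then caret_result='OK'; else caret_result='no match'; fi

printf 'Hello:  %s\n^Hello: %s\n' "$plain_result" "$caret_result"
```

So quoting is bash's way of saying "match these characters literally", which is useful when part of a pattern comes from untrusted or arbitrary text.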
