Extract string from string using RegEx in the Terminal [duplicate]

2023-01-12 16:37 Q&A author:

This question already has answers here: How to extract a value from a string using regex and a shell? (7 answers) Closed 3 years ago.

I have a string like "first url, second url, third url" and would like to extract only the url after the word "second" in the OS X Terminal (only the first occurrence). How can I do it?

In my favorite editor I used the regex /second (url)/ and $1 to extract the match; I just don't know how to do this in the Terminal.

Keep in mind that url stands for an actual URL; I'll be using one of the expressions from "Regex to match URL" to match it.

echo 'first url, second url, third url' | sed 's/.*second//'

Edit: I misunderstood. Better:

echo 'first url, second url, third url' | sed 's/.*second \([^ ,]*\).*/\1/'

or:

echo 'first url, second url, third url' | perl -nle 'print $1 if /second ([^ ,]*)/'

Spawning another process (like the sed and perl commands suggested above) can be expensive, especially when you need to run this operation many times. Bash has built-in regular expression matching with the =~ operator:

[[ "string" =~ regex ]]

Just as you extract matches in your favourite editor with $1, $2, etc., Bash fills in the BASH_REMATCH array: ${BASH_REMATCH[0]} holds the whole match and ${BASH_REMATCH[n]} holds the nth parenthesized group.

In your particular example:

str="first url1, second url2, third url3"
if [[ $str =~ (second )([^,]*) ]]; then
  echo "match: '${BASH_REMATCH[2]}'"
else
  echo "no match found"
fi

Output:

match: 'url2'
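A small sketch of what each BASH_REMATCH element holds, using the same example string. Storing the pattern in a variable (re is an illustrative name) is a common way to avoid quoting trouble: if the pattern on the right of =~ is quoted, Bash 3.2 and later (including the Bash shipped with macOS) match it as a literal string rather than as a regex.

```bash
#!/usr/bin/env bash
str="first url1, second url2, third url3"
re='(second )([^,]*)'   # kept in a variable so it is not quoted at the =~ site

if [[ $str =~ $re ]]; then
  echo "whole match: '${BASH_REMATCH[0]}'"   # 'second url2'
  echo "group 1:     '${BASH_REMATCH[1]}'"   # 'second '
  echo "group 2:     '${BASH_REMATCH[2]}'"   # 'url2'
fi

# Quoting the pattern turns it into a literal string comparison, so this does not match.
if [[ $str =~ "$re" ]]; then echo "literal match"; else echo "no literal match"; fi
```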

Specifically, =~ supports extended regular expressions as defined by POSIX, but with platform-specific extensions (which vary in extent and can be incompatible).
On Linux platforms (GNU userland), see man grep; on macOS/BSD platforms, see man re_format.
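To illustrate the point about running the match many times, here is a sketch that applies =~ to every line of its input inside a single Bash loop, so no sed or perl process is started per line. The function name extract_second and the input lines are hypothetical:

```bash
#!/usr/bin/env bash
# extract_second is a hypothetical helper name used only for this sketch.
# It reads lines on stdin and prints the text after "second " up to the next comma.
extract_second() {
  local re='second ([^,]*)' line
  while IFS= read -r line; do
    # =~ is evaluated by bash itself; no external program runs for each line.
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done
}

# Illustrative input: three records, the middle one has no "second" field.
printf '%s\n' 'first a1, second b1, third c1' \
              'first a2, third c2' \
              'first a3, second b3, third c3' | extract_second
# prints: b1
#         b3
```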

With the first sed command in the other answer, you are still left with everything after the desired URL, so I propose the following solution:

echo 'first url, second url, third url' | sed 's/.*second \(url\).*/\1/'

In sed's default basic regular expression syntax (POSIX BRE), you group an expression by escaping the parentheses around it: \(url\).

When you tried this, you probably forgot sed's -E option.

From GNU sed's --help output (BSD/macOS sed also accepts -E):

  -E, -r, --regexp-extended
                 use extended regular expressions in the script
                 (for portability use POSIX -E).
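Without -E, sed uses basic regular expressions, where unescaped ( and ) are ordinary characters. The pattern then contains no group, so sed rejects \1 in the replacement (the exact error message differs between GNU and BSD sed). Escaping the parentheses, or adding -E, fixes it:

```bash
#!/usr/bin/env bash
# Basic regex: ( and ) are literal, so there is no group for \1 and sed reports an error.
echo "first url, second url, third url" | sed 's/.*second (url).*/\1/' \
  || echo "sed rejected the command"

# Basic regex with escaped parentheses: \( \) form the group.
echo "first url, second url, third url" | sed 's/.*second \(url\).*/\1/'
# prints: url
```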

You don't have to change your regex significantly, but you do need to add .* before and after it, so that the rest of the line is matched too and therefore removed by the substitution.
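To see why the .* is needed, compare the same substitution without it: sed replaces only the matched text "second url" and leaves the rest of the line in place.

```bash
#!/usr/bin/env bash
# No .* around the pattern: only "second url" is replaced by the captured "url".
echo "first url, second url, third url" | sed -E 's/second (url)/\1/'
# prints: first url, url, third url
```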

This works fine for me:

echo "first url, second url, third url" | sed -E 's/.*second (url).*/\1/'

Output:

url

Here the output "url" is the one that follows "second" in the string. If you already know that the URLs are separated by a comma and a space, and your URLs never contain commas, then the regex [^,]* should be fine.
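With real URLs in place of the literal url, [^,]* captures everything after "second " up to the next comma. The URLs below are made-up placeholders, and this assumes they contain no commas:

```bash
#!/usr/bin/env bash
echo "first http://example.com/1, second https://example.org/path?q=2, third ftp://example.net/3" \
  | sed -E 's/.*second ([^,]*).*/\1/'
# prints: https://example.org/path?q=2
```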

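Optionally, to match something more URL-like (an optional run of scheme letters, then ://, then everything up to the next comma):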

echo "first http://test.url/1, second ://test.url/with spaces/2, third ftp://test.url/3" \
     | sed -E 's/.*second ([a-zA-Z]*:\/\/[^,]*).*/\1/'

Which correctly outputs:

://test.url/with spaces/2
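A possible variant of this command (not from the answer above): [a-zA-Z]+ requires at least one scheme letter, and sed -n with the p flag prints only lines on which the substitution succeeded, so a non-matching line produces no output instead of being echoed unchanged. The variable name extract is illustrative:

```bash
#!/usr/bin/env bash
# + instead of * demands a non-empty scheme; -n plus the trailing p prints only on a match.
extract='s/.*second ([a-zA-Z]+:\/\/[^,]*).*/\1/p'

echo "first http://test.url/1, second https://test.url/with spaces/2, third ftp://test.url/3" \
  | sed -nE "$extract"
# prints: https://test.url/with spaces/2

# No scheme letters before "://", so nothing matches and nothing is printed.
echo "first x, second ://test.url/2, third y" | sed -nE "$extract"
```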
