help with regex - extracting text

Suppose I have some text files (f1.txt, f2.txt, ...) that look something like this:

@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}

I want to extract the content of title and store it in a bash variable (call it $title), that is, "some {T}itle" in the example. Notice that there may be curly braces inside the outer pair of braces. Also, there might not be whitespace around "=", and there may be more whitespace before "title".

Thanks so much. I just need a working example of how to extract this and I can extract the other stuff.


Give this a try:

title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' inputfile)

Explanation:

  • /^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { - If a line matches this regex
    • s/// - delete the matched portion
    • s/}[^}]*$//p - delete the last closing curly brace and every character that's not a closing curly brace until the end of the line and print
  • } - end if
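
As a quick check of the steps above, here is a minimal sketch that recreates part of the sample entry in a temporary file and runs the same command on it. The temporary file and the extra indentation before "title" are assumptions made for the demo; the sed expression is unchanged.

```shell
# Recreate a shortened version of the sample entry in a temporary file.
# The title line is indented and has no spaces around "=" to exercise
# the [[:blank:]]* parts of the pattern.
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
@article {paper1,
author = {some author},
   title={some {T}itle} ,
journal = {journal},
year = {1997},
}
EOF

# Same command as in the answer: delete the matched prefix, then delete
# the last closing brace and whatever follows it, and print the result.
title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' "$inputfile")
printf '%s\n' "$title"

rm -f "$inputfile"
```

The inner {T} survives because the second substitution only matches a closing brace that has no other closing brace after it on the line.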


title=$(sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}' ./f1.txt)
  1. /title *=/: Only act upon lines which have the word 'title' followed by a '=' after an arbitrary number of spaces
  2. s/^[^{]*{\([^,]*\),.*$/\1/: From the beginning of the line look for the first '{' character. From that point save everything you find until you hit a comma ','. Replace the entire line with everything you saved
  3. s/} *$//p: strip off the trailing brace '}' along with any spaces and print the result.
  4. title=$(sed -n ... ): save the result of the above 3 steps in the bash variable named title
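
One thing worth checking with this approach: because step 2 saves text only up to the first comma, a title that itself contains a comma yields an empty result. The sketch below shows this on a made-up title line and tries a variant that captures up to the last comma on the line instead. Both the sample line and the variant are assumptions for illustration, not part of the answer.

```shell
# Hypothetical title line containing a comma inside the braces.
line='title = {Regex, sed and {B}ib{T}e{X}},'

# Command from the answer: [^,]* stops at the first comma, so the second
# substitution finds no trailing brace and nothing is printed.
title_first_comma=$(printf '%s\n' "$line" |
    sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}')

# Variant (an assumption): capture greedily up to the last comma on the line.
title_last_comma=$(printf '%s\n' "$line" |
    sed -n '/title *=/{s/^[^{]*{\(.*\),[^,]*$/\1/;s/} *$//p}')

printf 'first comma: [%s]\nlast comma:  [%s]\n' "$title_first_comma" "$title_last_comma"
```

Like the original, the variant still expects the field line to end with a comma.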


There are definitely more elegant ways, but at 2:40AM:

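title=$(grep '^\s*title\s*=' test | sed -E 's/^\s*title\s*=\s*\{?//; s/}?\s*,\s*$//')

Grep for the line that interests us, strip everything up to and including the opening curly brace, then strip everything from the last curly brace to the end of the line. The -E flag is needed so that ? means "optional"; in sed's default basic syntax it is a literal question mark. Note that \s is a GNU extension.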
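
Since the question mentions several files and wanting to extract the other fields too, here is one way the idea might be wrapped up. bib_field is a hypothetical helper name, the two sample files are made up for the demo, and the sed expression is the one from the first answer with the field name substituted in. It assumes each field sits on a single line and that the field name contains no regex metacharacters.

```shell
# Hypothetical helper: bib_field FIELD FILE
# Prints the brace-delimited value of FIELD, one line per match.
bib_field() {
    sed -n "/^[[:blank:]]*$1[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*\$//p;}" "$2"
}

# Two small sample files in a temporary directory (made up for the demo).
workdir=$(mktemp -d)
cat > "$workdir/f1.txt" <<'EOF'
@article {paper1,
author = {some author},
title = {some {T}itle} ,
year = {1997},
}
EOF
cat > "$workdir/f2.txt" <<'EOF'
@article {paper2,
  author={another author},
  title={another title},
  year = {2001},
}
EOF

# Extract two fields from every file and print a one-line summary.
for f in "$workdir"/f*.txt; do
    title=$(bib_field title "$f")
    year=$(bib_field year "$f")
    printf '%s: %s (%s)\n' "${f##*/}" "$title" "$year"
done

rm -rf "$workdir"
```

Because the pattern is anchored at the start of the line, a field such as booktitle would not be picked up when asking for title.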
