help with regex - extracting text
Suppose I have some text files (f1.txt, f2.txt, ...) that look something like this:
@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}
I want to extract the content of title and store it in a bash variable (call it $title), that is, "some {T}itle" in the example. Notice that there may be curly braces inside the outer pair of braces. Also, there might not be whitespace around "=", and there may be more whitespace before "title".
Thanks so much. I just need a working example of how to extract this and I can extract the other stuff.
Give this a try:
title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' inputfile)
Explanation:
/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {
- If a line matches this regex
s///
- delete the matched portion
s/}[^}]*$//p
- delete the last closing curly brace and every character that's not a closing curly brace until the end of the line and print
}
- end if
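Since the question says the other fields will be extracted the same way, the command above can be wrapped in a small function. This is only a sketch: bib_field is a hypothetical name, not something from the answer, and it assumes each field sits on a single line and that the field name contains no regex metacharacters. The sample file is made up for the demonstration.

```shell
# Hypothetical helper: print the value of one single-line BibTeX field.
# $1 = field name (assumed to be plain letters), $2 = file to read.
bib_field() {
    # Same sed program as above, with the field name substituted in.
    # \$ keeps the end-of-line anchor from being read as a shell variable.
    sed -n "/^[[:blank:]]*${1}[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*\$//p;}" "$2"
}

# Throwaway sample input, modelled on the entry in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@article {paper1,
author = {some author},
   title={some {T}itle} ,
volume = {16},
year = {1997},
}
EOF

title=$(bib_field title "$sample")
year=$(bib_field year "$sample")
printf '%s\n' "$title"   # prints: some {T}itle
printf '%s\n' "$year"    # prints: 1997
rm -f "$sample"
```

Because the pattern is anchored at the start of the line, asking for title should not pick up a booktitle field by accident.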
title=$(sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}' ./f1.txt)
/title *=/
: Only act upon lines which have the word 'title' followed by a '=' after an arbitrary number of spaces
s/^[^{]*{\([^,]*\),.*$/\1/
: From the beginning of the line look for the first '{' character. From that point save everything you find until you hit a comma ','. Replace the entire line with everything you saved
s/} *$//p
: strip off the trailing brace '}' along with any spaces and print the result.
title=$(sed -n ... )
: save the result of the above 3 steps in the bash variable named title
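One thing worth checking with this approach is a title that itself contains a comma, because the capture stops at the first ','. The snippet below is a small experiment rather than a fix: the sample line is invented, and it assumes GNU sed. It runs both sed commands from this thread on the same input so the difference is visible.

```shell
# Invented sample line with a comma inside the title.
line='title = {Trees, graphs and {T}hings},'

# Comma-based command: the capture ends at the first comma, so the
# second substitution finds no trailing brace and nothing is printed.
by_comma=$(printf '%s\n' "$line" |
    sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}')

# Brace-based command from the earlier answer: trims from the last '}'.
by_brace=$(printf '%s\n' "$line" |
    sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}')

printf 'comma-based: [%s]\n' "$by_comma"   # prints: comma-based: []
printf 'brace-based: [%s]\n' "$by_brace"   # prints: brace-based: [Trees, graphs and {T}hings]
```

If none of your titles contain commas, either command should give the same result.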
There are definitely more elegant ways, but at 2:40AM:
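title=$(grep '^\s*title\s*=' test | sed -E 's/^\s*title\s*=\s*\{?//; s/\}?\s*,\s*$//')
Grep for the line that interests us, strip everything up to and including the opening curly, then strip everything from the last curly to the end of the line. The -E flag matters here: in sed's default basic syntax a bare ? is a literal question mark, not "optional". The \s shorthand assumes GNU grep and GNU sed.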
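The question mentions several files (f1.txt, f2.txt, ...), so here is a rough sketch of running the extraction in a loop. The directory and file contents are made up for the demonstration, and the sed command is the brace-based one from earlier in the thread; any of the other commands could be substituted.

```shell
# Throwaway directory with two invented sample files.
dir=$(mktemp -d)
printf '%s\n' '@article {paper1,' 'title = {first {T}itle},' '}' > "$dir/f1.txt"
printf '%s\n' '@article {paper2,' '  title={second title} ,' '}' > "$dir/f2.txt"

# The glob expands in sorted order, so f1.txt is handled before f2.txt.
for f in "$dir"/f*.txt; do
    # $title is overwritten on every pass; use it inside the loop body.
    title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p;}' "$f")
    printf '%s: %s\n' "${f##*/}" "$title"
done
# prints:
# f1.txt: first {T}itle
# f2.txt: second title

rm -rf "$dir"
```

After the loop, $title still holds the value from the last file processed.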