Java Regular Expressions
Hello,
I have text in the following format:
@AAAA{tralala10aa,
author = {Some Author},
title = {Some Title},
booktitle = {Some Booktitle},
year = {2010},
month = {March},
booktitle_short = {CC 2010},
conference_url = {http://www.mmmm.com},
projects = {projects}
}
....
I've made the following regular expression:
@[A-Z]*[{][a-z0-9]*[,]
but it only matches the first line (@AAAA{tralala10aa,) and I need the whole text block. How can I do it?
It seems like you would be much better off using a context-free grammar instead of a regular expression in this case, since arbitrarily nested braces cannot be matched by a true regular expression. Consider using a parser generator, such as CUP or ANTLR.
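A parser generator is the robust route. As a much lighter stand-in (not CUP or ANTLR, just a hand-written scanner), the sketch below tracks brace depth, which is exactly the counting a regular expression cannot do, so nested braces work at any depth. It assumes each block starts with @, then letters, then an opening brace; the names BraceBlockScanner and findBlocks are hypothetical, and escaped braces or braces inside quoted values are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class BraceBlockScanner {

    // Hypothetical helper: returns every "@TYPE{...}" block, tracking brace depth
    // so that nested {...} values are handled at any depth.
    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        int i = 0;
        while ((i = text.indexOf('@', i)) >= 0) {
            int open = text.indexOf('{', i);
            if (open < 0) {
                break; // no opening brace left: nothing more to extract
            }
            // Only accept "@" + letters + "{" as a block start (skips e.g. e-mail addresses).
            if (!text.substring(i + 1, open).chars().allMatch(Character::isLetter)) {
                i++;
                continue;
            }
            int depth = 0;
            int end = -1;
            for (int j = open; j < text.length() && end < 0; j++) {
                char c = text.charAt(j);
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    end = j; // matching close of the outermost brace
                }
            }
            if (end < 0) {
                break; // unbalanced braces: stop rather than guess
            }
            blocks.add(text.substring(i, end + 1));
            i = end + 1;
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "@AAAA{tralala10aa,\n author = {Some Author},\n year = {2010}\n}\n"
                + "@BBBB{other,\n title = {A {nested} title}\n}\n";
        for (String block : findBlocks(input)) {
            System.out.println(block);
            System.out.println("---");
        }
    }
}
```

Counting depth is the design choice here: a block ends only when the brace that opened it is closed, regardless of how the closing brace is laid out on the line.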
If the "block" always ends with a lone closing brace, then this may do it:
"(?ms)@[A-Z]+\\{.+?^\\}$"
Here (?ms) turns on MULTILINE mode (so ^ and $ match at the start and end of each line) and DOTALL mode (so the .+ can also match newlines), and the ^\\}$ at the end matches a closing brace on a line by itself.
The question mark in the middle makes the .+ non-greedy, so it won't match all blocks up to and including the last block in the file.
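A minimal Java sketch of how the (?ms) pattern from this answer could be applied, assuming all blocks are in one String and each block ends with a } on its own line. The pattern string is the one given above; the class and method names (LoneBraceBlocks, findBlocks) are hypothetical. Matcher.find() is called in a loop so that every block is returned, not just the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoneBraceBlocks {

    // (?m) MULTILINE: ^ and $ match at the start/end of each line.
    // (?s) DOTALL: . also matches line terminators.
    // .+? is reluctant, so each match stops at the first line that is just "}".
    private static final Pattern BLOCK = Pattern.compile("(?ms)@[A-Z]+\\{.+?^\\}$");

    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "@AAAA{tralala10aa,\n"
                + " author = {Some Author},\n"
                + " projects = {projects}\n"
                + "}\n"
                + "@BBBB{second,\n"
                + " title = {Another Title}\n"
                + "}\n";
        for (String block : findBlocks(input)) {
            System.out.println(block);
            System.out.println("---");
        }
    }
}
```

Note that @[A-Z]+ only accepts upper-case entry types such as @AAAA; a lower-case type would need [A-Za-z]+ or the CASE_INSENSITIVE flag.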
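If braces can only be nested one level deep (written here as a Java string literal, since java.util.regex rejects an unescaped { that does not start a quantifier):
"@[A-Z]*\\{([^{}]*+|\\{[^{}]*+\\})*\\}"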
Note the use of the possessive quantifier *+: without it, this can take quite a long time on failed matches.
Java's java.util.regex does support possessive quantifiers. In a regex flavor that doesn't, remove them, but keep in mind the poor failure behaviour.
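A sketch of the one-level-nesting pattern used from Java. Written as a Java string, the pattern drops the surrounding /.../ delimiters and escapes its literal braces outside the character classes. OneLevelBlocks and findBlocks are hypothetical names. An entry with braces nested two levels deep inside the block simply produces no match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLevelBlocks {

    // One-level nesting: the block body is a sequence of non-brace runs
    // and {...} groups that themselves contain no braces.
    // [^{}]*+ is possessive, so the engine never gives back characters from a
    // run of non-brace text, which keeps failed matches from backtracking heavily.
    private static final Pattern BLOCK =
            Pattern.compile("@[A-Z]*\\{([^{}]*+|\\{[^{}]*+\\})*\\}");

    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "@AAAA{tralala10aa,\n author = {Some Author},\n year = {2010}\n}\n";
        System.out.println(findBlocks(input));
    }
}
```

Unlike the lone-closing-brace pattern, this one does not care how the closing brace is laid out, but it does depend on values never containing braces of their own.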
I would not use a regex; I would tokenize the string and build up a dictionary. Sorry, this implementation is in Python, not Java:
>>> s ="""@AAAA{tralala10aa,
author = {Some Author},
title = {Some Title},
booktitle = {Some Booktitle},
year = {2010},
month = {March},
booktitle_short = {CC 2010},
conference_url = {http://www.mmmm.com},
projects = {projects}
}"""
>>>
>>> s
'@AAAA{tralala10aa,\n author = {Some Author},\n title = {Some Title},\n booktitle = {Some Booktitle},\n year = {2010},\n month = {March},\n booktitle_short = {CC 2010},\n conference_url = {http://www.mmmm.com},\n projects = {projects}\n}'
>>>
>>>
>>> lst = s.replace('@AAA', '').replace('{', '').replace('}', '').split(',\n')
>>> lst
['Atralala10aa', ' author = Some Author', ' title = Some Title', ' booktitle = Some Booktitle', ' year = 2010', ' month = March', ' booktitle_short = CC 2010', ' conference_url = http://www.mmmm.com', ' projects = projects\n']
>>> dct = dict((x[0].strip(), x[1].strip()) for x in (y.split('=') for y in lst[1:]))
>>> dct
{'booktitle_short': 'CC 2010', 'title': 'Some Title', 'booktitle': 'Some Booktitle', 'author': 'Some Author', 'month': 'March', 'conference_url': 'http://www.mmmm.com', 'year': '2010', 'projects': 'projects'}
>>>
>>> dct['title']
'Some Title'
>>>
Hopefully the code above is self-explanatory.
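Since the question is about Java, here is a rough Java port of the same tokenize-into-a-dictionary idea. It is a sketch, not a full BibTeX parser: it assumes one key = {value} field per line with fields separated by ",\n", and parseFields is a hypothetical name. Unlike the Python session, it splits each field on the first = only and strips just the outer pair of braces from each value, so a value containing = or inner braces is kept intact. A LinkedHashMap keeps the fields in their original order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntryToMap {

    // Assumed layout: "@TYPE{citekey,\n key = {value},\n ... \n}"
    static Map<String, String> parseFields(String entry) {
        Map<String, String> fields = new LinkedHashMap<>(); // preserves field order
        String body = entry.trim();
        // Drop the "@TYPE{citekey," header and the final closing brace.
        int headerEnd = body.indexOf(",\n");
        int closing = body.lastIndexOf('}');
        if (headerEnd < 0 || closing <= headerEnd) {
            return fields; // not in the expected layout
        }
        body = body.substring(headerEnd + 2, closing);
        for (String line : body.split(",\n")) {
            String[] kv = line.split("=", 2); // split on the first '=' only
            if (kv.length != 2) {
                continue; // skip anything that is not "key = value"
            }
            String value = kv[1].trim();
            // Remove one pair of surrounding braces, keeping any inner ones.
            if (value.startsWith("{") && value.endsWith("}")) {
                value = value.substring(1, value.length() - 1);
            }
            fields.put(kv[0].trim(), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String s = "@AAAA{tralala10aa,\n"
                + " author = {Some Author},\n"
                + " title = {Some Title},\n"
                + " year = {2010},\n"
                + " conference_url = {http://www.mmmm.com},\n"
                + " projects = {projects}\n"
                + "}";
        Map<String, String> dct = parseFields(s);
        System.out.println(dct.get("title")); // prints Some Title
    }
}
```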