Lua pattern matching for extracting hard-coded strings in a code base
I'm working with a C++ code base. Right now I have C++ code that calls a Lua script to look through the entire code base and, hopefully, return a list of all the strings used in the program.
The strings in question are always preceded by a JUCE macro called TRANS. Here are some examples from which a string should be extracted:
TRANS("Normal")
TRANS ( "With spaces" )
TRANS("")
TRANS("multiple"" ""quotations")
TRANS(")")
TRANS("spans \
multiple \
lines")
And I'm sure you can imagine some other possible string variants that could occur in a large code base. I'm making an automatic tool to generate JUCE translation-formatted files, to automate the process as much as possible.
Here is how far I've gotten with pattern matching to find these strings. I've read the source code into a Lua string
path = ...
--Open file and read source into string
file = io.open(path, "r")
str = file:read("*all")
and called
for word in string.gmatch(str, 'TRANS%s*%b()') do print(word) end
which finds a pattern that starts with TRANS and is followed by balanced parentheses. This gets me the full macro, including the brackets, and from there I figured it would be pretty easy to trim off the fat I don't need and keep just the actual string value.
However, this doesn't work for strings which cause a parenthesis imbalance.
e.g. TRANS(")")
will return TRANS(")
instead of TRANS(")")
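The failure can be reproduced in isolation. The snippet below is a minimal sketch (the sample source line and the `found` table are made up for illustration): `%b()` counts every `(` and `)` it sees, including ones inside string literals, so the `)` inside the quotes closes the match early.

```lua
-- Hypothetical sample input; not taken from the real code base.
local src = 'x = TRANS("Normal"); y = TRANS(")");'

local found = {}
for word in string.gmatch(src, 'TRANS%s*%b()') do
  found[#found + 1] = word
end

print(found[1])  -- TRANS("Normal")
print(found[2])  -- TRANS(")   <- cut short by the ')' inside the literal
```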
I revised my pattern to
for word in string.gmatch(str, 'TRANS%s*(%s*%b""%s*') do print(word) end
Here the pattern should start with TRANS, then zero or more spaces. Then it should have a ( character followed by zero or more spaces. Now that we are inside the brackets, we should have a balanced pair of " marks, followed by zero or more spaces, and finally a closing ). Unfortunately, this does not return a single value when used. But I think that even if it worked as I expected, there can be a \"
inside the string, which throws off the quote balancing.
Any advice on extracting these strings? Should I keep trying to find a pattern that works, or should I try a direct algorithm? Do you know why my second pattern returned no strings? Any other advice is welcome! I'm not looking to cover 100% of all possibilities, but being close to 100% would be awesome. Thanks! :D
I love Lua patterns as much as anyone, but you're bringing a knife to a gun fight. This is one of those problems where you really don't want to code the solution as regular expressions. To deal correctly with double quote marks and backslash escapes, you want a real parser, and LPeg will manage your needs nicely.
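For comparison, here is a rough sketch of the "direct algorithm" the question asks about, written in plain Lua with no library. It is not LPeg and not a full C++ lexer. The names `read_literal` and `extract_trans` are made up for this example. It assumes the argument is one or more ordinary string literals, joins adjacent literals, drops backslash-newline continuations, and keeps other escapes such as \" verbatim. It does not skip comments or raw string literals, and it will also match identifiers that merely end in TRANS.

```lua
-- Reads one C string literal whose opening quote is at index i.
-- Returns the literal's contents and the index just past the closing quote,
-- or nil if the literal is unterminated.
local function read_literal(src, i)
  local buf = {}
  local j = i + 1
  while j <= #src do
    local c = src:sub(j, j)
    if c == '\\' then
      local nxt = src:sub(j + 1, j + 1)
      if nxt ~= '\n' then
        buf[#buf + 1] = c .. nxt   -- keep escapes such as \" verbatim
      end                          -- backslash-newline contributes nothing
      j = j + 2
    elseif c == '"' then
      return table.concat(buf), j + 1
    else
      buf[#buf + 1] = c
      j = j + 1
    end
  end
  return nil
end

-- Returns a list of the string arguments of every TRANS(...) found in src.
local function extract_trans(src)
  local results = {}
  local pos = 1
  while true do
    local s, e = src:find('TRANS%s*%(%s*', pos)
    if not s then break end
    local parts, i = {}, e + 1
    while src:sub(i, i) == '"' do
      local text, nxt = read_literal(src, i)
      if not text then break end
      parts[#parts + 1] = text
      -- skip whitespace between adjacent literals or before ')'
      i = src:find('%S', nxt) or (#src + 1)
    end
    if #parts > 0 and src:sub(i, i) == ')' then
      results[#results + 1] = table.concat(parts)
      pos = i + 1
    else
      pos = e + 1   -- not a plain literal argument; keep scanning
    end
  end
  return results
end

-- Usage on a few of the cases from the question.
local sample = 'TRANS("Normal") TRANS ( "With spaces" ) TRANS(")") ' ..
               'TRANS("multiple"" ""quotations")'
for _, s in ipairs(extract_trans(sample)) do
  print(s)
end
```

A scanner like this trades brevity for control: each rule (escapes, continuation lines, adjacent literals) is a visible branch that can be extended, which is the same structure an LPeg grammar would express more compactly.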
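In the second pattern, you forgot to escape the parentheses: an unescaped ( starts a capture in a Lua pattern, so a literal parenthesis has to be written as %(. Try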
for word in string.gmatch(str, 'TRANS%s*%(%s*(%b"")%s*%)') do print(word) end
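As a quick check of what the corrected pattern does and does not catch, here is a small sketch. The sample source text and the `found` table are invented for illustration. The capture includes the surrounding quotes, so they are trimmed with `sub(2, -2)`. Because `%b""` stops at the first closing quote, a literal containing \" is skipped entirely, and adjacent literals such as "a"" ""b" are not matched either.

```lua
-- Hypothetical sample source; long brackets keep the backslashes literal.
local src = [[
a = TRANS("Normal");
b = TRANS ( "With spaces" );
c = TRANS(")");
d = TRANS("say \"hi\"");
]]

local found = {}
for quoted in string.gmatch(src, 'TRANS%s*%(%s*(%b"")%s*%)') do
  found[#found + 1] = quoted:sub(2, -2)  -- strip the surrounding quotes
end

for i, s in ipairs(found) do
  print(i, s)
end
-- 1  Normal
-- 2  With spaces
-- 3  )
-- The fourth TRANS is not reported because of the escaped quotes.
```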