I believe this should be one rule in Treetop
I have this working pair of rules in Treetop that the perfectionist in me believes should be one and only one rule, or maybe something more beautiful at least:
rule _
crap
/
" "*
end
rule crap
" "* "\\x0D\\x0A"* " "*
end
I'm parsing some expressions that every now and then end up containing "\x0D\x0A". Yeah, not "\r\n" but the literal text "\x0D\x0A". Something was double escaped at some point. Long story.
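For what it's worth, the difference between the two spellings can be checked in plain Ruby. This is only an illustration of the escaping; the variable names are made up for the example:

```ruby
# A real CRLF: the escapes are interpreted, giving two bytes (13 and 10).
crlf = "\x0D\x0A"

# The double-escaped form: eight literal characters, starting with a backslash.
escaped = "\\x0D\\x0A"

puts crlf == "\r\n"   # true
puts escaped.length   # 8
puts escaped          # \x0D\x0A
```

So the grammar has to match the eight-character text, which is why the rule contains "\\x0D\\x0A" rather than "\r\n".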
That rule works, but it's ugly and it bothers me. I tried this:
rule _
" "* "\\x0D\\x0A"* " "*
/
" "*
end
which caused
SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'
Ideally I would like to actually write something like:
rule _
(" " | "\\x0D\\x0A")*
end
but that doesn't work, and while we are at it, I also discovered that you can't have only one * per rule:
rule _
" "*
/
"\n"*
end
that will match " ", but never \n.
I see you're using three different OR chars: /, | and \ (of which only the first means OR).
This works fine:
grammar Language
rule crap
(" " / "\\x0D\\x0A")* {
def value
text_value
end
}
end
end
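#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
# polyglot lets a plain `require` load .treetop grammar files
require 'polyglot'
# loads language.treetop, which defines LanguageParser
require 'language'
parser = LanguageParser.new
# single quotes keep the backslashes literal, so the input contains the text \x0D\x0A
value = parser.parse(' \\x0D\\x0A \\x0D\\x0A ').value
print '>' + value + '<'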
prints:
> \x0D\x0A \x0D\x0A <
You said "I also discovered that you can't have only one * per rule" (I think you meant that you CAN have only one), and that the rule "will match " ", but never \n".
Of course; " "* succeeds even when it matches zero space characters, so the first alternative always wins and the second one is never tried. You could just use a + instead:
rule _
" "+
/
"\n"*
end
You could also parenthesise the space characters if you want to match any number of space-or-newline characters:
rule _
(" " / "\n")*
end
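To see why the order of alternatives matters, here is a rough plain-Ruby sketch of ordered choice. It does not use Treetop at all; the helper names (literal, zero_or_more, one_or_more, choice) are invented for this illustration, and each "parser" is just a lambda that returns the new position or nil on failure:

```ruby
# Matches an exact string at pos; returns the position after it, or nil.
def literal(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Repeats a parser as often as possible; never fails, even on zero matches.
def zero_or_more(parser)
  lambda do |input, pos|
    while (nxt = parser.call(input, pos))
      pos = nxt
    end
    pos
  end
end

# Like zero_or_more, but fails (nil) unless the parser matches at least once.
def one_or_more(parser)
  star = zero_or_more(parser)
  lambda do |input, pos|
    first = parser.call(input, pos)
    first && star.call(input, first)
  end
end

# Ordered choice: the first alternative that succeeds wins.
def choice(*alternatives)
  lambda do |input, pos|
    result = nil
    alternatives.each { |alt| break if (result = alt.call(input, pos)) }
    result
  end
end

original = choice(zero_or_more(literal(" ")), zero_or_more(literal("\n")))
fixed    = choice(one_or_more(literal(" ")), zero_or_more(literal("\n")))
grouped  = zero_or_more(choice(literal(" "), literal("\n")))

input = "\n\n  "
p original.call(input, 0)  # 0: " "* succeeds on nothing, "\n"* is never tried
p fixed.call(input, 0)     # 2: " "+ fails, so "\n"* gets its turn
p grouped.call(input, 0)   # 4: any mix of spaces and newlines is consumed
```

The first alternative in the original rule can never fail, which is the whole problem; the other two variants behave like the two rules shown above.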
Your error "class/module name must be CONSTANT" is because the rule name is used as the prefix of a module name to contain any methods attached to your rule. A module name may not begin with an underscore, so you can't use methods in a rule whose name begins with an underscore.
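You can check the underlying Ruby restriction without Treetop. In this sketch I'm assuming the generated module name is the rule name plus a numeric suffix (something like Crap0 or _0); I haven't traced the exact names Treetop produces, and valid_module_name? is just a helper made up for the demonstration:

```ruby
# Returns true if Ruby accepts `name` as a constant (and therefore module) name.
def valid_module_name?(name)
  Object.const_set(name, Module.new)   # raises NameError for invalid names
  Object.send(:remove_const, name)     # clean up the temporary constant
  true
rescue NameError
  false
end

puts valid_module_name?("Crap0")  # true
puts valid_module_name?("_0")     # false: must start with an uppercase letter
```

Renaming the rule to something that starts with a letter (space, for instance) sidesteps the problem.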