
Simplest treetop grammar is returning a parse error, just learning

I'm trying to learn Treetop. I took most of the code from https://github.com/survival/lordbishop, which parses names, and planned to build from that.

My structure is a bit different because I'm building it in Rails rather than as a Ruby command-line script.

When I run a very simple parse, I get a parse error on a space (which should be one of the simpler things in my grammar). What am I doing wrong?

My code is fairly simple, in my model

require 'treetop'
require 'polyglot'

require 'grammars/name'

class Name
  # Returns a hash of parsed name parts, or an error string if parsing fails.
  def self.parse(data)
    parser = FullNameParser.new
    tree = parser.parse(data)
    return "Parse error at offset: #{parser.index}" if tree.nil?

    result_hash = {}
    tree.value.each do |node|
      # Keep only [key, value] pairs whose value is non-blank.
      result_hash[node[0]] = node[1].strip if node.is_a?(Array) && !node[1].blank?
    end
    result_hash
  end
end

I've stripped most of the grammar down to just getting words and spaces

grammar FullName
    rule word
        [^\s]+ {
        def value
            text_value
        end
        }
    end

    rule s
        [\s]+ {
        def value
            ""
        end
        }
    end
end

I'm trying to parse 'john smith'. I was hoping to just get back words and spaces and build my logic from there, but I'm stuck at even this simple level. Any suggestions?


AFAIK, Treetop starts parsing with the first rule in your grammar (the rule word, in your case). If your input is 'John Smith' (i.e. word, s, word), it stops after matching the rule word once, and reports an error at the first space because word cannot match s.
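To make that behaviour concrete, here is a rough plain-Ruby analogy. It is not Treetop and not what Treetop generates; it is only a sketch that uses the standard library's StringScanner, and the method names parse_word_only and parse_name are made up for illustration. It assumes, as described above, that a parse only succeeds when the starting rule consumes the whole input.

```ruby
require 'strscan'

# Plain-Ruby analogy, NOT Treetop itself: these hypothetical helpers mimic
# a root rule that must consume the entire input to succeed.
WORD  = /\S+/
SPACE = /\s+/

# Root rule is just `word`: fails as soon as anything follows the first word.
def parse_word_only(input)
  scanner = StringScanner.new(input)
  return nil unless scanner.scan(WORD)
  scanner.eos? ? input : nil
end

# Root rule is `word (s word)*`: a word, then any number of space+word pairs.
def parse_name(input)
  scanner = StringScanner.new(input)
  first = scanner.scan(WORD)
  return nil unless first
  words = [first]
  loop do
    start = scanner.pos
    word = scanner.scan(SPACE) && scanner.scan(WORD)
    if word
      words << word
    else
      scanner.pos = start # backtrack if only the space matched
      break
    end
  end
  scanner.eos? ? words : nil
end

p parse_word_only('john smith') # prints nil
p parse_name('john smith')      # prints ["john", "smith"]
```

The first helper stops at offset 4, the space after 'john', which is the same place the original grammar reports its error.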

You need to add a rule to the top of your grammar that describes an entire name: a word, followed by a space, followed by another word, and so on.

grammar FullName

  rule name
    word (s word)* {
      def value
        text_value
      end
    }
  end

  rule word
    [^\s]+ {
      def value
        text_value
      end
    }
  end

  rule s
    [\s]+ {
      def value
        text_value
      end
    }
  end

end

A quick test with the script:

#!/usr/bin/env ruby

require 'rubygems'
require 'treetop'
require 'polyglot'
require 'FullName'

parser = FullNameParser.new
name = parser.parse('John Smith').value
print name

will print:

John Smith