I am using Ruby's Parslet library.
I am parsing a document similar to a SystemVerilog interface, e.g.:
    interface my_intf;
      protocol validonly;
      transmit [bool] valid;
      transmit [bool] pipeid;
      transmit [5:0] incr;
      transmit [bool] sample;
    endinterface
Here is my parser:
    class MyParse < Parslet::Parser
      rule(:lparen)     { space? >> str('(') >> space? }
      rule(:rparen)     { space? >> str(')') >> space? }
      rule(:lbox)       { space? >> str('[') >> space? }
      rule(:rbox)       { space? >> str(']') >> space? }
      rule(:lcurly)     { space? >> str('{') >> space? }
      rule(:rcurly)     { space? >> str('}') >> space? }
      rule(:comma)      { space? >> str(',') >> space? }
      rule(:semicolon)  { space? >> str(';') >> space? }
      rule(:eof)        { any.absent? }

      rule(:space)      { match["\t\s"] }
      rule(:whitespace) { space.repeat }
      rule(:space?)     { whitespace.maybe }
      rule(:blank_line) { space? >> newline.repeat(1) }
      rule(:newline)    { str("\n") }

      # things
      rule(:integer)    { space? >> match('[0-9]').repeat(1).as(:int) >> space? }
      rule(:identifier) { match['a-z'].repeat(1) }

      rule(:intf_start) { space? >> str('interface') >> space? >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:intf_name) >> space? >> str(';') >> space? >> str("\n") }
      rule(:protocol)   { space? >> str('protocol') >> whitespace >> (str('validonly').maybe).as(:protocol) >> space? >> str(';') >> space? >> str("\n") }

      rule(:bool)           { lbox >> space? >> str('bool').as(:bool) >> space? >> rbox }
      rule(:transmit_width) { lbox >> space? >> match('[0-9]').repeat.as(:msb) >> space? >> str(':') >> space? >> match('[0-9]').repeat.as(:lsb) >> space? >> rbox }
      rule(:transmit)       { space? >> str('transmit') >> whitespace >> (bool | transmit_width) >> whitespace >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:transmit_name) >> space? >> str(';') >> space? >> str("\n") }

      rule(:interface_body) { (protocol | blank_line.maybe) }
      rule(:interface)      { intf_start >> interface_body }
      rule(:expression)     { ( interface ).repeat }

      root :expression
    end
I am having an issue writing the rule interface_body. It can contain 0 or more transmit lines, 0 or 1 protocol line, multiple blank lines, comments, etc.
Can someone help me out, please? The rules I have written in the code snippet work for a single transmit line or a single protocol line, i.e. they match, but when I parse a whole interface it does not work.
Thanks in advance.
OK... this parses the file you mentioned. I don't fully understand the desired format, so I can't say it will work for all your files, but it should get you started.
    require 'parslet'

    class MyParse < Parslet::Parser
      # Tokens consume leading whitespace only.
      rule(:lparen)     { space? >> str('(') }
      rule(:rparen)     { space? >> str(')') }
      rule(:lbox)       { space? >> str('[') }
      rule(:rbox)       { space? >> str(']') }
      rule(:lcurly)     { space? >> str('{') }
      rule(:rcurly)     { space? >> str('}') }
      rule(:comma)      { space? >> str(',') }
      rule(:semicolon)  { space? >> str(';') }
      rule(:eof)        { any.absent? }

      rule(:space)      { match["\t\s"] }
      rule(:whitespace) { space.repeat(1) }   # at least one space
      rule(:space?)     { space.repeat(0) }   # zero or more spaces
      rule(:blank_line) { space? >> newline.repeat(1) }
      rule(:newline)    { str("\n") }

      # things
      rule(:integer)    { space? >> match('[0-9]').repeat(1).as(:int) >> space? }
      rule(:identifier) { match['a-z'].repeat(1) }

      # Wraps an expression as a full line: optional indent, ';' and a newline.
      def line(expression)
        space? >> expression >> space? >> str(';') >> space? >> str("\n")
      end

      rule(:expression?)        { ( interface ).repeat(0) }
      rule(:interface)          { intf_start >> interface_body.repeat(0) >> intf_end }
      # A body entry is any body line that is not the end line.
      rule(:interface_body)     { intf_end.absent? >> interface_bodyline >> blank_line.repeat(0) }
      rule(:intf_start)         { line(str('interface') >> space? >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:intf_name)) }
      rule(:interface_bodyline) { line(protocol | transmit) }
      rule(:protocol)           { str('protocol') >> whitespace >> (str('validonly').maybe).as(:protocol) }
      rule(:transmit)           { str('transmit') >> whitespace >> (bool | transmit_width) >> whitespace >> name.as(:transmit_name) }
      rule(:name)               { match('[a-zA-Z_]') >> (match['[:alnum:]'] | str("_")).repeat(0) }
      rule(:bool)               { lbox >> str('bool').as(:bool) >> rbox }
      rule(:transmit_width)     { lbox >> space? >> match('[0-9]').repeat(1).as(:msb) >> space? >> str(':') >> space? >> match('[0-9]').repeat(1).as(:lsb) >> space? >> rbox }
      rule(:intf_end)           { str('endinterface') }

      root :expression?
    end

    require 'rspec'
    require 'parslet/rig/rspec'

    RSpec.describe MyParse do
      let(:parser) { MyParse.new }

      context "simple_rule" do
        it "should consume protocol line" do
          expect(parser.interface_bodyline).to parse("  protocol validonly;\n")
        end
        it 'name' do
          expect(parser.name).to parse('valid')
        end
        it "bool" do
          expect(parser.bool).to parse('[bool]')
        end
        it "transmit line" do
          expect(parser.transmit).to parse('transmit [bool] valid')
        end
        it "transmit bodyline" do
          expect(parser.interface_bodyline).to parse("  transmit [bool] valid;\n")
        end
      end
    end

    RSpec::Core::Runner.run(['--format', 'documentation'])

    begin
      doc = File.read("test.txt")
      MyParse.new.parse(doc)
    rescue Parslet::ParseFailed => error
      # Depending on your Parslet version this may be error.parse_failure_cause.
      puts error.cause.ascii_tree
    end
the main changes...
Don't consume whitespace on both sides of your tokens. You had expressions that parsed "[bool] valid" as lbox bool rbox space?, and then expected whitespace but couldn't find any (as the previous rule had already consumed it).
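When an expression can validly parse to zero length (e.g. repeat(0)) and there is a problem with how it's written, you get an odd error. The rule will pass and match nothing, and then the next rule will typically fail. I explicitly matched 'body lines' as 'not the end line' so that a bad body line fails with an error.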
'repeat' defaults to repeat(0), which I would love to change. I see mistakes around this all the time.
x.repeat(1,1) means make exactly one match. That's the same as just having x. :)
There were more whitespace problems.
So...
Write your parser top down. Write your tests bottom up. When your tests reach the top, you're done! :)
Good luck.