Ruby: Parslet for a SystemVerilog interface parser

I am using Ruby's Parslet.

I am parsing a document similar to a SystemVerilog interface, e.g.:

    interface my_intf;
      protocol validonly;

      transmit  [bool]   valid;
      transmit  [bool]   pipeid;
      transmit  [5:0]    incr;
      transmit  [bool]   sample;
    endinterface

Here is my parser:

    class Myparse < Parslet::Parser
      rule(:lparen)     { space? >> str('(') >> space? }
      rule(:rparen)     { space? >> str(')') >> space? }
      rule(:lbox)       { space? >> str('[') >> space? }
      rule(:rbox)       { space? >> str(']') >> space? }
      rule(:lcurly)     { space? >> str('{') >> space? }
      rule(:rcurly)     { space? >> str('}') >> space? }
      rule(:comma)      { space? >> str(',') >> space? }
      rule(:semicolon)  { space? >> str(';') >> space? }
      rule(:eof)        { any.absent? }
      rule(:space)      { match["\t\s"] }
      rule(:whitespace) { space.repeat }
      rule(:space?)     { whitespace.maybe }
      rule(:blank_line) { space? >> newline.repeat(1) }
      rule(:newline)    { str("\n") }

      # things
      rule(:integer)    { space? >> match('[0-9]').repeat(1).as(:int) >> space? }
      rule(:identifier) { match['a-z'].repeat(1) }

      rule(:intf_start)     { space? >> str('interface') >> space? >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:intf_name) >> space? >> str(';') >> space? >> str("\n") }
      rule(:protocol)       { space? >> str('protocol') >> whitespace >> (str('validonly').maybe).as(:protocol) >> space? >> str(';') >> space? >> str("\n") }
      rule(:bool)           { lbox >> space? >> str('bool').as(:bool) >> space? >> rbox }
      rule(:transmit_width) { lbox >> space? >> match('[0-9]') >> space? >> str(':') >> space? >> match('[0-9]') >> space? >> rbox }
      rule(:transmit)       { space? >> str('transmit') >> whitespace >> (bool | transmit_width) >> whitespace >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:transmit_name) >> space? >> str(';') >> space? >> str("\n") }
      rule(:interface_body) { (protocol | blank_line.maybe) }
      rule(:interface)      { intf_start >> interface_body }

      rule(:expression)     { ( interface ).repeat }

      root :expression
    end

I am having an issue writing the rule interface_body.

It can have 0 or more transmit lines, 0 or 1 protocol line, multiple blank lines, comments, etc.

Can someone help me out, please? The rules in the code snippet work for a single transmit line and for a single protocol line, i.e. each one matches on its own, but when I parse a whole interface it does not work.

Thanks in advance.

OK... this parses the file you mentioned. I don't fully understand the desired format, so I can't say it will work for all your files, but it should get you started.

    require 'parslet'

    class Myparse < Parslet::Parser
      # Tokens consume leading whitespace only, never trailing whitespace.
      rule(:lparen)     { space? >> str('(') }
      rule(:rparen)     { space? >> str(')') }
      rule(:lbox)       { space? >> str('[') }
      rule(:rbox)       { space? >> str(']') }
      rule(:lcurly)     { space? >> str('{') }
      rule(:rcurly)     { space? >> str('}') }
      rule(:comma)      { space? >> str(',') }
      rule(:semicolon)  { space? >> str(';') }
      rule(:eof)        { any.absent? }
      rule(:space)      { match["\t\s"] }
      rule(:whitespace) { space.repeat(1) }   # one or more
      rule(:space?)     { space.repeat(0) }   # zero or more
      rule(:blank_line) { space? >> newline.repeat(1) }
      rule(:newline)    { str("\n") }

      # things
      rule(:integer)    { space? >> match('[0-9]').repeat(1).as(:int) >> space? }
      rule(:identifier) { match['a-z'].repeat(1) }

      # Wraps an expression as one complete, semicolon-terminated line.
      def line(expression)
        space? >>
          expression >>
          space? >>
          str(';') >>
          space? >>
          str("\n")
      end

      rule(:expression?)    { ( interface ).repeat(0) }

      rule(:interface)      { intf_start >> interface_body.repeat(0) >> intf_end }

      # A body line is anything that is not the closing 'endinterface'.
      rule(:interface_body) {
        intf_end.absent? >>
        interface_bodyline >>
        blank_line.repeat(0)
      }

      rule(:intf_start) {
        line(
          str('interface') >>
          space? >>
          ( match['a-zA-Z_'].repeat(1,1) >>
            match['[:alnum:]_'].repeat(0)).as(:intf_name)
        )
      }

      rule(:interface_bodyline) {
        line( protocol | transmit )
      }

      rule(:protocol) {
        str('protocol') >> whitespace >>
        (str('validonly').maybe).as(:protocol)
      }

      rule(:transmit) {
        str('transmit') >> whitespace >>
        (bool | transmit_width) >> whitespace >>
        name.as(:transmit_name)
      }

      rule(:name) {
        match('[a-zA-Z_]') >>
        (match['[:alnum:]'] | str("_")).repeat(0)
      }

      rule(:bool)           { lbox >> str('bool').as(:bool) >> rbox }

      rule(:transmit_width) {
        lbox >>
        space? >>
        match('[0-9]').repeat(1).as(:msb) >>
        space? >>
        str(':') >>
        space? >>
        match('[0-9]').repeat(1).as(:lsb) >>
        space? >>
        rbox
      }

      rule(:intf_end)       { str('endinterface') }

      root :expression?
    end

    require 'rspec'
    require 'parslet/rig/rspec'

    RSpec.describe Myparse do
      let(:parser) { Myparse.new }

      context "simple_rule" do
        it "should consume protocol line" do
          expect(parser.interface_bodyline).to parse("  protocol validonly;\n")
        end

        it "name" do
          expect(parser.name).to parse('valid')
        end

        it "bool" do
          expect(parser.bool).to parse('[bool]')
        end

        it "transmit line" do
          expect(parser.transmit).to parse('transmit [bool] valid')
        end

        it "transmit bodyline" do
          expect(parser.interface_bodyline).to parse("  transmit  [bool]   valid;\n")
        end
      end
    end

    RSpec::Core::Runner.run(['--format', 'documentation'])

    begin
      doc = File.read("test.txt")
      Myparse.new.parse(doc)
    rescue Parslet::ParseFailed => error
      puts error.parse_failure_cause.ascii_tree
    end

The main changes...

  • Don't consume whitespace on both sides of tokens. You had expressions that parsed "[bool] valid" as lbox bool rbox space?, and then expected whitespace but couldn't find any (as the previous rule had already consumed it).

  • When an expression can validly parse as zero length (e.g. something with repeat(0)) and there is a problem with the way it's written, you get an odd error. The rule passes and matches nothing, and then the next rule will typically fail. I explicitly matched 'body lines' as 'not the end line', so that a bad body line fails with an error.

  • 'repeat' defaults to (0), which I would love to change. I see mistakes around this all the time.

  • x.repeat(1,1) means make exactly 1 match. That's the same as just having x. :)

  • There were more whitespace problems.


Write your parser from the top down. Write your tests from the bottom up. When your tests get to the top, you are done! :)

Good luck.
