Comparison: flex/bison vs Scala combinators
Lexer side-by-side
Actually a side-by-side comparison of flex and a lexer based one scala combinators is next to impossible. Let’s take a look at some of PHP’s flex code and its corresponding part in JBJ:
...
<ST_IN_SCRIPTING>"new" {
return T_NEW;
}
<ST_IN_SCRIPTING>"clone" {
return T_CLONE;
}
<ST_IN_SCRIPTING>"var" {
return T_VAR;
}
<ST_IN_SCRIPTING>"("{TABS_AND_SPACES}("int"|
"integer"){TABS_AND_SPACES}")" {
return T_INT_CAST;
}
<ST_IN_SCRIPTING>"("{TABS_AND_SPACES}("real"|
"double"|"float"){TABS_AND_SPACES}")" {
return T_DOUBLE_CAST;
}
<ST_IN_SCRIPTING>"("{TABS_AND_SPACES}("string"|
"binary"){TABS_AND_SPACES}")" {
return T_STRING_CAST;
}
<ST_IN_SCRIPTING>"("{TABS_AND_SPACES}"array"
{TABS_AND_SPACES}")" {
return T_ARRAY_CAST;
}
...
1 case class ScriptLexer(mode: ScriptingLexerMode)
2 extends Lexer {
3 ...
4 val reserved = Set( ... "var", "new", "clone", ... )
5
6 def processIdent(name: String) =
7 if (reserved contains name.toLowerCase)
8 Keyword(name.toLowerCase)
9 else
10 Identifier(name)
11 ...
12 '(' ~> tabsOrSpaces ~> (str("int") | str("integer"))
13 <~ tabsOrSpaces <~ ')' ^^ IntegerCast
14 | '(' ~> tabsOrSpaces ~> (str("real") | str("double")
15 | str("float")) <~ tabsOrSpaces <~ ')' ^^ DoubleCast
16 | '(' ~> tabsOrSpaces ~> (str("string") |
17 str("binary")) <~ tabsOrSpaces <~ ')' ^^ StringCast
18 | '(' ~> tabsOrSpaces ~> str("array")
19 <~ tabsOrSpaces <~ ')' ^^ ArrayCast
20 | '(' ~> tabsOrSpaces ~> str("bool") <~
21 opt(str("ean")) <~ tabsOrSpaces <~ ')' ^^
22 BooleanCast
23 | '(' ~> tabsOrSpaces ~> str("unset")
24 <~ tabsOrSpaces <~ ')' ^^ UnsetCast
25 ...
26 }
Overall there are some similarities, but the syntax is vastly different so that a one-by-one translation of the rules is in general on a feasible approach.
Parser side-by-side
The transformation of yacc rules to corresponding scala combinators is more straight-forward, as long as the the underlying lexical analyzers have a similar set of tokens. Let’s take a look at some off PHP’s yacc/bison code and the corresponding parts in JBJ:
...
interface_extends_list:
/* empty */
| T_EXTENDS interface_list
;
implements_list:
/* empty */
| T_IMPLEMENTS interface_list
;
interface_list:
fully_qualified_class_name {...}
| interface_list ',' fully_qualified_class_name {...}
;
foreach_optional_arg:
/* empty */ {...}
| T_DOUBLE_ARROW foreach_variable {...}
;
foreach_variable:
variable {...}
| '&' variable {...}
| T_LIST '(' {...} assignment_list ')' {...}
;
for_statement:
statement
| ':' inner_statement_list T_ENDFOR ';'
;
foreach_statement:
statement
| ':' inner_statement_list T_ENDFOREACH ';'
;
...
1 class JbjParser extends Parsers with PackratParsers {
2 ...
3 lazy val interfaceExtendsList =
4 opt("extends" ~> interfaceList) ^^ { ... }
5
6 lazy val implementsList =
7 opt("implements" ~> interfaceList) ^^ { ... }
8
9 lazy val interfaceList =
10 rep1sep(fullyQualifiedClassName, ",")
11
12 lazy val foreachOptionalArg =
13 opt("=>" ~> foreachVariable)
14
15 lazy val foreachVariable =
16 variable ^^ { ... } |
17 "&" ~> variable ^^ { ... } |
18 "list" ~> "(" ~> assignmentList <~ ")" ^^ { ... }
19
20 lazy val forStatement =
21 ":" ~> innerStatementList <~ "endfor" <~ ";" |
22 statement ^^ (List(_)) |
23 ";" ^^^ Nil
24
25 lazy val foreachStatement =
26 ":" ~> innerStatementList <~ "endforeach" <~ ";" |
27 statement ^^ (List(_)) |
28 ";" ^^^ Nil
29 ...
Most of the differences are based on the fact that many of the yacc rules are expressed recursively, which should be converted to a more explicit form when using scala combinators.
It should be pointed out though, that some parts of the parser - especially those that handle operator precedence - are vastly different from the original bison code. In bison operator precedence can be configured/defined at a global level, white scala combinators require a more explicit implementation.
Somewhat unfair statistics
The technologies are vastly different, nevertheless one can do line counting:
Lexical analyzer
PHP flex | JBJ scala combinators | |
---|---|---|
Files | 1 | 10 |
Lines | 2442 | 493 |
Syntactic parser
PHP bison | JBJ scala combinators | |
---|---|---|
Files | 1 | 1 |
Lines | 1283 | 714 |