BED-Con talk The Parser Scala combinators

Scala combinators

Scala combinators are part of the scala-library (scala.util.parsing.combinator) and offer a new way to write lexical analyzers and parsers. In contrast to the classic approach, a lexer is treated as just another kind of parser that can be written with the same tool set as the parser itself.

A very simplified overview of the interface

    package scala.util.parsing.combinator

    trait Parsers {
        // Type of the input elements the parsers consume (e.g. Char or a token type)
        type Elem

        trait Parser[+T] {
            def apply(input: Reader[Elem]): ParseResult[T]

            ...
        }

        sealed abstract class ParseResult[+T]

        case class Success[+T](...) extends ParseResult[T]

        sealed abstract class NoSuccess(...) extends ParseResult[Nothing]

        // Recoverable: alternatives may still be tried (backtracking)
        case class Failure(...) extends NoSuccess(...)

        // Fatal: no backtracking is attempted
        case class Error(...) extends NoSuccess(...)

        ...
    }

    // Defined in package scala.util.parsing.input
    abstract class Reader[+T] {
        def first: T

        def rest: Reader[T]

        def pos: Position

        def atEnd: Boolean

        ...
    }

For simple parsers it usually suffices to use Parsers.Elem = Char. It is also possible to use the combinator framework in a more classic way, by implementing a lexical analyzer with Parsers.Elem = Char and ParseResult[Token] beneath a syntactic parser with Parsers.Elem = Token and ParseResult[ASTNode].
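As an illustration of the simple case, the following is a minimal sketch of a parser with Elem = Char. The object name DigitsParser and its members digit, number and parse are made up for this example; Parsers, elem, rep1, ^^ and CharSequenceReader come from the library. Note that from Scala 2.11 onwards the combinators are shipped as the separate scala-parser-combinators module, so that dependency is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader

// Hypothetical example parser working directly on characters.
object DigitsParser extends Parsers {
  // The parsers in this object consume single characters.
  type Elem = Char

  // Accepts exactly one decimal digit.
  val digit: Parser[Char] = elem("digit", _.isDigit)

  // One or more digits, converted to an Int.
  val number: Parser[Int] = rep1(digit) ^^ (_.mkString.toInt)

  // Wraps the string in a Reader[Char] and applies the parser to it.
  def parse(s: String): ParseResult[Int] = number(new CharSequenceReader(s))
}

object DigitsDemo {
  def main(args: Array[String]): Unit =
    DigitsParser.parse("42abc") match {
      // rest is the Reader positioned after the consumed input.
      case DigitsParser.Success(value, rest) =>
        println(s"parsed $value, remaining input starts at offset ${rest.offset}")
      case ns: DigitsParser.NoSuccess =>
        println(s"no success: ${ns.msg}")
    }
}
```

The parser stops at the first non-digit and hands back the remaining input as a Reader, which is what allows parsers to be chained. To reject trailing input, number could be wrapped in phrase(...).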

As a starting point the framework already contains pre-defined base classes that support Java-like languages:

  • scala.util.parsing.combinator.syntactical.StdTokenParsers, which uses
    • scala.util.parsing.combinator.lexical.StdLexical as its lexer, which produces the tokens defined in
    • scala.util.parsing.combinator.token.StdTokens
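
The classes listed above can be tried out through StandardTokenParsers, the library's concrete subclass of StdTokenParsers that wires in a StdLexical instance. The sketch below is only an illustration: the grammar (a parenthesised, comma-separated list of identifiers) and the names IdentListParser, identList and parse are invented for this example, and the scala-parser-combinators module is again assumed to be available.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical grammar: "(a, b, c)" yields List("a", "b", "c").
object IdentListParser extends StandardTokenParsers {
  // Register the delimiters so that StdLexical emits them as Keyword tokens.
  // This has to happen before the grammar below refers to them.
  lexical.delimiters ++= List("(", ")", ",")

  // The syntactic parser works on tokens (Elem = Token), not on characters.
  // String literals are turned into keyword parsers by an implicit conversion.
  val identList: Parser[List[String]] = "(" ~> repsep(ident, ",") <~ ")"

  // lexical.Scanner is a Reader[Token] that runs the lexer over the input;
  // phrase requires the whole token stream to be consumed.
  def parse(s: String): ParseResult[List[String]] =
    phrase(identList)(new lexical.Scanner(s))
}
```

Here the two-stage design is visible: lexical.Scanner turns characters into tokens, and identList only ever sees those tokens, so whitespace between identifiers never appears in the grammar.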

This should be discussed in more detail with the classic Calculator example.

Unfortunately, these classes do not offer a workable pattern for different lexer modes, which are important for parsing PHP: Missing lexer modes