Regex Engine

GitHub

This library implements a simple regular expression engine in Scala.

Implementation

The library uses Scala’s postfix operations to implement basic regular expressions as a DSL.

// matches double zero
val simpleConcatenation: Nfa = "00"

// matches 0 or 1
val zeroOrOne: Nfa = "0" | "1"

// matches 1 zero or more times
val kleene: Nfa = "1"*

// we can also concatenate expressions together by (ab)using the apply method
// matches 0 or 1 followed by at least three 1s
val nfa: Nfa = zeroOrOne("111")(kleene)

The DSL produces an NFA, which can then be compiled into DFA.

val dfa = nfa.compile()

At which point we can obtain a stream of matches from a given input.

// matches == "1111" #:: "01111" #:: "01111"
val matches = dfa.matches("111101111001111")

Matches are greedy i.e. if there are two possible matches from a given point in the input, the longer match will be returned from the stream.

Examples

// exclude Predef to avoid conflicts
import scala.Predef.{augmentString => _}

// matches any sequence of at least one zero
val expression: Nfa = "0"("0"*);

val dfa = expression.compile()

val matches = dfa.matches("101000101100010")

// if isEmpty returns false, we have a matches
val test = !matches.isEmpty // true

// if we only need one match, we can use head
val first = matches.head // "0"

// if we want to use all the matches, we have the full range of Stream methods available
// prints "0" "000" "0" "000" "0" across 5 lines
matches.foreach { println(_) }

// expressions can be arbitrarily complex
val complexExpression: Nfa = ("011"|"10")("1"*)|("00"|"11")("0"("1"*)|"")

Further Reading

For my introduction to regular expressions, click here.

For more information about the process of writing regex-engine, see my series of articles: