RegexEngine


Project maintained by ggilmore Hosted on GitHub Pages — Theme by mattgraham

RegexEngine

Intro:

This is a simple regex engine written in Scala that I made following this article.

Supported Grammar:

(borrowed and modified from Matt Might)

 <regex> ::= <term> '|' <regex>
         ::=  <term>

 <term>  ::= <factor>+

 <factor> ::= <factor> ('*'| '?'|'+')?
          ::= <base> ('*'| '?'|'+')?

 <base>  ::= <char>
         ::=  '\' <char>
         ::= '(' <regex> ')'

Basic Usage:

The RegexParser class is used to parse a string that represents a regex. Create a new RegexParser instance and use its parse() method in order to create a Machine instance that represents that regex.

The MachineRunner object has a method called testInput that takes a Machine and some inputString and returns a tuple of a boolean followed by a sequence of ProgressionWithChar instances. The boolean is true if inputString matches the regex described by the Machine and false otherwise. The sequences of ProgressionWithChars are a record of the sets of states that Machine was evaluating (still considered valid) as it consumed inputString.

The Machine companion object also has a helper function, called toDOTFileFormat that takes a Machine, and returns a string representation of that machine using the DOT graph description language. This graph has nodes as states, unlabeled edges representing transitions that aren't contingent on any character, and labeled edges representing transitions that are contingent on the character that the edge is labeled with.

Example of a graph generated from toDOTFileFormat's output from the regex ((ab|b?)*w), after a single b was consumed. The valid states are highlighted in red:

example

TODO:

-test cases?

-work on visualization of state machine