Lexical analysis of a Tiger subset

Description

Construct a lexer for a subset of tiger that includes the following

exp:
    lvalue
    int
    string
    exp binop exp
    lvalue := exp
    id ( explist )
    ( expseq )
    let declist in expseq end
    if exp then exp
    if exp then exp else exp

lvalue:
    # These are simple variables
    id

binop:
    + - * / = <> < > <= >= & |

declist:
    dec
    dec declist

dec: 
    type id = id
    var id : id := exp
    function id ( typefields ) : id = exp

expseq:
    exp
    expseq ; exp

explist:
    exp
    explist , exp

typefields:
    typefield
    typefield , typefields

typefield:
    id : id

And comments delimited by /* and */.

NOTE: the only valid typeid in our language are aliases of the built in "int" and "string" types. We do not have arrays or structures or other compound types.

Your assignment is to use the sample code in chapter 2 to implement (in tiger.lex) a lexer for this subset of tiger. The full description of tiger is in the Appendix A of the book, and you can get a short description of Tiger. Remember, you don't have to implement the whole language, just what I describe above.

You need to consider carefully how you will handle strings and comments, and in particular, what to do if unfinished strings or unfinished comments are in the programs you are lexing.

You will definitely need to read Chapter 2 to see examples of building a lexer, and the ML-Lex manual is also useful.

Example code

You need to make some changes to the sources.cm file, you can use one like:

Group is

driver.sml
errormsg.sml
tokens.sig
tokens.sml
tiger.lex
$/smlnj-lib.cm
$/basis.cm

You can parse a sample program using:

- CM.make "sources.cm";
- Parse.parse "test.tig";
  () : unit

Here's a simple program that should lex when you have completed the assignment.

/* factorial */
let
  function fac(n : int) : int =
    if n <= 1 then
        1
    else
        n * fac(n - 1)
in
    fac(5)
end