Reference no: EM13793
Scanning and Parsing
Implement the lexical and syntactic analysis of the Minifun programming language. The scanner splits the input into tokens and catches lexical errors (inputs that cannot form valid tokens). The parser checks that the resulting list of tokens conforms to a syntax specified by a context-free grammar. It is convenient to convert the initial parse tree, which follows the context-free grammar used for parsing, into a simpler Abstract Syntax Tree. It is recommended that your design follow these stages, but this is not required, as long as you implement the lexical and syntactic specification of Minifun.
Minifun Language
Minifun is a functional programming language and a subset of the Scheme language (you can browse to learn about Scheme). It includes variables, functions, integers and basic integer arithmetic, booleans, relational functions, the "cond" conditional statement, and lists. It does not include strings, real numbers, or structures.
The Minifun grammar
1. <prog> ::= <s-exp> | <s-exp> <prog>
2. <s-exp> ::= <def> | <exp>
3. <def> ::= (define (<var> <var> ... <var>) <exp>) | (define <var> <exp>)
4. <exp> ::= <var>
| <con>
| (<prm> <exp> ... <exp>)
| (<var> <exp> ... <exp>)
| (cond (<exp> <exp>) ... (<exp> <exp>))
| (cond (<exp> <exp>) ... (else <exp>))
5. <var> ::= See definition below
6. <con> ::= See definition below
7. <prm> ::= + | - | * | / | = | < | > | <= | >=
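The grammar above maps fairly directly onto a recursive-descent recognizer, with one method per rule. The following is a minimal sketch in Python, not a reference solution. It assumes the scanner has already produced a flat list of token strings, and the names `Parser`, `ParseError`, `is_var`, `is_con`, and `is_valid` are hypothetical. It also reads "..." as "zero or more" for call arguments and accepts a `cond` whose only clause is `else`; the specification does not settle either point.

```python
import re

PRIMS = {"+", "-", "*", "/", "=", "<", ">", "<=", ">="}
KEYWORDS = {"define", "cond", "else"}

class ParseError(Exception):
    pass

def is_var(tok):
    # Letters, hyphens and underscores only; keywords and operators excluded.
    return (tok not in KEYWORDS and tok not in PRIMS
            and re.fullmatch(r"[A-Za-z_-]+", tok) is not None)

def is_con(tok):
    return tok in ("#t", "#f") or re.fullmatch(r"[0-9]+", tok) is not None

class Parser:
    def __init__(self, tokens):
        self.toks = list(tokens)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.toks[i] if i < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def prog(self):                      # rule 1: one or more <s-exp>
        self.s_exp()
        while self.peek() is not None:
            self.s_exp()

    def s_exp(self):                     # rule 2: <def> | <exp>
        if self.peek() == "(" and self.peek(1) == "define":
            self.definition()
        else:
            self.exp()

    def definition(self):                # rule 3
        self.expect("(")
        self.expect("define")
        if self.peek() == "(":           # function form: (define (f x ...) body)
            self.pos += 1
            self.var()
            while self.peek() != ")":
                self.var()
            self.expect(")")
        else:                            # value form: (define x body)
            self.var()
        self.exp()
        self.expect(")")

    def var(self):
        tok = self.peek()
        if tok is None or not is_var(tok):
            raise ParseError(f"expected a variable, found {tok!r}")
        self.pos += 1

    def exp(self):                       # rule 4
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        if is_var(tok) or is_con(tok):
            self.pos += 1
            return
        self.expect("(")
        head = self.peek()
        if head == "cond":
            self.pos += 1
            self.clauses()
        elif head is not None and (head in PRIMS or is_var(head)):
            self.pos += 1
            while self.peek() not in (")", None):
                self.exp()
        else:
            raise ParseError(f"bad expression head {head!r}")
        self.expect(")")

    def clauses(self):                   # cond clauses; else must come last
        count = 0
        while self.peek() == "(":
            self.pos += 1
            count += 1
            if self.peek() == "else":
                self.pos += 1
                self.exp()
                self.expect(")")
                break
            self.exp()
            self.exp()
            self.expect(")")
        if count == 0:
            raise ParseError("cond needs at least one clause")

def is_valid(tokens):
    try:
        Parser(tokens).prog()
        return True
    except ParseError:
        return False
```

For example, `is_valid("( define ( sq x ) ( * x x ) )".split())` accepts a function definition, while `is_valid("( define cond 1 )".split())` is rejected because a keyword appears where a variable is required. A full solution would build a tree in each method instead of only recognizing the input.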
Variables <var>, which are the names of functions and values, all follow the same spelling rules. They can be made up of upper-case and lower-case letters (yes, case matters!), hyphens, and underscores. They cannot contain spaces, nor can they contain parentheses, curly braces, square brackets, apostrophes, commas, or quotation marks, as these all have special meanings in Minifun.
The nonterminal <con> covers boolean and numeric constants. A number (integer) is a sequence of one or more digits (0-9), as many as you wish. A boolean constant is either #t (true value) or #f (false value).
A space is necessary to separate one name, number, or operation <prm> (see above) from another. Spaces are allowed, but not required, before and after parentheses, square brackets, and curly braces.
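The lexical rules above can be sketched as a small scanner. The version below is one possible approach in Python, not a prescribed design: it first cuts the input into parentheses and maximal runs of other non-space characters, then classifies each run as a whole, which enforces the spacing rule without extra bookkeeping. The names `scan`, `classify`, `LexError`, and the token kind labels are hypothetical. Reading the spelling rules strictly, it rejects names containing digits, as well as square brackets, curly braces, and quotation characters; adjust this if your reading of the specification differs.

```python
import re

KEYWORDS = {"define", "cond", "else"}
PRIMS = {"+", "-", "*", "/", "=", "<", ">", "<=", ">="}

class LexError(Exception):
    pass

def classify(atom):
    """Return the token kind for one atom, or None if it is not a valid token."""
    if atom == "(":
        return "LPAREN"
    if atom == ")":
        return "RPAREN"
    if atom in KEYWORDS:
        return "KEYWORD"
    if atom in PRIMS:          # checked before VAR so a lone "-" is an operator
        return "PRM"
    if atom in ("#t", "#f"):
        return "BOOL"
    if re.fullmatch(r"[0-9]+", atom):
        return "NUM"
    if re.fullmatch(r"[A-Za-z_-]+", atom):
        return "VAR"
    return None

def scan(text):
    """Split text into (kind, lexeme) pairs; raise LexError on a bad token."""
    tokens = []
    # An atom is a single parenthesis or a maximal run of other non-space chars.
    for m in re.finditer(r"[()]|[^\s()]+", text):
        kind = classify(m.group())
        if kind is None:
            line = text.count("\n", 0, m.start()) + 1
            raise LexError(f"line {line}: invalid token {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens
```

With this sketch, `scan("(+ 1 2)")` yields `[("LPAREN", "("), ("PRM", "+"), ("NUM", "1"), ("NUM", "2"), ("RPAREN", ")")]`, and `scan("(+ x1 2)")` raises `LexError` because `x1` is neither a number nor a name under the strict reading.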
For the classification of Minifun sentences, we also need three keywords: define, cond, and else. These keywords have no meaning. No keyword may be used as a variable.
You must submit to Blackboard a .zip archive containing your source code. Your main program must be called mfunc. When run with a single argument, a filename, mfunc should process the given file, produce appropriate diagnostic messages on standard error, and exit with one of the following return codes:
0: the input file is lexically/syntactically valid Minifun
-1: the input file is not lexically/syntactically valid Minifun
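The command-line behaviour described above can be sketched as a small driver. This is only an illustration in Python, under several assumptions: `check_source` is a hypothetical placeholder standing in for your real scanner and parser (here it only checks that parentheses balance), and returning -1 for a missing argument or an unreadable file is a choice made for this sketch, since the specification only defines the codes for valid and invalid input.

```python
import sys

def check_source(text):
    """Placeholder for the real scanner + parser: return a list of diagnostics.

    This stand-in only verifies that parentheses balance.
    """
    depth = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return [f"line {lineno}: unexpected ')'"]
    if depth > 0:
        return ["unexpected end of input: missing ')'"]
    return []

def main(argv):
    """Return 0 for valid input and -1 otherwise; diagnostics go to stderr."""
    if len(argv) != 2:
        print("usage: mfunc <filename>", file=sys.stderr)
        return -1                      # assumption: usage errors also yield -1
    try:
        with open(argv[1], encoding="utf-8") as f:
            text = f.read()
    except OSError as err:
        print(f"mfunc: {err}", file=sys.stderr)
        return -1                      # assumption: unreadable file yields -1
    errors = check_source(text)
    for msg in errors:
        print(f"{argv[1]}: {msg}", file=sys.stderr)
    return -1 if errors else 0

# Entry point for the mfunc script would be: sys.exit(main(sys.argv))
```

Note that on POSIX systems an exit status is truncated to one byte, so a process that exits with -1 is reported by the shell as 255.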