Reference no: EM133654621
Design and Implementation of Programming Languages
Project
The second project involves modifying the syntactic analyzer for the attached compiler by adding to the existing grammar. The full grammar of the language is shown below. The highlighted portions of the grammar show what you must either modify or add to the existing grammar.
function:
function_header {variable} body
function_header:
FUNCTION IDENTIFIER [parameters] RETURNS type ;
variable:
IDENTIFIER : type IS statement ; | IDENTIFIER : LIST OF type IS list ;
list:
( expression {, expression} )
parameters:
parameter {, parameter}
parameter:
IDENTIFIER : type
type:
INTEGER | REAL | CHARACTER
body:
BEGIN statement END ;
statement: expression ; |
WHEN condition , expression : expression ; |
SWITCH expression IS {case} OTHERS ARROW statement ENDSWITCH ; | IF condition THEN statement {ELSIF condition THEN statement}
ELSE statement ENDIF ; |
FOLD direction operator list_choice ENDFOLD ;
case:
CASE INT_LITERAL ARROW statement
direction:
LEFT | RIGHT
operator:
ADDOP | MULOP
list_choice:
list | IDENTIFIER
condition:
expression RELOP expression |
condition logical_operator condition | ( condition ) |
NOTOP condition
logical_operator:
ANDOP | OROP
expression:
( expression ) |
expression arithmetic_operator expression | NEGOP expression |
INT_LITERAL | REAL_LITERAL | CHAR_LITERAL |
IDENTIFIER ( expression ) | IDENTIFIER
arithmetic_operator: ADDOP | MULOP | MODOP | EXPOP
In the above grammar, the red symbols are nonterminals, the blue symbols are terminals and the black punctuation are EBNF metasymbols. The braces denote repetition 0 or more times and the brackets denote optional.
You must rewrite the grammar to eliminate the EBNF brace and bracket metasymbols and to incorporate the significance of parentheses, operator precedence and associativity for all operators. The precedence and associativity rules are as follows:
Among binary arithmetic operators the exponentiation operator has highest precedence followed by the multiplying operators and then the adding operators. But the unary negation operator ~ has higher precedence that all of the binary operators.
All relational operators have the same precedence.
Among the binary logical operators, & has higher precedence than |. But the unary logical operator ! has higher precedence than either of the binary logical operators.
All binary operators except the exponentiation operator are left associative.
The directives to specify precedence and associativity, such as %prec and %left, may not be used.
Your parser should be able to correctly parse any syntactically correct program without any problem.
You must modify the syntactic analyzer to detect and recover from additional syntax errors using the semicolon as the synchronization token. To accomplish detecting additional errors an error production must be added to the function body, to the variable declaration and to the when clause.
Your bison input file should not produce any shift/reduce or reduce/reduce errors. Eliminating them can be difficult so the best strategy is not introduce any. That is best achieved by making small incremental additions to the grammar and ensuring that no addition introduces any such errors.
An example of compilation listing output containing syntax errors is shown below:
1 -- Multiple errors 2
function main a integer returns real; Syntax Error, Unexpected INTEGER, expecting ':'
b: integer is * 2; Syntax Error, Unexpected MULOP
c: real is 6.0;
begin
if a > c then
8 b + / .4;
Syntax Error, Unexpected MULOP
else
case b is
when => 2;
Syntax Error, Unexpected ARROW, expecting INT_LITERAL
when 2 => c;
endcase;
Syntax Error, Unexpected ENDCASE, expecting OTHERS or WHEN
endif;
end;
Lexical Errors 0
Syntax Errors 5
Semantic Errors 0
You are to submit two files.
The first is a .zip file that contains all the source code for the project. The .zip file should contain the flex input file, which should be a .l file, the bison file, which should be a .y file, all .cc and .h files and a makefile that builds the project.
The second is a Word document (PDF or RTF is also acceptable) that contains the documentation for the project, which should include the following:
A discussion of how you approached the project
A test plan that includes test cases that you have created indicating what aspects of the program each one is testing and a screen shot of your compiler run on that test case
A discussion of lessons learned from the project and any improvements that could be made
Here is the recommended approach for project 2.
First build the skeleton for project 2, run it and be sure that you understand how it works.
Replace the lexical analyzer in the project 2 skeleton with your lexical analyzer from project
1. Be sure to remove the main method from your scanner.l. Also replace the listing.h and listing.cc files in the skeleton with the ones from your version of project 1. The add the necessary %token declarations for the new tokens. Note that the tokens.h from project 1 is no longer needed. It will be generated by bison from the %token declarations. Verify that the project builds correctly at this point. Then confirm that test cases test1.txt - test4.txt that were provided as test cases for the skeleton code parse properly and that the test case syntax1.txt produces a syntax error.
The simplest modification to make first is to modify the grammar to include variables and literals of the real data type. You will need to modify the type and primary productions. Use test5.txt to test this modification. Shown below is the output that should result when using that test case as input:
$ ./compile < test5.txt
1 // Function with a Real and Character Variables and Literals 2
function main returns character;
a: real is 7.8e-1;
begin
6 when a < .45, '\n' : 'A'; end;
Compiled Successfully
Next, add the if statement to the statement production in the grammar. The if statement allows zero or more elsif clauses represented in EBNF using braces. The grammar in the specification defines the language but is not in the form needed for bison. It contains EBNF symbols, that must be removed. In that part of the grammar, the EBNF braces are used to indicate that a if statement contains zero or more elsif clauses. Because bison does not support these EBNF symbols, a recursive production must be used instead. Study how the recursive cases production in the skeleton defines zero or more case statements. You should use a similar approach.
Be sure that you understand the difference between the meaning of ; and ';' in the bison input file. The former is the symbol that terminates a production. The latter represents the semicolon symbol in the target language as does the ; in the project specification. It is also important to distinguish between the statement and the statement_ productions that were provided in the skeleton code. The latter incorporates the semicolon that ends the if statement in the target language and also provides recovery should an error occur within a statement. Use test6.txt to test this modification. Shown below is the output that should result when using that test case as input:
$ ./compile < test6.txt
1 // Function with an If Statement 2
3 function main returns integer;
4 a: integer is #A;
5 begin
6 if a < 10 then
7 1;
8 elsif a < 20 then
9 2;
10 elsif a < 30 then
11 3;
12 else
13 4;
endif;
end;
Compiled Successful
The fold statement would be best implemented next. Note that it will require two subordinate production direction and operator. Use test7.txt to test this modification. Shown below is the output that should result when using that test case as input:
$ ./compile < test7.txt
1 // Left and Right Fold Statement 2
3
function main returns integer;
a: list of integer is (1, 2, 3);
begin
switch a(1) is
case 1 =>
fold right - (2,3, 4) endfold;
case 2 =>
fold left + a endfold;
others =>
13 0;
endswitch;
end;
Compiled Successfully
The provided skeleton allows an optional, 0 or 1, variable declaration. The requirements state that 0 or more should be permitted. This modification again requires replacing the EBNF braces with a recursive production. Use test8.txt to test this modification. Shown below is the output that should result when using that test case as input:
$ ./compile < test8.txt
1 // Multiple Integer Variable Initializations 2
function main returns integer;
b: integer is 5 + 1 - 4;
c: integer is 2 + 3;
begin
7 b + 1 - c;
8 end;
Compiled Successfully
The next feature to add is 0 or more parameter declarations in the function header. Because the parameters require comma separators, notice that the grammar is written to indicate that the parameters are optional because the parameters production must contain at least one. Study how optional elements are included in the grammar by noticing how an optional variable declaration was implemented in the skeleton. Also study how the elements production is implemented. Like the parameters, it involves comma separators. Two test cases are provided to test this change. Before using either of them, use one of the early test cases to confirm that your parser still allows no parameters. Then proceed to use, test9.txt, to verify that program with one parameter declaration parses correctly. Shown below is the output that should result when using that test case as input:
$ ./compile < test9.txt
1 // Single Parameter Declaration 2
3 function main a: integer returns integer;
4 begin
5 a + 1;
6 end;
Compiled Successfully
The next test case, test10.txt contains two parameter declarations. Shown below is the output that should result when using that test case as input:
$ ./compile < test10.txt
1 // Two Parameter Declarations 2
function main a: integer, b: real returns real;
c: real is .7;
begin
a + b * c;
end;
Compiled Successfully
The additional binary arithmetic operators for remainder and exponentiation and the unary minus operator should be added next. Notice how the existing operators are implemented in the skeleton code. There must be one production for each level of precedence. Because the remainder operator has the same precedence as the multiplying operators, it should be included as another right-hand side to the existing production. But because the exponentiation operator has higher precedence a production must be introduced. Because that operator is right associative, its production must be right recursive. Because the unary minus operator has higher precedence than all the binary arithmetic operators. In the Course Resources area of the classroom you will find Module 1 from CMSC 330. You may want to read section II-B if you
need a better understanding of how to construct a grammar so that it produces the correct parse tree to account for precedence and associativity.
Then proceed to use, test11.txt, to verify that program containing every arithmetic operator parses correctly. Shown below is the output that should result when using that test case as input:
$ ./compile < test11.txt
1 // Arithmetic Operators 2
3 function main returns integer;
4 begin
5 9 + ~2 - (5 - 1) / 2 % 3 * 3 ^ 1 ^ 2;
6 end;
Compiled Successfully
The additional logical operators, which include the | and ! operators should be added next. Each has a separate precedence level. The | operator has the lowest precedence of all operators and ! has the highest. So, two new productions must be added to the grammar. Shown below is the output that should result when using that test case as input:
$ ./compile < test12.txt
1 // Relational and Logical Operators 2
3 function main returns integer;
4 begin
5 when 5 > 8 & 3 = 3 | 9 < 1 & !(3 <> 7) | 6 <= 7 & 3 >= 9, 1 : 0;
6 end;
Compiled Successfully
At this point your parser should contain the complete grammar for the language. As a final check, use test cases test13.txt, which contains a nested if together with most elements of the language and test14.txt, which contains a nested switch statement. Shown below is the output that should result when using test13.txt as input:
$ ./compile < test13.txt
1 // Comprehensive Test with Nested If 2
3 function main a: integer, b: character, c: real returns real;
4 d: integer is #8e;
5 e: real is 3.75;
6 f: character is 'A';
7 g: list of integer is (1, 3, 5);
8 begin
9 if ~a > 5 & a < 1 & !(c = 5.8 | c <> .7E4) then
10 if c >= 7.5E-2 & c <= 5.2 then
11 when a >= d, a + 2 - 7.9E+2 / 9 * 4 : 8.9;
12 elsif g(1) = a ^ 2 % 3 then
13 a % 2 - 5 / c;
14 else
15 fold left + (1, 2, 3) endfold;
16 endif;
17 else
18 fold right - g endfold;
19 endif;
20 end;
Compiled Successfully
Shown below is the output that should result when using test14.txt as input:
$ ./compile < test14.txt
1 // Comprehensive Test with Nested Switch 2
function main a: integer, b: real returns real;
c: integer is 8;
d: real is .7E2;
begin
switch a is
8 case 1 => a * 2 / d ^ 2;
9 case 2 => a + 5.3E+2 - b;
case 3 =>
switch d is
12 case 1 => a % 2;
13 others => 9.1E-1;
14 endswitch;
15 case 4 => a / 2 - c;
others => a + 4.7 * b;
endswitch;
end;
Compiled Successfully
At this point, you are ready to implement the error recovery portion of the program. Verify first that test case syntax1.txt that was include with the project 2 test data produces the correct output. Then modify the grammar so that errors in the function header will be properly recovered from. You will need to introduce an additional production that includes the error token. You should use the statement_ production in the skeleton code as a model. Once you have implemented the error recovery production for the function header, use test case syntax2.txt to test it. Shown below is the output that should result when using syntax2.txt as input:
$ ./compile < syntax2.txt
1 // Error in Function Header, Missing Colon 2
function main a integer returns integer; Syntax Error, Unexpected INTEGER, expecting ':'
b: integer is 3 * 2;
begin
if a <= 0 then
7 b + 3;
8 else
9 b * 4;
endif;
end;
Lexical Errors 0
Syntax Errors 1
Semantic Errors 0
The fact that it continued to parse the remainder of the program is confirmation that it has been properly implemented.
Next implement error recovery in variable declarations. As before, an additional production is required. Once you have implemented the error recovery production for variable declarations, use test case syntax3.txt to test it. Shown below is the output that should result when using that test case as input:
$ ./compile < syntax3.txt
1 // Error in Variable Declaration 2
function main a: integer returns integer;
b integer is
Syntax Error, Unexpected INTEGER, expecting ':'
if a > 5 then
6 a * 3;
7 else
8 2 + a;
endif;
c: real is 3.5;
begin
if a <= 0 then
13 b + 3;
14 else
15 b * 4;
endif;
end;
Lexical Errors 0
Syntax Errors 1
Semantic Errors 0
Confirmation that error recovery has occurred is that no extraneous error messages are generated after the variable declaration.
The final addition to the grammar to enhance error recovery is to an error recovery to the case clause of the switch statement. As before, an additional production is required. Use test case syntax4.txt to test it. Shown below is the output that should result when using that test case as input:
$ ./compile < syntax4.txt
1 // Multiple Errors, Error in Case Clause and Missing Others Clause 2
function main a: integer returns integer;
begin
switch a is
case 1 => a 2;
Syntax Error, Unexpected INT_LITERAL, expecting ';'
case 2 => 5;
endswitch;
Syntax Error, Unexpected ENDSWITCH, expecting CASE or OTHERS
end;
Lexical Errors 0
Syntax Errors 2
Semantic Errors 0
The fact that the second error, the one indicating that the others clause is missing, confirms that recovery from the first error message has occurred.
As a final test use test case syntax5.txt, which contains a variety of syntax errors. Shown below is the output that should result when using that test case as input:
$ ./compile < syntax5.txt
1 // Multiple Errors 2
function main a integer returns real; Syntax Error, Unexpected INTEGER, expecting ':'
b: integer is * 2; Syntax Error, Unexpected MULOP
c: real is 6.0;
begin
if a > c then
8 b + / 4.7;
Syntax Error, Unexpected MULOP
else
switch b is
case => 2;
Syntax Error, Unexpected ARROW, expecting INT_LITERAL
case 2 => c;
endswitch;
Syntax Error, Unexpected ENDSWITCH, expecting CASE or OTHERS
endif;
end;
Lexical Errors 0
Syntax Errors 5
Semantic Errors 0
If you have implemented all the required error recovery, your compiler should detect all five syntax errors.
All of the test cases discussed above are included in the attached .zip file.
You are certainly encouraged to create any other test cases that you wish to incorporate in your test plan. Keep in mind that your parser should parse all syntactically correct programs, so it is recommended that you choose some different test cases as a part of your test plan. Your instructor may use a comparable but different set of test cases when testing your project.