The comma-separated values (CSV) file format is a delimited data format, commonly used as a portable representation of database-type data. Files of CSV format have fields separated by the comma character and records separated by newlines. For this programming assignment, you are going to use flex to write a converter that transforms any legal CSV file into an HTML table.
We will use the following rules extracted from RFC 4180 (https://tools.ietf.org/html/rfc4180) to define CSV:
1. Each record is located on a separate line, delimited by a line break (\n). For example:
Name,Birth Date,Career Titles,Highest Ranking\n
Agassi,04/29/1970,60,No. 1\n
2. Within each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. Empty fields are allowed. The last field in the record must not be followed by a comma. For example:
Name,Birth Date,Career Titles,Highest Ranking\n
Becker,06/16/1981,,No. 112\n
3. Each field may or may not be enclosed in double quotes. If a field is enclosed in double quotes, there is no space between the quotes and the adjacent commas or line breaks. For example:
Name,Birth Date,Career Titles,Highest Ranking\n
Agassi,"04/29/1970",60,"No. 1"\n
4. Fields containing line breaks (\n), double quotes, or commas must be enclosed in double quotes. For example:
Name,Birth Date,Career Titles,Highest Ranking\n
"Agassi, Andre",04/29/1970,60,"No. 1"\n
5. A double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"Peter ""Pete"" Sampras",08/12/1971,64,No. 1\n
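The rules above amount to a small state machine: a double quote at the start of a field switches to quoted mode, where commas and line breaks are ordinary data and "" stands for one literal quote, while an unquoted field simply runs to the next comma or line break. The C sketch below illustrates that logic outside of flex, assuming legal input as defined above. The function csv_split_record, the helper put, and the MAX_FIELDS/FIELD_MAX limits are hypothetical names chosen for this illustration, and over-long fields are silently truncated.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FIELDS 16   /* hypothetical limits chosen for this sketch */
#define FIELD_MAX  256

/* Appends c to out unless that would overflow a FIELD_MAX buffer. */
static void put(char *out, size_t *len, char c)
{
    if (*len + 1 < FIELD_MAX)
        out[(*len)++] = c;
}

/* Splits one CSV record into unquoted field values (rules 1-5).
   Parsing stops at the first line break outside a quoted field or at
   the end of the string. Returns the number of fields stored.
   Assumes legal input; it does not diagnose malformed records. */
size_t csv_split_record(const char *rec, char fields[][FIELD_MAX],
                        size_t max_fields)
{
    const char *p = rec;
    size_t n = 0;

    assert(rec != NULL && max_fields > 0);
    while (n < max_fields) {
        char *out = fields[n];
        size_t len = 0;

        if (*p == '"') {                     /* quoted field */
            p++;                             /* drop the opening quote */
            while (*p != '\0') {
                if (*p == '"' && p[1] == '"') {
                    put(out, &len, '"');     /* "" -> one literal quote */
                    p += 2;
                } else if (*p == '"') {
                    p++;                     /* drop the closing quote */
                    break;
                } else {
                    put(out, &len, *p++);    /* ',' and '\n' kept as data */
                }
            }
        } else {                             /* unquoted: up to ',' or '\n' */
            while (*p != '\0' && *p != ',' && *p != '\n')
                put(out, &len, *p++);
        }
        out[len] = '\0';
        n++;

        if (*p != ',')                       /* '\n' or end: record is done */
            break;
        p++;                                 /* skip the separating comma */
    }
    return n;
}
```

For the Sampras record above, the first field comes back as Peter "Pete" Sampras, and for the Becker record the third field is the empty string.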
For this assignment, you will use flex to compose a format converter. For any legal CSV file, your converter should translate it to an HTML file containing a single table. The specific requirements are:
1. Each field in the CSV file corresponds to a single cell of the HTML table, enclosed by <td> and </td>.
2. Each record in the CSV file corresponds to a single row of the HTML table, enclosed by <tr> and </tr>.
3. Double quotes that are used to enclose a field should be eliminated and not appear in the generated HTML table.
4. Any double quote inside a field should be preserved, but the extra double quote that precedes it as an escape should be eliminated.
5. Any line break (\n) inside a field should be converted to a <br> in the HTML file.
6. If the < or > characters occur as part of the data, they should be translated into the strings &lt; and &gt; respectively. This prevents data that looks like an HTML tag from acting like one.
7. All other content inside a field should be directly copied into the HTML table.
8. If a field is empty, output the entity &nbsp; (a non-breaking space) in the HTML table cell, because empty cells do not display neatly in HTML tables.
9. You will need to generate the additional tags that complete the HTML table (<table> and </table>). The generated table must also have visible borders around each cell; for example, <table border=3> sets a 3-pixel border.
10. Other HTML tags (<html></html>, <body></body>) are optional.
11. You can assume all input files are legal CSV files as defined above, so there is no need to report illegal input.
12. The converter should read its input from stdin and write its output to stdout, which is already the default in flex (yyin and yyout default to stdin and stdout). For example, you should be able to type %mycsv2html <inputfile >outputfile to dump the generated table into outputfile, and then check the generated HTML file by opening it in a web browser.
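Requirements 5 through 8 are per-character substitutions applied to a field once its enclosing and escape quotes are gone. A minimal C sketch of that step follows, assuming the field has already been unquoted. The names append, html_cell, and html_row are hypothetical, and output goes into a fixed-size buffer (truncated if it runs out) only so the result is easy to inspect. In a flex scanner the same substitutions would more likely be separate pattern actions that print directly to yyout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends s to the NUL-terminated buffer buf of capacity cap,
   truncating if there is not enough room left. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf);
    if (used + 1 < cap)
        snprintf(buf + used, cap - used, "%s", s);
}

/* Appends one already-unquoted CSV field to buf as an HTML cell:
   '\n' -> <br>, '<' -> &lt;, '>' -> &gt;, an empty field -> &nbsp;,
   and every other character copied unchanged (requirements 5-8). */
void html_cell(char *buf, size_t cap, const char *field)
{
    char one[2] = { '\0', '\0' };

    assert(buf != NULL && field != NULL);
    append(buf, cap, "<td>");
    if (*field == '\0')
        append(buf, cap, "&nbsp;");
    for (const char *p = field; *p != '\0'; p++) {
        switch (*p) {
        case '\n': append(buf, cap, "<br>"); break;
        case '<':  append(buf, cap, "&lt;"); break;
        case '>':  append(buf, cap, "&gt;"); break;
        default:   one[0] = *p; append(buf, cap, one); break;
        }
    }
    append(buf, cap, "</td>");
}

/* Appends a complete <tr> row built from n unquoted field values. */
void html_row(char *buf, size_t cap, const char *const fields[], size_t n)
{
    append(buf, cap, "<tr>");
    for (size_t i = 0; i < n; i++)
        html_cell(buf, cap, fields[i]);
    append(buf, cap, "</tr>\n");
}
```

Starting from an empty string, html_cell(buf, sizeof buf, "a<b>") leaves <td>a&lt;b&gt;</td> in buf. Wrapping the rows between <table border=3> and </table> gives the complete table of requirement 9. Calling strlen on every append keeps the sketch short but makes it quadratic in the output size; printing straight to stdout avoids that.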
For information on Lex and Yacc, see the Niemann book, which is linked from the texts page; additional links are in a folder dedicated to this topic.