How to break the stream into tokens

Reference no: EM131210976

Programming Assignment

Program Specification

1. Your program must read 8-bit ASCII strings from standard input -- for instance, using the cin object in C++, or stdin in C. You must consume all input from standard input.

2. You must tokenize the entire input stream, using the lexical specification below to dictate how to break the stream into tokens.

Lexical Specification:

letter      -> [a-zA-Z]
digit       -> [0-9]
newline     -> \n
for         -> for
while       -> while
if          -> if
else        -> else
identifier  -> (letter|_)(letter|digit|_)*
integer     -> digit+
float       -> (digit+\.digit*)|(\.digit+)
string      -> "[^"\n]*"
whitespace  -> (' ' | \t)+
comment     -> #.*
operator    -> '!' | '%' | '&' | '|' | '+' | '-' | '*' | ',' | '/' | '{' | '}' |
               '[' | ']' | ';' | '<' | '>' | '=' | '<=' | '>=' | '!=' | ':='
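The patterns above can be matched by hand with small scanning functions. The following is a minimal sketch, not a reference solution: the names `Kind`, `Match`, `scanNumber` and `scanWord` are hypothetical. It also assumes longest-match ("maximal munch") behavior and assumes that a word exactly equal to `for`, `while`, `if` or `else` is a keyword rather than an identifier; the specification as reproduced here does not state either rule explicitly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical token kinds for this sketch only.
enum class Kind { None, Integer, Float, Identifier, Keyword };

struct Match {
    Kind kind;        // what was recognized (None if nothing matched)
    std::size_t len;  // number of bytes matched
};

static bool isDigit(char c) { return c >= '0' && c <= '9'; }
static bool isLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// integer -> digit+      float -> (digit+\.digit*)|(\.digit+)
// Returns the longest of the two matches starting at pos.
Match scanNumber(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && isDigit(s[i])) ++i;
    const std::size_t intDigits = i - pos;
    if (i < s.size() && s[i] == '.') {
        std::size_t j = i + 1;
        while (j < s.size() && isDigit(s[j])) ++j;
        const std::size_t fracDigits = j - (i + 1);
        // A float needs digits on at least one side of the dot.
        if (intDigits > 0 || fracDigits > 0) return {Kind::Float, j - pos};
    }
    if (intDigits > 0) return {Kind::Integer, intDigits};
    return {Kind::None, 0};
}

// identifier -> (letter|_)(letter|digit|_)*
// Assumption: an exact keyword spelling is reported as Keyword.
Match scanWord(const std::string& s, std::size_t pos) {
    if (pos >= s.size() || !(isLetter(s[pos]) || s[pos] == '_'))
        return {Kind::None, 0};
    std::size_t i = pos + 1;
    while (i < s.size() && (isLetter(s[i]) || isDigit(s[i]) || s[i] == '_')) ++i;
    const std::string word = s.substr(pos, i - pos);
    const bool keyword =
        word == "for" || word == "while" || word == "if" || word == "else";
    return {keyword ? Kind::Keyword : Kind::Identifier, i - pos};
}
```

Under these assumptions, `form` scans as one identifier of length 4 rather than the keyword `for` followed by `m`, and `12.` scans as a float of length 3.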

3. You must then output tokens to standard output (e.g., via cout in C++ or stdout in C), using the token information chart below to dictate what information you print about each token. Note that the lexical pattern, when italicized, refers to the pattern from the lexical specification above.

1. If the input contains unterminated strings, then instead of generating a string token, generate a single ERR2 token. The position indicator for the token should correspond to the beginning quote starting the string. Consume all input up to (but not including) the first newline. If there is no next newline, consume all remaining input. The length associated with the token should be reported appropriately.

2. The alphabet for this assignment consists of the following:
a. ASCII 0x09 and 0x0a (tab and newline)
b. ASCII 0x20 through 0x7e (all printable ASCII characters).

3. If you are in the middle of processing a string (meaning, you have seen the opening quote but not the end quote), and you then see a character that is not part of the alphabet, treat the bad character as if it were a newline for the sake of processing the string. That is, generate an ERR2 token for an unterminated string, and have the token end right before the bad character (meaning tokenization should resume at the bad character).

4. When you see one or more consecutive characters that are not in the alphabet, group them together and generate an ERR3 token. The length associated with the token should be the number of consecutive characters that are not in our alphabet. Resume tokenizing as normal after the bad characters.
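The alphabet and error rules in notes 1 through 4 can be sketched as three small helpers. This is an illustrative sketch only; `inAlphabet`, `StringScan`, `scanString` and `scanBadRun` are hypothetical names, and the caller is assumed to invoke `scanString` only when the byte at `pos` is an opening quote.

```cpp
#include <cstddef>
#include <string>

// Alphabet per note 2: tab (0x09), newline (0x0a), and 0x20 through 0x7e.
bool inAlphabet(unsigned char c) {
    return c == 0x09 || c == 0x0a || (c >= 0x20 && c <= 0x7e);
}

struct StringScan {
    bool terminated;  // true: string token; false: ERR2 (unterminated)
    std::size_t len;  // bytes consumed, counting the quote character(s)
};

// Precondition (assumed): s[pos] == '"'.
// Stops right before a newline or an out-of-alphabet byte, or at the end
// of input, and reports an unterminated string in those cases.
StringScan scanString(const std::string& s, std::size_t pos) {
    std::size_t i = pos + 1;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '"') return {true, i + 1 - pos};
        if (c == '\n' || !inAlphabet(c)) break;
        ++i;
    }
    return {false, i - pos};
}

// Length of the run of consecutive out-of-alphabet bytes at pos (ERR3).
std::size_t scanBadRun(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && !inAlphabet(static_cast<unsigned char>(s[i]))) ++i;
    return i - pos;
}
```

Casting to `unsigned char` before the range check matters on platforms where `char` is signed, since bytes 0x80 and above would otherwise compare as negative values.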

5. When outputting a token, your output must consist of the following, in order:
a. "TID:", with no spaces (or other characters) preceding. All letters shall be output as capital ASCII letters.
b. The colon may optionally be followed by spaces.
c. The Token ID of the token you are outputting. Token IDs must start at 1 and increase by 1 for each token of the input.
d. A single comma (note the token ID shall NOT be followed by spaces)
e. The comma may optionally be followed by spaces
f. "TYPE:". All letters shall be output as capital ASCII letters.
g. The colon may optionally be followed by spaces.
h. An integer representing the Numeric Type of the token.
i. The integer must only be followed by a left parenthesis- "(", meaning no spaces before the "(".
j. The left parenthesis must be followed by the "English Type" of the token (as indicated in the table above - case sensitive!), with no spaces preceding.
k. The English Type must be followed by a right parenthesis and comma- "),", meaning no spaces before the "),"
l. The comma may optionally be followed by spaces.
m. "POS:". All letters shall be output as capital ASCII letters.
n. The colon may be optionally followed by spaces.
o. An integer representing the position of the first character in the original input that led to the token match. The position is numbered from 0, and represents the number of 8-bit ASCII characters in the input stream that precede the character in question.
p. The integer must be followed by a single comma, with NO spaces (or other characters) in between.
q. The comma may optionally be followed by spaces.
r. "LEN:". All letters shall be output as capital ASCII letters.
s. An integer representing the number of bytes matched in the current token.
t. If the chart above indicates "None" in the "Value to output" column, then print a single newline (ASCII 0x0a - '\n'). The newline may optionally be preceded by spaces. IF THERE IS NO VALUE TO OUTPUT, YOU ARE DONE PRINTING THIS TOKEN.
u. The rest of the bullets are for when you are outputting a value only.
v. Output a comma (with NO preceding spaces), optionally followed by spaces, followed by "VALUE:", optionally followed by spaces.
w. Output the value, per the "Value to output" column above. All items that have a value should be a simple copy of the input, except for strings, which should omit the quotation marks around the string.
x. Print a single newline (ASCII 0x0a - '\n'). The newline may optionally be preceded by spaces.
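The output rules in items a through x can be sketched as a single formatting function. This is a hedged illustration, not the required implementation: `formatToken` and its parameters are hypothetical, and the numeric and English type values must come from the token information chart, which is not reproduced here. The sketch emits one space wherever the rules explicitly allow optional spaces and none elsewhere (in particular, none after "LEN:").

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build one token output line.
// numericType / englishType are supplied by the caller from the chart.
// value is appended only when hasValue is true.
std::string formatToken(std::size_t id, int numericType,
                        const std::string& englishType,
                        std::size_t pos, std::size_t len,
                        bool hasValue, const std::string& value) {
    std::string out = "TID: " + std::to_string(id) +
                      ", TYPE: " + std::to_string(numericType) +
                      "(" + englishType + ")" +
                      ", POS: " + std::to_string(pos) +
                      ", LEN:" + std::to_string(len);
    if (hasValue) out += ", VALUE: " + value;
    out += '\n';
    return out;
}
```

For string tokens, the caller would pass the lexeme without its surrounding quotation marks as `value`, per item w.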
6. Your program must take an optional command-line argument that dictates which tokens get output.
a. If no command line argument is given, then you must output all tokens in the token stream.
b. If the command line argument is a 0, you must also output all tokens in the token stream.
c. If the command line argument is a 1, you must output all tokens EXCEPT comments, whitespace, errors and newlines.
d. If the command line argument is a 2, you must output ONLY tokens for comments, whitespace, errors and newlines.
e. If the command line consists of anything other than the above four options, then you should IGNORE all input from stdin, assume the input length is 0, and populate the token stream with only a single token of type ERR1, which you will then output, per below.
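The command-line handling in item 6 maps naturally onto a small enumeration. The sketch below is one possible reading, with hypothetical names (`Mode`, `parseMode`, `shouldPrint`). It assumes that "a 0", "a 1" and "a 2" mean the argument is exactly that one-character string, and that more than one argument counts as "anything else".

```cpp
#include <cstring>

// Hypothetical output modes for this sketch.
enum class Mode { All, SkipTrivia, OnlyTrivia, Invalid };

// argc/argv as received by main(). Invalid means the caller should ignore
// stdin and produce a single ERR1 token.
Mode parseMode(int argc, const char* const* argv) {
    if (argc == 1) return Mode::All;
    if (argc == 2) {
        if (std::strcmp(argv[1], "0") == 0) return Mode::All;
        if (std::strcmp(argv[1], "1") == 0) return Mode::SkipTrivia;
        if (std::strcmp(argv[1], "2") == 0) return Mode::OnlyTrivia;
    }
    return Mode::Invalid;
}

// isTrivia: the token is a comment, whitespace, error, or newline token.
bool shouldPrint(Mode mode, bool isTrivia) {
    switch (mode) {
        case Mode::SkipTrivia: return !isTrivia;
        case Mode::OnlyTrivia: return isTrivia;
        default:               return true;  // All, or the lone ERR1 token
    }
}
```

Keeping the filter separate from tokenization means the token IDs and the "tokens" total are computed over the full stream, while only the "printed" count depends on the mode.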

7. After reading in the entire input and generating tokens, output all tokens per the above specification. When you are done outputting all tokens you are supposed to output, then output the following:
a. An additional newline (creating a blank line)
b. The string "Totals:" (case sensitive, as with all strings in this assignment)
c. Optional space(s)
d. The string "len"
e. Optional space(s)
f. An equals sign
g. Optional space(s)
h. An integer indicating the length of the input stream (always 0 with ERR1, remember!)
i. A comma
j. Optional space(s)

k. The string "tokens"
l. Optional space(s)
m. An equals sign
n. Optional space(s)
o. An integer indicating the number of tokens in the token stream.
p. A comma
q. Optional space(s)
r. The string "printed"
s. Optional space(s)
t. An equals sign
u. Optional space(s)
v. An integer indicating the number of tokens you OUTPUT
w. Optional space(s)
x. A single newline.
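The totals line described in items a through x can be produced with one string-building function. This is a minimal sketch with a hypothetical name, `formatTotals`; it uses a single space wherever optional spaces are allowed, except before the final newline.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the leading '\n' creates the blank line required
// before "Totals:". inputLen should be 0 in the ERR1 case.
std::string formatTotals(std::size_t inputLen, std::size_t tokenCount,
                         std::size_t printedCount) {
    return "\nTotals: len = " + std::to_string(inputLen) +
           ", tokens = " + std::to_string(tokenCount) +
           ", printed = " + std::to_string(printedCount) + "\n";
}
```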
8. After you finish outputting, your program must exit.
2. Other Requirements
You will receive a 0 on this if any of these requirements are not met!

9. The assignment is due on February 13 at 8am Eastern time. Late assignments will lose one letter grade per 24 hours.

10. The program must be written entirely in C or C++

11. You must submit a single source code file, unless you choose to use multiple files, in which case you must submit a single ZIP file, and nothing else.

12. If submitting a ZIP file, when the file unzips, your source files must unzip into the same directory (including any header files you need).

13. If submitting a ZIP file, there must not be ANY other files contained within the ZIP file. Again, you will get a 0 if there are.

14. If your program is written in C, it must compile ON MY REFERENCE ENVIRONMENT into an executable with the following command line: cc *.c -o assignment1

15. If your program is written in C++, it must compile ON MY REFERENCE ENVIRONMENT into an executable with the following command line: c++ *.cpp -o assignment1

16. Your program should print nothing to stderr under any circumstances.

17. Your program's output will be tested in the reference environment only. Even if it works on your desktop, if it doesn't work in the reference environment, you will get a 0. With C and C++ this is a common occurrence due to memory errors, so be sure to test in the reference environment!

18. You must submit the homework through the course website, unless otherwise pre-approved by the professor.

19. You may not give or receive any help from other people on this assignment.

20. You may NOT use code from any other program, no matter who authored it.

3. Test Cases

Below are six sample test cases for you, which I will use in my testing. Typically, I use anywhere from 20-50 test cases (generally more than fewer). I will definitely use the below cases. I strongly recommend you create your own test harness and come up with a large number of test cases to help you get the best possible grade.

Attachment:- Programming Assignment.rar
