Lexical analyzers often treat consecutive whitespace

Assignment Help Computer Engineering

Reference no: EM132706819

Assignment 1 - Compiler Design

1 Regular Expressions

1. Following the POSIX ERE syntax, give a regular expression that matches the date and temperature pairs reasonably similar to the following positive examples. Your expression must not match any of the negative examples.

Positive examples Negative examples

01-12-2000 0K 1-9-2019 -4K

01-09-2019 40.2?C 12-09-2017 22.?C

28-03-2019 50?F 28-13-2019 50 ?F

01-12-2530 -2.33?C 01-12-2017 -5

Notice the following:
• There should be no whitespace before the temperature unit (which can be 'K', '°C' or '°F').
• A Kelvin temperature cannot be negative.
• There should be some whitespace(s) between the date and temperature.
• The day field must have a numerical value between 01 and 31 and the month field between 01 and 12. Years always have four digits.
• A decimal number should have digits before and after the decimal point. Decimal numbers can have at most two digits to the right of the decimal point.

2. Give short answers: Your regular expression may also match some strings that do not correspond to actual dates on a calendar. What additional checks are needed to filter out those?

Lexical Analysis

Function definitions in Erlang, which is a concurrent functional programming language, look like this:

decrease_to_zero(Argument1) ->
case Argument1 > 0 of
true ->
LocalVariable = Argument1 - 1, decrease_to_zero(LocalVariable);
false ->
<<"zero?", Argument1>>
end.

Most elements here are similar to those in most programming languages. More specifically this code snippet contains integers (0, and 1), strings ("zero?"), arithmetic operators (-), comparison operators (>), other operators (like = which in Erlang denotes pattern matching), keywords of the language (case, of, and end), and other miscellaneous elements of the language ((, ), ,, ->, <<, and >>)2. In addition, this code contains Erlang variables (Argument1 and LocalVariable) and atoms (decrease_to_zero, true, and false).
In Erlang, variables begin with a capital letter and atoms begin with a lowercase letter. Both categories can subsequently contain letters, digits, and underscores.

Erlang's own lexical analyzer returns the following list of tokens for this code. For tokens that only have one element in the corresponding category, we only show the token itself; tokens that are of general categories are shown as pairs where the first element shows the token category and the second element shows the lexeme.

[{id, decrease_to_zero}, '(', {var, Argument1}, ')', '->',
case, {var, Argument1}, '>', {integer, 0}, of,
{boolean, true}, '->',
{var, LocalVariable}, '=', {var, Argument1}, '-', {integer, 1}, ',',
{id, decrease_to_zero}, '(', {var, LocalVariable}, ')', ';',
{boolean, false}, '->',
'<<', {string, "zero?"}, ',', {var, Argument1}, '>>',
end, '.']

Tasks:
1. Write regular expressions that can match Erlang's:
(a) integers
(b) atoms
(c) variables
(d) strings; it should be possible for a string to contain quote characters; use '\' as an escape character4
2. Draw a DFA (or NFA)5 for a lexical analyzer that can recognize all the token categories present in the excerpt. Mark each accepting state with the label of the corresponding token. Your analyzer must not recognize more syntactic elements of the language (e.g., other comparison operators) than those necessary for scanning the code above but should be able to recognize different programs with the same elements.
3. Describe in detail (i.e. including all transitions from the initial state until a token or an error is emitted) how the following strings will be split into tokens by the lexical analyzer you defined in part 2.
(a) >>>=>>>->
(b) >>>->>-
4. What tokens (or errors) will be returned by your lexical analyzer for the following inputs?
(a) case factorial(Input) of 120 -> "Input was 5"; Other -> Other end
(b) OneOfTheseIsATrap = <<"be careful">> < <<65>>.
(c) OneOfTheseIsATrap = <<"be careful">> <<<65>>.
5. Give short answers:
(a) Lexical analyzers often treat consecutive whitespace characters as a single token, or ignore them altogether (emitting no tokens). Give examples where such behaviours are undesirable (one where a lexical analyzer should not ignore whitespace characters and one where it should additionally return separate tokens for each).
(b) Lexical analyzers normally ignore (do not emit tokens for) "comments". Give an example where such behaviour is undesirable.

Attachment:- CSE_assignment 1.rar

Reference no: EM132706819

Questions Cloud

The dia is all-source defense intelligence agency : The DIA is an all-source defense intelligence agency that is designed to prevent strategic surprise and deliver a decision advantage to warfighters

Record the amortization expense for all depreciable capital : Record the amortization expense for all depreciable capital assets for the month of January 2020. Hall $850,000, Straight line 5 years, no residual value

Develop a job analysis for waiter in a restaurant : Develop a Job analysis (JD & JS) for one of the following jobs: Sale-person in a retail shop, Waiter in a restaurant, A job of your own choice.

Majority of intelligence community members : In the last few weeks you have reviewed a majority of Intelligence Community members.

Lexical analyzers often treat consecutive whitespace : Lexical analyzers often treat consecutive whitespace characters as a single token, or ignore them altogether - What tokens (or errors) will be returned by your

What is your proportional ownership in Joy Tours Ltd : What is your proportional ownership in Joy Tours Ltd.? What is the total cash inflow and rate of return on your investment in Joy Tours Ltd.

Which statement describes straight-line depreciation method : Which statement describes the straight-line depreciation method? It records the same amount of depreciation expense in each year of an asset's useful life.

Washington post investigation of top secret america : Compare and contrast the Washington Post's investigation of "Top Secret America" with our previous readings on the Intelligence Community.

What ways is non-state actor different from nation-state : In what ways is a non-state actor different from a nation-state? Please be specific and give examples of each to support your argument.

User Account

All Pages