Problem Description
The previous assignment assumes its input includes spaces between every token in an expression, but many programmers tend not to use so many spaces, if the expression is unambiguous. For example, both of the following expressions would be considered to be equivalent:
    1 + 2 * 3          1 + 2*3
If the expression on the right (1 + 2*3) were given as input to the first homework assignment, the lack of spaces would cause "2*3" to be read as a single token, so no multiplication would be performed, and the program might not make any sense of that token at all.
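To see the problem concretely, assume (the handout does not show Homework 1's code) that the previous assignment split its input with Python's built-in `str.split()`. Splitting on whitespace alone leaves `2*3` glued together as one token:

```python
# str.split() with no arguments splits only on runs of whitespace,
# so an operator written without surrounding spaces stays attached to its operands.
spaced = "1 + 2 * 3".split()
unspaced = "1 + 2*3".split()

print(spaced)    # ['1', '+', '2', '*', '3']
print(unspaced)  # ['1', '+', '2*3']
```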
Of course, one could easily imagine adding functionality that takes the full character string and creates a new version with all the spaces included. However, that requires both the time to create the new version and the memory to hold it. Any programmer aiming for maximal efficiency and minimal running time would like to eliminate that extra step.
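For comparison, the rejected approach might look like the following sketch. The helper name `add_spaces` is hypothetical, it uses Python's standard `re` module, and the set of operator characters is an assumption. It builds a complete new string before a single token is examined, and `split()` then builds a list on top of that: exactly the extra time and memory the assignment asks you to avoid.

```python
import re

def add_spaces(expr):
    """Return a NEW string with spaces around every operator or parenthesis.

    Illustrative only (hypothetical helper): this is the extra pass the
    assignment wants to avoid. The operator set below is an assumption.
    """
    # [-+*/%()] matches one operator or parenthesis; \1 re-inserts it padded with spaces
    return re.sub(r"([-+*/%()])", r" \1 ", expr)

respaced = add_spaces("1 + 2*3")   # a whole new string is allocated here
print(respaced.split())            # ['1', '+', '2', '*', '3']
```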
In fact, the previous homework itself had an intermediate step that created a new variable: it took a single character string, created a list of shorter character strings, and then used a subscript variable to visit each element of that list.
The goal of this assignment is to eliminate all of those intermediate steps and to interpret the numeric expression directly from the original character string, without spending any time allocating new variables.
Identifying Tokens
The first phase of this project is to take the input string and divide it up into tokens (individual numbers and operators). The goal here is to let a caller ask for each token in turn and to supply it on demand.
The simplest approach is one presented in a recitation assignment that only requires implementing a single function -- a generator that yields one token at a time as it finds it in the input string.
Here is an intentionally incomplete excerpt from the instructor's solution, as it stands at the time of writing:
(in newsplit.py)
def new_split_iter(expr):
    """divide a character string into individual tokens, which need not be separated by spaces (but can be!)
    also, the results are returned in a manner similar to an iterator instead of a new data structure
    """
    expr = expr + ";"          # append new symbol to mark end of data, for simplicity
    pos = 0                    # begin at first character position in the string
    while expr[pos] != ";":    # repeat until the end of the input is found
        # ---- to be filled in by the student, using yield as necessary
    yield ";"                  # inform client of end of data
You may determine whether an individual character is a numeric digit with expr[pos].isdigit(). There are similar functions isalpha for letters and isalnum for alphanumerics (letters or digits).
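One possible way to fill in the loop is sketched below. It is not the instructor's solution, and it assumes that every token is either a run of digits (a non-negative integer) or a single-character operator, and that the input never contains a ';' of its own. It also uses `str.isspace()`, a standard string method not mentioned above, to skip blanks.

```python
def new_split_iter(expr):
    """divide a character string into individual tokens, which need not be separated by spaces (but can be!)
    also, the results are returned in a manner similar to an iterator instead of a new data structure
    """
    expr = expr + ";"          # sentinel marking the end of the data
    pos = 0                    # index of the next unexamined character
    while expr[pos] != ";":    # repeat until the sentinel is found
        if expr[pos].isspace():
            pos += 1           # skip any number of blanks (0, 1, or more)
        elif expr[pos].isdigit():
            start = pos        # remember where this integer begins
            while expr[pos].isdigit():
                pos += 1       # the ';' sentinel guarantees this loop stops
            yield expr[start:pos]   # the whole multi-digit integer as one token
        else:
            yield expr[pos]    # assumption: any other character is a one-char operator
            pos += 1
    yield ";"                  # inform client of end of data
```

With this sketch, `print(list(new_split_iter("3+4 * 5")))` prints `['3', '+', '4', '*', '5', ';']`. Note that the generator never builds a list itself; the caller's `list()` does, and only because the test asks for one.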
Where Python iterators fall short
To qualify as an iterator in Python, it is sufficient to support two operations:
-- get started (usually through a call to iter())
-- obtain the next element (through an explicit or implicit call to next())
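The two operations correspond to Python's `__iter__` and `__next__` methods. The class below is a hypothetical example, not part of the course files; note that nothing in this protocol lets a caller look at the next element without consuming it.

```python
class CountDown:
    """A minimal hand-written iterator (hypothetical example class).

    Supporting __iter__ (used by iter()) and __next__ (used by next())
    is all Python requires of an iterator.
    """
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                 # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # signals that no elements remain
        value = self.current
        self.current -= 1
        return value

it = iter(CountDown(3))   # "get started"
print(next(it))           # 3       (explicit call to next())
print(list(it))           # [2, 1]  (list() calls next() implicitly)
```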
Unfortunately, this is insufficient for the parser in this assignment. Consider the example "1 + 2 * 3". The function that parses products will use the 'next' operation to obtain the plus sign, determine that it is not a multiplication operator, and return control to the function that parses sums. But if that function were to ask for the 'next' symbol, it would move forward to the 2 and never see the plus sign. One could have every function pass its last scanned symbol back to its caller, but that would likely make the interface very clumsy.
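The lost plus sign can be reproduced with an ordinary iterator over a hand-written token list (the list stands in for the tokenizer's output):

```python
# Tokens of "1 + 2 * 3", delivered one at a time by a plain iterator.
tokens = iter(["1", "+", "2", "*", "3"])

first = next(tokens)   # the product parser reads the operand '1'
op = next(tokens)      # it then reads '+' to check for a multiplication operator...
# ...'+' is not '*', so control returns to the sum parser.
# The sum parser cannot "un-read" the '+'; asking again skips past it:
after = next(tokens)
print(first, op, after)   # 1 + 2
```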
The better solution, in the instructor's opinion, is to extend the definition of the iterator to allow it to examine data without moving forward. Other programmers believe the same thing: there are proposals online to extend the standard 'itertools' module with a great deal more iteration functionality (the third-party more-itertools package, for example, provides a peekable iterator). In fact, the instructor's solution was based partly on seeing the proposed interface to such a more powerful iterator (but without seeing any of its implementation).
The file peekable.py is provided free for this course and for student use within the course. Students are not to modify it or submit it with their solutions. It follows the second iterator model from the recitation, but can add its functionality to any other variety of iterator.
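Since peekable.py itself is not reproduced here, the following is only a hypothetical sketch of how such a wrapper can work; it is NOT the course's file, and only the names `Peekable` and `peek` come from the import line shown later. The idea is to buffer at most one element: `peek()` reads ahead once and remembers the result, and the next call to `next()` hands out the remembered element before advancing the underlying iterator.

```python
class Peekable:
    """Wrap any iterator so the next element can be examined without consuming it.

    Hypothetical sketch, NOT the course's peekable.py.
    """
    _EMPTY = object()   # private marker meaning "no element is buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)        # works with lists, generators, any iterator
        self._buffer = Peekable._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer is not Peekable._EMPTY:
            value, self._buffer = self._buffer, Peekable._EMPTY
            return value                 # hand out the element already peeked at
        return next(self._it)            # otherwise advance the wrapped iterator

    def peek(self):
        """Return the next element without moving forward."""
        if self._buffer is Peekable._EMPTY:
            self._buffer = next(self._it)   # read ahead once and remember it
        return self._buffer

def peek(iterator):
    """Module-level convenience function (assumed to mirror the handout's import)."""
    return iterator.peek()
```

For example, with `p = Peekable(["1", "+", "2"])`, calling `peek(p)` twice returns `"1"` both times, and only the following `next(p)` moves past it. Peeking past the last element raises `StopIteration`, which is one reason the tokenizer's ';' end marker is convenient.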
Assignment Specifications
The student submission is to consist of two files:
-- A completed newsplit.py based on what was given above, defining a function new_split_iter that accepts a character string, and continually uses the yield statement to return the individual tokens as character strings.
-- A file named infix2.py that very much resembles the infix1.py from Homework 1. The primary difference is that the function parameters will no longer consist of a list and an integer subscript, but will instead receive all data through an iterator.
Here are a few selected lines from the instructor's solution, as it stands at the time of writing:
# import peek functionality for iterators
# and maybe the splitter, if you need it
from peekable import Peekable, peek
from newsplit import new_split_iter

def eval_infix_sum(iterator):
    """evaluate sum expression (zero or more additions or subtractions), using an iterator"""
    # ..... code that no longer uses array subscripting

def eval_infix_iter(iterator):
    """evaluate an expression, given an iterator to its tokens"""
    return eval_infix_sum(Peekable(iterator))

def eval_infix(expr):
    """accept a character string, split it into tokens, then evaluate"""
    return eval_infix_iter(new_split_iter(expr))
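To show how peeking resolves the problem described earlier, here is one possible shape for the rest of infix2.py. It is a sketch, not the instructor's code: it bundles compact stand-ins for `Peekable`, `peek` and `new_split_iter` so that it runs on its own, and it assumes the operators are + - * / % plus parentheses with the usual precedence, and that '/' means integer division (all assumptions; match them to Homework 1). The helper names `eval_infix_product` and `eval_infix_factor` are hypothetical.

```python
class Peekable:
    """Hypothetical stand-in for the course's peekable.py (not its real code)."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = []                  # holds at most one looked-ahead token

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer:
            return self._buffer.pop()      # return the token already peeked at
        return next(self._it)

    def peek(self):
        if not self._buffer:
            self._buffer.append(next(self._it))
        return self._buffer[0]

def peek(iterator):
    """Look at the next token without consuming it."""
    return iterator.peek()

def new_split_iter(expr):
    """Yield tokens one at a time; spaces between tokens are optional."""
    expr = expr + ";"                      # sentinel marking the end of data
    pos = 0
    while expr[pos] != ";":
        if expr[pos].isspace():
            pos += 1
        elif expr[pos].isdigit():
            start = pos
            while expr[pos].isdigit():
                pos += 1
            yield expr[start:pos]          # a whole multi-digit integer
        else:
            yield expr[pos]                # a single-character operator
            pos += 1
    yield ";"

def eval_infix_sum(iterator):
    """evaluate sum expression (zero or more additions or subtractions), using an iterator"""
    total = eval_infix_product(iterator)
    while peek(iterator) in ("+", "-"):    # look first; consume only if it is ours
        op = next(iterator)
        right = eval_infix_product(iterator)
        total = total + right if op == "+" else total - right
    return total

def eval_infix_product(iterator):
    """evaluate product expression (zero or more *, / or % operations)"""
    product = eval_infix_factor(iterator)
    while peek(iterator) in ("*", "/", "%"):
        op = next(iterator)
        right = eval_infix_factor(iterator)
        if op == "*":
            product *= right
        elif op == "/":
            product //= right              # ASSUMPTION: '/' means integer division
        else:
            product %= right
    return product                         # '+', ')' or ';' is still waiting unread

def eval_infix_factor(iterator):
    """evaluate one integer or a parenthesized sub-expression (parentheses assumed)"""
    token = next(iterator)
    if token == "(":
        value = eval_infix_sum(iterator)
        next(iterator)                     # consume the matching ')'
        return value
    return int(token)

def eval_infix_iter(iterator):
    """evaluate an expression, given an iterator to its tokens"""
    return eval_infix_sum(Peekable(iterator))

def eval_infix(expr):
    """accept a character string, split it into tokens, then evaluate"""
    return eval_infix_iter(new_split_iter(expr))
```

Each parsing function peeks at the upcoming token to decide whether to continue, and calls `next()` only once it knows the token belongs to it, so the plus sign in "1 + 2 * 3" is still available to `eval_infix_sum`. The ';' sentinel guarantees that `peek` always has a token to return.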
Specification Details
You may assume:
• Correctly formed expressions consisting of integers and operators
• All digits within an integer are adjacent ("12" is valid, but "1 2" is not)
• All operators are single characters (accepting "//" is not required)
• There will be no division by zero operation in given expressions
You may not assume:
• anything about how many spaces appear between tokens (may be 0, 1, or more)
• anything about how many space characters appear at beginning and end of input
• anything about how many tokens may appear within a given input string
• anything about expressions that could not be assumed in the previous assignment
Unit Testing
Another feature of the Python language that has no equivalent in C++ is the ability to embed test code within each file, to test the functionality of that file. This relies on a conditional test that determines whether the file is being run by itself (for testing) or is being used as part of a larger project.
Here is a portion of the instructor's infix2.py, following the code shown above:
if __name__ == "__main__":
    print(eval_infix("5 "))
    print(eval_infix("15 "))
    print(eval_infix(" 2 * 3 + 1 "))
    print(eval_infix(" 2 + 3 * 1"))
If the Python environment is told to run infix2.py directly, it will execute these print statements. On the other hand, if the Python environment is told to run some other file (one that imports infix2.py), this code is skipped.
Along the same lines, here is a test statement in the instructor's newsplit.py testing the iterator's results:
print(list(new_split_iter("3+4 * 5")))