Understanding of dynamic memory and linked data structures

Reference no: EM133541643 , Length: word count:2000

Foundations of Algorithms

Learning Outcomes

In this project, you will demonstrate your understanding of dynamic memory and linked data structures (Chapter 10) and extend your program design, testing, and debugging skills. You will learn about the problem of language generation and implement a simple algorithm for generating text based on the context provided by input prompts.

Background
The recent success of generative tools has spawned many new applications and debates in society. A generative tool is trained on a massive dataset, for example, pictures or texts. Then, given a new input, referred to as a prompt, patterns in the prompt get matched to the frequent patterns in the model learned by the tool, and the contextual extensions of the recognized pattern get generated.
Artem and Alistair want to design a tool for generating text statements from prompts. In the first version of the tool, they decide to learn a frequency prefix automaton from training statements. A sample trained automaton is shown in Figure 1. In the automaton, nodes are annotated with unique identifiers and frequencies with which they were observed during training, and arcs are annotated with statement fragments. For instance, the root of the automaton has identifier 0 and a frequency of 8 (f=8). Given a prompt, for example, "Hi", the tool should identify the context by replaying the prompt starting from the initial node of the automaton, that is, identify that the prompt leads to the node with identifier 9. Then, given the context, the tool should generate the most likely extension of the prompt, that is, the extension that follows the most frequent subsequent nodes. Thus, the tool should generate the statement "Hi#Sir" for the example prompt. Your task is to implement this tool.

Stage 0 - Reading, Analyzing, and Printing Input Data

The first version of your program should read statements from input, construct their frequency prefix automaton, and print basic information about the automaton to the output. The first four lines from the listing below correspond to the output your program should generate for the test0.txt input file in Stage 0.
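The construction step can be sketched in C as a character-level prefix tree whose states keep their outgoing arcs in a linked list. This is a minimal sketch, not the reference solution: the names node_t, arc_t, new_node, follow, and insert_statement are hypothetical, identifiers are assumed to be handed out in creation order, and a state's frequency is assumed to count the statements whose path visits it (the text above does not define the counting rule precisely).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node node_t;

/* One outgoing arc, kept in a singly linked list per state. */
typedef struct arc {
    char        ch;      /* character labelling the arc */
    node_t     *target;  /* state the arc leads to */
    struct arc *next;    /* next arc leaving the same state */
} arc_t;

struct node {
    int    id;    /* assumption: identifiers assigned in creation order */
    int    freq;  /* assumption: statements whose path visits this state */
    arc_t *arcs;  /* head of the outgoing-arc list */
};

/* Allocate a fresh state with the next free identifier. */
node_t *new_node(int *next_id) {
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->id = (*next_id)++;
    n->freq = 0;
    n->arcs = NULL;
    return n;
}

/* State reached from `from` via `ch`, or NULL if there is no such arc. */
node_t *follow(const node_t *from, char ch) {
    for (arc_t *a = from->arcs; a != NULL; a = a->next) {
        if (a->ch == ch) return a->target;
    }
    return NULL;
}

/* Add one statement, creating states on demand and bumping frequencies. */
void insert_statement(node_t *root, const char *stmt, int *next_id) {
    node_t *cur = root;
    cur->freq++;
    for (; *stmt != '\0'; stmt++) {
        node_t *nxt = follow(cur, *stmt);
        if (nxt == NULL) {
            arc_t *a = malloc(sizeof *a);
            assert(a != NULL);
            nxt = new_node(next_id);
            a->ch = *stmt;
            a->target = nxt;
            a->next = cur->arcs;   /* push onto the front of the arc list */
            cur->arcs = a;
        }
        nxt->freq++;
        cur = nxt;
    }
}
```

Under these assumptions, inserting "Hi" and "Hey" into an empty automaton yields five states, with the root and the state after "H" both at frequency 2. If the intended counting rule differs, only the two freq++ lines need to move.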

Stage 1 - Process Prompts

The output of Stage 1 of your program should start with the header (line 5 in the listing).

Extend your program from Stage 0 to process the Stage 1 prompts; input lines 10-13 in test0.txt. To process a prompt, it is first replayed on the automaton, and then the continuation of the prompt is generated. The replay of a prompt starts in the initial state and follows the arcs that correspond to the characters in the prompt. While following the arcs, the encountered characters should be printed to stdout. If the entire prompt was replayed, print an ellipsis (a series of three dots) to denote the start of text generation. To generate text, one proceeds with the walk from the state reached during the replay to a leaf state by selecting the most frequent following states. If, at some encountered state, two or more next states have the same frequency, the one that is reached via the ASCIIbetically greater label on the arc (the label whose first non-matching character is greater in ASCII) should be chosen. Again, the characters encountered along the arcs should be printed to stdout.

For instance, the replay of the input prompt on line 10 in test0.txt leads to state 4 in the automaton in Figure 2e; the states and arcs visited along the replay are highlighted in green. The generation phase then continues the walk from state 4 to state 28; this part of the walk is highlighted in blue in the figure. States 5 and 26 both follow state 4 with the same frequency (f=1). State 26 is chosen to proceed with the walk because it is reached via character "y" with the ASCII code of 121, whereas state 5 is reached via label "P" with the smaller ASCII code of 80. The output that results from processing the prompt on line 10 of the input is shown on line 6 of the output listing.
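The selection rule in this example can be sketched as a small C function. The type candidate_t and the function pick_next are hypothetical names introduced for illustration; the sketch assumes the outgoing arcs of a state have been gathered into an array of (label, frequency of the target state) pairs. Labels are compared with strcmp, which orders strings by their first differing byte and so matches the "first non-matching character greater in ASCII" wording for plain ASCII labels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one outgoing arc of the current state. */
typedef struct {
    const char *label;  /* text on the arc */
    int         freq;   /* frequency of the state the arc leads to */
} candidate_t;

/* Index of the arc to follow, or -1 when the state is a leaf (n == 0). */
int pick_next(const candidate_t *c, size_t n) {
    assert(c != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0
            || c[i].freq > c[best].freq
            || (c[i].freq == c[best].freq
                && strcmp(c[i].label, c[best].label) > 0)) {
            best = (int)i;   /* higher frequency, or tie with greater label */
        }
    }
    return best;
}
```

For the two candidates from the example, {"P", 1} and {"y", 1}, the function returns the index of "y", since the frequencies tie and 'y' (121) is greater than 'P' (80).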

If the automaton does not support a replay of the entire prompt, the replay should stop as soon as the first non-supported character is encountered. The replayed characters must be followed by an ellipsis in the output, and no generation is performed; see the output on line 7 in the listing for the input prompt on line 11 in test0.txt. Every output triggered by an input prompt, including the replay, the ellipsis, and the generated characters, should be truncated to 37 characters; see the example in the output of the test1.txt input file.
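Putting the Stage 1 rules together, replay, ellipsis, generation, and the 37-character cap can be sketched as one routine that writes the output line into a buffer. This is a sketch under assumptions: arcs carry single characters as in the uncompressed automaton, and the names process_prompt, emit, best_arc, and MAX_OUT are hypothetical. Writing to a buffer instead of stdout is a design choice made here so the cap is enforced in one place.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OUT 37  /* output cap per prompt, from the Stage 1 rules */

typedef struct node node_t;
typedef struct arc {
    char        ch;      /* character labelling the arc */
    node_t     *target;  /* state the arc leads to */
    struct arc *next;    /* next arc leaving the same state */
} arc_t;
struct node {
    int    freq;
    arc_t *arcs;
};

/* Append `ch` to `out` unless the cap has already been reached. */
static void emit(char *out, size_t *len, char ch) {
    if (*len < MAX_OUT) out[(*len)++] = ch;
}

/* Most frequent outgoing arc; ties go to the greater character. */
static const arc_t *best_arc(const node_t *s) {
    const arc_t *best = NULL;
    for (const arc_t *a = s->arcs; a != NULL; a = a->next) {
        if (best == NULL
            || a->target->freq > best->target->freq
            || (a->target->freq == best->target->freq
                && (unsigned char)a->ch > (unsigned char)best->ch)) {
            best = a;
        }
    }
    return best;
}

/* Write the output line for `prompt` into `out` (at least MAX_OUT + 1 bytes). */
void process_prompt(const node_t *root, const char *prompt, char *out) {
    assert(root != NULL && prompt != NULL && out != NULL);
    const node_t *cur = root;
    size_t len = 0;
    int replayed = 1;

    /* Replay: follow matching arcs, stop at the first unsupported character. */
    for (; *prompt != '\0'; prompt++) {
        const arc_t *a = cur->arcs;
        while (a != NULL && a->ch != *prompt) a = a->next;
        if (a == NULL) { replayed = 0; break; }
        emit(out, &len, *prompt);
        cur = a->target;
    }
    for (int i = 0; i < 3; i++) emit(out, &len, '.');

    /* Generation: only after a full replay, and only while room remains. */
    if (replayed) {
        const arc_t *a;
        while (len < MAX_OUT && (a = best_arc(cur)) != NULL) {
            emit(out, &len, a->ch);
            cur = a->target;
        }
    }
    out[len] = '\0';
}
```

A caller would declare char out[MAX_OUT + 1], call process_prompt once per prompt line, and print the buffer followed by a newline.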

Stage 2 - Compress Automaton & Process Prompts

The output of Stage 2 should start with the header (line 10 in the listing).
Extend your program to compress the automaton obtained in Stage 1 and use the compressed automaton to process the input prompts of Stage 2 (lines 16-18 in test0.txt). The first line of the input of Stage 2 (line 15) specifies the number of compression steps to perform. Each compression step should be performed on the automaton resulting from all the previous compression steps. A single compression step of an automaton is defined by one of its arcs. To find the arc that defines the next compression step to perform, traverse the automaton states starting from the initial state in the depth-first order, prioritizing states reachable via smaller (in ASCII) labels.

The arc between the currently visited state x and the next visited state y in the traversal of the states defines the next compression step if: (i) x has a single outgoing arc and (ii) y has one or more outgoing arcs. The compression step is performed by first adding a new arc from x to every state reachable from y via an outgoing arc and then deleting y and all arcs that connect to y. The label of an added arc is the concatenation of the labels of the deleted arcs on the walk from the source to the target of this new arc in the original automaton.

The automaton in Figure 3a is the result of the first compression step on the automaton in Figure 2e, defined by the arc from state 0 to state 1. Figure 3b is the result of compressing the automaton in Figure 3a using the arc between states 18 and 19, highlighted in red in Figure 3a. The depth-first order of the states in the automaton in Figure 3a that starts from the initial state and prioritizes smaller labels is 0, 2, 18, 19, 20, 3, 4, 5, 6, 7, 8, 26, 27, 28, 9, 10, 14, 15, 16, 17, 11, 12, 13, 21, 22, 23, 24, 25, and the arc from state 18 to state 19 is the first arc between two consecutive states in this order that satisfies the compression conditions. The automaton in Figure 3c is obtained by compressing the arc between states 3 and 4 in the automaton in Figure 3b. The automaton in Figure 1 results from 12 requested compression steps on the original automaton constructed in Stage 1 of the program. It allows for one additional compression defined by the arc between states 21 and 24, but it was not requested.
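One way to realise a compression step in C is sketched below. It is an illustration under assumptions rather than the intended solution: arcs carry heap-allocated string labels, each state's arc list is assumed to be kept in ascending label order so that walking the list gives the required depth-first priority, and the names make_node, add_arc, find_compressible, compress_at, and compress are hypothetical. Because x has a single outgoing arc, x can simply adopt y's arc list once each label has been prefixed with the label of the arc from x to y.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct node node_t;
typedef struct arc {
    char       *label;   /* heap-allocated statement fragment */
    node_t     *target;
    struct arc *next;    /* assumption: list kept in ascending label order */
} arc_t;
struct node {
    int    id, freq;
    arc_t *arcs;
};

node_t *make_node(int id, int freq) {
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->id = id;
    n->freq = freq;
    n->arcs = NULL;
    return n;
}

/* Insert an arc into `from`, keeping the list sorted by label. */
void add_arc(node_t *from, const char *label, node_t *to) {
    arc_t *a = malloc(sizeof *a);
    assert(a != NULL);
    a->label = malloc(strlen(label) + 1);
    assert(a->label != NULL);
    strcpy(a->label, label);
    a->target = to;
    arc_t **p = &from->arcs;
    while (*p != NULL && strcmp((*p)->label, label) < 0) p = &(*p)->next;
    a->next = *p;
    *p = a;
}

/* Preorder DFS: first state with exactly one outgoing arc whose target
   has at least one outgoing arc; NULL if no such state exists. */
node_t *find_compressible(node_t *x) {
    if (x->arcs != NULL && x->arcs->next == NULL
            && x->arcs->target->arcs != NULL) {
        return x;
    }
    for (arc_t *a = x->arcs; a != NULL; a = a->next) {
        node_t *hit = find_compressible(a->target);
        if (hit != NULL) return hit;
    }
    return NULL;
}

/* Merge the single arc x -> y into y's outgoing arcs and delete y. */
void compress_at(node_t *x) {
    arc_t *xy = x->arcs;
    node_t *y = xy->target;
    size_t plen = strlen(xy->label);
    for (arc_t *a = y->arcs; a != NULL; a = a->next) {
        char *joined = malloc(plen + strlen(a->label) + 1);
        assert(joined != NULL);
        memcpy(joined, xy->label, plen);
        strcpy(joined + plen, a->label);
        free(a->label);
        a->label = joined;
    }
    x->arcs = y->arcs;   /* a shared prefix keeps the list order intact */
    free(xy->label);
    free(xy);
    free(y);
}

/* Perform up to `steps` compression steps; return how many were done. */
int compress(node_t *root, int steps) {
    int done = 0;
    while (done < steps) {
        node_t *x = find_compressible(root);
        if (x == NULL) break;
        compress_at(x);
        done++;
    }
    return done;
}
```

The search restarts from the initial state after every step, which mirrors the rule that each step is chosen on the automaton produced by all previous steps. Prompt replay on the compressed automaton then has to match multi-character labels rather than single characters.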

The prompt replay and extension generation in Stage 2 must follow the corresponding principles described in Stage 1 but should be performed on the compressed automaton. The output should report the number of states in the compressed automaton (line 11 in the listing), the sum of frequencies of all the states in the compressed automaton (line 12), and all the generated statements (lines 14-16) after the delimiter line of 37 "-" characters (line 13). Every run of your program should terminate by printing the end message (line 17 in the output listing).
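The two reported totals can be gathered with one recursive walk over the compressed automaton, and the delimiter line can be built once in a buffer. The sketch below uses hypothetical names (tally_t, tally, make_delimiter, DELIM_LEN) and deliberately leaves out the exact wording of the output lines, since the listing that defines them is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DELIM_LEN 37  /* length of the delimiter line of '-' characters */

typedef struct node node_t;
typedef struct arc {
    const char *label;   /* statement fragment on the arc */
    node_t     *target;
    struct arc *next;
} arc_t;
struct node {
    int    freq;
    arc_t *arcs;
};

typedef struct {
    int states;    /* number of states visited */
    int freq_sum;  /* sum of their frequencies */
} tally_t;

/* Add the subtree rooted at `s` to the running totals in `t`. */
void tally(const node_t *s, tally_t *t) {
    assert(s != NULL && t != NULL);
    t->states++;
    t->freq_sum += s->freq;
    for (const arc_t *a = s->arcs; a != NULL; a = a->next) {
        tally(a->target, t);
    }
}

/* Fill `buf` with DELIM_LEN dashes and a terminating null byte. */
void make_delimiter(char buf[DELIM_LEN + 1]) {
    memset(buf, '-', DELIM_LEN);
    buf[DELIM_LEN] = '\0';
}
```

Initialise a tally_t to zero, call tally on the initial state after the last compression step, and print the two fields in the format required by the listing.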

Attachment:- Algorithms.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd