Reference no: EM133746276
Algorithms and Data Structures
Assignment: Dictionary Based on Linked List
LO1: Improve your proficiency in C programming and your dexterity with dynamic memory allocation.
LO2: Demonstrate understanding of a concrete data structure (linked list).
LO3: Practice multi-file programming and improve your proficiency in using UNIX utilities.
Background
A dictionary is an abstract data type that stores a collection of data records and supports lookup of key, delete a key, insert a data record. For example, in a telephone directory, the (string) key is a person or company name, and the data record is the tuple (name, phone number).
Your Task
Assignment:
In this assignment, you will create a simple dictionary based on a linked list to store information about Australian suburbs from a dataset. Users will be able to perform operations on this dictionary, such as searching for suburbs using attributes in the dataset (keys) or removing specific suburbs from the dictionary.
C Implementation:
Your program will build the dictionary by reading data from a file. It will insert each suburb record as a node in a linked list.
You will also implement the following tools:
A search tool that looks for a key in the list and outputs any records that match the key.
A removal tool that removes all records associated with a set of specified keys. This tool should output the remaining records at the end to a new CSV file.
Dataset and Assumptions
The opendatasoft website provides numerous databases, enabling organisations and individuals to seamlessly access and share data. The dataset used in this assignment is a simplified version derived from the "State Suburbs - Australia" database on that website which originates from ABS data. You can browse or visualize the original database using the link above. The whole simplified dataset, as well as smaller samples, can be found in the Dataset Download slide.
The provided text gives a detailed description of the data format and assumptions for the assignment. Here's a summary of the key points:
Fields and Data Types:
COMP20003 Code , Official Code Suburb , Year : integers
Longitude , Latitude : doubles
all other fields: strings (may contain spaces and commas)
Special Cases:
Official Code State , Official Code Local Government Area : comma-separated lists of integers (treated as strings for this assignment).
comma-containing strings: enclosed in double quotes ("") within the CSV file, the quotes are removed when stored in your program according to standard CSV rules.
CSV File Assumptions:
Note that normally a suburb lies entirely inside a single state, as well as a single local government area, but this is not a universal rule. Suburbs might be in multiple local governments areas and
multiple states. This is why the fields and
are not classified as integers. Each of these fields is actually a comma-separated list of integers.
For the purposes of this assignment, it is fully sufficient to consider this as a string.
Implementation Details
Assignment 1 will involve producing two programs, classified into two stages.
Stage 1 will implement the lookup of data by a given key ( Official Name Suburb ) and
Stage 2 will implement the deletion of data by a given key ( Official Name Suburb ), with the remaining data being output after all deletions are finished.
The implementation details are designed to not prevent the Stage 1 and Stage 2 executables from being the same - though both programs must be produced from your Makefile. In Assignment 2 you will add two additional stages using a new data structure that uses the dictionary you build in both stages to more efficiently perform particular operations.
Stage 1 - Data Retrieval
In Stage 1, you will implement the basic functionality for a dictionary allowing the lookup of data by key ( Suburb Name ).
Your should produce an executable program called dict1 . This program should take three
command line arguments:
The first argument will be the stage, for this part, the value will always be
The second argument will be the name of the input CSV data file.
The third argument will be the name of the output file.
Read the data from the CSV data file specified in the second command line argument. The data from the CSV file should be stored in a linked list of pointers to struct s for the data. The datatypes for each field should be consistent with those in the Dataset and Assumptions slide. Each record (row) should be stored in a separate node.
The program should:
Accept Official Name Suburb s from stdin , and search the list for all matching records.
You must output all matching records to the output file. If no matches for the query are found, your program should output NOTFOUND . When outputting data records, you should follow the guidelines described in the slide Dataset and Assumptions.
In addition to outputting the record(s) to the output file, the number of records found should be output to stdout .
You can assume all queries will be terminated by a newline character, allowing queries for entries with empty suburb names. While you will have to retrieve the field headers for each
column, you can assume you know the field referred to as Official Name Suburb , is the second (0-based index) column in the dataset.
For modularity reasons, to assist with Assignment 2, your approach should ideally:
Store information about the result of the search itself independently from the structure of the list, e.g. your approach should make it simple to add additional search-related information to the result of the query.
Be structured in a way that the processed data can be passed to another function to construct a new data structure.
Just like Stage 1, read the data from the CSV data file specified in the second command line argument. The data from the CSV file should be stored in a linked list of pointers to struct s for the data. The datatypes for each field should be consistent with those in the Dataset and Assumptions slide. Each record (row) should be stored in a separate node.
Your program should:
Accept Official Name Suburb s from stdin , then remove all the matching records from the dictionary.
Once the user has signified the end of input, you must output all remaining records to the output file in the CSV format.
In addition to outputting the output CSV file, the total number of records removed should be output to stdout .
Just like Stage 1, you can assume all queries will be terminated by a newline character, allowing queries for entries with empty suburb names. While you will have to retrieve the field headers for each column, you can assume you know the field referred to as Official Name Suburb , is the second (0-based index) column in the dataset.
You will not be required to implement deletion for Assignment 2, but your approach to deletion should ideally allow for the modified data to be passed to a further function if required in a similar way to the original data.
For testing, it may be convenient to create a file of keys to be searched, one per line, and redirect the input from this file. Use the UNIX operator.