CMM524 Advanced Data Management - Robert Gordon University
Learning outcome 1: Identify, handle and manipulate structured and unstructured data using modern databases.
Learning outcome 2: Efficiently handle and manipulate large datasets.
Learning outcome 3: Identify and implement appropriate data management techniques.
Learning outcome 4: Apply analysis techniques to extract knowledge from data.
1 Aim
This coursework examines the student's ability to design a relational database, manipulate and analyse large datasets, and interpret analysis results.
2 Coursework Contribution
This coursework contributes 100% to the final module grade. For the weightings of different parts, please see the separate marking grid document.
3 The Tasks
3.1 Part 1: Designing a Relational Database
Little Panda is a takeaway that wants to start accepting online orders. To do this, it needs a database to store its food menu, customer data and orders.
Your task is to design a relational database that runs on MySQL. Here are some requirements:
• Customers must register before they can make orders. They must provide enough details for home delivery.
• Menu item prices may change. Customers are charged prices at the time of order.
• Little Panda needs to know the status of each order so that it can follow up, e.g. "waiting to be cooked", "cooked and to be delivered" or "delivered". You can assume all orders are paid before they enter the system.
• Order details must be stored for accounting purposes, even after the orders are completed.
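As an illustration of the price-at-time-of-order and order-status requirements, here is a minimal sketch of one possible table layout. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that it runs without a server. Every table name, column name and sample row is a hypothetical assumption, not part of the brief, and a MySQL version would likely use DECIMAL for prices and an ENUM or lookup table for the status.

```python
import sqlite3

# Hypothetical schema sketch. SQLite (Python stdlib) stands in for MySQL here;
# all table and column names are illustrative assumptions, not part of the brief.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT NOT NULL,
    address     TEXT NOT NULL,          -- enough detail for home delivery
    postcode    TEXT NOT NULL
);
CREATE TABLE menu_item (
    item_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    current_price INTEGER NOT NULL      -- price in pence; may change over time
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    placed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status      TEXT NOT NULL DEFAULT 'waiting to be cooked'
                CHECK (status IN ('waiting to be cooked',
                                  'cooked and to be delivered',
                                  'delivered'))
);
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    item_id    INTEGER NOT NULL REFERENCES menu_item(item_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    unit_price INTEGER NOT NULL,        -- snapshot of the price when ordered
    PRIMARY KEY (order_id, item_id)
);
"""


def build_demo():
    """Create the sketch schema in memory and load made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO customer VALUES "
        "(1, 'A. Customer', '01234 567890', '1 Example Street', 'AB1 2CD')"
    )
    conn.execute("INSERT INTO menu_item VALUES (1, 'Spring rolls', 350)")
    conn.execute("INSERT INTO customer_order (order_id, customer_id) VALUES (1, 1)")
    # Copy the current menu price into the order line at the time of ordering.
    conn.execute(
        "INSERT INTO order_item (order_id, item_id, quantity, unit_price) "
        "SELECT 1, item_id, 2, current_price FROM menu_item WHERE item_id = 1"
    )
    # A later price change must not affect the stored order.
    conn.execute("UPDATE menu_item SET current_price = 400 WHERE item_id = 1")
    return conn


def order_total(conn, order_id):
    """Total charged for an order, based on the stored price snapshots."""
    row = conn.execute(
        "SELECT SUM(quantity * unit_price) FROM order_item WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    return row[0]
```

Because the order line keeps its own unit_price, the order total stays at the price charged even after the menu price is updated. Completed orders are never deleted in this layout; they simply move to the "delivered" status, which keeps them available for accounting.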
3.2 Part 2: Analysing the "UN City Population" Dataset
You are given the "UN city population" dataset. Perform the following analysis using Pig:
Question 1: Find the number of countries in the dataset.
Question 2: List the countries together with the number of cities in each country.
Question 3: List countries in ascending order of female-to-male ratio, throughout the years.
Question 4: List the top 10 most populated cities according to the most recent data in the dataset.
Question 5: List the 10 cities with the highest percentage population change per year since the start of the survey.
Notes:
• You must use Pig.
• Annotate your program code properly so that the marker can understand how it works. The annotation also contributes to the grade.
• State any assumptions that you make.
• If you cannot complete a task, an incomplete solution may still earn partial credit.
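Question 5 leaves room for interpretation, so it may help to pin down the arithmetic before writing the Pig script. The sketch below is plain Python, not Pig, and is only meant to make one possible reading concrete: the simple average percentage change per year between a city's earliest and latest records. The (city, year, population) record layout and the function names are assumptions made for illustration, not the actual schema of the dataset, and a compound annual growth rate would be an equally defensible reading if stated as an assumption.

```python
def annual_pct_change(records):
    """Average percentage population change per year for each city.

    records: iterable of (city, year, population) tuples (assumed layout).
    Cities with fewer than two distinct years, or with a zero starting
    population, are skipped because the change is undefined for them.
    """
    first, last = {}, {}
    for city, year, pop in records:
        # Track the earliest and the latest record seen for each city.
        if city not in first or year < first[city][0]:
            first[city] = (year, pop)
        if city not in last or year > last[city][0]:
            last[city] = (year, pop)

    result = {}
    for city in first:
        (y0, p0), (y1, p1) = first[city], last[city]
        if y1 > y0 and p0 > 0:
            # Total percentage change divided by the number of years elapsed.
            result[city] = (p1 - p0) / p0 * 100 / (y1 - y0)
    return result


def top_n(changes, n=10):
    """Return the n (city, change) pairs with the highest change."""
    return sorted(changes.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

In Pig, the same steps would roughly correspond to grouping by city, finding the earliest and latest records in each group, computing the expression above, then ordering and limiting the result.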
3.3 Part 3: Analysing Datasets of Your Choice
In this part of the coursework you need to:
• Find a dataset, or multiple datasets.
  ◦ Dataset(s) must be in the public domain and of a considerable size.
  ◦ A dataset must not be too small, e.g. just a few lines.
  ◦ There is no need to go for a GB- or TB-sized dataset unless the dataset is very interesting.
  ◦ DO NOT choose a dataset similar to the one in part 2.
• Propose 3 analysis tasks that you will perform on the dataset(s).
  ◦ Your proposed analyses should be insightful, e.g. give useful information for decision making.
  ◦ You may combine multiple datasets for some interesting analyses.
  ◦ DO NOT propose tasks similar to those in part 2.
• Implement the 3 proposed analyses using Pig.
• For each analysis, interpret the result.