UFCFR5-15-3 Data Management Fundamentals Assignment

Assignment Help Other Subject
Reference no: EM133066256

UFCFR5-15-3 Data Management Fundamentals - University of the West of England

Learning Outcome 1: Learn to model, clense, normalize, shard, map, query and analyze substantial real-world big data (230MB+);

Learning Outcome 2: Understand the data cleansing, normalization and sharding processes by writing PYTHON scripts to process and convert the data to first (cleansed) CSV and then (normalized) SQL;

Learning Outcome 3: Design and implement a relational (MySQL) database and then write a PYTHON script to pipe (import) the SQL into the appropriate tables ensuring all referential integerity constraints are met.

Learning Outcome 4: Construct and implement a set of SQL queries to extract data using various filters and constraints.
Map (forward engineer) the data to a NoSQL database of your choice (MongoDB, BaseX, CouchBase, ArangoDB etc.)

Learning Outcome 5: Write a short, reflective report on the learning outcomes you have achieved. Get exposure to and learn the use of a range of data oriented technologies (databases, python & sql.)

Context: Measuring Air Quality

Levels of various air borne pollutants such as Nitrogen Monoxide (NO), Nitrogen Dioxide (NO2) and particulate matter (also called particle pollution) are all major contributors to the measure of overall air quality.

For instance, NO2 is measured using micrograms in each cubic metre of air (µg m)3. A microgram (µg) is one millionth of a gram. A concentration of 1 µg m means that one cubic metre of air contains one microgram of pollutant.

To protect our health, the UK Government sets two air quality objectives for NO2 in their Air Quality Strategy
1. The hourly objective, which is the concentration of NO2 in the air, averaged over a period of one hour.
2. The annual objective, which is the concentration of NO2 in the air, averaged over a period of a year.

The following table shows the colour encoding and the levels for Objective 1 above, the mean hourly ratio, adopted in the UK.

Index

1

2

3

4

5

6

7

8

9

10

Band

Low

Low

Low

Moderate

Moderate

Moderate

High

High

High

Very High

µg/m³

0-

67

68-

134

135-

200

201-267

268-334

335-400

401-

467

468-

534

535-

600

601

or more

The Input Data

The following ZIP file provides data ranging from 2004 to 03 February 2021 taken from 18 monitoring stations in and around Bristol.
Monitors come and go and may suffer down times, so the data isn't complete for all stations at all times.

Note the following:

There are 18 stations (monitors): 188 => 'AURN Bristol Centre',
203 => 'Brislington Depot', 206 => 'Rupert Street', 209 => 'IKEA M32',
213 => 'Old Market',
215 => 'Parson Street School', 228 => 'Temple Meads Station', 270 => 'Wells Road',
271 => 'Trailer Portway P&R',
375 => 'Newfoundland Road Police Station', 395 => "Shiner's Garage",
452 => 'AURN St Pauls',
447 => 'Bath Road',
459 => 'Cheltenham Road \ Station Road', 463 => 'Fishponds Road',
481 => 'CREATE Centre Roof',
500 => 'Temple Way', 501 => 'Colston Avenue'

Each line represents one reading from a specific detector. Detectors take one reading every hour. If you examine the file using a programming editor, Notepad++ can handle the job, you can see that the first row gives headers and there are another 1408379 (1.4 million+) rows (lines). There are 23 data items (columns) per line.

Task 1: Crop, Cleanse and Refactor the Data

Design & write appropriate PYTHON scripts to carry out the following.

a. Crop the file to delete any records before 00:00 1 Jan 2010 (1262304000).
b. Filter for and remove any dud records where there is no value for SiteID or there is a mismatch between SiteID and Location.
(This script should print the lines number and mismatch field values for each dud record.)

Task 2: Create and Implement a Normalized Database.

a. Use MySQL Workbench or any other tool to create a ER model in the third-normal form to hold the given data.

b. Use the forward engineer feature of MySQL Workbench to generate the SQL schema and implement the database (pollution-db).
(If this does not work for you, e.g. MYSQL Worbench configuration issues, you can use PHPMyAdmin within XAMPP to create the tables by hand. You can then use the export feature to extract the SQL.)

Task 3: Write Python scripts to populate the database & generate SQL.

a. Design and write a PYTHON script (populate.py) that takes the cleaned CSV file as input and creates a new database instance (POLLUTION-DB2) and populates it.

b. Create a PYTHON script (INSERT-100.PY) that generates a SQL file (INSERT-100.SQL) that holds the first 100 inserts to the readings table.

Task 4: Design, Write and Run SQL Queries.

Write and implement (test run) the following four SQL queries:

a. Return the date/time, station name and the highest recorded value of nitrogen oxide (NOx) found in the dataset for the year 2019.
b. Return the mean values of PM2.5 (particulate matter <2.5 micron diameter) & VPM2.5 (volatile particulate matter <2.5 micron diameter) by each station for the year 2019 for readings taken on or near 08:00 hours (peak traffic intensity).
c. Extend the previous query to show these values for all stations in the years 2010 to 2019.

Task 5: Model, implement and query a selected NoSQL database.

Model the data for a specific monitor (station) to a NoSQL data model (key-value, xml or graph) to implement the selected database type/product & pipe or import the data.
You can select from any of the seven databases listed below but if you want, you can select one not currently on the list (after confirmation from the tutor).

Task 6: Reflective Report

A short report in Markdown format (<800 words) reflecting on the assignment, the problems encountered and the solutions found.
In addition you should discuss and outline some of the Python tools and libraries that could be used to visualize this data. What maps / charts with which content?

You should also briefly outline the Learning Outcomes you have managed to achieve in undertaking this Assignment.

Attachment:- Data Management Fundamentals.rar

Reference no: EM133066256

Questions Cloud

What is the budgeted production for the year : The estimated January 1 inventory is 6,500 units, and the desired December 31 inventory is 6,000 units. What is the budgeted production (in units) for the year
What is the research question or research gap : Discuss the references, and critically analyse them and discuss how they reflect the proposed topic. It should reflect theas information for each paper
What is the effective yield of this security : You also know that one year from now exchange rate will be at 1.34 $/£. What is the effective yield of this security if your client holds bond until maturity
How much life insurance will you need to purchase : Using a marginal tax rate of 22% and an inflation rate of 3.5%, how much life insurance will you need to purchase using the "human life value" approach
UFCFR5-15-3 Data Management Fundamentals Assignment : UFCFR5-15-3 Data Management Fundamentals Assignment Help and Solution, University of the West of England - Assessment Writing Service
What are the total return and the current yield : What are the total return, the current yield, and the capital gains yield for the discount bond in Question 3 at $878.00? At $1,134.20
Determine the target sales price : Leon Company is considering the production and sale of a new product with fixed costs of $32,000. Determine the target sales price
What is the estimated annual royalty fee : What is the estimated annual royalty fee that will be paid to Eatzy's should Neo decide to become a franchisee
Examine and apply web-based application architecture : Examine and apply web-based application architecture - Supplementary Assessment Programming Project

Reviews

Write a Review

Other Subject Questions & Answers

  Discuss the purpose of creating constraints on database

Discuss the purpose of creating constraints on database tables? What kind of rules can we impose using constraints?

  Understanding of the role of neuronal actions

Given your understanding of the role of neuronal actions in learning, memory, and amnesia, do you think it is possible to develop drugs and behavioral procedures to selectively block specific memories? Why or why not?

  Case study-clinical psycology

Discuss the role of pharmacological interventions and if you believe it would be helpful for Simon and why.

  Identify and evaluate organisational change issues

The purpose of this assessment is for you to develop a portfolio of your thinking and work around the discipline of business process modelling. Furthermore, you can retain your portfolio for use as a reference tool in future courses.

  Write a brief history of the cultural group

Analysis of common characteristics and distinguishing traits between the cultural group and the socio-cultural group.

  Identify the specific components of an institution

Every country in the world is constructed around the same set of institutional frameworks that differ only in how governments manage them. Identify the specific components of an institution.

  Discuss integrated delivery system improves the quality

Describe three national infrastructures that require interoperability between all hospitals in the United States. Suggest two ways that the use of an integrated

  Selections determine the audience and fill in the audience s

The audience would be the individuals we would want to hear the presentation. Some examples are just doctors, all health care professionals, nurses, patients, the elderly, or administrative staff.

  What rosemarie says also does during minute period

what Rosemarie says also does during 15-minute period she is at center. Which one of following observation techniques is aide using?

  Process and composition influence project team performance

How does work design, composition, context, and process influence project team performance?

  Explain why people drink bottled water

What are some reasons that explain why people drink bottled water?

  Sacrifice economic growth to prevent gentrification

Who really suffers as a result of gentrification, and is their displacement a direct result of gentrification or are more 'complex issue to blame?

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd