UFCFR5-15-3 Data Management Fundamentals Assignment

Assignment Help Other Subject
Reference no: EM133066256

UFCFR5-15-3 Data Management Fundamentals - University of the West of England

Learning Outcome 1: Learn to model, clense, normalize, shard, map, query and analyze substantial real-world big data (230MB+);

Learning Outcome 2: Understand the data cleansing, normalization and sharding processes by writing PYTHON scripts to process and convert the data to first (cleansed) CSV and then (normalized) SQL;

Learning Outcome 3: Design and implement a relational (MySQL) database and then write a PYTHON script to pipe (import) the SQL into the appropriate tables ensuring all referential integerity constraints are met.

Learning Outcome 4: Construct and implement a set of SQL queries to extract data using various filters and constraints.
Map (forward engineer) the data to a NoSQL database of your choice (MongoDB, BaseX, CouchBase, ArangoDB etc.)

Learning Outcome 5: Write a short, reflective report on the learning outcomes you have achieved. Get exposure to and learn the use of a range of data oriented technologies (databases, python & sql.)

Context: Measuring Air Quality

Levels of various air borne pollutants such as Nitrogen Monoxide (NO), Nitrogen Dioxide (NO2) and particulate matter (also called particle pollution) are all major contributors to the measure of overall air quality.

For instance, NO2 is measured using micrograms in each cubic metre of air (µg m)3. A microgram (µg) is one millionth of a gram. A concentration of 1 µg m means that one cubic metre of air contains one microgram of pollutant.

To protect our health, the UK Government sets two air quality objectives for NO2 in their Air Quality Strategy
1. The hourly objective, which is the concentration of NO2 in the air, averaged over a period of one hour.
2. The annual objective, which is the concentration of NO2 in the air, averaged over a period of a year.

The following table shows the colour encoding and the levels for Objective 1 above, the mean hourly ratio, adopted in the UK.

Index

1

2

3

4

5

6

7

8

9

10

Band

Low

Low

Low

Moderate

Moderate

Moderate

High

High

High

Very High

µg/m³

0-

67

68-

134

135-

200

201-267

268-334

335-400

401-

467

468-

534

535-

600

601

or more

The Input Data

The following ZIP file provides data ranging from 2004 to 03 February 2021 taken from 18 monitoring stations in and around Bristol.
Monitors come and go and may suffer down times, so the data isn't complete for all stations at all times.

Note the following:

There are 18 stations (monitors): 188 => 'AURN Bristol Centre',
203 => 'Brislington Depot', 206 => 'Rupert Street', 209 => 'IKEA M32',
213 => 'Old Market',
215 => 'Parson Street School', 228 => 'Temple Meads Station', 270 => 'Wells Road',
271 => 'Trailer Portway P&R',
375 => 'Newfoundland Road Police Station', 395 => "Shiner's Garage",
452 => 'AURN St Pauls',
447 => 'Bath Road',
459 => 'Cheltenham Road \ Station Road', 463 => 'Fishponds Road',
481 => 'CREATE Centre Roof',
500 => 'Temple Way', 501 => 'Colston Avenue'

Each line represents one reading from a specific detector. Detectors take one reading every hour. If you examine the file using a programming editor, Notepad++ can handle the job, you can see that the first row gives headers and there are another 1408379 (1.4 million+) rows (lines). There are 23 data items (columns) per line.

Task 1: Crop, Cleanse and Refactor the Data

Design & write appropriate PYTHON scripts to carry out the following.

a. Crop the file to delete any records before 00:00 1 Jan 2010 (1262304000).
b. Filter for and remove any dud records where there is no value for SiteID or there is a mismatch between SiteID and Location.
(This script should print the lines number and mismatch field values for each dud record.)

Task 2: Create and Implement a Normalized Database.

a. Use MySQL Workbench or any other tool to create a ER model in the third-normal form to hold the given data.

b. Use the forward engineer feature of MySQL Workbench to generate the SQL schema and implement the database (pollution-db).
(If this does not work for you, e.g. MYSQL Worbench configuration issues, you can use PHPMyAdmin within XAMPP to create the tables by hand. You can then use the export feature to extract the SQL.)

Task 3: Write Python scripts to populate the database & generate SQL.

a. Design and write a PYTHON script (populate.py) that takes the cleaned CSV file as input and creates a new database instance (POLLUTION-DB2) and populates it.

b. Create a PYTHON script (INSERT-100.PY) that generates a SQL file (INSERT-100.SQL) that holds the first 100 inserts to the readings table.

Task 4: Design, Write and Run SQL Queries.

Write and implement (test run) the following four SQL queries:

a. Return the date/time, station name and the highest recorded value of nitrogen oxide (NOx) found in the dataset for the year 2019.
b. Return the mean values of PM2.5 (particulate matter <2.5 micron diameter) & VPM2.5 (volatile particulate matter <2.5 micron diameter) by each station for the year 2019 for readings taken on or near 08:00 hours (peak traffic intensity).
c. Extend the previous query to show these values for all stations in the years 2010 to 2019.

Task 5: Model, implement and query a selected NoSQL database.

Model the data for a specific monitor (station) to a NoSQL data model (key-value, xml or graph) to implement the selected database type/product & pipe or import the data.
You can select from any of the seven databases listed below but if you want, you can select one not currently on the list (after confirmation from the tutor).

Task 6: Reflective Report

A short report in Markdown format (<800 words) reflecting on the assignment, the problems encountered and the solutions found.
In addition you should discuss and outline some of the Python tools and libraries that could be used to visualize this data. What maps / charts with which content?

You should also briefly outline the Learning Outcomes you have managed to achieve in undertaking this Assignment.

Attachment:- Data Management Fundamentals.rar

Reference no: EM133066256

Questions Cloud

What is the budgeted production for the year : The estimated January 1 inventory is 6,500 units, and the desired December 31 inventory is 6,000 units. What is the budgeted production (in units) for the year
What is the research question or research gap : Discuss the references, and critically analyse them and discuss how they reflect the proposed topic. It should reflect theas information for each paper
What is the effective yield of this security : You also know that one year from now exchange rate will be at 1.34 $/£. What is the effective yield of this security if your client holds bond until maturity
How much life insurance will you need to purchase : Using a marginal tax rate of 22% and an inflation rate of 3.5%, how much life insurance will you need to purchase using the "human life value" approach
UFCFR5-15-3 Data Management Fundamentals Assignment : UFCFR5-15-3 Data Management Fundamentals Assignment Help and Solution, University of the West of England - Assessment Writing Service
What are the total return and the current yield : What are the total return, the current yield, and the capital gains yield for the discount bond in Question 3 at $878.00? At $1,134.20
Determine the target sales price : Leon Company is considering the production and sale of a new product with fixed costs of $32,000. Determine the target sales price
What is the estimated annual royalty fee : What is the estimated annual royalty fee that will be paid to Eatzy's should Neo decide to become a franchisee
Examine and apply web-based application architecture : Examine and apply web-based application architecture - Supplementary Assessment Programming Project

Reviews

Write a Review

Other Subject Questions & Answers

  Cross-cultural opportunities and conflicts in canada

Short Paper on Cross-cultural Opportunities and Conflicts in Canada.

  Sociology theory questions

Sociology are very fundamental in nature. Role strain and role constraint speak about the duties and responsibilities of the roles of people in society or in a group. A short theory about Darwin and Moths is also answered.

  A book review on unfaithful angels

This review will help the reader understand the social work profession through different concepts giving the glimpse of why the social work profession might have drifted away from its original purpose of serving the poor.

  Disorder paper: schizophrenia

Schizophrenia does not really have just one single cause. It is a possibility that this disorder could be inherited but not all doctors are sure.

  Individual assignment: two models handout and rubric

Individual Assignment : Two Models Handout and Rubric,    This paper will allow you to understand and evaluate two vastly different organizational models and to effectively communicate their differences.

  Developing strategic intent for toyota

The following report includes the description about the organization, its strategies, industry analysis in which it operates and its position in the industry.

  Gasoline powered passenger vehicles

In this study, we examine how gasoline price volatility and income of the consumers impacts consumer's demand for gasoline.

  An aspect of poverty in canada

Economics thesis undergrad 4th year paper to write. it should be about 22 pages in length, literature review, economic analysis and then data or cost benefit analysis.

  Ngn customer satisfaction qos indicator for 3g services

The paper aims to highlight the global trends in countries and regions where 3G has already been introduced and propose an implementation plan to the telecom operators of developing countries.

  Prepare a power point presentation

Prepare the power point presentation for the case: Santa Fe Independent School District

  Information literacy is important in this environment

Information literacy is critically important in this contemporary environment

  Associative property of multiplication

Write a definition for associative property of multiplication.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd