Create a database for a data mining project

Reference no: EM132365924

Assignment - ER Model & Relational Schema

Overview
The purpose of this task is to develop students' skills in designing and implementing a relational database for a given case study.

Learning Outcomes Assessed
The following course learning outcomes are assessed by completing this assessment:
K4. Design a relational database for a provided scenario utilising tools and techniques including ER diagrams, relation models and normalisation.
K5. Describe relational algebra and its relationship to Structured Query Language (SQL).
A1. Design and implement a relational database using a database management system.

Assessment Details

Background

You have been commissioned to create a database for a data mining project related to mobility using GPS track logs. Very large "trajectory" datasets are increasingly available due to the proliferation of positioning sensors and location-based services. However, a successful integration of mobility data still requires the development of conceptual and database frameworks that will support appropriate data representation and manipulation capabilities.

GPS track logs come in many different formats, for instance GPX or NMEA files. These formats can support simple descriptive statistics such as distance travelled, average speed, time in motion vs. time stationary, and elimination of stationary segments. However, there are very few data mining algorithms or libraries that can be used on this kind of file. Additionally, GPX files often contain custom, domain-specific extensions that must be dealt with during processing, for instance data like heart rate, cadence, power, and so on.
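The descriptive statistics mentioned above can be sketched in a few lines. This is a minimal illustration, not part of the brief: it assumes the track has already been parsed into (latitude, longitude, datetime) tuples, uses the haversine formula on a spherical Earth, and the function names and the 0.5 m/s stationary threshold are hypothetical choices.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def track_stats(points, stationary_speed=0.5):
    """Summarise a track given as (lat, lon, datetime) tuples in time order.

    stationary_speed is an assumed threshold in m/s; a leg slower than
    this is counted as stationary time.
    """
    distance = 0.0
    moving = 0.0
    stationary = 0.0
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        dt = (t2 - t1).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        leg = haversine_m(lat1, lon1, lat2, lon2)
        distance += leg
        if leg / dt < stationary_speed:
            stationary += dt
        else:
            moving += dt
    elapsed = moving + stationary
    return {
        "distance_m": distance,
        "avg_speed_mps": distance / elapsed if elapsed else 0.0,
        "moving_s": moving,
        "stationary_s": stationary,
    }
```

Each value returned here is the kind of "simple" Result the schema will need to record against the file and algorithm that produced it.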

It is important to understand the difference between the raw data from the GPS device, the track log in GPX/NMEA, and a "route", often called a semantic trajectory. A route is derived from the track, and contains meaning, or semantic tags. For instance there will now be a start and end to the route, specific places that have been visited, and so on. This is in contrast to the raw data which is merely a time-based sequence of geographical coordinates. The track log has been "processed" or "transformed" into the route.

Therefore it is important to be able to transform from one file format into another, for instance to transform a GPX track log into an ESRI Shapefile, or into GML, KML, RDF or GeoJSON format. Track log data can also be transformed into a "LINESTRING" for insertion into a spatially-enabled relational database. MySQL, for instance, provides many built-in functions like POINT, LINESTRING, POLYGON etc. The main drawback of LINESTRINGs is that they often (depending on the database) do not contain timestamp data. A further solution is to store the track data as an array of objects, with keys corresponding to different attributes such as latitude, longitude, elevation, time from start, distance from start, speed, heart rate, etc. Metadata can also be stored along the route to specify details about each section. When parsing the array of track points, the metadata can be used to split a route into a series of Segments.
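As a rough sketch of the two representations described above, the following takes a track held as an array of objects and (a) writes it out as a well-known text (WKT) LINESTRING and (b) splits it into Segments wherever a metadata value changes. The key names "lat", "lon" and "section" are assumptions made for this example. The coordinate order is written as longitude then latitude (x then y); the order a particular database expects depends on its spatial reference settings, so check before inserting.

```python
from itertools import groupby


def to_wkt_linestring(points):
    """Return a WKT LINESTRING for a list of point dicts.

    Coordinates are emitted as "lon lat" (x y). Timestamps and other
    attributes are dropped, which is the drawback noted in the text.
    """
    if len(points) < 2:
        raise ValueError("a LINESTRING needs at least two points")
    coords = ", ".join(f"{p['lon']} {p['lat']}" for p in points)
    return f"LINESTRING({coords})"


def split_segments(points, key="section"):
    """Split consecutive points into Segments wherever the metadata key changes."""
    return [list(group) for _, group in groupby(points, key=lambda p: p.get(key))]
```

For example, three points tagged "a", "a", "b" under the assumed "section" key produce two Segments, the first holding two points and the second holding one.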

This Assessment's modelling task is to develop a database schema to store track logs, and to keep a record of any calculations and transformations that have been carried out on these track logs into different formats.
Summary of operations:
• One file format can be transformed into another file format
• Algorithms (simple) on individual track logs:- distance travelled, average speed etc - works on GPX, LINESTRINGS
• Algorithms (simple) on individual track logs:- creating stop points, other significant points - works on GPX, LINESTRINGS
• Algorithms (complex) on individual track logs:- intersection with landscape features/points of interest (POIs) etc - works on Shapefiles
• Algorithms on multiple track logs (data mining):- association rules (fuzzy spatio-temporal), clustering algorithms, Fréchet distance of similarity between tracks - works on arrays of objects
• Algorithms on multiple track logs (semantic):- GeoSPARQL only works on RDF and concept hierarchies
• Algorithms on multiple track logs ('group' or 'common' behaviours among moving entities):- some examples of these patterns are flocks, moving clusters, convoy queries, closed swarms, group patterns, periodic patterns - works on LINESTRINGS, arrays of objects
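To make one of the multi-track operations listed above concrete, here is a minimal sketch of the discrete Fréchet distance between two tracks, computed with the standard dynamic-programming table. It assumes each track has been reduced to a list of (x, y) tuples in a projected, planar coordinate system so that Euclidean distance is meaningful; the function name is hypothetical.

```python
from math import hypot


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty lists of (x, y) tuples."""
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both tracks need at least one point")

    def d(i, j):
        # Euclidean distance between point i of p and point j of q.
        return hypot(p[i][0] - q[j][0], p[i][1] - q[j][1])

    # ca[i][j] is the coupling distance for the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                ca[i][j] = d(0, 0)
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d(0, j))
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d(i, 0))
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d(i, j))
    return ca[n - 1][m - 1]
```

The single number returned is a "simple value" Result, but it is derived from two track files rather than one, which is worth keeping in mind when deciding how a Result references its inputs.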

Some of the reports that will be important to run from the database design include:
• a list of all tracks (raw data) in the database
• a list of transformed formats available for a particular track
• a list of algorithms that have been applied to each of the different formats of tracks, and the results of these algorithms
No normalisation has been undertaken on these entities, so there may be many to many relationships that are not resolved. Your submission should have all many to many relationships resolved. You may add entities or attributes as you see fit.
The minimum entities you are expected to have are listed below:
• Each Track will have a unique ID, a name, a date and location, and will be comprised of multiple Points.
• Each Point will have a Latitude, Longitude, Date and Time.
• There will be many types of File Format, including the original "raw data" format of either GPX or NMEA, and transformed formats of Shapefile, LineString, GML, RDF and so on.
• Transformations are used to change from one file format to another.
• There are many Algorithms possible, some simple (e.g. descriptive statistics, preprocessing), and some complex (e.g. data mining, semantic operations), but all will have Results
• Results can be simple values (calculation of average Speed, distance travelled), a complex value (series of Points that constitute a cluster), or even a geometry (a derived line segment or polygon that represents an area of significance). A Result will reference in some way the file and algorithm from which it is derived. It should also have a date and name.
• Complex Algorithms (data mining) include segmentation, clustering, prediction
• Complex Algorithms (behavioural) include flocking, following, avoidance etc.
• Complex Algorithms also include those based specifically on querying semantic (RDF) formatted data.
• Algorithms, Transformations, File Formats and Results constitute the parts of a specific Experiment. There will be many Experiments. An Experiment will have a name and date range (start and finish), and notes.
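One possible starting point for turning the minimum entities above into tables is sketched below using SQLite from Python. It is an illustration under stated assumptions, not a model answer: all table and column names are hypothetical, dates are stored as ISO 8601 text, the demo rows are made up, and the track_file table is one way of resolving the many-to-many relationship between tracks and file formats. A full submission would still need its own ER diagram, normalisation discussion and attribute descriptions.

```python
import sqlite3

# Hypothetical table and column names; dates are stored as ISO 8601 text.
SCHEMA = """
CREATE TABLE track (
    track_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    track_date TEXT NOT NULL,
    location   TEXT
);
CREATE TABLE point (
    point_id    INTEGER PRIMARY KEY,
    track_id    INTEGER NOT NULL REFERENCES track(track_id),
    latitude    REAL NOT NULL,
    longitude   REAL NOT NULL,
    recorded_at TEXT NOT NULL
);
CREATE TABLE file_format (
    format_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE transformation (
    transformation_id INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    from_format_id    INTEGER NOT NULL REFERENCES file_format(format_id),
    to_format_id      INTEGER NOT NULL REFERENCES file_format(format_id)
);
-- Associative table: one row per track stored in one file format.
CREATE TABLE track_file (
    file_id           INTEGER PRIMARY KEY,
    track_id          INTEGER NOT NULL REFERENCES track(track_id),
    format_id         INTEGER NOT NULL REFERENCES file_format(format_id),
    transformation_id INTEGER REFERENCES transformation(transformation_id)
);
CREATE TABLE algorithm (
    algorithm_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    complexity   TEXT NOT NULL CHECK (complexity IN ('simple', 'complex'))
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    start_date    TEXT NOT NULL,
    finish_date   TEXT,
    notes         TEXT
);
CREATE TABLE result (
    result_id     INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    result_date   TEXT NOT NULL,
    value         TEXT,
    file_id       INTEGER NOT NULL REFERENCES track_file(file_id),
    algorithm_id  INTEGER NOT NULL REFERENCES algorithm(algorithm_id),
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id)
);
"""


def build_demo():
    """Create an in-memory database with a few made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO track VALUES (1, 'Demo track', '2019-09-01', 'Demo location')")
    conn.executemany("INSERT INTO file_format VALUES (?, ?)", [(1, "GPX"), (2, "LineString")])
    conn.execute("INSERT INTO transformation VALUES (1, 'GPX to LineString', 1, 2)")
    conn.executemany("INSERT INTO track_file VALUES (?, ?, ?, ?)", [(1, 1, 1, None), (2, 1, 2, 1)])
    conn.execute("INSERT INTO algorithm VALUES (1, 'average speed', 'simple')")
    conn.execute("INSERT INTO experiment VALUES (1, 'Demo experiment', '2019-09-01', NULL, NULL)")
    conn.execute("INSERT INTO result VALUES (1, 'Average speed', '2019-09-02', '4.2', 2, 1, 1)")
    conn.commit()
    return conn


def formats_for_track(conn, track_id):
    """Report: the formats available for one track, sorted by name."""
    rows = conn.execute(
        "SELECT f.name FROM track_file tf "
        "JOIN file_format f ON f.format_id = tf.format_id "
        "WHERE tf.track_id = ? ORDER BY f.name",
        (track_id,),
    )
    return [name for (name,) in rows]


def results_by_format(conn):
    """Report: (format, algorithm, result value) for every stored result."""
    rows = conn.execute(
        "SELECT f.name, a.name, r.value FROM result r "
        "JOIN track_file tf ON tf.file_id = r.file_id "
        "JOIN file_format f ON f.format_id = tf.format_id "
        "JOIN algorithm a ON a.algorithm_id = r.algorithm_id "
        "ORDER BY f.name, a.name"
    )
    return rows.fetchall()
```

The two query functions correspond to two of the reports named earlier in the brief. Storing every result in a single text column is a simplification; complex values and geometries would likely need their own tables or a spatial column type.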

Requirements

This assignment should be presented in a report format, including the following items:
• An ER Diagram with all entity names, attribute names, primary and foreign keys, relationships, cardinality and participation indicated. All many to many relationships should be resolved.
• A discussion of normalisation including the normal form that each entity is in and why that is optimal. Also, a discussion of how normalisation was achieved for that entity. We want 3NF unless there is a compelling reason to keep a particular relation in 2NF.
• A list of relationships with all table names, attributes, primary and foreign keys indicated as per the conventions given in the lecture slides (i.e. entity/table names in capitals, attributes as proper nouns, primary key underlined and foreign keys in italics).
• A database schema indicating the type and purpose of all attributes.

Create a database for a data mining project : ITECH2004 - Data Modelling - Federation University - Create a database for a data mining project related to mobility using GPS track logs. Very large

Normalisation
• All entities and relationships in appropriate normal form 0
• Discussion of normalisation for all entities and relationships 0
• Appropriate interpretation of each normal form, arguments for leaving the schema in the normal form you consider optimal. 0
Relational Schema 0
• Primary keys used 0
• Foreign keys correctly identified including parent entity 0
• Schema is a correct translation of the E-R diagram submitted with appropriate tables, columns, primary keys, and foreign keys 0
• Types and restrictions on attributes given 0
Total Mark [75 marks] 0.0
Total Worth [20%] 0.0

Assessment Criteria
Marking Scale: 1 (Poor) to 5 (Excellent)
Presentation and Referencing
• Overall presentation of the report 0
• Full APA referencing of all materials used and full disclosure of assistance from all sources including tutors and other students 0
ER Diagram
• Completeness of diagram 0
• Correct notation and convention used 0
• All assumptions clearly noted 0
• Primary and foreign keys 0
• Resolution of many to many relationships 0


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd