Length: 1500 words
ISYS5007 Data Management Plan - Curtin University
Project Overview
Describe your scenario and the business problem or research question to be addressed.
Assumptions
State all assumptions.
For example, do you assume that you are part of an organisation like the Red Cross, Transperth, or some other organisation that has internal data assets that are not generally available as open data or on a web site?
Volume, Velocity, Variety, Veracity, Value
Make reasonable estimates of the 5Vs and describe the impact on storage and processing requirements.
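For illustration only, the sketch below shows one way 5V estimates could be turned into a storage requirement. It multiplies an assumed velocity (records per day) by an assumed record size and retention period, then applies an overhead factor and a replication factor. Every number, the source names, and the `SourceEstimate` and `total_storage_gb` helpers are hypothetical placeholders, not figures for any particular scenario. The 1.2 overhead and 3-way replication are assumptions you would need to justify for your own platform.

```python
"""Back-of-envelope storage estimate for the 5Vs section.

All figures are hypothetical placeholders; replace them with estimates
for your own scenario.
"""
from dataclasses import dataclass


@dataclass
class SourceEstimate:
    name: str
    records_per_day: float   # velocity (hypothetical)
    bytes_per_record: float  # average raw record size (hypothetical)
    retention_days: int      # how long raw data is kept before retirement

    def daily_bytes(self) -> float:
        return self.records_per_day * self.bytes_per_record

    def retained_bytes(self) -> float:
        return self.daily_bytes() * self.retention_days


def total_storage_gb(sources, replication=3, overhead=1.2):
    """Retained raw bytes x overhead (indexes, temp files) x replicas, in GB (10**9 bytes)."""
    raw = sum(s.retained_bytes() for s in sources)
    return raw * overhead * replication / 1e9


# Hypothetical sources: a high-velocity sensor feed and a low-velocity survey extract.
sources = [
    SourceEstimate("sensor feed", records_per_day=1_000_000, bytes_per_record=200, retention_days=365),
    SourceEstimate("survey CSV", records_per_day=500, bytes_per_record=2_000, retention_days=365 * 5),
]

for s in sources:
    print(f"{s.name}: {s.daily_bytes() / 1e6:.1f} MB/day")
print(f"Total: {total_storage_gb(sources):.1f} GB")
```

With these placeholder inputs the sensor feed dominates (200.0 MB/day against 1.0 MB/day), and the retained total is about 269.4 GB. Varying retention or replication shows quickly which of the Vs drives storage cost.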
Data Lifecycle
Discuss issues related to the data lifecycle, such as:
• data acquisition, including data sources and the means by which data will be acquired (e.g. SQL, Open Data repository, Web page, sensor data, internally available data);
• business owners of the data (e.g. Transperth, Red Cross, or internal departments like Sales, Marketing, or HR);
• data formats (e.g. XML, CSV, scraped HTML);
• cleaning and transformation;
• integration;
• reproducibility;
• plans for data publishing and re-use; and
• plans for data retirement.
Refer to illustrative examples in the appendix as necessary. These should be specific to the assigned scenario and to your data management plan.
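As an illustrative sketch only, the cleaning, integration and reproducibility steps listed above might look like this for two small hypothetical extracts, one from an internal system and one from an open data portal. The column names, region codes and rules are assumptions chosen for illustration: dates are accepted in ISO or DD/MM/YYYY form, incomplete rows and exact duplicates are dropped, and the extracts are left-joined on `region_id`. A SHA-256 checksum of the raw input is recorded so that a later run can confirm it used the same data.

```python
"""Cleaning and integrating two hypothetical CSV extracts with the standard library."""
import csv
import hashlib
import io
from datetime import datetime

# Hypothetical raw extract from an internal system (note the messy values).
INTERNAL_CSV = """region_id,date,visits
 R1 ,2020-03-01,120
R2,02/03/2020,80
R1,2020-03-01,120
R3,2020-03-02,
R4,2020-03-03,40
"""
# Hypothetical lookup table from an open data portal.
OPEN_DATA_CSV = """region_id,region_name
R1,Perth
R2,Fremantle
"""


def parse_date(text):
    """Accept ISO (YYYY-MM-DD) or DD/MM/YYYY dates; return an ISO string or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows):
    """Trim whitespace, normalise dates, drop incomplete rows and exact duplicates."""
    seen, out = set(), []
    for row in rows:
        region = row["region_id"].strip()
        day = parse_date(row["date"])
        visits = row["visits"].strip()
        if not (region and day and visits):
            continue  # incomplete record; a real plan would log it for review
        key = (region, day, int(visits))
        if key not in seen:
            seen.add(key)
            out.append({"region_id": region, "date": day, "visits": int(visits)})
    return out


def integrate(facts, lookup):
    """Left join facts to the region lookup on region_id; unmatched regions get None."""
    names = {r["region_id"].strip(): r["region_name"].strip() for r in lookup}
    return [{**f, "region_name": names.get(f["region_id"])} for f in facts]


def checksum(text):
    """SHA-256 of a raw extract, stored alongside results for reproducibility."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


internal = clean(csv.DictReader(io.StringIO(INTERNAL_CSV)))
merged = integrate(internal, csv.DictReader(io.StringIO(OPEN_DATA_CSV)))
for row in merged:
    print(row)
print("input checksum:", checksum(INTERNAL_CSV)[:12])
```

Here the duplicate R1 row and the R3 row with a missing `visits` value are dropped, the DD/MM/YYYY date for R2 is normalised to `2020-03-02`, and R4 keeps `region_name = None` because the lookup has no match. In a real plan, unmatched keys like this would typically be reported for follow-up.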
Data Analysis
Explain how data will be analysed to address the business question in the scenario.
This includes strategies that use data for
• preliminary exploration of the scenario;
• illustrative visualisations that you will develop to interpret data; and
• high level descriptions of prediction and modelling strategies.
Refer to high-level descriptions of algorithms in the appendix as necessary.
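A minimal sketch of preliminary exploration followed by a baseline prediction, assuming a single numeric predictor. The data values are invented for illustration, and ordinary least squares is used here only as a simple stand-in for whatever modelling strategy suits the chosen scenario.

```python
"""Preliminary exploration and a baseline model on hypothetical data."""
import statistics

# Hypothetical paired observations, e.g. weekly advertising spend (x) vs. sign-ups (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def summarise(values):
    """Basic descriptive statistics for a first look at a variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def mae(actual, predicted):
    """Mean absolute error, a simple yardstick for comparing models."""
    return statistics.mean(abs(p - q) for p, q in zip(actual, predicted))


print(summarise(y))
a, b = fit_line(x, y)
predictions = [a + b * xi for xi in x]
print(f"y = {a:.2f} + {b:.2f}x, MAE = {mae(y, predictions):.3f}")
```

The error here is measured on the same points used to fit the line. A real evaluation would hold out a test set (or use cross-validation) before comparing this baseline against more complex models.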
Assessment: Write a data management plan for ONE of the data science project scenarios given below OR seek approval in writing from the Unit Coordinator for an alternate scenario of your choice.
Your data management plan should identify data sources and articulate a reasonable approach to data acquisition, cleaning, integration, archival and analysis.
Consideration should be given to hardware/software and processing requirements.
Consider using measured, estimated, or simulated data to demonstrate the kind of data that you anticipate this project will use.
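Where real data is not yet available, a seeded generator can produce simulated data with the expected shape. The sketch below assumes daily visit counts for three hypothetical regions, drawn from a normal distribution (mean 100, standard deviation 15), then rounded and clipped at zero. The region names, start date and distribution are assumptions, not properties of any real dataset. Fixing the seed means anyone can regenerate exactly the same rows.

```python
"""Generate a small, reproducible simulated dataset to illustrate the expected data shape."""
import csv
import io
import random
from datetime import date, timedelta


def simulate(n_days, seed=42):
    """Return hypothetical daily rows: date, region, visits (normal draw, rounded, clipped at 0)."""
    rng = random.Random(seed)  # fixed seed so the same data can be regenerated
    regions = ["North", "South", "East"]  # hypothetical region labels
    start = date(2020, 3, 1)  # hypothetical start date
    rows = []
    for d in range(n_days):
        day = start + timedelta(days=d)
        for region in regions:
            visits = max(0, round(rng.gauss(100, 15)))
            rows.append({"date": day.isoformat(), "region": region, "visits": visits})
    return rows


def to_csv(rows):
    """Serialise rows to CSV text, the format a real extract might arrive in."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["date", "region", "visits"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = simulate(7)
print(to_csv(rows)[:200])
```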
From a high-level perspective, describe in English how you anticipate this data will be analysed. You may use pseudo code to illustrate this, or Python or a similar programming language if you have the necessary skills.
Note that actually analysing data is not a requirement for this assessment.
Scenario Choices
Scenario 1: Your organisation aims to use data to assist with COVID-19 economic recovery in WA.
Scenario 2: Your organisation aims to use data for targeted advertising of products and services that promote health and well-being.
Scenario 3: Your organisation aims to use data to inform the University's admissions process by identifying prospective students likely to have an aptitude for predictive analytics.
Scenario 4: Your organisation aims to use social media data to identify individuals with mental health issues who are in need of professional assistance.
Scenario 5: Your organisation aims to use data to predict the winner of the 2020 AFL Grand Final.