Reference no: EM132517150, Length: 4000 words
Assignment Written Practical Report
Learning Objective 1. Apply knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehousing and big data architecture, data mining process, data visualisation and performance management) and resulting organisational change, and understand how these apply to the implementation of business intelligence in organisation systems and business processes.
Learning Objective 2. Identify and solve complex organisational problems creatively and practically through the use of business intelligence, and critically reflect on how evidence-based decision making and sustainable business performance management can effectively address real-world problems.
Learning Objective 3. Comprehend and address complex ethical dilemmas that arise from evidence-based decision making and business performance management.
Learning Objective 4. Communicate effectively in a clear and concise manner in written report style for senior management, with correct and appropriate acknowledgment of the main ideas presented and discussed.
Part 1
The goal of Part 1 is to predict the likelihood of rainfall tomorrow (next day) based on today's weather conditions. Part 1 of Assignment 3 requires you to use the data mining tool RapidMiner to analyse and report on the weatherAUS.csv data set. Review the data dictionary for the weatherAUS.csv data set (Table 1). The data set contains over 170,000 daily observations from January 2008 through to December 2019 from 49 Australian weather stations.
In completing Part 1 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process.
Table 1 Data dictionary for Australian Weather Data set variables
Variable Name | Data Type | Description
Date | Date | Date of weather observation.
Location | Text | Common name of the location of the weather station.
MinTemp | Real | Minimum temperature in degrees Celsius.
MaxTemp | Real | Maximum temperature in degrees Celsius.
Rainfall | Real | Amount of rainfall recorded for the day in mm.
Evaporation | Real | So-called Class A pan evaporation (mm) in the 24 hours to 9am.
Sunshine | Real | Number of hours of bright sunshine in the day.
WindGustDir | Polynominal | Direction of the strongest wind gust in the 24 hours to midnight.
WindGustSpeed | Integer | Speed (km/h) of the strongest wind gust in the 24 hours to midnight.
WindDir9am | Polynominal | Direction of wind at 9am.
WindDir3pm | Polynominal | Direction of wind at 3pm.
WindSpeed9am | Integer | Wind speed (km/h) averaged over 10 minutes prior to 9am.
WindSpeed3pm | Integer | Wind speed (km/h) averaged over 10 minutes prior to 3pm.
Humidity9am | Integer | Relative humidity (percent) at 9am.
Humidity3pm | Integer | Relative humidity (percent) at 3pm.
Pressure9am | Real | Atmospheric pressure (hPa) reduced to mean sea level at 9am.
Pressure3pm | Real | Atmospheric pressure (hPa) reduced to mean sea level at 3pm.
Cloud9am | Integer | Fraction of sky obscured by cloud at 9am. This is measured in "oktas", which are a unit of eighths: the value records how many eighths of the sky are obscured by cloud. A measure of 0 indicates a completely clear sky, whilst 8 indicates that it is completely overcast.
Cloud3pm | Integer | Fraction of sky obscured by cloud (in "oktas": eighths) at 3pm. See Cloud9am for a description of the values.
Temp9am | Real | Temperature (degrees C) at 9am.
Temp3pm | Real | Temperature (degrees C) at 3pm.
RainToday | Nominal | Yes if precipitation (mm) in the 24 hours to 9am exceeds 1mm, otherwise No.
RISK_MM | Real | Amount of rain. A kind of measure of the "risk".
RainTomorrow | Nominal | Target variable. Did it rain tomorrow? Yes or No.
An additional data set, weatherAus-locations.csv, is provided. It can be joined with weatherAus.csv on the common variable/field Location to provide more location-specific data. See the first record of the weatherAUS-locations.csv data set:
stnID | Location | stnNum | latitude | longitude | postcode | state
2002 | Albury | 72160 | -36.069 | 146.9509 | 2640 | nsw
It's a simple operation in RapidMiner to join two different files on a common variable/field name. For Assignment 3, use the Join operator with the default inner join on the Location variable/field.
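To make clear what an inner join on Location does, here is a minimal Python sketch. It is not RapidMiner's implementation; the function name inner_join and the observation values are made up for illustration, and only the Albury location record comes from the data set description.

```python
def inner_join(observations, locations, key="Location"):
    """Return observation rows merged with the matching location row.

    Rows whose key has no match in `locations` are dropped, which is
    the defining behaviour of an inner join.
    """
    # Index the (smaller) locations table by the join key.
    by_key = {row[key]: row for row in locations}
    joined = []
    for obs in observations:
        match = by_key.get(obs[key])
        if match is not None:
            joined.append({**obs, **match})
    return joined


# Observation values are invented; "Nowhere" is a deliberately unmatched location.
observations = [
    {"Date": "2008-12-01", "Location": "Albury", "Rainfall": 0.6},
    {"Date": "2008-12-01", "Location": "Nowhere", "Rainfall": 1.2},
]
locations = [
    {"stnID": 2002, "Location": "Albury", "stnNum": 72160,
     "latitude": -36.069, "longitude": 146.9509,
     "postcode": 2640, "state": "nsw"},
]

joined = inner_join(observations, locations)
```

Because the join is an inner join, any observation whose Location is spelled differently in the two files silently disappears, so it is worth comparing row counts before and after the join.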
Part 1.1 Conduct an exploratory data analysis and data preparation of the weatherAUS.csv data set using RapidMiner to understand the characteristics of each variable and its relationship to the other variables in the data set. Summarise your findings in a table named Part 1.1 Results of Exploratory Data Analysis and Data Preparation. For each variable, describe key characteristics such as maximum and minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values, as well as relationships with other variables, transformations of existing variables and the creation of new variables.
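As a rough cross-check of the statistics RapidMiner reports for a numeric variable, the same summary can be sketched in plain Python. The helper name summarise and the sample MinTemp readings are hypothetical, and missing values are assumed to be represented as None.

```python
import statistics


def summarise(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "mode": statistics.mode(present),
    }


# Made-up MinTemp readings with one missing value.
min_temp = [13.4, 7.4, None, 12.9, 9.2, 7.4]
summary = summarise(min_temp)
```

Each entry of the returned dictionary corresponds to one cell of the Part 1.1 results table for that variable.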
Part 1.2 Build a Decision Tree model for predicting whether it is likely to rain tomorrow based on today's weather conditions and any other relevant variables, using RapidMiner, an appropriate set of data mining operators and a reduced weatherAUS.csv data set determined in part by your exploratory data analysis in Part 1.1. Provide these outputs from RapidMiner: (1) Final Decision Tree Model process, (2) Final Decision Tree diagram, and (3) Decision Tree rules.
Briefly explain your final Decision Tree Model process, and discuss the results of the Final Decision Tree Model, drawing on the key outputs (Decision Tree diagram, Decision Tree rules) for predicting whether it is likely to rain tomorrow based on today's weather conditions and any other relevant variables, and on relevant supporting literature on the interpretation of decision trees (250 words).
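When interpreting the tree, it can help to see how a split is scored. The sketch below computes information gain, one common split criterion; which criterion your RapidMiner process actually uses depends on the operator settings, so treat this as an illustration only. The four rows are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target="RainTomorrow"):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weight each child's entropy by the share of rows it receives.
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Invented rows: RainToday separates the classes only partially.
rows = [
    {"RainToday": "Yes", "RainTomorrow": "Yes"},
    {"RainToday": "Yes", "RainTomorrow": "No"},
    {"RainToday": "No", "RainTomorrow": "No"},
    {"RainToday": "No", "RainTomorrow": "No"},
]
gain = information_gain(rows, "RainToday")
```

An attribute with a higher gain separates rainy from dry days more cleanly, which is why such attributes tend to appear near the root of the tree.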
Part 1.3 Build a Logistic Regression model for predicting whether it is likely to rain tomorrow based on today's weather conditions and any other relevant variables, using RapidMiner, an appropriate set of data mining operators and a reduced weatherAUS.csv data set determined in part by your exploratory data analysis in Part 1.1. Provide these outputs from RapidMiner: (1) Final Logistic Regression Model process, (2) Coefficients, and (3) Odds Ratios. Hint: for the Part 1.3 Logistic Regression Model you may need to change the data types of some variables.
Briefly explain your final Logistic Regression Model Process, and discuss the results of the Final Logistic Regression Model drawing on the key outputs (Coefficients, Odds Ratios) for predicting whether it is likely to rain tomorrow and other relevant variables and relevant supporting literature on interpretation of logistic regression models (250 words).
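The link between coefficients and odds ratios is that each odds ratio is the exponential of its coefficient. The sketch below illustrates this and the resulting predicted probability. The intercept, coefficients and the dummy variable name RainTodayYes are hypothetical and are not fitted to weatherAUS.csv.

```python
import math


def odds_ratios(coefficients):
    """Odds ratio for each predictor is exp(coefficient)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def predict_probability(intercept, coefficients, observation):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(beta * observation[name]
                        for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for illustration only, NOT real model output.
intercept = -2.0
coefficients = {"Humidity3pm": 0.05, "RainTodayYes": 0.7}

ratios = odds_ratios(coefficients)
p = predict_probability(intercept, coefficients,
                        {"Humidity3pm": 40, "RainTodayYes": 0})
```

An odds ratio above 1 means the predictor raises the odds of rain tomorrow, holding the other predictors fixed; a ratio below 1 means it lowers them.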
Part 1.4 You will need to validate your Final Decision Tree Model and Final Logistic Regression Model using the Cross-Validation Operator, Apply Model Operator and Performance Operator in your data mining processes.
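For intuition about what cross-validation does, the following sketch splits row indices into k folds, each used once as the test set. It omits shuffling and stratification, which RapidMiner's Cross-Validation operator can apply, and the function name k_fold_indices is made up for this example.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    indices = list(range(n_rows))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Every row is tested exactly once and never appears in the training set of its own fold, which is what makes the averaged performance an honest estimate.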
Discuss and compare the performance of the Final Decision Tree Model with the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow, based on the results of the confusion matrix and ROC charts for each final model. You should use a table here to compare the key results of the confusion matrix for the Final Decision Tree Model and Final Logistic Regression Model using the model performance metrics (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (250 words).
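The four metrics can all be derived from the confusion matrix counts. A minimal sketch, treating "Yes" (rain tomorrow) as the positive class and using invented counts rather than real model output:

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the four comparison metrics from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=50, fp=25, fn=50, tn=375)
```

With these counts accuracy is 0.85 while sensitivity is only 0.5, which shows why accuracy alone can flatter a model when rainy days are the minority class.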
Part 2 Deep Learning, Artificial Intelligence and Ethics
Part 2.1 Define the concept of deep learning in the context of artificial intelligence (AI), and identify and discuss an application of deep learning that deploys a commonly used deep learning algorithm (advanced neural networks) and has been successfully adopted in an industry and/or government sector, drawing on the relevant and current literature (1000 words).
Part 2.2 Based on your review of deep learning in the context of AI, identify and discuss the key ethical concerns raised by using a deep learning algorithm to support decision making in a specific context, with appropriate in-text referencing support (1000 words).
Part 3 Tableau Dashboard
Dashboard Scenario: You are required to build a dashboard, Australian Weather Location (AWL), that shows four different views of the Australian Weather data set that you used in Part 1, as specified in sub Tasks 3.1, 3.2, 3.3 and 3.4. An additional data set, weatherAus-locations.csv, is provided, which will need to be joined with weatherAus.csv on the common variable/field Location in order to provide location-specific data views in the AWL dashboard. See the first record of the weatherAUS-locations.csv data set:
stnID | Location | stnNum | latitude | longitude | postcode | state
2002 | Albury | 72160 | -36.069 | 146.9509 | 2640 | nsw
It's a simple operation in Tableau to join two different files on a common variable/field name. For Assignment 3 this is the Location variable/field.
You might want to consider creating fields with binned interval ranges for variables in the weatherAUS.csv data set where it makes sense. Note that you can create bin variables in the Tableau data source view and set appropriate range names as alias names for each variable. You will also need to know how to create a geomap view of a data set in Tableau.
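The idea of binning with alias names can be sketched outside Tableau as follows. The bin edges and labels here are hypothetical choices for Rainfall, not values prescribed by the assignment; pick ranges that suit the distribution you found in Part 1.1.

```python
import bisect

# Hypothetical bin edges (mm) and alias names for each interval.
EDGES = [1.0, 10.0, 25.0]
LABELS = ["Dry (< 1 mm)", "Light (1 to < 10 mm)",
          "Moderate (10 to < 25 mm)", "Heavy (>= 25 mm)"]


def rainfall_bin(rainfall_mm):
    """Map a rainfall amount to the alias of the interval it falls in."""
    # bisect_right counts the edges <= value, which is the bin index.
    return LABELS[bisect.bisect_right(EDGES, rainfall_mm)]


bins = [rainfall_bin(mm) for mm in (0.6, 1.0, 12.5, 40.0)]
```

Binned fields like this are convenient as colour or filter dimensions in the dashboard views.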
Part 3.1 Create a Tableau View of rainfall over month for each location and a specific state. Provide a screen capture, describe the Tableau view created, and comment on the rainfall over one month across different locations and whether this differs much for different states (5 marks, 125 words).
Part 3.2 Create a Tableau View of total rainfall by year for each location and a specific state. Provide a screen capture, describe the Tableau view you have created, and comment on the variation of total rain over year across different locations for a specific state (125 words).
Part 3.3 Create a Tableau View of total evaporation over month by location for a specific state. Provide a screen capture, describe the Tableau view created, and comment on the levels of evaporation over month for different locations and states (125 words).
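The views in Parts 3.1 to 3.3 all rest on the same aggregation: summing a measure per location per month or year. A small sketch of that aggregation, useful for sanity-checking Tableau totals, is given below. It assumes dates are ISO strings (YYYY-MM-DD) and missing readings are None; the function name total_by and the sample rows are invented.

```python
from collections import defaultdict


def total_by(rows, measure, period="year"):
    """Sum `measure` per (Location, period); period is 'year' or 'month'."""
    width = 4 if period == "year" else 7  # 'YYYY' or 'YYYY-MM' prefix
    totals = defaultdict(float)
    for row in rows:
        if row[measure] is None:          # skip missing readings
            continue
        totals[(row["Location"], row["Date"][:width])] += row[measure]
    return dict(totals)


# Invented observations for illustration.
rows = [
    {"Date": "2008-12-01", "Location": "Albury", "Rainfall": 0.6},
    {"Date": "2008-12-02", "Location": "Albury", "Rainfall": 0.0},
    {"Date": "2009-01-05", "Location": "Albury", "Rainfall": 2.4},
    {"Date": "2009-01-06", "Location": "Albury", "Rainfall": None},
]
yearly = total_by(rows, "Rainfall", period="year")
monthly = total_by(rows, "Rainfall", period="month")
```

The same function applied to the Evaporation measure gives the totals behind the Part 3.3 view.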
Part 3.4 Create a Tableau GeoMap View of all Australian weather stations that provides latitude, longitude and total rainfall for a selected year. Provide a screen capture and describe the Tableau Geomap view you have created and comment on two or more selected locations for a state. (125 words).
Note: you need to copy the four Text Table / Graph worksheet views and the dashboard view created in Tableau, using the Worksheet menu Copy or Export Image option, and include them in the Part 3 section where relevant or in Appendix 3 of the Assignment 3 report.
Part 3.5 Provide a screen snapshot of the AWL Dashboard and an accompanying rationale (drawing on relevant literature for good dashboard design) for the graphic design and functionality provided by the AWL Dashboard for the four specified Tableau views for sub Tasks 3.1, 3.2, 3.3 and 3.4 (500 words). Note that Stephen Few is considered to be the guru of good dashboard design and has written a number of books on this topic.