ITEC632 Data and Information Visualisation - Australian Catholic University
Assessment - Data Mining Project
Artefact - RapidMiner
The primary purpose of this assessment is to give students an opportunity to develop data mining and data analysis skills for finding human-interpretable patterns that describe the data.
What are the types of employability skills that I will acquire upon completion of this assessment?
Context
Consider a set of observations on a large number of white wine varieties, covering their chemical properties and their ranking by wine tasters, contained in the white-wines.csv data set. The wine industry has been growing steadily as social drinking of wine is on the rise. The price of a wine largely depends on its appreciation by wine tasters, which can vary considerably. Another key factor in wine certification and quality assessment is physicochemical testing, which is laboratory-based and takes into account factors such as acidity, pH level, sugar content and other chemical properties.
For wine producers, it would be valuable if wine tasters' perception of wine quality after tasting could be related to the chemical properties of the wine, so that the certification, quality assessment and quality assurance processes for wines become more rigorous.
The white-wines.csv data set consists of 4898 white wine varieties (records) in total. All wines come from a single wine-producing region. The data set records 12 different properties of each wine. Quality is based on sensory data (wine tasters' perception of the quality of a wine); the remaining properties are chemical properties of the wine, such as density, acidity and alcohol content. All chemical properties are coded as continuous numeric variables. Quality is an ordinal variable with a possible ranking from 1 (worst) to 10 (best). Each white wine variety is tasted by three independent tasters, and the final rank assigned is the median of the tasters' ranks. See Table 1 White Wines Data Set Data Dictionary for full details of the white-wines.csv data set.
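The rule for the final quality score can be illustrated in a few lines of Python. This is a minimal sketch that assumes each wine's three taster rankings are available as integers on the 1 to 10 scale; the function name `final_quality_rank` is hypothetical, because white-wines.csv stores only the final rank, not the individual tasters' scores.

```python
from statistics import median


def final_quality_rank(taster_ranks: list[int]) -> int:
    """Combine three independent tasters' rankings into one quality score.

    Hypothetical helper: it only illustrates the "median of three tasters"
    rule described for the data set.
    """
    if len(taster_ranks) != 3:
        raise ValueError("expected exactly three taster rankings")
    if any(not 1 <= r <= 10 for r in taster_ranks):
        raise ValueError("rankings must lie on the 1 (worst) to 10 (best) scale")
    # For an odd number of values the median is the middle value after sorting.
    return median(taster_ranks)


print(final_quality_rank([5, 7, 6]))  # 6
print(final_quality_rank([3, 8, 8]))  # 8
```

Using the median means one outlying taster (for example, one who gives 3 when the other two give 8) does not pull the final rank away from the majority view, which is one way the data set damps taster variability.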
Instructions
Task 1) Exploratory Data Analysis
Conduct an exploratory data analysis (EDA) of the white-wines.csv data set using the RapidMiner Studio data mining tool. Summarise the findings of your EDA in a table named Table 1 Results of Exploratory Data Analysis for the White-Wines.csv Data Set, describing the key characteristics of each variable in the white-wines.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values, and, where relevant, each variable's relationships with other variables.
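The statistics requested for Table 1 can also be reproduced outside RapidMiner as a cross-check. The sketch below is illustrative only and rests on several assumptions: each column is read from the CSV as raw strings, blank, "NA", "NaN" and "?" cells count as missing, and a value is invalid if it is non-numeric or falls outside plausible bounds that you supply (for example 1 to 10 for quality, taken from the data dictionary). The function names `summarise` and `summarise_csv` and the default comma delimiter are hypothetical; check the delimiter and column names against your copy of white-wines.csv.

```python
import csv
from statistics import mean, multimode, stdev

MISSING_MARKERS = {"", "NA", "NAN", "?"}  # assumed encodings of a missing cell


def summarise(name, raw_values, valid_range=None):
    """EDA statistics for one numeric column supplied as raw CSV strings.

    valid_range: optional (low, high) bounds; values outside them are invalid.
    """
    missing = invalid = 0
    values = []
    for cell in raw_values:
        text = cell.strip()
        if text.upper() in MISSING_MARKERS:
            missing += 1
            continue
        try:
            x = float(text)
        except ValueError:
            invalid += 1  # non-numeric entry in a numeric column
            continue
        if valid_range is not None and not valid_range[0] <= x <= valid_range[1]:
            invalid += 1  # numeric but outside the plausible range
            continue
        values.append(x)
    return {
        "variable": name,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "mean": mean(values) if values else None,
        "std": stdev(values) if len(values) > 1 else None,  # sample standard deviation
        "mode": multimode(values),  # every most-frequent value (ties kept)
        "missing": missing,
        "invalid": invalid,
    }


def summarise_csv(path, valid_ranges=None, delimiter=","):
    """Summarise every column of a CSV file (delimiter is an assumption)."""
    valid_ranges = valid_ranges or {}
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name in columns:
                columns[name].append(row[name] or "")  # short rows -> missing
    return [summarise(name, cells, valid_ranges.get(name)) for name, cells in columns.items()]


print(summarise("quality", ["6", "5", "", "6", "11", "7"], valid_range=(1, 10)))
```

For continuous chemical measurements almost every value may be unique, in which case `multimode` returns many values and the mode says little; it is most informative for the ordinal quality variable.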
Discuss the key results of your exploratory data analysis presented in Table 1, and provide a rationale for selecting your top five variables for predicting a wine taster's ranking of a white wine, drawing on the results of your EDA and on relevant literature (about 250 words).
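One transparent starting point for shortlisting predictors is to rank each chemical variable by the strength of its linear (Pearson) correlation with quality. This is a minimal sketch, not a prescribed selection method: the column layout (a dict of name to values) and the function names are hypothetical, and correlation with the target ignores non-linear effects and overlap between predictors, so correlations among the shortlisted variables themselves (for example via RapidMiner's Correlation Matrix operator) are worth inspecting too.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def top_k_predictors(columns, target, k=5):
    """Rank candidate predictors by |r| with the target and keep the k strongest.

    columns: dict mapping variable name -> list of values (hypothetical layout).
    target:  quality scores aligned with each column.
    Returns (name, r) pairs; the sign of r gives the direction of association.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


toy = {"a": [2, 4, 6, 8, 10], "b": [5, 4, 3, 2, 1], "c": [1, 3, 2, 5, 4], "d": [7, 7, 7, 7, 7]}
print(top_k_predictors(toy, [1, 2, 3, 4, 5], k=3))
```

Ranking by absolute value keeps strong negative associations as candidates, since a variable that lowers quality predictably is as useful to the model as one that raises it.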
Task 2) Building a Predictive Linear Regression Model
2.1) Build a Linear Regression model for predicting the quality ranking of a white wine using a RapidMiner data mining process, an appropriate set of data mining operators, and a reduced set of variables from the white-wines.csv data set determined by your exploratory data analysis.
Provide these outputs from RapidMiner:
a) Final Linear Regression Model process (diagram)
b) Summary Table of Results of the Final Linear Regression Model for the white-wines.csv data set.
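A RapidMiner process for this task is commonly built from operators such as Retrieve, Select Attributes, Set Role (marking quality as the label), Split Data, Linear Regression, Apply Model and Performance (Regression). The Python sketch below mirrors those steps as a cross-check: hold out a test set, fit ordinary least squares on the training rows via the normal equations, apply the model to the test rows, and report root mean squared error (RMSE). The 70/30 split, the seed and all function names are illustrative assumptions. RapidMiner's Linear Regression operator has its own options (such as built-in attribute selection and a small ridge term), so its coefficients may differ slightly from plain OLS unless those options are disabled.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bp] minimising squared error (normal equations)."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


def predict(coef, rows):
    """Apply a fitted model (the Apply Model step)."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in rows]


def rmse(actual, predicted):
    """Root mean squared error (one of the Performance (Regression) measures)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def split(rows, y, train_fraction=0.7, seed=42):
    """Shuffled train/test split (the Split Data step)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * train_fraction)

    def pick(ids):
        return [rows[i] for i in ids], [y[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])
```

Usage, with synthetic rows standing in for the selected white-wine attributes:

```python
rows = [(float(a), float(b)) for a in range(5) for b in range(4)]
y = [3 + 2 * a - b for a, b in rows]
(train_x, train_y), (test_x, test_y) = split(rows, y)
coef = fit_ols(train_x, train_y)
print(coef, rmse(test_y, predict(coef, test_x)))
```

Evaluating on held-out rows rather than the training rows gives a more honest estimate of how well the model would rank unseen wines.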
2.2) Briefly describe your final Linear Regression Model process, and discuss the results of the Final Linear Regression Model for the white-wines.csv data set, drawing on the key outputs for predicting wine quality (coefficients, standardized coefficients, t-statistic values, p-values, significance levels, etc.) and on relevant supporting literature on the interpretation of a Linear Regression Model (about 250 words).
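The outputs named in 2.2 can be computed from first principles, which helps when explaining them. In this sketch the standard error of coefficient j is the square root of sigma² times the j-th diagonal entry of (XᵀX)⁻¹, where sigma² = SSE / (n − p) and p counts the intercept; the t-statistic is the coefficient divided by its standard error; and the standardized coefficient is b_j · s(x_j) / s(y), the change in quality, in standard deviations, per one-standard-deviation change in the predictor. Assumptions: the two-sided p-value uses a normal approximation to Student's t, which is reasonable with around 4898 records, but for small samples a t distribution (for example `scipy.stats.t`) should be used instead; RapidMiner's summary table may compute or label these quantities differently; the function names are hypothetical.

```python
import math
from statistics import NormalDist, stdev


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def regression_table(names, rows, y):
    """OLS coefficients with standard errors, t-statistics, p-values and std. coefficients."""
    X = [[1.0] + list(r) for r in rows]  # intercept column first
    n, p = len(X), len(X[0])  # p counts the intercept
    xtx_inv = invert([[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)])
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    coef = [sum(xtx_inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    residuals = [yi - sum(b * x for b, x in zip(coef, xi)) for xi, yi in zip(X, y)]
    sigma2 = sum(e * e for e in residuals) / (n - p)  # residual variance on n - p df
    sd_y = stdev(y)
    table = []
    for j, name in enumerate(["(Intercept)"] + list(names)):
        se = math.sqrt(sigma2 * xtx_inv[j][j])
        t_stat = coef[j] / se
        # Two-sided p-value from a normal approximation to the t distribution.
        p_value = 2 * NormalDist().cdf(-abs(t_stat))
        std_coef = None if j == 0 else coef[j] * stdev([r[j - 1] for r in rows]) / sd_y
        table.append({"variable": name, "coef": coef[j], "std_error": se,
                      "std_coef": std_coef, "t": t_stat, "p": p_value})
    return table
```

A small p-value (conventionally below 0.05) suggests a coefficient differs from zero given the other predictors in the model, while standardized coefficients let predictors measured in different units (for example g/L and % vol) be compared on one scale.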
Attachment: Applied Data Mining.rar