Aggregation functions for data analysis

Reference no: EM132294284

Assessment Task: Problem Solving - Using aggregation functions for data analysis

Learning Outcomes

Unit Learning Outcome

ULO1 - assessed through the student's ability to apply knowledge of multivariate functions, data transformations and data distributions to summarise data sets.
ULO2 - assessed through the student's ability to analyse datasets by interpreting summary statistics, model and function parameters.
ULO4 - assessed through the student's ability to develop software code to solve computational problems for real world analytics.

Graduate Learning Outcome
GLO1 - Discipline knowledge and capabilities
GLO4 - Critical thinking
GLO5 - Problem solving

Purpose
This assignment will test your knowledge and understanding of the aggregation functions and their applications for data summarization and prediction. This assignment will also test your ability in R programming, in using specific R commands as well as R packages.

Instructions
The work is individual. Solutions and answers to the assignment must be explained concisely and presented carefully. Use of books, articles and/or online resources on share price related to SIT718 Real World Analytics is allowed. Students are expected to refer to suitable literature where appropriate.

The assessment consists of FOUR tasks. Students must attempt all tasks and provide an individual written report prepared in an appropriate word processor.

Using aggregation functions for data analysis

Download SIT718_Assessment-Task_3-T1_2019-data and script.zip. It contains the data file [Energy19.txt] and the R code [AggWaFit718.R] to use with the following tasks; place both in your R working directory.

Energy Prediction of Domestic Appliances Dataset

The given dataset, "Energy19.txt", can be used to create models of the energy use of appliances in an energy-efficient house. The dataset provides the energy use of appliances (denoted as Y) for 671 samples. It is a modified version of data used in the study [1]. The dataset includes five input variables, denoted as X1, X2, X3, X4, X5, and the output variable Y, described as follows:

X1: Temperature in kitchen area, in Celsius
X2: Humidity in kitchen area, given as a percentage
X3: Temperature outside (from weather station), in Celsius
X4: Humidity outside (from weather station), given as a percentage
X5: Visibility (from weather station), in km
Y: Energy use of appliances, in Wh

Assignment Tasks

1. Understand the data
(i) Download the txt file (Energy19.txt) from Future Learn and save it to your R working directory.
(ii) Assign the data to a matrix, e.g. using:
the.data <- as.matrix(read.table("Energy19.txt"))
(iii) The variable of interest is Energy use of appliances (Y). To investigate Y, generate a subset of 300 data, e.g. using:
my.data <- the.data[sample(1:671,300),c(1:6)]
(iv) Using scatter plots and histograms, report on the general relationship between each of the variables X1, X2, X3, X4, X5 and the variable of interest Y. Include 5 scatter plots, 6 histograms, and 1 or 2 sentences for each of the variables, including the variable of interest Y.
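The plots requested in Task 1(iv) could be produced along the following lines. This is a minimal sketch, not a model answer. So that it runs without the data file, it builds a synthetic stand-in matrix called my.data (an assumption; in practice my.data comes from the sampling step above), and the distributions used to generate it are invented for illustration only.

```r
# Synthetic stand-in for my.data (assumption: replace with the 300-row sample
# drawn from Energy19.txt; these random values are NOT the real data).
set.seed(718)
my.data <- cbind(
  rnorm(300, 21, 1.5),   # X1: kitchen temperature (Celsius)
  rnorm(300, 40, 4),     # X2: kitchen humidity (%)
  rnorm(300, 7, 5),      # X3: outside temperature (Celsius)
  runif(300, 30, 100),   # X4: outside humidity (%)
  runif(300, 1, 65),     # X5: visibility (km)
  rexp(300, 1 / 100)     # Y: appliance energy use (Wh)
)
var.names <- c("X1", "X2", "X3", "X4", "X5", "Y")

# Five scatter plots: each input against Y (column 6).
par(mfrow = c(2, 3))
for (j in 1:5) {
  plot(my.data[, j], my.data[, 6],
       xlab = var.names[j], ylab = "Y",
       main = paste(var.names[j], "vs Y"))
}

# Six histograms: one per variable, including Y.
par(mfrow = c(2, 3))
for (j in 1:6) {
  hist(my.data[, j], xlab = var.names[j],
       main = paste("Histogram of", var.names[j]))
}

# Pearson correlation of each input with Y, a numeric companion to the plots.
cor.with.y <- cor(my.data[, 1:5], my.data[, 6])
```

The correlations in cor.with.y give a quick check on the direction of each relationship, which is useful when deciding later whether a variable should be flipped before aggregation.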

2. Transform the data
(i) Choose any four of the five variables (X1, X2, ..., X5). Make appropriate transformations to the chosen four variables and the variable of interest Y so that the values can be aggregated in order to predict the variable of interest. Assign your transformed data along with your transformed variable of interest to an array (it should have 300 rows and 5 columns). Save it to a txt file titled "name-transformed.txt" using
write.table(your.data,"name-transformed.txt")
where "name" is replaced with your name - you can use your surname or first name.
(ii) Briefly explain the transformations applied for the selected four variables and the variable of interest. (1-2 sentences each)
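One common way to make variables comparable before aggregation is to rescale each to the unit interval, optionally after a skew-reducing transform, and to flip any variable that moves against Y. The sketch below illustrates this under stated assumptions: the helper names minmax and negate are hypothetical, the two columns are synthetic stand-ins, and the choice of a log transform is only an example, not the required answer.

```r
# Min-max scaling to the unit interval [0, 1].
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Negation on [0, 1]: flips a variable that decreases as Y increases.
negate <- function(x) 1 - x

# Synthetic stand-in columns (assumption: NOT the real Energy19 data).
set.seed(718)
x.temp <- rnorm(300, 21, 1.5)   # e.g. a temperature column
y.raw  <- rexp(300, 1 / 100)    # e.g. a right-skewed, strictly positive output

x.t <- minmax(x.temp)           # linear rescaling only
y.t <- minmax(log(y.raw))       # log reduces right skew before scaling

# If an input were negatively related to Y, it could be flipped like this.
x.flipped <- negate(x.t)

# The real array would hold four transformed inputs plus Y (300 x 5);
# two columns are shown here to keep the sketch short.
your.data <- cbind(x.t, y.t)
```

Whatever transformations are chosen, the same parameters (for example the minimum and maximum of each column) need to be kept, because new inputs must be transformed in the same way before prediction.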

3. Build models and investigate the importance of each variable
(i) Download the AggWaFit718.R file (from Future Learn) to your working directory and load into the R workspace using,
source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for
• A weighted arithmetic mean (WAM)
• Weighted power means (WPM) with p = 0.5, and p = 2,
• An ordered weighted averaging function (OWA), and
• A Choquet integral.
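For orientation, the first three function families in this list can be written down directly. The sketch below uses illustrative definitions with made-up inputs and weights. The names wam, wpm and owa are hypothetical and are not the fitting functions defined in AggWaFit718.R; the OWA ordering convention (inputs sorted in non-decreasing order) is also an assumption, and the Choquet integral is left out for brevity.

```r
# Weighted arithmetic mean: sum of w_i * x_i,
# with non-negative weights that sum to 1.
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p != 0): (sum of w_i * x_i^p)^(1/p).
wpm <- function(x, w, p) sum(w * x^p)^(1 / p)

# Ordered weighted averaging: weights attach to positions after sorting,
# not to particular variables. Assumed convention: non-decreasing order.
owa <- function(x, w) sum(w * sort(x))

x <- c(0.2, 0.8, 0.5, 0.4)   # hypothetical transformed inputs in [0, 1]
w <- c(0.1, 0.4, 0.3, 0.2)   # hypothetical weights, summing to 1

results <- c(
  WAM    = wam(x, w),
  WPM.05 = wpm(x, w, 0.5),
  WPM.2  = wpm(x, w, 2),
  OWA    = owa(x, w)
)
print(results)
```

With the same weights, a power mean with a larger p gives a result at least as large as one with a smaller p, so comparing the fits for p = 0.5 and p = 2 gives a hint about whether the data favour lower or higher inputs.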

(iii) Include two tables in your report - one with the error measures and correlation coefficients, and one summarising the weights/parameters and any other useful information learned for your data.
(iv) Compare and interpret the data in your tables. Comment on
a. How good the model is,
b. The importance of each of the variables (the four variables that you have selected),
c. Any interaction between any of those variables (are they complementary or redundant?) and
d. Whether the better models favour higher or lower inputs.
(1-3 paragraphs for part 3(iv))
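The error measures and correlation coefficients for the first table can also be computed by hand from observed and predicted values. The sketch below shows four commonly used measures. The helper names are hypothetical, the observed and predicted vectors are synthetic, and which measures the fitting script itself reports is not stated here.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Average absolute error between observed and predicted values.
av.abs.error <- function(obs, pred) mean(abs(obs - pred))

# Synthetic stand-ins (assumption: NOT output from any fitted model).
set.seed(718)
obs  <- runif(300)                                   # transformed Y in [0, 1]
pred <- pmin(pmax(obs + rnorm(300, 0, 0.1), 0), 1)   # noisy predictions, clipped

fit.summary <- c(
  RMSE         = rmse(obs, pred),
  Av.abs.error = av.abs.error(obs, pred),
  Pearson      = cor(obs, pred),                       # linear association
  Spearman     = cor(obs, pred, method = "spearman")   # rank association
)
print(round(fit.summary, 4))
```

Lower error values and higher correlations indicate a better fit; computing the same four numbers for every model gives one comparable row per model in the table.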

4. Use your model for prediction
(i) Choose your best fitting model.
Using your best fitting model, predict the Energy use of appliances for the following input X1=18; X2=44; X3=4; X4=74.8; X5=31.4.
(ii) Give your result and comment on whether you think it is reasonable. (1-2 sentences).
(iii) Comment on the best conditions (in terms of your chosen four variables) under which a low Energy use of appliances will occur. (1-2 sentences).
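Prediction for a new input follows three steps: transform the input with the same parameters used on the training sample, apply the fitted function, and reverse the transformation applied to Y. The sketch below assumes min-max scaling and a weighted arithmetic mean. The choice of X1 to X4, the ranges, the weights and the helper names are all invented for illustration and must be replaced with the values from your own sample and best fitting model.

```r
# Hypothetical training ranges (min, max) for four chosen inputs and Y.
# These numbers are invented; use the ranges of your own 300-row sample.
ranges <- rbind(
  X1 = c(15, 26),
  X2 = c(30, 55),
  X3 = c(-5, 25),
  X4 = c(25, 100),
  Y  = c(10, 800)
)

# Min-max scaling with fixed bounds, and its inverse.
scale.to.unit <- function(v, lo, hi) (v - lo) / (hi - lo)
scale.back    <- function(u, lo, hi) lo + u * (hi - lo)

# New input from Task 4(i), restricted to the four variables assumed here.
new.input <- c(X1 = 18, X2 = 44, X3 = 4, X4 = 74.8)
new.t <- scale.to.unit(new.input,
                       ranges[names(new.input), 1],
                       ranges[names(new.input), 2])

w <- c(0.4, 0.1, 0.3, 0.2)   # hypothetical fitted WAM weights, summing to 1

pred.t <- sum(w * new.t)                                  # prediction in [0, 1]
pred.y <- scale.back(pred.t, ranges["Y", 1], ranges["Y", 2])  # back to Wh
print(pred.y)
```

If a nonlinear transform such as a log was applied to Y before scaling, its inverse has to be applied as well after scaling back, otherwise the prediction is not in Wh.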

Submit to the SIT718 CloudDeakin Dropbox. Your final submission should include the following three files:

1. A report, "name-report.pdf", in pdf format (created in any word processor), covering all of the items above (where "name" is replaced with your name - you can use your surname or first name). With plots and tables it should be up to 7 pages.

2. A data file named "name-transformed.txt" (where "name" is replaced with your name - just to help us distinguish them!).

3. The R code file (that you have written to produce your results) named "name-code.R" (where "name" is replaced with your name - you can use your surname or first name).

Attachment:- data and script.zip

Verified Expert

In this assignment the whole analysis was done using R 3.5.0. To examine the relationships between the variables, scatter plots and histograms were produced. The response variable was then transformed, and linear regression was performed to predict the variable of interest.



Reviews

len2294284

4/26/2019 10:50:28 PM

• No more than 7 A4 sides, including Figures, Tables, Appendices and References. The report should be typed. Use a minimum font size of 11pt and 2.5cm side margins. If the page limit is exceeded, only the first 7 pages will be marked.
• The assignment (a report in pdf format, software code and/or data) must be submitted via the assignment folder in the unit site (accessed via the unit Program page).
• No e-mail or hardcopy submissions are accepted.

len2294284

4/26/2019 10:50:21 PM

This assignment needs to be done in RStudio. All the requirements are in the assignment pdf. Three files should be submitted: 1. manmeet.pdf (report file) 2. manmeet-transformed.txt (data file) 3. manmeet-code.R (R code file). The assessment consists of FOUR tasks. Students must attempt all tasks and provide an individual written report prepared in an appropriate word processor. The detailed problem description and data set will be released to students on Friday 5th.
