Analyse datasets by interpreting summary statistics

Reference no: EM132211496

Problem solving task - Using aggregation functions for data analysis

Learning Outcomes

This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

ULO1 - assessed through the student's ability to apply knowledge of multivariate functions, data transformations and data distributions to summarise data sets.
ULO2 - assessed through the student's ability to analyse datasets by interpreting summary statistics, model and function parameters.
ULO4 - assessed through the student's ability to develop software code to solve computational problems for real-world analytics.

Purpose

This assignment tests your knowledge and understanding of aggregation functions and their applications to data summarisation and prediction. It also tests your R programming skills, including the use of specific R commands and R packages.

Instructions

The work is individual. Solutions and answers must be explained clearly and concisely, and presented carefully. Use of books, articles and/or online resources related to SIT718 Real World Analytics is allowed, and students are expected to refer to suitable literature where appropriate. The assessment consists of four tasks; students must attempt all of them and submit an individual written report prepared in a word processor.

Forest Fires Data Set

Predicting the burned area of forest fires ("UCI Machine Learning Repository: Forest Fires Data Set", 2017) in the northeast region of Portugal ("Montesinho.Com - Nature Tourism In Montesinho Natural Park", 2017) requires analysis of meteorological and other data (see details at "Forest Fires Dataset", 2017). Also consider the information given at https://cwfis.cfs.nrcan.gc.ca/background/summary/fwi . For this assignment you are provided with a modified dataset, "Forest718.txt".
Attribute Information:
X1: x-axis spatial coordinate within the Montesinho park map: 1 to 9 ("Montesinho.Com - Nature Tourism In Montesinho Natural Park", 2017)
X2: y-axis spatial coordinate within the Montesinho park map: 2 to 9 ("Montesinho.Com - Nature Tourism In Montesinho Natural Park", 2017)
X3: month - month of the year: 'jan=1' to 'dec=12'
X4: day - day of the week: 'mon=1' to 'sun=7'
X5: FFMC - FFMC index from the FWI system: 18.7 to 96.20 (Happe, 2017)
X6: DMC - DMC index from the FWI system: 1.1 to 291.3 (Happe, 2017)
X7: DC - DC index from the FWI system: 7.9 to 860.6 (Happe, 2017)
X8: ISI - ISI index from the FWI system: 0.0 to 56.10 (Happe, 2017)
X9: temp - temperature in degrees Celsius: 2.2 to 33.30
X10: RH - relative humidity in %: 15.0 to 100
X11: wind - wind speed in km/h: 0.40 to 9.40
X12: rain - outside rain in mm/m2: 0.0 to 6.4
X13=Y: area - the burned area of the forest (in ha): 0.00 to 1090.84

Assignment tasks

1. Understand the data

(i) Download the txt file (Forest718.txt) from FutureLearn and save it to your R working directory.
(ii) Assign the data to a matrix, e.g. using the.data <- as.matrix(read.table("Forest718.txt"))

Your variable of interest is X13=Y: area - the burned area of the forest (in ha): 0.00 to 1090.84 (the thirteenth column in the dataset). Generate a random subset of 200 rows, e.g. using:
my.data <- the.data[sample(1:517,200),c(1:13)]

(iii) Choose any FOUR variables from X5 to X11. Using scatter plots and histograms, report on the general relationship between each of these variables and your variable of interest Y. Include 4 scatter plots (one per chosen variable against Y), 5 histograms (one per chosen variable and one for Y), and 1-2 sentences for each variable.
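A minimal R sketch of tasks 1(ii)-(iii) follows. It assumes the data has already been read into a 13-column matrix as shown above. The helper names make_subset and explore_vars are hypothetical, and the example columns (X5, X6, X9, X10) are only one possible choice. Calling set.seed is an addition that makes the random subset repeatable; the seed value itself is arbitrary.

```r
# Draw a (optionally reproducible) random subset of n rows from the data matrix.
# make_subset is a hypothetical helper, not part of the assignment files.
make_subset <- function(the.data, n = 200, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # fixing the seed makes the subset repeatable
  the.data[sample(seq_len(nrow(the.data)), n), 1:13]
}

# Draw one scatter plot per chosen variable against Y (column 13), then one
# histogram per chosen variable plus one for Y. Return the Pearson correlation
# of each chosen variable with Y as a numeric companion to the plots.
explore_vars <- function(my.data, vars = c(5, 6, 9, 10)) {
  y <- my.data[, 13]
  old <- par(mfrow = c(2, 2))          # 2x2 grid for the four scatter plots
  on.exit(par(old))                    # restore the original layout afterwards
  for (j in vars) {
    plot(my.data[, j], y, xlab = paste0("X", j), ylab = "Y (area, ha)",
         main = paste0("X", j, " vs Y"))
  }
  par(mfrow = c(2, 3))                 # 2x3 grid for the five histograms
  for (j in vars) hist(my.data[, j], main = paste0("X", j), xlab = paste0("X", j))
  hist(y, main = "Y (area)", xlab = "area (ha)")
  sapply(vars, function(j) cor(my.data[, j], y))
}

# Example usage (requires Forest718.txt in the working directory):
# the.data <- as.matrix(read.table("Forest718.txt"))
# my.data <- make_subset(the.data, n = 200, seed = 1)
# explore_vars(my.data)
```

Because the area is heavily right-skewed, a correlation close to zero does not by itself mean there is no relationship. Read the numbers together with the plots.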

2. Transform the data

(i) For the four chosen variables and the variable of interest Y, make appropriate transformations so that the values can be aggregated to predict the variable of interest (the area). Assign your transformed variables, together with the transformed variable of interest X13=Y, to an array with 200 rows and 5 columns, and save it to a txt file named "name-transformed.txt":

write.table(your.data, "name-transformed.txt")

(ii) Briefly explain the general relationship between each of your transformed variables and your variable of interest (the area). (2-3 sentences each)
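Below is one possible approach to task 2(i), written as an R sketch that rests on several assumptions. Aggregation functions combine inputs that share a common scale, usually the unit interval, so each variable here is min-max scaled to [0, 1]. Before scaling, the heavily right-skewed area Y is log-transformed with log(1 + y), a transformation commonly applied to this dataset. Any variable assumed to be negatively related to Y is then reversed with 1 - x, so that higher values point towards larger burned areas. Relative humidity (X10) is used as that example. The helper names and the choice of which columns to reverse are hypothetical, so base your own choices on the task 1 plots.

```r
# Min-max scaling of a numeric vector onto [0, 1] (undefined for a constant vector).
minmax <- function(x, lo = min(x), hi = max(x)) (x - lo) / (hi - lo)

# Transform the chosen columns and Y into a 200 x 5 matrix with values in [0, 1].
# vars: chosen input columns; reverse: subset of vars assumed to decrease the area.
transform_data <- function(my.data, vars = c(5, 6, 9, 10), reverse = c(10)) {
  out <- sapply(vars, function(j) {
    z <- minmax(my.data[, j])
    if (j %in% reverse) 1 - z else z   # flip so larger values mean larger Y
  })
  y <- minmax(log1p(my.data[, 13]))    # log(1 + area) tames the long right tail
  out <- cbind(out, y)
  colnames(out) <- c(paste0("X", vars), "Y")
  out
}

# Example (requires my.data from task 1):
# your.data <- transform_data(my.data)
# write.table(your.data, "name-transformed.txt")
```

Keep the training minima and maxima, because any new input you predict for in task 4 must be scaled with them.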

3. Build models and investigate the importance of each variable

(i) Download the AggWaFit718.R file (from CloudDeakin) to your working directory and load it into the R workspace using:

source("AggWaFit718.R")

(ii) Use the fitting functions to learn the parameters for:

• A weighted arithmetic mean,
• Weighted power means with p = 0.5 and p = 2,
• An ordered weighted averaging function, and
• A Choquet integral. [10 marks]
(iii) Include two tables in your report - one on the error measures, and one summarising the weights/parameters that were learned for your data.

(iv) Compare and interpret the data in your tables. Be sure to comment on:

a. How good the model is.
b. The importance of each of the variables (the four variables that you have selected),
c. Any interaction between any of those variables (are they complementary or redundant?) and
d. Whether better models favour higher or lower inputs. (1-3 paragraphs)
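The R sketch below makes the four model families in task 3(ii) concrete. It shows how each aggregation function produces its output from a single input vector x (with values in [0, 1]) once its parameters are known. It also computes error measures of the kind you might report in task 3(iii). This is not the code in AggWaFit718.R, and that file's function names and parameter layouts may differ. Several conventions are assumptions of this sketch. OWA weights are applied to the inputs sorted in decreasing order. The Choquet fuzzy measure v is stored as a vector where v[m] is the measure of the subset encoded by the bits of m (bit i-1 standing for variable i). The reported measures are RMSE, mean absolute error, and Pearson and Spearman correlation.

```r
# Weighted arithmetic mean: sum of w_i * x_i (weights >= 0, summing to 1).
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p = 1 recovers the arithmetic mean).
wpm <- function(x, w, p) {
  if (p == 0) return(prod(x^w))       # limiting case: weighted geometric mean
  sum(w * x^p)^(1 / p)
}

# OWA: weights attach to positions in the sorted input, not to variables.
owa <- function(x, w) sum(w * sort(x, decreasing = TRUE))

# Discrete Choquet integral with respect to a fuzzy measure v, where v[m] is
# the measure of the subset whose members are the set bits of m
# (bit i-1 <-> variable i), for m = 1, ..., 2^n - 1.
choquet <- function(x, v) {
  n <- length(x)
  ord <- order(x)                     # indices of inputs in increasing order
  total <- 0
  prev <- 0
  for (k in seq_len(n)) {
    A <- ord[k:n]                     # variables whose values are >= x[ord[k]]
    mask <- sum(2^(A - 1))
    total <- total + (x[ord[k]] - prev) * v[mask]
    prev <- x[ord[k]]
  }
  total
}

# Error measures comparing predicted and observed (transformed) Y.
error_measures <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    MAE      = mean(abs(pred - obs)),
    Pearson  = cor(pred, obs),
    Spearman = cor(pred, obs, method = "spearman"))
}
```

When the fuzzy measure is additive, the Choquet integral reduces to the weighted arithmetic mean. A Choquet fit that barely improves on the weighted mean therefore suggests that the variables interact very little.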
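Three standard summaries computed from the learned parameters can support the interpretation asked for in task 3(iv). The OWA orness addresses question d: above 0.5 favours higher inputs, below 0.5 favours lower ones. Shapley values give each variable's overall importance in a Choquet model (question b). Pairwise interaction indices address question c, where a positive value suggests complementary variables and a negative value suggests redundant ones. The R sketch assumes two conventions that may not match the output format of AggWaFit718.R. OWA weights are applied to inputs sorted in decreasing order. The fuzzy measure v is a vector with v[m] the measure of the subset encoded by the bits of m (bit i-1 standing for variable i), and the empty set has measure 0.

```r
# Orness of OWA weights w (applied to inputs sorted in decreasing order):
# 1 for the maximum, 0 for the minimum, 0.5 for the arithmetic mean.
orness <- function(w) {
  n <- length(w)
  sum(w * (n - seq_len(n)) / (n - 1))
}

# Measure of the subset encoded by bitmask m; the empty set (m = 0) has measure 0.
vm <- function(v, m) if (m == 0) 0 else v[m]

# Shapley value of each variable: weighted average of its marginal contribution
# v(S + i) - v(S) over all subsets S that do not contain i.
shapley <- function(v, n) {
  sapply(seq_len(n), function(i) {
    bit <- 2^(i - 1)
    total <- 0
    for (m in 0:(2^n - 1)) {
      if (bitwAnd(m, bit) != 0) next  # only subsets S that exclude i
      s <- sum(bitwAnd(m, 2^(0:(n - 1))) != 0)
      wt <- factorial(n - s - 1) * factorial(s) / factorial(n)
      total <- total + wt * (vm(v, m + bit) - vm(v, m))
    }
    total
  })
}

# Pairwise interaction index of variables i and j: positive suggests the pair
# is complementary, negative suggests it is redundant.
interaction_index <- function(v, n, i, j) {
  bi <- 2^(i - 1)
  bj <- 2^(j - 1)
  total <- 0
  for (m in 0:(2^n - 1)) {
    if (bitwAnd(m, bi) != 0 || bitwAnd(m, bj) != 0) next
    s <- sum(bitwAnd(m, 2^(0:(n - 1))) != 0)
    wt <- factorial(n - s - 2) * factorial(s) / factorial(n - 1)
    total <- total + wt * (vm(v, m + bi + bj) - vm(v, m + bi) -
                           vm(v, m + bj) + vm(v, m))
  }
  total
}
```

Shapley values sum to v(N), the measure of the full set (normally 1), so they can be read as shares of importance. For a weighted arithmetic mean they coincide with the weights.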

4. Use your model for prediction
(i) Using your best fitting model, predict the area for the following input: X5=91.6; X6=181.3; X7=613; X8=7.6; X9=24.6; X10=44; X11=4; X12=0.
(ii) Give your result and comment on whether you think it is reasonable. (1-2 sentences)

(iii) Comment generally on the ideal conditions (in terms of your chosen four variables) under which an area will result. (1-2 sentences)
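In task 4(i), the new input must pass through exactly the same pipeline as the training data. In the R sketch below, each chosen raw value is rescaled with that column's minimum and maximum from the 200-row subset, and the same columns are reversed as before. The fitted model then aggregates the scaled values, and the result is mapped back through the inverse of the Y transformation. Several parts are assumptions for illustration: the helper name predict_area, the min-max and log(1 + y) transformations, and the use of a weighted arithmetic mean as the example model. Substitute your own transformations and best-fitting model. An input outside the training range would scale outside [0, 1], so this sketch clips it, and that choice is worth mentioning when you judge whether the result is reasonable.

```r
# Predict the burned area (ha) for one raw input vector new.x (one value per
# chosen variable, in the order of vars), reusing the training data's scaling.
# agg is any function(z) returning the model output on the transformed [0, 1] scale.
predict_area <- function(new.x, my.data, vars, reverse, agg) {
  z <- sapply(seq_along(vars), function(k) {
    col <- my.data[, vars[k]]
    s <- (new.x[k] - min(col)) / (max(col) - min(col))
    s <- min(max(s, 0), 1)            # clip values outside the training range
    if (vars[k] %in% reverse) 1 - s else s
  })
  ly <- log1p(my.data[, 13])          # training log(1 + area) values
  y.scaled <- agg(z)
  # Undo the min-max scaling of log(1 + area), then undo the log transform.
  expm1(y.scaled * (max(ly) - min(ly)) + min(ly))
}

# Example with the task 4 input restricted to X5, X6, X9, X10 and a fitted
# weighted arithmetic mean (w.hat is a placeholder for your learned weights):
# new.x <- c(91.6, 181.3, 24.6, 44)
# predict_area(new.x, my.data, vars = c(5, 6, 9, 10), reverse = c(10),
#              agg = function(z) sum(w.hat * z))
```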

Your final submission to the SIT718 CloudDeakin Dropbox should include the following three files. Please follow the instructions below and do not compress your files.

1. A "name-report.pdf" report (created in any word processor) covering all of the items above (items coloured blue usually have explicit instructions about what should be included). Including plots and tables, it should be only 3-5 pages long.

2. A data file named "name-transformed.txt" (where 'name' is replaced with your name; you can use your surname or first name, just to help me distinguish them!).

3. The R code file (that you have written to produce your results) named "name-code.R" (where 'name' is replaced with your name; you can use your surname or first name).

Attachment:- Assessment-Task.rar


