Conduct and report on exploratory data analysis

Reference no: EM132569340

Task - Exploratory Data Analysis and Linear Regression Analysis

Carefully study the Data Dictionary for the Boston Housing Data Set (see Table 1) and the accompanying description of each variable. It is important to understand this data set, as it is used for Task 2 and Task 3 in Assignment 2. Each record in the housing.csv data set describes a Boston suburb or town. The data was drawn from the Boston Standard Metropolitan Statistical Area (SMSA) in 1970.

Table 1: Data Dictionary for Boston Housing Data Set

| Variable | Description                                                      | Type    |
|----------|------------------------------------------------------------------|---------|
| crim     | Per capita crime rate by town                                    | real    |
| zn       | Proportion of residential land zoned for lots over 25,000 sq.ft. | real    |
| indus    | Proportion of non-retail business acres per town                 | real    |
| chas     | Charles River dummy variable (1 if tract bounds river; else 0)   | integer |
| nox      | Nitric oxides concentration (parts per 10 million)               | real    |
| rm       | Average number of rooms per dwelling                             | real    |
| age      | Proportion of owner-occupied units built prior to 1940           | real    |
| dis      | Weighted distances to five Boston employment centres             | real    |
| rad      | Index of accessibility to radial highways                        | integer |
| tax      | Full-value property-tax rate per $10,000                         | real    |
| ptratio  | Pupil-teacher ratio by town                                      | real    |
| bk       | where Bk is the proportion of African Americans by town          | real    |
| lstat    | % lower status of the population                                 | real    |
| medv     | Median value of owner-occupied homes in $1000's                  | real    |

Note: You should conduct some desktop research to identify the determinants (drivers) of housing prices so that you can fully understand and interpret the key findings of the exploratory data analysis (EDA) and Linear Regression Models for the housing.csv data set in Task 2, and the visual presentation of the housing.csv data set in Task 3.

Task 2.1) Conduct and report on exploratory data analysis (EDA) of the housing.csv data set using the RapidMiner Studio data mining tool. Note that this will require the use of a number of RapidMiner operators.

Provide the following for Task 2.1:

(i) A screen capture of your final EDA process and a brief description of your EDA process

(ii) Summarise the key results of your exploratory data analysis in Table 2.1, Results of Exploratory Data Analysis for housing.csv. Table 2.1 should include the key characteristics of each variable in the housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values.

(iii) Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables for predicting median house value (medv). Focus in particular on the relationships of the independent variables with each other and with the dependent variable, median house value (medv), drawing on the results of the EDA and relevant literature on the determinants of house prices.
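The task requires RapidMiner Studio, so the following is only an independent cross-check of the figures that go into Table 2.1, not a substitute for the RapidMiner process. It is a minimal Python sketch using the standard library alone. It assumes housing.csv has a header row with the variable names from Table 1 and that every column is numeric. The function names (`load_numeric_csv`, `summarise`, `pearson`, `rank_by_correlation`) are hypothetical helpers introduced here for illustration.

```python
import csv
import math
import statistics


def load_numeric_csv(path):
    """Read a CSV with a header row into {column: [float or None, ...]}."""
    columns = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name, raw in row.items():
                try:
                    value = float(raw)
                    if math.isnan(value):
                        value = None  # treat NaN as missing
                except (TypeError, ValueError):
                    value = None  # blank or non-numeric cell counts as missing
                columns.setdefault(name, []).append(value)
    return columns


def summarise(values):
    """Per-variable characteristics of the kind listed for Table 2.1."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),   # sample standard deviation
        "mode": statistics.mode(present),   # first mode if several (Python 3.8+)
        "missing": len(values) - len(present),
    }


def pearson(xs, ys):
    """Pearson correlation over records where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target="medv"):
    """Sort predictors by absolute correlation with the target, strongest first."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Under those assumptions, `rank_by_correlation(load_numeric_csv("housing.csv"))[:5]` would give one candidate shortlist of five predictors. Correlation with medv alone does not settle the choice: `pearson` can also be applied to pairs of independent variables to spot predictors that largely duplicate each other, and invalid values still need to be judged against the ranges implied by Table 1.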

Task 2.2) Build and report on a Linear Regression model for predicting medv using a RapidMiner data mining process, an appropriate set of data mining operators, and a reduced set of variables from the housing.csv data set as determined by your exploratory data analysis in Task 2.1.

Provide the following for Task 2.2:

(i) A screen capture of your Final Linear Regression Model process and a brief description of that process

(ii) Table 2.2, named Results of Final Linear Regression Model for Task 2.2, for the housing.csv data set.

(iii) Discuss the results of the Final Linear Regression Model for the housing.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistic values, p-values, significance levels, etc.) for predicting median house value (medv) and on relevant supporting literature on the interpretation of a Linear Regression Model.
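To make the outputs named in (iii) concrete, the sketch below shows how coefficients, standard errors, t-statistics and standardised coefficients arise from an ordinary least squares fit. It is an illustration in plain Python under stated assumptions, not the RapidMiner Linear Regression operator, whose defaults (for example feature selection) may produce different numbers. The helpers `solve`, `invert`, `fit_ols` and `standardised` are hypothetical names. p-values are not computed because they need the Student's t distribution, which the standard library does not provide.

```python
import math
import statistics


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def invert(matrix):
    """Invert a square matrix by solving against each unit vector."""
    n = len(matrix)
    cols = [solve(matrix, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def fit_ols(features, target):
    """Fit target = b0 + b1*x1 + ... ; features is one list of predictors per record.

    Returns (coefficients, standard errors, t-statistics), intercept first.
    """
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, target)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(X, target)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance
    std_err = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    t_stats = [b / se for b, se in zip(beta, std_err)]
    return beta, std_err, t_stats


def standardised(beta, features, target):
    """Rescale each slope by sd(x) / sd(y) so predictors are comparable."""
    sd_y = statistics.stdev(target)
    return [b * statistics.stdev([row[j] for row in features]) / sd_y
            for j, b in enumerate(beta[1:])]
```

A t-statistic is a coefficient divided by its standard error, so a large absolute value indicates that the estimate is well separated from zero relative to its uncertainty. Standardised coefficients remove the units of each predictor, which is what allows the relative influence of variables such as rm and lstat to be compared in Table 2.2.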




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd