Conduct and report on exploratory data analysis


Reference no: EM132569340

Task - Exploratory Data Analysis and Linear Regression Analysis

Carefully study the Data Dictionary for the Boston Housing Data Set (see Table 1) and the accompanying description of each variable. It is important to understand this data set, as it is used for Task 2 and Task 3 in Assignment 2. Each record in the housing.csv data set describes a Boston suburb or town. The data was drawn from the Boston Standard Metropolitan Statistical Area (SMSA) in 1970.

Variable	Description	Type
crim	Per capita crime rate by town	real
zn	proportion of residential land zoned for lots over 25,000 sq.ft.	real
indus	proportion of non-retail business acres per town	real
chas	Charles River dummy variable (1 if tract bounds river; else 0)	integer
nox	nitric oxides concentration (parts per 10 million)	real
rm	average number of rooms per dwelling	real
age	proportion of owner-occupied units built prior to 1940	real
dis	weighted distances to five Boston employment centres	real
rad	index of accessibility to radial highways	integer
tax	full-value property-tax rate per $10,000	real
ptratio	pupil-teacher ratio by town	real
bk	where Bk is the proportion of African Americans by town	real
lstat	% lower status of the population	real
medv	Median value of owner-occupied homes in $1000's	real

Note: You should conduct some desktop research to identify the determinants (drivers) of housing prices, so that you can fully understand and interpret the key findings of the exploratory data analysis (EDA) and Linear Regression Models for the housing.csv data set in Task 2, and the visual presentation of the housing.csv data set in Task 3.

Task 2.1) Conduct and report on exploratory data analysis (EDA) of the housing.csv data set using the RapidMiner Studio data mining tool. Note that this will require the use of a number of RapidMiner operators.

Provide the following for Task 2.1:

(i) A screen capture of your final EDA process, with a brief description of the process.

(ii) Summarise the key results of your exploratory data analysis in Table 2.1, Results of Exploratory Data Analysis for housing.csv. Table 2.1 should include key characteristics of each variable in the housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values.

(iii) Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables for predicting median house value (medv). Focus in particular on the relationships of the independent variables with each other and with the dependent variable, median house value (medv), drawing on the results of the EDA and relevant literature on the determinants of house prices.
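
The task itself calls for RapidMiner operators. As a rough cross-check outside RapidMiner, the Table 2.1 statistics and the correlation of each candidate variable with medv could be sketched in plain Python as below. This is only an illustration under stated assumptions: the rows are made-up values, not records from housing.csv, and the helper names `summarise` and `pearson` are hypothetical.

```python
import math
import statistics

# Made-up illustrative rows (NOT taken from housing.csv).
# None marks a missing value; chas=2 is deliberately invalid (it must be 0 or 1).
rows = [
    {"rm": 6.5, "crim": 0.05, "chas": 0, "medv": 24.0},
    {"rm": 6.0, "crim": 0.30, "chas": 0, "medv": 20.0},
    {"rm": 7.0, "crim": 0.02, "chas": 1, "medv": 34.0},
    {"rm": 5.5, "crim": None, "chas": 0, "medv": 15.0},
    {"rm": 6.0, "crim": 1.20, "chas": 2, "medv": 18.0},
]


def column(name):
    """Return one variable as a list, keeping None for missing values."""
    return [row[name] for row in rows]


def summarise(values, valid=lambda v: True):
    """Per-variable characteristics of the kind Table 2.1 asks for."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "mode": statistics.mode(present),
        "missing": len(values) - len(present),
        "invalid": sum(1 for v in present if not valid(v)),
    }


def pearson(xs, ys):
    """Pearson correlation, skipping pairs where either value is missing."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.mean(x for x, _ in pairs)
    my = statistics.mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


summary = {
    "rm": summarise(column("rm")),
    "crim": summarise(column("crim"), valid=lambda v: v >= 0),
    "chas": summarise(column("chas"), valid=lambda v: v in (0, 1)),
    "medv": summarise(column("medv")),
}

# Rank candidate predictors by the absolute strength of their correlation with medv.
correlations = {name: pearson(column(name), column("medv")) for name in ("rm", "crim", "chas")}
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Ranking by absolute correlation is only one possible screening rule. It says nothing about predictors that are strongly correlated with each other, which item (iii) also asks you to consider.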

Task 2.2) Build and report on a Linear Regression model for predicting medv, using a RapidMiner data mining process with an appropriate set of data mining operators and the reduced set of variables from the housing.csv data set identified by your exploratory data analysis in Task 2.1.

Provide the following for Task 2.2:

(i) A screen capture of your final Linear Regression Model process, with a brief description of the process.

(ii) Table 2.2, named Results of Final Linear Regression Model for Task 2.2, for the housing.csv data set.

(iii) Discuss the results of the final Linear Regression Model for the housing.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistics, p-values, significance levels, etc.) for predicting median house value (medv) and on relevant supporting literature on the interpretation of a Linear Regression Model.
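
To make some of the outputs named in (iii) concrete, here is a minimal sketch of a least-squares fit with a single predictor, written in plain Python. It is not the RapidMiner process the task requires: the numbers are made up rather than taken from housing.csv, the function name `simple_ols` is hypothetical, and p-values are left out because the Python standard library has no t-distribution.

```python
import math
import statistics

# Made-up illustrative values (NOT from housing.csv).
rm = [5.5, 6.0, 6.0, 6.5, 7.0]
medv = [15.0, 20.0, 18.0, 24.0, 34.0]


def simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual sum of squares, with n - 2 degrees of freedom
    # (two estimated parameters: intercept and slope).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        # t-statistic for the null hypothesis that the slope is zero.
        "t_stat": slope / se_slope,
        # Standardised coefficient: slope rescaled to standard-deviation units.
        "std_coef": slope * statistics.stdev(xs) / statistics.stdev(ys),
    }


fit = simple_ols(rm, medv)
for key, value in fit.items():
    print(f"{key}: {value:.3f}")
```

With one predictor the standardised coefficient equals the Pearson correlation between the predictor and medv. With several predictors that no longer holds, which is why standardised coefficients are useful for comparing variables measured on different scales.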

