CSCE 822 - Data Mining and Warehousing Assignment

Reference no: EM132400011

CSCE 822 - Data Mining & Warehousing Assignment - University of South Carolina, USA

The attached melb_data.csv file is a snapshot of Tony Pino's Melbourne Housing Dataset. Perform the following data preprocessing steps, then apply the KNN and RandomForest algorithms to classify the property prices.

1. Fill in the missing values in the dataset using the imputation approaches discussed in class. You can use scikit-learn's impute module:

from sklearn.impute import SimpleImputer

my_imputer = SimpleImputer()

data_with_imputed_values = my_imputer.fit_transform(original_data)

The default imputer uses the mean of each column to fill in the missing values. You can try other imputation methods as well.
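As a rough illustration of trying other imputation methods, the sketch below applies three SimpleImputer strategies to a small made-up array. The array values and the helper name impute are assumptions for illustration only; they are not taken from melb_data.csv or from the assignment.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy numeric data with missing entries (illustrative values, not from melb_data.csv).
original_data = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [2.0, 50.0],
])

def impute(data, strategy):
    """Hypothetical helper: fill NaNs column by column using the given strategy."""
    return SimpleImputer(strategy=strategy).fit_transform(data)

imputed_mean = impute(original_data, "mean")             # default strategy
imputed_median = impute(original_data, "median")
imputed_mode = impute(original_data, "most_frequent")
```

Note that "mean" and "median" only work on numeric columns, while "most_frequent" also accepts string columns, so categorical attributes may need a separate imputer.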

2. Replace the categorical/nominal attributes with a one-hot encoding.

You can use the Category Encoders package, which is designed for use with scikit-learn in Python.

Read this blog for more approaches to data encoding: Smarter Ways to Encode Categorical Data for Machine Learning.
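One possible way to do the one-hot encoding is sketched below. It uses pandas.get_dummies rather than the Category Encoders package mentioned above, and the toy frame with Rooms and Type columns is an assumption made for illustration, not the actual contents of melb_data.csv.

```python
import pandas as pd

# Toy frame with one numeric and one nominal column (illustrative values only).
df = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Type": ["h", "u", "t"],
})

# Replace the nominal column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["Type"], dtype=int)
print(encoded)
```

Each distinct category becomes its own column, so nominal attributes with many distinct values can widen the dataset considerably.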

3. Install the Weka system on your computer.

Sort all the property samples by price and divide them equally into 5 categories/classes: top value, high value, medium value, low value, and bottom value.

Apply Weka's KNN algorithm with K = 5 to 10 to classify the property instances into the 5 classes. Calculate the accuracy for each value of K.

Apply Weka's RandomForest algorithm and report its performance.

You need to split the whole dataset into training (66% of the samples) and testing (34% of the samples) datasets. Repeat the random split 10 times and calculate the average accuracy.
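The sort-and-divide step could be sketched in pandas as follows. The price values are made up, and ranking before binning is a design choice assumed here so that tied prices still produce five equally sized classes.

```python
import pandas as pd

# Class names ordered from cheapest to most expensive fifth.
labels = ["Bottom value", "Low value", "Medium value", "High value", "Top value"]

# Ten made-up prices (illustrative values only).
prices = pd.Series([300_000, 1_200_000, 450_000, 980_000, 610_000,
                    720_000, 2_500_000, 530_000, 1_600_000, 850_000])

# Rank first so ties cannot collapse bin edges, then cut into 5 equal-sized bins.
price_class = pd.qcut(prices.rank(method="first"), q=5, labels=labels)
print(price_class.value_counts())
```

The resulting price_class column can then serve as the class attribute for the classifiers.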

from sklearn.model_selection import train_test_split

xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.34, random_state=0)
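The repeated 66%/34% evaluation can be sketched as below. This uses scikit-learn classifiers as a stand-in for Weka, which the assignment actually asks for, and synthetic data from make_classification instead of the housing data. The helper name average_accuracy, the per-repeat seeds, and n_estimators=50 are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def average_accuracy(make_model, x, y, n_repeats=10, test_size=0.34):
    """Hypothetical helper: mean test accuracy over repeated random splits."""
    scores = []
    for seed in range(n_repeats):  # a different seed gives a different split
        x_train, x_test, y_train, y_test = train_test_split(
            x, y, test_size=test_size, random_state=seed)
        model = make_model().fit(x_train, y_train)
        scores.append(model.score(x_test, y_test))
    return float(np.mean(scores))

# Synthetic 5-class data standing in for the preprocessed housing features.
x, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# One average accuracy per K from 5 to 10.
knn_accuracy = {
    k: average_accuracy(lambda k=k: KNeighborsClassifier(n_neighbors=k), x, y)
    for k in range(5, 11)
}
rf_accuracy = average_accuracy(
    lambda: RandomForestClassifier(n_estimators=50, random_state=0), x, y)
```

Passing a factory function rather than a fitted model means every repeat trains a fresh classifier, so no state leaks between splits.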

 

                K=5                 K=6    K=7    K=8    K=9    K=10
KNN             Average accuracy    ...
RandomForest    Average accuracy

Write a report discussing the performance of KNN and RandomForest. You are encouraged to compare the performance of different missing-value imputation methods or categorical encoding methods.

Attachment:- Data Mining & Warehousing Assignment Files.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd