Reference no: EM132400011
CSCE 822 - Data Mining & Warehousing Assignment - University of South Carolina, USA
The attached melb_data.csv file is a snapshot of Tony Pino's Melbourne Housing Dataset. Perform the following data preprocessing steps, then apply the KNN and RandomForest algorithms to classify the property prices.
1. Fill the missing values in the dataset using the imputation approaches discussed in class. You can use scikit-learn's SimpleImputer:
from sklearn.impute import SimpleImputer
my_imputer = SimpleImputer()
data_with_imputed_values = my_imputer.fit_transform(original_data)
The default imputer fills each missing value with the mean of its column. You can try other imputation methods as well.
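As a minimal sketch of trying other imputation methods, the loop below runs SimpleImputer with three of its built-in strategies on a small made-up frame. The column names and values are illustrative assumptions standing in for the numeric columns of melb_data.csv, not data taken from the file.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numeric frame (made-up values) standing in for numeric columns of melb_data.csv.
original_data = pd.DataFrame({
    "Rooms": [2, 3, np.nan, 4, 3],
    "BuildingArea": [80.0, np.nan, 120.0, 150.0, np.nan],
})

imputed = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(strategy=strategy)
    # fit_transform returns a NumPy array, so wrap it back into a DataFrame.
    imputed[strategy] = pd.DataFrame(
        imputer.fit_transform(original_data),
        columns=original_data.columns,
    )

for strategy, frame in imputed.items():
    print(strategy)
    print(frame)
```

Keeping one imputed copy per strategy makes it straightforward to rerun the later classification steps on each copy and compare the resulting accuracies.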
2. Replace the categorical/nominal attributes with one-hot-encoding.
You can use the Category Encoders package, which is designed to work with scikit-learn in Python.
Read this blog for more approaches for data encoding - Smarter Ways to Encode Categorical Data for Machine Learning.
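For illustration only, one-hot encoding can also be sketched with pandas' get_dummies, which is used here in place of the Category Encoders package so that the example needs nothing beyond pandas. The toy frame and its column names are assumptions, not the actual contents of melb_data.csv.

```python
import pandas as pd

# Toy frame (made-up values) with one nominal column and one numeric column.
data = pd.DataFrame({
    "Type": ["h", "u", "t", "h"],
    "Rooms": [3, 2, 3, 4],
})

# Replace the nominal column with one 0/1 indicator column per category.
encoded = pd.get_dummies(data, columns=["Type"], dtype=int)
print(encoded)
```

Each category becomes its own indicator column, so no artificial ordering is imposed on the nominal values; this matters for distance-based methods such as KNN.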
3. Install the Weka system on your computer.
Sort all the property samples by price and divide them equally into 5 categories/classes: top value, high value, medium value, low value, and bottom value.
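One possible way to build the five equally sized classes, assuming the prices are held in a pandas Series, is quantile binning with pandas.qcut. The prices below are made up, and ranking before binning is a design choice that keeps the classes the same size even when prices tie.

```python
import pandas as pd

# Made-up prices; in the assignment these would come from the price column.
prices = pd.Series(
    [300_000, 450_000, 520_000, 610_000, 700_000,
     820_000, 905_000, 1_100_000, 1_450_000, 2_000_000],
    name="Price",
)

labels = ["bottom value", "low value", "medium value", "high value", "top value"]

# Rank first (ties broken by order of appearance), then cut the ranks into
# 5 quantile bins so that every class receives the same number of samples.
price_class = pd.qcut(prices.rank(method="first"), q=5, labels=labels)
print(price_class.value_counts())
```

The resulting labels can be stored as a new nominal column and used as the class attribute when the data is loaded into Weka.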
Apply Weka's KNN algorithm with K = 5 to 10 to classify the property instances into the 5 classes. Calculate the accuracy for each value of K.
Apply Weka's RandomForest algorithm and report its performance.
Split the whole dataset into a training set (66% of the samples) and a testing set (34% of the samples). Repeat the random split 10 times and report the average accuracy.
from sklearn.model_selection import train_test_split
# test_size=0.34 gives the 66%/34% split; change random_state for each of the 10 repetitions
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.34, random_state=0)
| Algorithm | Metric | K=5 | K=6 | K=7 | K=8 | K=9 | K=10 |
|---|---|---|---|---|---|---|---|
| KNN | Average accuracy | ... | | | | | |
| RandomForest | Average accuracy | | | | | | |
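The averaging procedure behind the table can be sketched in scikit-learn as a cross-check. This is not the Weka workflow the assignment asks for, and Weka's classifiers may produce different numbers. The synthetic data from make_classification, the helper name average_accuracy, and the RandomForest settings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 5-class data standing in for the preprocessed housing features.
x, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

def average_accuracy(make_model, x, y, n_repeats=10):
    """Hypothetical helper: mean test accuracy over repeated 66%/34% splits."""
    scores = []
    for seed in range(n_repeats):
        # A different seed per repetition yields a different random split.
        xTrain, xTest, yTrain, yTest = train_test_split(
            x, y, test_size=0.34, random_state=seed
        )
        model = make_model()
        model.fit(xTrain, yTrain)
        scores.append(model.score(xTest, yTest))
    return float(np.mean(scores))

# One averaged accuracy per K from 5 to 10.
knn_accuracy = {
    k: average_accuracy(lambda: KNeighborsClassifier(n_neighbors=k), x, y)
    for k in range(5, 11)
}
rf_accuracy = average_accuracy(
    lambda: RandomForestClassifier(n_estimators=100, random_state=0), x, y
)

for k, acc in knn_accuracy.items():
    print(f"KNN K={k}: {acc:.3f}")
print(f"RandomForest: {rf_accuracy:.3f}")
```

Passing a factory function rather than a fitted model ensures that a fresh classifier is trained on every split, so no state leaks between repetitions.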
Write a report discussing the performance of KNN and RandomForest. You are encouraged to compare the performance of different missing-value imputation methods or categorical encoding methods.
Attachment:- Data Mining & Warehousing Assignment Files.rar