Discussing data preprocessing steps

Reference no: EM131079516

Assignment Project: Data Mining using R

The goal of this project is to apply association rule mining, classification and clustering methods to the Mushroom and Groceries data sets. For detailed information about the mushroom data set, refer to the Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html) provided by the University of California, Irvine. You can download the data and read more about it there.

The Groceries Dataset

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer's basket. That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item.
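As a rough illustration of this layout, the sketch below parses a few made-up receipt lines into one item vector per transaction, using only base R. The three receipts and the comma separator are assumptions for illustration; the real file may be delimited differently.

```r
# Hypothetical receipts: one transaction per line, items separated by commas.
receipt_lines <- c(
  "milk,bread,butter",
  "bread,eggs",
  "milk,bread"
)

# Split each line into its items; the result is a list of character vectors.
transactions <- strsplit(receipt_lines, ",", fixed = TRUE)

# Count how many receipts contain each item (each item counted once per receipt).
item_counts <- table(unlist(lapply(transactions, unique)))
print(item_counts)
```

Because receipts have different lengths, a list of item vectors is usually a more natural representation than a rectangular table.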

Task 1: Data Pre-processing

Read the data into R. There are many ways to read CSV tables in R. For more details, refer to the R Data Import/Export manual:
https://cran.r-project.org/doc/manuals/r-release/R-data.pdf
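One possible way to read a comma-separated file is sketched below with base R's `read.csv`. The inline rows, the column names, the absence of a header row, and the use of "?" for missing values are all assumptions made for illustration; check them against the file you actually download.

```r
# A few made-up rows in the style of a headerless, comma-separated file.
# The first column plays the role of the class label; "?" marks a missing value.
csv_text <- "p,x,s,n
e,x,y,?
e,b,s,w
p,x,y,w"

# For a real file, replace `text = csv_text` with the path to the downloaded file.
mushrooms <- read.csv(
  text = csv_text,
  header = FALSE,            # assumed: no header row
  na.strings = "?",          # assumed: "?" encodes missing values
  stringsAsFactors = TRUE    # read strings as nominal (factor) columns
)

# Hypothetical column names, chosen only for readability.
names(mushrooms) <- c("class", "cap_shape", "cap_surface", "cap_color")

str(mushrooms)
```

Reading strings as factors up front means the nominal attributes are already in the form most classifiers in R expect.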

For the clustering experiments, the column of class labels needs to be removed. Refer to lecture Module 10 to see how to do so.
Check whether any other pre-processing would benefit the analysis, for example replacing missing values, normalizing attribute ranges, or converting numerical or string values to nominal values.
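The pre-processing steps named above could look roughly like the following in base R. This is a minimal sketch on a made-up data frame: the column names are hypothetical, the numeric `cap_width` attribute exists only to demonstrate normalization, and mode imputation is just one of several reasonable ways to handle missing values.

```r
# Toy data frame standing in for the real data (all values are made up).
df <- data.frame(
  class     = c("p", "e", "e", "p"),
  cap_color = c("n", NA, "w", "w"),
  cap_width = c(4.0, 6.0, 10.0, 8.0),
  stringsAsFactors = FALSE
)

# 1. Replace missing nominal values with the most frequent value (the mode).
mode_value <- names(which.max(table(df$cap_color)))
df$cap_color[is.na(df$cap_color)] <- mode_value

# 2. Convert string columns to nominal (factor) columns.
df$class     <- factor(df$class)
df$cap_color <- factor(df$cap_color)

# 3. Rescale a numeric attribute to the range [0, 1] (min-max normalization).
rng <- range(df$cap_width)
df$cap_width <- (df$cap_width - rng[1]) / (rng[2] - rng[1])

# 4. Drop the class label column before clustering.
features <- df[, names(df) != "class", drop = FALSE]
print(features)
```

Keeping the labelled data frame `df` alongside `features` makes it possible to compare the discovered clusters with the true classes afterwards.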

Task 2: Data Mining

• Association Rule Mining experiments: Use R to explore association rules on the Groceries dataset. Try out different algorithms. Visualize the results you found. Report any interesting association rules discovered in the experiments and explain why they are interesting.

• Classification experiments: Use R to construct classifiers on the mushroom dataset. Randomly split the data set into a training set and a test set (80% vs. 20%). Select at least one classifier from each of two of the following categories of classifiers: Tree-based models, Bayes classifiers, and Rule-based classifiers. Compare the results of the chosen classifiers.

• Clustering experiments: Use R to explore clusters on the mushroom dataset. Select and compare two clustering algorithms from R (e.g. k-means vs. density-based). Use R to visually explore the resulting clusters.

• For all of the above experiments, try different parameter settings to fine-tune the outcome. In principle, select methods that work well on the given data set.
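For the association rule experiments, it can help to see how the usual interestingness measures are defined. The sketch below computes support, confidence and lift for a single rule by hand in base R. The transactions and the helper names `support` and `rule_metrics` are made up for illustration; this is not a mining algorithm, only the arithmetic behind the reported numbers.

```r
# Made-up transactions; each element is one receipt.
transactions <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread"),
  c("milk", "butter")
)

# Fraction of transactions that contain every item in `items`.
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Support, confidence and lift of the rule lhs => rhs.
rule_metrics <- function(lhs, rhs, trans) {
  supp <- support(c(lhs, rhs), trans)
  conf <- supp / support(lhs, trans)
  lift <- conf / support(rhs, trans)
  c(support = supp, confidence = conf, lift = lift)
}

metrics <- rule_metrics("milk", "bread", transactions)
print(metrics)
```

A lift below 1, as in this toy rule, suggests the two items co-occur less often than independence would predict, which is one reason a rule with decent confidence may still be uninteresting. For the actual mining, a package such as arules (with its `apriori()` function) is a common choice, though which package the course expects is not stated here.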
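For the classification experiments, the 80%/20% random split and a tree-based classifier might be sketched as follows. The data here is synthetic, with a class that depends only on a hypothetical `odor` column, and `rpart` is used on the assumption that it is available (it is a recommended package included with most R installations). Bayes and rule-based classifiers would need other packages and are not shown.

```r
library(rpart)  # classification trees

set.seed(42)    # make the random data and the random split reproducible

# Synthetic stand-in for the real data: the class depends only on `odor`.
n <- 200
toy <- data.frame(
  odor      = factor(sample(c("a", "f", "n"), n, replace = TRUE)),
  cap_color = factor(sample(c("n", "w", "y"), n, replace = TRUE))
)
toy$class <- factor(ifelse(toy$odor == "f", "p", "e"))

# Randomly assign 80% of the rows to training and the rest to testing.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit a classification tree and predict the held-out rows.
fit  <- rpart(class ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and accuracy on the test set.
confusion <- table(predicted = pred, actual = test$class)
accuracy  <- mean(pred == test$class)
print(confusion)
print(accuracy)
```

Using the same `train` and `test` data frames for every classifier keeps the comparison between models fair.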
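For the clustering experiments, nominal attributes have to be turned into numbers before a distance-based method such as k-means can be applied. The sketch below dummy-codes a synthetic data set and compares k-means with hierarchical clustering. Hierarchical clustering stands in for the second algorithm only because it is available in base R; a density-based method would need an additional package. The column names and values are made up.

```r
set.seed(1)  # k-means uses random starting centers

# Synthetic nominal data with two obvious groups (all values are made up).
toy <- data.frame(
  odor      = factor(rep(c("a", "f"), each = 10)),
  gill_size = factor(rep(c("b", "n"), each = 10)),
  cap_color = factor(rep(c("w", "y"), times = 10))
)

# Dummy-code the nominal columns so that distances can be computed.
x <- model.matrix(~ . - 1, data = toy)

# Algorithm 1: k-means with k = 2 and several random restarts.
km <- kmeans(x, centers = 2, nstart = 25)

# Algorithm 2: agglomerative hierarchical clustering, cut into 2 clusters.
hc <- cutree(hclust(dist(x), method = "average"), k = 2)

# Cross-tabulate the two labelings to see how far they agree.
agreement <- table(kmeans = km$cluster, hclust = hc)
print(agreement)
```

Cluster numbers are arbitrary, so agreement shows up as one non-zero cell per row of the cross-tabulation rather than as matching labels. Changing `centers`, `nstart`, or the linkage `method` is one way to explore the parameter settings.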

Task 3: Prepare a report

Your report should contain the following:

• Theoretical Discussion: Limited to two pages, discussing the data preprocessing steps, the motivation for selecting a particular method, and how the parameters were chosen.

• Results: Include results and screenshots of the above experiments.

• Discussion and error analysis: Try to interpret the results of your model. Discuss intuitions or hypotheses that can be obtained by visual inspection of the resulting classes or clusters. Mention any assumptions, and discuss issues that might have affected the model's performance.

• References: If you use information from sources other than the R manual and official website, you should cite them.

Attachment:- Assignment.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd