NIT6160 Data Warehousing and Mining Assignment

Reference no: EM132383370

NIT6160 Assignment 2: Data Warehousing and Mining

Victoria University


The Groceries Dataset

Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction and lists the items that went into a customer's basket. That is exactly what the Groceries Data Set contains: a collection of receipts, with each line representing one receipt and the items purchased on it. Each line is called a transaction, and each column in a row represents an item.

Task 1: Data Pre-processing
Read the data into R. There are many ways to read CSV tables in R; for more details, please refer to the R Data Import/Export manual.
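Because each line of the groceries file is one receipt with a varying number of items, one possible approach is to read the file line by line and split each line into items. The sketch below assumes comma-separated items; the three sample receipts are made up and stand in for the real file, whose name is not given here.

```r
# Three made-up receipts stand in for the real groceries file (assumption:
# one receipt per line, items separated by commas).
sample_lines <- c(
  "whole milk,bread,butter",
  "bread,beer",
  "whole milk,bread"
)

# For the real data, replace the textConnection with the path to the CSV file.
con <- textConnection(sample_lines)
raw <- readLines(con)
close(con)

# Split each receipt into its items and trim stray whitespace.
transactions <- lapply(strsplit(raw, ",", fixed = TRUE), trimws)

# Basic sanity checks: number of receipts and number of items per receipt.
n_receipts <- length(transactions)
basket_sizes <- lengths(transactions)
print(n_receipts)
print(basket_sizes)
```

Keeping the receipts as a list of character vectors avoids the padding with empty cells that a rectangular table import would introduce for short receipts.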

For the clustering experiments, the column containing the class labels needs to be removed. Refer to lecture Module 10 to see how to do so.
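As a rough illustration only (not necessarily the approach shown in the lecture), the label column can be dropped by name in base R. The data frame and its column names below are hypothetical stand-ins for the mushroom data.

```r
# Toy stand-in for the mushroom data; the column names are hypothetical.
mushrooms <- data.frame(
  class     = c("edible", "poisonous", "edible", "poisonous"),
  cap_shape = c("convex", "bell", "convex", "flat"),
  odor      = c("none", "foul", "almond", "foul"),
  stringsAsFactors = TRUE
)

# Keep the labels aside so the clusters can be compared with them afterwards.
labels <- mushrooms$class

# Drop the label column by name; the remaining columns are the clustering input.
features <- mushrooms[, setdiff(names(mushrooms), "class"), drop = FALSE]
print(names(features))
```

Selecting by name rather than by position keeps the code working if the label column is not the first one in the file.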

Check whether any other pre-processing would benefit the analysis, for example replacing missing values, normalizing attribute ranges, or converting numerical or string values to nominal values.
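The three pre-processing steps named above could look roughly like this in base R. The data frame, its column names, and the choice of the most frequent value as the replacement are assumptions made for illustration.

```r
# Toy data with a missing value, a string column, and a numeric column
# (all names and values are made up for illustration).
df <- data.frame(
  stalk_root = c("bulbous", NA, "bulbous", "club"),
  cap_width  = c(2, 4, 6, 10),
  stringsAsFactors = FALSE
)

# Replace missing values in a categorical column with the most frequent value.
mode_value <- names(which.max(table(df$stalk_root)))
df$stalk_root[is.na(df$stalk_root)] <- mode_value

# Convert the string column to a nominal (factor) attribute.
df$stalk_root <- factor(df$stalk_root)

# Min-max normalization rescales a numeric attribute to the range [0, 1].
rng <- range(df$cap_width)
df$cap_width_scaled <- (df$cap_width - rng[1]) / (rng[2] - rng[1])

print(df)
```

Whether any of these steps actually helps depends on the data and on the chosen method, so each one is worth justifying in the report.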

Task 2: Data Mining
• Association Rule Mining experiments: Use R to explore association rules on the groceries dataset. Try out different algorithms. Visualize the results you find. Report any interesting association rules discovered in the experiments and explain why they are interesting.
• Classification experiments: Use R to construct classifiers on the mushroom dataset. Randomly split the data set into training and test sets (80% vs. 20%). Select at least one classifier from each of two of the following categories: tree-based models, Bayes classifiers, and rule-based classifiers. Compare the results of the chosen classifiers.
• Clustering experiments: Use R to explore clusters on the mushroom dataset. Select and compare two clustering algorithms from R (e.g. k-means vs. density-based). Use R to visually explore the resulting clusters.
• For all of the above experiments, try different parameter settings to fine-tune the outcome. In principle, select methods that work well on the given data set.
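The random 80% vs. 20% split mentioned for the classification experiments could be done as in the following sketch. The toy data frame and the seed value are assumptions; the real mushroom data would take the place of `data`.

```r
# Toy data frame standing in for the mushroom data (hypothetical columns).
set.seed(42)  # fixed seed so the random split is reproducible
n <- 50
data <- data.frame(
  class = sample(c("edible", "poisonous"), n, replace = TRUE),
  odor  = sample(c("none", "foul", "almond"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# Draw 80% of the row indices at random for training; the rest form the test set.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- data[train_idx, ]
test  <- data[-train_idx, ]

print(c(train = nrow(train), test = nrow(test)))
```

Using the same split for every classifier keeps the comparison between them fair.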

Task 3: Prepare a report
Your report should contain the following:
• Theoretical Discussion: Limited to two pages, discussing the data pre-processing steps, the motivation for selecting a particular method, and how the parameters were chosen.
• Results: Include results and screenshots of the above experiments.
• Discussion and error analysis: Try to interpret the results of your model. Discuss intuitions or hypotheses that can be obtained by visual inspection of the resulting classes or clusters. Mention any assumptions and discuss issues that might have affected the model's performance.
• References: If you use information from sources other than the R manual and the official website, you should cite them.
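For the error analysis, one common starting point is a confusion matrix computed on the test set. The sketch below uses made-up actual and predicted labels rather than the output of a real classifier.

```r
# Made-up true and predicted labels for a small test set.
actual <- factor(c("edible", "edible", "poisonous",
                   "poisonous", "edible", "poisonous"))
predicted <- factor(c("edible", "poisonous", "poisonous",
                      "poisonous", "edible", "edible"),
                    levels = levels(actual))

# Cross-tabulate the predictions against the true labels.
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy is the share of test cases on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per-class error rate: share of each actual class that was misclassified.
class_error <- 1 - diag(confusion) / colSums(confusion)

print(accuracy)
print(class_error)
```

Looking at the per-class error rates alongside the overall accuracy shows whether a classifier fails mainly on one class, which can matter more than the headline figure.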

Attachment:- Data Warehousing and Mining.rar


