Prepare a report - Discussion and error analysis

Reference no: EM132319155

Data Warehousing and Mining Assignment -

The goal of this project is to apply association rule mining, classification and clustering methods to the Mushroom and Groceries data sets. For detailed information about the Mushroom data set, refer to the Machine Learning Repository provided by the University of California, Irvine. You can download the data and read more about it there.

The groceries Dataset - Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer's basket. That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item.
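The one-receipt-per-line layout described above can be sketched as follows. This is a minimal Python illustration only (the assignment itself is carried out in R); the three sample receipts, the comma separator and the function name `parse_transactions` are assumptions, not taken from the Groceries data set.

```python
from collections import Counter

# Hypothetical sample: three receipts, one per line, items separated by commas.
raw_receipts = """milk,bread,butter
bread,beer
milk,bread"""


def parse_transactions(text, sep=","):
    """Turn each non-empty line into a list of item names (one transaction)."""
    return [
        [item.strip() for item in line.split(sep) if item.strip()]
        for line in text.splitlines()
        if line.strip()
    ]


transactions = parse_transactions(raw_receipts)

# Count how many receipts contain each item (an item counts once per receipt).
item_counts = Counter(item for t in transactions for item in set(t))
```

Keeping each transaction as a variable-length list, rather than forcing a fixed number of columns, matches the fact that receipts contain different numbers of items.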

Task 1: Data Pre-processing

Read the data into R. There are many ways to read CSV tables in R. For more details, please refer to the R Data Import/Export manual.

For the clustering experiments, the column containing the class labels needs to be removed. Refer to lecture Module 10 to see how to do so.

Check whether any other pre-processing would benefit the analysis, for example replacing missing values, normalizing attribute ranges, or converting numerical or string values to nominal values.
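Two of the pre-processing steps mentioned above, replacing missing values and normalizing attribute ranges, can be sketched as below. This is a hedged Python illustration, not the method the assignment prescribes: the `"?"` missing-value marker, the sample columns and the function names are all assumptions, and mode imputation is only one of several possible strategies.

```python
from collections import Counter


def impute_mode(values, missing="?"):
    """Replace the missing marker with the most frequent observed value.

    Assumes at least one value in the column is not missing.
    """
    observed = [v for v in values if v != missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns, not taken from either data set.
nominal_column = ["b", "?", "b", "e", "?"]
numeric_column = [2.0, 4.0, 6.0, 10.0]

nominal_filled = impute_mode(nominal_column)
numeric_scaled = min_max_normalize(numeric_column)
```

Whether either step actually helps depends on the chosen algorithm; distance-based clustering, for instance, is usually more sensitive to attribute ranges than a decision tree is.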

Task 2: Data Mining

  • Association rule mining experiments: Use R to explore association rules on the Groceries data set. Try out different algorithms. Visualize the results you find. Report any interesting association rules discovered in the experiments and explain why they are interesting.
  • Classification experiments: Use R to construct classifiers on the Mushroom data set. Randomly split the data set into training and test sets (80% vs. 20%). Select at least one classifier from each of two of the following categories of classifiers: tree-based models, Bayes classifiers, and rule-based classifiers. Compare the results of the chosen classifiers.
  • Clustering experiments: Use R to explore clusters in the Mushroom data set. Select and compare two clustering algorithms from R (e.g. k-means vs. density-based). Use R to visually explore the resulting clusters.
  • For all of the above experiments, try different parameter settings to fine-tune the outcome. In principle, select methods that work well on the given data set.
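When judging which association rules are interesting, three commonly used measures are support, confidence and lift. The sketch below computes them by hand in Python on a toy set of baskets; it illustrates the arithmetic only and is not the R workflow the assignment asks for. The baskets and the function names `support` and `rule_metrics` are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) of the rule lhs -> rhs."""
    supp_both = support(transactions, set(lhs) | set(rhs))
    supp_lhs = support(transactions, lhs)
    supp_rhs = support(transactions, rhs)
    confidence = supp_both / supp_lhs  # estimate of P(rhs | lhs)
    lift = confidence / supp_rhs       # > 1 suggests positive association
    return supp_both, confidence, lift


# Hypothetical baskets, not taken from the Groceries data set.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["bread", "beer"],
    ["milk", "butter"],
]

supp, conf, lift = rule_metrics(baskets, ["milk"], ["bread"])
```

In this toy example the rule milk -> bread has a lift below 1, so despite a fairly high confidence it would not count as an interesting rule: bread is simply common in every basket.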

Task 3: Prepare a report

Your report should contain the following:

  • Theoretical discussion: Limit this to two pages discussing the data pre-processing steps, the motivation for selecting a particular method, and how the parameters were chosen.
  • Results: Include results and screenshots of the above experiments.
  • Discussion and error analysis: Try to interpret the results of your model. Discuss intuitions or hypotheses that can be obtained by visual inspection of the resulting classes or clusters. Mention any assumptions, and discuss issues that might have affected the model's performance.
  • References: If you use information from sources other than the R manual and official website, you should cite them.


Prepare a report - Discussion and error analysis : NIT6160 Data Warehousing and Mining Assignment, Victoria University, Australia. Prepare a report - Discussion and error analysis

This project is worth 20% of the total assessment for this unit and is due on Friday of week 12 at 5 PM. Submission instructions: this section is intended for submission instructions in the learning system. Grading: theoretical discussion and data pre-processing 5%, results 10%, error analysis and references 5%, total 20%. There must be no similarity, and the code must be clear.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd