Describe how hierarchical clustering methods work

Reference no: EM132528571

Assignment 1 - Fundamentals of Data Mining

Q1. Describe how hierarchical clustering methods work.
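As a point of reference for Q1, the following is a minimal sketch of bottom-up (agglomerative) clustering with single linkage on one-dimensional points. It is an illustration only and is not the COBWEB algorithm used in Q2, which builds its hierarchy incrementally; the function name `agglomerative` and the sample points are assumptions made for this example.

```python
def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]   # start with one cluster per point
    merges = []                        # record of (left, right, distance)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters], merges


# Hypothetical sample data, not taken from the Iris file.
clusters, merges = agglomerative([1, 2, 10, 11, 20], k=3)
print(clusters)  # [[1, 2], [10, 11], [20]]
```

The recorded `merges` list is what a dendrogram would draw: each entry is one join, ordered from the tightest pair upward.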

Q2. Produce a hierarchical clustering (COBWEB) model for the Iris data. How many clusters did it produce? Why? Did you expect that outcome? Describe your reasoning.

Switch the cluster mode to "Classes to clusters evaluation". What can you conclude from the result?

Use the acuity and cutoff parameters to produce a model that clusters the major Iris types together. Which parameter values worked best? Discuss your findings and your understanding of the results.

Q3. Describe how EM clustering methods work.
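To make the idea behind Q3 concrete, here is a minimal sketch of expectation-maximization for a mixture of two one-dimensional Gaussians. It assumes exactly two components, a simple min/max initialization, and a small variance floor; none of these choices come from the assignment, and Weka's EM clusterer differs in its details (for example, it can choose the number of clusters itself).

```python
import math


def em_two_gaussians(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    mu = [min(xs), max(xs)]   # assumed initialization: extremes of the data
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [
                w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mean, variance, and weight of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-6)  # floor avoids a collapsed component
            w[k] = nk / len(xs)
    return mu, var, w


# Hypothetical data with two obvious groups.
mu, var, w = em_two_gaussians([0.8, 1.0, 1.2, 4.8, 5.0, 5.2])
print(mu)  # the two means settle close to 1.0 and 5.0
```

Unlike a hard assignment, each point keeps a fractional membership in both components; the means are responsibility-weighted averages.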

Q4. Use the EM clustering method on either the basketball or the cloud data set. How many clusters did the algorithm decide to make? Describe the model produced.

How does it compare to the COBWEB results?

If you change from "Use training set" to "Percentage split" (66% train, 34% test), how does the evaluation change? Discuss your findings and your understanding of the results with respect to the specific dataset.

Q5. Describe the association learning method. How are frequent item sets generated efficiently?
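For orientation on Q5, the sketch below shows the level-wise idea commonly associated with Apriori: candidates of size k are built only from frequent item sets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted. The function name `frequent_itemsets` and the toy transactions are assumptions for illustration, not Weka's implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all item sets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        # join step: combine frequent (k-1)-sets into size-k candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(baskets, min_support=0.5)
print(len(found))  # 5
```

Here the three-item candidate is discarded without scanning the data, because its subset {milk, butter} is already known to be infrequent.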

Q6. Use the Apriori association rule learner to find association rules in the weather.nominal data set. How many rules did it produce? How large are the item sets? What was the largest one? What happened to the number of rules when you increased or decreased the confidence level? What happens when you increase the confidence parameter to 2? Why?
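When thinking about the confidence questions in Q6, it can help to compute confidence by hand. The sketch below uses the standard definition, the fraction of transactions containing the antecedent that also contain the consequent, so the value always lies between 0 and 1. The rows are made-up examples in the style of the weather data, not the actual file contents, and the function name `confidence` is an assumption.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    covered = [t for t in transactions if antecedent <= set(t)]
    if not covered:
        return 0.0  # rule never applies
    return sum(consequent <= set(t) for t in covered) / len(covered)


# Hypothetical rows, not the real weather.nominal data.
rows = [
    {"outlook=sunny", "play=no"},
    {"outlook=sunny", "play=no"},
    {"outlook=sunny", "play=yes"},
    {"outlook=rainy", "play=yes"},
]
c = confidence(rows, {"outlook=sunny"}, {"play=no"})
print(round(c, 3))  # 0.667
```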

Q7. Use the Apriori association rule learner to find association rules in the supermarket data set. What is the size of the largest item set? What was the highest confidence level produced? How many rules have that confidence? Did you find any interesting rules?

Assignment 2 - Fundamentals of Data Mining

1. Use the Decision tree method (Classify Tab, "trees" folder, J48) to analyze the iris data (iris.arff can be found in Weka's Data folder or in Blackboard under Resources):

Give a brief description of the Decision Tree model.

Discuss what you learned about the Iris dataset from the J48 classifier.

How did the Decision tree method perform? (We will cover the evaluation techniques in more detail later in the class. You can choose any of the available options for now. However, please specify which option you used: training data set, cross-validation, or % split.)

How did the Decision tree method give you insight into your data set, its rules, and its patterns, and why?

2. Data preparation is an essential step in data mining. How the training data set is presented to a method can drastically affect the produced model's performance. Use the J48 Decision tree learning scheme to analyze the weather.numeric.arff and weather.nominal.arff data sets (both come with the Weka installation in the Weka/data folder). Make predictions for the 'temperature' attribute for both data sets.

Try to use J48 on weather.numeric.arff with no modifications to the dataset. Did you get an error? The method only works with a nominal class attribute, so apply the Discretize filter (Unsupervised - Attribute - Discretize) in the Preprocess tab before applying the learning method. Be sure to note how you discretized the dataset, and take a moment to consider why you made that choice. Did you discretize all the attributes? How many bins did you discretize each attribute into?

Analyze the output of the model that learned the discretized attribute 'temperature'. What was the performance? Can you improve it? What did the model tell you about the data? (Hint: you can modify the number of bins in the Discretize filter in an attempt to improve the model performance or mimic the nominal dataset.)

Analyze the output of the model that learned the nominal attribute 'temperature'. What was the performance? Can you improve it? What did the model tell you about the data? How do the results differ from the model produced on the discretized version of the same attribute?
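The discretization step in question 2 turns a numeric attribute into a small number of labelled ranges. One common unsupervised approach is equal-width binning, sketched below; the function name `equal_width_bins` and the temperature values are illustrative assumptions, and the exact boundaries Weka's filter produces depend on its settings.

```python
def equal_width_bins(values, bins):
    """Map each value to a bin index in 0..bins-1 using equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # the maximum value would land in bin `bins`, so clamp it to the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical temperature readings.
temps = [64, 70, 75, 85]
labels = equal_width_bins(temps, bins=3)
print(labels)  # [0, 0, 1, 2]
```

Changing `bins` changes how many class values J48 has to predict, which is one reason the number of bins affects the reported accuracy.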

3. Use the J48 Decision tree learning scheme to analyze the bolts data (bolts.arff without the TIME attribute). The dataset describes the time needed by a machine to produce and count 20 bolts. (More details can be found in the file containing the dataset; you can open the file in a text editor to read the comments.)

Why should you ignore the TIME attribute?

Analyze the model produced. Which adjustments, if you were to make any, would have the greatest effect on the time to count 20 bolts (attribute: T20Bolt)? That is, what is the most important (most selective) attribute/value pair in the tree?

According to the classifier, how would you adjust the machine (the other attributes) to get the shortest time to count 20 bolts?

Need Assignment 1 and ONLY question number 2 in Assignment 2.

Attachment:- Fundamentals of Data Mining Assignment Files.rar
