Describe how hierarchical clustering methods work

Reference no: EM132528571

Assignment 1 - Fundamentals of Data Mining

Q1. Describe how hierarchical clustering methods work.
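As a rough illustration of the bottom-up (agglomerative) idea behind Q1, the sketch below starts with every point in its own cluster and repeatedly merges the two closest clusters. It assumes one-dimensional points and single linkage (distance between the closest pair of members). The function name and the sample points are hypothetical, and this is not the COBWEB algorithm used in Q2, which builds its hierarchy incrementally.

```python
from itertools import combinations

def single_linkage(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    merges = []                       # history of (cluster_a, cluster_b, distance)

    def dist(a, b):
        # single linkage: distance between the closest pair of members
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > num_clusters:
        # pick the pair of clusters with the smallest linkage distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# hypothetical sample data
points = [1.0, 1.5, 5.0, 5.5, 12.0]
clusters, merges = single_linkage(points, 3)
print(clusters)
```

The recorded merge history is what a dendrogram would draw; stopping at a different cluster count amounts to cutting that tree at a different height.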

Q2. Produce a hierarchical clustering (COBWEB) model for Iris data. How many clusters did it produce? Why? Did you expect that outcome? Describe your reasoning.

Change the evaluation mode to "Classes to clusters evaluation". What can you conclude from it?

Use the acuity and cutoff parameters in order to produce a model that clusters major Iris types together. What values of parameters worked the best? Examine your findings/understanding of the produced results.

Q3. Describe how the EM clustering method works.
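A minimal sketch of the expectation-maximization loop behind Q3, assuming one-dimensional data modeled as a mixture of Gaussians. The E-step computes how responsible each component is for each point, and the M-step re-estimates each component from those responsibilities. The function name, starting means, and data are hypothetical, and the number of components is fixed by the starting means here, which is a simplification.

```python
import math

def em_gmm_1d(data, means, iterations=50):
    """EM for a 1-D Gaussian mixture, starting from the given initial means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k

    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    for _ in range(iterations):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [weights[j] * pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a zero-variance collapse
    return weights, means, variances

# hypothetical sample data with two visible groups
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = em_gmm_1d(data, means=[0.0, 6.0])
print(weights, means, variances)
```

Unlike a hard assignment of points to clusters, each point keeps a probability of belonging to every component, which is why the resulting model is described by weights, means, and variances rather than by membership lists.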

Q4. Use the EM clustering method on either the basketball or the cloud data set. How many clusters did the algorithm decide to make? Describe the model produced.

How does it compare to the COBWEB results?

If you change from "Use Training set" to "Percentage evaluation split - 66% train and 33% test" - how does the evaluation change? Discuss your findings/understanding of the produced results with respect to the specific dataset.

Q5. Describe the Association learning method. How are the frequent item sets created in an efficient way?
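One common way to build frequent item sets efficiently is the level-wise search used by Apriori-style learners: an item set can only be frequent if all of its subsets are frequent, so candidates of size k are generated only from frequent sets of size k-1. The sketch below illustrates that idea under simplifying assumptions (support counted as a number of transactions, everything held in memory). The function name and the basket data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search for item sets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # join step: combine frequent (k-1)-item sets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return result

# hypothetical market-basket data
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
freq = frequent_itemsets(baskets, min_support=2)
print(freq)
```

The prune step is where the savings come from: candidates with an infrequent subset are discarded without ever scanning the transactions for them.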

Q6. Use the association rule learner APRIORI to find association rules in the weather.nominal data set. How many rules did it produce? How large are the item sets? What was the largest one? What happened when you increased/decreased the confidence level? What about the number of rules? What happens when you increase the confidence parameter to 2? Why?

Q7. Use the association rule learner APRIORI to find association rules in the supermarket data set. What is the size of the largest item set? What was the highest confidence level produced? How many rules have that confidence? Did you find any interesting rules?

Assignment 2 - Fundamentals of Data Mining

1. Use the Decision tree method (Classify Tab, "trees" folder, J48) to analyze the iris data (iris.arff can be found in Weka's Data folder or in Blackboard under Resources):

Give a brief description of the Decision Tree model

Discuss what you learned about the Iris dataset from the J48 classifier.

How did the Decision tree method perform? (We will cover the evaluation techniques in more detail later in the class. You can choose any of the available options for now. However, please specify which option you chose: training data set, cross-validation, or % split.)

How did the Decision tree method give you insight into your data set, rules, and patterns, and why?

2. Data preparation is an essential step in data mining. How the training data set is presented to a method can drastically affect the produced model's performance. Use the J48 Decision tree-learning scheme to analyze the weather.numeric.arff and weather.nominal.arff data sets (both come with the Weka installation in the Weka/data folder). Make predictions for the 'temperature' attribute for both data sets.

Try to use J48 on weather.numeric.arff with no modifications to the dataset. Did you get an error? The method only works when the class attribute is nominal, so use the DiscretizeFilter (Unsupervised-Attribute-Discretize) filter, in the Preprocess tab, before applying the learning method. Be sure to note how you discretized the dataset, and take a moment to consider why you made that choice. Did you discretize all the attributes? How many bins did you discretize each attribute into?

Analyze the output of the model that learned the discretized attribute 'temperature'. What was the performance, and can you improve it? What did the model tell you about the data? (Hint: you can modify the number of bins in the discretize filter in an attempt to improve the model performance or mimic the nominal dataset.)

Analyze the output of the model that learned the nominal attribute 'temperature'. What was the performance, and can you improve it? What did the model tell you about the data? How do the results differ from the model produced on the discretized version of the same attribute?
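To make the discretization step in question 2 concrete, here is a small sketch of equal-width binning, which is one way an unsupervised discretize filter can turn a numeric attribute into a nominal one. It is an illustration only, not the filter's actual implementation. The function name and the temperature readings are hypothetical.

```python
def equal_width_bins(values, num_bins):
    """Map each numeric value to a bin index 0..num_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # the maximum value would fall just past the last interval, so clamp it
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

# hypothetical temperature readings
temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
bins = equal_width_bins(temperatures, 3)
print(bins)  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Changing the number of bins changes how many class labels the tree has to predict, which is one reason the bin count can affect the reported accuracy.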

3. Use the J48 Decision tree learning scheme to analyze the bolts data (bolts.arff without the TIME attribute). The dataset describes the time needed by a machine to produce and count 20 bolts. (More details can be found in the file containing the dataset, you can open the file using a file editor to read the comments)

Why should you ignore the TIME attribute?

Analyze the model produced. What adjustments (if you were to make any) would have the greatest effect on the time to count 20 bolts (attribute: T20Bolt) (i.e. what is the most important/selective attribute/value pair in the tree)?

According to the classifier, how would you adjust the machine (the other attributes) to get the shortest time to count 20 bolts?

Need all of Assignment 1 and ONLY question number 2 of Assignment 2.

Attachment:- Fundamentals of Data Mining Assignment Files.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd