Reference no: EM131063551
Problem 1: Download the letter recognition data from: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition
The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15. Below is the attribute information, but more information on the data and how it was used for data mining research can be found in the paper:
P. W. Frey and D. J. Slate, "Letter Recognition Using Holland-style Adaptive Classifiers," Machine Learning, Vol. 6, No. 2, March 1991.
Attribute Information:
1. lettr: capital letter (26 values from A to Z)
2. x-box: horizontal position of box (integer)
3. y-box: vertical position of box (integer)
4. width: width of box (integer)
5. high: height of box (integer)
6. onpix: total # on pixels (integer)
7. x-bar: mean x of on pixels in box (integer)
8. y-bar: mean y of on pixels in box (integer)
9. x2bar: mean x variance (integer)
10. y2bar: mean y variance (integer)
11. xybar: mean x y correlation (integer)
12. x2ybr: mean of x * x * y (integer)
13. xy2br: mean of x * y * y (integer)
14. x-ege: mean edge count left to right (integer)
15. xegvy: correlation of x-ege with y (integer)
16. y-ege: mean edge count bottom to top (integer)
17. yegvx: correlation of y-ege with x (integer)
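As a starting point, the attribute list above can be turned into a small parser. This is a minimal sketch that assumes the downloaded file is comma-separated, has no header row, and stores the letter in the first field; the function name `parse_letter_rows` and the two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Column names follow the attribute list; the first column is the class label.
COLUMNS = [
    "lettr", "x-box", "y-box", "width", "high", "onpix", "x-bar", "y-bar",
    "x2bar", "y2bar", "xybar", "x2ybr", "xy2br", "x-ege", "xegvy", "y-ege",
    "yegvx",
]


def parse_letter_rows(text):
    """Parse comma-separated rows into (label, feature_list) pairs."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        label = row[0]
        features = [int(value) for value in row[1:]]
        # The problem statement says every attribute is scaled to 0 through 15.
        if not all(0 <= value <= 15 for value in features):
            raise ValueError("feature outside the documented 0..15 range")
        samples.append((label, features))
    return samples


# Two made-up rows in the assumed layout (illustrative only, not real data).
SAMPLE = (
    "A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0\n"
    "B,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7\n"
)
samples = parse_letter_rows(SAMPLE)
print(samples[0])
```

For the real file, the same function could be fed the result of reading the downloaded data file as text, provided the layout assumption holds.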
Create a classification model for letter recognition using decision trees as a classification method with a holdout partitioning technique for splitting the data into training versus testing.
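Holdout partitioning means shuffling the records once and setting a fixed share aside for testing. The sketch below shows one way to do that with the standard library; the 70/30 ratio, the seed, and the helper name `holdout_split` are assumptions, since the assignment does not prescribe them.

```python
import random


def holdout_split(samples, test_fraction=0.3, seed=42):
    """Shuffle a copy of the samples and split once into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the partition reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in records: 20 integers instead of real (letter, features) rows.
train, test = holdout_split(range(20), test_fraction=0.3)
print(len(train), len(test))
```

Fixing the seed matters when comparing tree configurations, because every configuration should be scored on the same training and testing records.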
a. Changing the values for the depth, number of cases per parent and number of cases per leaf produces different tree configurations with different accuracies for training and testing. Choose at least five different configurations and report the accuracy for training and testing for each one of them. Which configuration will you choose as the best model? Explain your answer.
b. For the best tree configuration, report the misclassification matrix and interpret it. In your opinion, is accuracy a good way to interpret the performance of the model? If not, suggest other measures.
c. What are the three most important attributes for recognizing the letters?
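Parts a to c can be sketched with scikit-learn, assuming that tool is acceptable for the assignment. Here depth, cases per parent, and cases per leaf are mapped to `max_depth`, `min_samples_split`, and `min_samples_leaf`. The five configurations are arbitrary examples, and the bundled digits data is only a stand-in so the sketch runs without a download; it is not the letter data, and its results say nothing about the letter problem.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's digits set, NOT the letter data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# (max_depth, min_samples_split, min_samples_leaf): arbitrary example settings.
configs = [(3, 20, 10), (5, 20, 10), (10, 10, 5), (15, 4, 2), (None, 2, 1)]

results = []
for depth, min_split, min_leaf in configs:
    tree = DecisionTreeClassifier(
        max_depth=depth,
        min_samples_split=min_split,
        min_samples_leaf=min_leaf,
        random_state=0,
    ).fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    test_acc = accuracy_score(y_test, tree.predict(X_test))
    results.append((test_acc, train_acc, (depth, min_split, min_leaf), tree))
    print(f"depth={depth} parent={min_split} leaf={min_leaf} "
          f"train={train_acc:.3f} test={test_acc:.3f}")

# Part a: pick the configuration with the highest testing accuracy.
best_test, best_train, best_cfg, best_tree = max(results, key=lambda r: r[0])
print("best configuration:", best_cfg)

# Part b: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, best_tree.predict(X_test))
print(cm)

# Part c: indices of the three features with the largest impurity-based importance.
top3 = np.argsort(best_tree.feature_importances_)[::-1][:3]
print("top three feature indices:", top3.tolist())
```

A large gap between training and testing accuracy suggests overfitting, which is one argument for preferring a shallower configuration even when a deeper one scores slightly higher. Choosing the winner on the test partition also makes its reported test accuracy somewhat optimistic; a separate validation partition would avoid that.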
Problem 2: On the same data from Problem 1, apply a K-nearest neighbor classifier to classify the data. Report the following:
1. If you are doing any data transformation, explain the transformation and why it is needed.
2. Report the misclassification matrix and the appropriate performance metrics for different values of K (K=1, 3, 5, and 7).
3. Interpret the results and also compare them with the ones obtained by using the decision trees.
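The three items above can be illustrated with a small standard-library K-nearest neighbor sketch. Min-max scaling is shown as one possible transformation; because the letter attributes already share the 0 to 15 range, it may change little on this data. All function names and the toy two-feature points below are made up for illustration.

```python
from collections import Counter


def min_max_scale(train, test):
    """Rescale each feature to [0, 1] using ranges from the training rows only."""
    lows = [min(col) for col in zip(*train)]
    highs = [max(col) for col in zip(*train)]

    def scale(row):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(row, lows, highs)]

    return [scale(r) for r in train], [scale(r) for r in test]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def confusion_counts(true_labels, predicted):
    """Count (true, predicted) pairs; off-diagonal pairs are misclassifications."""
    return Counter(zip(true_labels, predicted))


def per_class_metrics(counts, labels):
    """Return {label: (precision, recall)} computed from the confusion counts."""
    metrics = {}
    for c in labels:
        tp = counts[(c, c)]
        predicted_c = sum(n for (t, p), n in counts.items() if p == c)
        actual_c = sum(n for (t, p), n in counts.items() if t == c)
        metrics[c] = (tp / predicted_c if predicted_c else 0.0,
                      tp / actual_c if actual_c else 0.0)
    return metrics


# Toy points (made up): two well-separated classes with two features each.
train_X = [[1, 1], [2, 1], [1, 2], [2, 3], [14, 15], [15, 14], [15, 15], [13, 13]]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B"]
test_X = [[2, 2], [13, 14], [3, 1], [14, 14]]
test_y = ["A", "B", "A", "B"]

scaled_train, scaled_test = min_max_scale(train_X, test_X)
accuracy_by_k = {}
metrics_by_k = {}
for k in (1, 3, 5, 7):
    predicted = [knn_predict(scaled_train, train_y, q, k) for q in scaled_test]
    counts = confusion_counts(test_y, predicted)
    correct = sum(n for (t, p), n in counts.items() if t == p)
    accuracy_by_k[k] = correct / len(test_y)
    metrics_by_k[k] = per_class_metrics(counts, ["A", "B"])
    print(k, accuracy_by_k[k], metrics_by_k[k])
```

With 26 classes, per-letter precision and recall show which letters are confused with each other, which a single accuracy figure hides. The same two helper functions could be applied to the decision tree predictions so that both classifiers are compared on identical metrics.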