Reference no: EM133370593
Standard set of instructions for each HW: in this assignment, groups will be set up for collaboration.
Make sure your group starts one thread for the collaborative problems. You are required to participate in the collaborative problem and subproblem separately. Please do not directly post a complete solution; the goal is for the group to develop a solution after everyone has participated.
Please ensure you have a write-up with solutions to each problem and subproblem. You are also required to submit code that will be compiled when grading the assignment. In each of the problems you are allowed to use built-in functions.
Problem 1
In this problem the goal is to build a set of numerical images from a set of arrays. The data set is from the Kaggle web site.
The data consists of the train.csv, test.csv, and sample_submission.csv files. In this exercise the focus will be on the train.csv data. The web site has the following data description:
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image.
Each pixel column in the training set has a name like pixel x, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = i ∗ 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixel x is located on row i and column j of a 28 × 28 matrix (indexing by zero).
For example, pixel 31 indicates the pixel that is in the fourth column from the left and the second row from the top.
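The decomposition x = i ∗ 28 + j can be checked with a few lines of Python. This is a minimal sketch; the function name `pixel_to_row_col` is hypothetical and not part of the assignment.

```python
def pixel_to_row_col(x, width=28):
    """Return the zero-indexed (row, column) of pixel x in a width x width image."""
    if not 0 <= x < width * width:
        raise ValueError(f"pixel index {x} is outside a {width} x {width} image")
    # x = i * width + j, so integer division gives the row i
    # and the remainder gives the column j.
    return divmod(x, width)


row, col = pixel_to_row_col(31)
print(row, col)  # 1 3: second row from the top, fourth column from the left
```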
The data is stored in a csv file, so each row must be reshaped into a 28 × 28 matrix representing an image. There are 42000 images in the train.csv file. For this problem it is only necessary to process approximately 100 images, 10 each of the numbers from 0 through 9. The goal is to learn how to generate features from images using transforms and first order statistics.
1. Read in and store the data in a data structure of your choice so that the data is reshaped into a matrix of size 28 × 28 which represents each digit as an image.
2. Display the images for indices 0, 1, 3, 6, 7, 8, 10, 11, 16, and 21. These indices represent the numerical values from 0 to 9.
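One possible way to approach the reshaping step is sketched below with NumPy. The helper name `rows_to_images` is hypothetical, and the rows are synthetic stand-ins for train.csv (one label followed by 784 pixel values), so the numbers themselves carry no meaning.

```python
import numpy as np


def rows_to_images(table, side=28):
    """Split an (n, 1 + side*side) array into labels and (n, side, side) images."""
    table = np.asarray(table)
    labels = table[:, 0].astype(int)
    # Row-major reshape: pixel x lands on row x // side, column x % side.
    images = table[:, 1:].reshape(-1, side, side)
    return labels, images


# Synthetic stand-in for a few rows of train.csv (assumption: label + 784 pixels).
rng = np.random.default_rng(0)
fake_rows = np.hstack([
    rng.integers(0, 10, size=(5, 1)),
    rng.integers(0, 256, size=(5, 784)),
])

labels, images = rows_to_images(fake_rows)
print(images.shape)  # (5, 28, 28)
```

With the real file, the table could be loaded with something like `np.loadtxt("train.csv", delimiter=",", skiprows=1)`, and each image could then be shown with matplotlib's `imshow(images[i], cmap="gray")`; both are suggestions, not requirements of the assignment.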
Problem 2
In this problem each image from the train.csv (42,000 images in total) is to be processed to generate a set of features using the discrete cosine transform and Eigen decomposition.
1. Take the 2-dimensional Discrete Cosine Transform (DCT) of each matrix from Problem 1; each matrix represents a number (0-9).
2. Extract the vertical, horizontal and diagonal coefficients from the transform (using the indexes indicated by the masks provided).
3. For each of the three sets of DCT coefficients perform Eigen decomposition.
4. Retain the top 20 Eigen vectors of each direction.
5. Using your top Eigen vectors reduce the DCT transformed data. This will create a new data set that represents each image as a smaller subset of values.
6. Save the new data in a file of your choice, *.txt, *.csv, etc. The name is up to you (you will use this in the subsequent question).
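The steps above can be sketched for a single direction as follows. This is only an illustration under stated assumptions: the images are random stand-ins, `example_mask` is a hypothetical placeholder (the real vertical, horizontal and diagonal masks are the ones provided with the assignment), and the DCT is an orthonormal DCT-II built by hand with NumPy rather than taken from a library.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    m = np.arange(n)[None, :]  # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)    # scale the constant row so C is orthonormal
    return C


def dct2(image):
    """2-D DCT of a square image: 1-D transform along columns, then along rows."""
    C = dct_matrix(image.shape[0])
    return C @ image @ C.T


def top_eigen_projection(features, n_components=20):
    """Project rows of `features` onto the top eigenvectors of their covariance."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]               # columns are the retained eigenvectors
    return centered @ basis, basis


# Synthetic stand-in images (assumption: real data would come from train.csv).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 28, 28)).astype(float)

# Hypothetical mask selecting 64 coefficients; replace with a provided mask.
example_mask = np.zeros((28, 28), dtype=bool)
example_mask[:8, :8] = True

coeffs = np.stack([dct2(img)[example_mask] for img in images])
reduced, basis = top_eigen_projection(coeffs, n_components=20)
print(reduced.shape)  # (200, 20)
```

Repeating the last three lines once per mask would give three reduced blocks of 20 values each, which could be concatenated and written out with, for example, `np.savetxt`.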
Problem 3
In this problem use the developed numerical features from Question 2 (if you are not able to generate the features, they are provided in the module for HW 3). In this problem the following is to be completed:
Use the Fisher's Linear Discriminant Ratio (FDR) from the Data Processing document, specifically Equation 20.
1. For each feature and combination of numbers apply the FDR, e.g., 0 vs 1, 0 vs 2, ..., 0 vs 9, ..., 7 vs 8, 7 vs 9, and 8 vs 9 (which should result in a 60 × 45 matrix, where 60 represents the number of features and 45 represents the number of pairwise comparisons).
2. Place the results in a table and provide an initial analysis of which feature provides the best class separation.
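A rough sketch of the pairwise table is given below. Equation 20 of the Data Processing document is not reproduced in this text, so the code assumes the common two-class form FDR = (mean_a - mean_b)^2 / (var_a + var_b); check it against the course definition before relying on it. The function name `fdr_table` and the synthetic features are hypothetical.

```python
from itertools import combinations

import numpy as np


def fdr_table(features, labels):
    """Return (pairs, table) where table[f, p] is the FDR of feature f for pair p.

    Assumed form: (mean_a - mean_b)**2 / (var_a + var_b), per feature.
    """
    classes = np.unique(labels).tolist()
    pairs = list(combinations(classes, 2))  # (0, 1), (0, 2), ..., (8, 9)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    variances = {c: features[labels == c].var(axis=0, ddof=1) for c in classes}
    table = np.empty((features.shape[1], len(pairs)))
    for p, (a, b) in enumerate(pairs):
        table[:, p] = (means[a] - means[b]) ** 2 / (variances[a] + variances[b])
    return pairs, table


# Synthetic stand-in: 10 classes, 30 observations each, 60 features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 30)
features = rng.normal(size=(300, 60))
features[:, 0] += 5.0 * labels  # make feature 0 clearly class-dependent

pairs, table = fdr_table(features, labels)
print(table.shape)  # (60, 45)

# One simple summary: the feature with the largest mean FDR over all pairs.
best_feature = int(np.argmax(table.mean(axis=1)))
```

Averaging over pairs is only one way to rank features; looking at the minimum FDR per feature would instead highlight the hardest pair to separate.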
Problem 4
Cross-Validation [2], [7]. This is a Collaborative Problem. Not covered in lecture notes.
In this problem you are to develop and implement a k-fold cross validation algorithm. You are allowed to use either the Iris data set or the developed numerical features from HW2 to test your implementation. In this problem the following is to be completed:
1. Develop (pseudocode) an algorithm to randomly shuffle input data. Then divide the data into groups of testing and training sets based on the number of desired folds/experiments; the term used will be k-fold cross validation. Use the 5-fold cross validation in Figure 1 as a reference.
2. Implement your k-fold cross validation algorithm.
3. Test your implementation using the numerical features generated in Question 2.
4. Perform analysis to determine if your implementation is correct. Explain your method of analysis and conclusions.
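As a starting point for discussion rather than a finished solution, the shuffle-and-split idea might look like the sketch below. It works on indices only, uses the standard library, and the name `k_fold_indices` is hypothetical. The checks at the end illustrate one way to argue correctness: every observation is tested exactly once, and no fold trains on its own test observations.

```python
import random


def k_fold_indices(n_samples, k=5, seed=None):
    """Yield (train_indices, test_indices) for each of k folds over shuffled data."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random shuffle, reproducible via seed
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


folds = list(k_fold_indices(150, k=5, seed=0))

# Sanity checks: test sets partition the data, and train/test never overlap.
all_test = sorted(i for _, test in folds for i in test)
print(all_test == list(range(150)))  # True
print(all(set(train).isdisjoint(test) for train, test in folds))  # True
```

The value 150 matches the number of observations in the Iris data set; any other sample count works the same way.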
Module 8. Note this is a Collaborative Problem - Parzen Window
Problem 5
In this problem the following is to be completed:
1. Using your 5-fold cross validation implementation from Problem 4 and the Gaussian kernel in Eq. 27 (Parzen Window) of the Machine Learning document, implement an algorithm to process training observations and compare with test observations.
2. Using all observations and the petal length from the Iris data, replicate the subfigures in Figure 2.
3. Using all observations, the petal length and the petal width from the Iris data, replicate the subfigures in Figure 3 without contour lines.
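A minimal sketch of a Gaussian-kernel Parzen window estimate follows. Eq. 27 of the Machine Learning document is not reproduced here, so the code assumes the standard isotropic form p(x) = 1 / (n (2π)^(d/2) h^d) · Σ exp(-||x - x_i||² / (2h²)); the function names, the window width h = 0.3, and the two synthetic one-dimensional classes (loosely standing in for petal length) are all assumptions. Fold splitting is left out to keep the example short.

```python
import numpy as np


def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen estimate of the density at each row of x.

    x: (m, d) query points; samples: (n, d) training observations; h: window width.
    """
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    # Squared Euclidean distance between every query point and every sample.
    sq_dist = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    norm = n * (2.0 * np.pi) ** (d / 2.0) * h ** d
    return np.exp(-sq_dist / (2.0 * h ** 2)).sum(axis=1) / norm


def parzen_classify(x, train_x, train_y, h):
    """Assign each row of x to the class with the largest prior-weighted density."""
    classes = np.unique(train_y)
    scores = np.column_stack([
        np.mean(train_y == c) * parzen_density(x, train_x[train_y == c], h)
        for c in classes
    ])
    return classes[np.argmax(scores, axis=1)]


# Synthetic 1-D stand-in for two classes (assumption, not the Iris data).
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(1.5, 0.2, size=(50, 1)),
                     rng.normal(4.5, 0.5, size=(50, 1))])
train_y = np.repeat([0, 1], 50)

# Density of class 0 on a grid; the area under it should be close to 1.
grid = np.linspace(-5.0, 10.0, 3001)[:, None]
density = parzen_density(grid, train_x[train_y == 0], h=0.3)
area = density.sum() * (grid[1, 0] - grid[0, 0])

predicted = parzen_classify(train_x, train_x, train_y, h=0.3)
accuracy = (predicted == train_y).mean()
print(round(area, 3), accuracy)
```

For the two-feature case (petal length and petal width), the same functions apply with samples of shape (n, 2) and a two-dimensional grid of query points.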
Attachment:- collaborative problems.rar