Reference no: EM132966858
Applications/Programming Problems
For this question, answer all relevant writing parts in your homework report.
Consider the case where you are given access to some data, but you discover that the data is too bulky for you to carry around. You will eventually want to use this data for some downstream task: if your data is weather statistics, you may want to use it to build a prediction model, or if it is an image, you may want to include it in a presentation. Unfortunately, you have no information about what this downstream task is; all you know is that you want to compress your data so that it fits in the memory you have available, and that you have less memory available than the data occupies right now. In this case, it is reasonable to assume you want to attempt the following:
- To the best of your ability, you want to go for lossless compression. This means that you want to eliminate all the possible redundancies in your data. For example, say that your data is a matrix with n columns in Rd, but only m < n of these columns are linearly independent. Then storing the remaining n - m columns in their entirety is redundant, and there is room for lossless compression.
- If lossless compression is not possible, you want to work toward lossy compression that incurs the least amount of data loss. This means that in the pursuit of compression, you will lose some information, but you still want to quantify how much information you lost and minimize some notion of this quantity.
We will see how Singular Value Decomposition can help us achieve either of these outcomes. Let us consider the application of compression to image data. We will say that the matrix M ∈ Rm×n represents a grayscale image, with each element of this matrix representing the grayscale value at that pixel location. Assume throughout this question that it takes 1 unit of memory to store one element of a matrix.
1. How many units of memory does it take to store the matrix M ? Your answer should be in terms of m and n.
2. Assume now that you have access to some algorithm A that provides you with a low-rank approximation of M. Essentially, given M, the algorithm A returns a matrix L and another matrix R such that M ≈ LRT where L and R are taller than they are wide. Let us label the number of columns in L and R as k. How many rows must L and R have respectively?
3. This makes L and R a new representation of our matrix M, because (an approximation of) M can be recovered from them (since M ≈ LRT). Given your answer in (2), what can you say about how many units of memory it takes to store this new representation of M? Give an inequality relating k, m and n that describes the condition that must be met for L and R to be a more memory-efficient representation of our matrix M.
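The memory comparison in (3) can be sketched numerically. This is only an illustration: the 226 × 226 size and the sample values of k below are assumptions for the example, not part of the assignment.

```python
import numpy as np

# Hypothetical image dimensions for illustration.
m, n = 226, 226

def storage_original(m, n):
    # One unit of memory per matrix element: storing M costs m*n units.
    return m * n

def storage_factored(m, n, k):
    # L is m-by-k and R is n-by-k, so the pair costs k*(m + n) units.
    return k * (m + n)

# The factored form saves memory exactly when k*(m + n) < m*n,
# i.e. when k < m*n / (m + n).
threshold = m * n / (m + n)
for k in (10, 50, 200):
    saves = storage_factored(m, n, k) < storage_original(m, n)
    print(k, saves, k < float(threshold))
```

For square matrices (m = n) the break-even point is k = n/2, so roughly "half the rank" is where the factored representation stops paying off.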
4. In parts (2) and (3), you assumed that an algorithm could magically give you the matrices L and R. However, you have already learnt one such algorithm: the Singular Value Decomposition. If the full-rank SVD of M is M = USVT, then you can simply take L = US and R = V to get the left and right matrices. In the full-rank case, our estimate LRT exactly equals the original M. How many units of memory are required to store this representation? Is it better or worse than part (1)?
To turn exact equality into a lower-rank approximation of M, we take only the first k columns of our decomposition matrices. Then we can set L = UkSk and R = Vk, where Uk ∈ Rm×k, Sk ∈ Rk×k and Vk ∈ Rn×k, and recover M̃k = LRT as an approximation of M.
Why does this make for a good approximation of our original M? And under what notion of approximation is this M̃k close to M? A common metric for measuring the distance between matrices is the Frobenius norm. The Frobenius norm is the matrix generalization of the familiar Euclidean vector norm. We can then measure the distance between two matrices of the same size using the Frobenius norm of their difference:
||M - M̃k||F = √( Σij (M - M̃k)ij² )
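As a quick check of this definition, the element-wise formula agrees with NumPy's built-in 'fro' norm. The random matrices below are just stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((5, 4))

# Frobenius distance computed directly from the definition...
dist_manual = np.sqrt(np.sum((A - B) ** 2))
# ...matches NumPy's built-in Frobenius norm.
dist_builtin = np.linalg.norm(A - B, 'fro')
print(np.isclose(dist_manual, dist_builtin))
```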
Consider M = USVT = Σi=1..r σi ui viT, where v1, v2, ..., vr are the right singular vectors of M and u1, u2, ..., ur are the left singular vectors of M, all corresponding to M's top singular values (arranged in descending order of size) σ1, σ2, ..., σr. Then M̃k = Σi=1..k σi ui viT, where k ≤ r.
5. This notation makes it explicit that M̃k has rank k. Why?
It can be shown that M̃k is the best rank-k approximation of M under the Frobenius norm. Intuitively, this can be argued as follows: the rows of M̃k are the projections of the rows of M onto the subspace spanned by the first k right singular vectors of M. Hence the rows of M̃k represent some notion of the "best-fit" k-dimensional subspace of M. Since we are working with the Frobenius norm as the metric, we can carry over our intuition for "best-fit" from vector norms. We will test this notion empirically.
We will now move onto the programming parts.
6. Download the image provided in the HW release titled camera.png, which is a 226 × 226 full-rank matrix. You should open it using the Image.open function from the Pillow library. This should give you a [226, 226, 3] array, of which you should take only the first channel, yielding a [226, 226] matrix. You can then check (approximately) that this matrix is full rank using the matrix_rank function in the numpy.linalg library.
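A sketch of the loading-and-rank-check step is below. The Pillow part is left as comments since it depends on having camera.png on disk; a random stand-in matrix is used instead, since random square matrices are full rank with probability 1.

```python
import numpy as np

# Assumed loading step (requires Pillow and camera.png from the HW release):
#   from PIL import Image
#   img = np.asarray(Image.open('camera.png'))   # shape (226, 226, 3)
#   M = img[:, :, 0].astype(float)               # keep only the first channel

# Random stand-in so this sketch runs on its own.
rng = np.random.default_rng(2)
M = rng.standard_normal((226, 226))

# matrix_rank counts singular values above a numerical tolerance,
# which is why the full-rank check is only "approximate".
print(np.linalg.matrix_rank(M))
```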
(a) Study the documentation for the numpy.linalg.svd function carefully, and make sure you understand how it returns the singular values and the left and right singular vectors. Using this function, calculate the SVD of M and recover S, U and V. Reorder S, U and V in decreasing order of the size of the singular values (this should be easy in NumPy). Make a plot where the horizontal axis is the index of the singular value and the vertical axis is the nth largest singular value in S. Precisely, your horizontal axis should range from 1 to 226, and the corresponding points on the vertical axis should run from the largest down to the smallest singular value. Include this plot in your report, titled Singular, and retain the matrices S, U and V in your code for further parts.
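The SVD-and-reorder step might look like the following sketch (random stand-in matrix; plotting with matplotlib is indicated in a comment rather than executed). numpy.linalg.svd already returns singular values in descending order, but enforcing the ordering explicitly is cheap and makes the requirement visible in the code.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((226, 226))   # stand-in for the image matrix

U, s, Vt = np.linalg.svd(M)

# Explicitly sort by decreasing singular value, permuting U's columns
# and Vt's rows to match.
order = np.argsort(s)[::-1]
s, U, Vt = s[order], U[:, order], Vt[order, :]

print(np.all(np.diff(s) <= 0))   # non-increasing sequence

# For the "Singular" plot (assumes matplotlib is installed):
#   import matplotlib.pyplot as plt
#   plt.plot(np.arange(1, 227), s); plt.show()
```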
(b) Use the method outlined above to get Sk, Uk and Vk for four different values of k ≤ 226. Compute L and R using these matrices and visualize your corresponding M̃k. Include all your images in the report, labelled with the corresponding values of k, and reflect on what you observe. Remember that if you use k = 226, computing LRT should recover the original image exactly (barring numerical issues).
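The reconstruction loop for several values of k can be sketched as follows (random stand-in matrix; the particular k values are arbitrary examples). The Frobenius error shrinks as k grows and vanishes at full rank, matching the claim above.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((226, 226))
U, s, Vt = np.linalg.svd(M)

def rank_k_approx(U, s, Vt, k):
    # M~k = U_k S_k V_k^T, built from the top-k singular triples.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

errors = {k: np.linalg.norm(M - rank_k_approx(U, s, Vt, k), 'fro')
          for k in (5, 25, 100, 226)}

print(errors[226] < 1e-6)                     # full rank recovers M exactly
print(errors[5] > errors[25] > errors[100])   # error shrinks as k grows
```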
(c) For a particular value of k, define Ik as the weight of the first k singular values relative to the total sum of all singular values. Therefore
Ik = (Σi=1..k σi) / (Σi=1..226 σi)
This is a representation of the fraction of the "information" about M carried by the first k singular values. Make another plot with the same horizontal axis as (a), ranging from k = 1 to k = 226, and vertical axis Ik. Include this plot in your report, titled Information.
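All 226 values of Ik can be computed at once with a cumulative sum, as in this sketch (random stand-in matrix again):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((226, 226))

# compute_uv=False returns only the singular values, in descending order.
s = np.linalg.svd(M, compute_uv=False)

# I_k for every k simultaneously: running total over the grand total.
I = np.cumsum(s) / np.sum(s)

print(np.isclose(I[-1], 1.0))   # all 226 values carry 100% of the weight
print(np.all(np.diff(I) > 0))   # I_k strictly increases while sigma_i > 0
```

For the "Information" plot, the horizontal axis is `np.arange(1, 227)` and the vertical axis is `I`.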
(d) Finally, make a third plot with the same horizontal axis as (a) and (c) ranging from k = 1 to k = 226 and the vertical axis referring to the memory usage in units of memory using your formula in part (3). Include this plot in your report, titled Memory.
(e) Using your graphs in parts (a), (c) and (d), your visual intuition from part (b), and your inequality from (3), pick a value of k that is suitable for storing the given image within your memory constraints. Justify why this would make a good value for k. Include the resulting M̃k in your report, titled Image of rank k.