Variable Importance
Compared to many other machine learning models, a logistic model can be easily interpreted. The sign of a coefficient tells us whether the corresponding input variable has a positive or negative effect on the prediction of the output class. The magnitude of a coefficient is related to the importance of the variable for the prediction. However, wᵢ > wⱼ (using the previous notation, wᵢ is the i-th component of w) does not simply imply that the i-th input variable is more important than the j-th input variable. Obviously, this also depends on the scaling of the input variables. For example, the importance of an input variable measuring a distance in the physical world should be independent of whether the associated unit is meters or millimeters.
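To make the scaling issue concrete, here is a minimal sketch (the synthetic data and the use of scikit-learn are illustrative assumptions, not taken from the notebook): the same distance expressed in millimeters instead of meters gets a coefficient roughly 1000 times smaller, even though the underlying relationship is unchanged.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_m = rng.normal(size=(500, 1))              # distance measured in meters
y = (X_m[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
X_mm = X_m * 1000.0                          # the same distance in millimeters

# C is set very large to approximate an unregularized fit.
w_m = LogisticRegression(C=1e10, max_iter=1000).fit(X_m, y).coef_[0, 0]
w_mm = LogisticRegression(C=1e10, max_iter=1000).fit(X_mm, y).coef_[0, 0]
print(w_m, w_mm, w_m / w_mm)                 # w_mm is roughly w_m / 1000

So raw coefficient magnitudes are not comparable across differently scaled inputs, which is exactly why the section turns to the z-statistic next.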
The notebook Variable importance using logistic regression.ipynb demonstrates how to analyze the importance of the variables in a logistic regression model. Please have a close look. (Note that for a non-linear model the importances and their ranking may be different.) As argued above, we do not consider the magnitude of a coefficient directly, but the corresponding z-statistic, which in our case is the coefficient divided by its standard error. The z-statistic of a coefficient is invariant under linear rescaling of the corresponding input variable.
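As a sketch of the computation (using statsmodels is an assumption here; the notebook may obtain the standard errors differently), the z-statistic of each coefficient is its estimate divided by its standard error:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[:, 1] *= 1000.0                            # second input on a much larger scale
y = (X[:, 0] + X[:, 1] / 1000.0 + rng.normal(size=500) > 0).astype(int)

result = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(result.params / result.bse)            # z-statistics: coefficient / standard error
print(result.tvalues)                        # the same values, reported directly

Despite the two inputs living on very different scales, their z-statistics come out comparable, in line with the invariance claim above.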
In the example in the notebook, we have to deal with a categorical variable. A categorical variable takes values that correspond to a particular category (class, concept, object), for example {Orange, Apple, Banana}, and these categories are not necessarily ordered in a meaningful way. Such a variable needs to be encoded before a (generic) machine learning system can process the data. Simply encoding {Orange, Apple, Banana} by {0, 1, 2} and treating the variable as measured on an interval scale (i.e., treating the categories as numbers) does not make sense: a banana is not two times an apple.
You already heard about the most popular encoding for output categorical variables, the one-hot encoding. A one-hot encoding of {Orange, Apple, Banana} is {(1,0,0)ᵀ, (0,1,0)ᵀ, (0,0,1)ᵀ}, that is, C = 3 classes are encoded by C (output) variables. In the notebook, however, C - 1 ("dummy") variables are used for the categorical input variable.
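A small sketch of the difference (assuming pandas; the fruit example just mirrors the one above):

import pandas as pd

fruit = pd.Series(["Orange", "Apple", "Banana", "Apple"])
print(pd.get_dummies(fruit))                  # one-hot: C = 3 indicator columns
print(pd.get_dummies(fruit, drop_first=True)) # dummy: C - 1 = 2 columns

With drop_first=True, the dropped category becomes the baseline that is implied when all remaining dummy columns are zero.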
In this question of the assignment, you should concisely explain why C - 1 variables are used instead of the one-hot encoding. Your submission should include answers to the following questions: How many solutions (i.e., optimal values for the coefficients) would the linear regression optimization problem (without regularization) have if the one-hot encoding was used? Why? Why would it be difficult to interpret the variable importance if the one-hot encoding was used?

Recall that when you add independent normally distributed random variables, their variances add up, that is, if x₁ ∼ N(0, σ₁²) and x₂ ∼ N(0, σ₂²) are independent, then x₁ + x₂ ∼ N(0, σ₁² + σ₂²). And remember that if x ∼ N(0, 1), then σx ∼ N(0, σ²).
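A quick numerical check of both facts (a sketch, not part of the required submission):

import numpy as np

rng = np.random.default_rng(0)
s1, s2 = 2.0, 3.0
x1 = rng.normal(0.0, s1, size=1_000_000)
x2 = rng.normal(0.0, s2, size=1_000_000)
print(np.var(x1 + x2), s1**2 + s2**2)        # both close to 13
print(np.var(s1 * rng.normal(size=1_000_000)), s1**2)  # both close to 4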
You can compute the covariance in several ways. One basic way to derive it is the following. Consider the standard normally distributed random vector x̂ ∼ N(0, I) and its transformation x = Ax̂. We have x ∼ N(0, AAᵀ). For the exercise, find A, and then the off-diagonal elements of AAᵀ give you the covariance between x₁ and x₂.
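A sketch of this derivation in code (the matrix A below is an arbitrary illustrative choice, not the A you need to find for the exercise):

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [0.5, 1.0]])                   # illustrative A, not the exercise's
x_hat = rng.normal(size=(2, 1_000_000))      # columns are samples of x̂ ~ N(0, I)
x = A @ x_hat                                # x ~ N(0, A Aᵀ)
print(A @ A.T)                               # analytic covariance matrix of x
print(np.cov(x))                             # empirical estimate, close to A @ A.T

The off-diagonal entries of both printed matrices agree, which is exactly the covariance between x₁ and x₂ that the exercise asks for.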