Compute the covariance in several ways

Assignment Help Computer Engineering
Reference no: EM133240965

Variable Importance

Compared to many other machine learning models, a logistic model can be easily interpreted. The sign of a coefficient tells us whether the corresponding input variable has positive or negative effect on the prediction of the output class. The amount of a coefficient is related to the importance of a variable for the prediction. However, wi > wj (using the previous notation, wi is the i-th component of w) does not simply imply that the i-th input variable is more important than the j-th input variable. Obviously, this also depends on the scaling of the input variables. For example, the importance of an input variable measuring a distance in the physical world should be independent of whether the associated unit is meters or millimeters.

The notebook Variable importance using logistic regression.ipynb demonstrates how to analyze the importance of the variables in a logistic regression model. Please have a close look. (Note that for a non-linear model the importances and their ranking may be different.) As argued above, we do not consider the amount of a coefficient directly, but the corresponding z-statistic, which in our case is the coefficient over its standard error. The z-statistic of a coefficient is invariant under linearly rescaling of the corresponding input variable.

In the example in the notebook, we have to deal with a categorical variable. A categorical variable takes values that correspond to a particular category (class, concept, object), for example {Orange, Apple, Banana}, and these categories are not necessarily ordered in a meaningful way. Such a variable needs to be encoded before a (generic) machine learning system processes the data. Simply encoding {Orange,Apple,Banana} by {0,1,2} and treating the variable as measured on an interval scale (i.e., treating the categories as numbers), does not make sense - a banana is not two times an apple.

You already heard about the most popular encoding for output categorical vari- ables, the one-hot encoding. A one-hot encoding of {Orange, Apple, Banana} is {(1,0,0)T,(0,1,0)T,(0,0,1)T}, that is C = 3 classes are encode by C (output) variables. In the notebook, however, C - 1 ("dummy") variables are used for the categorical input variable.

In this questions of the assignment, you should concisely explain why C - 1 vari- ables are used instead of the one-hot encoding. Your submission should include answers to the following questions: How many solutions (i.e., optimal values for the coefficients) would the linear regression optimization problem (without reg- ularization) have if the one-hot encoding was used? Why? Why would it be difficult to interpret the variable importance if the one-hot encoding was used?Recall that when you add normally distributed random variables their variances add up, that is, if x1 ∼ N (0, σ12) and x2 ∼ N (0, σ2), then x1 +x2 ∼ N(0,σ12 +σ2). And remember if x ∼ N(0,1) then σx ∼ N(0,σ2).

You can compute the covariance in several ways. One basic way to derive it is the following. Consider the standard normally distributed random vector xˆ ∼ N(0,I) and its transformation x = Axˆ. We have x ∼ N(0,AAT). For the exercise, find A, and then the off-diagonal elements of AAT give you the covariance between x1 and x2.

Reference no: EM133240965

Questions Cloud

Cause dysfunction and result in disease : Evaluate two factors or actions that promote our health status. Evaluate two factors or actions that can cause dysfunction and result in disease.
What is the economic cost of running business : You resigned from your $100,000 per year job to open up your own business. To run your business, you used office space that you had inherited from your family
How to define and call a php function : BCS 350 Farmingdale State College How to define and call a PHP function and How to iterate a PHP numeric array and a PHP associative array? Show us with example
What happens to ms rem monthly purchase : Suppose that Ms. Rem makes a required monthly 12-bottle purchase from her favorite wine club, which offers both high price (let's assume higher-quality) wine, a
Compute the covariance in several ways : Compute the covariance in several ways. One basic way to derive it is the following. Consider the standard normally distributed random vector
What went wrong with the past team-building efforts : What went wrong with the past team-building efforts? How can action learning be used to help promote teamwork in this organization
Find the core dump file in your current directory : COMPSCI 70 University of California, Write C++ (or C) program that will crush (i.e. abort abnormally) with the most famous Unix/Linux error message
Consumer surplus of ev customers change : Consider the following depiction of the market for electric vehicles (EVs) in California, suggesting that 20,000 EVs will be sold at a price of $40,000.
What is fiat money : Please define the following, (a) What is money? (b) What is fiat money and how is if different from commodity money? (c) What are the five standards of money?

Reviews

Write a Review

Computer Engineering Questions & Answers

  Mathematics in computing

Binary search tree, and postorder and preorder traversal Determine the shortest path in Graph

  Ict governance

ICT is defined as the term of Information and communication technologies, it is diverse set of technical tools and resources used by the government agencies to communicate and produce, circulate, store, and manage all information.

  Implementation of memory management

Assignment covers the following eight topics and explore the implementation of memory management, processes and threads.

  Realize business and organizational data storage

Realize business and organizational data storage and fast access times are much more important than they have ever been. Compare and contrast magnetic tapes, magnetic disks, optical discs

  What is the protocol overhead

What are the advantages of using a compiled language over an interpreted one? Under what circumstances would you select to use an interpreted language?

  Implementation of memory management

Paper describes about memory management. How memory is used in executing programs and its critical support for applications.

  Define open and closed loop control systems

Define open and closed loop cotrol systems.Explain difference between time varying and time invariant control system wth suitable example.

  Prepare a proposal to deploy windows server

Prepare a proposal to deploy Windows Server onto an existing network based on the provided scenario.

  Security policy document project

Analyze security requirements and develop a security policy

  Write a procedure that produces independent stack objects

Write a procedure (make-stack) that produces independent stack objects, using a message-passing style, e.g.

  Define a suitable functional unit

Define a suitable functional unit for a comparative study between two different types of paint.

  Calculate yield to maturity and bond prices

Calculate yield to maturity (YTM) and bond prices

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd