Solution-Regression modeling, Developing a regression model

Regression modeling

Assignment Help Microeconomics

Reference no: EM13214

Here is a procedure you can use when developing a regression model.

Regression Methodology

1) Collect data. This will be the most time consuming part of your job. Be sure to consider dummy variables and time. Be selective (Remember garbage in = garbage out).

Sources:

Interviews, focus groups

Market Studies

Market Experiments (can be very valuable)

Gvt. Data

Industry trade association data

2) Check for problems with covariance - different explanatory variables may be correlated and fighting each other in the model. For example - GDP vs. housing starts vs. Income. Pick the best of all variables that give you the same information. You can check covariance on Excel.

3) Regress your data first as a linear model

4) Check t-ratios and p-values of your coefficients. Make sure t-ratios are at least 1.75-2.0 (choice is up to you, decide on your desired confidence level). P-values tell you exactly what confidence level is.

5) Check R² . This tells you percent of variability you have explained with your model. If it is very low - try adding more explanatory variables and/or a transformation. For multi-variable can also use corrected R²

6) Check p-value on f for regression. This tells you confidence that your model is statistically valid. Note the standard error - the smaller this is the less error you will have in your forecast. SE =standard deviation of the sample mean.

7) Weed out bad variables and develop a model.

8) Try a transformation and see if your diagnostics improve. (A detailed residual analysis is necessary to completely analyze your model - refer to a stats book for this. Diagnostic Plots like standardized residuals vs fitted values, nscores. Check for autocorrelation for time series. Check for heteroscedasticity (variable variance of residuals) - can often reduce with proper transformation. Check normal probability plots. Check outliers. We will not go into this level of detail in this course)

CAVEATS:

1) R² will increase as you add more explanatory variables - this is not necessarily a better model. You are not trying to maximize R². You are trying to develop a simple accurate model that predicts demand.

2) collecting more bad data will only improve your t-ratios

3) avoid collecting too many explanatory variables - they might mask the effect of the important ones

Sample Regression Model

Here is a detailed example of the development of a regression model

You open a jewelry store in Naperville. You make most of your money from diamond sales. You want to better understand and forecast your diamond demand:

Data collection: What variables do you think might influence your demand for diamonds? You develop the following list:

income
price
wealth
GDP
stock market
price of diamonds at WalMart
demographics -age
wedding stats
time
advertising
season (high in holiday time)

You opened your store on Jan 1, 2002 and have gathered the following data.

carats/mo	$/carat	$/carat	$/mo	Naperville	$/yr	$B
Demand	Price	Pwalmart	adv $	Ave Age	Income	GDP	Dow Jones	month
47	1500	1200	200	50	35000	1155	11000	Jan
51	1500	1000	200	49	35350	1260	12000	Feb
40	1500	800	250	48.02	35703.5	1207.5	11500	Mar
121	900	1500	225	47.0596	36060.54	1102.5	10500	Apr
127	900	1600	180	46.11841	36421.14	1080.45	10290	May
121	900	1400	120	45.19604	36785.35	1058.841	10084.2	Jun
117	900	1000	500	44.29212	37153.21	1037.664	9882.516	Jul
73	1200	650	126	43.40628	37524.74	1016.911	9684.866	Aug
89	1200	890	600	42.53815	37899.98	996.5727	9491.168	Sep
77	1200	658	140	41.68739	38278.98	976.6412	9301.345	Oct
92	1200	986	220	40.85364	38661.77	957.1084	9115.318	Nov
92	1200	659	120	40.03657	39048.39	937.9662	8933.012	Dec
62	1500	888	50	39.23584	39438.88	919.2069	8754.352	Jan
77	1500	1020	125	38.45112	39833.26	900.8228	8579.264	Feb
55	1500	325	562	37.6821	40231.6	882.8063	8407.679	Mar
88	1500	1500	300	36.92846	40633.91	865.1502	8239.526	Apr
131	1100	1500	251	36.18989	41040.25	847.8472	8074.735	May
112	1100	800	401	35.46609	41450.66	830.8902	7913.24	Jun
115	1100	800	200	34.75677	41865.16	814.2724	7754.976	Jul
131	1100	1350	150	34.06163	42283.81	797.987	7599.876	Aug
140	1100	1350	687	33.3804	42706.65	782.0272	7447.879	Sep
119	1100	800	200	32.71279	43133.72	766.3867	7298.921	Oct
128	1100	825	500	32.05853	43565.06	751.059	7152.943	Nov
148	950	789	140	31.41736	44000.71	736.0378	7009.884	Dec
156	950	1200	263	30.78902	44440.71	721.317	6869.686	Jan
166	950	1200	241	30.17324	44885.12	706.8907	6732.292	Feb
175	950	1600	523	29.56977	45333.97	692.7529	6597.646	Mar
146	950	600	526	28.97838	45787.31	678.8978	6465.694	Apr
126	1200	600	641	28.39881	46245.18	665.3199	6336.38	May
94	1500	600	320	27.83083	46707.64	652.0135	6209.652	Jun
118	1300	600	265	27.27422	47174.71	638.9732	6085.459	Jul
122	1250	600	145	26.72873	47646.46	626.1937	5963.75	Aug
131	1250	700	285	26.19416	48122.92	613.6699	5844.475	Sep
138	1200	700	521	25.67027	48604.15	601.3965	5727.585	Oct

You select the following variables:

Price

Pwm

adv $

Dec

Feb

mos in business

GDP

Dow Jones

Ave Age

Income

This is your regression output:

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.999662
R Square	0.999324
Adjusted R Square	0.99903
Standard Error	1.080827
Observations	34

ANOVA
	df	SS	MS	F
Regression	10	39723.51	3972.351	3400.44
Residual	23	26.86831	1.168188
Total	33	39750.38

	Coefficients	Standard Error	t Stat	P-value	Upper 95.0%
Intercept	218.594	279.748	0.781396	0.442539	797.2961
Price	-0.09904	0.001088	-91.0638	6.28E-31	-0.09679
Pwm	0.030112	0.000706	42.67671	2.1E-23	0.031571
adv $	0.01156	0.001222	9.457305	2.17E-09	0.014089
Dec	9.288621	0.887907	10.46125	3.24E-10	11.12539
Feb	8.619078	0.763213	11.29314	7.35E-11	10.1979
mos in business	5.951601	8.343354	0.713334	0.482815	23.21112
GDP	-0.35999	12383.11	-2.9E-05	0.999977	25616.01
Dow Jones	0.03772	1300.226	2.9E-05	0.999977	2689.757
Ave Age	2.211563	3.854305	0.57379	0.571679	10.18479
Income	-0.00506	0.013574	-0.37298	0.712582	0.023018

What do you think??

GDP, Dow Jones and Income look insignificant, but is that reasonable?

The problem is that they are fighting each other. They each have similar information. You only need one of these variables. You need to think about which makes the most sense or test and see which one works best alone in model.

Can check covariance (variables that are correlated with each other). This is undesirable. Each explanatory variable should give you new information.

In excel, it's also under tools, data analysis, covariance. Highlight data you want to check.

Covariance table:

	Price	Pwm	adv $	Dec	Dow Jones	Ave Age	Income
Price	43194
Pwm	-26060	117538
adv $	-4557	-6724	30249
Dec	-6	-14	-10	0
Feb	12	10	-10	0
mos in business	-263	-1076	541	0
GDP	6630	17638	-9701	-1	305052		-707947
Dow Jones	63141	167984	-92392	-14	2905255
Ave Age	212	769	-397	0	12203	52
Income	-102177	-449231	222270	5	-6742349	-28900	16335056

The higher the number, the greater the covariance. Income and Dow Jones have strong correlation. As do Income and GDP and GDP and Dow Jones. We just need one of these variables.

Let's keep only income as this is logical. Re-run the regression:

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.999662
R Square	0.999324
Adjusted R Square	0.999108
Standard Error	1.036802
Observations	34

ANOVA
	df	SS	MS	F
Regression	8	39723.51	4965.439	4619.188
Residual	25	26.87398	1.074959
Total	33	39750.38

	Coefficients	Standard Error	t Stat	P-value	Upper 95.0%
Intercept	214.2703	261.6642	0.818875	0.420595	753.1774
Price	-0.09905	0.001025	-96.6054	1.08E-33	-0.09694
Pwm	0.030119	0.000669	45.02481	1.87E-25	0.031497
adv $	0.011552	0.001167	9.900353	3.92E-10	0.013955
Dec	9.285746	0.85082	10.91388	5.34E-11	11.03804
Feb	8.598333	0.674088	12.75551	1.92E-12	9.986641
mos in business	5.805573	7.746753	0.74942	0.460598	21.7603
Ave Age	2.122573	3.488305	0.608483	0.548357	9.306867
Income	-0.00483	0.012633	-0.38262	0.705234	0.021185

Income still looks bad - check covariance again.

	Price	Pwm	adv $	Dec	Income
Price	43194.2
Pwm	-26060	117537.7
adv $	-4556.53	-6723.71	30249.34
Dec	-6.40138	-13.9689	-9.96021	0.055363
Feb	11.72145	9.870242	-9.76384	-0.00519
mos in business	-262.5	-1075.62	541.0441	0.029412
Ave Age	212.2758	769.0929	-397.053	-0.04747	-28899.6
Income	-102177	-449231	222270	4.870607	16335056

Income is varying along with advertising and the negative of price and age (should investigate this).

Remove it:

Remove income
SUMMARY OUTPUT

Regression Statistics
Multiple R	0.99966
R Square	0.99932
Adjusted R Square	0.999137
Standard Error	1.019641	0.01945
Observations	34

ANOVA
	df	SS	MS	F
Regression	7	39723.35	5674.764	5458.249
Residual	26	27.03136	1.039668
Total	33	39750.38

	Coefficients	Standard Error	t Stat	P-value	Upper 95.0%
Intercept	114.302	14.08797	8.113449	1.36E-08	143.2603
Price	-0.09905	0.001008	-98.2357	5.91E-35	-0.09697
Pwm	0.030188	0.000634	47.64864	7.83E-27	0.031491
adv $	0.011618	0.001135	10.23668	1.3E-10	0.013951
Dec	9.328693	0.829424	11.2472	1.74E-11	11.0336
Feb	8.649533	0.649737	13.31237	4.06E-13	9.985087
mos in business	2.842606	0.208486	13.63453	2.35E-13	3.271155
Ave Age	0.792431	0.283334	2.796804	0.009581	1.374834

Looks good. You might question age - as it doesn't make sense. Might investigate replacing age with income. This is the art. We decide to keep the model as is. This is our demand equation:

Demand = 114.302-0.09905P+0.030188Pwm+0.011618Adv +9.329*Dec+ 8.65*Feb+2.84*Mos+0.79*Age

The above model looks good, but perhaps a transformation would be even better. Let's try log-linear. I take logs of all my original data and place in a new table.

ln(Demand)	ln(Price)	ln(Pwm)	ln(mos)	ln(age)	ln(Inc)	ln(adv)	Dec	Feb
3.85	7.31	7.09	0.00	3.91	10.46	5.30	0	0
3.93	7.31	6.91	0.69	3.89	10.47	5.30	0	1
3.69	7.31	6.68	1.10	3.87	10.48	5.52	0	0
4.80	6.80	7.31	1.39	3.85	10.49	5.42	0	0
4.84	6.80	7.38	1.61	3.83	10.50	5.19	0	0
4.80	6.80	7.24	1.79	3.81	10.51	4.79	0	0
4.76	6.80	6.91	1.95	3.79	10.52	6.21	0	0
4.29	7.09	6.48	2.08	3.77	10.53	4.84	0	0
4.49	7.09	6.79	2.20	3.75	10.54	6.40	0	0
4.34	7.09	6.49	2.30	3.73	10.55	4.94	0	0
4.52	7.09	6.89	2.40	3.71	10.56	5.39	0	0
4.52	7.09	6.49	2.48	3.69	10.57	4.79	1	0
4.13	7.31	6.79	2.56	3.67	10.58	3.91	0	0
4.34	7.31	6.93	2.64	3.65	10.59	4.83	0	1
4.01	7.31	5.78	2.71	3.63	10.60	6.33	0	0
4.48	7.31	7.31	2.77	3.61	10.61	5.70	0	0
4.88	7.00	7.31	2.83	3.59	10.62	5.53	0	0
4.72	7.00	6.68	2.89	3.57	10.63	5.99	0	0
4.74	7.00	6.68	2.94	3.55	10.64	5.30	0	0
4.88	7.00	7.21	3.00	3.53	10.65	5.01	0	0
4.94	7.00	7.21	3.04	3.51	10.66	6.53	0	0
4.78	7.00	6.68	3.09	3.49	10.67	5.30	0	0
4.85	7.00	6.72	3.14	3.47	10.68	6.21	0	0
5.00	6.86	6.67	3.18	3.45	10.69	4.94	1	0
5.05	6.86	7.09	3.22	3.43	10.70	5.57	0	0
5.11	6.86	7.09	3.26	3.41	10.71	5.48	0	1
5.16	6.86	7.38	3.30	3.39	10.72	6.26	0	0
4.98	6.86	6.40	3.33	3.37	10.73	6.27	0	0
4.84	7.09	6.40	3.37	3.35	10.74	6.46	0	0
4.54	7.31	6.40	3.40	3.33	10.75	5.77	0	0
4.77	7.17	6.40	3.43	3.31	10.76	5.58	0	0
4.80	7.13	6.40	3.47	3.29	10.77	4.98	0	0
4.88	7.13	6.55	3.50	3.27	10.78	5.65	0	0
4.93	7.09	6.55	3.53	3.25	10.79	6.26	0	0

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.986941
R Square	0.974052
Adjusted R Square	0.965749
Standard Error	0.069079
Observations	34

ANOVA
	df	SS	MS
Regression	8	4.478337	0.559792
Residual	25	0.119299	0.004772
Total	33	4.597636

	Coefficients	Standard Error	t Stat	Lower 95.0%	Upper 95.0%
Intercept	27.29298	0	65535	27.29298	27.29298
ln(Price)	-1.21469	0.085059	-14.2805	-1.38987	-1.0395
ln(Pwm)	0.258325	0.040836	6.325837	0.17422	0.342429
ln(adv)	0.023881	0.022677	1.053086	-0.02282	0.070585
Dec	0.060605	0.055786	1.086398	-0.05429	0.175498
Feb	0.01679	0.044114	0.380596	-0.07406	0.107644
mos in business	0.115075	0.036956	3.113827	0.038963	0.191188
Ave Age	-1.30596	0	65535	-1.30596	-1.30596
Income	-1.09152	0	65535	-1.09152	-1.09152

Remove income
SUMMARY OUTPUT

Regression Statistics
Multiple R	0.986941
R Square	0.974052
Adjusted R Square	0.967066
Standard Error	0.067738
Observations	34

ANOVA
	df	SS	MS
Regression	7	4.478337	0.639762
Residual	26	0.119299	0.004588
Total	33	4.597636

	Coefficients	Standard Error	t Stat	Lower 95.0%	Upper 95.0%
Intercept	13.76921	1.097963	12.54069	11.51231	16.0261
ln(Price)	-1.21469	0.083407	-14.5633	-1.38613	-1.04324
ln(Pwm)	0.258325	0.040043	6.451113	0.176014	0.340635
ln(adv)	0.023881	0.022237	1.073942	-0.02183	0.069589
Dec	0.060605	0.054702	1.107913	-0.05184	0.173047
Feb	0.01679	0.043257	0.388134	-0.07213	0.105707
mos in business	0.115075	0.036239	3.175493	0.040586	0.189565
Ave Age	-0.76836	0.155205	-4.95063	-1.08739	-0.44933

remove Dec, Feb and Adv

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.971709
R Square	0.944219
Adjusted R Square	0.938641
Standard Error	0.092459
Observations	34

ANOVA
	df	SS	MS
Regression	3	4.341174	1.447058
Residual	30	0.256462	0.008549
Total	33	4.597636

	Coefficients	Standard Error	t Stat	Lower 95.0%	Upper 95.0%
Intercept	10.67027	0.984992	10.83285	8.65865	12.68189
ln(Price)	-1.18737	0.105218	-11.2848	-1.40226	-0.97249
ln(Pwm)	0.237193	0.051394	4.615168	0.132232	0.342154
mos in business	0.282874	0.020884	13.54498	0.240223	0.325524

No improvement - in fact I had to remove variables that seem intuitive. I will stick with my linear model (I reality I might also investigate squaring some of the variables).

I will stick with my first linear demand model.

Demand = 114.302-0.09905P+0.030188Pwm+0.011618Adv +9.329*Dec+ 8.65*Feb+2.84*Mos+0.79*Age

Now we can forecast our sales. These are the types of questions I can answer with my model.

What type of demand loss should I expect if I raise my prices?
What level of demand increase should I anticipate if I spend an additional $100 of advertising?
How will my demand changes as incomes rise? As my customers age?
If I want to open a similar store in Westmont and forecast the following:

Price at Walmart = $1200/carat

Age = 45

Advertising = $500/month

What should expect to sell in the first 3 months of the year (Jan-Mar) if I open the store in Jan?

Let's answer this last one:

I use the demand equation for forecast sales in each month. Check this yourself on excel and see if you can replicate these results.

		Jan	Feb	Mar	Total	Ave
Pwm =	1200	95.6766	107.1666	101.3566	304.1998	101.3999
Age=	45
Adv =	500
Price	1000

Reference no: EM13214

Questions Cloud

Commanding heights project : The purpose of this project is to promote better understanding of globalization, world trade and economic development, including the forces, values, events etc.

Ict governance : ICT is defined as the term of Information and communication technologies, it is diverse set of technical tools and resources used by the government agencies to communicate and produce, circulate, store, and manage all information.

Replacement analysis : This document show the Replacement Analysis of modling machine. Is replacement give profit to company or not?

Characterization technology for nanomaterials : Calculate the reciprocal lattice of the body-centred cubic and Show that the reciprocal of the face-centred cubic (fcc) structure is itself a bcc structure.

Regression modeling : Developing a regression model with Sample Regression Model

Linear demand function : Demand estimation and forecasting and income elasticity of demand

Information technology and the changing fabric : Illustrations of concepts from organizational structure, organizational power and politics and organizational culture.

Concepts from organizational structure : Illustrations of concepts from organizational structure, organizational power and politics and organizational culture.

Linear programming problem : Linear programming is a mathematical technique used to determine the optimal solutions to certain specific problems.

User Account

All Pages

Regression modeling

Reference no: EM13214

Reference no: EM13214

Questions Cloud

Reviews

Write a Review

Microeconomics Questions & Answers

The free rider problem

Failure of the super committee is good thing for economy

Case study analysis about optimum resource allocation

Fixed cost and vairiable cost

Problem - total cost, average cost, marginal cost

Oligopoly and demand curve problem

Impact of external costs on resource allocation

Shifts in demand and movements along the demand curve

Article review question

Long-term growth, international trade & globalization

European monetary union (emu) in crisis

Development game “settlers of catan”

Assured A++ Grade

Academics

Major Subjects

Majors

Get In Touch

TERMS & POLICIES

HELP & SUPPORT