Estimate the required regression models

Reference no: EM133689713

Introductory Econometrics

Data Analysis Project

Introduction - One of the hypotheses that have been widely discussed in the environmental economics literature is the Environmental Kuznets Curve (EKC). It states that the relationship between a country's national income and the extent of environmental degradation has an inverted U-shape. That is, environmental degradation increases with national income at a diminishing rate and starts decreasing once national income rises beyond a certain level. In this project, we will test the EKC hypothesis empirically using data from the World Bank.
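
One common way to make the inverted U-shape concrete is a quadratic in income. The block below is a sketch under that assumed functional form; the symbols E (environmental degradation), Y (national income), u (error term), and the betas are introduced here for illustration only.

```latex
\[
E_i = \beta_0 + \beta_1 Y_i + \beta_2 Y_i^2 + u_i,
\qquad \beta_1 > 0,\quad \beta_2 < 0
\]
\[
\frac{\partial E_i}{\partial Y_i} = \beta_1 + 2\beta_2 Y_i = 0
\quad\Longrightarrow\quad
Y^{*} = -\frac{\beta_1}{2\beta_2}
\]
```

With these signs, degradation rises with income at a diminishing rate up to Y* and falls beyond it, which is the pattern the hypothesis describes.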

Data collection
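To do this project, you need to download the following data from the World Bank's (WB's) World Development Indicators (WDI) website:
CO2 emissions (metric tons per capita)
GDP per capita (constant 2015 US$)
Population density (people per sq. km of land area)
Urban population (% of total population)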

Please follow the steps below to download these data from the WB's website:
Expand the "Country" tab on the left-hand side of the website and choose all countries. To do this, select "Countries" out of the three options, then select all countries by ticking the box on the next line. You should see that you have selected 217 countries. (see Image 1 at the end of this document)
Expand the "Series" tab and search for the required data series by the WB indicator name or data code listed above. Go through the search results and tick the box next to each intended variable (pay attention to the unit of measurement as well). (see Image 2)
Move to the "Time" tab and select "2020" by ticking the box next to it. (see Image 3)
Click "Apply Changes" on the right-hand side of the website. (see Image 3)
Under "Download options," choose "Advanced Options". (see Image 3)
In the popup window, select "Names only" within "Variable format:" option. (see Image 4)
Click "Download" and save the file to your local drive.

Data Cleaning/Formatting
Before analyzing the data in R/RStudio, you need to follow several steps to clean and rearrange them.
First, open the data downloaded from WDI in Excel. Field/variable names appear on row 1 of the worksheet "Data" and the actual data are stored on rows 2-869. Scrolling down the sheet, you will find the following text on rows 873 and 874:
Data from database: World Development

Please delete these two lines and save the Excel file under the same name. (see Image 5)
Next, we need to convert the data from long form (data on 4 variables from 217 countries are stacked vertically in one column) into wide form (data are stored as a table in which the first column holds the country name and each subsequent column holds the data on one variable). There are many ways to perform this transformation, but one possible (probably the easiest) way is to execute the following in R/RStudio:

dat = readxl::read_excel("[path]/Data_Extract_From_World_Development_Indicators.xlsx", sheet = "Data")
datw = tidyr::spread(dat, "Series Name", "2020")

We are familiar with the first line, which reads the specified Excel spreadsheet into the workspace (you might need to adjust the file path). The second command converts the data from long form into wide form and saves the result as "datw." The new data matrix "datw" should contain the country name in the first column and the data on the four variables in columns 2-5. (Please check this in R/RStudio.)
We also want to shorten the variable names so that they are easier to handle. We can try:
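
To see what the long-to-wide step does, here is a minimal sketch on a made-up data set. It assumes the tidyr package is installed; the country names and values are invented, and the column names simply mimic the layout described above. Note that tidyr documents spread() as superseded by pivot_wider(), although spread() still works.

```r
library(tidyr)

# Invented long-form data: one row per country-series pair,
# mirroring the three columns of the WDI download.
long <- data.frame(
  "Country Name" = c("Aland", "Aland", "Borduria", "Borduria"),
  "Series Name"  = c("CO2", "GDP", "CO2", "GDP"),
  "2020"         = c("1.5", "1000", "..", "2500"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One row per country, one column per series.
wide <- spread(long, "Series Name", "2020")
print(wide)
```

Each distinct value of "Series Name" becomes its own column, so four series would give the five-column layout described above.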

datw = dplyr::rename(datw,
CO2 = "CO2 emissions (metric tons per capita)", GDPpc = "GDP per capita (constant 2015 US$)",
PopDen = "Population density (people per sq. km of land area)", UrbPop = "Urban population (% of total population)")

Now, the data matrix "datw" contains the country name in the first column and the data on four variables (CO2, GDPpc, PopDen, and UrbPop) in columns 2-5.
Next, we want to convert missing values from ".." into "NA" and eliminate them from the dataset. This can be done by:

datw[datw==".."] = NA
datw = na.omit(datw)

The first line changes ".." into "NA" (the default value for missing observations in R). The second line eliminates these missing observations from "datw."
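
The effect of these two lines can be checked on a small made-up example. The sketch below uses base R only; the data frame "toy" and its values are invented for illustration and are not WDI data.

```r
# Invented wide-form rows; ".." marks a missing value, as in the WDI file.
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  CO2     = c("1.5", "..", "4.2", "0.3"),
  GDPpc   = c("1000", "2500", "..", "800"),
  stringsAsFactors = FALSE
)

toy[toy == ".."] <- NA      # recode the missing-value marker as NA
toy_clean <- na.omit(toy)   # drop every row that contains an NA

# Number of countries lost because at least one variable was missing.
n_dropped <- nrow(toy) - nrow(toy_clean)
print(toy_clean)
print(n_dropped)
```

Because na.omit() removes a whole row when any single variable is missing, it is worth comparing the row counts before and after on the real data as well.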
Finally, we need to change the data type from character to numerical for the four variables. This can be done by:

class(datw$CO2) = "double"
class(datw$GDPpc) = "double"
class(datw$PopDen) = "double"
class(datw$UrbPop) = "double"
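
An alternative way to write the same conversion, shown here only as a sketch on invented data, is to apply as.numeric() to the relevant columns in one step and then confirm the result. The object names "toy" and "num_cols" are hypothetical.

```r
# Invented rows in which every column is still stored as character.
toy <- data.frame(
  Country = c("A", "D"),
  CO2     = c("1.5", "0.3"),
  GDPpc   = c("1000", "800"),
  stringsAsFactors = FALSE
)

# Convert only the data columns; the country name stays as character.
num_cols <- c("CO2", "GDPpc")
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

# Inspect the storage class of every column after the conversion.
col_classes <- sapply(toy, class)
print(col_classes)
```

Running the conversion after the ".." markers have been removed matters, since as.numeric() would otherwise turn them into NA with a warning.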

Now, we are ready to analyze the data.

Data Analysis
Analyze the WDI data using R/RStudio and answer the following 11 questions.

1. Create a new variable "CO2k" by converting the data on CO2 emissions from metric tons per capita into kilograms (kg) per capita (by multiplying the original data "CO2" by 1,000). Then, create a scatter plot of CO2 emissions per capita in kg (vertical axis) against per capita GDP (horizontal axis). Please label each axis clearly.

2. Under the assumption that CO2k (CO2 emissions per capita in kg) is distributed independently and identically in the population, construct a 90% confidence interval of the population mean of CO2 emissions per capita (in kg) manually (that is, using the sample mean, sample variance, and the appropriate critical values obtained from either R or statistical tables). Interpret the calculated confidence interval.

3. Estimate a multiple regression model with CO2 emissions per capita (in kg) as the dependent variable, and GDP per capita, GDP per capita squared, population density, and the share of population living in urban areas as explanatory variables. Write down the estimated sample regression equation.

4. For the regression model estimated in Question 3, interpret the reported R-squared as well as the standard error of the regression. Briefly comment on the model's goodness of fit to the observed data.

5. For the regression model estimated in Question 3, provide interpretations of the estimated coefficients for PopDen and UrbPop.

6. For the regression model estimated in Question 3, test whether the true population coefficient for PopDen is negative at a 10% test size, using a critical value approach. State clearly the null and alternative hypotheses.

7. For the regression model estimated in Question 3, construct a 99% confidence interval of the true population coefficient for UrbPop manually (that is, using the estimated coefficient, standard error, and the appropriate critical values obtained from either R or statistical tables). Interpret the obtained confidence interval.

8. Using the regression model estimated in Question 3, calculate the predicted values of CO2k for the range of GDP observed in the sample (in increments of 1,000) whilst keeping the values of PopDen and UrbPop at their respective sample means. Create a two-dimensional diagram in which the predicted values of CO2k (vertical axis) are plotted against GDP (horizontal axis). Briefly describe the relationship between CO2 emissions per capita and GDP per capita as implied by the estimated regression model. Does this have the shape you expected? Explain why or why not.

9. Based on the model estimated in Question 3, find the level of GDP per capita where the effect of GDP per capita on CO2 emissions changes its sign. Briefly comment on how this relates to your answer to Question 8 above.

10. Following the prompts provided below, test the joint hypothesis that the true population coefficients for PopDen and UrbPop are both equal to zero at a 5% significance level.

(i) Formulate the null and alternative hypotheses.

(ii) Write down the regression model(s) that need to be estimated to test the hypotheses formulated in (i).

(iii) Estimate the required regression model(s) and calculate the necessary test statistics.

(iv) Obtain the relevant critical value(s) and determine whether or not to reject the null hypothesis.
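
The mechanics of the joint-hypothesis steps (restricted and unrestricted regressions, F statistic, critical value) can be sketched on simulated data. Everything below is an illustration under stated assumptions: the data are randomly generated, the coefficients used to generate them are made up, and the object names (sim, unres, res, and so on) are hypothetical. It is not a result for the WDI sample.

```r
# Simulated data only; the generating coefficients are invented.
set.seed(1)
n      <- 150
GDPpc  <- runif(n, 500, 60000)
PopDen <- runif(n, 5, 500)
UrbPop <- runif(n, 20, 95)
CO2k   <- 500 + 0.45 * GDPpc - 5e-06 * GDPpc^2 + 20 * UrbPop +
          rnorm(n, sd = 1500)
sim    <- data.frame(CO2k, GDPpc, PopDen, UrbPop)

# Unrestricted model: all four regressors.
unres <- lm(CO2k ~ GDPpc + I(GDPpc^2) + PopDen + UrbPop, data = sim)
# Restricted model: imposes the null that the PopDen and UrbPop
# coefficients are both zero by leaving the two variables out.
res   <- lm(CO2k ~ GDPpc + I(GDPpc^2), data = sim)

ssr_u <- sum(resid(unres)^2)   # unrestricted sum of squared residuals
ssr_r <- sum(resid(res)^2)     # restricted sum of squared residuals
q     <- 2                     # number of restrictions under the null
df_u  <- df.residual(unres)    # n minus number of estimated coefficients

# F statistic built from the two sums of squared residuals.
F_stat <- ((ssr_r - ssr_u) / q) / (ssr_u / df_u)
# Critical value for a test at the 5% significance level.
F_crit <- qf(0.95, df1 = q, df2 = df_u)
reject <- F_stat > F_crit

# Cross-check against R's built-in comparison of nested models.
F_anova <- anova(res, unres)$F[2]
```

The manual statistic and the one reported by anova() should agree, which is a convenient check that the restricted model was specified correctly.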

Further Instructions

This is an individual project, not a group project. You are required to work and compose your report individually.
You need to conduct all data analysis and compose your report using RMarkdown. A starter RMarkdown file "ECOM5000_pj_2024_Starter.rmd" will help you with the initial setup and with loading the data. You need to complete the rest of the code to perform the necessary data analysis.
You need to submit two files through Blackboard:
(i) A complete RMarkdown file (".rmd" file), and
(ii) A report (in pdf format) created by knitting your markdown file (i). It should contain your R code for data analysis, the output from the data analysis, and your text-based answers.

Estimate the required regression models : ECOM5000 Introductory Econometrics, Curtin University - Estimate the required regression models and calculate the necessary test statistics
All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd