Applied Econometrics Assignment
- Subject Code: MGTA604
Instructions
For this assignment, you are required to complete this RMarkdown file and submit it to the dropbox on the MGTA604 MyLearningSpace page prior to the deadline of 4pm on December 14th.
- Please note that this assignment is to be done independently. You cannot discuss the specifics with your peers or any other individuals, directly or via other means (e.g. discussion groups).
- You can discuss general methods and techniques.
- If you have any questions about what you can or cannot discuss with peers, the instructor is happy to answer them.
- Note: The instructor is a resource and you are welcome to ask questions. Questions that can be answered will be posted to the course news as an FAQ.
- Note: You are welcome to borrow code from this course and the internet. If you borrow any significant blocks of code, you must cite the website or course.
- Your code should be readable:
- It should be easy for someone new to take a quick look at your code and understand how it works.
- Don't be afraid to break your code into more lines (define variables for intermediate steps).
- Include comments to briefly explain what a line of code is doing (added to your code as # comment).
- Your code should be runnable.
- I should be able to run the RMarkdown file with the provided .csv data files and get results consistent with the report.
- Your writing should be clear and well organized.
- Write your answers in the space provided and indicated with
- Ensure that you write your answers in a professional manner.
- Be precise and concise.
- Ensure that you hand in the compiled html file as well as the RMD file.
Ensure that you answer each question.
The Case
As a developer of an app for displaying future gasoline prices to the residents of Houston, you are interested in acquiring a deep understanding of the dynamics underlying local gasoline prices.
The Dataset
Your dataset consists of daily mean gasoline prices for ten cities in the US and Canada for the period 2015-2018. The data also indicates whether or not a hurricane made landfall within the state on that particular day. Note that this data cannot be shared in raw format beyond the students in this class.
In addition to this data, there is a separate data file which provides information on refiner production, oil prices and other potentially informative variables. This data was sourced from EIA.gov.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# load the gasoline price, EIA market and hurricane data
dta_gas <- read.csv("a2_gas_prices.csv")
dta_prod <- read.csv("eia_gas_prod.csv")
dta_oil <- read.csv("eia_oil_price.csv")
dta_cons <- read.csv("eia_gas_consumed.csv")
dta_hurr <- read.csv("hurricanes.csv")
Additional functions
Preliminary Question 1.
Please add a brief comment to the addLags function wherever a # ... placeholder is present, informing a reader of the code what each line accomplishes.
# error functions
MSE <- function(errs) { return(mean(errs*errs)) }
MAE <- function(errs) { return(mean(abs(errs))) }
# add lags to the dataset dta
# var is the column/variable for which lags should be added
# lags is a list of lag lengths
# e.g. c(1,3,4) will add lags of length 1, 3 and 4
#
# NOTE: addLags assumes a univariate dataset which is preordered by time.
# If you wish to apply this to a dataset containing multiple different objects
# you will need to apply this function once for each object and use rbind to
# append the results.
#
addLags <- function(dta,var,lags)
{
  ldta <- dta # ...
  # ...
  for (l in lags)
  {
    ldta <- cbind(ldta,lag(dta[var],l)) # ...
    names(ldta)[ncol(ldta)] <- paste(var,"_l",as.character(l),sep="") # ...
  }
  # ...
  return(ldta)
}
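# add a single column <var>_lsum to the dataset dta holding the sum of the
# requested lags of var (lags that are not yet available count as 0)
sumLags <- function(dta,var,lags)
{
  ldta <- dta # ...
  lcol <- paste(var,"_lsum",sep="")
  ldta[,lcol] <- 0
  # ...
  for (l in lags)
  {
    new_lag <- lag(dta[var],l)
    new_lag[is.na(new_lag)] <- 0
    ldta[,lcol] <- ldta[,lcol]+new_lag
  }
  # ...
  return(ldta)
}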
# out of sample validation (rolling horizon)
# f is the formula
# y_name is the dependent variable in dta
# dta is the dataframe of observations
# start - finish is the range of observations in dta which indicate the current horizon
# the function fits with data up to the horizon and evaluates with the next
# observation after the horizon
#
tsValidate <- function(f, y_name, dta, start, finish, errf=MSE)
{
  errs <- start:finish*0 # init a new vector to store errors
  e <- 1
  # repeat the fit/eval cycle for each observation in start-finish
  for (s in start:finish)
  {
    fit_dta <- dta[1:s,]
    fit <- lm(f, data=fit_dta)
    yhat <- predict(fit, dta[s+1,])
    err <- yhat-dta[s+1,y_name]
    errs[e] <- err
    e <- e+1
  }
  # apply the error function and return the value
  return(errf(errs))
}
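The rolling-horizon logic described in the comments above can be tried on simulated data before applying it to the real series. The sketch below is an illustration under stated assumptions, not the course's reference implementation: the `toy` series, the `price_l1` column and the helper `rolling_errors` are hypothetical names, and the one-period lag is built with base R rather than with addLags so that the block runs without any packages.

```r
# Toy illustration only: the series below is simulated, not the assignment data.
set.seed(1)
n <- 60
toy <- data.frame(t = 1:n, price = 2 + cumsum(rnorm(n, sd = 0.05)))

# Base-R equivalent of a one-period lag: shift the series down by one row.
toy$price_l1 <- c(NA, head(toy$price, -1))

# Error functions, written as in this document.
MSE <- function(errs) mean(errs * errs)
MAE <- function(errs) mean(abs(errs))

# Rolling horizon: fit on rows 1..s, forecast row s+1, for s = start..finish.
# (hypothetical helper that returns the raw errors instead of a summary)
rolling_errors <- function(f, y_name, dta, start, finish) {
  errs <- numeric(finish - start + 1)
  for (i in seq_along(errs)) {
    s <- start + i - 1
    fit <- lm(f, data = dta[1:s, ])                # rows with NA lags are dropped by lm
    yhat <- predict(fit, newdata = dta[s + 1, ])   # one-step-ahead forecast
    errs[i] <- yhat - dta[s + 1, y_name]
  }
  errs
}

# finish can be at most nrow(toy) - 1, because row s + 1 must exist.
errs <- rolling_errors(price ~ price_l1, "price", toy, start = 40, finish = n - 1)
c(MSE = MSE(errs), MAE = MAE(errs))
```

The same upper bound on `finish` applies to tsValidate, since it also evaluates on row `s+1`.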
Preprocessing
We now create a single dataset for the oil and gas market EIA data.
dta_xvars <- merge(dta_prod, dta_oil, by="date")
dta_xvars <- merge(dta_xvars, dta_cons, by="date")
Add the hurricane data to the gasoline data.
dta_gas_full <- merge(dta_gas, dta_hurr, by=c("Date","RegionID"),all.x=TRUE)
dta_gas_full[is.na(dta_gas_full$hurricane),"hurricane"] <- 0
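To see what the left merge and the NA fill above do, here is a small sketch on made-up rows. The key columns `Date`, `RegionID` and `hurricane` follow the merge call above; the `price` column and all values are hypothetical and only serve the illustration.

```r
# Hypothetical toy rows; only the key column names are taken from the merge above.
gas <- data.frame(Date = c("2017-08-24", "2017-08-25", "2017-08-25"),
                  RegionID = c(1, 1, 2),
                  price = c(2.10, 2.15, 2.30))
hurr <- data.frame(Date = "2017-08-25", RegionID = 1, hurricane = 1)

# all.x = TRUE keeps every gasoline row, even when no hurricane row matches.
full <- merge(gas, hurr, by = c("Date", "RegionID"), all.x = TRUE)

# Unmatched rows come back with hurricane = NA; treat them as "no hurricane".
full[is.na(full$hurricane), "hurricane"] <- 0
full
```

Without `all.x = TRUE` the merge would keep only the matched row, which would silently discard every day without a hurricane.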
Problem 1: Timeseries modeling and forecasting
Using linear regression, you will forecast the next-period Houston gasoline price.
Forecasting Question 1 - Variable selection
Using the provided data, identify and justify four variables which you believe will be effective forecasters for the Houston gasoline price. (2 paragraphs of 4-6 sentences)
answer here
Forecasting Question 2 - Creating the dataset
Create a single dataframe for the analysis containing the dependent variable, these 4 forecasting variables and any other variables you would like to include in your forecasting models.
## insert code here
# dta_houston <- ...
Forecasting Question 3
Provide visualizations for two of the forecasting variables which will be useful to the reader. Describe in words the visualization and what the reader should notice. (2-4 sentences).
Visualization 1
# code for visualization 1
description
Visualization 2
# code for visualization 2
description
Forecasting Question 4 - Explanatory analysis
Fit a linear model using all forecasting variables. Assess the coefficients using robust standard errors and evaluate the results from an explanatory perspective. (3-5 sentences)
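As a reminder of what "robust" means here, the sketch below computes heteroskedasticity-consistent (HC0, White) standard errors by hand on simulated data. It is a minimal illustration, not a prescribed method: the variables `x` and `y` are simulated, and in practice a package such as sandwich (its `vcovHC` function) is the usual route rather than hand-rolled matrix algebra.

```r
# Simulated data with heteroskedastic noise (not the assignment data).
set.seed(2)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.5 + abs(x))
fit <- lm(y ~ x)

X <- model.matrix(fit)       # n x k design matrix
e <- residuals(fit)          # OLS residuals

bread <- solve(crossprod(X)) # (X'X)^{-1}
meat <- crossprod(X * e)     # X' diag(e^2) X: each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread
robust_se <- sqrt(diag(vcov_hc0))

# Compare classical and robust standard errors side by side.
cbind(estimate = coef(fit),
      classical_se = sqrt(diag(vcov(fit))),
      robust_se = robust_se)
```

The coefficient estimates are unchanged; only the standard errors, and therefore the t-statistics and p-values, differ between the two columns.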
# code for fitting and evaluating the full model
evaluation
Forecasting Question 5 - Forecasting analysis
Fit three forecasting models which use different combinations of variables. In model 1 use the full specification. In model 2 remove one or more of what you think are the strongest predictors and in model 3 remove one or more of what you think are the weakest predictors. Evaluate the out-of-sample (testing) error for these models with both the MSE and MAE. Display the results using an attractive bar chart.
# code for fitting and evaluating forecasting models
description of different specifications
evaluation
Forecasting Question 6 - Forecasting analysis interpretation
Describe the results from Question 5, and evaluate whether the best forecaster is suitable in the following context: You want to use your forecaster to drive an app called Fill Up Now Or Finish Your BBQ, which will provide recommendations to the Houston public about whether they should fill up their cars now or wait until tomorrow. (4 sentences)
Forecasting Question 7 - Stationarity of the process
Evaluate the presence of stochastic trends in the Houston gasoline price. Use an appropriate version of the ADF test. Discuss whether the process is stationary and its implications for your forecasting. (3-4 sentences)
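For reference, the most general form of the ADF regression can be written as below. The symbols are notation chosen here for illustration: y_t is the Houston price, p is the number of lagged differences, and the drift term α and trend term βt are the parts that are included or dropped depending on which version of the test is appropriate.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \quad \text{vs.} \quad H_1:\ \gamma < 0
```

Under the null hypothesis the series has a unit root (a stochastic trend), and the t-statistic on γ is compared against Dickey-Fuller critical values rather than the usual Student t critical values.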
# code for adf
discussion
Problem 2: Natural experiment and panel data
The second problem is that you are worried that hurricanes and other large events can cause havoc with your forecasts. Observing that there have been a handful of hurricanes affecting different regions in the Gulf states over the horizon of your data, you would like to use these events to study the following research question: Do hurricanes result in higher gasoline prices?
Natural experiment Question 1 - Quasi-experimental design
For the above research question, describe a natural experiment that can be evaluated with the provided data. The design should be a difference-in-differences design and should include a description of the treatment and how the control group and treatment group would be defined (be sure to justify your decisions). (4-6 sentences)
description
Natural experiment Question 2 - Panel data methods
As discussed in the panel data class, panel data methods using a treatment variable and entity/time fixed effects can be used to implement a difference-in-differences design. Describe the treatment variable for your design and prepare the data for the linear model (note that in addition to defining the treatment variable, you should make sure that time and entity are factor variables).
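The mechanics of a treatment variable combined with entity and time fixed effects can be seen on a small simulated panel. Everything in this sketch is hypothetical: the regions, the treatment timing, the effect size of 0.15 and the column names `region`, `day`, `treat` and `price` are assumptions made for the illustration and say nothing about how treatment should be defined for the hurricane data.

```r
# Simulated toy panel (hypothetical regions and days, not the assignment data).
set.seed(3)
panel <- expand.grid(day = 1:20, region = c("A", "B", "C", "D"))

# Assumption for illustration: regions A and B are treated from day 11 onward.
panel$treat <- as.integer(panel$region %in% c("A", "B") & panel$day >= 11)

# Region-specific price levels, a common time trend and a true effect of 0.15.
region_effect <- unname(c(A = 2.0, B = 2.2, C = 2.1, D = 2.4)[as.character(panel$region)])
panel$price <- region_effect + 0.01 * panel$day + 0.15 * panel$treat +
  rnorm(nrow(panel), sd = 0.02)

# Entity and time must be factors so lm() estimates one fixed effect per level.
panel$region <- factor(panel$region)
panel$day <- factor(panel$day)

# Two-way fixed effects regression; the coefficient on treat is the DiD estimate.
fit <- lm(price ~ treat + region + day, data = panel)
coef(fit)["treat"]
```

If `day` were left numeric, lm() would fit a single linear time trend instead of one fixed effect per day, which is a different model.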
description
# define panel dataframe
# dta_panel <-
Natural experiment Question 3 - Analysis
Create, run and evaluate the results for the linear model. Remember to use clustered standard errors. Evaluate the coefficient estimates and hypothesis tests. (2-4 sentences)
# run the analysis
# fit <- lm(formula, data=dta_panel)
evaluate results
Natural experiment Question 4 - Interpretation
Interpret the results from the perspective of the Fill Up Now Or Finish Your BBQ app designer. (3-5 sentences)
Interpretation