Applied Econometrics Assignment

Added on: 2022-12-12 05:05:05
For this assignment, you are required to complete this RMarkdown file and submit it to the dropbox on the MGTA604 MyLearningSpace page prior to the deadline of 4pm on December 14th.

  • Please note that this assignment is to be done independently. You cannot discuss the specifics with your peers or any other individuals directly or via other means (e.g. discussion groups).
    • You can discuss general methods and techniques.
    • If you have any questions about what you can or cannot discuss with peers, the instructor is happy to answer them.
    • Note: The instructor is a resource and you are welcome to ask questions. Questions that can be answered will be posted to the course news as an FAQ.
    • Note: You are welcome to borrow code from this course and the internet. If you borrow any significant blocks of code, you must cite the website or course.
  • Your code should be readable:
    • It should be easy for someone new to take a quick look at your code and understand how it works.
    • Don’t be afraid to break your code into more lines (define variables for intermediate steps).
    • Include comments that briefly explain what each line of code is doing (added to your code in the form # comment).
  • Your code should be runnable.
    • I should be able to run the RMarkdown file with the provided .csv data files and get results consistent with the report.
  • Your writing should be clear and well organized.
    • Write your answers in the space provided and indicated with ‘…’
    • Ensure that you write your answers in a professional manner.
    • Be precise and concise.
    • Ensure that you hand in the compiled HTML file as well as the .Rmd file.

Ensure that you answer each question.

The Case

As a developer of an app for displaying future gasoline prices to the residents of Houston, you are interested in acquiring a deep understanding of the dynamics underlying local gasoline prices.

The Dataset

Your dataset consists of daily mean gasoline prices for ten cities in the US and Canada for the period 2015 to 2018. The data also indicates whether a hurricane made landfall within the state on that particular day. Note that this data cannot be shared in raw format beyond the students in this class.

In addition to this data, there is a separate data file which provides information on refiner production, oil prices and other potentially informative variables. This data was sourced from EIA.gov.

library(dplyr)  # attaches dplyr; its lag() masks stats::lag and is used by the helper functions below

# load the gasoline, EIA and hurricane data files
dta_gas <- read.csv("a2_gas_prices.csv")
dta_prod <- read.csv("eia_gas_prod.csv")
dta_oil <- read.csv("eia_oil_price.csv")
dta_cons <- read.csv("eia_gas_consumed.csv")
dta_hurr <- read.csv("hurricanes.csv")

Additional functions

Preliminary Question 1.

Please add a brief comment to the addLags function wherever … is present, telling a reader of the code what each line accomplishes.

# error functions
MSE <- function(errs) {  return(mean(errs*errs))  }
MAE <- function(errs) {  return(mean(abs(errs)))  }
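
As a quick check of what the two error functions report, here is a minimal sketch that applies them to three made-up forecast errors. The error values are purely illustrative and are not taken from the assignment data.

```r
# Error summaries, defined as in the assignment's helper functions
MSE <- function(errs) mean(errs * errs)
MAE <- function(errs) mean(abs(errs))

errs <- c(1, -2, 3)   # made-up forecast errors (forecast minus actual)
mse <- MSE(errs)      # (1 + 4 + 9) / 3
mae <- MAE(errs)      # (1 + 2 + 3) / 3
print(c(MSE = mse, MAE = mae))
```

Because the MSE squares each error, the single error of size 3 contributes most of its value, while the MAE weights all three errors in proportion to their size.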

# add lags to the dataset dta
# var is the column/variable for which lags should be added
# lags is a vector of lag lengths
# e.g. c(1,3,4) will add lags of length 1, 3 and 4
# NOTE: addLags assumes a univariate dataset which is preordered by time.
# If you wish to apply this to a dataset containing multiple different objects
# you will need to apply this function once for each object and use rbind to 
# append the results.
addLags <- function(dta,var,lags) {
  ldta <- dta     # ...
  # ...
  for (l in lags) {
    ldta <- cbind(ldta,lag(dta[var],l))    # ...
    names(ldta)[ncol(ldta)] <- paste(var,"_l",as.character(l),sep="")    # ...
  }
  # ...
  return(ldta)
}
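
To make the idea of a lagged column concrete, the following base-R sketch builds lag columns by hand on a toy series. The data frame `toy` and the helper `shift_down` are hypothetical stand-ins introduced only for this illustration (the assignment code itself relies on `lag()` from dplyr); the column names follow the `var_l<k>` pattern used by addLags.

```r
# Hypothetical toy series, already ordered by time
toy <- data.frame(day = 1:5, price = c(2.10, 2.15, 2.12, 2.20, 2.25))

# Base-R stand-in for a lag: shift a vector down by k rows, padding with NA
shift_down <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

toy$price_l1 <- shift_down(toy$price, 1)   # price one day earlier
toy$price_l3 <- shift_down(toy$price, 3)   # price three days earlier
print(toy)
```

The first k rows of a lag-k column are NA because no earlier observation exists for them, which is why lagging must be done separately for each city before stacking the results.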

sumLags <- function(dta,var,lags) {
  ldta <- dta     # ...
  lcol <- paste(var,"_lsum",sep="")
  ldta[,lcol] <- 0 
  # ...
  for (l in lags) {
    new_lag <- lag(dta[var],l)
    new_lag[is.na(new_lag)] <- 0
    ldta[,lcol] <- ldta[,lcol]+new_lag
  }
  # ...
  return(ldta)
}

# out of sample validation (rolling horizon)
# f is the formula
# y_name is the dependent variable in dta
# dta is the dataframe of observations
# start - finish is the range of observations in dta which indicate the current horizon
# the function fits with data up to the horizon and evaluates with the next
# observation after the horizon
tsValidate <- function(f, y_name, dta, start, finish, errf=MSE) {
  errs <- numeric(finish-start+1) # init a new vector to store errors
  e <- 1
  # repeat the fit/eval cycle for each observation in start-finish
  for (s in start:finish) {
    fit_dta <- dta[1:s,]
    fit <- lm(f, data=fit_dta)
    yhat <- predict(fit, dta[s+1,])
    err <- yhat-dta[s+1,y_name]
    errs[e] <- err
    e <- e+1
  }
  # apply the error function and return the value
  return(errf(errs))
}
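
The rolling-horizon idea can be sketched on simulated data as follows. This is an illustration only: the data frame `toy`, its columns `x` and `y`, and the horizon 40 to 59 are assumptions made for the example, not part of the assignment data.

```r
set.seed(1)
n <- 60
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + rnorm(n, sd = 0.5)    # simulated relationship

start <- 40
finish <- 59                                   # must be at most n - 1
errs <- numeric(finish - start + 1)
for (i in seq_along(errs)) {
  s <- start + i - 1
  fit <- lm(y ~ x, data = toy[1:s, ])            # fit on observations 1..s
  yhat <- predict(fit, newdata = toy[s + 1, ])   # forecast observation s + 1
  errs[i] <- yhat - toy$y[s + 1]                 # one-step-ahead error
}
print(c(MSE = mean(errs^2), MAE = mean(abs(errs))))
```

Each forecast uses only data available up to the horizon, so the resulting errors are genuinely out of sample. Note that `finish` cannot exceed the number of rows minus one, since the observation after the horizon is needed for evaluation.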


We now create a single dataset for the oil and gas market EIA data.

dta_xvars <- merge(dta_prod, dta_oil, by="date")
dta_xvars <- merge(dta_xvars, dta_cons, by="date")

Add the hurricane data to the gasoline data.

dta_gas_full <- merge(dta_gas, dta_hurr, by=c("Date","RegionID"),all.x=TRUE)
dta_gas_full[is.na(dta_gas_full$hurricane),"hurricane"] <- 0
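
The effect of `all.x=TRUE` followed by the NA replacement can be seen on a tiny example. The rows below are invented for illustration; only the column names `Date`, `RegionID` and `hurricane` come from the code above, and `price` is an assumed name.

```r
# Invented rows: three days of prices for one region, one hurricane day
gas <- data.frame(Date = c("2017-08-24", "2017-08-25", "2017-08-26"),
                  RegionID = c(1, 1, 1),
                  price = c(2.20, 2.25, 2.40))
hurr <- data.frame(Date = "2017-08-25", RegionID = 1, hurricane = 1)

# Left join: keep every gas row, even those without a hurricane record
full <- merge(gas, hurr, by = c("Date", "RegionID"), all.x = TRUE)
# Days with no matching hurricane record come back as NA; recode them to 0
full[is.na(full$hurricane), "hurricane"] <- 0
print(full)
```

Without `all.x=TRUE` the merge would keep only the days that appear in the hurricane file, which would discard almost all of the price data.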

Problem 1: Timeseries modeling and forecasting

Using linear regression, you will forecast the next-period Houston gasoline price.

Forecasting Question 1 - Variable selection

Using the provided data, identify and justify four variables which you believe will be effective forecasters for the Houston gasoline price. (2 paragraphs of 4-6 sentences)

answer here

Forecasting Question 2 - Creating the dataset

Create a single dataframe for the analysis containing the dependent variable, these 4 forecasting variables and any other variables you would like to include in your forecasting models.

## insert code here
# dta_houston <- ...

Forecasting Question 3

Provide visualizations for two of the forecasting variables which will be useful to the reader. Describe in words the visualization and what the reader should notice. (2-4 sentences).

Visualization 1

# code for visualization 1


Visualization 2

# code for visualization 2


Forecasting Question 4 - Explanatory analysis

Fit a linear model using all forecasting variables. Assess the coefficients using robust standard errors and evaluate the results from an explanatory perspective. (3-5 sentences)
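
For readers unfamiliar with robust standard errors, the sketch below computes heteroskedasticity-robust (HC1) standard errors by hand on simulated data and compares them with the classical ones. The data and variable names are assumptions for illustration. In practice a package such as sandwich (`vcovHC`) together with lmtest (`coeftest`) is commonly used instead, and the course may expect a specific approach.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 1 + abs(x))   # noise variance depends on x
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))    # (X'X)^-1
meat <- crossprod(X * e)        # X' diag(e^2) X
# Sandwich estimator with the HC1 degrees-of-freedom scaling n / (n - k)
vc_hc1 <- bread %*% meat %*% bread * n / (n - ncol(X))

robust_se <- sqrt(diag(vc_hc1))
classic_se <- sqrt(diag(vcov(fit)))
print(cbind(estimate = coef(fit), classic_se, robust_se,
            robust_t = coef(fit) / robust_se))
```

The coefficient estimates are unchanged; only the standard errors, and therefore the t statistics and p-values, differ between the two columns.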

# code for fitting and evaluating the full model


Forecasting Question 5 - Forecasting analysis

Fit three forecasting models which use different combinations of variables. In model 1 use the full specification. In model 2 remove one or more of what you think are the strongest predictors and in model 3 remove one or more of what you think are the weakest predictors. Evaluate the out-of-sample (testing) error for these models with both the MSE and MAE. Display the results using an attractive bar chart.

# code for fitting and evaluating forecasting models

description of different specifications


Forecasting Question 6 - Forecasting analysis interpretation

Describe the results from Question 5, and evaluate the best forecaster in the following context: You want to use your forecaster to drive an app called “Fill Up Now Or Finish Your BBQ” which will provide recommendations to the Houston public about whether they should fill up their cars now or wait until tomorrow. (4 sentences)

Forecasting Question 7 - Stationarity of the process

Evaluate the presence of stochastic trends in the Houston gasoline price. Use an appropriate version of the ADF test. Discuss whether the process is stationary and the implications for your forecasting. (3-4 sentences)
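
The regression behind the ADF test can be sketched in base R on a simulated random walk. The series, the choice of two lagged differences and the variable names are assumptions for illustration; packaged implementations such as `adf.test` in tseries or `ur.df` in urca are the more usual route and also supply the appropriate critical values.

```r
set.seed(7)
n <- 300
y <- cumsum(rnorm(n))         # simulated random walk (has a unit root)

p <- 2                        # number of lagged differences (arbitrary here)
dy <- diff(y)                 # dy[j] = y[j + 1] - y[j]
m <- embed(dy, p + 1)         # columns: current difference, then its p lags
y_lag <- y[(p + 1):(n - 1)]   # lagged level aligned with the current difference

# ADF regression with a constant and no trend
adf_fit <- lm(m[, 1] ~ y_lag + m[, -1])
adf_stat <- coef(summary(adf_fit))["y_lag", "t value"]
print(adf_stat)
```

The statistic is the t value on the lagged level, but it is compared with Dickey-Fuller critical values rather than the usual t table. For the constant-only case the large-sample 5% critical value is roughly -2.86, and a statistic above that value means the unit root cannot be rejected.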

# code for adf


Problem 2: Natural experiment and panel data

The second problem is that you are worried that hurricanes and other large events can cause havoc with your forecasts. Observing that there have been a handful of hurricanes affecting different regions in the Gulf states over the horizon of your data, you would like to use these events to study the following research question: Do hurricanes result in higher gasoline prices?

Natural experiment Question 1 - Quasi-experimental design

For the above research question describe a natural experiment that can be evaluated with the provided data. The design should be a differences-in-differences design and should include a description of the treatment and how the control group and treatment group would be defined (be sure to justify your decisions). (4-6 sentences)


Natural experiment Question 2 - Panel data methods

As discussed in the panel data class, panel data methods using a treatment variable and entity/time fixed effects can be used to implement a differences-in-differences design. Describe the treatment variable for your design and prepare the data for the linear model (note that, in addition to defining the treatment variable, you should make sure that time and entity are factor variables).
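
One common way to write such a model is the two-way fixed effects specification below. The notation is an assumption made here for illustration and may differ from the notation used in class: $y_{it}$ is the gasoline price in region $i$ on day $t$, $\alpha_i$ is the entity (region) fixed effect, $\lambda_t$ is the time (day) fixed effect, and $D_{it}$ is the treatment variable, equal to 1 when region $i$ is treated on day $t$ and 0 otherwise.

```latex
y_{it} = \alpha_i + \lambda_t + \beta D_{it} + \varepsilon_{it}
```

Under this specification $\beta$ plays the role of the differences-in-differences estimate: the change in price for treated regions relative to the change over the same days in control regions.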


# define panel dataframe
# dta_panel <-

Natural experiment Question 3 - Analysis

Create, run and evaluate the results for the linear model. Remember to use clustered standard errors. Evaluate the coefficient estimates and hypothesis tests. (2-4 sentences)
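
The mechanics of a fixed effects regression with clustered standard errors can be sketched on a simulated panel. Everything in this example is an assumption for illustration: the panel, the variable names (`region`, `day`, `treat`, `price`), the choice to cluster by region and the simple G/(G-1) correction. A packaged estimator such as `vcovCL` in sandwich is the more usual choice.

```r
set.seed(3)
regions <- 6
days <- 40
panel <- expand.grid(day = 1:days, region = 1:regions)
# Treatment: regions 1 and 2 after day 20 (true effect set to 0.3)
panel$treat <- as.numeric(panel$region <= 2 & panel$day > 20)
panel$price <- 2 + 0.1 * panel$region + 0.01 * panel$day +
  0.3 * panel$treat + rnorm(nrow(panel), sd = 0.1)
panel$region <- factor(panel$region)   # entity as a factor
panel$day <- factor(panel$day)         # time as a factor

fit <- lm(price ~ treat + region + day, data = panel)

X <- model.matrix(fit)
e <- residuals(fit)
scores <- rowsum(X * e, group = panel$region)   # one summed score row per cluster
bread <- solve(crossprod(X))
G <- nrow(scores)
# Cluster-robust sandwich with a simple G / (G - 1) correction
vc_cl <- bread %*% crossprod(scores) %*% bread * G / (G - 1)
cl_se <- sqrt(diag(vc_cl))["treat"]
print(c(estimate = unname(coef(fit)["treat"]), clustered_se = unname(cl_se)))
```

Clustering allows the errors within a region to be correlated over time. With as few clusters as in this toy panel, clustered standard errors are themselves imprecise, which is worth keeping in mind when interpreting hypothesis tests.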

# run the analysis
# fit <- lm(formula, data=dta_panel)

evaluate results

Natural experiment Question 4 - Interpretation

Interpret the results from the perspective of the “Fill Up Now Or Finish Your BBQ” app designer. (3-5 sentences)


  • Uploaded By : Katthy Wills
  • Posted on : December 12th, 2022