MAE256 T2 2022 Plagiarism Assignment
Subject Code: MAE256
General Details
(1) This is an INDIVIDUAL assignment. Plagiarism is strongly discouraged and will be penalized as heavily as possible. Discussing the questions with other students is not collusion, but you must submit your own original work. Note that we may ask you to come in and explain your assignment in person if we feel it is too similar to another student's work.
(2) This assignment is worth 30 marks in total, corresponding to 20% of your final grade.
(3) Once completed, you will need to submit your Microsoft Word document via CloudDeakin. You must submit a single file only that contains a cover page with your name and student ID.
If you are submitting your assignment as a PDF document, please ensure that you are also submitting it as a Word document to enable word counting.
Please ensure the Word document is self-contained (i.e. all Excel output tables for summary statistics and regressions, and all figures, should be in the Word document). You will not need to submit a hard copy.
(4) Whenever you are asked to estimate a regression model, please include the summary output from Excel in tabular format in your Word document (using the copy/paste tool) as evidence of the regression you actually ran.
Regression Analysis using Cross Section Data
The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The novel virus was first identified from an outbreak in the Chinese city of Wuhan in December 2019, and attempts to contain it there failed, allowing it to spread across the globe. The World Health Organization (WHO) declared a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March 2020. As of 20 July 2022, the pandemic had caused more than 566 million cases and 6.3 million deaths, making it one of the deadliest in history. As a preventive measure, COVID-19 vaccines have been approved and widely distributed in various countries since December 2020.
In this assignment, you are provided with a dataset that contains information on the number of COVID-19 cases, the number of deaths caused by COVID-19, and other socioeconomic characteristics for 107 countries on 6 different continents as of 30 November 2020. (This date is chosen to cover only the pre-vaccination period.) You will examine the relationships among these variables using regression analysis, as outlined in the following set of questions. The dataset [MAE256 T2 2022 Assignment Data] was created using information from https://ourworldindata.org/explorers/coronavirus-data-explorer and is provided on the unit site. It contains the following variables, in the order presented below:
Variable definitions
continent: the continent of each country in the dataset
country: the name of each country in the dataset
total_cases: total number of infected people as of 20 Nov 2020
new_cases: number of newly infected people on 20 Nov 2020
total_deaths: total number of deaths due to Covid-19 as of 20 Nov 2020
new_deaths: number of new deaths due to Covid-19 on 20 Nov 2020
total_cases_per_million: total number of infected people per million population as of 20 Nov 2020
new_cases_per_million: number of newly infected people per million population on 20 Nov 2020
total_deaths_per_million: total number of deaths per million population due to Covid-19 as of 20 Nov 2020
new_deaths_per_million: number of new deaths per million population due to Covid-19 on 20 Nov 2020
population: population size of the country
median_age: the median age of the population in years
aged_70_older: percent of the population above the age of 70
gdp_per_capita: per capita GDP in thousands of US dollars
human_development_index: a summary measure of average achievement in key dimensions of human development: a long and healthy life, being knowledgeable and having a decent standard of living. Takes a value between 0 and 100.
(i) Present the descriptive statistics of the variables total_cases and total_deaths. Comment on the means and measures of dispersion (standard deviation, skewness, and kurtosis) of these two variables.
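For readers who want to cross-check the Excel output for part (i), the four measures can be sketched in plain Python. This is a minimal sketch, not part of the required workflow: the function name `describe` is hypothetical, the sample values are made up rather than taken from the assignment dataset, and the skewness and kurtosis use the sample-adjusted formulas that Excel's SKEW and KURT functions apply (kurtosis is therefore reported as excess kurtosis).

```python
import math


def describe(values):
    """Return mean, sample standard deviation, skewness and excess kurtosis.

    Skewness and kurtosis follow the sample-adjusted formulas used by
    Excel's SKEW and KURT, so at least 4 observations are required.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z3 = sum(((x - mean) / s) ** 3 for x in values)
    z4 = sum(((x - mean) / s) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return {"mean": mean, "std_dev": s, "skewness": skew, "kurtosis": kurt}


# Made-up values for illustration only; NOT from the assignment dataset.
sample = [120, 340, 560, 980, 15000]
stats = describe(sample)
```

A positive skewness, as the single very large value in `sample` produces here, indicates a long right tail; a large positive excess kurtosis indicates heavier tails than a normal distribution.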
(ii) Estimate the following simple regression model of total_deaths on total_cases:
total_deaths = β0 + β1 total_cases + u
Write down the estimated sample regression function and interpret the slope coefficient for an additional 1000 total cases.
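The arithmetic behind the slope interpretation in part (ii) can be sketched as follows. This is an illustrative sketch only: `ols_simple` is a hypothetical helper, the numbers are made up rather than drawn from the assignment data, and Excel's regression tool remains the required way to produce the reported output.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up numbers for illustration only; NOT from the assignment dataset.
cases = [1000, 5000, 20000, 100000]
deaths = [30, 110, 420, 2050]

b0, b1 = ols_simple(cases, deaths)

# The slope is the predicted change in deaths per ONE additional case,
# so the change associated with 1000 additional cases is 1000 times the slope.
effect_per_1000_cases = 1000 * b1
```

Because the slope is measured per single case, scaling it by 1000 is what turns it into the "additional 1000 total cases" interpretation the question asks for.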
(iii) Now estimate the following multiple regression model
Report your regression results in a sample regression function. Interpret the estimated coefficient of log(gdp_per_capita) and explain whether the sign of the coefficient matches your predictions.
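As a reminder of how a logged regressor is usually read, the following sketch assumes the dependent variable enters in levels and the regressor in natural logs. This is an assumption, since the model equation is not reproduced here, and the symbols y, x and β1 are generic placeholders rather than the assignment's notation.

```latex
y = \beta_0 + \beta_1 \log(x) + u
\quad\Longrightarrow\quad
\Delta y \approx \frac{\beta_1}{100}\,(\%\Delta x)
```

Under that assumption, a 1% increase in x is associated with a change of roughly β1/100 units in y, holding the other regressors fixed.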
(iv) Using the regression results from (iii), predict the number of total_deaths_per_million for the USA and compare your prediction with the actual number of total_deaths_per_million observed for the USA.
(v) Using the estimated model in (iii), test whether the coefficient of log(gdp_per_capita) is statistically significant at the 5% level of significance.
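The mechanics of the two-sided test in part (v) can be sketched as below. The helper names are hypothetical, the estimate and standard error are made-up numbers rather than results from the assignment data, and the critical value must be looked up for the actual degrees of freedom (it is roughly 1.98 for a 5% two-sided test with about 100 degrees of freedom).

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t statistic for H0: coefficient == hypothesized."""
    return (estimate - hypothesized) / std_error


def reject_two_sided(t_stat, critical_value):
    """Reject H0 in a two-sided test when |t| exceeds the critical value."""
    return abs(t_stat) > critical_value


# Hypothetical regression output; NOT taken from the assignment dataset.
coef = 2.5
se = 1.0

t_stat = t_statistic(coef, se)
# 1.98 is an approximate 5% two-sided critical value for ~100 degrees of
# freedom; use the value matching your own n - k - 1.
significant = reject_two_sided(t_stat, critical_value=1.98)
```

Equivalently, the p-value reported next to the coefficient in the Excel summary output can be compared directly with 0.05.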
(vi) Now estimate an extended version of the multiple regression model estimated in (iii) by adding the variables of aged_70_older and human_development_index as in the following equation:
Based on your estimates, how would you interpret the effect of aged_70_older and human_development_index on the number of total_deaths_per_million? What can you conclude when you compare the goodness of fit of this regression model and that of the regression model in part (iii)?
(vii) Using your estimates from (vi), test if aged_70_older and human_development_index are individually statistically significant at the 5% level of significance. Also, test whether these two variables are jointly significant at the 5% level of significance.
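One common way to carry out the joint test in part (vii) is the R-squared form of the F statistic, which compares the model that includes both variables (unrestricted) with the model that omits them (restricted). The sketch below assumes both models share the same dependent variable and the same sample; the function name is hypothetical and the R-squared values and k are made up for illustration.

```python
def f_statistic(r2_unrestricted, r2_restricted, q, n, k):
    """R-squared form of the F statistic for q exclusion restrictions.

    n is the number of observations and k the number of slope coefficients
    in the unrestricted model. Valid only when both models have the same
    dependent variable and are estimated on the same observations.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator


# Hypothetical R-squared values and k; NOT results from the assignment data.
# n = 107 matches the number of countries described in the assignment.
f_stat = f_statistic(r2_unrestricted=0.45, r2_restricted=0.40, q=2, n=107, k=4)
```

The resulting statistic is compared with the 5% critical value of the F distribution with q and n - k - 1 degrees of freedom.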
(viii) Now create a dummy variable, Asia, indicating whether a country is in Asia. Using this variable, estimate the following model:
Interpret the meaning of the coefficients for log(population) and Asia.
Test whether the coefficient of Asia is less than or equal to -0.75 against the alternative that it is greater than -0.75 at the 5% level of significance.
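The two mechanical steps in part (viii), building the dummy and running the one-sided test, can be sketched as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the continent labels are assumed to be spelled exactly "Asia", the estimate and standard error are made up, and 1.66 is only an approximate 5% one-sided critical value for about 100 degrees of freedom.

```python
def make_dummy(continents, target="Asia"):
    """Return 1 for observations on the target continent and 0 otherwise."""
    return [1 if c == target else 0 for c in continents]


def right_tailed_test(estimate, std_error, hypothesized, critical_value):
    """Test H0: beta <= hypothesized against H1: beta > hypothesized.

    Returns the t statistic and whether H0 is rejected.
    """
    t_stat = (estimate - hypothesized) / std_error
    return t_stat, t_stat > critical_value


# Illustrative labels only; the real values come from the continent column.
continents = ["Asia", "Europe", "Africa", "Asia"]
asia = make_dummy(continents)  # [1, 0, 0, 1]

# Hypothetical coefficient and standard error; NOT from the assignment data.
t_stat, reject = right_tailed_test(
    estimate=-0.30, std_error=0.20, hypothesized=-0.75, critical_value=1.66
)
```

Because the alternative is "greater than -0.75", only large positive values of the t statistic count as evidence against the null, which is why the comparison is not taken in absolute value.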
(ix) Using the model estimated in (viii), test whether the model is overall statistically significant at the 1% level.
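For reference, the overall significance test in part (ix) is usually written in terms of the model's R-squared. In the expression below, k denotes the number of slope coefficients and n the number of observations; these symbols are generic notation, not taken from the assignment.

```latex
H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0,
\qquad
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\; n-k-1} \ \text{under } H_0
```

This is the F statistic that Excel reports in the ANOVA section of the regression output, and its "Significance F" value can be compared directly with 0.01.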