
MXN500: Statistical Data Analysis Assessment

Added on: 2023-05-25 09:33:35
  • Subject Code : MXN500
  • Country : Australia

Introduction

Background

Large-scale climate drivers such as the El Niño Southern Oscillation (ENSO) are known to have an impact on Australian rainfall patterns. ENSO has three phases: El Niño, Neutral and La Niña. Generally, in the La Niña phase of ENSO, conditions are cooler and wetter along the eastern Australian coast. In comparison, during the El Niño phase conditions are hotter and drier. In this problem solving task, you will use your knowledge of regression to explore the relationship between ENSO and Australian rainfall.

Note that it is not possible to measure the strength of ENSO directly, so in regression models the Southern Oscillation Index (SOI) is commonly used to represent the strength of ENSO. The SOI is a climate index that measures the normalised pressure difference between Tahiti and Darwin. You do not need to provide units when displaying SOI on a plot axis, as it is an index. ENSO is considered to be in the La Niña phase when there are sustained SOI values above 7, in the El Niño phase when there are sustained SOI values below -7, and in the Neutral phase otherwise. A CSV file containing the monthly SOI values can be obtained from Blackboard. More details about ENSO and SOI can be found on the Bureau of Meteorology website (http://www.bom.gov.au/climate/enso).
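The phase thresholds described above can be sketched as a small function. This is an illustrative Python sketch only (the brief's later reference to geom_smooth() suggests the assignment itself expects R). The function name enso_phase and the sample values are hypothetical, and the sketch classifies a single value, so it does not capture the "sustained" requirement.

```python
def enso_phase(soi, threshold=7.0):
    """Classify one SOI value using the +7 / -7 thresholds from the text."""
    if soi > threshold:
        return "La Niña"
    if soi < -threshold:
        return "El Niño"
    return "Neutral"


# Hypothetical seasonal mean SOI values, for illustration only.
for value in (12.3, -9.1, 2.0):
    print(value, enso_phase(value))
```

Values exactly equal to 7 or -7 fall in the Neutral phase here, because the text says "above 7" and "below -7".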

Datasets

Two datasets are provided for this assignment. As in problem solving task 1, there is a precipitation dataset, total_seasonal_rainfall.csv. This dataset contains the seasonal rainfall totals recorded at the BRISBANE AERO station and includes the variables:

  • The GHCN Daily station id (character)
  • The GHCN Daily station name (character)
  • The Year of the observation (numeric)
  • The Season of the observation (ordinal, categorical)

In the dataset, seasonal_soi_data.csv, the variables included are:

  • The Season of the observation (ordinal, categorical)
  • The Year of the observation (numeric)
  • The SeasonalSOI is the mean seasonal SOI value (numeric)
  • The Phase of the ENSO (ordinal, categorical).

Preprocessing

Question 1.1

(1 mark)

Download the files total_seasonal_rainfall.csv and seasonal_soi_data.csv from Blackboard. Combine all the variables from these two datasets into the single dataset total_seasonal_rainfall. Print the first three rows to show the form of your new dataset.
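One way to think about the combination step is as a join on the two variables the datasets share, Year and Season. The Python sketch below uses tiny in-memory stand-ins for the two files. The column names station_id and station_name, the station id X001 and every data value are hypothetical, and the choice of an inner join is an assumption.

```python
import csv
import io

# Hypothetical miniature stand-ins for the two CSV files (made-up values).
rain_csv = """station_id,station_name,Year,Season,total_seasonal_prcp
X001,BRISBANE AERO,2000,Summer,410.2
X001,BRISBANE AERO,2000,Winter,95.0
"""
soi_csv = """Season,Year,SeasonalSOI,Phase
Summer,2000,9.5,La Niña
Winter,2000,-3.1,Neutral
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_year_season(rain_rows, soi_rows):
    """Inner join: keep rainfall rows that have a matching (Year, Season) SOI row."""
    soi_by_key = {(r["Year"], r["Season"]): r for r in soi_rows}
    merged = []
    for row in rain_rows:
        match = soi_by_key.get((row["Year"], row["Season"]))
        if match is not None:
            merged.append({**row, **match})
    return merged


merged = merge_on_year_season(read_rows(rain_csv), read_rows(soi_csv))
for row in merged[:3]:  # first three rows (only two exist in this toy example)
    print(row)
```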

Question 1.2

(3 marks)

Convert the relevant variables to factors in your dataset. Be sure to set the factor levels appropriately for later analysis. Show your code and show the factor levels.
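The idea of explicit factor levels can be illustrated with a short Python sketch: each categorical variable gets a fixed, ordered list of levels, and values are mapped to their position in that list. The level labels and orderings below are assumptions; the labels in the actual files may differ.

```python
# Assumed level orderings; the real data may use different labels.
SEASON_LEVELS = ["Summer", "Autumn", "Winter", "Spring"]
PHASE_LEVELS = ["El Niño", "Neutral", "La Niña"]


def level_codes(values, levels):
    """Map each value to its position in the ordered level list."""
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]  # a KeyError flags an unexpected label


# Hypothetical phase values, for illustration only.
phases = ["Neutral", "La Niña", "El Niño", "Neutral"]
codes = level_codes(phases, PHASE_LEVELS)
ordered = sorted(phases, key=PHASE_LEVELS.index)
print(codes)
print(ordered)
```

Fixing the order up front matters later, because the first level typically becomes the reference category in a regression with a categorical explanatory variable.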

Exploratory Visualisation

For each season, we are interested in whether a simple linear regression, referred to below as equation (1), could be used to model the relationship between the mean seasonal SOI value and total seasonal precipitation.
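A simple linear regression of this kind is conventionally written as follows. The symbols are assumed notation, not necessarily the notation the assignment uses for equation (1): $y_i$ is the total seasonal precipitation and $x_i$ the mean seasonal SOI value for observation $i$ in a given season.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

Under this notation, $\beta_0$ is the intercept, $\beta_1$ is the slope, and the null model is the special case $\beta_1 = 0$.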

Question 2.1

(5 marks)

Create a visualisation that explores the relationship between SeasonalSOI and total_seasonal_prcp for each season. Use geom_smooth() to add the null model and the linear model from equation (1).
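The two lines being compared can be computed by hand: the null model is a horizontal line at the mean of the response, and the linear model is the least-squares line. The Python sketch below shows the arithmetic only, not the plot, and the data values are hypothetical.

```python
from statistics import mean


def fit_null(y):
    """Null (intercept-only) model: the fitted value is the mean of y."""
    return mean(y)


def fit_linear(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for one season (made-up values).
soi = [-10.0, -5.0, 0.0, 5.0, 10.0]
prcp = [200.0, 260.0, 300.0, 370.0, 420.0]

null_level = fit_null(prcp)
intercept, slope = fit_linear(soi, prcp)
print("null model level:", null_level)
print("linear model:", intercept, "+", slope, "* SOI")
```

When the fitted line departs clearly from the horizontal null line relative to the scatter of the points, a linear relationship is more plausible.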

Question 2.2

(3 marks)

Using your visualisation, for which seasons would you expect there to be a significant linear relationship between total seasonal precipitation and the mean seasonal SOI value? Give detailed reasoning.

Simple Linear Regression

For the season with the strongest relationship, as determined in the Exploratory Visualisation section, fit the regression model from equation (1) and answer the following questions.

Question 3.1

(2 marks)

Fill in the blanks in the following sentence so that it refers to the terms in the regression model.

For the BRISBANE AERO Station and the ____ season, a linear model was specified to model how the total seasonal precipitation, ____, is related to the mean seasonal SOI value, ____. The parameter ____ describes the rate of change in the total seasonal precipitation with an increase in mean seasonal SOI value. The parameter ____ represents the total seasonal precipitation when the mean seasonal SOI value is 0.

Question 3.2

(2 marks)

Write down your linear model substituting the parameter values into the equation.

Question 3.3

(2 marks)

Provide a 95% confidence interval for the parameter estimates.
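For reference, the usual t-based 95% confidence interval for a coefficient in a simple linear regression with $n$ observations has the form below. The notation is assumed: $s^2$ is the residual variance estimate, $\bar{x}$ the mean of the explanatory variable and $S_{xx} = \sum_i (x_i - \bar{x})^2$.

```latex
\hat{\beta}_j \pm t_{0.975,\,n-2}\,\mathrm{SE}(\hat{\beta}_j),
\qquad
\mathrm{SE}(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}},
\qquad
\mathrm{SE}(\hat{\beta}_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}},
\qquad
s^2 = \frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}
```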

Question 3.4

(1 mark)

How much variability in the data is explained by this model?
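The proportion of variability explained is usually reported as the coefficient of determination, R squared, which compares the residual sum of squares with the total sum of squares. A minimal Python sketch with hypothetical observed and fitted values:

```python
def r_squared(y, fitted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and fitted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(observed, fitted)
print(r2)
```

Here the residual sum of squares is 1 and the total sum of squares is 5, so 80% of the variability is accounted for by the fitted values.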

Question 3.5

(4 marks)

Visualise the fitted values compared with the residuals, and visualise the standardised quantiles of the residuals compared with the theoretical quantiles. Discuss the validity of the underlying assumptions of linear regression.
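The coordinates behind a normal quantile-quantile plot can be sketched as follows. Two simplifications are assumed here: the residuals are standardised by a plain z-score rather than the leverage-adjusted standardisation some regression software uses, and the plotting positions (i - 0.5) / n are one common convention among several. The residual values are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Return (theoretical quantile, standardised residual) pairs, sorted."""
    n = len(residuals)
    centre = mean(residuals)
    spread = pstdev(residuals)
    standardised = sorted((r - centre) / spread for r in residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardised))


# Hypothetical residuals, for illustration only.
points = qq_points([-2.0, -1.0, 0.0, 1.0, 2.0])
for theo, obs in points:
    print(round(theo, 3), round(obs, 3))
```

Points lying close to a straight line support the normality assumption. The residuals-versus-fitted plot addresses linearity and constant variance instead.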

Question 3.6

(4 marks)

Print out a summary of your linear model and interpret the results. As part of this you must discuss the physical meaning of the model, which parameters are significant and whether the linear model is significantly different compared to the null model.

Question 3.7

(2 marks)

Discuss whether your fitted model is a good model to use to predict seasonal rainfall totals using the mean seasonal SOI.

Polynomial Lines of Best Fit

Many climate scientists hypothesise that, when it comes to rainfall, wet can get wetter but dry can’t get drier. In other words, Australian rainfall may not respond equally to the La Niña and El Niño phases of ENSO. For this reason, one might want to check whether polynomial regression better suits the data.

Question 4.1

(2 marks)

For BRISBANE AERO and the season you chose earlier, fit a linear regression using polynomial terms of SeasonalSOI up to order 2 as the explanatory variables. Write down the equation with your estimated parameter values.
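A second-order polynomial regression is still linear in its parameters, so it can be fitted by ordinary least squares with the columns 1, x and x squared. The Python sketch below solves the normal equations directly, which is adequate for a toy example but not the numerically preferred approach. The data are hypothetical and generated from an exact quadratic so the estimates can be checked.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(x, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x + b2*x^2."""
    design = [[1.0, xi, xi * xi] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical data lying exactly on 300 + 10*SOI + 0.5*SOI^2.
soi = [-10.0, -5.0, 0.0, 5.0, 10.0]
prcp = [300.0 + 10.0 * s + 0.5 * s * s for s in soi]
b0, b1, b2 = fit_quadratic(soi, prcp)
print(b0, b1, b2)
```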

Question 4.2

(3 marks)

Print out a summary of your fitted model, interpret the significance of the results and explain the related physical meaning.

Question 4.3

(3 marks)

Create a prediction interval for a mean seasonal SOI value of 25 and a mean seasonal SOI value of -25. Comment on using this model for prediction in relation to your physical understanding of rainfall and your understanding of extrapolation.
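For reference, a 95% prediction interval for a new observation at $x_0$ under a least-squares model with $p$ coefficients takes the general form below. The notation is assumed: $X$ is the design matrix, $\mathbf{x}_0 = (1, x_0, x_0^2)^\top$ for the second-order polynomial model (so $p = 3$), and $s$ is the residual standard error.

```latex
\hat{y}_0 \pm t_{0.975,\,n-p}\; s\,
\sqrt{1 + \mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0}
```

The term $\mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0$ grows as $x_0$ moves away from the observed SOI values, so the interval widens under extrapolation. The formula also places no constraint on the sign of the predicted rainfall.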

Question 4.4

(3 marks)

Decide whether a linear or polynomial regression is preferred using a statistical test. Be sure to describe your statistical test in detail.
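The brief does not say which test to use. One common choice for comparing nested least-squares models is the partial F-test, sketched below in Python. The residual sums of squares and sample size are hypothetical, and the function name is illustrative.

```python
def nested_f_statistic(rss_small, rss_large, p_small, p_large, n):
    """F statistic comparing a smaller model nested inside a larger one.

    rss_* are residual sums of squares, p_* are the numbers of fitted
    coefficients (including the intercept), and n is the sample size.
    """
    numerator = (rss_small - rss_large) / (p_large - p_small)
    denominator = rss_large / (n - p_large)
    return numerator / denominator


# Hypothetical values: linear model (2 coefficients) vs quadratic (3 coefficients).
f_stat = nested_f_statistic(rss_small=1200.0, rss_large=900.0,
                            p_small=2, p_large=3, n=33)
print(f_stat)
```

Under the null hypothesis that the extra term is unnecessary, this statistic follows an F distribution with (p_large - p_small, n - p_large) degrees of freedom, here (1, 30). Computing the p-value needs an F distribution function, which the Python standard library does not provide.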

Linear Regression with Categorical Explanatory Variables

Within the analysis so far the role of ENSO phases has been modelled solely using the SOI. For model simplicity and for physical understanding, it may be useful to consider only the phases.

Question 5.1

(3 marks)

Consider again BRISBANE AERO and your chosen season. To better understand the role of the different phases of ENSO, fit a categorical regression model with Phase as the explanatory variable for total seasonal precipitation.

Print the model summary.
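With a single categorical explanatory variable and treatment (dummy) coding, the least-squares estimates reduce to group means: the intercept is the mean of the reference phase and each remaining coefficient is the difference between a phase mean and that reference mean. The Python sketch below illustrates this. Using Neutral as the reference level is an assumption, and the data values are hypothetical.

```python
from statistics import mean


def fit_categorical(y, groups, baseline):
    """Return (intercept, effects) for a one-factor, treatment-coded model."""
    by_group = {}
    for yi, g in zip(y, groups):
        by_group.setdefault(g, []).append(yi)
    intercept = mean(by_group[baseline])
    effects = {g: mean(v) - intercept
               for g, v in by_group.items() if g != baseline}
    return intercept, effects


# Hypothetical seasonal totals and phases (made-up values).
prcp = [300.0, 320.0, 200.0, 220.0, 400.0, 440.0]
phase = ["Neutral", "Neutral", "El Niño", "El Niño", "La Niña", "La Niña"]

intercept, effects = fit_categorical(prcp, phase, baseline="Neutral")
print("intercept (Neutral mean):", intercept)
print("phase effects relative to Neutral:", effects)
```

Changing the reference level changes the intercept and the coefficients, but not the fitted group means.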

Question 5.2

(1 mark)

Write down the linear model substituting the parameter values into the equation.

Question 5.3

(3 marks)

Interpret the significance of the results of your fitted model and explain the related physical meaning.

Question 5.4

(3 marks)

Do the results of this categorical regression support your choice of linear or polynomial regression?

Code

Readability and clarity of code

(5 marks)

This assignment is primarily about your ability to perform a statistical data analysis, but the tutors will award marks based on how clear and readable your code is. To help the tutors with this, please make sure to comment your code for each of the different questions.

  • Uploaded By : Katthy Wills
  • Posted on : May 25th, 2023
