
Assessment 3 - PST 2: Questions Due 24 September

Added on: 2024-11-22 05:00:25

Background: In this assignment, you will apply your learning to further analyse the data on road accidents in Victoria and its connection with weather events. This activity builds on Assignment 1; you may want to review your assignment 1 solution and identify any reusable code. Please start early so that you can identify any skill/knowledge gap and seek support from the teaching staff and other students.

Application scenario: You work in a data science team that tries to model the road accidents in an area to improve the prediction for rescue services demand. For your convenience, you are provided with the following data links, but you are encouraged to include other relevant data for your analyses.

The dataset car_accidents_victoria (CSV file) is the same one used in Assignment 1.

The daily temperature and precipitation data for the region, accessible through the NOAA data APIs:

https://www.ncdc.noaa.gov/cdo-web/webservices/v2

Of particular relevance is the Global Historical Climatology Network - Daily data:

https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt

Q1: Source weather data (10 points)

From Assignment 1, you have processed data for the road accidents of different types in a given region of Victoria. We still need to find local weather data from the same period. You are encouraged to find weather data online. Besides the NOAA data, you may also use data from the Bureau of Meteorology historical weather observations and statistics. (The NOAA Climate Data might be easier to process. A full list of weather stations is provided here: https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt )

Answer the following questions:

Q1.1: Which data source do you plan to use? Justify your decision. (4 points)

Q1.2: From the data source identified, download daily temperature and precipitation data for the region during the relevant time period. (Hint: If you download data from NOAA https://www.ncdc.noaa.gov/cdo-web/, you need to request an NOAA web service token for accessing the data.) (2 points)

Q1.3: Answer the following questions (provide the output from RStudio):

How many rows are in your local weather data? (2 points)

What time period does the data cover? (2 points)
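
For orientation, the row count and date coverage asked for in Q1.3 could be obtained along the following lines. This is a minimal sketch in Python rather than R, and it assumes the weather data was downloaded as a GHCN-Daily station file (`.dly`) with the fixed-width layout described in the NOAA readme linked above. The function names `parse_dly_line`, `load_daily` and `summarise` are hypothetical helpers, not part of any NOAA tool, and "rows" is taken here to mean one row per calendar day with at least one observation.

```python
from datetime import date

MISSING = -9999                    # missing-value sentinel used in .dly files
WANTED = ("TMAX", "TMIN", "PRCP")  # stored as tenths of deg C / tenths of mm


def parse_dly_line(line):
    """Split one fixed-width .dly record into (date, element, raw_value) rows.

    Layout per the GHCN-Daily readme: station ID in columns 1-11, year 12-15,
    month 16-17, element 18-21, then 31 blocks of 8 characters
    (a 5-character value followed by three 1-character flags).
    """
    year, month, element = int(line[11:15]), int(line[15:17]), line[17:21]
    rows = []
    for day in range(1, 32):
        start = 21 + (day - 1) * 8
        raw = line[start:start + 5].strip()
        if not raw or int(raw) == MISSING:
            continue
        try:
            rows.append((date(year, month, day), element, int(raw)))
        except ValueError:
            continue  # day does not exist in this month (e.g. 31 February)
    return rows


def load_daily(lines):
    """Build {date: {element: value}} for temperature and precipitation only."""
    daily = {}
    for line in lines:
        if line[17:21] not in WANTED:
            continue
        for d, element, raw in parse_dly_line(line):
            daily.setdefault(d, {})[element] = raw / 10.0  # tenths -> units
    return daily


def summarise(daily):
    """Return (number of daily rows, first date, last date)."""
    if not daily:
        return 0, None, None
    dates = sorted(daily)
    return len(dates), dates[0], dates[-1]
```

With a real station file, the lines would come from something like `open(path).read().splitlines()`, where `path` is whichever file was downloaded. The result would still need to be restricted to the period covered by the accident data.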

Q2: Model planning (10 points)

Careful planning is essential for a successful modelling effort. Please answer the following planning questions.

Q2.1. Model planning:

What is the main goal of your model, and how will it be used? (1 point)

How will it be relevant to emergency services demand? (1 point)

Who are the potential users of your model? (1 point)

Q2.2. Relationship and data:

What relationship do you plan to model or what do you want to predict? (1 point)

What is the response variable? (1 point)

What are the predictor variables? (1 point)

Will the variables in your model be routinely collected and made available soon enough for prediction? (1 point)

As you are likely to build your model on historical data, will the data in the future have similar characteristics? (1 point)

Q2.3. What statistical method(s) will be applied to generate the model? Why? (2 points)

Q3: Model the number of road traffic accidents using the car accident data (30 points)

In this question you will build a model based on the car_accidents_victoria dataset alone (not involving the weather data). We will start with simple models and gradually make them more complex and improve them. We will focus on the road traffic accident variable(s) that you defined in Assignment 1. Let's denote it Y.

Randomly pick a region from the road traffic accidents data.

Q3.1 Which region do you pick? (1 point)

Q3.2 Fit a linear model for Y using date as the predictor variable. Plot the fitted values and the residuals. Assess the model fit. Is a linear function sufficient for modelling the trend of Y? Support your conclusion with plots. (4 points)
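
To make the idea in Q3.2 concrete, a straight-line trend in date can be fitted by ordinary least squares, which is what `lm(Y ~ date)` does in R. The sketch below is a Python illustration only. The function name `fit_linear_trend` is hypothetical, and it assumes `dates` and `counts` are equal-length lists ordered by date. Dates are converted to "days since the first observation", so the intercept is the fitted value on the first day.

```python
from datetime import date


def fit_linear_trend(dates, counts):
    """Least-squares fit of counts = intercept + slope * days_since_start.

    Returns (slope, intercept, fitted, residuals).
    """
    x = [(d - dates[0]).days for d in dates]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(counts) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(counts, fitted)]
    return slope, intercept, fitted, residuals
```

Plotting `residuals` against `dates` is the usual check. Curvature or seasonal waves in that plot would suggest that a single straight line is not sufficient.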

Q3.3 As we are not interested in the trend itself, relax the linearity assumption by fitting a generalised additive model (GAM). Assess the model fit. Do you see patterns in the residuals indicating insufficient model fit? (5 points)

Q3.4 Augment the model to incorporate the weekly variations. (5 points)

Q3.5 Compare the models using the Akaike information criterion (AIC). Report the best-fitted model through coefficient estimates and/or plots. (5 points)
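
As a reminder of what the comparison in Q3.5 measures, AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood. Lower values indicate a better trade-off between fit and complexity. The sketch below is a Python illustration that assumes Gaussian errors. For count data fitted with a Poisson or similar family, the likelihood differs and R's `AIC()` on the fitted model objects is the appropriate tool. The function name `gaussian_aic` and the toy numbers are made up for the example.

```python
import math


def gaussian_aic(observed, fitted, n_coef):
    """AIC = 2k - 2 ln(L) for a least-squares fit with Gaussian errors.

    k counts the regression coefficients plus one for the error variance.
    """
    n = len(observed)
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coef + 1
    return 2 * k - 2 * log_lik


# Toy daily counts over four weeks with a weekly pattern (made-up numbers).
weekly = [30, 32, 31, 33, 38, 25, 20]
y = [weekly[i % 7] + ((i * 3) % 5 - 2) for i in range(28)]

# Model A: one overall mean (1 coefficient).
mean_all = sum(y) / len(y)
fit_a = [mean_all] * len(y)

# Model B: a separate mean for each day of the week (7 coefficients).
day_means = [sum(y[d::7]) / len(y[d::7]) for d in range(7)]
fit_b = [day_means[i % 7] for i in range(len(y))]

aic_a = gaussian_aic(y, fit_a, n_coef=1)
aic_b = gaussian_aic(y, fit_b, n_coef=7)
```

In this toy case the weekly model has the lower AIC despite its six extra coefficients, because the weekly pattern is large relative to the remaining noise.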

Q3.6 Analyse the residuals. Do you see any correlation patterns among the residuals? (4 points)
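
One common way to look for the correlation patterns mentioned in Q3.6 is the sample autocorrelation of the residuals at different lags, which is what `acf()` plots in R. The sketch below is a Python illustration. The function names `autocorrelation` and `white_noise_band` are hypothetical, and the band is the usual large-sample approximation for uncorrelated residuals.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(residuals)")
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum(
        (residuals[t] - mean) * (residuals[t + lag] - mean)
        for t in range(n - lag)
    )
    return num / denom


def white_noise_band(n):
    """Approximate 95% band: values beyond +/- this suggest correlation."""
    return 1.96 / math.sqrt(n)
```

For daily data, a spike at lag 7 (and its multiples) would point to weekly structure that the model has not yet captured, while slowly decaying values at short lags would point to day-to-day dependence.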

Q3.7 What data type is your day-of-the-week variable? (3 points) Does the data type of this variable affect the model fit? (3 points)
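
The distinction behind Q3.7 can be illustrated by the two usual encodings of day of the week. A numeric code forces a single slope across Monday to Sunday, whereas a categorical (factor) encoding gives each day its own effect relative to a baseline day. The Python sketch below only shows the encodings. The function names are hypothetical, and the choice of Monday as baseline is an assumption made for the example.

```python
from datetime import date


def day_of_week_numeric(d):
    """Single number: Monday = 0, ..., Sunday = 6."""
    return d.weekday()


def day_of_week_dummies(d, baseline=0):
    """Six 0/1 indicator columns, one per day except the baseline day.

    With baseline=0 the columns are Tue, Wed, Thu, Fri, Sat, Sun,
    and a Monday is encoded as all zeros.
    """
    wd = d.weekday()
    return [1 if wd == k else 0 for k in range(7) if k != baseline]
```

In a regression, the numeric version contributes one coefficient and the indicator version contributes six. The indicator version can therefore represent patterns such as a Friday peak followed by a weekend dip, which a single slope cannot.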

Q4 Heatwaves, precipitation and road traffic accidents (30 points)

The connection between weather and road traffic accidents is widely reported. In this task, you will try to measure heatwaves and assess their impact on the road accident statistics. Accordingly, you will use the car_accidents_victoria dataset together with the local weather data.

Q4.1: Measuring heatwave (8 points)

John Nairn and Robert Fawcett from the Australian Bureau of Meteorology have proposed a measure for the heatwave, called the excess heat factor (EHF). Read the following article and summarise your understanding in terms of the definition of the EHF. https://dx.doi.org/10.3390%2Fijerph120100227 (3 points)

Use the NOAA data to calculate the daily EHF values for the area you chose during the relevant time period. Plot the daily EHF values. (5 points)
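
The daily EHF calculation can be sketched as follows, in Python for illustration. It assumes the commonly cited form of the definition, which should be checked against the article above: EHI_sig is the three-day mean of daily mean temperature (days i, i+1, i+2) minus the 95th percentile T95; EHI_accl is the same three-day mean minus the mean of the previous 30 days; and EHF = EHI_sig x max(1, EHI_accl). The function names are hypothetical. Taking the daily mean as (TMAX + TMIN) / 2 and computing T95 from whatever record is available, rather than a long climate reference period, are simplifying assumptions.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def daily_mean_temperature(tmax, tmin):
    """Simplified daily mean: average of the day's maximum and minimum."""
    return [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]


def excess_heat_factor(dmt, t95):
    """EHF for each day of a gap-free, date-ordered daily mean series.

    Days without 30 previous days or 2 following days are returned as None.
    """
    ehf = [None] * len(dmt)
    for i in range(30, len(dmt) - 2):
        three_day = sum(dmt[i:i + 3]) / 3     # days i, i+1, i+2
        previous_30 = sum(dmt[i - 30:i]) / 30  # days i-30 .. i-1
        ehi_sig = three_day - t95
        ehi_accl = three_day - previous_30
        ehf[i] = ehi_sig * max(1.0, ehi_accl)
    return ehf
```

A typical use would be `t95 = percentile(dmt, 95)` followed by `excess_heat_factor(dmt, t95)`. Positive values then indicate heatwave conditions under this definition. Missing days in the temperature record would need to be handled before calling the function, since it assumes consecutive dates.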

Q4.2: Models with EHF (7 points)

Use the EHF as an additional predictor to augment the model(s) that you fitted before. Report the estimated effect of the EHF on the road accident numbers. (3 points)

Does the extra predictor improve the model fit? (1 point)

What conclusions can you draw? (3 points)

Q4.3: Research question - extra weather features (15 points)

Is EHF a good predictor for road traffic accidents? Can you think of extra weather features that may be more predictive of road traffic accident numbers? (5 points)

Try incorporating your feature into the model and see if it improves the model fit. Use AIC to prove your point. (10 points)

Q5: Reflection (20 points)

In the form of a short report (500-1000 words, 1-2 pages), answer the following questions:

Q5.1: We used some historical data to fit regression models. What additional data could be used to improve your model? (5 points)

Q5.2: Regression models can be used for two main objectives: 1) understanding a process, and 2) making predictions.

a) In this assignment, do we have reasons to choose one objective over the other? (5 points)

b) How would the selection of one of these objectives affect our model? (5 points)

Q5.3: Overall, have your analyses answered the objective/question that you set out to answer? (5 points)

  • Uploaded By : Pooja Dhaka
  • Posted on : November 22nd, 2024