M348 Applied statistical modelling Assignment

Added on: 2023-01-11 05:58:30
  • Subject Code: M348
  • Country: United Kingdom
You should create a Jupyter notebook for your solutions to the TMA questions. This will need to be submitted as a zip file via the University's online TMA/EMA service. Before starting your work, please read the Specific guidance for M348 assignments and other guidance provided on the Assessment page of the module website.
This TMA is marked out of 50. Your overall score for this TMA will be the sum of your marks for each question. The marks allocated to each part of each question are indicated in brackets in the margin.
If you have a disability that makes it difficult for you to attempt any of these questions, then please contact your Student Support Team or your tutor for advice.

Question 1 20 marks
Note: Your solution should be contained in a Jupyter notebook. See the module website for guidance.
In a clinical trial on the effectiveness of thrombolytic therapy for acute ischemic stroke, 312 patients who had suffered ischemic stroke were given intravenous recombinant tissue plasminogen activator (t-PA) and a further 312 stroke patients received a placebo. For each patient it was recorded if the t-PA or placebo began within 90 minutes of the onset of stroke, or if it took between 90 and 180 minutes before the patient was treated. A favourable outcome is defined as one (or both) of the following happening within 24 hours of the onset of stroke:
  • an improvement of four points over baseline values in the score of the National Institutes of Health Stroke Scale (NIHSS)
  • the return to normal neurological function.
The results of the trial are summarised in the table below.

(a) Create a data frame in R containing the data above and using the following variable names and levels:
  • treat: treatment indicator, taking the values 0 (for placebo) and 1 (for t-PA)
  • time: time of treatment after onset of stroke, taking the values 0 (for within 90 minutes) and 1 (for between 90 and 180 minutes)
  • fav: favourable outcome, taking the values 0 (for no) and 1 (for yes). This is the response variable.
Use the table() command to check that the data you have created correspond to those given in the table above.
(Hint: The rep() function in R with option times may be useful here. For example, the command rep(c(0,1,0),times=c(5,6,3)) will create a vector of five 0s, six 1s and three 0s. Also remember that in Notebook activities 1.10 and 1.19 you saw how to create data frames by combining vectors.) [5]
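The hint can be illustrated with a minimal R sketch. The first line is the hint's own example; the data frame that follows uses hypothetical variable names (g, y) and hypothetical counts chosen only for illustration, not the trial data.

```r
# The hint's own example: five 0s, six 1s, then three 0s.
x <- rep(c(0, 1, 0), times = c(5, 6, 3))
print(x)

# Hypothetical cell counts (NOT the trial data): one count per (g, y) combination.
toy_counts <- c(3, 2, 4, 1)

# Build one row per individual by repeating each combination 'count' times.
toy <- data.frame(
  g = rep(c(0, 0, 1, 1), times = toy_counts),
  y = rep(c(0, 1, 0, 1), times = toy_counts)
)

# Cross-tabulate to confirm the cell counts match what was intended.
toy_tab <- table(g = toy$g, y = toy$y)
print(toy_tab)
```

Comparing the printed cross-tabulation against the intended counts is a quick way to catch a misordered times vector before any model is fitted.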
(b) Find the best (in terms of the Akaike information criterion (AIC)) logistic model, with response variable fav. Explain your approach. [2]
(c) Use a statistical test to check whether the best model in part (b) can be improved by adding an extra term. [2]
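As a rough illustration of the two ideas in parts (b) and (c), the sketch below compares nested logistic models by AIC and then applies a likelihood ratio (chi-squared) test for one extra term. It runs on simulated data with hypothetical variables a, b and y; it is one possible approach, and the module materials may intend a different procedure (for example, stepwise selection).

```r
# Simulated data (hypothetical; unrelated to the trial).
set.seed(1)
n <- 200
sim <- data.frame(a = rbinom(n, 1, 0.5), b = rbinom(n, 1, 0.5))
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * sim$a))

# Fit a sequence of nested logistic regression models.
m_null <- glm(y ~ 1, family = binomial, data = sim)
m_a <- glm(y ~ a, family = binomial, data = sim)
m_ab <- glm(y ~ a + b, family = binomial, data = sim)

# Smaller AIC indicates the preferred model among those compared.
aic_table <- AIC(m_null, m_a, m_ab)
print(aic_table)

# Likelihood ratio test: does adding b improve on the model with a alone?
lrt <- anova(m_a, m_ab, test = "Chisq")
print(lrt)
```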

(d) Write down the fitted linear predictor of the best model you found in part (b). For this model, interpret the estimated parameters; in particular:
  • Will treating the patient within 90 minutes of the onset of stroke increase or decrease the odds of a favourable outcome compared with starting treatment between 90 and 180 minutes?
  • Will treating the patient with t-PA increase or decrease the odds of a favourable outcome compared with a placebo?
Then find the estimated probabilities of a favourable outcome for patients in the four groups (defined by the four combinations of time and treat), rounded to two decimal places. Which combination gives the highest estimated probability of a favourable outcome? [6]
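The step from a linear predictor to a probability, and from a coefficient to an odds ratio, can be sketched in R as follows. All numbers here are hypothetical placeholders, not estimates from the trial data.

```r
# Hypothetical linear predictor values (NOT fitted values from the trial).
eta <- c(-1, 0, 0.5)

# Inverse logit: p = exp(eta) / (1 + exp(eta)); plogis() computes the same thing.
p_manual <- exp(eta) / (1 + exp(eta))
p <- plogis(eta)
print(round(p, 2))

# A coefficient beta on the log-odds scale corresponds to an odds ratio exp(beta):
# exp(beta) > 1 means higher odds, exp(beta) < 1 means lower odds.
beta <- 0.4  # hypothetical coefficient
odds_ratio <- exp(beta)
print(odds_ratio)
```

For a fitted glm object, predict() with type = "response" returns probabilities on the same scale.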
(e) Check if your model from part (b) satisfies the assumptions of a logistic regression model. [5]

Question 2 30 marks
Note: Your solution should be contained in the same Jupyter notebook you used for Question 1. See the module website for guidance, and to download the required data file.
The aim of a study carried out in 2004 at the Royal Berkshire Hospital, Reading, was to investigate the incidence of sore throats in patients who had undergone surgery. Of particular interest was whether the occurrence of a sore throat was affected by the method used to deliver anaesthetic gas, as one of three types of airway device was used on each patient. The response variable was binary and indicated whether or not a patient experienced a sore throat in the 24-hour period following the operation. Several other explanatory variables were also recorded.
The data are given in the data frame soreThroat and stored in the file soreThroat.RData. The variables are as follows:
  • age: the patient's age (in years)
  • gender: the gender the patient identifies with, taking the values 0 (for male) and 1 (for female)
  • duration: the duration of the operation (in minutes)
  • method: the type of airway device used:
    • LMA: laryngeal mask airway
    • ETT: endo-tracheal tube
    • FM: traditional face mask
  • lubricated: whether lubrication was used when inserting the airway device, taking the values 0 (for no), 1 (for yes) and 9 (for missing data)
  • sore: occurrence of sore throat after surgery, taking the values 0 (for no) and 1 (for yes). This is the response variable.
(a) Preliminary analysis:
(i) In the data frame soreThroat, all explanatory variables have been saved as numeric vectors in R. Identify which of the variables are factors, and make sure that R is treating them as factors.
(Hint: If you need a reminder about how to do this, look back at Notebook activities 1.7 and 3.1.) [1]
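A minimal sketch of converting a numerically coded variable to a factor is given below. It uses a hypothetical toy data frame with made-up columns grp and x, and the Notebook activities may present this differently.

```r
# Hypothetical toy data: grp holds category codes stored as numbers.
toy <- data.frame(
  grp = c(1, 2, 3, 2, 1),
  x = c(5.1, 4.8, 6.0, 5.5, 4.9)
)

# Before conversion, R treats grp as a numeric vector.
is_factor_before <- is.factor(toy$grp)

# Convert the coded variable to a factor so models treat it as categorical.
toy$grp <- as.factor(toy$grp)

# Inspect the structure and the factor levels.
str(toy)
print(levels(toy$grp))
```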

(ii) Provide appropriate visual summaries for the variables age, gender, duration, method and lubricated, and comment on these summaries. Where appropriate, provide suggestions for modelling the data, such as using transformations, and point out if there is anything you would have done differently if you had been planning the study. [7]
(iii) Investigate if having missing data in the variable lubricated may impact on your ability to compare the three types of airway devices. For subsequent analysis, replace the missing values in lubricated with NA.
(Hint: The ifelse() function in R may be useful here. For example, the command ifelse(x==1,2,x) replaces all entries in an object x that are equal to 1 with the value 2 and leaves the other entries as they are.) [3]
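The ifelse() hint can be tried on small vectors first. In this sketch, x follows the hint's own example, while codes is a hypothetical vector in which 9 stands for a missing value.

```r
# The hint's example: entries equal to 1 become 2, the rest are unchanged.
x <- c(1, 3, 1, 2)
y <- ifelse(x == 1, 2, x)
print(y)

# Hypothetical coded vector where 9 marks a missing value.
codes <- c(0, 1, 9, 1, 9)

# Replace the missing-value code with NA and keep the other entries.
codes_na <- ifelse(codes == 9, NA, codes)

# useNA = "ifany" makes table() report the NA count alongside the other values.
print(table(codes_na, useNA = "ifany"))
```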
(b) Fit a logistic model with all explanatory variables and comment on the output. Use your results from part (a)(iii) to explain any surprising features. [4]
(c) Speaking with a consultant reveals that for the traditional face mask (FM) method, no lubrication is necessary as no part of the face mask is inserted into the throat. So the missing data arise naturally from the nature of the study.
The consultant also tells you that there are two research questions (RQ1 and RQ2) to be answered from this study:
  • RQ1: Are there differences in occurrences of sore throats between the three types of airway device?
  • RQ2: For the two types of airway device where lubrication has been studied, is the application of lubricant associated with lower odds of getting a sore throat? Is this the same for both types of device?
(i) Find the best model (in terms of AIC), which may contain two-factor interactions, that will allow you to answer RQ1.
Explain your modelling approach, incorporating the suggestion(s) you made in part (a)(ii), and answer RQ1 using your results.
Which type of airway device would you recommend, and why? [5]
(ii) Create a new data frame called soreThroat2, which is a subset of soreThroat consisting of the data when method is either ETT or LMA.
Find the best model (in terms of AIC), which may contain two-factor interactions, that will allow you to answer RQ2.
Explain your modelling approach, and answer RQ2 using your results.
(Hint: As the variables in the new and old data frames are likely to have the same names, it is recommended that you add the argument data=soreThroat2 to your glm() command in R to avoid confusion.) [5]
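The subsetting step and the data= argument from the hint can be sketched on a hypothetical toy data frame. The names toy, toy2, device and y and the level labels "A", "B" and "C" are illustrative assumptions; the actual coding of method in soreThroat is not specified here.

```r
# Hypothetical toy data with a three-level factor and a binary response.
toy <- data.frame(
  device = factor(c("A", "B", "C", "A", "B", "C", "A", "B")),
  y = c(0, 1, 0, 1, 0, 1, 1, 0)
)

# Keep only the rows for two of the three levels.
toy2 <- subset(toy, device %in% c("A", "B"))

# Drop the now-unused level so it does not appear in model output.
toy2$device <- droplevels(toy2$device)

# Passing data = toy2 makes explicit which data frame the variables come from.
fit <- glm(y ~ device, family = binomial, data = toy2)
print(coef(fit))
```

Naming the data frame explicitly avoids accidentally fitting the model to variables of the same name in the original, unsubsetted data.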
(d) Check if the model you selected in part (c)(i) satisfies the model assumptions of a logistic regression model. [5]

  • Uploaded By : Katthy Wills
  • Posted on : January 11th, 2023