HSH746/HSH946 BIOSTATISTICS 1 ASSIGNMENT 2 (30% OF TOTAL MARK)
Due date: 19th January 2024
Instructions
Please note: this assessment task must be entirely your own work. Please do not discuss questions and answers in detail with your fellow students.
Assignments must be submitted online via the assignment folder on the unit site in DeakinSync before 8 pm on Friday 19th January 2024. Assignments must be submitted as a Microsoft Word document or an editable PDF.
Some of the questions may require calculations. The formulas you use and your calculations should be included with your answers. If a final answer is incorrect, assessors can then determine whether this is due to a simple calculation error (small loss of marks) or to an incorrect formula or incorrect figures.
Some of the questions may require calculations using Stata. Where you have used Stata, copy the Stata commands and output from the Stata Results window and paste them into your assignment so that the assessor can see how you derived your answer. Note: this Stata output is required in addition to your answer to the question; pasting in the Stata output alone will not be considered an adequate answer. All tables and graphs in this assignment should be presented with appropriate headings and footnotes.
Please submit two versions of the assignment to the assignment submission folder. The first should include the original assessment questions (to make sure you have answered all questions), and the second should exclude the original questions so that the originality of your assignment can be checked with Turnitin.
If applicable, acknowledge when and how you have used any AI tools for your assessment.
This assignment is worth 30% of the final mark for HSH746/HSH946 and the marks allocated for each question are shown.
Students should ensure that they keep a spare copy of their work.
Student Name:
Student ID Number:
Read the following data description and answer Question 1. (Total marks 12)
The table below shows the satisfaction scores (out of 100) of patients after attending an appointment at one of the new units at a hospital in Melbourne. The sample was selected so that the data are statistically independent. These are synthetic data, but you may reference them in your answers as coming from assignment 2 data: patient satisfaction data. These data can be found in the file Patient Satisfaction.csv.
ID Patient Satisfaction Score
1 15.4
2 27.5
3 33.0
4 46.2
5 59.4
6 18.7
7 28.6
8 34.1
9 47.3
10 59.4
11 22.0
12 28.6
13 34.1
14 50.6
15 69.3
16 23.1
17 28.6
18 35.2
19 52.8
20 73.7
21 24.2
22 29.7
23 38.5
24 57.2
25 91.3
26 26.4
27 47.3
28 34.1
a) For the patient satisfaction data, should we use the normal distribution or the t distribution to construct confidence intervals? Clearly state the reasons for your decision and the conditions that must hold for you to use the chosen distribution. (2 marks)
b) Test whether the patient satisfaction score follows an approximately normal distribution. If the data do not follow a normal distribution, indicate how the data can be normalised (transformed to approximate normality) and show whether the method you used was effective. Clearly describe your steps and reasoning. (5 marks)
c) Calculate an appropriate estimate of the mean and its associated 95% confidence interval based on this sample. Report your answers to 1 decimal place. (2 marks) Note: in practice, the final mean and 95% CI are always presented on the original scale (patient satisfaction score), but in question 1c) you are not required to do so.
d) Select a random sample of 8 data points from this data set and construct the 95% confidence interval for this smaller sample. Indicate how this 95% confidence interval compares with the 95% confidence interval in question 1c). Give a reason to support your answer. (2 marks)
e) If you were to re-import the data and draw another sample of 8 participants, would your 95% confidence interval be the same as in question 1d)? Give reasons to support your answer. (1 mark)
Answer question 2 using the following information. (Total marks 7)
The prevalence of iron deficiency is high, even among healthy young women. You carried out a study that recorded serum iron levels in women. The study has two groups: an intervention group, who received a supplement that regulates serum iron, and a control group. The two groups were chosen to be statistically independent, and the serum iron data were shown to be approximately normally distributed.
The data from this study are in the file Iron study.csv.
The file has the following variables:
ID: Study participant ID
Serum iron: Serum iron recording
Group: Study group, recorded as 1 = control and 2 = intervention.
You may refer to the data source as assignment 2: iron study.
Test whether the mean serum iron level in the intervention group differs from the mean serum iron level in the control group, stating the test statistic to 1 decimal place. As part of your answer, you should:
State the null and alternative hypotheses. (1 mark)
State the appropriate statistical test to use and justify your choice. (3.5 marks)
Perform the hypothesis test and interpret the results. (2.5 marks)
Read the following data description and answer questions 3 and 4. (Total marks 11)
The data set Cold.csv contains the results of a study examining the relationship between taking cold medication and recovery from a cold. It contains two variables:
Medicine taken: This variable takes the value 1 if the person has taken medicine and 0 if they have not taken medicine.
Recover from cold: This variable takes the value 1 if the person recovered from their cold within 3 days and 0 if they took longer than 3 days to recover.
This sample was chosen so that the individual results are statistically independent. These are synthetic data, but you may reference them in your answers as coming from assignment 2 Cold data. The data set has 1000 observations.
Question 3
What proportion of participants who took cold medicine recovered within 3 days, compared with those who did not take cold medicine? Report each proportion with its associated 99% confidence interval, expressed as a percentage to 1 decimal place. (3 marks)
What is your interpretation based on these confidence intervals? (1 mark)
How would changing the confidence level to 90% affect the interval? (1 mark)
Question 4
Can you use a chi-square test with these data? Why or why not? (1.5 marks)
Test the hypothesis that there is no association between medicine taken and cold recovery, stating the test statistic to 1 decimal place. As part of your answer, you should:
Specify the null and alternative hypotheses. (1 mark)
Correctly report and interpret the results of the hypothesis test. (3.5 marks)
END OF ASSIGNMENT QUESTIONS