HSH946 Exploratory Analysis of Health-Risk Dataset Assignment 1
- Subject Code: HSH946
- University: University of Sydney
- Country: Australia
BIOSTATISTICS 1
ASSIGNMENT 1 (20% OF TOTAL MARK)
Due date: 12th April 2023
Instructions
Please note: this assessment task must be all your own work. Please do not discuss questions and answers in detail with your fellow students.
Assignments must be submitted online via the assignment folder in the unit site before 8 pm on 12th April 2023. Assignments must be submitted as a Microsoft Word document or an editable PDF.
Some of the questions may require calculations. Include the formula you use and your calculations with your answers. If the final answer is incorrect, assessors can then determine whether this is because of a simple calculation error (small loss of marks) or because of an incorrect formula or incorrect figures.
Some of the questions may require calculations using Stata. Where you have used Stata for calculations, you should copy the Stata commands and output from the Stata results screen and paste them into your assignment so that the assessor can see how you have derived your answer. Note: this Stata output is required in addition to your answer to the question. Simply pasting in the Stata output will not be considered an adequate answer on its own. Note that all tables and graphs in this assignment should be presented with appropriate headings and footnotes.
Please submit two versions of the assignment to the assignment submission folder. The first should include the original assessment questions (to make sure you have answered all questions) and the second should exclude the original questions to check assignment originality with Turnitin.
If applicable, acknowledge when and how you have used any AI tools for your assessment.
This assignment is worth 20% of the final mark for HSH746/HSH946 and the marks allocated for each question are shown.
Students should ensure that they keep a spare copy of their work.
Student Name:
Student ID Number:
- Read the following data description and answer the following questions:
A study collected data on several health risk factors for 1,850 respondents, aged 44 years and above. The data for this study is in the data set AT1_healthrisk data.
The variables in the data set include the following:
Variable | Description | Units | Range or count
Age | Age of study participant | Years | 44-79
Education | Highest level of educational qualification of the participant | 1=Less than 12 class of schooling; 2=High School Diploma; 3=College/Vocational degree; 4=Honours graduate degree; 5=Postgraduate degree | n=40; n=405; n=880; n=450; n=75
BMI | Body Mass Index, weight in kilograms/height in metres squared | kg/m2 | 15-51
Smoking | Whether or not the participant is a current smoker | 0=No | n=1,626
Alcohol | Whether or not the participant currently consumes alcohol | 0=No | n=1,082
Drinksperweek | Total number of standard drinks of alcohol per week | 0=Not current alcohol drinker; 1-20 drinks per week | n=1,080; n=770
The data are synthetic; you may reference them in your answers as coming from the Assignment 1 health risk study.
- In this question, we will focus on an exploratory analysis of the data. Check all individual variables and associated variables for any invalid and/or inconsistent values and take appropriate action. (3 marks).
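One way to approach a check like this is to test each variable against its documented range and then test related variables against each other. The sketch below is written in Python rather than Stata and runs on a few made-up records, not the assignment data. It reads the table's ranges as 44-79 for age and 15-51 for BMI, and it assumes that smoking and alcohol are coded 0/1 with 1=Yes, which the table does not state. The helper name `find_problems` is hypothetical.

```python
# Illustrative sketch only: made-up records, not the AT1_healthrisk data.
# ASSUMPTION: smoking and alcohol are coded 0=No, 1=Yes.
# find_problems is a hypothetical helper name.

def find_problems(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if not 44 <= row["age"] <= 79:
        problems.append("age outside 44-79")
    if row["education"] not in {1, 2, 3, 4, 5}:
        problems.append("education code not in 1-5")
    if not 15 <= row["bmi"] <= 51:
        problems.append("bmi outside 15-51")
    if row["smoking"] not in {0, 1}:
        problems.append("smoking not coded 0/1")
    if row["alcohol"] not in {0, 1}:
        problems.append("alcohol not coded 0/1")
    if not 0 <= row["drinksperweek"] <= 20:
        problems.append("drinksperweek outside 0-20")
    # Cross-variable consistency: non-drinkers should report 0 drinks,
    # and current drinkers should report at least 1.
    if row["alcohol"] == 0 and row["drinksperweek"] > 0:
        problems.append("non-drinker with drinks recorded")
    if row["alcohol"] == 1 and row["drinksperweek"] == 0:
        problems.append("drinker with 0 drinks recorded")
    return problems

records = [
    {"age": 52, "education": 3, "bmi": 27.4, "smoking": 0, "alcohol": 1, "drinksperweek": 6},
    {"age": 39, "education": 2, "bmi": 24.0, "smoking": 1, "alcohol": 0, "drinksperweek": 0},
    {"age": 61, "education": 4, "bmi": 31.2, "smoking": 0, "alcohol": 0, "drinksperweek": 4},
]

for index, row in enumerate(records):
    for problem in find_problems(row):
        print(f"record {index}: {problem}")
```

Listing the problems per record, rather than silently dropping rows, keeps the "appropriate action" (correcting, recoding to missing, or excluding) a separate and documented decision.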
Following the Australian guidelines to reduce health risks from drinking alcohol is advised to reduce one's risk of alcohol-related harm. The guidelines are based on research evidence and are frequently reviewed by the National Health and Medical Research Council (NHMRC). Guideline 1 is shown below.
- Based on the NHMRC alcohol guidelines on the recommended number of standard drinks a week, what percentage of the respondents who reported currently consuming alcohol do so at a risky level? (1.5 marks)
Hint: generate a new variable (riskyalcohol) for current alcohol consumers.
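The idea behind the hint can be sketched as follows, in Python rather than Stata and on made-up records. The cut-off of more than 10 standard drinks per week is an assumption borrowed from the high-risk criterion used in the next part of this question; confirm it against Guideline 1 before relying on it.

```python
# Illustrative sketch only: made-up records, not the assignment data.
# ASSUMPTION: "risky" means more than 10 standard drinks per week.
RISKY_THRESHOLD = 10

records = [
    {"alcohol": 1, "drinksperweek": 12},
    {"alcohol": 1, "drinksperweek": 3},
    {"alcohol": 0, "drinksperweek": 0},
    {"alcohol": 1, "drinksperweek": 10},
    {"alcohol": 1, "drinksperweek": 15},
]

def risky_alcohol(row):
    """Return 1/0 for current drinkers and None (missing) for non-drinkers."""
    if row["alcohol"] != 1:
        return None
    return 1 if row["drinksperweek"] > RISKY_THRESHOLD else 0

for row in records:
    row["riskyalcohol"] = risky_alcohol(row)

# The denominator is current drinkers only, so missing values are excluded.
drinkers = [row for row in records if row["riskyalcohol"] is not None]
percent_risky = 100 * sum(row["riskyalcohol"] for row in drinkers) / len(drinkers)
print(f"{percent_risky:.1f}% of current drinkers exceed the threshold")
```

Leaving the indicator missing for non-drinkers is what restricts the percentage to current alcohol consumers, as the question asks.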
- Suppose you are now interested in highlighting the different levels of alcohol risk for everyone included in this sample using the following criteria.
High risk level: consume > 10 standard drinks per week
Medium risk level: consume 5-10 standard drinks per week
Low risk level: consume less than 5 standard drinks per week
No alcohol risk: do not consume alcohol (alcohol=0)
You generate a new variable (different_alcoholrisk) based on the criteria above.
Hint: generate different_alcoholrisk=. and then replace the variable using the criteria above, starting from the high risk level, followed by medium risk, low risk and, lastly, no alcohol risk.
Add value labels as follows:
different_alcoholrisk =0 for non-alcohol consumers (alcohol=0)
different_alcoholrisk =1 for low risk
different_alcoholrisk =2 for medium risk
different_alcoholrisk =3 for high risk.
Tabulate the different_alcoholrisk variable to estimate the percentage of respondents who reported currently consuming alcohol at a medium risk level and at a low risk level. What do the results tell us? (Report percentages to 1 decimal place.) (3 marks)
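The generate-then-replace sequence described in the hint can be sketched as below. This is Python rather than Stata, the (alcohol, drinksperweek) pairs are made up, and the function simply reuses the variable name from the question. The order of the steps mirrors the hint: start as missing, then high, medium, low and, lastly, no alcohol risk.

```python
from collections import Counter

# Illustrative sketch only: made-up (alcohol, drinksperweek) pairs.
LABELS = {0: "No alcohol risk", 1: "Low risk", 2: "Medium risk", 3: "High risk"}

def different_alcoholrisk(alcohol, drinksperweek):
    level = None                      # start as missing
    if drinksperweek > 10:
        level = 3                     # high risk
    if 5 <= drinksperweek <= 10:
        level = 2                     # medium risk
    if drinksperweek < 5:
        level = 1                     # low risk
    if alcohol == 0:
        level = 0                     # applied last so it overrides "low"
    return level

sample = [(1, 12), (1, 7), (1, 5), (1, 2), (0, 0), (0, 0), (1, 10), (1, 4)]
counts = Counter(different_alcoholrisk(a, d) for a, d in sample)

# Tabulate with percentages of the whole sample, to 1 decimal place.
for level in sorted(LABELS):
    percent = 100 * counts[level] / len(sample)
    print(f"{level} {LABELS[level]:<16} {counts[level]:>3} {percent:5.1f}%")
```

Applying the non-drinker rule last matters because a non-drinker with 0 drinks per week also satisfies the "less than 5" rule. Note that this tabulation uses everyone in the sample as the denominator; a percentage among current drinkers only would need a different denominator.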
- Some studies have shown that current smokers are more likely to drink alcohol than non-smokers. Indicate whether this is true for our sample (1/2 mark) and use statistics to support your answer (1 mark). (Total 1.5 marks)
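A comparison like this usually comes down to a two-way table with row percentages: the share of drinkers among smokers against the share among non-smokers. The sketch below is Python rather than Stata, uses made-up (smoking, alcohol) pairs, assumes both variables are coded 0=No and 1=Yes, and uses a hypothetical helper name `percent_drinking`.

```python
from collections import Counter

# Illustrative sketch only: made-up (smoking, alcohol) pairs.
# ASSUMPTION: both variables are coded 0=No, 1=Yes.
pairs = [(1, 1), (1, 1), (1, 0), (1, 1),
         (0, 1), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
cells = Counter(pairs)  # cell counts of the 2x2 table

def percent_drinking(smoking):
    """Row percentage: share of current drinkers within one smoking group."""
    drinkers = cells[(smoking, 1)]
    total = drinkers + cells[(smoking, 0)]
    return 100 * drinkers / total

print(f"smokers who drink:     {percent_drinking(1):.1f}%")
print(f"non-smokers who drink: {percent_drinking(0):.1f}%")
```

Percentages within each smoking group, rather than raw counts, are what make the two groups comparable when their sizes differ.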
- We have looked at the different types of variables: quantitative-continuous, quantitative-discrete, categorical-nominal and categorical-ordinal. Identify the type of each of the following variables. (5 marks)
Variable | Variable measure | Variable type
Median weekly household income | Australian Dollars rounded to the nearest $10 |
Highest level of education qualification | 1=Less than 12 class of schooling; 2=High School Diploma; 3=College/Vocational degree; 4=Honours graduate degree; 5=Postgraduate degree |
Birthweight | Extremely low (<1000g> |
Waist measurement | Centimetres |
State of residence | NSW, Vic, Queensland, Tasmania |
- A study collected data on height in centimetres for 200 adults. The data for this study is AT1_heights. The researchers are interested in estimating probabilities; hence the frequency distribution of height will be used. Your task is to draw a histogram with height labels. (7.5 marks)
- Create a graph of height for all respondents in the study. Adjust the bin number to 22 in Stata. Give the graph an appropriate title and footnote (2.5 marks).
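To see what a fixed bin number does to the data, the sketch below splits heights into 22 equal-width bins and prints a rough text histogram. It is Python rather than Stata, the 200 heights are randomly generated stand-ins for the AT1_heights data, and laying the bins out evenly between the minimum and maximum is an assumption about the binning, not a description of Stata's internals. The helper name `bin_counts` is hypothetical.

```python
import random

# Illustrative sketch only: randomly generated heights, not AT1_heights.
random.seed(1)
heights = [round(random.gauss(170, 9)) for _ in range(200)]

def bin_counts(values, bins):
    """Count values in equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return low, width, counts

low, width, counts = bin_counts(heights, 22)

# One row per bin, labelled with the bin's lower edge in cm.
for i, count in enumerate(counts):
    print(f"{low + i * width:6.1f} cm | {'#' * count}")
```

Because the heights are whole numbers, some bin widths leave gaps or uneven bars that come from rounding and not from the underlying distribution, which is worth keeping in mind when reading the shape of the graph.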
- The distribution of height (cm), rounded off to a whole number, is (choose 1) (1 mark)
- Right-skewed
- Left-skewed
- Approximately symmetric.
- What would be the appropriate measure of location and spread for height (cm)? (1 mark) Use summary statistics to support your answer (1.5 marks). (Total 2.5 marks)
- What is the value of height (cm) for the measure of location and the measure of spread (rounded off to 1 decimal place)? (1 mark)
- What is the probability that someone chosen at random from this sample will have a height greater than 180 cm? (1/2 mark)
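The quantities involved in these parts can be sketched as follows: the two usual location and spread pairs (mean with standard deviation, median with interquartile range) and an empirical probability computed as a proportion of the sample. This is Python rather than Stata, and the ten heights are made up, not the AT1_heights data.

```python
import statistics

# Illustrative sketch only: made-up heights in cm.
heights = [158, 163, 165, 168, 170, 171, 173, 176, 181, 185]

mean = statistics.mean(heights)
sd = statistics.stdev(heights)            # sample standard deviation
median = statistics.median(heights)
q1, _, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1

# Empirical probability: proportion of the sample strictly above 180 cm.
prob_over_180 = sum(1 for h in heights if h > 180) / len(heights)

print(f"mean={mean:.1f} sd={sd:.1f}")
print(f"median={median:.1f} IQR={iqr:.1f}")
print(f"P(height > 180 cm) = {prob_over_180:.2f}")
```

Comparing the mean with the median is a quick check on symmetry: when the two are close, the mean and standard deviation are usually reasonable summaries, while a clear gap points towards the median and interquartile range. Quartile values can differ slightly between software packages because they use different interpolation rules.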
- Read the study description below and identify which is the most appropriate sampling strategy and give reasons why (3.5 marks).
- A researcher wishes to investigate the level of job stress during the COVID-19 lockdown among academics who were employed at different universities in Victoria. The researcher wishes to interview the academics in person, and hence will carry out the interviews at the different universities in Victoria, considering the different faculties in each university (2 marks).
- For the study described in question 4 part (a), what is the target population? (1/2 mark)
- Suppose that the researcher decides to get the payroll of all currently employed academics at the different faculties of the different universities in Victoria. Would the resultant sample be representative of the target population? Give a reason why. (1 mark)
End of assignment questions