Exploratory Data Analysis and Statistical Inference: Three Study Questions Addressed Using Stata
- Subject Code: HSH746-HSH946
Question 1
The Framingham Heart Study is a longitudinal prospective study of the causes or origins of cardiovascular disease among a population of individuals in the community of Framingham, Massachusetts, USA. The primary objective of the study was to identify the common factors/characteristics that contribute to cardiovascular disease in the general population of Framingham, Massachusetts. Participants were followed over a long period to monitor their cardiovascular health. To sample the population, the investigators considered certain key characteristics relevant to cardiovascular health (e.g. age and sex) and the fact that each member had a chance of being selected for the study. (Total Marks: 5)
- (a) What is the target population of interest? (0.5 marks)
- (b) What sampling method(s) would be most appropriate, and why? (2.5 marks)
- (c) If the most appropriate sampling method is used in (b) above, would you consider the study’s findings to be generalisable to the general population of Framingham, Massachusetts? Give a reason why. (1 mark)
- (d) Would you consider the study’s findings to be generalisable to the general population of the USA? Give a reason why. (1 mark)
Question 2
Read the following data description and answer the questions below. A study collected data on GP visits for 500 adults aged 45 years and above. The data for this study can be found in the data set AT1_GPvisits data. (Total Marks: 5)
The variables in the data set include the following:
| Variable | Description | Units | Range or count |
| --- | --- | --- | --- |
| id | Respondent individual id | | 1-500 |
| sex | Sex of respondent | 1 = Male; 2 = Female | n = 202; n = 297 |
| age | Respondent age | Years | 45-79 |
| older | Respondent 65 years and older | 1 = yes; 0 = no | n = 344; n = 155 |
| GPvisit | Respondent visited GP | 1 = yes; 0 = no | n = 60; n = 440 |
| NGP_visits6m | Respondent number of GP visits in the past 6 months | counts | 0-14 |
The data are synthetic; you may reference them in your answers as coming from the assignment 1 GP_visits study.
- In this question, we will focus on an exploratory analysis of the data. Check all individual variables and associated variables for any invalid and/or inconsistent values and take appropriate action. Clearly explain each step. (3.5 marks)
- Older people have been reported to visit the GP more frequently than younger people. Indicate whether this is true for our sample and use statistics to support your answer. Hint: use the GPvisits variable and report statistics to 1 decimal place. (1.5 marks)
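The checks described in the two parts above could be sketched in a Stata do-file along the following lines. This is only an illustration under stated assumptions: the six rows are invented toy data that reuse the variable names from the table, the rule that GPvisit = 0 should not occur together with a positive NGP_visits6m assumes both variables refer to the same period, and the choice of which value to correct in a flagged row is left to your own judgement.

```stata
* Toy data only: six invented rows that reuse the variable names from the table.
* With the real file, load the assignment data set instead of this block.
clear
input id sex age older GPvisit NGP_visits6m
1 1 50 0 0 0
2 2 70 1 1 3
3 2 66 0 1 2
4 1 48 0 0 4
5 . 72 1 0 0
6 2 58 0 1 1
end

* Step 1: inspect each variable for out-of-range codes and missing values.
tabulate sex, missing
tabulate older, missing
tabulate GPvisit, missing
summarize age NGP_visits6m
misstable summarize

* Step 2: list rows where related variables disagree.
* Stata treats missing values as larger than any number, so exclude them first.
list id age older if !missing(age, older) & ///
    ((age >= 65 & older != 1) | (age < 65 & older != 0))

* Assumption: GPvisit and NGP_visits6m cover the same period.
list id GPvisit NGP_visits6m if !missing(GPvisit, NGP_visits6m) & ///
    GPvisit == 0 & NGP_visits6m > 0

* Step 3: compare GP attendance between the two age groups using row percentages.
tabulate older GPvisit, row
```

In the toy data, row 3 is built to fail the age/older check and row 4 to fail the GPvisit/NGP_visits6m check, so both `list` commands have something to show. Run any cross-tabulation only after the flagged values have been dealt with, and document each decision.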
Question 3
A study in Australia collected the weight (in grams) of 300 full-term newborn babies. The data for this study can be found in the data set AT1_newborn_weight data. (Total Marks: 10)
- In Stata, using the drop-down menu, create a histogram of newborn weight (grams). Adjust the bin width to 200, suggest # ticks = 5 for major ticks, suggest # between major ticks = 5 for minor ticks, and include height labels. Give the graph an appropriate title and footnote. (2.5 marks)
- Is the distribution of newborn weight symmetric or not? Give a reason why. Report statistics to 1 decimal place. (1.5 marks)
- Using the histogram, what is the probability that a newborn chosen at random from this sample will have a newborn weight greater than 3800g? Report statistics to 2 decimal places. (1 mark)
- There is evidence of a strong association of birth weight with infant mortality, with birth weight shown to be a determinant of infant survival. Suppose you are now interested in categorising the newborns into different weight groups based on their birth weight, using the following criteria.
Low birth weight: birth weight <2500g
Normal birth weight: birth weight between 2500g and 4000g
High birth weight: birth weight >4000g
Generate a new variable (bweight_group) based on the criteria above. (Hint: generate bweight_group=. then replace the variable using the criteria above). (1 mark)
Add value labels as follows:
bweight_group = 1 for Low birth weight
bweight_group = 2 for Normal birth weight
bweight_group = 3 for High birth weight
- Add value labels, and tabulate bweight_group. (1 mark)
- What percentage of newborn babies were classified as low birth weight (report to 1 decimal place)? (0.5 marks)
- How does the percentage of newborns classified as low birth weight compare to newborns classified as high birth weight? (1 mark)
- What types of variables are illustrated by the examples above concerning newborn weight? (1 mark)
- Which variable type is more informative? (0.5 marks)
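For reference, the histogram settings and the grouping hint in Question 3 can be written as Stata commands roughly as follows. Treat this as a hedged sketch rather than a model answer: the variable name `weight`, the six toy values, the label name `bweight_lbl`, and the placeholder title and footnote are all assumptions made for illustration. The question does not say which axis the tick settings apply to or which scale to use, so the x axis and a frequency scale are assumed here, and the "between 2500g and 4000g" band is taken to include both endpoints.

```stata
* Toy data only: the variable name weight and these six values are invented.
clear
input weight
2300
2500
3100
3850
4000
4200
end

* Command counterpart of the menu settings (x axis and frequency scale assumed).
histogram weight, width(200) frequency addlabels ///
    xlabel(#5) xmtick(##5) ///
    title("Placeholder title") note("Placeholder footnote")

* Mean, median and skewness, which help when judging symmetry.
summarize weight, detail

* Build the grouping variable following the hint: start as missing, then replace.
generate bweight_group = .
replace bweight_group = 1 if weight < 2500
replace bweight_group = 2 if weight >= 2500 & weight <= 4000
* Missing weights compare as larger than any number, so exclude them explicitly.
replace bweight_group = 3 if weight > 4000 & !missing(weight)

* Attach value labels and tabulate the new variable.
label define bweight_lbl 1 "Low birth weight" 2 "Normal birth weight" 3 "High birth weight"
label values bweight_group bweight_lbl
tabulate bweight_group
```

The `!missing(weight)` guard matters because, without it, any newborn with a missing weight would be placed in the high birth weight group.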