7318AFE Business Data Analytics, T2, 2023

Assignment

DUE DATE: 9:00AM AEST Brisbane time, 9 October 2023

(Online submission via Assignment link under Assessment 2 in the course website)

Instructions:

This is the major assessment item for this course, worth 50% of the overall assessment.

Data files are provided in Excel in the Assessment 2 section of the Canvas course website.

All numerical calculations and graphs/plots should be done using EXCEL, including the workbooks provided, as much as possible.

Your answers should be typed in Word; equations and symbols may be hand-written and scanned in. Type your answer below each part of the question. Present the answers to all questions as efficiently as possible in the main assignment, with only relevant Excel output (e.g., plots, regression output) inserted as images into the main document. All questions involving hypothesis testing should include all six steps. The assignment, with answers to all questions, should be submitted as one Word file.

The completed assignment must be submitted electronically.

You are required to keep an electronic copy of the submitted assignment and Excel files with output to re-submit, in case the original submission is lost for some reason.

Important Notice:

As this is an individual assessment item, each student must submit their own work. If cheating is found, all submissions involved will receive a mark of zero for this assessment item.

Marking criteria:

For each question, marking will consider the following:

1. Correctness of the identification of the data type and methodology

2. Successful application using the computer software

3. Interpretation of the results and correct answers to the question(s)

4. Professional presentation of the assignment

5. Analysis of data in order to develop solutions for real-world business problems

QUESTION 1 [10 marks]

Some critics of television complain that the amount of violence shown on television contributes to violence in our society. Others point out that television contributes to the high level of obesity among children. Now, we may have to add financial problems to the list. A sociologist theorised that people who watch television frequently are exposed to many commercials, which in turn lead them to buy, resulting in increasing debt. To analyse this belief, a researcher plans to survey a sample of families across the country.

Part A [0.5+0.5+0.5 = 1.5 marks]

Briefly explain

(a) What type of survey method could the researcher use and why?

(b) What sampling method could the researcher use to select his/her sample and why?

(c) What kind of issues could the researcher face in this data collection?

Suppose the researcher collected data from 430 randomly selected families. For each family, the total debt (in $) and the number of hours the television is turned on per week (TV Hours) were recorded. The data are stored in the file 1TVDEBT.xlsx, available on the course website under Assessment. Using these data and EXCEL, answer Parts B, C and D below.

Part B [0.5+1.5 = 2 marks]

First, the researcher wishes to use graphical descriptive methods to present the data for the two variables.

(a) He suggests using 10 classes, with class intervals 0-6, 6-12, 12-18, .... for the TV Hours variable and class intervals 0-30000, 30000-60000, 60000-90000, .... for the Total Debt variable. Explain how he could have decided on the number of classes as 10.

(b) Use appropriate BIN values to draw histograms for the two variables and comment on the shape of the two distributions (Hint: draw the frequency polygon as well).

Part C [2+1 = 3 marks]

Second, the researcher wishes to use numerical descriptive measures to summarize the data.

(a) Prepare a numerical summary report on the two variables the researcher has considered, including the following summary measures for each variable: mean, median, range, variance, standard deviation, smallest and largest values, and the three quartiles.

(b) Use five of the above summary measures to represent the summary information in a box plot for each variable (hand drawn and scanned is OK). Comment on the outliers (if any).

Part D [1+1+1+0.5 = 3.5 marks]

Third, the researcher wishes to use a graphical descriptive method and a numerical descriptive measure to analyse the strength of the linear relationship between the two variables.

(a) Use an appropriate plot to show the relationship between the variables Total Debt and TV Hours.

(b) Compute a measure of the direction and strength of the linear relationship between the two variables. Interpret this value.

(c) Comment on the sociologist's theory that people who watch television frequently are exposed to many commercials, which in turn lead them to buy, resulting in increasing debt.

(d) Write a paragraph summarizing what you have learned about TV and debt that could be submitted to the CEO of a credit card company who is considering increasing their exposure on all TV stations.

QUESTION 2 [2+3.5+3.5=9 marks]

A financial analyst recorded the quarterly returns on investment for three shares and stored the data in file 2RETURN.xlsx. Using the data and performing the calculations in EXCEL, answer the following questions.

A. a Calculate the mean and variance of each of the three shares.

b If you wish to construct a portfolio that maximises the expected return, what should you do?

c If you wish to construct a portfolio that minimises the risk, what should you do?

B. a Find the expected value and variance of the following portfolio:

Share 1 30%

Share 2 40%

Share 3 30%

b How do the expected value and variance of the portfolio compare with those of Part A?

C. a Find the expected value and variance of the following portfolio:

Share 1 10%

Share 2 10%

Share 3 80%

b How do the expected value and variance of the portfolio compare with those of Part A and Part B(a)?
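The portfolio calculations in Parts B and C can be cross-checked outside Excel. The following is a minimal Python sketch, not the required method: the return figures are made-up placeholders (not the contents of 2RETURN.xlsx), the function names are hypothetical, and it assumes sample (n - 1) covariances. It uses the standard result that the portfolio variance is the weighted sum of all pairwise covariances.

```python
from statistics import mean


def sample_cov(x, y):
    """Sample covariance of two equally long return series (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def portfolio_stats(returns, weights):
    """Expected return and variance of a portfolio.

    returns: one list of periodic returns per share
    weights: portfolio proportions, assumed to sum to 1
    """
    expected = sum(w * mean(r) for w, r in zip(weights, returns))
    variance = sum(
        wi * wj * sample_cov(ri, rj)
        for wi, ri in zip(weights, returns)
        for wj, rj in zip(weights, returns)
    )
    return expected, variance


# Hypothetical quarterly returns for illustration only.
share1 = [0.02, -0.01, 0.03, 0.01]
share2 = [0.01, 0.02, -0.02, 0.03]
share3 = [0.04, -0.03, 0.05, -0.01]

# Weights from Part B(a): 30%, 40%, 30%.
exp, var = portfolio_stats([share1, share2, share3], [0.3, 0.4, 0.3])
print(f"expected return = {exp:.4f}, variance = {var:.6f}")
```

Changing the weights to 0.1, 0.1 and 0.8 gives the Part C portfolio under the same assumptions.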

QUESTION 3 [4.5+5+0.5=10 marks]

A. A new credit card company is investigating various market segments to determine whether it is profitable to direct its advertising specifically at each one. One of the market segments is composed of Asian migrants. The latest census indicates that there are 2.01 million Asian migrants in Australia. A survey of 475 Asian migrants asked each how they usually pay for products that they purchase. The responses are:

1 Cash

2 Cheque

3 Visa

4 MasterCard

5 Other credit card.

The responses are recorded and stored in file 3ACARD.xlsx. Estimate with 95% confidence the number of Asian migrants in Australia who usually pay by credit card.

B. An insurance company boasts that 90% of its customers who make claims are satisfied with the service. To check the accuracy of this declaration, the company conducts an annual Claimant Satisfaction Survey in which customers are asked whether they were satisfied with the quality of the service (1 = Satisfied and 2 = Unsatisfied). Their responses are recorded and stored in file 3BSERVICE.xlsx. Can we infer at the 5% level of significance (α = 0.05) that the satisfaction rate is less than 90%?

C. Based on your answer to part B, do you think there is an appropriately high level of satisfaction? Write a paragraph interpreting your statistical work.
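The two proportion calculations in this question can be sketched in Python as a cross-check on the Excel work. This is an illustration under stated assumptions: the survey counts below are invented (they are not taken from 3ACARD.xlsx or 3BSERVICE.xlsx), the function names are hypothetical, and the normal approximation to the sample proportion is assumed to be adequate.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def proportion_z_test_lower(successes, n, p0):
    """Left-tailed z test of H0: p = p0 against H1: p < p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value


# Part A style: a hypothetical 200 of the 475 respondents use a credit card.
low, high = proportion_ci(200, 475)
population = 2_010_000  # 2.01 million, as stated in the question
print(f"estimated number: {low * population:,.0f} to {high * population:,.0f}")

# Part B style: a hypothetical 170 of 200 claimants are satisfied.
z, p_value = proportion_z_test_lower(170, 200, 0.90)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Multiplying the interval limits for the proportion by the population size converts them into an interval for the number of people, which is what Part A asks for.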

QUESTION 4 [2.5+3.5+2.5+0.5=9 marks]

Income vs Height of MBA men

One general belief held by observers of the business world is that taller men earn more money than shorter men. In a study reported in the Wall Street Journal, 30 MBA graduates, all about 30 years old, were surveyed and asked to report their annual incomes and their heights. These responses are recorded in 4MBA.xlsx.

a Estimate a linear relationship between heights and annual income of MBA graduates and interpret your results.

b Do these data provide sufficient statistical evidence to infer at the 5% significance level that taller men with MBAs earn more money than shorter ones?

c Obtain two measures of the fit of the model. Do you think that this model is good enough to be used to estimate and predict income on the basis of height? Predict the income of a 183 cm tall man with an MBA.

d What do you think are the reasons why taller men might earn more money? Interpret your model and consider some of the underlying social reasons behind this correlation.
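The regression quantities requested in parts a to c can be reproduced with a short least-squares sketch. The heights and incomes below are invented for illustration (they are not the 4MBA.xlsx data), incomes are assumed to be in thousands of dollars, and the function name is hypothetical. The sketch returns the slope and intercept, the standard error of estimate, the coefficient of determination, and the t statistic for the slope.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x with basic fit measures."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    b1 = sxy / sxx                      # slope
    b0 = mean_y - b1 * mean_x           # intercept
    sse = syy - b1 * sxy                # sum of squared residuals
    s_e = sqrt(sse / (n - 2))           # standard error of estimate
    r_squared = 1 - sse / syy           # coefficient of determination
    t_slope = b1 / (s_e / sqrt(sxx))    # test statistic for H0: slope = 0
    return {
        "intercept": b0,
        "slope": b1,
        "std_error": s_e,
        "r_squared": r_squared,
        "t_slope": t_slope,
    }


# Hypothetical data: heights in cm, incomes in $000.
heights = [165, 170, 175, 180, 185, 190]
incomes = [60, 66, 63, 72, 75, 78]

fit = simple_linear_regression(heights, incomes)
predicted = fit["intercept"] + fit["slope"] * 183  # point prediction at 183 cm
print(fit)
print(f"predicted income at 183 cm: {predicted:.1f} ($000)")
```

The t statistic would be compared with a Student t critical value with n - 2 degrees of freedom, using a one-tailed test for the claim that taller men earn more.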

QUESTION 5 [3+3+2+1.5+2+0.5=12 marks]

Coronavirus (COVID-19) Pandemic and its effects across Australia

The first case of the coronavirus (COVID-19) pandemic in Australia was identified on 23 January 2020. As of 30 June 2020, there had been 7881 confirmed cases and 103 deaths due to the virus; these figures increased dramatically to 4.5 million confirmed positive cases and 6058 deaths by March 2022. Due to early lockdown measures, self-isolation of known cases and close contacts, and quarantine controls on international arrivals, Australia managed to control the virus until 2021. With the COVID-19 vaccine rollout in 2021 and vaccine mandates at several workplaces, more than 95% of Australians aged 16 and over are now fully vaccinated (at least 2 doses). It is believed that, with the cessation of COVID restrictions in all states, even though the number of positive COVID-19 cases has increased at an alarming rate, the effect of the virus has been mild for most vaccinated people and the number of deaths is not severe compared to the early days. Our task is to verify this belief using pre- and post-vaccination data.

Data on the number of confirmed positive cases and COVID-19 related deaths by states as at 30 June 2020 and March 2023 are stored in 5COVID.xlsx (Source: Total Cases per 100,000 population - COVID Live.com.au/states-and-territories). [Note that June 2020 can be considered as pre-vaccination period and March 2023 can be considered as post-vaccination period with more than 95% of the population fully vaccinated.] Answer the following questions.

(a) Using an appropriate graphical method, present the share of COVID-19 cases by state in June 2020. Repeat this for the March 2023 data. Compare the share of cases between the two time periods (pre- and post-vaccination).

(b) Using an appropriate graphical method, present the share of COVID-19 related deaths by state in June 2020. Repeat this for the March 2023 data. Compare the share of deaths between the two time periods (pre- and post-vaccination).

(c) Calculate the number of deaths due to COVID-19 per 1,000 cases by state using the June 2020 data. Repeat this for the March 2023 data.

(d) Compare the number of deaths per 1,000 cases in June 2020 and March 2023 from your answer to part (c) and verify or refute the belief that the vaccination program has reduced the proportion of COVID-19 related deaths in each state.

(e) Are the results similar across all states? Hint: Consider the population of each state and comment on the number of cases per 1,000 residents in 2020 and 2023. [You can use the 2023 population data provided for both.]

(f) Write a brief report for a Health Minister of a State on the key findings from the data analysis. It is assumed that this report will help stakeholders to minimise the number of deaths caused by a pandemic, should one occur in the future.
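The deaths-per-1,000-cases rate is simple arithmetic, sketched here in Python using only the national totals quoted in the background paragraph of this question (the state-level figures in 5COVID.xlsx are not reproduced). The function name is hypothetical.

```python
def deaths_per_thousand_cases(deaths, cases):
    """Deaths per 1,000 confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 1000 * deaths / cases


# National totals quoted in the question text.
june_2020 = deaths_per_thousand_cases(103, 7_881)
march_2022 = deaths_per_thousand_cases(6_058, 4_500_000)

print(f"30 June 2020: {june_2020:.2f} deaths per 1,000 cases")   # 13.07
print(f"March 2022:   {march_2022:.2f} deaths per 1,000 cases")  # 1.35
```

The same function applied row by row to each state gives the state-level rates, and replacing deaths with cases and cases with population gives cases per 1,000 residents.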

END OF THE ASSIGNMENT

7318AFE Formula List

Trimester 1 2023

Number of classes: $K = 1 + 3.3 \log_{10} n$
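As a quick illustration of the class-count formula above, the sketch below evaluates it for the sample size of 430 families given in Question 1. Rounding up to the next whole number is an assumption here; the function name is hypothetical.

```python
import math


def number_of_classes(n):
    """Suggested number of histogram classes, K = 1 + 3.3 * log10(n), rounded up."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(1 + 3.3 * math.log10(n))


print(number_of_classes(430))  # 1 + 3.3 * log10(430) is about 9.69, so 10 classes
```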

Summary Measures (n = sample size; N = population size)

Coefficient of variation: $CV = \sigma / \mu$ (population); $cv = s / \bar{x}$ (sample)

Percentiles and Quartiles

Location of the $P$th percentile: $L_P = (n + 1)\frac{P}{100}$

Interquartile range: $IQR = Q_3 - Q_1$
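The quartiles, interquartile range and box-plot outlier limits can be sketched as follows. Several things here are assumptions rather than part of this formula list: the percentile location is taken as L_P = (n + 1)P/100 with linear interpolation between neighbouring ordered values, outliers are flagged with the common 1.5 × IQR rule, the data are invented, and the function name is hypothetical. Excel's built-in quartile functions may use a different convention and give slightly different values.

```python
def percentile(values, p):
    """P-th percentile using the location L_P = (n + 1) * P / 100 (1-based)."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * p / 100
    location = min(max(location, 1), n)   # keep the location inside the data
    lower = int(location)                 # whole part of the location
    fraction = location - lower
    if lower >= n:
        return data[-1]
    # Interpolate between the lower-th and (lower + 1)-th ordered values.
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


# Hypothetical observations for illustration only.
sample = [2, 4, 6, 8, 10, 12, 14]

q1, q2, q3 = (percentile(sample, p) for p in (25, 50, 75))
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers
outliers = [v for v in sample if v < lower_fence or v > upper_fence]

print(q1, q2, q3, iqr, outliers)
```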

Coefficient of correlation and linear trend line

Probability distributions

Normal Distribution

$Z = \frac{X - \mu}{\sigma}$

Confidence Intervals

Mean:

Proportion:

Hypothesis Testing Test statistics

Mean:

Proportion:

Correlation Analysis and Simple Linear Regression Analysis

Coefficient of Correlation

Sample regression line

Residual

SSE and Standard error of estimate

Standard error of the slope and intercept coefficient estimates

Test statistic for the significance of the slope and intercept coefficients

Coefficient of determination

Prediction interval

Confidence interval for the mean

Multiple Linear Regression Analysis

n = number of observations (sample size); k = number of independent variables in the model

Coefficient of Determination

Adjusted $R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$

Test statistic for the significance of the coefficients

Standard error of estimate or Standard error of regression

Test statistic for the utility of the model

Statistical Tables

Normal Distribution

Normal Distribution (continued)

Student t Distribution
