# BUSS6002: Data Science In Business Assignment

Subject Code: BUSS6002

Question 1 (10 Marks)

a) (4 marks) Write some code to automatically print out the column names of the variables with missing values, as well as the number of missing observations associated with each of those variables. The output should be sorted by the number of missing observations from most to least.
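One possible approach to part a) is sketched below with pandas. The DataFrame `df` and its values are toy data invented for illustration; only the column names are taken from the variables mentioned in this brief, and the real dataset may differ.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the assignment dataset (values are made up).
df = pd.DataFrame({
    "Generation": [12.1, np.nan, 9.8, 15.2, 11.0],
    "Panel Capacity": [3.0, 5.0, np.nan, np.nan, 4.0],
    "Roof Azimuth": [10, 350, 180, 90, 270],
    "Shading": ["No", None, None, "Partial", None],
})

# Count missing values per column, keep only the columns that have at
# least one, and sort from most to least missing.
missing_counts = df.isna().sum()
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)

print(missing_counts)
```

Filtering on `missing_counts > 0` before sorting keeps fully observed columns such as `Roof Azimuth` out of the output.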

b) (4 marks) Write some code to cross-check the data against the data dictionary and identify discrepancies.
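The sketch below shows one way part b) might be approached. The format of the data dictionary is not given in this brief, so `data_dictionary` here is a hypothetical structure: numeric columns map to a `(min, max)` range and categorical columns map to a set of allowed values. The helper `find_discrepancies`, the ranges, and the toy data are all assumptions for illustration.

```python
import pandas as pd

# Toy data; the real dataset and data dictionary are not shown in this brief.
df = pd.DataFrame({
    "Roof Azimuth": [10, 350, 400, 90],              # 400 is out of range
    "Roof Pitch": [20, -5, 30, 15],                  # -5 is out of range
    "Shading": ["No", "Partial", "Yes", "Unknown"],  # "Unknown" is not allowed
})

# Hypothetical data dictionary: (min, max) for numeric columns,
# a set of allowed values for categorical columns.
data_dictionary = {
    "Roof Azimuth": (0, 360),
    "Roof Pitch": (0, 90),
    "Shading": {"No", "Partial", "Yes"},
    "Latitude": (-90, 90),  # documented but absent from the toy data
}

def find_discrepancies(frame, dictionary):
    """Return a list of (column, message) pairs describing mismatches."""
    problems = []
    for column, rule in dictionary.items():
        if column not in frame.columns:
            problems.append((column, "missing from data"))
            continue
        values = frame[column].dropna()
        if isinstance(rule, tuple):
            low, high = rule
            bad = ~values.between(low, high)  # inclusive on both ends
        else:
            bad = ~values.isin(rule)
        if bad.any():
            problems.append((column, f"{int(bad.sum())} value(s) outside dictionary"))
    # Columns present in the data but not documented in the dictionary.
    for column in frame.columns:
        if column not in dictionary:
            problems.append((column, "not in dictionary"))
    return problems

discrepancies = find_discrepancies(df, data_dictionary)
for column, message in discrepancies:
    print(f"{column}: {message}")
```

Missing values are dropped before the range check so that they are reported once, under part a), and not a second time as dictionary violations.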

c) (2 marks) Briefly discuss your findings from a) and b).

Question 2 (10 Marks)

a) (4 marks) Graphically summarise the distributions of the variables Generation and Panel Capacity, one at a time, and briefly discuss the distributional characteristics of the two variables. Your discussion should also connect the distributional characteristics to the domain-specific context of these variables.

b) (2 marks) Graphically summarise the distribution of the variable Roof Azimuth and briefly discuss the distributional characteristics of the variable. Your discussion should also connect the distributional characteristics to the domain-specific context.

c) (3 marks) Given that there may be an association between Roof Azimuth and Generation, transform Roof Azimuth so that equal angles on either side of North are treated the same.

d) (1 mark) Visualise your new version of Roof Azimuth.
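One transformation that satisfies part c) is the angular distance from North. The sketch below assumes Roof Azimuth is recorded in degrees with 0 (and 360) meaning North; that convention is an assumption and should be checked against the data dictionary. The function name `fold_azimuth` is hypothetical.

```python
def fold_azimuth(azimuth_degrees):
    """Angular distance from North in degrees, in the range [0, 180].

    Assumes 0 (and 360) degrees mean North.
    """
    angle = azimuth_degrees % 360
    return min(angle, 360 - angle)

# 30 and 330 sit at equal angles either side of North, so both map to 30.
for azimuth in [0, 30, 330, 90, 270, 180]:
    print(azimuth, "->", fold_azimuth(azimuth))
```

With a pandas column, the same function could be applied element-wise, for example `df["Roof Azimuth"].apply(fold_azimuth)`. Taking the cosine of the angle is another option that also treats both sides of North symmetrically.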

Question 3 (19 Marks)

a) (2 marks) Generate a visualisation of the correlation coefficient between Generation and all variables. Briefly discuss your findings in the context of predicting Generation.
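The numbers behind such a visualisation can be computed as in the sketch below. The data are toy values, and the `City` column is a hypothetical stand-in for any non-numeric variable, which is excluded before the correlation is calculated.

```python
import pandas as pd

# Toy data (made up); Generation is an exact multiple of Panel Capacity here.
df = pd.DataFrame({
    "City": ["A", "A", "B", "B", "B"],
    "Panel Capacity": [2.0, 3.0, 4.0, 5.0, 6.0],
    "Roof Pitch": [30, 10, 25, 15, 20],
    "Generation": [8.0, 12.0, 16.0, 20.0, 24.0],
})

# Pearson correlation of every numeric variable with Generation,
# sorted from most positive to most negative.
numeric = df.select_dtypes(include="number")
correlations = (
    numeric.corr()["Generation"]
    .drop("Generation")
    .sort_values(ascending=False)
)

print(correlations)
```

If matplotlib is installed, `correlations.plot.barh()` would turn this Series into a horizontal bar chart. Pearson correlation only captures linear association, which matters for variables such as Roof Azimuth, where the relationship with Generation may not be linear.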

b) (3 marks) Construct an appropriate plot to visualise the relationship between Generation and Panel Capacity. Briefly discuss your findings.

c) (3 marks) Construct an appropriate plot to visualise the relationship between Generation and Latitude. Briefly discuss your findings.

d) (5 marks) For each city, compile a table that shows the mean Generation for combinations of Roof Azimuth and Roof Pitch. Bin values of Roof Azimuth into 45° groups. Briefly discuss your findings.
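A minimal sketch of the binning and tabulation in part d) follows, using `pd.cut` and `pivot_table`. The `City` column name, the city labels, and all values are hypothetical toy data. Roof Pitch is treated as taking a few discrete values here; if it is continuous in the real data, it may need binning as well.

```python
import pandas as pd

# Toy data (made up) standing in for the assignment dataset.
df = pd.DataFrame({
    "City": ["City A", "City A", "City A", "City B", "City B", "City B"],
    "Roof Azimuth": [10, 20, 100, 350, 200, 210],
    "Roof Pitch": [20, 20, 30, 20, 30, 30],
    "Generation": [14.0, 16.0, 11.0, 13.0, 9.0, 10.0],
})

# Bin azimuth into 45-degree groups: [0, 45), [45, 90), ..., [315, 360).
# A value of exactly 360 would fall outside these bins and become NaN.
bin_edges = list(range(0, 361, 45))
df["Azimuth Bin"] = pd.cut(df["Roof Azimuth"], bins=bin_edges, right=False)

# One table per city: rows are azimuth bins, columns are roof pitch values,
# cells hold the mean Generation for that combination.
tables = {}
for city, group in df.groupby("City"):
    tables[city] = group.pivot_table(
        index="Azimuth Bin",
        columns="Roof Pitch",
        values="Generation",
        aggfunc="mean",
        observed=True,  # show only the bins that occur in this city
    )
    print(city)
    print(tables[city], end="\n\n")
```

Empty cells appear as `NaN`, which marks combinations of azimuth bin and pitch with no households in that city.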

e) (4 marks) Pick one city, bin households based on Panel Capacity and Shading. For each bin display the mean Generation. Display the results using an appropriate visualisation. Discuss your findings.

f) (4 marks) Pick one city, bin households based on Panel Capacity and Year. For each bin display the mean Generation. Display the results using an appropriate visualisation. Discuss your findings.

