MA5832-Machine Learning Methodologies - Computer Science Assignment

Added on: 2022-08-20
Assignment Task


1. Capstone Scenario:


The data used in the capstone, AUS Data.xlsx, is collected and aggregated from the Australian Bureau of Statistics (ABS). The data is available quarterly from June 1981 to September 2020 and includes the response variable (unemployment rate) and 7 predictors:

  • Y: Unemployment rate measured in percentage;
  • X1: Percentage change in Gross Domestic Product (GDP);
  • X2: Percentage change in the Government final consumption expenditure;
  • X3: Percentage change in final consumption expenditure of all industry sectors;
  • X4: Terms of trade index (percentage);
  • X5: Consumer Price Index (CPI) of all groups;
  • X6: Number of job vacancies measured in thousands;
  • X7: Estimated Resident Population measured in thousands.

Further explanations of the variables can be found on the ABS website.

Assessment Tasks:

1. Provide an overview of the Australian unemployment rate over the last 21 years (1999-2020) and some insights on factors driving the unemployment rate.


2. Prepare data appropriate for the proposed supervised machine learning methodologies such as:

(a) Implement appropriate data wrangling procedures, e.g. missing value treatment or transformation of variables.

(b) Provide and comment on descriptive statistics of the variables.

Machine Learning:

3. Apply one of the supervised machine learning (ML) algorithms from either Week 3 or Week 4 to the data prepared in Question 2 to predict the Australian unemployment rate from March 2018 to September 2020.

(a) Justify your choice over the other supervised machine learning algorithms.

(b) Justify the choice of the hyper-parameter(s) that must be specified in R to estimate the selected model.

(c) Report the performance(s) and interpretation(s) of the obtained ML model(s) on the training dataset.

(d) Discuss the predictive performance of the model on the test dataset (March 2018 to September 2020).
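The train/test boundary implied by Question 3 can be checked with a small sketch. It assumes one observation per quarter from June 1981 to September 2020, as the data description states, and treats March 2018 onward as the test window. The helper name `quarter_ends` and the `(year, month)` tuple representation are illustrative choices, not part of the assignment. The brief asks for the modelling itself to be done in R, so this Python snippet only illustrates the date arithmetic behind a chronological (non-shuffled) split.

```python
def quarter_ends(start, end):
    """Yield (year, month) for each quarter from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 3
        if month > 12:  # roll over into the next calendar year
            month -= 12
            year += 1


# Assumed span of the data: June 1981 to September 2020, quarterly.
quarters = list(quarter_ends((1981, 6), (2020, 9)))

# Chronological split: everything before March 2018 is training data.
test_start = (2018, 3)
train = [q for q in quarters if q < test_start]
test = [q for q in quarters if q >= test_start]

print(len(quarters), len(train), len(test))  # 158 147 11
```

Under these assumptions the test set holds only 11 quarters, so performance figures computed on it should be interpreted with that small sample size in mind.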

Neural Network:

4. Apply a neural network (NN) to the data prepared in Question 2 to predict the Australian unemployment rate from March 2018 to September 2020.

(a) Describe the structure of the selected neural network model.

(b) Report the performance(s) and interpretation(s) of the produced NN model(s) on the training dataset.

(c) Discuss the predictive performance of the model on the test dataset (March 2018 to September 2020).

(d) Vary the number of hidden layers in the model from 4(a). Explore the impact of the change on the prediction performance of the model.

(e) Vary the number of neurons in each layer of the model from 4(a). Explore the impact of the change on the prediction performance of the model.

Comparison and Suggestion:

5. Compare the chosen ML model in Question 3 with the NN model in Question 4, and then provide a recommended model. At a minimum, include:

(a) Cross-validated accuracy

(b) Computational time to train models

(c) Interpretability

6. Provide some suggestions regarding the methodologies/data to further improve the prediction of the unemployment rate of Australia.

Posted on: May 27th, 2021
