MA5832 Machine Learning Algorithms Assignment
- Subject Code: MA5832
- Country: Australia
1. Problem (75 marks)
1.1 Background on Credit Card Dataset
The data set, “CreditCard Data.xls”, is based on Yeh and Lien (2009). It contains 30,000 observations and 23 explanatory variables. The response variable, Y, is binary: “1” indicates a default payment and “0” a non-default payment. The 23 explanatory variables are described as follows:
X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (0 = unknown; 1 = graduate school; 2 = university; 3 = high school; 4 = others; 5 = unknown; 6 = unknown).
X4: Marital status (0 = unknown; 1 = married; 2 = single; 3 = others).
X5: Age (year).
X6-X11: History of past payment. Monthly payment records were tracked from April to September, 2005, as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -2 = no consumption; -1 = pay duly; 0 = the use of revolving credit; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.
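The coded variables X2 to X4 are categorical, so it may help to store them as labelled factors before modelling. The sketch below is one possible way to do that in base R; the four toy rows are hypothetical and stand in for the real file, and merging the three "unknown" education codes into a single level is an assumption rather than a requirement of the brief.

```r
# Hypothetical toy rows standing in for the real data; only X2-X4 are shown.
credit <- data.frame(
  X2 = c(1, 2, 2, 1),
  X3 = c(0, 1, 5, 3),
  X4 = c(1, 2, 0, 3)
)

# Education: codes 0, 5 and 6 are all described as "unknown",
# so this lookup maps them onto one shared label (an assumption).
edu_labels <- c("0" = "unknown", "1" = "graduate school", "2" = "university",
                "3" = "high school", "4" = "others",
                "5" = "unknown", "6" = "unknown")

credit$X2 <- factor(credit$X2, levels = c(1, 2), labels = c("male", "female"))
credit$X3 <- factor(unname(edu_labels[as.character(credit$X3)]),
                    levels = c("graduate school", "university",
                               "high school", "others", "unknown"))
credit$X4 <- factor(credit$X4, levels = 0:3,
                    labels = c("unknown", "married", "single", "others"))

str(credit)
```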
Assessment Tasks
Data
- Select a random sample of 70% of the full dataset as the training data, retain the rest as test data. Provide the code and print out the dimensions of the training data.
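A minimal sketch of the 70/30 split in base R follows. The data frame `credit` here is simulated noise with the same shape as the real data (30,000 rows, X1 to X23 plus Y); the seed, the simulated values, and the object names are all placeholders, not part of the brief.

```r
set.seed(5832)  # any fixed seed works; it only makes the split reproducible

# Stand-in for the real data: 30,000 rows, X1..X23 plus Y, filled with noise.
# With the actual file, a call such as readxl::read_excel("CreditCard Data.xls")
# would replace this step (the sheet layout is an assumption).
n <- 30000
credit <- as.data.frame(matrix(rnorm(n * 23), nrow = n,
                               dimnames = list(NULL, paste0("X", 1:23))))
credit$Y <- rbinom(n, size = 1, prob = 0.5)  # arbitrary placeholder response

# Draw 70% of the row indices without replacement for training.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

dim(train)  # 21000 24
dim(test)   #  9000 24
```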
Lasso regression
- Use lasso regression to find the best model for classifying credible and non-credible clients. Specify any underlying assumptions. Justify your model choice as well as hyper-parameters which are required to be specified in R. (10 marks)
- Display model summary and discuss the relationship between the response variable versus selected features. (10 marks)
- Evaluate the performance of the algorithm on the training data and comment on the results. (5 marks)
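The three lasso tasks above could be approached along the following lines, assuming the `glmnet` package is used (the brief does not name a package). The data are simulated, the variable names are placeholders, and the choices of 10 folds, deviance as the cross-validation measure, `lambda.1se`, and a 0.5 cut-off are illustrative rather than prescribed.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in data: 5 numeric predictors, only x1 and x2 drive the response.
n <- 2000
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
y <- rbinom(n, 1, plogis(-1 + 1.5 * x[, "x1"] - 1 * x[, "x2"]))

# alpha = 1 selects the lasso penalty; family = "binomial" gives logistic regression.
# glmnet standardises the predictors by default (standardize = TRUE).
# lambda is chosen by 10-fold cross-validation on the binomial deviance.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                    nfolds = 10, type.measure = "deviance")

cv_fit$lambda.min  # lambda with the lowest cross-validated deviance
cv_fit$lambda.1se  # largest lambda within one SE of the minimum (sparser model)

# Coefficients at the chosen lambda; entries printed as "." were shrunk to zero.
coef(cv_fit, s = "lambda.1se")

# Training-set class predictions at a 0.5 cut-off and the confusion matrix.
p_hat <- predict(cv_fit, newx = x, s = "lambda.1se", type = "response")
pred  <- as.integer(p_hat > 0.5)
table(predicted = pred, actual = y)
mean(pred == y)  # training accuracy
```

With data that contain factors, `model.matrix()` can be used to build the numeric matrix `x` that `glmnet` expects.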
Tree Based Algorithms
- Use an appropriate tree based algorithm to classify credible and non-credible clients. Specify any underlying assumptions. Justify your model choice as well as hyperparameters which are required to be specified in R. (10 marks)
- Display model summary and discuss the relationship between the response variable versus selected features. (10 marks)
- Evaluate the performance of the algorithm on the training data and comment on the results. (5 marks)
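As one possible reading of "an appropriate tree based algorithm", the sketch below fits a single classification tree with `rpart` and prunes it by cross-validated error. The data are simulated, and the values of `cp`, `minsplit`, and `xval` are illustrative starting points, not recommendations from the brief; a random forest or boosted trees would be equally valid choices.

```r
library(rpart)

set.seed(2)
# Simulated stand-in data: the default probability depends only on x1.
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
p <- ifelse(d$x1 > 0.5, 0.8, 0.1)
d$y <- factor(rbinom(n, 1, p), levels = c(0, 1),
              labels = c("non_default", "default"))

# Grow a deliberately large tree (small cp), with 10-fold cross-validation.
fit <- rpart(y ~ ., data = d, method = "class",
             parms = list(split = "gini"),
             control = rpart.control(cp = 0.001, minsplit = 20, xval = 10))

printcp(fit)  # complexity table: CP, number of splits, CV error (xerror)

# Prune back to the cp value with the lowest cross-validated error.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

pruned$variable.importance  # larger values mean more influential features

# Training-set predictions and confusion matrix.
pred <- predict(pruned, newdata = d, type = "class")
table(predicted = pred, actual = d$y)
mean(pred == d$y)  # training accuracy
```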
Support vector classifier
- Use an appropriate support vector classifier to classify the credible and non-credible clients. Justify your model choice as well as hyper-parameters which are required to be specified in R. (10 marks)
- Display model summary and discuss the relationship between the response variable versus selected features. (10 marks)
- Evaluate the performance of the algorithm on the training data and comment on the results. (5 marks)
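A support vector classifier could be fitted and tuned roughly as follows, assuming the `e1071` package. The data are simulated with a non-linear class boundary, and the radial kernel, the grid of `cost` and `gamma` values, and 5-fold cross-validation are illustrative assumptions. On a training set of the size described in this brief, a grid search may be slow, so tuning on a subsample is one option to consider.

```r
library(e1071)

set.seed(3)
# Simulated stand-in data with a circular class boundary.
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1^2 + d$x2^2 > 1.5, "default", "non_default"))

# Grid search over cost and gamma using 5-fold cross-validation.
# scale = TRUE standardises the predictors before fitting.
tuned <- tune(svm, y ~ ., data = d, kernel = "radial", scale = TRUE,
              ranges = list(cost = c(0.1, 1, 10), gamma = c(0.1, 1)),
              tunecontrol = tune.control(cross = 5))

tuned$best.parameters  # cost and gamma with the lowest CV error
best <- tuned$best.model
summary(best)          # kernel, parameters, number of support vectors

# Training-set predictions and confusion matrix.
pred <- predict(best, newdata = d)
table(predicted = pred, actual = d$y)
mean(pred == d$y)  # training accuracy
```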
Prediction
Apply the optimal models identified in the sections above to make predictions on the test data. Evaluate the performance of the algorithms on the test data. Which models do you prefer? Are there any suggestions to further improve the performance of the algorithms? Justify your answers.
(20 marks)
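When comparing models on the test data, accuracy alone can be misleading if defaults are the minority class, so it may be worth reporting sensitivity and specificity as well. The helper below is a hypothetical base R function (not from any package) that computes these from 0/1 vectors; the two toy vectors are made up for illustration.

```r
# Confusion-matrix metrics for a binary classifier,
# treating 1 (default) as the positive class.
classification_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  tp <- sum(predicted == 1 & actual == 1)  # defaults correctly flagged
  tn <- sum(predicted == 0 & actual == 0)  # non-defaults correctly passed
  fp <- sum(predicted == 1 & actual == 0)  # non-defaults wrongly flagged
  fn <- sum(predicted == 0 & actual == 1)  # defaults missed
  c(accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    precision   = tp / (tp + fp))
}

# Toy example: 3 defaults among 10 clients, only one of them detected.
actual    <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
classification_metrics(actual, predicted)
# accuracy 0.7, sensitivity 1/3, specificity 6/7, precision 0.5
```

In this toy case a model that predicts "non-default" for everyone would also reach 70% accuracy, which is why the other three metrics matter when choosing between the models.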