
MA5832 Machine Learning Algorithms Assignment

Added on: 2023-09-14 05:33:33
  • Subject Code : MA5832
  • Country : Australia

1. Problem (75 marks)

1.1 Background on Credit Card Dataset

The data, “CreditCard Data.xls”, are based on Yeh and Lien (2009). The dataset contains 30,000 observations and 23 explanatory variables. The response variable, Y, is binary: “1” denotes default payment and “0” denotes non-default payment. The 23 explanatory variables are described as follows:

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (0 = unknown; 1 = graduate school; 2 = university; 3 = high school; 4 = others; 5 = unknown; 6 = unknown).

X4: Marital status (0 = unknown; 1 = married; 2 = single; 3 = others).

X5: Age (years).

X6-X11: History of past payment. The past monthly payment records (from April to September, 2005) were tracked as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -2 = no consumption; -1 = pay duly; 0 = the use of revolving credit; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.

X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.
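
The coding for X3 above lists three separate codes (0, 5 and 6) that all mean “unknown”. A minimal base R sketch of one way to collapse them into a single level before modelling is given below. The vector `x3` is a small made-up example, not taken from the dataset, and merging the codes is an assumption about how one might preprocess the data rather than a requirement of the assignment.

```r
# Hypothetical example values for X3 (education), using the codes listed above.
x3 <- c(0, 1, 2, 3, 4, 5, 6, 2)

# Treat codes 0, 5 and 6 as the same "unknown" category.
x3[x3 %in% c(0, 5, 6)] <- 0

# Convert to a labelled factor so that models treat it as categorical.
education <- factor(
  x3,
  levels = c(0, 1, 2, 3, 4),
  labels = c("unknown", "graduate school", "university", "high school", "others")
)

table(education)
```

The same pattern could be applied to X4 (marital status), where code 0 is also “unknown”.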

Assessment Tasks

Data

  1. Select a random sample of 70% of the full dataset as the training data and retain the rest as test data. Provide the code and print out the dimensions of the training data.
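
A minimal sketch of a 70/30 split in base R follows. The data frame name `credit`, the seed, and the simulated stand-in data are assumptions made for illustration; in practice `credit` would hold the contents of “CreditCard Data.xls”.

```r
# Simulated stand-in for the real data: 30,000 rows, 23 predictors plus Y.
set.seed(5832)                        # arbitrary seed, for reproducibility only
n <- 30000
credit <- as.data.frame(matrix(rnorm(n * 23), nrow = n))
names(credit) <- paste0("X", 1:23)
credit$Y <- rbinom(n, size = 1, prob = 0.22)   # arbitrary default rate

# Draw 70% of the row indices without replacement for training.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

dim(train)   # 21000 rows, 24 columns (23 predictors + Y)
dim(test)    # 9000 rows, 24 columns
```

Fixing the seed makes the split reproducible, so the reported dimensions and later results can be regenerated exactly.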

Lasso regression

  1. Use lasso regression to find the best model that classifies credible and non-credible clients. Specify any underlying assumptions. Justify your model choice as well as the hyper-parameters that must be specified in R. (10 marks)
  2. Display the model summary and discuss the relationship between the response variable and the selected features. (10 marks)
  3. Evaluate the performance of the algorithm on the training data and comment on the results. (5 marks)
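
One possible way to approach the lasso tasks is sketched below, assuming the `glmnet` package is installed. The simulated data, the variable names, and the choice of `lambda.1se` are illustrative assumptions, not a prescribed solution.

```r
library(glmnet)   # assumed to be installed; provides cv.glmnet()

set.seed(1)
# Simulated stand-in: only X1 and X2 truly affect the default probability.
n <- 1000
train <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
train$Y <- rbinom(n, 1, plogis(-1 + 1.5 * train$X1 - 1 * train$X2))

# glmnet needs a numeric design matrix; drop the intercept column.
x <- model.matrix(Y ~ ., data = train)[, -1]
y <- train$Y

# alpha = 1 selects the lasso penalty; lambda is chosen by 10-fold CV.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

# Coefficients at the more conservative "one standard error" lambda;
# features shrunk exactly to zero are dropped from the model.
coef(cv_fit, s = "lambda.1se")

# Training-set predictions, confusion matrix and accuracy.
pred <- as.integer(predict(cv_fit, newx = x, s = "lambda.1se", type = "class"))
conf_mat <- table(Predicted = pred, Actual = y)
train_acc <- mean(pred == y)
```

Here `alpha` and `lambda` are the hyper-parameters to justify: `alpha = 1` gives the pure lasso penalty, while `lambda` controls how strongly coefficients are shrunk.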

Tree Based Algorithms

  1. Use an appropriate tree-based algorithm to classify credible and non-credible clients. Specify any underlying assumptions. Justify your model choice as well as the hyper-parameters that must be specified in R. (10 marks)
  2. Display the model summary and discuss the relationship between the response variable and the selected features. (10 marks)
  3. Evaluate the performance of the algorithm on the training data and comment on the results. (5 marks)
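
As one illustration of a tree-based approach, the sketch below fits a single classification tree with the `rpart` package. The simulated data and the particular control values (`cp`, `minsplit`, `maxdepth`) are assumptions chosen for the example; a random forest or boosted trees would be equally valid choices.

```r
library(rpart)   # recommended package distributed with R

set.seed(2)
# Simulated stand-in: default risk jumps when X1 exceeds 0.5.
n <- 1000
train <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
p <- ifelse(train$X1 > 0.5, 0.8, 0.1)
train$Y <- factor(rbinom(n, 1, p))

# cp is the complexity penalty, minsplit the minimum node size to attempt
# a split, and maxdepth the maximum depth of the tree.
tree_fit <- rpart(
  Y ~ ., data = train, method = "class",
  control = rpart.control(cp = 0.01, minsplit = 20, maxdepth = 5)
)

printcp(tree_fit)                  # cross-validated error for each cp value
tree_fit$variable.importance       # which features drive the splits

# Training-set predictions, confusion matrix and accuracy.
pred <- predict(tree_fit, newdata = train, type = "class")
conf_mat <- table(Predicted = pred, Actual = train$Y)
train_acc <- mean(pred == train$Y)
```

The `printcp()` table can be used to justify the complexity parameter, for example by pruning to the `cp` with the lowest cross-validated error.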

Support vector classifier

  1. Use an appropriate support vector classifier to classify the credible and non-credible clients. Justify your model choice as well as the hyper-parameters that must be specified in R. (10 marks)
  2. Display the model summary and discuss the relationship between the response variable and the selected features. (10 marks)
  3. Evaluate the performance of the algorithm on the training data and comment on the results. (5 marks)
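
A support vector classifier could be fitted along the following lines, assuming the `e1071` package is installed. The simulated data, the linear kernel, and the grid of `cost` values are illustrative assumptions only.

```r
library(e1071)   # assumed to be installed; provides svm() and tune()

set.seed(3)
# Simulated stand-in: classes are roughly linearly separable in X1 + X2.
n <- 300
train <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
train$Y <- factor(as.integer(train$X1 + train$X2 + rnorm(n, sd = 0.5) > 0))

# Tune the cost hyper-parameter by cross-validation (10-fold by default).
tuned <- tune(
  svm, Y ~ ., data = train, kernel = "linear",
  ranges = list(cost = c(0.01, 0.1, 1, 10))
)

best_svc <- tuned$best.model
summary(best_svc)          # kernel, cost and number of support vectors

# Training-set predictions, confusion matrix and accuracy.
pred <- predict(best_svc, newdata = train)
conf_mat <- table(Predicted = pred, Actual = train$Y)
train_acc <- mean(pred == train$Y)
```

On the full 21,000-row training set, tuning an SVM can be slow, so a coarser grid or tuning on a subsample may be a reasonable compromise.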

Prediction

Apply the optimal models identified in the sections above to make predictions on the test data. Evaluate the performance of the algorithms on the test data. Which models do you prefer? Are there any suggestions to further improve the performance of the algorithms? Justify your answers.

(20 marks)
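
To compare models on the test data, the same metrics can be computed for each model's predictions. The helper `evaluate_binary()` below is a hypothetical base R function written for this sketch, and the two short vectors are made-up values used only to show the calculation.

```r
# Hypothetical helper: summarise binary predictions (1 = default, 0 = non-default).
evaluate_binary <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)   # defaults correctly flagged
  tn <- sum(predicted == 0 & actual == 0)   # non-defaults correctly passed
  fp <- sum(predicted == 1 & actual == 0)   # non-defaults wrongly flagged
  fn <- sum(predicted == 0 & actual == 1)   # defaults that were missed
  c(
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    precision   = tp / (tp + fp)
  )
}

# Made-up example: 8 clients, 3 of whom actually defaulted.
actual    <- c(1, 0, 0, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 0, 0, 1, 1, 0)

metrics <- evaluate_binary(actual, predicted)
metrics
```

If defaults are the minority class, accuracy alone can be misleading, so sensitivity and precision for the default class are worth reporting alongside it.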


  • Uploaded By : Mohit
  • Posted on : September 14th, 2023