MBAS901 Essential Tools For Business Analytics Assignment

Added on: 2023-07-17 06:32:17
  • Subject Code: MBAS901
  • Country: Australia

Task 1: Utilizing the 'HOUSEPRICE' dataset, address the following questions (20 Marks)

Q1. Conduct a simple exploratory data analysis (EDA) to gain insights into the data's characteristics. Throughout this process, identify the attributes that you would select to predict the 'SalePrice' and assess the presence of multicollinearity. (5 Marks)
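One way to approach the multicollinearity part of Q1 is to compute pairwise Pearson correlations between the candidate predictors and flag pairs that move together too closely. The sketch below is illustrative only: the attribute names, the sample values, and the 0.8 cut-off are assumptions, not taken from the HOUSEPRICE dataset or the brief.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return predictor pairs whose absolute correlation is at least the threshold.

    The 0.8 default is an assumed rule of thumb, not a requirement of the brief.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical values; the attribute names are illustrative placeholders.
candidates = {
    "LivingArea": [1200, 1500, 1700, 2100, 2500],
    "GarageArea": [240, 300, 350, 410, 500],
    "YearBuilt": [1995, 1960, 2005, 1978, 1988],
}
flagged = correlated_pairs(candidates)
```

A flagged pair suggests keeping only one of the two attributes, or combining them, before fitting a model.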

Q2. Examine the data for potential issues such as duplicates and formatting inconsistencies in the selected attributes. Perform data preprocessing if necessary. If data preprocessing is not required, provide explanations supported by relevant validations. (5 Marks)

Q3. Develop a machine learning model to predict the 'SalePrice' and evaluate its performance.

Describe the specific type of model you selected and justify your choice. Additionally, comment on the accuracy of the model. (8 Marks)

Q4. Determine whether the developed model is statistically significant. Explain your answer. (2 Marks)

Task 2: Utilizing the FLCRASH dataset, address the following questions (20 Marks)

Q1. Create a new custom category variable based on the Total Crash Injuries variable. This new variable should contain only two categories: one for crashes with zero injuries and one for crashes with one or more injuries. Visualize the frequency of the two new categories on a bar chart. How many crashes report zero injuries? (3 Marks)
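The two-category variable described in Q1 can be sketched as a simple mapping followed by a frequency count. This is a minimal illustration: the category labels and the sample injury counts are assumptions standing in for the Total Crash Injuries column of the FLCRASH dataset.

```python
from collections import Counter


def injury_category(total_injuries: int) -> str:
    """Map a crash's injury count to one of two categories."""
    return "No injuries" if total_injuries == 0 else "One or more injuries"


# Hypothetical sample values standing in for the Total Crash Injuries column.
sample_injuries = [0, 2, 0, 1, 0, 3]

# Frequency of each category; these counts are what a bar chart would display.
category_counts = Counter(injury_category(n) for n in sample_injuries)
zero_injury_crashes = category_counts["No injuries"]
```

The resulting counts can be passed to any plotting tool to draw the bar chart, and the "No injuries" count answers the final part of the question.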

Q2. In Q1, you created a new categorical variable with only two values (binary). Your task now is to develop two models that can predict the value of this target variable, given other explanatory variables. In other words, you attempt to predict whether a crash will result in injuries, given other important variables.

  1. What are the two models (or techniques) you can use to predict this target variable? (2 Marks)
  2. Create one model to predict the target variable you created in Q1. Assess this model's accuracy. What are the most important variables in predicting this target variable? (6 Marks)
  3. Create the second model to predict the target variable. Assess this model's accuracy. What are the most important variables identified by the model to predict the target variable? (6 Marks)
  4. Compare the performance of the two models. Report and discuss the results of your comparison. Which model is the champion? (3 Marks)
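The comparison in item 4 could be sketched as follows, assuming both models have already produced predictions for the same hold-out set. The labels, the predictions, and the model names are hypothetical, and accuracy is only one possible comparison metric; the brief does not prescribe one.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical hold-out labels: 1 = one or more injuries, 0 = no injuries.
actual = [0, 1, 0, 0, 1, 1, 0, 1]

# Hypothetical predictions from two already-fitted models.
model_a = [0, 1, 0, 1, 1, 0, 0, 1]
model_b = [0, 1, 0, 0, 1, 1, 0, 0]

scores = {
    "model_a": accuracy(actual, model_a),
    "model_b": accuracy(actual, model_b),
}

# The champion is the model with the highest accuracy on the same data.
champion = max(scores, key=scores.get)
```

When the two categories are imbalanced, accuracy alone can mislead, so a fuller comparison would also look at measures such as sensitivity and specificity.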

Task 3: Utilizing the INSURANCE dataset, address the following questions (10 Marks)

Q1. Define new custom category variables for Age, BMI, and Charges based on the following criteria: (3 Marks)

  1. For the Age variable:
    • 0-17 years: Categorized as "Children."
    • 18-24 years: Categorized as "Young adults."
    • 25-44 years: Categorized as "Adults."
    • 45-64 years: Categorized as "Middle-aged adults."
    • 65 years and above: Categorized as "Seniors" or "Older adults."
  2. For the BMI variable:
    • BMI less than 18.5: Categorized as "Underweight."
    • BMI between 18.5 and 24.9: Categorized as "Normal weight."
    • BMI between 25 and 29.9: Categorized as "Overweight."
    • BMI equal to or greater than 30: Categorized as "Obese."
  3. For the Charges variable:
    • Charges less than or equal to 12000: Categorized as "Low."
    • Charges greater than 12000 and less than or equal to 40000: Categorized as "Moderate."
    • Charges greater than 40000: Categorized as "High."
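The criteria above can be sketched as three small mapping functions. Two assumptions are made here: BMI values that fall between the listed bands (for example 24.95) are assigned to the lower band, and the label "Seniors" is used for the oldest group where the brief allows either "Seniors" or "Older adults."

```python
def age_category(age: float) -> str:
    """Assign an age in years to the age groups listed in the criteria."""
    if age < 18:
        return "Children"
    if age < 25:
        return "Young adults"
    if age < 45:
        return "Adults"
    if age < 65:
        return "Middle-aged adults"
    return "Seniors"


def bmi_category(bmi: float) -> str:
    """Assign a BMI value to a weight band.

    Assumption: values in the gaps between listed bands (e.g. 24.9 to 25)
    go to the lower band.
    """
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def charges_category(charges: float) -> str:
    """Assign a charges amount to Low, Moderate, or High."""
    if charges <= 12000:
        return "Low"
    if charges <= 40000:
        return "Moderate"
    return "High"


# Hypothetical record used only to show the three functions together.
example = (age_category(52), bmi_category(31.2), charges_category(43000))
```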

Q2. Develop a machine learning model capable of identifying specific groups of individuals based on available attributes such as sex, smoking status, etc., as well as the custom categories created in Q1. For example, consider a potential group consisting of middle-aged adults with higher charges associated with smoking and higher BMI values. Justify the choice of the machine learning model for this task and provide an analysis of its accuracy and applicability. (5 Marks)

Q3. Why is it not possible to forecast charges in the provided dataset? If that issue were not a limitation, which type of model would you employ for forecasting? What would be the most suitable time frequency for aggregating insurance charges: daily, weekly, monthly, or annually? What factors would you consider essential for predicting insurance charges? (2 Marks)

  • Uploaded By : Katthy Wills
  • Posted on : July 17th, 2023