
BUS5PA Predictive Analytics: Building and Evaluating Predictive Models Assessment

Added on: 2023-07-26 10:39:02
An online used-car sales company in the USA is updating its car price assessment method and wants to apply a data-driven technique. The trial dataset consists of 25 variables describing 39984 car sales from 1994 to 2021. Management is keen to apply predictive modelling to this task: the trial dataset is to be used to build and evaluate predictive models to ascertain the feasibility of such an approach. The company has outsourced the task to you.
Part A (10%) Problem Formulation
The objective of this section (Part A) is to introduce students to the domain understanding and familiarisation phase that data analysts go through prior to the actual analytics. Since you may have to carry out analytics projects in different domains in the future, where you may not have sufficient domain knowledge, it is important to develop this skill.
1. Carry out an exploratory study to identify the background and relevant features of used cars that influence their value in the USA, and the methods used for price evaluation and assessment of used cars.
2. Identify the data sources that would contain information useful for the value assessment of used cars. What is the possible format of such information? Will you face any problems accessing this data?
3. What variables would be useful to build a predictive model to assess used cars? How do you identify such variables?
Part B (40%) Data Exploration and Cleaning
Use the provided dataset to answer this section. You are given access to 24 variables that are directly related to used car sales from the above-mentioned dataset. Most of these variables are similar to the type of information that an assessor would use to evaluate and assess the price of a used car (e.g. When was it made? What are the length and width of the car? What type of wheel system does it have? How many seats?).
You need to answer the following questions with evidence and justifications.
PART 1

a. Which variables are continuous/discrete? Which are ordinal? Which are nominal?

b. What are the methods for transforming categorical variables?

c. Carry out and demonstrate data transformation where necessary.
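
As an illustration of the kind of transformation asked about in PART 1, the following is a minimal sketch of two common encodings: one-hot (dummy) encoding for a nominal variable and integer ranking for an ordinal one. The sample values, the `condition` variable and its level order are assumptions made up for this example; they are not taken from the provided dataset.

```python
# Hypothetical sample values; the real dataset's columns and levels may differ.
wheel_system = ["FWD", "AWD", "FWD", "RWD"]         # nominal: no natural order
condition = ["fair", "good", "excellent", "good"]   # ordinal: assumed ordered levels


def one_hot(values):
    """Return one 0/1 indicator column per distinct level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def ordinal_encode(values, ordered_levels):
    """Map each level to its rank in the supplied order (0 = lowest)."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]


wheel_dummies = one_hot(wheel_system)
condition_codes = ordinal_encode(condition, ["fair", "good", "excellent"])
```

One-hot encoding avoids implying an order between levels such as FWD and AWD, whereas integer ranks are only appropriate when the levels genuinely have an order.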

PART 2
a. Calculate the following summary statistics: mean, median, max, min and standard deviation for each of the continuous variables, and count for each categorical variable.


b. Is there any evidence of extreme values using the boxplot? Briefly discuss.
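
The PART 2 calculations could be sketched as follows, using only the Python standard library. The `prices` and `seats` values are invented for illustration, and the 1.5 × IQR rule is the conventional boxplot whisker definition, assumed here because the brief does not specify one.

```python
import statistics
from collections import Counter


def summarise(values):
    """Summary statistics requested for a continuous variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.stdev(values),  # sample standard deviation
    }


def boxplot_outliers(values):
    """Values lying beyond the 1.5 * IQR whiskers of a standard boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Made-up example data, not drawn from the assignment dataset.
prices = [4500, 5200, 6100, 7000, 7300, 8100, 9000, 52000]
seats = ["5", "5", "7", "2", "5"]

stats = summarise(prices)
outliers = boxplot_outliers(prices)
counts = Counter(seats)  # count per level of a categorical variable
```

A value flagged this way is only a candidate extreme value; whether it is an error or a genuine high-priced sale still needs a domain judgement.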

PART 3
Plot histograms for each of the continuous variables and create summary statistics. Based on the histogram and summary statistics, answer the following and provide brief explanations:
a. Which variables have the largest variability?

b. Which variables seem skewed?

c. Are there any values that seem extreme?
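
To support the visual reading of the histograms in PART 3, variability and skew can also be expressed numerically. The sketch below assumes two common measures: the coefficient of variation (standard deviation divided by mean, which is unit-free and so comparable across variables) and the moment coefficient of skewness. Both are one possible choice, not one prescribed by the brief, and the sample lists are invented.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)


def skewness(values):
    """Population moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up example data: one symmetric variable, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
cv_symmetric = coefficient_of_variation(symmetric)
cv_right = coefficient_of_variation(right_tailed)
```

A skewness near zero suggests a roughly symmetric distribution, while a clearly positive value indicates a long right tail, which is typical of price-like variables.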
PART 4

a. Which, if any, of the variables have missing values?

b. What are the methods of handling missing values?

c. Apply the 3 methods of handling missing values and demonstrate the output (summary statistics and transformation plot) for each method in (4-a). (Hint: the objective is to identify the impact of using each of the methods you mentioned in 4-a on the summary statistics output above.) Which method of handling missing values is most suitable for this data set? Discuss briefly, referring to the data set.
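
The comparison asked for in PART 4 could be sketched as below. The brief does not name the three methods, so this example assumes deletion of incomplete records, mean imputation and median imputation. The `mileage` values are invented, with `None` standing in for a missing entry.

```python
import statistics

# Made-up example data; None marks a missing value.
mileage = [42000, None, 61000, 38000, None, 250000, 55000]


def drop_missing(values):
    """Method 1 (assumed): discard records with a missing value."""
    return [v for v in values if v is not None]


def impute(values, fill):
    """Methods 2 and 3 (assumed): replace missing values with a constant."""
    return [fill if v is None else v for v in values]


observed = drop_missing(mileage)
variants = {
    "deletion": observed,
    "mean": impute(mileage, statistics.mean(observed)),
    "median": impute(mileage, statistics.median(observed)),
}

# Recompute the summary statistics under each method to see its impact.
summary = {
    name: {
        "n": len(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
        "std": statistics.stdev(vals),
    }
    for name, vals in variants.items()
}
```

In this toy data, mean imputation keeps the mean unchanged but shrinks the standard deviation, and the single large value pulls the mean well above the median, which is the kind of effect the hint asks you to discuss.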

  • Uploaded By : Katthy Wills
  • Posted on : July 26th, 2023