diff_months: 4

Applied Machine Learning: Employee Turnover Analytics ML402

Flat 50% Off Order New Solution
Added on: 2025-05-23 08:58:35
Order Code: LD523405
Question Task Id: 0
  • Subject Code :

    ML402

Machine Learning

Course-End Project Problem Statement

Employee Turnover Analytics

Project Statement:

Portobello Tech is an app innovator who has devised an intelligent way of predicting employee turnover within the company. It periodically evaluates employees' work details, including the number of projects they worked on, average monthly working hours, time spent in the company, promotions in the last five years, and salary level.

Data from prior evaluations shows the employees satisfaction in the workplace. The data could be used to identify patterns in work style and their interest in continuing to work for the company.

The HR Department owns the data and uses it to predict employee turnover. Employee turnover refers to the total number of workers who leave a company over time.

As the ML Developer assigned to the HR Department, you have been asked to create ML programs to:

  1. Perform data quality checks by checking for missing values, if any.
  2. Understand what factors contributed most to employee turnover at EDA.
  3. Perform clustering of employees who left based on their satisfaction and evaluation.
  4. Handle the left Class Imbalance using the SMOTE technique.
  5. Perform k-fold cross-validation model training and evaluate performance.
  6. Identify the best model and justify the evaluation metrics used.
  7. Suggest various retention strategies for targeted employees.

Data will be modified from:

https://www.kaggle.com/liujiaqi/hr-comma-sepcsv

Column Name

Description

satisfaction_level

Satisfaction level at the job of an employee

last_evaluation

Rating between 0 and 1, received by an employee at his last evaluation

number_project

The number of projects an employee is involved in

average_montly_hours

Average number of hours in a month spent by an employee at the office

time_spend_company

Number of years spent in the company

Work_accident

0 - no accident during employee stay, 1 - accident during employee stay

left

0 indicates an employee stays with the company and

1 indicates an employee left the company

promotion_last_5years

Number of promotions in his stay

Department

Department to which an employee belongs to

salary

Salary in USD

Perform the following steps:

  1. Perform data quality checks by checking for missing values, if any.
  2. Understand what factors contributed most to employee turnover at EDA.
    • Draw a heatmap of the correlation matrix between all numerical features or columns in the data.
    • Draw the distribution plot of:
      • Employee Satisfaction (use column satisfaction_level)
      • Employee Evaluation (use column last_evaluation)
      • Employee Average Monthly Hours (use column average_montly_hours)
    • Draw the bar plot of the employee project count of both employees who left and stayed in the organization (use column number_project and hue column left), and give your inferences from the plot.
  3. Perform clustering of employees who left based on their satisfaction and evaluation.
    • Choose columns satisfaction level, last evaluation, and left.
    • Do K-means clustering of employees who left the company into 3 clusters?
    • Based on the satisfaction and evaluation factors, give your thoughts on the employee clusters.
  4. Handle the left Class Imbalance using the SMOTE technique.
    • Pre-process the data by converting categorical columns to numerical columns by:
      • Separating categorical variables and numeric variables
      • Applying get dummies() to the categorical variables
      • Combining categorical variables and numeric variables
    • Do the stratified split of the dataset to train and test in the ratio 80:20 with random state=123.
    • Up sample the train dataset using the SMOTE technique from the imblearn module.
  5. Perform 5-fold cross-validation model training and evaluate performance.
    • Train a logistic regression model, apply a 5-fold CV, and plot the classification report.
    • Train a Random Forest Classifier model, apply the 5-fold CV, and plot the classification report.
    • Train a Gradient Boosting Classifier model, apply the 5-fold CV, and plot the classification report.
  6. Identify the best model and justify the evaluation metrics used.
    • Find the ROC/AUC for each model and plot the ROC curve.
    • Find the confusion matrix for each of the models.
    • Explain which metric needs to be used from the confusion matrix: Recall or Precision?
  7. Suggest various retention strategies for targeted employees.
    • Using the best model, predict the probability of employee turnover in the test data.
    • Based on the probability score range below, categorize the employees into four zones and suggest your thoughts on the retention strategies for each zone.
      • Safe Zone (Green) (Score < 20>
      • Low-Risk Zone (Yellow) (20% < Score>
      • Medium-Risk Zone (Orange) (60% < Score>
      • High-Risk Zone (Red) (Score > 90%).

  • Uploaded By : Nivesh
  • Posted on : May 23rd, 2025
  • Downloads : 0
  • Views : 72

Order New Solution

Can't find what you're looking for?

Whatsapp Tap to ChatGet instant assistance

Choose a Plan

Premium

80 USD
  • All in Gold, plus:
  • 30-minute live one-to-one session with an expert
    • Understanding Marking Rubric
    • Understanding task requirements
    • Structuring & Formatting
    • Referencing & Citing
Most
Popular

Gold

30 50 USD
  • Get the Full Used Solution
    (Solution is already submitted and 100% plagiarised.
    Can only be used for reference purposes)
Save 33%

Silver

20 USD
  • Journals
  • Peer-Reviewed Articles
  • Books
  • Various other Data Sources – ProQuest, Informit, Scopus, Academic Search Complete, EBSCO, Exerpta Medica Database, and more