PGP-DS Capstone Project Industry Review Writing
Guidelines for PGP-DS Capstone Project Industry Review
- Industry Review - Current Practices, Background Research
- Literature Survey - Publications, Applications, past and ongoing research
Dataset and Domain
- Data dictionary
- Variable categorization (count of numeric and categorical)
- Pre-Processing Data Analysis (count of missing/null values, redundant columns)
- Alternate sources of data that can supplement the core dataset (at least 2-3 columns)
- Project Justification - Project Statement, Complexity involved, Project Outcome (Commercial, Academic or Social value)
Data Exploration (EDA)
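The variable categorization and missing-value counts listed above can be sketched as follows. This is a minimal illustration in plain Python, assuming the dataset is available as a list of records with `None` marking a missing value; the toy rows and the helper name `profile_columns` are hypothetical and not part of the guidelines.

```python
from collections import Counter
from numbers import Number

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "city": "Pune", "segment": "A"},
    {"age": None, "income": 61000.0, "city": "Delhi", "segment": "B"},
    {"age": 45, "income": None, "city": None, "segment": "A"},
    {"age": 29, "income": 48000.0, "city": "Pune", "segment": "A"},
]


def profile_columns(records):
    """Return {column: {"kind": ..., "missing": ...}} for a list of dict records."""
    columns = sorted({key for record in records for key in record})
    profile = {}
    for column in columns:
        values = [record.get(column) for record in records]
        present = [v for v in values if v is not None]
        # A column counts as numeric only if every non-missing value is a number
        # (booleans are excluded on purpose).
        is_numeric = bool(present) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in present
        )
        profile[column] = {
            "kind": "numeric" if is_numeric else "categorical",
            "missing": len(values) - len(present),
        }
    return profile


profile = profile_columns(rows)
kind_counts = Counter(info["kind"] for info in profile.values())

for column, info in profile.items():
    print(f"{column}: {info['kind']}, missing={info['missing']}")
print(dict(kind_counts))
```

On a real project the same summary would normally come from a dataframe library; the point here is only which numbers the report is expected to contain.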
- Relationship between variables
- Check for
- Multicollinearity
- Distribution of variables
- Presence of outliers and their treatment
- Statistical significance of variables
- Class imbalance and its treatment
Feature Engineering
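Three of the checks above (multicollinearity, outliers, class imbalance) can be sketched with NumPy as shown here. The data are synthetic and every name is an assumption made for illustration. The variance inflation factor (VIF) is computed as the diagonal of the inverse correlation matrix, and outliers are flagged with the common 1.5 x IQR rule; neither is prescribed by the guidelines, and only detection is shown, not treatment.

```python
from collections import Counter

import numpy as np

# Synthetic features: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Multicollinearity: VIF_j equals the j-th diagonal entry of the inverse
# correlation matrix. Values far above 10 are a common warning sign.
corr = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(corr))


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


sample = np.array([10.0, 11.0, 12.0, 11.5, 10.5, 12.5, 95.0])
outlier_mask = iqr_outlier_mask(sample)

# Class imbalance: share of the rarest class in a hypothetical target column.
labels = ["no"] * 90 + ["yes"] * 10
class_counts = Counter(labels)
minority_share = min(class_counts.values()) / len(labels)

print("VIF per feature:", np.round(vif, 2))
print("Outliers:", sample[outlier_mask])
print("Minority class share:", minority_share)
```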
- Whether any transformations are required
- Scaling the data
- Feature selection
- Dimensionality reduction
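Scaling and dimensionality reduction can be sketched together, since PCA is sensitive to feature scale. The example below is a rough sketch assuming standardization (zero mean, unit variance) followed by PCA computed through a singular value decomposition; the synthetic data and the helper names `standardize` and `pca` are illustrative only.

```python
import numpy as np

# Synthetic data: two strongly related features on very different scales,
# plus one independent feature.
rng = np.random.default_rng(42)
latent_a = rng.normal(size=100)
latent_b = rng.normal(size=100)
X = np.column_stack([
    latent_a * 1000.0,                         # large scale
    latent_a + 0.1 * rng.normal(size=100),     # small scale, nearly redundant
    latent_b,                                  # independent
])


def standardize(features):
    """Scale each column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std


def pca(scaled, n_components):
    """PCA via SVD of already-centered data.

    Returns the projected scores and the explained-variance ratio of
    every component (not only the retained ones).
    """
    _, singular_values, vt = np.linalg.svd(scaled, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    scores = scaled @ vt[:n_components].T
    return scores, explained


X_scaled = standardize(X)
scores, explained = pca(X_scaled, n_components=2)

print("Explained variance ratio:", np.round(explained, 3))
print("Reduced shape:", scores.shape)
```

Because the first two features carry nearly the same information, two components retain almost all of the variance in this toy case; on real data the number of components is a judgment call to be justified in the report.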
Assumptions
- Check for the assumptions to be satisfied for each of the models in
- Regression - SLR, Multiple Linear Regression, Logistic Regression
- Classification - Decision Tree, Random Forest, SVM, Bagged and boosted models
- Clustering - PCA (multicollinearity), K-Means (presence of outliers, scaling, conversion to numerical)
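As one example of an assumption check, the residuals of a simple linear regression can be inspected as sketched below. The guidelines do not say which diagnostics to use, so the three shown here (residual mean, the Durbin-Watson statistic for independence, and the correlation between fitted values and absolute residuals as a rough indicator of non-constant variance) are assumptions chosen for illustration, applied to synthetic data.

```python
import numpy as np

# Synthetic data that satisfies the linear-regression assumptions by design.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0.0, 10.0, size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
residuals = y - fitted

# 1. Residuals should average to zero (guaranteed when an intercept is fitted).
residual_mean = float(residuals.mean())

# 2. Durbin-Watson statistic: values near 2 suggest no first-order
#    autocorrelation. It is mainly meaningful when rows have a natural order.
durbin_watson = float(np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2))

# 3. Rough heteroscedasticity indicator: if the residual spread grows with
#    the fitted value, this correlation moves away from zero.
spread_trend = float(np.corrcoef(fitted, np.abs(residuals))[0, 1])

print("Coefficients:", np.round(coef, 3))
print("Residual mean:", residual_mean)
print("Durbin-Watson:", round(durbin_watson, 3))
print("Spread trend:", round(spread_trend, 3))
```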
Interim Presentation Checkpoint
Model Building
- Split the data into train and test sets.
- Start with a simple model which satisfies all the above assumptions based on your
- Check for bias and variance
- To improve the performance, try cross-validation, ensemble models, hyperparameter tuning, grid search
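The model-building steps above can be sketched end to end. This is a hand-rolled illustration using NumPy only, assuming ridge regression as the simple starting model and its penalty `alpha` as the hyperparameter to tune; the data are synthetic and all function names are hypothetical. In practice a library such as scikit-learn provides equivalent utilities.

```python
import numpy as np

# Synthetic regression problem with zero-mean features (no intercept needed).
rng = np.random.default_rng(7)
n, p = 120, 5
X = rng.normal(size=(n, p))
true_coef = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.5, size=n)

# Step 1: split the data, holding out 25% as the test set.
indices = rng.permutation(n)
n_test = n // 4
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]


def fit_ridge(features, target, alpha):
    """Closed-form ridge regression without an intercept."""
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ target)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cv_rmse(features, target, alpha, k=5):
    """Mean validation RMSE over k contiguous folds."""
    folds = np.array_split(np.arange(len(target)), k)
    scores = []
    for fold in folds:
        train_mask = np.ones(len(target), dtype=bool)
        train_mask[fold] = False
        coef = fit_ridge(features[train_mask], target[train_mask], alpha)
        scores.append(rmse(target[fold], features[fold] @ coef))
    return float(np.mean(scores))


# Step 2: grid search over the penalty, scored by cross-validation
# on the training set only.
alpha_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {alpha: cv_rmse(X_train, y_train, alpha) for alpha in alpha_grid}
best_alpha = min(cv_scores, key=cv_scores.get)

# Step 3: refit on the full training set and compare train and test error.
# A large gap suggests high variance; two large errors suggest high bias.
final_coef = fit_ridge(X_train, y_train, best_alpha)
train_rmse = rmse(y_train, X_train @ final_coef)
test_rmse = rmse(y_test, X_test @ final_coef)

print("Best alpha:", best_alpha)
print("Train RMSE:", round(train_rmse, 3), "Test RMSE:", round(test_rmse, 3))
```

The test set is touched only once, after the hyperparameter has been chosen, so the reported test error is not biased by the tuning step.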
Evaluation of Model
- Regression - RMSE, R-Squared value
- Classification - Classification report with precision, recall, F1-score, Support, AUC
- Clustering - Inertia value
- Comparison of different models built and discussion of the same
- Time taken for the inferences/predictions
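Several of the metrics listed above can be written out directly, which helps when explaining them in the report. The sketch below assumes small hand-made arrays and hypothetical helper names; it covers RMSE, R-squared, precision/recall/F1 for one positive class, K-Means inertia, and a simple way to time predictions. AUC and the full classification report are left out.

```python
import time

import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((points - centroids[labels]) ** 2))


# Regression example.
y_true_reg = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_reg = np.array([2.5, 5.5, 7.0, 9.0])
reg_rmse = rmse(y_true_reg, y_pred_reg)
reg_r2 = r_squared(y_true_reg, y_pred_reg)

# Classification example.
y_true_clf = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred_clf = np.array([1, 0, 1, 0, 0, 1, 0, 1])
precision, recall, f1 = precision_recall_f1(y_true_clf, y_pred_clf)

# Clustering example.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
cluster_labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
cluster_inertia = inertia(points, cluster_labels, centroids)

# Inference time for a hypothetical linear model on a larger batch.
weights = np.array([0.5, -0.25])
batch = np.ones((10_000, 2))
start = time.perf_counter()
predictions = batch @ weights
elapsed_seconds = time.perf_counter() - start

print(reg_rmse, reg_r2, precision, recall, f1, cluster_inertia)
print(f"Inference time: {elapsed_seconds:.6f} s for {len(batch)} rows")
```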
Business Recommendations & Future Enhancements
- How to improve data collection, processing, and model accuracy?
- Commercial value / Social value / Research value
- Recommendations based on insights
Final Presentation Checkpoint
Dashboard
- EDA - Correlation matrix, pair plots, box plots, distribution plots
- Model
- Model Parameters
- Visualization of performance of the model with varying parameters
- Visualization of model metrics
- Testing outcome
- Failure cases and explanation for the same
- Most successful and obvious cases
- Border cases
Final Submission Checkpoint