Use KNIME or RapidMiner for this assessment.
Overview
A data analytics project starts with collecting the data and ends with communicating the results. Several steps lie in between, and data preprocessing is one of the most important. Preprocessing itself consists of multiple steps, depending on the nature, type, and values of the data.
Data visualisation, in turn, uses visual representations such as charts, graphs, and illustrations to explore, make sense of, and communicate data. Today, many big companies are moving towards visualisation.
Assessment Details
Case Study 1: Students are required to select a data set for a regression task and define a question based on a business requirement. This should include: (i) selecting the data set; (ii) exploring, summarising and preparing the data; (iii) defining the problem and requirements; (iv) defining an experiment setup; (v) implementing your approach; and (vi) evaluating and analysing the approach. (Marks: 20)
Problem: Describe the problem and highlight the business need.
Approach: Describe your approach. It should cover, for example, learning techniques, features, model tuning, parameter selection, and how the analysis will answer your question.
Results: Summarise the results and critically analyse them, e.g. limitations of the data, setup or approach; characteristic errors; possible improvements.
Conclusion: Conclude with what you have learned from this study that would make you a better data analyst. Would you recommend this as a solution to your problem? Provide reasons.
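The experiment setup and evaluation steps of Case Study 1 can be pictured with a small sketch. The assessment itself is to be carried out in KNIME or RapidMiner; the Python below only illustrates the logic. Everything in it is an assumption made for the example: the toy data, the single-feature linear model, the 75/25 holdout split, and the helper names `train_test_split`, `fit_simple_linear` and `evaluate`. It compares the model's test error with a baseline that always predicts the training mean, which gives a reference point when analysing results critically.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_linear(train):
    """Ordinary least squares for one feature: y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate(model, test):
    """Return mean absolute error and root mean squared error on the test rows."""
    intercept, slope = model
    errors = [y - (intercept + slope * x) for x, y in test]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Hypothetical toy data: (feature, target) pairs following y = 3 + 2x plus noise.
noise = random.Random(42)
data = [(float(x), 3.0 + 2.0 * x + noise.uniform(-1.0, 1.0)) for x in range(40)]

train, test = train_test_split(data)
model = fit_simple_linear(train)
metrics = evaluate(model, test)

# Baseline: always predict the mean target of the training rows (slope of zero).
train_mean = sum(y for _, y in train) / len(train)
baseline_metrics = evaluate((train_mean, 0.0), test)

print("model:   ", metrics)
print("baseline:", baseline_metrics)
```

In a real study the toy data would be replaced by the chosen data set, and the same split, fit and evaluate stages would correspond to the nodes or operators configured in the chosen tool.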
Case Study 2: Suppose that you have built a classifier that can identify whether an email is spam or not spam. After applying the classifier to the training data, you get the following confusion matrix. (Marks: 20)
Calculate the accuracy, true positive rate, true negative rate, precision, and recall.
Based on the accuracy value, do you think the classifier is doing a good job of identifying spam emails? Justify your answer.
What is the class imbalance problem? How does it affect the accuracy in the given scenario?
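As a reminder of how the requested measures are defined, here is a minimal Python sketch. The counts are hypothetical and are NOT the confusion matrix of this case study; the function names `safe_ratio` and `confusion_metrics` are made up for the illustration, and "spam" is assumed to be the positive class. The second call shows why accuracy alone can mislead on imbalanced data: a classifier that labels every email as not spam still scores high accuracy while detecting no spam at all.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def confusion_metrics(tp, fn, fp, tn):
    """Compute standard measures from the four cells of a binary confusion matrix.

    tp: spam predicted as spam          fn: spam predicted as not spam
    fp: not spam predicted as spam      tn: not spam predicted as not spam
    """
    recall = safe_ratio(tp, tp + fn)  # recall is the same quantity as the true positive rate
    return {
        "accuracy": safe_ratio(tp + tn, tp + fn + fp + tn),
        "true_positive_rate": recall,
        "true_negative_rate": safe_ratio(tn, tn + fp),
        "precision": safe_ratio(tp, tp + fp),
        "recall": recall,
    }


# Hypothetical imbalanced data: 50 spam emails and 950 non-spam emails.
example = confusion_metrics(tp=10, fn=40, fp=5, tn=945)

# A trivial classifier that predicts "not spam" for every email.
majority = confusion_metrics(tp=0, fn=50, fp=0, tn=950)

for name, result in (("example", example), ("majority", majority)):
    print(name, result)
```

In the hypothetical example the accuracy is above 95% although only one in five spam emails is caught, and the trivial classifier reaches 95% accuracy with a recall of zero and an undefined precision.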
Note: Students are allowed to include other sections as they deem necessary based on their case study.
Sample data sets for Case Study 1:
Absenteeism at work Data Set
Bank Marketing Data Set
Iranian Churn Dataset Data Set
Productivity Prediction of Garment Employees Data Set
Real estate valuation data set Data Set
Apartment for rent classified Data Set
Chronic_Kidney_Disease Data Set
Marking Rubric for Case Study 1
Score levels for each criterion: Very Good, Good, Satisfactory, Unsatisfactory

Presentation/Layout (/02 marks)
Very Good: Information is well organised, well written, and proper grammar and punctuation are used throughout. Correct layout used.
Good: Information is organised, well written, with proper grammar and punctuation. Correct layout used.
Satisfactory: Information is somewhat organised, proper grammar and punctuation mostly used. Correct layout used.
Unsatisfactory: Information is somewhat organised, but proper grammar and punctuation not always used. Some elements of layout incorrect.

Structure (/02 marks)
Very Good: Structure guidelines enhanced.
Good: Structure guidelines followed exactly.
Satisfactory: Structure guidelines mostly followed.
Unsatisfactory: Some elements of structure omitted.

Introduction (/02 marks)
Very Good: Introduces the topic of the report in an extremely engaging manner which arouses the reader's interest. Gives a detailed general background and indicates the overall "plan" of the paper.
Good: Introduces the topic of the report in an engaging manner which arouses the reader's interest. Gives some general background and indicates the overall "plan" of the paper.
Satisfactory: Satisfactorily introduces the topic of the report. Gives a general background. Indicates the overall "plan" of the paper.
Unsatisfactory: Introduces the topic of the report, but omits a general background of the topic and/or the overall "plan" of the paper.

Design and Analysis (/10 marks)
Very Good: All topics are discussed in depth coherently. Significant evidence of critical analysis and reflection.
Good: Consistently detailed discussion. Displays sound understanding with some analysis of topics.
Satisfactory: A topic has been adequately discussed. Displays some understanding and analysis of issues.
Unsatisfactory: Inadequate discussion of issues. Little/no demonstrated understanding or analysis of most issues and/or some irrelevant information.

Summary & Conclusion (/02 marks)
Very Good: An interesting, well written summary of the main points. An excellent final comment on the subject, based on the information provided.
Good: A good summary of the main points. A good final comment on the subject, based on the information provided.
Satisfactory: Satisfactory summary of the main points. A final comment on the subject, but introduced new material.
Unsatisfactory: Poor/no summary of the main points. A poor final comment on the subject and/or new material introduced.

Referencing (/02 marks)
Very Good: Correct referencing (APA7 Style). All quoted material in quotes and acknowledged. All paraphrased material acknowledged. Correctly set out reference list.
Good: Mostly correct referencing (APA7 Style). All quoted material in quotes and acknowledged. All paraphrased material acknowledged. Mostly correct setting out of reference list.
Satisfactory: Mostly correct referencing (APA7 Style). Some problems with quoted material and paraphrased material. Some problems with the reference list.
Unsatisfactory: Not all material correctly acknowledged. Some problems with the reference list.

Total: /20 marks