DATA ANALYSIS AND VISUALIZATION USING PYTHON FOR PETALS DISEASE ANALYSIS TCS3123
- Subject Code: TCS3123
- University: University of Technology Sydney
- Country: Malaysia
FACULTY OF ENGINEERING, BUILT ENVIRONMENT AND INFORMATION TECHNOLOGY
BACHELOR OF INFORMATION TECHNOLOGY/ BACHELOR OF COMPUTER SCIENCE
TCS3123 DATA SCIENCE
ASSESSMENT TITLE:
DATA ANALYSIS AND VISUALIZATION USING PYTHON FOR PETALS DISEASE ANALYSIS
DUE DATE SUBMISSION: 10 APRIL 2025, 11:59 PM
STUDENT NAME:
STUDENT ID:
SUBMISSION DATE:
DECLARATION OF INTEGRITY:
I hereby declare that this lab assessment submission is my own independent work and does not contain plagiarized content, unauthorized assistance, or any form of academic dishonesty. I confirm that I have adhered to the academic integrity policies outlined by the university.
I understand that if any form of academic dishonesty, including plagiarism, falsification of results, or unauthorized collaboration, is detected in this submission, I may face disciplinary actions. This may include, but is not limited to, receiving a failing grade for this assessment or further academic penalties as determined by the university's academic integrity committee.
By signing below, I acknowledge my understanding and agreement with this declaration.
Submission Guidelines:
- Submit a single PDF or Word document on LMS
- Include screenshots of all
- Word Count for Reflection: Minimum 500 words.
- Deadline: 10 April 2025, 11:59PM
- Marks: 100 marks, equivalent to 20% out of the 40% allocated to Continuous Assessment.
- Plagiarism Policy: Work must be original (less than 10% similarity).
Objective:
This assignment aims to assess students' ability to perform data manipulation, analysis, and visualization using Python libraries NumPy, Pandas, Matplotlib, and Seaborn. Students will analyze a dataset, answer analytical questions, and present their findings in a well-structured report and a short video presentation.
Case Study: Petal Diseases in Malaysia
Floriculture is an important industry in Malaysia, contributing to the economy through the export and domestic sale of various flower species. However, petal diseases caused by fungal, bacterial, and environmental conditions pose challenges to flower farmers. Early detection and analysis of these diseases can help improve yield and quality.
A dataset of 6000 flower petal samples has been collected from different farms across Malaysia. The dataset includes flower species, petal color, temperature, humidity, disease type, severity level, and treatment method used. This assignment requires students to analyze this dataset, extract meaningful insights, and visualize trends to assist researchers in understanding the factors affecting petal diseases.
Dataset Overview
The dataset includes the following columns:
| Column Name | Description |
|---|---|
| Flower_ID | Unique identifier for each flower sample |
| Species | Name of the flower species |
| Petal_Color | Color of the petal (e.g., Red, Yellow, White) |
| Temperature | Temperature at the time of data collection (°C) |
| Humidity | Humidity percentage (%) |
| Disease_Type | Type of disease (e.g., Fungal, Bacterial, Environmental) |
| Severity_Level | Disease severity (Mild, Moderate, Severe) |
| Treatment_Applied | Treatment method used (Chemical, Organic, None) |
| Region | Geographical location (e.g., Johor, Selangor, Penang) |
Students must perform data processing, analysis, and visualization to answer key research questions.
PART A: Data Manipulation using NumPy and Pandas (30 Marks)
Task 1: NumPy Operations (15 Marks)
- Load the dataset into a NumPy array.
- Compute statistical measures for temperature and humidity (mean, median, standard deviation).
- Filter records where temperature is above 30 °C and humidity is above 80%.
- Perform matrix operations on numerical data.
- Normalize temperature and humidity values between 0 and 1.
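As an illustration only, the NumPy operations listed in Task 1 could be sketched as follows. The six sample rows are made up, the covariance matrix is just one possible choice of matrix operation, and the file name and column positions mentioned in the comments are assumptions rather than details given in this brief.

```python
import numpy as np

# Hypothetical sample: columns are [Temperature (°C), Humidity (%)].
# These rows are invented for illustration and are not from the dataset.
# With the real file, a call such as
#   np.genfromtxt("petals.csv", delimiter=",", skip_header=1, usecols=(3, 4))
# could replace this array (file name and column positions are assumptions).
data = np.array([
    [28.0, 75.0],
    [31.5, 85.0],
    [33.0, 90.0],
    [29.5, 82.0],
    [35.0, 78.0],
    [26.0, 60.0],
])
temperature, humidity = data[:, 0], data[:, 1]

# Statistical measures per column (axis=0).
stats = {
    "mean": data.mean(axis=0),
    "median": np.median(data, axis=0),
    "std": data.std(axis=0),  # population std (ddof=0); pass ddof=1 for sample std
}

# Boolean mask: temperature above 30 °C AND humidity above 80 %.
hot_humid = data[(temperature > 30) & (humidity > 80)]

# One possible matrix operation: the 2x2 sample covariance matrix,
# computed as a matrix product of the mean-centered data with itself.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(data) - 1)

# Min-max normalization of each column to the range [0, 1].
col_min, col_max = data.min(axis=0), data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min)

print(stats)
print(hot_humid)
```

Min-max scaling divides by the column range, so a column with a single repeated value would need separate handling before this step.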
Task 2: Data Processing using Pandas (15 Marks)
- Load the dataset using Pandas and display the first five rows.
- Check for missing values and handle them appropriately.
- Convert categorical values into numerical labels where necessary.
- Analyze the distribution of petal colors across different
- Identify the most common disease type and most affected flower species.
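A minimal sketch of the Pandas steps in Task 2 is shown below, under stated assumptions. The rows are invented, the file name in the comment is hypothetical, median and mode imputation is only one possible way to handle missing values, and grouping petal colors by species is a guess at the intended grouping.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using column names from the Dataset Overview.
# With the real file, pd.read_csv("petals.csv") would replace this
# (the file name is an assumption).
df = pd.DataFrame({
    "Species": ["Rose", "Orchid", "Rose", "Hibiscus", "Orchid", "Rose"],
    "Petal_Color": ["Red", "White", "Red", "Yellow", "White", "Yellow"],
    "Temperature": [31.0, np.nan, 29.5, 33.0, 30.0, 32.5],
    "Humidity": [85.0, 78.0, np.nan, 90.0, 82.0, 88.0],
    "Disease_Type": ["Fungal", "Bacterial", "Fungal", None, "Fungal", "Environmental"],
    "Severity_Level": ["Mild", "Severe", "Moderate", "Mild", "Severe", "Moderate"],
})

print(df.head())        # first five rows
print(df.isna().sum())  # missing values per column

# One possible strategy: median for numeric gaps, mode for categorical gaps.
for col in ["Temperature", "Humidity"]:
    df[col] = df[col].fillna(df[col].median())
df["Disease_Type"] = df["Disease_Type"].fillna(df["Disease_Type"].mode()[0])

# Ordinal encoding keeps the Mild < Moderate < Severe ordering.
severity_order = {"Mild": 0, "Moderate": 1, "Severe": 2}
df["Severity_Code"] = df["Severity_Level"].map(severity_order)
# Nominal column: arbitrary integer codes (alphabetical category order).
df["Disease_Code"] = df["Disease_Type"].astype("category").cat.codes

# Distribution of petal colors across species (grouping is an assumption).
color_by_species = pd.crosstab(df["Species"], df["Petal_Color"])

# Most common disease type and the species with the most affected samples.
most_common_disease = df["Disease_Type"].value_counts().idxmax()
most_affected_species = df["Species"].value_counts().idxmax()
```

The ordinal mapping is used for severity because its levels have a natural order, whereas disease type has none, so its integer codes should not be read as magnitudes.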
PART B: Data Analysis and Interpretation (30 Marks)
Task 3: Exploratory Data Analysis (EDA) (15 Marks)
- Analyze the relationship between temperature, humidity, and disease severity.
- Identify the most effective treatment method for severe cases.
- Determine which region in Malaysia reports the highest number of diseased flowers.
- Compare the average temperature and humidity of affected and non-affected petals.
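The EDA steps in Task 3 might be approached along the following lines. This is a sketch on invented rows, not a reference solution. The label "Healthy" for non-affected samples is an assumption, since the brief does not say how such samples are recorded. The listed columns also contain no treatment outcome, so the treatment table below only shows how severity is distributed within each treatment, which is a proxy and not a direct measure of effectiveness.

```python
import pandas as pd

# Hypothetical rows; "Healthy" marks non-affected samples (an assumption).
df = pd.DataFrame({
    "Temperature": [31.0, 33.5, 28.0, 34.0, 27.5, 32.0, 29.0, 35.0],
    "Humidity": [85.0, 90.0, 70.0, 92.0, 65.0, 88.0, 72.0, 95.0],
    "Disease_Type": ["Fungal", "Fungal", "Healthy", "Bacterial",
                     "Healthy", "Fungal", "Environmental", "Bacterial"],
    "Severity_Level": ["Mild", "Severe", None, "Severe",
                       None, "Moderate", "Mild", "Severe"],
    "Treatment_Applied": ["Organic", "Chemical", "None", "Chemical",
                          "None", "Organic", "None", "Organic"],
    "Region": ["Johor", "Selangor", "Penang", "Selangor",
               "Johor", "Selangor", "Penang", "Johor"],
})

# Flag each sample as affected or non-affected.
df["Status"] = df["Disease_Type"].eq("Healthy").map(
    {True: "Non-affected", False: "Affected"}
)
diseased = df[df["Status"] == "Affected"]

# Mean temperature and humidity at each severity level.
severity_means = (
    diseased.groupby("Severity_Level")[["Temperature", "Humidity"]]
    .mean()
    .reindex(["Mild", "Moderate", "Severe"])
)

# Share of each severity level within each treatment (rows sum to 1).
treatment_vs_severity = pd.crosstab(
    diseased["Treatment_Applied"], diseased["Severity_Level"], normalize="index"
)

# Region reporting the highest number of diseased samples.
top_region = diseased["Region"].value_counts().idxmax()

# Average conditions for affected versus non-affected samples.
status_means = df.groupby("Status")[["Temperature", "Humidity"]].mean()

print(severity_means)
print(status_means)
```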
Task 4: Answering Analytical Questions (15 Marks)
- Based on the dataset, formulate and answer three research questions:
- Example: Does temperature influence fungal disease outbreaks?
- Example: Are certain petal colors more prone to disease?
- Example: Which flower species is the most resistant to petal diseases?
- Use Pandas operations and visualization to support your answers.
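To show how one of the example questions could be turned into Pandas operations, the sketch below looks at "Does temperature influence fungal disease outbreaks?". The rows are invented, and the temperature cut points and band names are arbitrary choices made for this illustration.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Temperature": [24.0, 26.5, 27.0, 29.0, 30.5, 31.0, 32.5, 33.0, 34.0, 35.5],
    "Disease_Type": ["Bacterial", "Environmental", "Fungal", "Bacterial", "Fungal",
                     "Fungal", "Environmental", "Fungal", "Fungal", "Fungal"],
})

# Indicator column: True where the recorded disease is fungal.
df["Is_Fungal"] = df["Disease_Type"] == "Fungal"

# Bin temperature into three bands: (20, 28], (28, 32], (32, 40].
bands = pd.cut(
    df["Temperature"],
    bins=[20, 28, 32, 40],
    labels=["low", "medium", "high"],
)

# Share of fungal cases within each temperature band.
fungal_share = df.groupby(bands, observed=True)["Is_Fungal"].mean()

# Pearson correlation between temperature and the 0/1 fungal indicator
# (this is the point-biserial correlation).
corr = df["Temperature"].corr(df["Is_Fungal"].astype(float))

print(fungal_share)
print(round(corr, 3))
```

A rising fungal share across bands, or a positive correlation, would suggest an association but would not by itself establish that temperature causes the outbreaks; humidity and region may be confounded with temperature.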
PART C: Data Visualization using Matplotlib and Seaborn (30 Marks)
Task 5: Creating Data Visualizations (20 Marks)
- Line plot: Show temperature trends across different
- Bar chart: Display the number of diseased flowers per
- Box plot: Compare humidity levels across different disease types.
- Heatmap: Show correlation between temperature, humidity, and disease severity.
- Scatter plot: Visualize the relationship between temperature and disease severity.
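The five plot types in Task 5 could be drafted roughly as below. This is a sketch on invented rows. Grouping by region for the line plot and bar chart, and using an encoded severity level for the heatmap and scatter plot, are assumptions about what the brief intends.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Temperature": [31.0, 33.5, 29.0, 34.0, 30.0, 32.0, 28.5, 35.0],
    "Humidity": [85.0, 90.0, 74.0, 92.0, 80.0, 88.0, 72.0, 95.0],
    "Disease_Type": ["Fungal", "Fungal", "Environmental", "Bacterial",
                     "Bacterial", "Fungal", "Environmental", "Bacterial"],
    "Severity_Level": ["Mild", "Severe", "Mild", "Severe",
                       "Moderate", "Moderate", "Mild", "Severe"],
    "Region": ["Johor", "Selangor", "Penang", "Selangor",
               "Johor", "Selangor", "Penang", "Johor"],
})
df["Severity_Code"] = df["Severity_Level"].map({"Mild": 0, "Moderate": 1, "Severe": 2})

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Line plot: mean temperature per region (grouping is an assumption).
region_temp = df.groupby("Region")["Temperature"].mean()
axes[0, 0].plot(list(region_temp.index), region_temp.values, marker="o")
axes[0, 0].set_title("Mean temperature by region")
axes[0, 0].set_xlabel("Region")
axes[0, 0].set_ylabel("Temperature (°C)")

# Bar chart: number of diseased samples per region (grouping is an assumption).
region_counts = df["Region"].value_counts()
axes[0, 1].bar(list(region_counts.index), region_counts.values)
axes[0, 1].set_title("Diseased samples per region")
axes[0, 1].set_xlabel("Region")
axes[0, 1].set_ylabel("Count")

# Box plot: humidity across disease types.
sns.boxplot(data=df, x="Disease_Type", y="Humidity", ax=axes[0, 2])
axes[0, 2].set_title("Humidity by disease type")

# Heatmap: correlation between temperature, humidity, and encoded severity.
corr = df[["Temperature", "Humidity", "Severity_Code"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=axes[1, 0])
axes[1, 0].set_title("Correlation matrix")

# Scatter plot: temperature against encoded severity.
axes[1, 1].scatter(df["Temperature"], df["Severity_Code"])
axes[1, 1].set_yticks([0, 1, 2])
axes[1, 1].set_yticklabels(["Mild", "Moderate", "Severe"])
axes[1, 1].set_title("Temperature vs. severity")
axes[1, 1].set_xlabel("Temperature (°C)")

axes[1, 2].axis("off")  # unused sixth panel
fig.tight_layout()
# fig.savefig("petal_plots.png", dpi=150)  # uncomment to write the figure to disk
```

Fixing the heatmap color range to [-1, 1] keeps the color scale comparable across datasets, which makes weak and strong correlations easier to tell apart.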
Task 6: Interpreting Visualizations (10 Marks)
- Explain trends and insights observed in the visualizations.
- Justify why certain visualizations were chosen.
- Provide actionable recommendations for flower farmers and researchers.
PART D: Report and Video Presentation (10 Marks)
Task 7: Report Submission (5 Marks)
- Write a structured report (1500-2000 words) covering:
- Introduction (Problem statement, importance of disease detection).
- Methodology (How data was processed and analyzed).
- Findings (Key insights from data analysis and visualizations).
- Conclusion (Summary of findings and recommendations).
Task 8: Video Presentation (5 Marks)
- Prepare a 3-5 minute video presenting:
- Key insights and
- One major finding and how it can be applied in real life.
- Explanation of the dataset's impact on Malaysia's floriculture industry.
Marking Rubric and Marking Scheme for Data Science Assignment (100 Marks)
Assignment Title: Data Analysis on Petal Diseases in Malaysia using Python
Total Marks: 100
Weightage: 30%
Part A: Data Manipulation using NumPy and Pandas (30 Marks)
| Criteria | Excellent (13-15 Marks) | Good (10-12 Marks) | Satisfactory (7-9 Marks) | Needs Improvement (4-6 Marks) | Poor (0-3 Marks) | Awarded Marks |
|---|---|---|---|---|---|---|
| NumPy Operations (15 Marks) | All required NumPy operations (array creation, statistics, filtering, matrix operations, normalization) are correctly implemented with appropriate explanations and comments. | Most NumPy operations are correctly implemented; minor errors in calculations or explanations. | Some NumPy operations are implemented correctly, but missing key components or incorrect calculations. | Basic attempts at NumPy operations, but significant errors or missing parts. | Little or no attempt to use NumPy for data manipulation. | |
| Pandas Data Processing (15 Marks) | Data is loaded and processed correctly with complete handling of missing values, data transformations, and insightful observations. Code is well-structured and optimized. | Data processing is mostly correct with only minor mistakes. Missing value handling is attempted but not fully optimized. | Data is loaded, but some transformations are incorrect or missing. Limited handling of missing values. | Basic attempt to load data but with major errors or missing critical processing steps. | No attempt or completely incorrect implementation. | |
Part B: Data Analysis and Interpretation (30 Marks)
| Criteria | Excellent (13-15 Marks) | Good (10-12 Marks) | Satisfactory (7-9 Marks) | Needs Improvement (4-6 Marks) | Poor (0-3 Marks) | Awarded Marks |
|---|---|---|---|---|---|---|
| Exploratory Data Analysis (EDA) (15 Marks) | Deep and meaningful insights are extracted using .describe(), .info(), groupby(), and other relevant operations. Analysis demonstrates strong understanding. | Insights are mostly correct, with some minor misinterpretations. EDA covers necessary components. | Basic EDA performed with a few missing insights. Limited use of grouping and aggregation functions. | Limited attempt at EDA with major missing elements. Misinterpretations of results. | No attempt or highly incorrect implementation. | |
| Answering Analytical Questions (15 Marks) | Three well-formed, relevant research questions are formulated and thoroughly answered using correct data analysis methods. Justifications are well-written and insightful. | Three questions are formulated and answered with mostly correct methods, but minor gaps in justification or interpretation. | Some research questions are unclear or not entirely relevant. Answers are partially correct but lack depth. | Research questions are weak or generic. Answers show minimal analysis and misinterpretation. | No attempt to answer analytical questions or answers are completely incorrect. | |
Part C: Data Visualization using Matplotlib and Seaborn (30 Marks)
| Criteria | Excellent (18-20 Marks) | Good (14-17 Marks) | Satisfactory (10-13 Marks) | Needs Improvement (5-9 Marks) | Poor (0-4 Marks) | Awarded Marks |
|---|---|---|---|---|---|---|
| Creating Data Visualizations (20 Marks) | A variety of appropriate visualizations (line plot, bar chart, box plot, heatmap, scatter plot) are created with proper labeling, titles, and color schemes. Plots enhance data understanding. | Most visualizations are correct and meaningful, with some minor formatting issues. Titles and labels are mostly clear. | Basic visualizations are provided but lack clarity, consistency, or are missing key plots. Limited use of Matplotlib and Seaborn customization. | Limited effort in creating visualizations. Major errors in implementation or missing several types of visualizations. | No attempt to create visualizations or visualizations are completely incorrect. | |
| Interpreting Visualizations (10 Marks) | All visualizations are explained clearly with meaningful insights. Justifications for each visualization method are well articulated. | Most visualizations are interpreted correctly, but some explanations lack depth. | Some interpretations are incorrect or lack clear reasoning. Basic attempt to describe visualizations. | Limited or unclear explanations of visualizations. No justifications provided. | No interpretation of visualizations. | |
Part D: Report and Video Presentation (10 Marks)
| Criteria | Excellent (4-5 Marks) | Good (3 Marks) | Satisfactory (2 Marks) | Needs Improvement (1 Mark) | Poor (0 Marks) | Awarded Marks |
|---|---|---|---|---|---|---|
| Report Quality (5 Marks) | The report is well-structured with clear sections (Introduction, Methodology, Findings, Conclusion). Writing is professional and well-supported by visuals and citations. | The report is mostly well-structured but may have minor issues in clarity or depth of explanation. | The report is somewhat structured but lacks depth, organization, or contains some unclear sections. | The report lacks proper formatting, is incomplete, or has weak explanations. | No report submitted or report is entirely inadequate. | |
| Video Presentation (5 Marks) | Video is well-organized, engaging, and presents key findings effectively. Clear explanations and professional delivery. | Video is clear and covers most findings but lacks engagement or some key details. | Video is somewhat structured but lacks depth, clarity, or is too brief. | Video is unclear, disorganized, or lacks explanation of findings. | No video submitted or video is irrelevant. | |
Final Grade Conversion (Total 100 Marks)
| Marks (100 Total) | Grade | Performance Level |
|---|---|---|
| 85-100 | A | Excellent |
| 70-84 | B | Good |
| 55-69 | C | Satisfactory |
| 40-54 | D | Needs Improvement |
| 0-39 | F | Poor |
Overall Feedback
First Marker:
Total Mark:
Signature:
Date:
Second Marker:
Total Mark:
Signature:
Date: