MA3022/MA4022/MA7022 Data Mining and Neural Networks
Homework 1
Due by 03.02.2025
100 marks available
Theoretical background and two mini-research projects
Theoretical background (20 marks)
- Give a description of classification and clustering (5 marks)
- What is the difference between them? (5 marks)
- Describe the KNN approach and Hart's algorithm for data reduction (5 marks)
- Describe the K-means algorithm (5 marks)
Project 1. Condensed Nearest Neighbour for data reduction in Nearest Neighbour classifier (40 marks)
Go to the web page
https://github.com/Mirkes/Data_Mining_Softbook/wiki/KNN-and-potential-ene
Read the text. Download the application from
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/knn/
Task 1. (10 marks). Study how the number of prototypes depends on the number of points for two convex well-separated classes.
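The experiment in Task 1 can also be reproduced outside the application. The following is a minimal Python sketch, assuming NumPy is available and assuming the textbook form of Hart's condensing rule (start from one point, add every point that the current prototype store misclassifies, repeat until a full pass adds nothing). The application may differ in details such as the order in which points are visited. The helper names `two_blobs`, `condensed_nn` and `one_nn_predict`, and the blob parameters, are illustrative choices rather than part of the assignment.

```python
import numpy as np


def two_blobs(n_per_class, rng, separation=6.0, spread=0.5):
    """Two Gaussian blobs: convex and, for this separation, well separated."""
    offset = np.array([separation / 2.0, 0.0])
    a = rng.normal(scale=spread, size=(n_per_class, 2)) - offset
    b = rng.normal(scale=spread, size=(n_per_class, 2)) + offset
    X = np.vstack([a, b])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def condensed_nn(X, y, rng):
    """Return indices of the prototypes kept by Hart's condensing rule."""
    order = rng.permutation(len(X))
    in_store = np.zeros(len(X), dtype=bool)
    in_store[order[0]] = True  # start from a single randomly chosen point
    changed = True
    while changed:  # repeat passes until one full pass adds nothing
        changed = False
        for i in order:
            if in_store[i]:
                continue
            store = np.flatnonzero(in_store)
            dist = np.linalg.norm(X[store] - X[i], axis=1)
            if y[store[dist.argmin()]] != y[i]:
                in_store[i] = True  # misclassified by the store: keep it
                changed = True
    return np.flatnonzero(in_store)


def one_nn_predict(X_proto, y_proto, X):
    """Label each row of X with the class of its nearest prototype."""
    dist = np.linalg.norm(X[:, None, :] - X_proto[None, :, :], axis=2)
    return y_proto[dist.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (25, 50, 100, 200, 400):
        X, y = two_blobs(n, rng)
        prototypes = condensed_nn(X, y, rng)
        print(f"{2 * n:4d} points -> {len(prototypes)} prototypes")
```

Because the result depends on the visiting order, it is worth repeating each run with several seeds and reporting the spread of the prototype count, not a single value.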
Task 2. (10 marks) Prepare a series of examples with more sophisticated non-convex shapes of well-separated classes. Study how the number of prototypes depends on the number of points in these classes.
Task 3. (10 marks) Study how the number of prototypes and outliers depends on the number of points for two well-separated classes with added background uniformly distributed noise (option random).
Task 4. (10 marks) In conclusion, discuss the results and propose a hypothesis for further study.
Do not forget to save and submit the configurations of the classes and prototypes as figures!
Project 2. Dynamics of k-means clustering (40 marks)
Go to the web page
https://github.com/Mirkes/Data_Mining_Softbook/wiki/k-means-and-k-medoids
Read the text. Download the application from
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/kmeans/KMeansKMedoids.jar
Task 1. (10 marks). (Exploration) Find the final k-means configurations for a series of datasets and various initial generations of centroids. How many different configurations did you observe? How frequently did they appear? How many iterations were required?
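The bookkeeping for this exploration can be sketched in Python as a cross-check on what the application shows. The sketch assumes NumPy, the standard Lloyd iteration with initial centroids drawn at random from the data points, and points uniformly distributed in the unit square. Two runs count as the same final configuration when they produce the same partition of the points. The function names `kmeans`, `partition_key` and `run_experiment` are hypothetical, and the application may initialise centroids or count iterations differently.

```python
from collections import Counter

import numpy as np


def kmeans(X, k, rng, max_iter=1000):
    """Lloyd's algorithm; returns (centroids, labels, number of update steps)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for iteration in range(1, max_iter + 1):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            return centroids, labels, iteration - 1  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels, max_iter


def partition_key(labels, k):
    """Hashable description of a clustering that ignores cluster numbering."""
    return frozenset(
        frozenset(np.flatnonzero(labels == j).tolist()) for j in range(k)
    )


def run_experiment(n_points, k, restarts, seed=0):
    """Count distinct final partitions over many random initialisations."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_points, 2))  # uniform points in the unit square
    configs = Counter()
    iterations = []
    for _ in range(restarts):
        _, labels, iters = kmeans(X, k, rng)
        configs[partition_key(labels, k)] += 1
        iterations.append(iters)
    return configs, float(np.mean(iterations))


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        configs, mean_iters = run_experiment(n, k=3, restarts=200)
        top = configs.most_common(1)[0][1]
        print(f"n={n}: {len(configs)} configurations, "
              f"most frequent seen {top} times, mean iterations {mean_iters:.1f}")
```

Comparing partitions instead of centroid coordinates avoids treating two runs as different because of floating-point rounding or a different numbering of the clusters.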
Task 2. (10 marks). Formulate a hypothesis about the number of different final k-means configurations and their frequencies. Analyse how they depend on the number of data points. Check this hypothesis on the random sets of equidistributed points.
Task 3. (10 marks). Formulate a hypothesis about the convergence rate of k-means and its dependence on the number of data points. Check this hypothesis on the random sets of equidistributed points (use the same series of experiments as in Task 2).
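One possible way to check a convergence-rate hypothesis is to fit a simple model to the measured iteration counts. The sketch below assumes NumPy and a power-law form, iterations ≈ c · n^a, which is only one candidate hypothesis and is not prescribed by the assignment. The function name `fit_power_law` is hypothetical, and the numbers in the demonstration are generated from an exact power law purely to show the call. They are not experimental results.

```python
import numpy as np


def fit_power_law(n_values, mean_iterations):
    """Least-squares fit of log(iterations) = a * log(n) + log(c).

    Returns (a, c) for the assumed model iterations ~ c * n**a.
    """
    log_n = np.log(np.asarray(n_values, dtype=float))
    log_it = np.log(np.asarray(mean_iterations, dtype=float))
    slope, intercept = np.polyfit(log_n, log_it, 1)
    return float(slope), float(np.exp(intercept))


if __name__ == "__main__":
    # Synthetic placeholder values, NOT measurements: replace them with the
    # mean iteration counts recorded in your own experiments.
    n_values = np.array([50, 100, 200, 400, 800])
    placeholder = 2.0 * n_values ** 0.5
    a, c = fit_power_law(n_values, placeholder)
    print(f"fitted exponent a = {a:.3f}, coefficient c = {c:.3f}")
```

If the points do not fall close to a straight line on a log-log plot, the power-law form itself should be rejected and another dependence, for example logarithmic, tried instead.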
Task 4. (10 marks). In conclusion, discuss the results and propose a hypothesis for further study.
Do not forget to save and submit the configurations of the classes and prototypes as figures!