Data Mining and Neural Networks MA3022/MA4022/MA7022
- Subject Code: MA3022-MA4022-MA7022
Due by 03.02.2025
100 marks available
Theoretical background and two mini-research projects
Theoretical background (20 marks)
- Give a description of classification and clustering (5 marks)
- What is the difference between them? (5 marks)
- Describe the KNN approach and Hart's algorithm for data reduction (5 marks)
- Describe the k-means algorithm (5 marks)
Project 1. Condensed Nearest Neighbour for data reduction in Nearest Neighbour classifier (40 marks)
Go to the web page
https://github.com/Mirkes/Data_Mining_Softbook/wiki/KNN-and-potential-ene
Read the text. Download the application
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/knn/
Task 1. (10 marks). Study how the number of prototypes depends on the number of points for two convex well-separated classes.
Task 2. (10 marks). Prepare a series of examples with more sophisticated non-convex shapes of well-separated classes. Study how the number of prototypes depends on the number of points in these classes.
Task 3. (10 marks). Study how the numbers of prototypes and outliers depend on the number of points for two well-separated classes with added uniformly distributed background noise (option random).
Task 4. (10 marks). In conclusion, discuss the results and propose a hypothesis for further study.
Do not forget to save and submit the configurations of the classes and prototypes as figures!
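For orientation only, the kind of experiment asked for in Tasks 1 to 3 can also be sketched outside the downloadable application. The Python sketch below is not the application's code: it is a minimal version of Hart's condensed nearest neighbour rule, and the function names (condensed_nn, two_squares), the choice of two unit squares as the convex classes, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np


def condensed_nn(X, y, rng):
    """Hart's condensed nearest neighbour rule (illustrative sketch).

    Returns the indices of the points kept as prototypes.
    """
    order = rng.permutation(len(X))
    in_store = np.zeros(len(X), dtype=bool)
    in_store[order[0]] = True  # start from one arbitrary point
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        for i in order:
            if in_store[i]:
                continue
            store = np.flatnonzero(in_store)
            # 1-NN classification of X[i] using the current prototypes only
            dist = np.linalg.norm(X[store] - X[i], axis=1)
            nearest = store[np.argmin(dist)]
            if y[nearest] != y[i]:
                in_store[i] = True  # misclassified point becomes a prototype
                changed = True
    return np.flatnonzero(in_store)


def two_squares(n, rng):
    """Hypothetical test data: n points per class in two separated unit squares."""
    a = rng.uniform([0.0, 0.0], [1.0, 1.0], size=(n, 2))
    b = rng.uniform([2.0, 0.0], [3.0, 1.0], size=(n, 2))
    return np.vstack([a, b]), np.repeat([0, 1], n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (10, 100, 1000):
        X, y = two_squares(n, rng)
        print(n, len(condensed_nn(X, y, rng)))
```

The result depends on the order in which points are visited, so it is sensible to repeat each sample size with several seeds before drawing any conclusion about how the number of prototypes grows.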
Project 2. Dynamics of k-means clustering (40 marks)
Go to the web page
https://github.com/Mirkes/Data_Mining_Softbook/wiki/k-means-and-k-medoids
Read the text. Download the application
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/kmeans/KMeansKMedoids.jar
Task 1. (10 marks). (Exploration) Find the final k-means configurations for a series of data sets and various initial generations of centroids. How many different configurations did you observe? How frequently did they appear? How many iterations were required?
Task 2. (10 marks). Formulate a hypothesis about the number of different final k-means configurations and their frequencies. Analyse how they depend on the number of data points. Check this hypothesis on the random sets of equidistributed points.
Task 3. (10 marks). Formulate a hypothesis about the convergence rate of k-means and its dependence on the number of data points. Check this hypothesis on the random sets of equidistributed points (use the same series of experiments as in Task 2).
Task 4. (10 marks). In conclusion, discuss the results and propose a hypothesis for further study.
Do not forget to save and submit the configurations of the classes and prototypes as figures!
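As a rough illustration of the bookkeeping behind Tasks 1 to 3, the Python sketch below runs Lloyd's k-means from many random initialisations and counts the distinct final partitions and the iterations used. It is independent of the KMeansKMedoids application and makes several assumptions: "equidistributed" is read as uniform on the unit square, centroids are initialised at randomly chosen data points, two final configurations are treated as the same when they induce the same partition of the points, and the names kmeans, partition_key and explore are hypothetical.

```python
from collections import Counter

import numpy as np


def kmeans(X, k, rng, max_iter=1000):
    """Lloyd's algorithm; returns (centroids, labels, number of update steps)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for it in range(1, max_iter + 1):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            return centroids, labels, it - 1  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels, max_iter


def partition_key(labels):
    """Relabel clusters by order of first appearance, so that the key
    does not depend on how the clusters happen to be numbered."""
    first_seen = {}
    return tuple(first_seen.setdefault(int(l), len(first_seen)) for l in labels)


def explore(X, k, runs, rng):
    """Count distinct final partitions and record iterations per run."""
    counts, iters = Counter(), []
    for _ in range(runs):
        _, labels, n_iter = kmeans(X, k, rng)
        counts[partition_key(labels)] += 1
        iters.append(n_iter)
    return counts, iters


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for n in (50, 200, 800):
        X = rng.uniform(size=(n, 2))
        counts, iters = explore(X, k=3, runs=100, rng=rng)
        print(n, len(counts), counts.most_common(3)[0][1], np.mean(iters))
```

Comparing partitions rather than centroid coordinates avoids floating-point tolerance issues; a different notion of "same configuration" would need a different key.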