MA7022 Data Mining and Neural Networks
MA3022/MA4022/MA7022 Data Mining and Neural Networks
Computational Task 2
Due by 10.03.2025
100 marks available
Balloons Data Set (https://archive.ics.uci.edu/ml/datasets/Balloons)
Data previously used in a cognitive psychology experiment; the 4 data sets represent different conditions of the experiment.
A description of the database and of the original experiments can be found in the paper [Pazzani, M. (1991). The influence of prior knowledge on concept acquisition: Experimental and computational results. Journal of Experimental Psychology: Learning, Memory & Cognition, 17, 3, 416-432.]. For this assignment the four databases were joined and duplicates were removed. The resulting data set is presented in the file Baloon.csv on BB and in the table below.
The database consists of 13 cases of Inflated (T) and 13 cases of Not Inflated (F).
Table 1: Baloon database
Task 1. Decision trees predictions (20 marks)
Randomly select 3 Inflated (T) and 3 Not Inflated (F) examples for the test set (and, of course, exclude them from the training set).
Create a decision tree that predicts Inflated (using the training set).
Test the prediction results on the test set (6 examples).
Repeat this procedure 7 times for 7 different choices of the test examples.
Is the structure of the decision tree different for different training sets? Describe the differences (if any).
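One way to carry out the repeated train/test procedure is with scikit-learn. Baloon.csv is not reproduced here, so the sketch below generates a stand-in data set: all 16 combinations of the four binary attributes, labelled by the classic balloons rule "inflated iff act = stretch AND age = adult". This stand-in is an assumption for illustration only; for the real task, load Baloon.csv and binary-encode its categorical attributes in the same shape.

```python
import random
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for Baloon.csv: 16 combinations of 4 binary attributes
# (color, size, act, age), labelled by "inflated iff act AND age".
# Replace X, y with the encoded rows of Baloon.csv for the real task.
X = [[c, s, a, g] for c in (0, 1) for s in (0, 1) for a in (0, 1) for g in (0, 1)]
y = [int(a == 1 and g == 1) for c, s, a, g in X]

pos = [i for i, lab in enumerate(y) if lab == 1]
neg = [i for i, lab in enumerate(y) if lab == 0]

rng = random.Random(42)
results = []
for trial in range(7):
    # pick 3 positive and 3 negative examples for the test set
    test_idx = set(rng.sample(pos, 3) + rng.sample(neg, 3))
    X_tr = [X[i] for i in range(len(X)) if i not in test_idx]
    y_tr = [y[i] for i in range(len(X)) if i not in test_idx]
    X_te = [X[i] for i in test_idx]
    y_te = [y[i] for i in test_idx]

    tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
    tree.fit(X_tr, y_tr)
    results.append(tree.score(X_te, y_te))
    # print the tree structure to compare it across the 7 trials
    print(f"trial {trial}: test accuracy {results[-1]:.2f}")
    print(export_text(tree, feature_names=["color", "size", "act", "age"]))
```

Comparing the `export_text` output across trials shows whether different training sets change the split order or the chosen attributes.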
Task 2. Pruning (20 marks)
Do not split cells containing fewer than m examples: remove such splits from the previously created trees.
Test the pruned trees on the corresponding test sets.
For which m are the test results best (try m = 2, 3, and 4)?
Present learning curves (average test error rate as a function of m, and test error versus training error for different m). Select the best m.
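In scikit-learn, the pruning rule "do not split cells with fewer than m examples" corresponds to the `min_samples_split` parameter. The sketch below averages the test error over 7 random splits for each m, again on a hypothetical stand-in data set (all 16 attribute combinations labelled by the "stretch AND adult" rule); substitute the encoded Baloon.csv rows for the actual assignment.

```python
import random
from statistics import mean
from sklearn.tree import DecisionTreeClassifier

# Stand-in for Baloon.csv (see the assumption above); replace with real rows.
X = [[c, s, a, g] for c in (0, 1) for s in (0, 1) for a in (0, 1) for g in (0, 1)]
y = [int(a == 1 and g == 1) for c, s, a, g in X]
pos = [i for i, lab in enumerate(y) if lab]
neg = [i for i, lab in enumerate(y) if not lab]

rng = random.Random(1)
avg_test_err = {}
for m in (2, 3, 4):
    errs = []
    for _ in range(7):
        test = set(rng.sample(pos, 3) + rng.sample(neg, 3))
        tr = [i for i in range(len(X)) if i not in test]
        # min_samples_split=m forbids splitting a node with fewer
        # than m examples, i.e. the pruning rule of this task
        tree = DecisionTreeClassifier(criterion="entropy",
                                      min_samples_split=m, random_state=0)
        tree.fit([X[i] for i in tr], [y[i] for i in tr])
        errs.append(1 - tree.score([X[i] for i in test], [y[i] for i in test]))
    avg_test_err[m] = mean(errs)

print(avg_test_err)  # plot average test error against m and pick the best m
```

Plotting `avg_test_err` against m (and alongside the corresponding training errors) gives the learning curves the task asks for.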
Task 3. Comparison with kNN (15 marks)
Evaluate the kNN error rate (k = 1 and k = 3) for the complete data set. Compare the decision-tree error rate (for the best m) with the kNN error rate. Comment.
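A sketch of the kNN evaluation, again on a hypothetical stand-in data set (the 16 binary attribute combinations labelled by the "stretch AND adult" rule; use the encoded Baloon.csv rows in practice). Since the resubstitution error of 1-NN is trivially zero, leave-one-out is used here as one reasonable reading of "error rate for the complete data set"; check which protocol your module intends.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Stand-in for Baloon.csv (see the assumption above); replace with real rows.
X = [[c, s, a, g] for c in (0, 1) for s in (0, 1) for a in (0, 1) for g in (0, 1)]
y = [int(a == 1 and g == 1) for c, s, a, g in X]

loo_err = {}
for k in (1, 3):
    knn = KNeighborsClassifier(n_neighbors=k)
    # leave-one-out: each example is classified by its neighbours
    # among the remaining 15, giving an honest error estimate
    acc = cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()
    loo_err[k] = 1 - acc
    print(f"k={k}: leave-one-out error rate {loo_err[k]:.3f}")
```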
Task 4. Linear separability (20 marks)
Solve the problem using Fisher's linear discriminant on the complete data set (select the threshold that minimises the number of errors).
Train a Rosenblatt perceptron to solve this problem. Describe the result and compare it to Fisher's discriminant.
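Both linear methods can be implemented directly in NumPy. The sketch below again uses a hypothetical stand-in data set (the 16 binary attribute combinations labelled by "stretch AND adult", which happens to be linearly separable); substitute the encoded Baloon.csv rows for the real task, and note that the threshold scan assumes the positive class projects to larger values (flip the sign of w if not).

```python
import numpy as np

# Stand-in for Baloon.csv (see the assumption above); replace with real rows.
X = np.array([[c, s, a, g] for c in (0, 1) for s in (0, 1)
              for a in (0, 1) for g in (0, 1)], dtype=float)
y = np.array([int(a == 1 and g == 1) for c, s, a, g in X])

# --- Fisher's linear discriminant ---
X0, X1 = X[y == 0], X[y == 1]
# within-class scatter matrix (sum of per-class scatters)
Sw = np.cov(X0.T, bias=True) * len(X0) + np.cov(X1.T, bias=True) * len(X1)
w = np.linalg.pinv(Sw) @ (X1.mean(0) - X0.mean(0))
proj = X @ w
# scan midpoints between consecutive projections; keep the threshold
# with the fewest errors (assumes class 1 projects higher)
cands = np.sort(proj)
best_thr, best_err = None, len(y) + 1
for thr in (cands[:-1] + cands[1:]) / 2:
    err = int(np.sum((proj > thr).astype(int) != y))
    if err < best_err:
        best_thr, best_err = thr, err
print("Fisher discriminant errors:", best_err)

# --- Rosenblatt perceptron ---
Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
tgt = 2 * y - 1                             # targets in {-1, +1}
wp = np.zeros(Xb.shape[1])
for _ in range(100):                        # epochs
    updated = False
    for xi, ti in zip(Xb, tgt):
        if ti * (wp @ xi) <= 0:             # misclassified -> update
            wp += ti * xi
            updated = True
    if not updated:                         # converged: data is separable
        break
perc_err = int(np.sum((Xb @ wp > 0).astype(int) != y))
print("Perceptron errors:", perc_err)
```

On the separable stand-in both methods reach zero errors; on the real data, comparing the two error counts is the point of the task.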
Task 5. Clustering (15 marks)
Find clusters using k-means (k=2,3).
Evaluate the quality of clustering using one of the standard indexes.
Analyse the distribution of Inflated and Not Inflated cases across the clusters. Are there clusters dominated by a single class?
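A sketch of the clustering step with scikit-learn, using the silhouette coefficient as one standard quality index, again on the hypothetical stand-in data set (the 16 binary attribute combinations labelled by "stretch AND adult"; use the encoded Baloon.csv rows in practice).

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in for Baloon.csv (see the assumption above); replace with real rows.
X = np.array([[c, s, a, g] for c in (0, 1) for s in (0, 1)
              for a in (0, 1) for g in (0, 1)], dtype=float)
y = np.array([int(a == 1 and g == 1) for c, s, a, g in X])

for k in (2, 3):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)   # one standard clustering index
    print(f"k={k}: silhouette score {sil:.3f}")
    # class composition of each cluster: look for single-class clusters
    for c in range(k):
        counts = Counter(y[km.labels_ == c])
        print(f"  cluster {c}: T={counts.get(1, 0)}, F={counts.get(0, 0)}")
```

The per-cluster T/F counts answer directly whether any cluster contains a single class.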
Report (with diagrams and plots) (10 marks for quality)