
MA7022 Data Mining and Neural Networks


MA3022/MA4022/MA7022 Data Mining and Neural Networks

Computational Task 2

Due by 10.03.2025

100 marks available

Balloons Data Set (https://archive.ics.uci.edu/ml/datasets/Balloons)

The data were previously used in a cognitive psychology experiment; the 4 data sets represent different conditions of the experiment.

A description of the database and the original experiments can be found in the paper [Pazzani, M. (1991). The influence of prior knowledge on concept acquisition: Experimental and computational results. Journal of Experimental Psychology: Learning, Memory & Cognition, 17, 3, 416-432.]. For this assignment, the four databases were joined and duplicates were removed. The resulting data set is given in the file Baloon.csv on BB and in the table below.

The database consists of 13 cases of Inflated (T) and 13 cases of not Inflated (F).







Color   Size   Act      Age    Inflated
YELLOW  SMALL  STRETCH  ADULT  T
YELLOW  SMALL  STRETCH  CHILD  T
YELLOW  SMALL  DIP      ADULT  T
YELLOW  SMALL  DIP      CHILD  F
YELLOW  LARGE  STRETCH  ADULT  T
YELLOW  LARGE  STRETCH  CHILD  T
YELLOW  LARGE  DIP      ADULT  T
YELLOW  LARGE  DIP      CHILD  F
PURPLE  SMALL  STRETCH  ADULT  T
PURPLE  SMALL  STRETCH  CHILD  T
PURPLE  SMALL  DIP      ADULT  T
PURPLE  SMALL  DIP      CHILD  F
PURPLE  LARGE  STRETCH  ADULT  T
PURPLE  LARGE  STRETCH  CHILD  T
PURPLE  LARGE  DIP      ADULT  T
PURPLE  LARGE  DIP      CHILD  F
YELLOW  SMALL  STRETCH  CHILD  F
YELLOW  SMALL  DIP      ADULT  F
YELLOW  LARGE  STRETCH  CHILD  F
YELLOW  LARGE  DIP      ADULT  F
PURPLE  SMALL  STRETCH  CHILD  F
PURPLE  SMALL  DIP      ADULT  F
PURPLE  LARGE  STRETCH  CHILD  F
PURPLE  LARGE  DIP      ADULT  F
YELLOW  SMALL  DIP      CHILD  T
YELLOW  LARGE  STRETCH  ADULT  F
PURPLE  SMALL  STRETCH  ADULT  F
PURPLE  LARGE  STRETCH  ADULT  F



Table 1: Balloon database
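
Most of the tasks below need the table in numeric form. The following is a minimal sketch of one way to read and encode it, assuming that Baloon.csv is comma-separated and has a header row with the column names of Table 1. The 0/1 coding and the helper name encode_rows are illustrative choices, not part of the assignment.

```python
import csv
import io

# Assumed 0/1 coding of each categorical value (an illustrative choice).
CODES = {
    "Color": {"YELLOW": 0, "PURPLE": 1},
    "Size": {"SMALL": 0, "LARGE": 1},
    "Act": {"STRETCH": 0, "DIP": 1},
    "Age": {"ADULT": 0, "CHILD": 1},
}
LABELS = {"F": 0, "T": 1}


def encode_rows(handle):
    """Read CSV rows with the Table 1 header; return (X, y) as lists of ints."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([CODES[name][row[name].strip().upper()] for name in CODES])
        y.append(LABELS[row["Inflated"].strip().upper()])
    return X, y


# First three rows of Table 1, inlined so the sketch runs without the file.
sample = io.StringIO(
    "Color,Size,Act,Age,Inflated\n"
    "YELLOW,SMALL,STRETCH,ADULT,T\n"
    "YELLOW,SMALL,STRETCH,CHILD,T\n"
    "YELLOW,SMALL,DIP,ADULT,T\n"
)
X, y = encode_rows(sample)
print(X, y)
```

With the real file, the same function could be called on open("Baloon.csv", newline=""), provided the header assumption holds.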

Task 1. Decision trees predictions (20 marks)

Randomly select 3 Inflated (T) and 3 not Inflated (F) cases for the test set (exclude them from the training set, of course).

Create a decision tree that predicts Inflated (using the training set).

Test the prediction results on the test set (6 examples).

Repeat this procedure 7 times for 7 different choices of the test examples.

Is the structure of the decision tree different for different training sets? Describe the differences (if any).
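
The procedure above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed solution: the tree builder is a plain ID3 (entropy-based) learner written for this example, the function names are hypothetical, and the rows are stand-in data generated from the rule that the first 16 rows of Table 1 follow (F only when Act is DIP and Age is CHILD). Replace rows with the rows of Table 1 for the actual task.

```python
import itertools
import math
import random
from collections import Counter

# Stand-in data (an assumption): all 16 attribute combinations, labelled by
# the rule seen in the first 16 rows of Table 1 (F only for DIP and CHILD).
rows = [
    (c, s, act, age, "F" if (act, age) == ("DIP", "CHILD") else "T")
    for c, s, act, age in itertools.product(
        ("YELLOW", "PURPLE"), ("SMALL", "LARGE"),
        ("STRETCH", "DIP"), ("ADULT", "CHILD"))
]


def stratified_split(rows, n_per_class, rng):
    """Hold out n_per_class rows of each label; return (train, test)."""
    test_idx = set()
    for label in ("T", "F"):
        idx = [i for i, r in enumerate(rows) if r[-1] == label]
        test_idx.update(rng.sample(idx, n_per_class))
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test


def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def partition(rows, a):
    """Group rows by their value of attribute index a."""
    groups = {}
    for r in rows:
        groups.setdefault(r[a], []).append(r)
    return groups


def build_tree(rows, attrs):
    """ID3: return a label (leaf) or (attr_index, {value: subtree}, majority)."""
    majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
    if len({r[-1] for r in rows}) == 1 or not attrs:
        return majority
    # Lowest weighted child entropy is the same as highest information gain.
    best = min(attrs, key=lambda a: sum(
        len(g) / len(rows) * entropy(g) for g in partition(rows, a).values()))
    rest = [a for a in attrs if a != best]
    children = {v: build_tree(g, rest) for v, g in partition(rows, best).items()}
    return (best, children, majority)


def predict(tree, row):
    while not isinstance(tree, str):
        attr, children, majority = tree
        tree = children.get(row[attr], majority)  # unseen value: majority label
    return tree


rng = random.Random(0)  # fixed seed so the 7 splits are reproducible
errors, roots = [], []
for _ in range(7):
    train, test = stratified_split(rows, 3, rng)
    tree = build_tree(train, [0, 1, 2, 3])
    errors.append(sum(predict(tree, r) != r[-1] for r in test))
    roots.append(tree[0] if not isinstance(tree, str) else None)
print("test errors per split:", errors)
print("root attribute index per split:", roots)
```

Recording the root attribute of each tree is one simple way to start comparing tree structures across the 7 training sets.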

Task 2. Pruning (20 marks)

Do not split cells that contain fewer than m examples: in the previously created trees, delete every split of a cell with fewer than m examples.

Test the pruned trees on the corresponding test sets.

For which m is the test performance better (try m = 2, 3, and 4)?

Present learning curves (the average test error rate as a function of m, and test error versus training error for different m). Select the best m.
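
A minimal sketch of the pruning rule, assuming each tree node stores the number of training examples that reached it ("n") and their majority label ("label"). The dictionary layout, the hand-built example tree with its counts, and the two test rows are all hypothetical and serve only to show the mechanics.

```python
def prune(node, m):
    """Return a copy of node in which cells with fewer than m examples are not split."""
    if "children" not in node or node["n"] < m:
        return {"n": node["n"], "label": node["label"]}  # becomes (or stays) a leaf
    return {
        "n": node["n"],
        "label": node["label"],
        "attr": node["attr"],
        "children": {v: prune(c, m) for v, c in node["children"].items()},
    }


def predict(node, row):
    """Follow the splits; fall back to the node's majority label on unseen values."""
    while "children" in node:
        child = node["children"].get(row[node["attr"]])
        if child is None:
            break
        node = child
    return node["label"]


# Hypothetical tree: 9 training examples, split on Act, then on Age under DIP.
tree = {
    "n": 9, "label": "T", "attr": "Act",
    "children": {
        "STRETCH": {"n": 6, "label": "T"},
        "DIP": {
            "n": 3, "label": "T", "attr": "Age",
            "children": {
                "ADULT": {"n": 2, "label": "T"},
                "CHILD": {"n": 1, "label": "F"},
            },
        },
    },
}

# Hypothetical test rows as (attributes, true label).
test_rows = [
    ({"Act": "DIP", "Age": "CHILD"}, "F"),
    ({"Act": "STRETCH", "Age": "CHILD"}, "T"),
]

error_by_m = {}
for m in (2, 3, 4):
    pruned = prune(tree, m)
    wrong = sum(predict(pruned, x) != label for x, label in test_rows)
    error_by_m[m] = wrong / len(test_rows)
print(error_by_m)
```

Averaging error_by_m over the 7 train/test splits, and computing the same quantity on the training rows, would give the two curves the task asks for.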

Task 3. Comparison with kNN (15 marks)

Evaluate the kNN error rate (k = 1 and k = 3) for the complete data set. Compare the decision tree error rate (for the best m) with the kNN error rate. Comment.
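
One possible way to estimate the kNN error rate is sketched below. Two choices here are assumptions rather than requirements of the task: leave-one-out evaluation, and the Hamming distance (number of differing attributes) as the metric for categorical data. The demo uses only the first four rows of Table 1. Ties in distance are broken by row order, which matters on this data set because several attribute combinations repeat.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Majority label among the k rows closest to query (ties: earliest rows)."""
    nearest = sorted(train, key=lambda r: hamming(r[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out error rate: classify each row from all the other rows."""
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors += knn_predict(rest, x, k) != label
    return errors / len(data)


# First four rows of Table 1 as (attributes, label).
data = [
    (("YELLOW", "SMALL", "STRETCH", "ADULT"), "T"),
    (("YELLOW", "SMALL", "STRETCH", "CHILD"), "T"),
    (("YELLOW", "SMALL", "DIP", "ADULT"), "T"),
    (("YELLOW", "SMALL", "DIP", "CHILD"), "F"),
]
for k in (1, 3):
    print(k, loo_error(data, k))
```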

Task 4. Linear separability (20 marks)

Solve the problem using the linear Fisher discriminant for the complete data set (select the threshold that minimizes the number of errors).
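
A sketch of the Fisher discriminant with an error-minimizing threshold, in pure Python. It assumes the attributes have been coded as 0/1; the small ridge term added to the within-class scatter matrix is an assumption made here to guard against a singular matrix, and the two-attribute demo data (Act and Age only, F when both are 1) is a stand-in, not the full table.

```python
def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, mu):
    """Sum of outer products of (row - mu) over all rows."""
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fisher_direction(X0, X1, ridge=1e-6):
    """w = Sw^{-1} (mu1 - mu0); ridge on the diagonal is an added safeguard."""
    mu0, mu1 = mean(X0), mean(X1)
    S0, S1 = scatter(X0, mu0), scatter(X1, mu1)
    d = len(mu0)
    Sw = [[S0[i][j] + S1[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    return solve(Sw, [a - b for a, b in zip(mu1, mu0)])


def best_threshold(scores, labels):
    """Try thresholds between sorted scores; predict class 1 when score > t."""
    s = sorted(set(scores))
    candidates = [s[0] - 1.0] + [(a + b) / 2 for a, b in zip(s, s[1:])] + [s[-1] + 1.0]

    def errors(t):
        return sum((sc > t) != bool(y) for sc, y in zip(scores, labels))

    t = min(candidates, key=errors)
    return t, errors(t)


# Stand-in data: (Act, Age) coded 0/1; class 0 (F) only when both are 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [1, 1, 1, 0]
X0 = [x for x, y in zip(X, labels) if y == 0]
X1 = [x for x, y in zip(X, labels) if y == 1]
w = fisher_direction(X0, X1)
scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
t, errs = best_threshold(scores, labels)
print("w =", w, "threshold =", t, "errors =", errs)
```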

Train the Rosenblatt perceptron to solve this problem. Describe the result. Compare with Fisher's discriminant.
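
The classical perceptron learning rule can be sketched as below, assuming 0/1-coded attributes and labels. The epoch cap is an assumption that matters here: Table 1 lists some attribute combinations with both labels (for example YELLOW SMALL STRETCH CHILD appears as T and as F), so on the complete data set the rule cannot reach zero mistakes and has to be stopped. The demo data is a separable two-attribute stand-in.

```python
def train_perceptron(X, y, epochs=100):
    """Rosenblatt rule: on each mistake, w += t * x, with targets t in {-1, +1}.

    Returns (weights, converged); the last weight is the bias.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            xb = list(x) + [1.0]  # append constant input for the bias
            t = 1 if label else -1
            if t * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: stop
            return w, True
    return w, False


def classify(w, x):
    return int(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0)


# Stand-in data: (Act, Age) coded 0/1; class 0 (F) only when both are 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [1, 1, 1, 0]
w, converged = train_perceptron(X, y)
print("weights:", w, "converged:", converged)
print("training errors:", sum(classify(w, x) != t for x, t in zip(X, y)))
```

When the rule does not converge, reporting the weights with the fewest training errors seen so far is one common workaround; that variant is not shown here.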

Task 5. Clustering. (15 marks)

Find clusters using k-means (k=2,3).

Evaluate the quality of clustering using one of the standard indexes.

Analyse the distribution of Inflated and not Inflated cases across the clusters. Are there clusters that contain only one class?
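
A compact sketch of the three clustering steps, under these assumptions: attributes are coded 0/1, distances are Euclidean, the silhouette width is used as the quality index (one of several standard choices), and the demo data is the first 8 rows of Table 1 in the order Color, Size, Act, Age. The function names are hypothetical.

```python
import random
from collections import Counter


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X, k, rng, iters=100):
    """Lloyd's algorithm with k randomly chosen rows as initial centers."""
    centers = [list(p) for p in rng.sample(X, k)]
    assign = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in X]
        if new == assign:
            break
        assign = new
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


def silhouette(X, assign):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    total = 0.0
    for i, x in enumerate(X):
        by_cluster = {}
        for j, other in enumerate(X):
            if j != i:
                by_cluster.setdefault(assign[j], []).append(dist2(x, other) ** 0.5)
        own = by_cluster.pop(assign[i], None)
        if not own or not by_cluster:
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(X)


def class_counts(assign, labels):
    """Count labels per cluster, e.g. {0: Counter({'T': 3, 'F': 1}), ...}."""
    table = {}
    for a, lab in zip(assign, labels):
        table.setdefault(a, Counter())[lab] += 1
    return table


# First 8 rows of Table 1, coded YELLOW/SMALL/STRETCH/ADULT = 0, others = 1.
X = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1],
     [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1]]
labels = list("TTTFTTTF")

results = {}
for k in (2, 3):
    assign, _ = kmeans(X, k, random.Random(0))
    results[k] = (silhouette(X, assign), class_counts(assign, labels))
    print(k, results[k])
```

A cluster whose Counter holds a single label is a cluster with one definite class. Because k-means depends on the initial centers, repeating the run with several seeds gives a fairer picture.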

Report (with diagrams and plots) (10 marks for quality)

