Programming Task and Manual

Assessment 2

Subject: ISY503

Version 1.0

Prepared by

Torrens University Australia

Table of contents

1. Numerical hyperparameter tuning
1.1 DNNRegressor
1.2 LinearRegressor
1.3 DNNLinearCombinedRegressor
1.4 RandomForestRegressor
2. Numerical models with and without normalization
2.1 No normalization
2.2 Min-max normalization
2.3 Z-score normalization
3. Encoding
3.1 One-hot encoding
3.2 Ordinal encoding
3.3 DNNRegressor tuning with one-hot encoding
4. Using all features
4.1 One-hot encoding without normalization
4.2 One-hot encoding with Z-score
4.3 Ordinal encoding without normalization
4.4 Ordinal encoding with Z-score
4.5 RandomForestRegressor tuning with ordinal encoding
Glossary

1. Numerical hyperparameter tuning

During data cleaning, one column and several rows containing missing data were removed. As a result, the number of samples decreased from 205 to 193. The numerical dataset was split into training data (x_num_train, y_num_train) and evaluation data (x_num_eval, y_num_eval). The corresponding losses are shown in the tables below. Training and evaluation losses that stay close to each other are a promising sign, but not an accurate metric on their own, so the RMSE is used for hyperparameter tuning instead, where a lower value indicates better predictions. Note that the four models did not use exactly the same dataset, so their results are not directly comparable.
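The notebook itself is not reproduced in this report, but the RMSE metric used throughout can be sketched in a few lines of NumPy. The prices below are made-up toy values, not data from the assignment:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: the square root of the mean squared error.

    Lower values indicate better predictions.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy example with made-up prices (not values from the report)
print(rmse([10000, 15000, 20000], [11000, 14000, 21000]))  # 1000.0
```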

1.1 DNNRegressor

Table 1: DNNRegressor tuning (Task 1)

| Step | Optimizer | Batch size | Learning rate | Hidden units | Training steps | RMSE | Training loss | Evaluation loss |
|---|---|---|---|---|---|---|---|---|
| 01 | Adagrad | 16 | 0.01 | 64 | 10000 | 4198.5 | 289446270 | 260001950 |
| 02 | Adadelta | 16 | 0.01 | 64 | 10000 | 9728.3 | 1036619460 | 1395931000 |
| 03 | Adam | 16 | 0.01 | 64 | 10000 | 3658.8 | 107603220 | 197455780 |
| 04 | Ftrl | 16 | 0.01 | 64 | 10000 | 4295.7 | 300022500 | 272188600 |
| 05 | ProximalAdagrad | 16 | 0.01 | 64 | 10000 | 4262.3 | 297181570 | 267964720 |
| 06 | RMSProp | 16 | 0.01 | 64 | 10000 | 3612.8 | 108048184 | 192521060 |
| 07 | RMSProp | 8 | 0.01 | 64 | 10000 | 4018.8 | 66631724 | 119109580 |
| 08 | RMSProp | 24 | 0.01 | 64 | 10000 | 3590.2 | 232436590 | 253490580 |
| 09 | RMSProp | 32 | 0.01 | 64 | 10000 | 3736.6 | 159982290 | 411885440 |
| 10 | RMSProp | 24 | 0.001 | 64 | 10000 | 3924.2 | 287867420 | 302860800 |
| 11 | RMSProp | 24 | 0.1 | 64 | 10000 | 4654.2 | 290740300 | 426014370 |
| 12 | RMSProp | 24 | 0.01 | 64, 32 | 10000 | 4299.1 | 108459070 | 363487360 |
| 13 | RMSProp | 24 | 0.01 | 32, 16 | 10000 | 3799.0 | 96765350 | 283839460 |
| 14 | RMSProp | 24 | 0.01 | 32 | 10000 | 3990.3 | 222847090 | 313142180 |
| 15 | RMSProp | 24 | 0.01 | 128 | 10000 | 3766.8 | 284472000 | 279042720 |
| 16 | RMSProp | 24 | 0.01 | 64 | 15000 | 3621.1 | 112965510 | 257878400 |
| 17 | RMSProp | 24 | 0.01 | 64 | 5000 | 3598.5 | 212350780 | 254675660 |
| 18 | RMSProp | 24 | 0.01 | 64 | 3000 | 3569.9 | 272283600 | 250645740 |
| 19 | RMSProp | 24 | 0.01 | 64 | 1000 | 3924.3 | 287241500 | 302876000 |
| 20 | RMSProp | 24 | 0.01 | 64 | 8000 | 3659.8 | 272362500 | 263418110 |
| 21 | RMSProp | 24 | 0.01 | 64 | 4000 | 3585.2 | 212206460 | 252794220 |
| 22 | Adam | 24 | 0.01 | 64 | 3000 | 4165.9 | 261785300 | 341319550 |

This model performed best with the RMSProp and Adam optimizers. I decided to use RMSProp as it performed slightly better. I also tried varying the number of hidden layers and nodes, but those changes did not improve the results. The lowest RMSE was achieved at step 18 by reducing the number of training steps; reducing them further did not improve the result.

1.2 LinearRegressor

Table 2: LinearRegressor tuning (Task 1)

| Step | Optimizer | Batch size | Learning rate | Training steps | RMSE | Training loss | Evaluation loss |
|---|---|---|---|---|---|---|---|
| 01 | Adagrad | 16 | 0.01 | 10000 | 5181.6 | 1013684900 | 349043870 |
| 02 | Adadelta | 16 | 0.01 | 10000 | 12347.8 | 3638716400 | 1982086300 |
| 03 | Adam | 16 | 0.01 | 10000 | 3255.7 | 274230600 | 137797070 |
| 04 | ProximalAdagrad | 16 | 0.01 | 10000 | 5176.0 | 1010307600 | 348285280 |
| 05 | Ftrl | 16 | 0.01 | 10000 | 5182.5 | 1014172860 | 349171360 |
| 06 | RMSProp | 16 | 0.01 | 10000 | 3266.0 | 268825250 | 138673300 |
| 07 | Adam | 8 | 0.01 | 10000 | 3263.1 | 146048290 | 83057544 |
| 08 | Adam | 24 | 0.01 | 10000 | 3315.5 | 379672060 | 214355950 |
| 09 | Adam | 16 | 0.001 | 10000 | 4269.3 | 640512830 | 236952110 |
| 10 | Adam | 16 | 0.1 | 10000 | 3179.2 | 198521980 | 131396216 |
| 11 | Adam | 16 | 0.2 | 10000 | 2850.9 | 206183860 | 105663130 |
| 12 | Adam | 16 | 0.3 | 10000 | 3839.53 | 233292450 | 191646450 |
| 13 | Adam | 16 | 0.2 | 15000 | 2941.1 | 188396690 | 112452770 |
| 14 | Adam | 16 | 0.2 | 5000 | 3147.5 | 199159790 | 128788740 |
| 15 | Adam | 16 | 0.2 | 8000 | 2936.5 | 193435740 | 112104740 |
| 16 | Adam | 16 | 0.2 | 12000 | 2927.1 | 189974430 | 111383590 |
| 17 | RMSProp | 16 | 0.2 | 10000 | 2909.7 | 192797060 | 110068000 |

LinearRegressor also performed well with the Adam and RMSProp optimizers, but in this case, the results favoured the former. This regressor performed particularly well, achieving the best result in step 11 by increasing the learning rate to 0.2.

1.3 DNNLinearCombinedRegressor

Table 3: DNNLinearCombinedRegressor tuning (Task 1)

| Step | Optimizer | Batch size | Learning rate | Hidden units | Training steps | RMSE | Training loss | Evaluation loss |
|---|---|---|---|---|---|---|---|---|
| 01 | Adagrad | 16 | 0.01 | 64 | 10000 | 3515.1 | 328320100 | 160627800 |
| 02 | Adadelta | 16 | 0.01 | 64 | 10000 | 5890.1 | 1265183600 | 451021440 |
| 03 | Adam | 16 | 0.01 | 64 | 10000 | 3252.2 | 151474340 | 137502060 |
| 04 | Ftrl | 16 | 0.01 | 64 | 10000 | 3470.2 | 325567940 | 156556540 |
| 05 | ProximalAdagrad | 16 | 0.01 | 64 | 10000 | 3519.9 | 327918530 | 161066620 |
| 06 | RMSProp | 16 | 0.01 | 64 | 10000 | 3071.8 | 141448600 | 122668090 |
| 07 | RMSProp | 8 | 0.01 | 64 | 10000 | 3021.7 | 88667096 | 71221080 |
| 08 | RMSProp | 6 | 0.01 | 64 | 10000 | 3077.0 | 59688664 | 52751132 |
| 09 | RMSProp | 24 | 0.01 | 64 | 10000 | 3251.6 | 156969100 | 206173280 |
| 10 | RMSProp | 8 | 0.001 | 64 | 10000 | 3142.1 | 114338740 | 77012660 |
| 11 | RMSProp | 8 | 0.1 | 64 | 10000 | 2639.4 | 111357390 | 54340052 |
| 12 | RMSProp | 8 | 0.2 | 64 | 10000 | 3597.9 | 97142616 | 100970660 |
| 13 | RMSProp | 8 | 0.1 | 64, 32 | 10000 | 3630.8 | 109939860 | 102827180 |
| 14 | RMSProp | 8 | 0.1 | 128, 64 | 10000 | 3150.1 | 101297930 | 77401970 |
| 15 | RMSProp | 8 | 0.1 | 32, 16 | 10000 | 3432.8 | 109361256 | 91920110 |
| 16 | RMSProp | 8 | 0.1 | 32 | 10000 | 3320.7 | 95783980 | 86011816 |
| 17 | RMSProp | 8 | 0.1 | 128 | 10000 | 2672.0 | 105687784 | 55689804 |
| 18 | RMSProp | 8 | 0.1 | 256 | 10000 | 2942.6 | 98697650 | 67539610 |
| 19 | RMSProp | 8 | 0.1 | 64 | 15000 | 2776.7 | 98636560 | 60139550 |
| 20 | RMSProp | 8 | 0.1 | 64 | 5000 | 2656.1 | 106008630 | 55030550 |
| 21 | RMSProp | 8 | 0.1 | 64 | 8000 | 2830.9 | 95254936 | 62509868 |
| 22 | RMSProp | 8 | 0.1 | 64 | 3000 | 2699.4 | 107670960 | 56839560 |
| 23 | RMSProp | 8 | 0.1 | 64 | 1000 | 3013.0 | 110657790 | 70813260 |
| 24 | RMSProp | 8 | 0.1 | 64 | 2500 | 2990.0 | 99884430 | 69733580 |
| 25 | RMSProp | 8 | 0.1 | 64 | 4000 | 3173.6 | 100663470 | 78563860 |
| 26 | RMSProp | 8 | 0.1 | 64 | 12000 | 3176.9 | 96585260 | 78725350 |

The DNNLinearCombinedRegressor uses two optimizers (one for the linear part and one for the DNN part), but for simplicity the same optimizer was used for both. This model also performed slightly better with the RMSProp optimizer. The best result was achieved at step 11; the subsequent steps brought no further improvement.

1.4 RandomForestRegressor

Table 4: RandomForestRegressor tuning (Task 1)

| Step | Number of trees | Minimum samples to split | Maximum samples | Maximum features | RMSE | R2 |
|---|---|---|---|---|---|---|
| 01 | 100 | 2 | None | 1 | 2524.1 | 0.85 |
| 02 | 500 | 2 | None | 1 | 2503.1 | 0.86 |
| 03 | 1000 | 2 | None | 1 | 2603.2 | 0.85 |
| 04 | 800 | 2 | None | 1 | 2666.5 | 0.84 |
| 05 | 400 | 2 | None | 1 | 2592.4 | 0.85 |
| 06 | 600 | 2 | None | 1 | 2549.2 | 0.85 |
| 07 | 500 | 4 | None | 1 | 2551.39 | 0.85 |
| 08 | 500 | 3 | None | 1 | 2661.0 | 0.84 |
| 09 | 500 | 10 | None | 1 | 2868.0 | 0.81 |
| 10 | 500 | 2 | 0.8 | 1 | 2579.2 | 0.85 |
| 11 | 500 | 2 | 0.9 | 1 | 2554.1 | 0.85 |
| 12 | 500 | 2 | 0.4 | 1 | 2810.1 | 0.82 |
| 13 | 500 | 2 | 0.95 | 1 | 2486.7 | 0.86 |
| 14 | 500 | 2 | 0.95 | 0.95 | 2283.6 | 0.88 |
| 15 | 500 | 2 | 0.95 | 0.9 | 2315.6 | 0.88 |
| 16 | 500 | 2 | 0.95 | 0.85 | 2291.6 | 0.88 |
| 17 | 500 | 2 | 0.95 | 0.8 | 2278.4 | 0.88 |
| 18 | 500 | 2 | 0.95 | 0.75 | 2273.8 | 0.88 |
| 19 | 500 | 2 | 0.95 | 0.7 | 2338.3 | 0.87 |

The RandomForestRegressor produced very good results. Three of the four selected parameters contributed to a reduction in the RMSE, which reached its minimum in step 18.
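The notebook code is not shown in this report, but a configuration matching step 18 of Table 4 can be sketched with scikit-learn's RandomForestRegressor. The synthetic data below is only a stand-in for the actual 193-sample car dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the 193-sample car dataset (not the real data)
X, y = make_regression(n_samples=193, n_features=10, noise=10.0, random_state=0)

# Hyperparameters from step 18 of Table 4
model = RandomForestRegressor(
    n_estimators=500,      # number of trees
    min_samples_split=2,   # minimum samples required to split a node
    max_samples=0.95,      # fraction of samples drawn for each tree
    max_features=0.75,     # fraction of features considered at each split
    random_state=0,
)
model.fit(X, y)
print(model.predict(X[:3]).shape)  # (3,)
```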

2. Numerical models with and without normalization

In this chapter, all four models are compared on exactly the same input dataset. The best configurations of the previous numerical models are used without normalization, with min-max normalization, and with Z-score normalization. The models are identical to those in the previous chapter; only the input changes. Normalization is applied to every numerical column in the dataset except the price. The normalization code can be found in the cell called Data normalization.

2.1 No normalization

Table 5: Models without normalization (Task 2)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 24, learning rate: 0.01, hidden units: 64, steps: 3000 | 3029.7 |
| 02 | LinearRegressor | Adam, batch size: 16, learning rate: 0.2, steps: 10000 | 3168.2 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 8, learning rate: 0.1, hidden units: 64, steps: 10000 | 2945.0 |
| 04 | RandomForestRegressor | Trees: 500, samples to split: 2, maximum samples: 0.95, maximum features: 0.75 | 2303.8 |

Without normalization, RandomForestRegressor produced the best predictions with an RMSE value of 2303.8, which is very close to its previous best. In addition, it is very fast compared to the other algorithms.

2.2 Min-max normalization

Table 6: Models with min-max normalization (Task 2)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 24, learning rate: 0.01, hidden units: 64, steps: 3000 | 4621.0 |
| 02 | LinearRegressor | Adam, batch size: 16, learning rate: 0.2, steps: 10000 | 6852.0 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 8, learning rate: 0.1, hidden units: 64, steps: 10000 | 3813.3 |
| 04 | RandomForestRegressor | Trees: 500, samples to split: 2, maximum samples: 0.95, maximum features: 0.75 | 3076.6 |
| 05 | DNNRegressor | GradientDescent, batch size: 24, learning rate: 0.01, hidden units: 64, steps: 3000 | 7653.8 |
| 06 | DNNRegressor | ProximalGradientDescent, batch size: 24, learning rate: 0.01, hidden units: 64, steps: 3000 | 7648.1 |

All models performed worse with min-max normalization than without. It seems that these algorithms are not sensitive to the scales of the numerical features, or that the dataset contains too many outliers, which compress the remaining values into a narrow band after min-max scaling. However, min-max normalization did make it possible to apply the GradientDescent and ProximalGradientDescent optimizers to the DNNRegressor estimator for the first time.
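Min-max scaling maps each column onto the [0, 1] range, which is exactly why outliers are a problem: one extreme value shrinks the spread of everything else. A minimal sketch with scikit-learn, using illustrative values only:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One hypothetical numerical column; in the report, the price column was left unscaled
column = np.array([[100.0], [150.0], [200.0], [300.0]])

# Each value becomes (x - min) / (max - min)
scaled = MinMaxScaler().fit_transform(column)
print(scaled.ravel())  # [0.   0.25 0.5  1.  ]
```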

2.3 Z-score normalization

Table 7: Models with Z-score normalization (Task 2)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 24, learning rate: 0.01, hidden units: 64, steps: 3000 | 3291.1 |
| 02 | LinearRegressor | Adam, batch size: 16, learning rate: 0.2, steps: 10000 | 11666.2 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 8, learning rate: 0.1, hidden units: 64, steps: 10000 | 2801.3 |
| 04 | RandomForestRegressor | Trees: 500, samples to split: 2, maximum samples: 0.95, maximum features: 0.75 | 2433.7 |

Interestingly, only DNNLinearCombinedRegressor performed slightly better with Z-score normalization than without; the other models performed worse. This is surprising, as the numerical features appeared to be roughly normally distributed. As in the previous cases, RandomForestRegressor produced the lowest RMSE, while LinearRegressor's error reached its highest value so far.
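Z-score normalization rescales each column to zero mean and unit standard deviation. A minimal NumPy sketch with illustrative values:

```python
import numpy as np

# Z-score: subtract the column mean and divide by the column standard deviation
column = np.array([10.0, 20.0, 30.0])
z = (column - column.mean()) / column.std()
print(z)  # [-1.22474487  0.          1.22474487]
```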

3. Encoding

In this chapter, one-hot encoding is used first, and then the models are evaluated with ordinal encoding as well. All four models are compared using the same dataset, but the split between training and test data is done separately for each encoder. The encoding is applied to every categorical column in the dataset.

3.1 One-hot encoding

All model hyperparameters are reset to their default values, except the optimizer, because the models now receive categorical features. To ensure a fair comparison, LinearRegressor uses the same optimizer as the others.
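One-hot encoding expands each categorical column into one binary column per category. A sketch with scikit-learn, using a hypothetical fuel-type column (the actual column names are not listed in this report):

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column (illustrative, not the real dataset)
fuel = [["gas"], ["diesel"], ["gas"]]

# Categories are ordered alphabetically: diesel -> [1, 0], gas -> [0, 1]
encoded = OneHotEncoder().fit_transform(fuel).toarray()
print(encoded.tolist())  # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```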

Table 8: Models with one-hot encoding (Task 3)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2335.9 |
| 02 | LinearRegressor | RMSProp, batch size: 16, learning rate: 0.01, steps: 10000 | 13661.6 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2978.8 |
| 04 | RandomForestRegressor | Trees: 100, samples to split: 2, maximum samples: None, maximum features: 1 | 3416.2 |

RandomForestRegressor and LinearRegressor did not cope very well with categorical features. On the other hand, DNNRegressor performed better with categorical features than with numerical features.

3.2 Ordinal encoding

Table 9: Models with ordinal encoding (Task 3)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 6218.5 |
| 02 | LinearRegressor | RMSProp, batch size: 16, learning rate: 0.01, steps: 10000 | 13801.8 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 6945.7 |
| 04 | RandomForestRegressor | Trees: 100, samples to split: 2, maximum samples: None, maximum features: 1 | 4842.4 |

All models performed worse with ordinal encoding. This is not too surprising, since the categorical features contained little ordinal data. It also seems to be confirmed that LinearRegressor cannot cope with categorical features.
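Ordinal encoding maps each category to an integer, which implicitly imposes an order. A sketch with scikit-learn, using a hypothetical body-style column, shows why this can mislead models when no natural order exists:

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical column; categories are mapped to integers alphabetically,
# so the implied order (hatchback < sedan < wagon) has no real meaning
body = [["sedan"], ["hatchback"], ["wagon"], ["sedan"]]
codes = OrdinalEncoder().fit_transform(body)
print(codes.ravel().tolist())  # [1.0, 0.0, 2.0, 1.0]
```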

3.3 DNNRegressor tuning with one-hot encoding

Note that the encoded data changed after the ordinal encoding part, so the hyperparameter tuning starts again from the very first step.

Table 10: DNNRegressor tuning (Task 3)

| Step | Optimizer | Batch size | Learning rate | Hidden units | Training steps | RMSE | Training loss | Evaluation loss |
|---|---|---|---|---|---|---|---|---|
| 01 | RMSProp | 16 | 0.01 | 64 | 10000 | 2791.8 | 84336504 | 113019430 |
| 02 | Adadelta | 16 | 0.01 | 64 | 10000 | 14622.5 | 3537497000 | 3100372500 |
| 03 | Adagrad | 16 | 0.01 | 64 | 10000 | 12674.4 | 2708710400 | 2329298400 |
| 04 | Adam | 16 | 0.01 | 64 | 10000 | 2828.4 | 84874184 | 116005890 |
| 05 | Ftrl | 16 | 0.01 | 64 | 10000 | 12733.5 | 2731700700 | 2351072300 |
| 06 | ProximalAdagrad | 16 | 0.01 | 64 | 10000 | 12733.5 | 2731953200 | 2351092500 |
| 07 | RMSProp | 8 | 0.01 | 64 | 10000 | 2809.2 | 45180690 | 57214370 |
| 08 | RMSProp | 32 | 0.01 | 64 | 10000 | 2832.8 | 150968050 | 232723620 |
| 09 | RMSProp | 16 | 0.1 | 64 | 10000 | 2742.9 | 86725530 | 109092410 |
| 10 | RMSProp | 16 | 0.2 | 64 | 10000 | 2813.3 | 86951680 | 114763560 |
| 11 | RMSProp | 16 | 0.001 | 64 | 10000 | 5544.5 | 565361300 | 445761180 |
| 12 | RMSProp | 16 | 0.1 | 64, 32 | 10000 | 3865.4 | 60813040 | 216656110 |
| 13 | RMSProp | 16 | 0.1 | 128, 64 | 10000 | 2730.3 | 63659500 | 108094030 |
| 14 | RMSProp | 16 | 0.1 | 128, 64, 32 | 10000 | 2844.9 | 55325212 | 117359930 |
| 15 | RMSProp | 16 | 0.1 | 256, 128 | 10000 | 2940.4 | 53797892 | 125372780 |
| 16 | RMSProp | 16 | 0.1 | 32, 16 | 10000 | 4081.4 | 54846556 | 241539360 |
| 17 | RMSProp | 16 | 0.1 | 128, 64 | 15000 | 3322.4 | 54041024 | 160064770 |
| 18 | RMSProp | 16 | 0.1 | 128, 64 | 5000 | 3306.4 | 59121356 | 158526530 |
| 19 | RMSProp | 16 | 0.1 | 128, 64 | 8000 | 3457.57 | 99356110 | 173344800 |
| 20 | RMSProp | 16 | 0.1 | 128, 64 | 12000 | 3203.0 | 66985584 | 148767500 |

Even after 20 steps, the initial result was only slightly improved by hyperparameter tuning. The lowest RMSE value was achieved in step 13.

4. Using all features

In this chapter, a total of four comparisons are made using all models. First, one-hot encoding is applied without normalization, then with Z-score normalization; the same procedure is then repeated with ordinal encoding. All four comparisons use the same dataset, which was achieved by fixing the random state during the dataset split.
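Fixing the random state makes the split reproducible, so every model really does see identical training and evaluation rows. A sketch with scikit-learn's train_test_split (the 80/20 ratio and toy data here are illustrative; the report does not state the exact split):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and target, standing in for the encoded dataset
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# The same random_state always yields the same split
X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_eval))  # 8 2
```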

4.1 One-hot encoding without normalization

Once again, all hyperparameters are reset to their default values except the optimizer, because the models must now consider both numerical and categorical features, so the previous tuning results are no longer relevant.

Table 11: One-hot encoding without normalization (Task 4)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 3325.8 |
| 02 | LinearRegressor | RMSProp, batch size: 16, learning rate: 0.01, steps: 10000 | 3589.9 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2243.2 |
| 04 | RandomForestRegressor | Trees: 100, samples to split: 2, maximum samples: None, maximum features: 1 | 2460.6 |

The table shows that the DNNLinearCombinedRegressor performed best with the one-hot encoding of the categorical features, but the other models did not perform badly either.

4.2 One-hot encoding with Z-score

Table 12: One-hot encoding with Z-score (Task 4)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2472.3 |
| 02 | LinearRegressor | RMSProp, batch size: 16, learning rate: 0.01, steps: 10000 | 13767.6 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2571.9 |
| 04 | RandomForestRegressor | Trees: 100, samples to split: 2, maximum samples: None, maximum features: 1 | 2631.4 |

This time, DNNRegressor performed better with Z-score normalization of the numerical features, while the other models performed worse. It is interesting that normalization helped only this one model.

4.3 Ordinal encoding without normalization

Table 13: Ordinal encoding without normalization (Task 4)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2659.6 |
| 02 | LinearRegressor | RMSProp, batch size: 16, learning rate: 0.01, steps: 10000 | 3523.8 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2375.0 |
| 04 | RandomForestRegressor | Trees: 100, samples to split: 2, maximum samples: None, maximum features: 1 | 2170.2 |

RandomForestRegressor stood out from the other models in this round, producing the best result from the comparisons so far. This performance is a very interesting result, as the dataset contains very little ordinal data.

4.4 Ordinal encoding with Z-score

Table 14: Ordinal encoding with Z-score (Task 4)

| Case | Model | Hyperparameters | RMSE |
|---|---|---|---|
| 01 | DNNRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2683.5 |
| 02 | LinearRegressor | RMSProp, batch size: 16, learning rate: 0.01, steps: 10000 | 12650.2 |
| 03 | DNNLinearCombinedRegressor | RMSProp, batch size: 16, learning rate: 0.01, hidden units: 64, steps: 10000 | 2743.7 |
| 04 | RandomForestRegressor | Trees: 100, samples to split: 2, maximum samples: None, maximum features: 1 | 2861.1 |

This time all models performed worse with the Z-score normalization of the numerical features.

4.5 RandomForestRegressor tuning with ordinal encoding

Since RandomForestRegressor produced the best result and a single model had to be chosen, hyperparameter tuning was performed on this algorithm.

Table 15: RandomForestRegressor tuning (Task 4)

| Step | Number of trees | Minimum samples to split | Maximum samples | Maximum features | RMSE | R2 |
|---|---|---|---|---|---|---|
| 01 | 100 | 2 | None | 1 | 2277.9 | 0.92 |
| 02 | 500 | 2 | None | 1 | 2297.4 | 0.92 |
| 03 | 2000 | 2 | None | 1 | 2392.1 | 0.91 |
| 04 | 50 | 2 | None | 1 | 2301.5 | 0.92 |
| 05 | 100 | 4 | None | 1 | 2729.7 | 0.89 |
| 06 | 100 | 3 | None | 1 | 2492.3 | 0.91 |
| 07 | 100 | 2 | 0.8 | 1 | 2482.1 | 0.91 |
| 08 | 100 | 2 | 0.9 | 1 | 2209.75 | 0.93 |
| 09 | 100 | 2 | 0.95 | 1 | 2187.7 | 0.93 |
| 10 | 100 | 2 | 0.5 | 1 | 2924.8 | 0.87 |
| 11 | 100 | 2 | 0.95 | 0.8 | 1782.0 | 0.95 |
| 12 | 100 | 2 | 0.95 | 0.9 | 1806.2 | 0.95 |
| 13 | 100 | 2 | 0.95 | 0.95 | 1870.0 | 0.95 |
| 14 | 100 | 2 | 0.95 | 0.85 | 1855.9 | 0.95 |
| 15 | 100 | 2 | 0.95 | 0.5 | 1897.0 | 0.94 |
| 16 | 500 | 2 | 0.95 | 0.8 | 1722.6 | 0.95 |
| 17 | 1000 | 2 | 0.95 | 0.8 | 1772.9 | 0.95 |
| 18 | 5000 | 2 | 0.95 | 0.8 | 1786.5 | 0.95 |

The best result was achieved in step 16, significantly reducing the initial RMSE value.
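A configuration matching step 16 of Table 15, including the R2 evaluation, can be sketched as follows. The synthetic regression data stands in for the encoded car dataset, so the resulting R2 will not match the table:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the real encoded dataset)
X, y = make_regression(n_samples=193, n_features=15, noise=5.0, random_state=0)
X_train, X_eval, y_train, y_eval = train_test_split(X, y, random_state=42)

# Hyperparameters from step 16 of Table 15
model = RandomForestRegressor(
    n_estimators=500, min_samples_split=2,
    max_samples=0.95, max_features=0.8, random_state=0,
)
model.fit(X_train, y_train)

# R2 (coefficient of determination): closer to 1 is better
r2 = r2_score(y_eval, model.predict(X_eval))
print(round(r2, 4))
```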

Glossary

R2 - Coefficient of Determination

RMSE - Root Mean Square Error
