The main aim of this project is to compare the Decision Tree and Backpropagation algorithms. The study resulted in an understanding of which datasets performed well under each of the two algorithms. The classification accuracy and performance of each algorithm were analysed using the following parameters:
Classification Accuracy:
The percentage of instances in a dataset that are classified correctly by a given classification algorithm.
Kappa Statistic:
The Kappa statistic is the value used to assess the agreement between predicted and observed classifications of categorical data, corrected for agreement expected by chance.
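In standard form, the Kappa statistic can be written as follows, where p_o is the observed agreement (the proportion of correctly classified instances) and p_e is the agreement expected by chance:

    \kappa = \frac{p_o - p_e}{1 - p_e}

A value of 1 indicates perfect agreement between predicted and observed classes, while a value of 0 indicates agreement no better than chance.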
Root Mean Squared Error and Mean Absolute Error:
These two values summarise the average prediction error and so give an indication of accuracy: the larger the error values, the lower the accuracy, and the smaller the error values, the higher the accuracy.
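In standard form, with p_i the predicted value, a_i the actual value and n the number of predictions, the two error measures are:

    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - a_i|, \qquad
    \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - a_i)^2}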
ALGORITHMS AND WEKA TOOL:
I have made use of the Weka tool [1], which provides implementations of a wide range of machine learning algorithms and makes it straightforward to test an algorithm on a given dataset. The version used for this comparative study was Weka 3.
For the Decision Tree algorithm in Weka, I used J48, which can be run with or without pruning; I chose to analyse the pruned version of J48. To assess the performance of the Backpropagation algorithm in Weka, I selected the Multilayer Perceptron. Both J48 and the Multilayer Perceptron are included in the Weka 3.6 package, which is free to download from the Weka website [1].
TESTING PROCEDURE:
The tests on the datasets were performed with the two algorithms, Decision Tree and Backpropagation. Before starting the experiments, I fixed some parameters as constants throughout the test process for all the data. The main parameters for the Decision Tree algorithm, kept the same for all datasets, are:
Confidence Factor : 0.25
minNumObj : 2
numFolds : 3
All three parameters are left at their default values and are not changed. The confidence factor controls the amount of pruning and helps reduce overfitting; the other two are the minimum number of instances per leaf and the number of folds used for reduced-error pruning. I used the pruned version of J48 for evaluating the decision tree, as sketched below.
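As an illustration only, a pruned J48 classifier with these settings can be configured through the Weka Java API roughly as below; the class name and the dataset file name are placeholders and not part of the study itself.

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class PrunedJ48Setup {
        public static void main(String[] args) throws Exception {
            // Load an ARFF dataset; the file name is only a placeholder.
            Instances data = DataSource.read("ecoli.arff");
            data.setClassIndex(data.numAttributes() - 1); // class label is the last attribute

            // Pruned J48 with the parameter values listed above.
            J48 tree = new J48();
            tree.setUnpruned(false);          // keep pruning enabled
            tree.setConfidenceFactor(0.25f);  // confidence factor used for pruning
            tree.setMinNumObj(2);             // minimum number of instances per leaf
            tree.setNumFolds(3);              // folds for reduced-error pruning (when that option is enabled)

            tree.buildClassifier(data);
            System.out.println(tree);         // prints the learned decision tree
        }
    }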
For the Backpropagation algorithm I likewise fixed some parameters at their default values before experimenting. They are given below:
Learning Rate : 0.3
Momentum : 0.2
Training Time : 500
Validation Set Size : 0
Validation Threshold : 20
For both algorithms, evaluation was carried out using k-fold cross-validation, which divides the data into k subsets (k = 10 in this study) and, in turn, trains on k-1 of them and tests on the remaining one. For some datasets, I also considered the F-Measure and ROC Area values for comparison. A sketch of this procedure is given below.
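As a sketch of the testing procedure, the Multilayer Perceptron configuration and the 10-fold cross-validation applied to both classifiers can be expressed with the Weka Java API roughly as follows; the class name and dataset file name are placeholders.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MlpCrossValidation {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");   // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);

            // Multilayer Perceptron (Backpropagation) with the parameters listed above.
            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setLearningRate(0.3);
            mlp.setMomentum(0.2);
            mlp.setTrainingTime(500);       // training epochs
            mlp.setValidationSetSize(0);    // no separate validation set
            mlp.setValidationThreshold(20);

            // 10-fold cross-validation, as used for both algorithms in this study.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(mlp, data, 10, new Random(1));

            System.out.println("Correctly classified: " + eval.pctCorrect() + " %");
            System.out.println("Kappa statistic:      " + eval.kappa());
            System.out.println("Mean absolute error:  " + eval.meanAbsoluteError());
            System.out.println("Root mean sq. error:  " + eval.rootMeanSquaredError());
            System.out.println("Weighted F-Measure:   " + eval.weightedFMeasure());
            System.out.println("Weighted ROC area:    " + eval.weightedAreaUnderROC());
        }
    }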
DATASETS:
The datasets used for this study are real-valued datasets with categorical class labels. As this is a comparison of two different algorithms, I aimed to choose more than five datasets and finally settled on eight. All of the datasets were taken from the UCI machine learning repository [2] and contain no missing values. After the datasets were chosen, I converted all of them to ARFF format [3] rather than loading them directly into Weka. The datasets are multivariate in character, and the basic task for all of them is classification.
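For reference, an ARFF file consists of a header that declares the relation and its attributes, followed by the data rows. The beginning of the Iris dataset in ARFF form looks roughly like this:

    @RELATION iris

    @ATTRIBUTE sepallength  NUMERIC
    @ATTRIBUTE sepalwidth   NUMERIC
    @ATTRIBUTE petallength  NUMERIC
    @ATTRIBUTE petalwidth   NUMERIC
    @ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa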
The information about the datasets is tabulated below:
Dataset Name               Instances   Attributes   Classes   Missing Values
Ecoli                      336         8            8         No
Glass Identification       214         10           7         No
Ionosphere                 351         34           2         No
Iris Plant                 150         4            3         No
Magic Gamma Telescope      19020       11           2         No
Image Segmentation         2310        19           7         No
Sonar - Mines vs. Rocks    208         60           2         No
Blood Transfusion          748         5            2         No
TEST RESULTS:
The results for each dataset under both algorithms are tabulated below.
Dataset                    Correctly Classified Instances   Kappa Statistic   Mean Absolute Error   Root Mean Squared Error
Ecoli                      86.0119%                         0.8066            0.0484                0.1704
Glass Identification       96.2617%                         0.9492            0.0196                0.0946
Ionosphere                 91.1681%                         0.7993            0.0938                0.2786
Iris Plant                 97.3333%                         0.96              0.0327                0.1291
Magic Gamma Telescope      85.8728%                         0.6776            0.1934                0.327
Image Segmentation         96.0606%                         0.954             0.0159                0.097
Sonar - Mines vs. Rocks    82.2115%                         0.6419            0.1901                0.3964
Blood Transfusion          78.2086%                         0.2844            0.2958                0.3931
The table above shows the results for the eight datasets with the Multilayer Perceptron algorithm. The results for the same datasets with pruned J48 are given below:
Dataset                    Correctly Classified Instances   Kappa Statistic   Mean Absolute Error   Root Mean Squared Error
Ecoli                      84.2262%                         0.7824            0.0486                0.1851
Glass Identification       96.729%                          0.9557            0.0093                0.0967
Ionosphere                 91.453%                          0.8096            0.0938                0.2901
Iris Plant                 96%                              0.94              0.035                 0.1586
Magic Gamma Telescope      85.0578%                         0.6614            0.1955                0.3509
Image Segmentation         96.9264%                         0.9641            0.0104                0.0914
Sonar - Mines vs. Rocks    71.1538%                         0.422             0.2863                0.5207
Blood Transfusion          77.8075%                         0.3424            0.3037                0.3987
The Kappa statistic [4] is a measure of agreement between the predicted and the observed categorisation of a dataset; if the predicted and observed values agree perfectly, the Kappa statistic equals 1. The RMS error and the mean absolute error give the average error value: if both error values are high, the accuracy is low, and vice versa. Correctly Classified Instances is the percentage of instances that were classified correctly.
EVALUATING THE RESULTS:
From the results, we can see that both algorithms classified all of the datasets, but their accuracy and performance vary with the characteristics of each dataset, such as the number of instances and attributes. The two algorithms do not perform identically on any dataset, and the variation can be considerable. I have therefore divided this section into eight parts, one discussing the accuracy and performance for each dataset.
ECOLI:
The Ecoli dataset has 336 instances with 8 attributes. The classification accuracy was better with MLP, at 86.01%, than with J48, which classified 84.23% of the instances correctly.
Metric                           MLP        J48
Correctly Classified Instances   86.0119%   84.2262%
Kappa Statistic                  0.8066     0.7824
Mean Absolute Error              0.0484     0.0486
Root Mean Squared Error          0.1704     0.1851
It was found that, out of the eight classes, the True Positive and False Positive rates for the last two classes were 0 in both algorithms' outputs. Even so, the weighted average of the ROC area is higher for MLP than for J48, and the Kappa statistic is also better for MLP. The only drawback of the Multilayer Perceptron on this dataset is that training takes longer than with J48; otherwise it is good in terms of both accuracy and performance.
GLASS IDENTIFICATION:
Both algorithms classified the instances well, with J48 performing slightly better than the Multilayer Perceptron. The output is tabulated below for ease of reference. The root mean squared error was almost the same for both algorithms, but the mean absolute error from MLP was roughly double that of J48.
This dataset achieved a high Kappa statistic with both algorithms, and the average ROC area of the Backpropagation algorithm is 0.991. The ROC area is another parameter for checking classification quality: the closer it is to 1.0, the better.
Metric                           MLP        J48
Correctly Classified Instances   96.2617%   96.729%
Kappa Statistic                  0.9492     0.9557
Mean Absolute Error              0.0196     0.0093
Root Mean Squared Error          0.0946     0.0967
The table above shows the results for the Glass Identification dataset with the two algorithms. Overall, from these data, I judge that the Decision Tree performed slightly better, with only a small difference from the Backpropagation algorithm.
IONOSPHERE:
The mean absolute error obtained from MLP and pruned J48 is the same, and the classification accuracy of J48 is slightly higher than that of MLP. However, on analysing the RMS error, I found that it is higher for pruned J48.
Metric                           MLP        J48
Correctly Classified Instances   91.1681%   91.453%
Kappa Statistic                  0.7993     0.8096
Mean Absolute Error              0.0938     0.0938
Root Mean Squared Error          0.2738     0.2901
This dataset has only two classes, and from the confusion matrices we can easily see that the difference between the two algorithms is a single instance: MLP misclassified one more instance than pruned J48. The confusion matrices for the Ionosphere dataset are given below.
Multilayer Perceptron:
a b <-- classified as
98 28 | a = b
3 222 | b = g
Pruned J48:
a b <-- classified as
104 22 | a = b
8 217 | b = g
For the Ionosphere dataset, I consider the Decision Tree algorithm the better choice, because MLP had some disadvantages in both performance and accuracy on this dataset.
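For reference, a confusion matrix of this form can be printed directly from the Weka Evaluation object produced by the cross-validation sketch shown earlier; the variable name eval below refers to that object.

    // 'eval' is the weka.classifiers.Evaluation object returned by the
    // cross-validation shown in the earlier sketch.
    System.out.println(eval.toMatrixString("=== Confusion Matrix ==="));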
IRIS:
Metric                           MLP        J48
Correctly Classified Instances   97.3333%   96%
Kappa Statistic                  0.96       0.94
Mean Absolute Error              0.0327     0.035
Root Mean Squared Error          0.1291     0.1586
The Iris dataset consists of three classes and four attributes. It achieved a high classification rate with both algorithms. Comparing the two, the classification rate of the Multilayer Perceptron was better than that of the Decision Tree algorithm. The mean absolute error and RMS error were very low, and the Kappa statistic was high in both outputs. Although the two were very similar in performance, I conclude that MLP was better than pruned J48 on the Iris dataset.
MAGIC GAMMA TELESCOPE:
Of the eight datasets chosen, this one has the highest number of instances, with eleven attributes and two classes. Because of its size, the output for this dataset gives a good indication of both the classification rate and the performance.
Metric                           MLP        J48
Correctly Classified Instances   85.8728%   85.0578%
Kappa Statistic                  0.6776     0.6614
Mean Absolute Error              0.1934     0.1955
Root Mean Squared Error          0.327      0.3509
MLP outperformed the Decision Tree algorithm on all of the parameters: the classification percentage and Kappa statistic are higher for MLP than for pruned J48. The mean absolute error and RMS error were relatively high in both outputs, but on inspection MLP was still better than J48.
IMAGE SEGMENTATION:
The instances were classified with a high percentage of accuracy by both algorithms, and the Kappa statistic is also near 1.0. On examining which was better, it was found that J48 was more suitable than MLP. However, when I checked the weighted average of the ROC Area [5], which is derived from the curve of True Positive rate against False Positive rate, the value obtained from MLP was 0.995 while that of J48 was 0.988.
Metric                           MLP        J48
Correctly Classified Instances   96.0606%   96.9264%
Kappa Statistic                  0.954      0.9641
Mean Absolute Error              0.0159     0.0104
Root Mean Squared Error          0.097      0.0914
A ROC Area of 1.0 is usually considered a perfect test, so in terms of ROC, MLP was better. Overall, however, it appears that J48 performed better on this dataset.
SONAR - MINES VS ROCKS:
The Sonar dataset has 208 instances, 60 attributes and 2 classes. MLP performed well on this dataset, although the classification rate and Kappa statistic were only moderate. The Kappa statistic of J48 was below 0.5, whereas that of MLP was better, at above 0.5 (though still fairly low).
Metric                           MLP        J48
Correctly Classified Instances   82.2115%   71.1538%
Kappa Statistic                  0.6419     0.422
Mean Absolute Error              0.1901     0.2863
Root Mean Squared Error          0.3964     0.5207
This dataset produced high error values from both algorithms, but on comparing the two, I would conclude that MLP is the better choice for this dataset.
BLOOD TRANSFUSION:
The Blood Transfusion dataset, which has 748 instances, 5 attributes and 2 classes, achieved a classification accuracy of less than 80% with both algorithms.
Metric                           MLP        J48
Correctly Classified Instances   78.2086%   77.8075%
Kappa Statistic                  0.2844     0.3424
Mean Absolute Error              0.2958     0.3037
Root Mean Squared Error          0.3931     0.3987
Of the two, MLP was better at 78.2086%, with J48 about 0.4 percentage points lower. The Kappa statistic was also very low, and the errors were high in both outputs. Even a Kappa statistic that is not especially high would typically be around 0.7 or 0.8, but in our case it is very low for both outputs. Although the performance of MLP and J48 was much the same, on overall comparison MLP was better than pruned J48.
CONCLUSION:
I chose the datasets so that they covered different ranges of instances, classes and attributes, which made it possible to observe the accuracy and performance obtained on datasets with various properties. On analysing the tabulated results, it can be seen that five of the eight datasets were classified better using MLP, and three were classified better using pruned J48.
We cannot conclude that MLP is the best classifier and J48 a poor one, because MLP also has disadvantages of its own. While running the experiments on each dataset, I noted the training time taken for each run. Going through those figures, I found that the training time is higher for MLP than for pruned J48: on most of the training data, J48 takes very little time, whereas MLP takes around 5 to 8 times as long as pruned J48.
In conclusion, the Multilayer Perceptron performed well on our datasets, and pruned J48 also performed well, with only a minor difference in accuracy and performance compared with the Multilayer Perceptron.