Comparing the Decision Tree Algorithm and the Backpropagation Algorithm


The main aim of this project is to compare the Decision Tree algorithm and the Backpropagation algorithm. The study resulted in an understanding of which datasets performed well with each of the two algorithms. The classification accuracy and the performance of the algorithms were analysed using the following measures:

Classification Accuracy:

The percentage of instances in a dataset that are classified correctly by a given classification algorithm.

Kappa Statistic:

The Kappa statistic is the value used to assess the quality of classification of categorical data: it measures the agreement between predicted and observed classes beyond what would be expected by chance.
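In standard form, with $p_o$ the observed agreement between predicted and actual classes and $p_e$ the agreement expected by chance,

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

so a value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.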

Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE):

These two values summarise the average prediction error and so give an indirect measure of accuracy: if both error values are high, the accuracy is low, and if they are low, the accuracy is high.
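Writing $p_i$ for the predicted value and $a_i$ for the actual value of instance $i$ over $n$ test instances, the usual definitions are

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert p_i - a_i \rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - a_i)^2}.$$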

ALGORITHMS AND WEKA TOOL:

I used a tool called Weka [1], which includes implementations of a wide range of learning algorithms and makes it easy to test an algorithm on a given dataset. The version used for this comparative study was Weka 3.

For the Decision Tree algorithm in Weka, I used J48. J48 can be run with or without pruning; I chose the pruned variant for all Decision Tree experiments. To assess the performance of the Backpropagation algorithm in Weka, I selected Multilayer Perceptron. Both J48 and Multilayer Perceptron are included in the Weka 3.6 package, which is free to download from the project website [1].

TESTING PROCEDURE:

Each dataset was tested with both algorithms, Decision Tree and Backpropagation. Before starting the experiments, I fixed some parameters as constants throughout the test process for all datasets. The main parameters for the Decision Tree algorithm, kept the same for all datasets, are:

Confidence Factor : 0.25

minNumObj : 2

numFolds : 3

All three parameters are Weka's default values and were not changed. The confidence factor controls the amount of pruning and so helps to reduce overfitting; minNumObj is the minimum number of instances allowed at a leaf, and numFolds is the number of folds used for reduced-error pruning. As stated earlier, the pruned variant of J48 was used for evaluating the decision tree; a configuration sketch follows.
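For reference, a minimal sketch of this configuration using the Weka Java API (the class and method names are my own; the same settings can be made in the Explorer GUI):

import weka.classifiers.trees.J48;

public class TreeSetup {
    // Pruned J48 with the parameter values listed above.
    public static J48 prunedJ48() {
        J48 tree = new J48();
        tree.setUnpruned(false);          // keep pruning enabled
        tree.setConfidenceFactor(0.25f);  // confidence factor used when pruning
        tree.setMinNumObj(2);             // minimum number of instances per leaf
        tree.setNumFolds(3);              // folds reserved for reduced-error pruning
        return tree;
    }
}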

For the Backpropagation algorithm I likewise fixed some parameters at their default values before experimenting. They are given below, followed by a corresponding sketch:

Learning Rate : 0.3

Momentum : 0.2

Training Time : 500

Validation Set Size : 0

Validation Threshold : 20
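A minimal sketch of the same settings through the Weka Java API (again, the class and method names are my own):

import weka.classifiers.functions.MultilayerPerceptron;

public class MlpSetup {
    // Multilayer Perceptron (Backpropagation) with the parameter values listed above.
    public static MultilayerPerceptron backpropMlp() {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setLearningRate(0.3);        // step size for the weight updates
        mlp.setMomentum(0.2);            // momentum applied to the weight updates
        mlp.setTrainingTime(500);        // number of training epochs
        mlp.setValidationSetSize(0);     // 0 = no validation set, train for all epochs
        mlp.setValidationThreshold(20);  // consecutive error rises tolerated when validating
        return mlp;
    }
}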

For both algorithms, evaluation was done using k-fold cross-validation, in which the data is divided into k subsets (k = 10 in our case); each subset is held out once for testing while the remaining k-1 are used for training. For some datasets, I also considered the F-Measure and ROC Area values for comparison. A sketch of the evaluation code is given below.
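The procedure can be reproduced with Weka's Evaluation class. The sketch below assumes the class attribute is the last column of the ARFF file and uses an arbitrary random seed (this essay does not record which seed the Explorer used):

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    // Runs 10-fold cross-validation and prints the measures reported in this study.
    public static void report(Classifier clf, String arffPath) throws Exception {
        Instances data = DataSource.read(arffPath);
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(clf, data, 10, new Random(1)); // k = 10
        System.out.printf("Correctly classified instances: %.4f%%%n", eval.pctCorrect());
        System.out.printf("Kappa statistic:                %.4f%n", eval.kappa());
        System.out.printf("Mean absolute error:            %.4f%n", eval.meanAbsoluteError());
        System.out.printf("Root mean squared error:        %.4f%n", eval.rootMeanSquaredError());
        System.out.println(eval.toMatrixString("Confusion matrix:"));
    }
}

For example, CrossValidate.report(TreeSetup.prunedJ48(), "ecoli.arff") would reproduce the J48 row for the Ecoli dataset, assuming the file name matches.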

DATASETS:

The datasets used for this study are real-valued categorical datasets. As this is a comparison of two different algorithms, I aimed to choose more than five datasets and finally settled on eight. All of them come from the UCI machine learning repository [2] and none has missing values. After choosing the datasets, I converted all of them to ARFF format [3] rather than using the direct link in Weka. The datasets are multivariate and their basic task is classification.
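As an illustration of the format, the Iris data in ARFF looks roughly like this (the exact attribute names depend on how the conversion was done):

@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
...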

The information about the datasets is tabulated below:

Dataset Name               Instances   Attributes   Classes   Missing Values
Ecoli                      336         8            8         No
Glass Identification       214         10           7         No
Ionosphere                 351         34           2         No
Iris Plant                 150         4            3         No
Magic Gamma Telescope      19020       11           2         No
Image Segmentation         2310        19           7         No
Sonar - Mines vs. Rocks    208         60           2         No
Blood Transfusion          748         5            2         No

TEST RESULTS:

The results of the two algorithms on the eight datasets are tabulated below.

Dataset                    Correctly Classified   Kappa Statistic   MAE      RMSE
Ecoli                      86.0119%               0.8066            0.0484   0.1704
Glass Identification       96.2617%               0.9492            0.0196   0.0946
Ionosphere                 91.1681%               0.7993            0.0938   0.2786
Iris Plant                 97.3333%               0.96              0.0327   0.1291
Magic Gamma Telescope      85.8728%               0.6776            0.1934   0.327
Image Segmentation         96.0606%               0.954             0.0159   0.097
Sonar - Mines vs. Rocks    82.2115%               0.6419            0.1901   0.3964
Blood Transfusion          78.2086%               0.2844            0.2958   0.3931

The table above shows the results of the eight datasets with the Multilayer Perceptron algorithm. The results of the same datasets with pruned J48 are given below:

Dataset                    Correctly Classified   Kappa Statistic   MAE      RMSE
Ecoli                      84.2262%               0.7824            0.0486   0.1851
Glass Identification       96.729%                0.9557            0.0093   0.0967
Ionosphere                 91.453%                0.8096            0.0938   0.2901
Iris Plant                 96%                    0.94              0.035    0.1586
Magic Gamma Telescope      85.0578%               0.6614            0.1955   0.3509
Image Segmentation         96.9264%               0.9641            0.0104   0.0914
Sonar - Mines vs. Rocks    71.1538%               0.422             0.2863   0.5207
Blood Transfusion          77.8075%               0.3424            0.3037   0.3987

The Kappa statistic [4] is a measure of agreement between the predicted and the observed categorisation of a dataset; if the predicted and observed values agree perfectly, the Kappa statistic equals 1. The RMS error and mean absolute error give the average error value: if both error values are high, accuracy is low, and vice versa. Correctly Classified Instances is the percentage of instances that were classified correctly.

EVALUATING THE RESULTS:

From the results we can see that both algorithms classified the datasets, with results varying according to the number of instances and attributes. The accuracy and performance of the two algorithms are not the same on all datasets, and the variation is considerable. I have therefore divided this section into eight parts, discussing the accuracy and performance on each dataset in turn.

ECOLI:

The Ecoli dataset has 336 instances with 8 attributes. MLP classified the instances more accurately, at 86.01%, than J48, which classified 84.23% of the instances correctly.

Name                             MLP         J48
Correctly Classified Instances   86.0119%    84.2262%
Kappa Statistic                  0.8066      0.7824
Mean Absolute Error              0.0484      0.0486
Root Mean Squared Error          0.1704      0.1851

It was found that, out of the eight classes, the True Positive and False Positive rates for the last two classes were 0 in both algorithms' outputs. The weighted average ROC area was nonetheless higher for MLP than for J48, and MLP's Kappa statistic was also better. The only drawback of the Multilayer Perceptron on this dataset is that training takes longer than with J48; otherwise it is the better choice in terms of accuracy and performance.

GLASS IDENTIFICATION:

Both algorithms classified the instances well, with J48 performing slightly better than the Multilayer Perceptron. The output is tabulated below for ease of reference. The root mean squared error was almost the same for both algorithms, but the mean absolute error of MLP was roughly double that of J48.

This dataset achieved a high Kappa statistic with both algorithms, and it is noted that the average ROC area of the Backpropagation algorithm is 0.991. ROC area is another parameter for checking classification quality: the closer it is to 1.0, the better.

Name                             MLP         J48
Correctly Classified Instances   96.2617%    96.729%
Kappa Statistic                  0.9492      0.9557
Mean Absolute Error              0.0196      0.0093
Root Mean Squared Error          0.0946      0.0967

The table above gives the Glass Identification results for the two algorithms. Overall, from these figures, I judge that the Decision Tree performed better, though only by a small margin over the Backpropagation algorithm.

IONOSPHERE:

The mean absolute error obtained from MLP and pruned J48 is identical, and the classification accuracy of J48 is slightly higher than that of MLP. On analysing the RMS error, however, I found it to be higher for pruned J48.

Name                             MLP         J48
Correctly Classified Instances   91.1681%    91.453%
Kappa Statistic                  0.7993      0.8096
Mean Absolute Error              0.0938      0.0938
Root Mean Squared Error          0.2738      0.2901

This dataset has only two classes, and from the confusion matrices we can easily see that the two algorithms differ by a single instance: MLP misclassified one more instance (31) than pruned J48 (30). The confusion matrices for the Ionosphere dataset are given below.

Multilayer Perceptron:

   a    b   <-- classified as
  98   28 |  a = b
   3  222 |  b = g

Pruned J48:

   a    b   <-- classified as
 104   22 |  a = b
   8  217 |  b = g

For the Ionosphere dataset I consider the Decision Tree algorithm better, because MLP was at a slight disadvantage in performance and accuracy on this dataset.

IRIS:

Name                             MLP         J48
Correctly Classified Instances   97.3333%    96%
Kappa Statistic                  0.96        0.94
Mean Absolute Error              0.0327      0.035
Root Mean Squared Error          0.1291      0.1586

The Iris dataset consists of three classes and four attributes. It achieved a high classification rate with both algorithms. Comparing the two, the classification rate of the Multilayer Perceptron was better than that of the Decision Tree algorithm. The mean absolute error and RMS error were very low, and the Kappa statistic was high in both outputs. Although the two algorithms were very similar in performance, I judge that MLP was the better one on the Iris dataset.

MAGIC GAMMA TELESCOPE:

Of the eight datasets chosen, this one has the highest number of instances, with eleven attributes and two classes. With this many instances, the measured classification rate and performance figures can be taken as quite reliable.

Name                             MLP         J48
Correctly Classified Instances   85.8728%    85.0578%
Kappa Statistic                  0.6776      0.6614
Mean Absolute Error              0.1934      0.1955
Root Mean Squared Error          0.327       0.3509

MLP outperformed the Decision Tree algorithm on every parameter: the classification percentage and the Kappa statistic are higher for MLP than for pruned J48. The mean absolute error and RMS error were relatively high in both outputs, but on inspection MLP's were better than J48's.

IMAGE SEGMENTATION:

The instances were classified at a high percentage by both algorithms, and the Kappa statistic is also close to 1.0. On these figures, J48 appears more suitable than MLP. However, when I checked the weighted averages of the ROC Area [5], which summarises the curve of True Positive rate against False Positive rate, the value obtained from MLP was 0.995 against 0.988 for J48.

Name                             MLP         J48
Correctly Classified Instances   96.0606%    96.9264%
Kappa Statistic                  0.954       0.9641
Mean Absolute Error              0.0159      0.0104
Root Mean Squared Error          0.097       0.0914

A ROC Area of 1.0 is considered a perfect test, so in terms of ROC, MLP was better. Overall, however, J48 is judged to have performed better on this dataset.
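For completeness: in the Weka API, the per-class ROC area can be read from the same Evaluation object used in the cross-validation sketch earlier (eval and data as defined there); Weka's summary output reports a weighted average of these per-class values.

// Per-class ROC area; indices follow the order of the class attribute's values.
for (int c = 0; c < data.numClasses(); c++) {
    System.out.printf("ROC area for class %s: %.3f%n",
            data.classAttribute().value(c), eval.areaUnderROC(c));
}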

SONAR - MINES VS ROCKS:

The Sonar dataset has 208 instances, 60 attributes and 2 classes. MLP performed well on this dataset, although the classification rate and the Kappa statistic were only moderate. The Kappa statistic of J48 was below 0.5, whereas MLP did better, exceeding 0.5 (although the value is still fairly low).

Name                             MLP         J48
Correctly Classified Instances   82.2115%    71.1538%
Kappa Statistic                  0.6419      0.422
Mean Absolute Error              0.1901      0.2863
Root Mean Squared Error          0.3964      0.5207

This dataset produced high error values from both algorithms, but comparing the two on this dataset, I would conclude that MLP is the better choice.

BLOOD TRANSFUSION:

The Blood Transfusion dataset, which has 748 instances, 5 attributes and 2 classes, achieved a classification accuracy of less than 80% with both algorithms.

Name                             MLP         J48
Correctly Classified Instances   78.2086%    77.8075%
Kappa Statistic                  0.2844      0.3424
Mean Absolute Error              0.2958      0.3037
Root Mean Squared Error          0.3931      0.3987

Of the two, MLP was better at 78.2086%, with J48 about 0.4 percentage points behind. The Kappa statistic was very low and the errors high in both outputs; even when the Kappa statistic is not especially high it is typically around 0.7 or 0.8, but here it was very low for both algorithms. Although the performance of MLP and J48 was much the same, on an overall comparison MLP was better than pruned J48.

CONCLUSION:

I chose the datasets so that they had different ranges of instances, classes and attributes, which yielded accuracy and performance results for datasets with varied properties. Analysing the tabulated results, five of the eight datasets were classified better using MLP, and three were classified better using pruned J48.

We cannot conclude that MLP is the best classifier and that J48 is a poor one, because MLP has disadvantages of its own. While running the experiments I recorded the training time for each process, and on reviewing that data it was clear that training takes longer with MLP than with pruned J48: on most of the training data J48 finishes quickly, whereas MLP takes around 5 to 8 times as long.

In conclusion, the Multilayer Perceptron performed well on our datasets, and pruned J48 also performed well, with only a minor difference in accuracy and performance from the Multilayer Perceptron.