Genetic Algorithm for Feature Selection of MR Brain Images


The selection of features has a considerable impact on the success or failure of a classification process. Feature selection refers to the procedure of selecting a subset of informative attributes with which to build models describing the data. Its main purpose is to reduce the number of features used in classification while maintaining high classification accuracy. A large number of algorithms have been proposed for feature subset selection. Here we compare classical sequential methods with the genetic approach in terms of number of features, classification accuracy and reduction rate. The genetic algorithm achieves an acceptable classification accuracy with only five of the available 44 features. The optimal features, namely the mean of contrast, mean of homogeneity, mean of sum average, mean of sum variance and range of autocorrelation, provide the best classification performance. Similar classification performance is obtained with SFFS and SFBS but with a larger feature set.

KEY WORDS

Feature selection; classification accuracy; reduction rate; genetic algorithm.

1. Introduction

Texture is a fundamental property of many natural images. It plays an important role in pattern recognition and computer vision, and it helps our visual perception system to understand a scene and to extract more information from it. Texture analysis has been investigated by researchers over the last few decades [1, 2, 3] and has proved important in many fields such as industrial inspection, document segmentation, remote sensing of earth resources and medical imaging.

Texture classification and texture segmentation are the two major texture analysis problems. Segmenting a textured image rarely yields satisfactory results. This difficulty is explained by the fact that the distance from which we observe a texture, and the visual attention we give to it, lead to different interpretations of the same texture. Many researchers have tried to solve the problem of texture segmentation by using different feature selection techniques, such as directional grey-level energy, and different segmentation methods, such as region growing.

In texture classification, image textures are categorized into different classes: an observed image is assigned to one of a given set of texture classes.

Researchers have employed various approaches to texture classification, such as the co-occurrence matrix [4], analysis of the image spectrum using the Wigner distribution and Gabor filters [5], fractals [6], rotation-invariant operators [6], the extraction of the structural and stochastic components of an image [7], model-based methods, and so on.

Due to the large number of candidate features, feature selection is key to the classification problem. This procedure reduces the number of features that need to be collected and can provide better classification accuracy. Feature selection refers to algorithms that output a subset of the input feature set, since it is infeasible to measure all possible features and it is unknown in advance which features will best discriminate between classes. The most discriminatory features are kept, retaining sufficient information and maintaining an acceptable classification accuracy. By reducing the number of features, feature selection improves classification speed and efficiency. The genetic algorithm appears as an attractive approach for finding optimal solutions to this optimization problem and offers an alternative to heuristic tree search methods.

Feature selection methods use various selection criteria. Among them are statistical tests such as the T-statistic and F-statistic, as well as information-theoretic criteria such as mutual information and entropy-based measures. Since these criteria are diverse, they produce substantially different outcomes when applied to the same data set.

The goal of this paper is to highlight the considerable effect that feature selection has on the effectiveness of the resulting classification algorithm. Several algorithms are implemented, and a taxonomy of feature selection algorithms is presented to show where the genetic algorithm fits.

The paper is organized as follows. Section 2 introduces our method, based on feature extraction, feature selection and classification. Different feature selection algorithms are presented in Section 3. Section 4 focuses on the genetic algorithm. Section 5 compares our method, which uses the genetic algorithm, with other feature selection algorithms through experimental results. Conclusions are given in the final section.

2. Our Approach

We have developed an automated brain tumor classification method made up of three phases: feature extraction, dimensionality reduction, and finally SVM training and testing. The whole process is shown in the diagram of Figure 1, and a minimal sketch of the pipeline, using placeholder data, follows the figure. Our approach aims at classifying MR brain images into normal tissue, benign tumor and malignant tumor. To achieve this goal, texture features are first extracted using the Spatial Gray Level Dependence Method (SGLDM) [4]. Dimensionality reduction is then applied to the extracted features; this step is ensured by a GA [8]. The output of this phase is a selected set of features used as input to the classification step, which is carried out through training and testing. These tools enable us to distinguish between normal brain tissue, benign tumor and malignant tumor tissue. Among all classification methods we adopt the Support Vector Machine (SVM) as classifier. Our choice of SVM is explained by its robustness, its capacity to handle very large feature spaces and its good generalization properties compared to conventional classifiers.

[Figure 1. Overview of our approach: feature extraction (wavelet transform followed by SGLDM), dimensionality reduction via GA (feature selection), and the classification step (SVM model, training and testing phase, results).]
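As a minimal sketch of how the three phases fit together, the following Python fragment uses synthetic placeholder data in place of the real SGLDM features and a fixed mask in place of a GA-selected subset (Section 4 describes how such a mask is actually evolved); only the SVM stage uses a real library call.

```python
# Sketch of the three-phase pipeline on a precomputed feature matrix.
# Synthetic data stands in for the 44 SGLDM features, purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(83, 44))      # placeholder for 44 texture features per image
y = rng.integers(0, 3, size=83)    # placeholder labels: 0 normal, 1 benign, 2 malignant

# Phase 2: dimensionality reduction, here a fixed mask such as a GA would return.
mask = np.zeros(44, dtype=bool)
mask[[0, 5, 12, 20, 30]] = True    # five arbitrary illustrative features
X_sel = X[:, mask]

# Phase 3: SVM training and testing with an RBF kernel.
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.4, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```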

3. Feature Selection Algorithms

3.1 Definition

Feature selection refers to algorithms that output a subset of the input feature set. Let Y be the original set of features, with cardinality n. Let d represent the desired number of features in the selected subset X, X ⊆ Y, and let J(X) denote the feature selection criterion function for the set X. A higher value of J indicates a better feature subset. Since we are maximizing J(.), one possible criterion function is (1 - pe), where pe denotes the probability of error. Using the probability of error as a criterion function makes feature selection dependent on the specific classifier used and on the sizes of the training and test data sets.

Formally, the problem of feature selection is to find a subset X ⊆ Y such that |X| = d and

J(X) = max{ J(Z) : Z ⊆ Y, |Z| = d }.    (1)

Solving this problem exactly requires examining all possible d-subsets of the feature set Y. The exponential growth of the number of possibilities makes exhaustive search impractical even for moderate values of n.
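To make the cost of solving equation (1) by brute force concrete, here is a small Python sketch that enumerates every d-subset; the criterion J is left as a caller-supplied function (in our case it would be classification accuracy), and the subset count printed at the end corresponds to the sizes used later in the paper.

```python
# Exhaustive feature selection: evaluate J on every d-subset of {0, ..., n-1}
# (equation 1).  Feasible only for small n, since there are C(n, d) subsets.
from itertools import combinations
from math import comb

def exhaustive_select(n, d, J):
    """Return the d-subset maximizing the caller-supplied criterion J."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(range(n), d):
        score = J(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# With n = 44 and d = 5 (the sizes used in our experiments) there are already
# over a million candidate subsets to evaluate.
print(comb(44, 5))   # 1086008
```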

3.2 Overview of Feature Selection Algorithms

Feature selection algorithms can be divided into two main categories: statistical pattern recognition (SPR) techniques and artificial neural networks (ANN). The SPR category is further split into methods guaranteed to find the optimal solution and methods that may return a suboptimal feature set. The suboptimal methods comprise methods that store just one "current" feature subset and modify it, and methods that maintain a population of subsets. Algorithms may also be deterministic, producing the same subset on a given problem every time, or stochastic, having a random element that can produce different subsets on different runs. Some representative feature selection algorithms are listed beneath each leaf node of the tree (Figure 2).

[Figure 2. A taxonomy of feature selection algorithms. Feature selection divides into SPR and ANN (node pruning) approaches. SPR methods are either optimal (exhaustive search, branch-and-bound) or suboptimal. Suboptimal methods keep either a single solution, with deterministic (PTA(l, r), floating, Max-Min) and stochastic (SA) variants, or many solutions, again deterministic (beam search) or stochastic (GA).]

3.2.1 Suboptimal Methods

These methods are not guaranteed to produce the optimal result, since they do not examine all possible subsets. They include deterministic single-solution methods as well as deterministic and stochastic multiple-solution methods.

3.2.1.1 Deterministic, Single-Solution Methods

These are the most commonly used methods for performing selection. Often referred to as sequential methods, they start with a single solution and iteratively add or remove features until some termination criterion is met. They are split into those that start with the empty set and add features (forward methods) and those that start with the full set and delete features (backward methods). Kittler [9] compares these algorithms with the optimal branch-and-bound algorithm on a synthetic two-class Gaussian data set. Pudil et al. [10] extend Kittler's comparative study by introducing sequential floating forward selection (SFFS) and sequential floating backward selection (SFBS).
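As an illustration, a minimal sketch of plain sequential forward selection is given below; SFFS extends this greedy loop with conditional backward steps after each addition, and SFBS works analogously in the backward direction. The criterion J is again a caller-supplied function.

```python
# Sequential forward selection (SFS): greedily add the feature that most
# improves the criterion J until d features have been chosen.
def sequential_forward_selection(n, d, J):
    selected = []
    remaining = set(range(n))
    while len(selected) < d:
        best_feat, best_score = None, float("-inf")
        for f in remaining:
            score = J(selected + [f])
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```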

3.2.1.2 Deterministic, Multiple-Solution Methods

These are referred to as a "feature selection lattice", since they treat the space of subsets as a graph. Siedlecki and Sklansky [11] discussed the performance of beam search and of a best-first search in the space of feature subsets; both methods maintain a queue of possible solutions.

3.2.1.3 Stochastic, Multiple-Solution Methods

The genetic algorithm is the main representative of the stochastic, multiple-solution methods. The GA is an evolutionary method inspired by the natural process of evolution. It performs a randomised search guided by a fitness measure. A feature subset is identified by a binary string of length n, with a zero or one in position i denoting the absence or presence of feature i in the set. In each iteration of the algorithm (generation), a fixed number (population) of possible solutions (chromosomes) is produced by applying "genetic" operators in a stochastic process guided by the fitness measure. Each chromosome is evaluated to determine its fitness. New chromosomes are created from old chromosomes by selection, crossover and mutation, the most important genetic operators. Siedlecki and Sklansky [11] introduced the use of genetic algorithms for feature selection.
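A minimal sketch of this encoding and of one possible generational loop follows; the fitness function is supplied by the caller (e.g. cross-validated classification accuracy of the encoded subset). The truncation-based parent selection used here is only an illustrative default; the selection scheme and operator rates used in our experiments are given in Section 5.

```python
# GA feature selection: each chromosome is a binary string of length n,
# bit i = 1 meaning "feature i is included".
import random

def random_chromosome(n):
    return [random.randint(0, 1) for _ in range(n)]

def crossover(a, b):
    """One-point crossover of two parent chromosomes."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if random.random() < rate else bit for bit in chrom]

def run_ga(n, fitness, pop_size=30, generations=50,
           crossover_rate=0.9, mutation_rate=0.1):
    population = [random_chromosome(n) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]          # simple truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            if random.random() < crossover_rate:
                a, b = crossover(a, b)
            children += [mutate(a, mutation_rate), mutate(b, mutation_rate)]
        population = children[:pop_size]
    return max(population, key=fitness)           # best feature subset found
```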

3.2.2 Optimal Methods

The branch-and-bound (BB) feature selection algorithm is the main optimal method. Narendra and Fukunaga [12] introduced this algorithm to find the optimal subset of features much more quickly than exhaustive search. Yu and Yuan [13] modified Narendra and Fukunaga's branch-and-bound algorithm and introduced BAB+, showing both analytically and experimentally that BAB+ outperforms the original algorithm. Their modification essentially recognizes all "string-structure subtrees".
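A simplified sketch of the branch-and-bound idea is shown below. It assumes the criterion J is monotonic (J of a subset never exceeds J of its superset), which is what allows whole subtrees to be pruned; it illustrates the principle only and is not the BAB+ variant.

```python
# Simplified branch-and-bound feature selection: start from the full feature
# set and recursively remove features until d remain, pruning any branch whose
# criterion value already falls below the best complete subset found so far.
# Correct only when J is monotonic: J(subset) <= J(superset).
def branch_and_bound(n, d, J):
    best = {"subset": None, "score": float("-inf")}

    def recurse(current, start, to_remove):
        if to_remove == 0:
            score = J(current)
            if score > best["score"]:
                best["subset"], best["score"] = list(current), score
            return
        # Leave room for the remaining removals so that every d-subset is
        # generated exactly once and dead-end branches are avoided.
        for i in range(start, len(current) - to_remove + 1):
            candidate = current[:i] + current[i + 1:]
            if J(candidate) <= best["score"]:
                continue   # bound: by monotonicity no descendant can beat the best
            recurse(candidate, i, to_remove - 1)

    recurse(list(range(n)), 0, n - d)
    return best["subset"], best["score"]
```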

4. Genetic Algorithm

Genetic algorithms are stochastic global adaptive search techniques based on the mechanisms of natural selection, and form a subset of Darwinian evolution-based optimisation techniques. They were introduced to feature selection by Siedlecki and Sklansky [11] and later extended to allow linear feature extraction by Punch et al. [14] and, independently, by Kelly and Davis [15].

Given a set of d-dimensional input patterns and a target dimensionality m with m < d, the GA seeks a transformed set of patterns in the m-dimensional space that maximizes a set of optimization criteria.

The GA maintains a population of competing feature transformation matrices. To evaluate each matrix in this population, the input patterns are multiplied by the matrix, producing a set of transformed patterns which are then sent to a classifier. The available samples are divided into a training set, used to train the classifier, and a testing set, used to evaluate classification accuracy. The accuracy obtained is then used by the GA as a measure of the quality of the transformation matrix. Exploiting this information, the GA searches for a transformation that decreases the dimensionality of the transformed patterns while increasing classification accuracy. For feature selection, the GA codes the features into a string of n bits, where bit i equals 1 if feature i is in the subset and included in the classifier, and 0 otherwise. Each resulting subset of features is evaluated according to its classification accuracy on a set of test data [16]. A fitness value is allocated based on a performance measure in classification, e.g. the classification accuracy. The fitness function is given in equation (2).

fitness = W_a × accuracy + W_N × (1 / N)    (2)

where W_a is the weight given to the classification accuracy, W_N is the weight given to the number of features participating in the classification, and N ≤ n is the number of selected features.
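Assuming the weighted form of equation (2) reconstructed above (the exact size term in the original formulation may differ), a corresponding fitness function could be sketched as follows, with accuracy_of standing for any classifier-accuracy estimate of the candidate subset.

```python
# Fitness of a chromosome under a weighted accuracy/size trade-off, assuming
# the form of equation (2): higher accuracy and fewer features are both rewarded.
def fitness(chromosome, accuracy_of, w_acc=0.8, w_size=0.2):
    n_selected = sum(chromosome)
    if n_selected == 0:
        return 0.0                      # an empty subset cannot classify anything
    accuracy = accuracy_of(chromosome)  # e.g. cross-validated SVM accuracy
    return w_acc * accuracy + w_size * (1.0 / n_selected)
```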

Feature reduction aims at improving classification by searching for the best features subset, from the fixed set of the original features, according to a given processing goal and a feature evaluation criterion: classification accuracy. Accurate classification performance of a particular algorithm is of much more importance than its execution time.

The effectiveness of the GA in producing parsimonious feature sets has been evaluated and compared with several feature selection methods on various data sets. Ferri et al. [17] compared SFS, SFFS and the genetic algorithm on data sets with up to 360 dimensions. They concluded that SFFS gives good performance even on very high-dimensional problems, and showed that the performance of the GA, while comparable to SFFS on medium-sized problems (around 20-30 dimensions), degrades as the dimensionality increases.

Siedlecki and Sklansky compared the GA approach with SFFS and SFBS, and with the branch-and-bound algorithm, which is able to work with a nonmonotonic criterion. They showed that the GA outperformed these other feature selection methods in terms of both classification performance and computational effort on a synthetic 24-dimensional data set as well as on a real 30-dimensional data set. Using a population size of 100 for fifteen generations, with a probability of mutation pm = 0.02, Jain et al. [18] tried several different settings of the feasibility threshold (ranging from 0.1 to 0.4) and the tolerance margin (ranging from 0.05 to 1.0). The best result was obtained with threshold t = 0.1 and margin m = 0.05: an 8-element subset giving a recognition rate of 78.9%. They also observed that the GA displays a tendency towards premature convergence: most runs reached their peak by the seventh or eighth generation and failed to make further improvements after that.

5. Experimental results

In order to test the efficiency of our classification method, we apply it to a real database of 83 images from seven patients (four men and three women) collected from the Harvard Medical School website [19]. The database consists of 29 normal images, 22 malignant tumor images (low-grade Glioma, Meningioma) and 32 benign tumor images (Bronchogenic Carcinoma, Glioblastoma multiforme, Sarcoma and Grade IV tumors). These normal and pathological benchmark images used for classification are axial, T2-weighted, of size 256 × 256, and acquired at several positions in the transaxial plane. Patients' ages range from 22 to 81 years.

To discriminate between normal and abnormal brain we chose 12 normal brain images and 20 abnormal images for the training phase, and the remaining slices are used for testing. To distinguish between normal, benign and malignant tumors, we took 12 normal brain images, 9 benign and 16 malignant tumor images for the training phase, and the remaining images are used for testing. The SVM is then applied in the testing phase: each feature vector, corresponding to a test image, is individually input to the SVM classifier, which produces a continuous output.

In our approach we applied a Daubechies-2, level-1 wavelet co-occurrence analysis in the feature extraction phase [20], which enables us to extract a total of 44 features, including means and ranges. We then employ an SVM classifier with an RBF kernel [21]. Cross-validation, as a grid search with 5 folds, is used to fix its optimal parameters, searching an interval of values for those achieving high accuracy during the training and testing phases. The best parameter values retained for our implementation are 8 and 2. The efficiency of the classifier is tested on the 44 extracted features and is found to be 100% for normal versus pathological brain [22]. The classification of benign versus malignant brain tumors yields an accuracy rate of 98.14%.
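The following sketch illustrates this experimental setup with standard libraries (PyWavelets, scikit-image and scikit-learn). It computes only a few of the 44 wavelet co-occurrence features, since statistics such as sum average, sum variance and autocorrelation are not provided by graycoprops and would need custom code, and the C/gamma grid shown is an illustrative choice rather than the exact grid used in our experiments.

```python
# Level-1 Daubechies-2 wavelet decomposition, co-occurrence (GLCM) statistics
# on the approximation sub-band, and an RBF-SVM tuned by 5-fold grid search.
import numpy as np
import pywt
from skimage.feature import graycomatrix, graycoprops
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def glcm_features(image):
    cA, (cH, cV, cD) = pywt.dwt2(image, "db2")        # level-1 Daubechies-2
    # Quantize the approximation coefficients to 8-bit levels for the GLCM.
    q = np.uint8(255 * (cA - cA.min()) / (np.ptp(cA) + 1e-12))
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    feats = []
    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        vals = graycoprops(glcm, prop).ravel()        # one value per angle
        feats += [vals.mean(), np.ptp(vals)]          # mean and range over angles
    return np.array(feats)

def tune_svm(X, y):
    grid = {"C": [2 ** k for k in range(-2, 6)],
            "gamma": [2 ** k for k in range(-4, 4)]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
    return search.fit(X, y).best_estimator_
```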

Table 1. Results of feature selection by sequential search algorithms (forward and backward) and genetic algorithm

Feature selection | Number of features | Accuracy (normal vs. pathological brain) | Accuracy (benign vs. malignant tumors) | Percentage reduction
SFFS              | 7                  | 100%                                     | 98.14%                                 | 84.09%
SFBS              | 11                 | 100%                                     | 98.14%                                 | 75%
GA                | 7                  | 100%                                     | 98.14%                                 | 84.09%
GA                | 6                  | 100%                                     | 98.14%                                 | 86.36%
GA                | 5                  | 100%                                     | 98.14%                                 | 88.63%

Table 1 shows the performance of the various feature selection algorithms. In the GA-based feature selection method, a population of 30 chromosomes is randomly generated, and each chromosome contains 44 genes (one gene per feature). The genetic operators used are one-point crossover and mutation, with a crossover rate of 90% and a mutation rate of 10%. Tournament selection is used to build the mating pool. In the classification of normal versus pathological brain, the sequential floating forward selection (SFFS) method achieves a classification accuracy of 100% with 7 of the available 44 features. This accuracy matches that obtained by the sequential floating backward selection (SFBS) method, which however uses 11 features. The optimal feature set found using the GA requires only 5 features to obtain a classification accuracy of 100%, reducing the feature size by 88.63%. The classifier accuracy for different feature set sizes is shown in Table 1. For the identification of benign versus malignant brain tumors the three selection methods achieve the same classification accuracy of 98.14% but differ in the number of features used. Table 1 shows that the GA is the best selection method, since it uses the fewest features: mean of contrast, mean of homogeneity, mean of sum average, mean of sum variance and range of autocorrelation [22]. The GA reduces the feature size by 88.63% without compromising the accuracy of the classifier.
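For reference, the tournament selection and the operator rates quoted above could be expressed as follows; the tournament size of 2 is an assumption, since it is not specified in our experiments.

```python
# Tournament selection: the fitter of a few randomly drawn chromosomes wins a
# place in the mating pool.  A tournament size of 2 is assumed here.
import random

def tournament_select(population, fitness, k=2):
    return max(random.sample(population, k), key=fitness)

GA_SETTINGS = {
    "population_size": 30,
    "chromosome_length": 44,   # one gene per extracted feature
    "crossover_rate": 0.9,     # one-point crossover
    "mutation_rate": 0.1,
}
```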

6. Conclusion

In this study we have classified normal brain tissue, benign and malignant brain tumors using texture analysis. To validate our work we tested various feature selection algorithms: SFFS, SFBS and GA. Although SFFS and SFBS achieve high classification accuracy, they do not reduce the number of features that must be measured for classification as much as the GA does.

The experimental results indicate that SFFS is the best of the sequential floating algorithms. The optimal subset found using the GA reveals that the same classification accuracy can be obtained with only five wavelet co-occurrence features; the feature size is reduced by 88.63%. The optimal features selected by the GA, namely mean of contrast, mean of homogeneity, mean of sum average, mean of sum variance and range of autocorrelation, are found to be highly discriminant. The experiments show that the GA is the best feature selection method, followed by SFFS and SFBS, for the classification of brain tumors in MR images.

Acknowledgement(s)

We would like to thank Mrs Ines Kallel for her helpful review of the manuscript.