Recurrence Quantification Analysis For The Automated Identification Engineering Essay

Published: November 21, 2015 Words: 3722

Epilepsy is a disorder of brain electrophysiology that causes recurrent convulsions or seizures.9 Nerve cells in the brain send out excessive electrical impulses, leading to seizures. Epilepsy may be caused by illness, brain injury, an imbalance of nerve-signaling chemicals, or abnormal development of the brain.

The estimated proportion of the general population with active epilepsy (i.e. continuing seizures or the need for treatment) at a given time is between 4 and 10 per 1,000 people globally. However, some studies in developing countries suggest that the proportion is between 6 and 10 per 1,000. Around 50 million people in the world have epilepsy. In developed countries, annual new cases are between 40 and 70 per 100,000 people in the general population. In developing countries, this figure is often close to twice as high because of the higher risk of experiencing conditions that can lead to permanent brain damage. Close to 90% of epilepsy cases worldwide are found in developing regions (Website: http://www.who.int/mediacentre/factsheets/fs999/en/index.html).

Epilepsy is generally evaluated using the electroencephalogram (EEG), in which electrodes are placed on the scalp over the affected area of the brain and the brain signals are recorded and analyzed. EEG signals are complex, non-linear, non-stationary and random in nature. They arise from the complex interconnection of billions of neurons. It is very difficult to understand the behavior of such complex signals using linear methods or manual inspection. Hence, nonlinear features based on chaos theory can be extracted from these signals to study the complexities present in the time series. Many nonlinear dynamical methods have been applied to EEG signals.2, 9, 24, 26, 40

A Recurrence Plot (RP) is an advanced technique for nonlinear analysis. These plots are used to visualize the recurrence of states x_i in phase space. Higher dimensional phase spaces can otherwise be viewed only by projecting them into two or three dimensional spaces, whereas an RP shows recurrences in a two dimensional plot regardless of the dimension of the phase space. An RP is an array of dots in an N x N square, where a dot is placed at (i, j) whenever the state x_j is very close to the state x_i. More information on recurrence plots can be found in the seminal paper by Eckmann et al. 11

Recurrence Quantification Analysis (RQA) is a nonlinear data analysis method used to quantify RPs in order to explore dynamical systems. It quantifies the number and duration of recurrences of a dynamical system as presented by its phase space trajectory.34 The RQA parameters are measures of complexity and provide useful information even for short and non-stationary data. They were developed to understand the behavior of dynamical systems. 11, 27, 41

Several studies have used RQA parameters in EEG signal analysis. Among other non-linear parameters, RP based features extracted from EEG signals were used to quantify the cortical function at different sleep stages. 2 Different unique RPs have been proposed for different sleep stages. In another sleep related study, RQA was used to discriminate sleep stages and to characterize the different behaviors of sleep EEGs in patients having sleep apnea syndrome.37 The depth of anesthesia was assessed using RQA features from EEG signals. 22 Four recurrence plot based measures were extracted and fed to an Artificial Neural Network (ANN), and the system was able to correctly classify responses to incision with an average accuracy of 87.76%.

The onset of epilepsy was predicted using seven RQA measures extracted from raw EEG signals that were fed to a four layer ANN. An accuracy of around 58.3 to 70% was achieved in different EEG leads.42 An automated technique using RQA was used to detect the pre-ictal phase in rat EEG signals.31 The results show that the RQA method can be used to predict epileptic seizures.

The rest of the paper is structured as follows. The details of the EEG data used in this work are given in Section 2.1. The RQA parameters extracted from the EEG signals are described in Section 2.2. Section 3 briefly describes the different classifiers used in this work. The results and discussion are presented in Sections 4 and 5 respectively. Conclusions are given in Section 6.

Data and Methods

EEG Data

The EEG data used in this study were taken from the artifact free EEG time series data available at the Department of Epileptology, University of Bonn (Website: http://www.meb.uni-bonn.de/epileptologie/science/physik/eegdataold.html). More information about the data can be found in the paper by Andrzejak et al. 6 The data were taken from five healthy subjects and five epileptic patients. 100 sets of data in each of the 3 categories: normal (data taken from healthy subjects), pre-ictal/background and ictal/epileptic were selected. The duration of each data set was 23.6 seconds. The signals were recorded from intracranial electrodes that were placed on the correct epileptogenic zone.6 The EEG signals were obtained using the standard international 10-20 system. A 128-channel amplifier system was used to record the EEG signals, and the data were digitized with a sampling rate of 173.61 Hz and 12-bit A/D resolution and filtered using a 0.53~40 Hz (12 dB/octave) band pass filter. Typical normal, pre-ictal and epileptic (seizure) EEG signals are illustrated in Fig.1.
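As a quick consistency check on the recording parameters quoted above, the number of samples per segment follows from the sampling rate and the duration. The symbols f_s, T and N_samples are introduced here only for this illustration:

$$N_{samples} \approx f_s \, T = 173.61\ \text{Hz} \times 23.6\ \text{s} \approx 4097$$

Each time series therefore contains roughly 4097 samples.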

Fig.1. EEG signals: (a) normal, (b) pre-ictal, and (c) ictal.

Methods

In this work, RQA parameters were extracted from the EEG signals. These RQA parameters,11, 27, 41 which are various measures of the complexity of the EEG signals, are briefly described in this section. An RP can be mathematically represented as

$$R_{i,j} = \Theta\left(\varepsilon - \lVert x_i - x_j \rVert\right), \quad i, j = 1, \dots, N \qquad (1)$$

where

N: total number of considered states

x_i: considered states

ε: threshold distance

||.||: norm

Θ: Heaviside function.
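The definition in (1) can be sketched in a few lines of Python. This is a minimal illustration and not the toolbox code used in this work: it assumes NumPy, a Euclidean norm, and the convention that a distance exactly equal to the threshold counts as a recurrence. The function names `embed` and `recurrence_matrix`, the toy sine signal, and the values of `dim`, `delay` and `eps` are all hypothetical.

```python
import numpy as np

def embed(signal, dim, delay):
    """Time-delay embedding: row i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    x = np.asarray(signal, dtype=float)
    n_states = len(x) - (dim - 1) * delay
    if n_states <= 0:
        raise ValueError("signal too short for this dim/delay")
    return np.stack([x[k * delay : k * delay + n_states] for k in range(dim)], axis=1)

def recurrence_matrix(states, eps):
    """R[i, j] = 1 if ||x_i - x_j|| <= eps, else 0 (Euclidean norm assumed)."""
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return (dists <= eps).astype(int)

# Toy example: a short sine wave stands in for an EEG segment (not real data).
t = np.arange(200)
toy_signal = np.sin(2 * np.pi * t / 25)
states = embed(toy_signal, dim=3, delay=2)   # illustrative embedding parameters
R = recurrence_matrix(states, eps=0.2)       # illustrative threshold
```

Plotting the nonzero entries of `R` as dots gives the RP. The all-pairs distance array grows with the square of the number of states, so a full-length segment would need the distances to be computed in chunks.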

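Recurrence Rate (RR): It indicates the density of recurrence points in a recurrence plot. It is given by

$$RR = \frac{1}{N^2} \sum_{i,j=1}^{N} R_{i,j} \qquad (2)$$

where R_{i,j} is the (i, j) entry of the RP defined in (1).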

Determinism (DET): It is the fraction of recurrence points forming diagonal lines. These lines represent epochs of similar time evolution of states of the system. Hence, DET indicates the predictability of the system. It is given by

$$DET = \frac{\sum_{l=l_{min}}^{N} l \, P(l)}{\sum_{l=1}^{N} l \, P(l)} \qquad (3)$$

where l_min is the length of the minimal diagonal line and P(l) is the frequency distribution of the lengths l of the diagonal lines.

Mean diagonal line length: This parameter quantifies the mean prediction time or the inverse of the divergence of the system. It is defined as

$$\langle L \rangle = \frac{\sum_{l=l_{min}}^{N} l \, P(l)}{\sum_{l=l_{min}}^{N} P(l)} \qquad (4)$$

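Entropy: It measures the complexity of the recurrence structure. It is given by

$$ENTR = -\sum_{l=l_{min}}^{N} p(l) \ln p(l), \qquad p(l) = \frac{P(l)}{\sum_{l=l_{min}}^{N} P(l)} \qquad (5)$$

where p(l) is the probability that a diagonal line has length l.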
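The diagonal-line measures described so far (RR, DET, mean diagonal line length and entropy) can be computed from a binary recurrence matrix as sketched below. This is only an illustration under stated assumptions: the main diagonal is excluded from the line histogram and only the upper triangle is scanned (the matrix is assumed symmetric), which is a common convention but is not stated in the text. The function names and the 6 x 6 toy matrix are made up for the example.

```python
import numpy as np
from collections import Counter

def run_lengths(binary_seq):
    """Lengths of consecutive runs of 1s in a 0/1 sequence."""
    lengths, current = [], 0
    for value in binary_seq:
        if value:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths

def diagonal_histogram(R):
    """P(l): number of diagonal lines of length l (main diagonal excluded by assumption)."""
    hist = Counter()
    for k in range(1, R.shape[0]):          # upper triangle only; R assumed symmetric
        hist.update(run_lengths(np.diagonal(R, offset=k)))
    return hist

def diagonal_measures(R, l_min=2):
    hist = diagonal_histogram(R)
    points_all = sum(l * c for l, c in hist.items())
    long_lines = {l: c for l, c in hist.items() if l >= l_min}
    points_long = sum(l * c for l, c in long_lines.items())
    n_long = sum(long_lines.values())
    rr = R.sum() / R.size                                    # density of recurrence points
    det = points_long / points_all if points_all else 0.0    # share of points on long lines
    mean_len = points_long / n_long if n_long else 0.0       # mean diagonal line length
    probs = np.array([c / n_long for c in long_lines.values()]) if n_long else np.array([])
    entr = float(-(probs * np.log(probs)).sum()) if n_long else 0.0
    return {"RR": rr, "DET": det, "L": mean_len, "ENTR": entr}

# Hand-built toy matrix: diagonal lines of length 3, 1 and 2 above the main diagonal.
R_toy = np.eye(6, dtype=int)
R_toy += np.diag([1, 1, 1, 0, 1], k=1) + np.diag([0, 1, 1, 0], k=2)
R_toy = np.maximum(R_toy, R_toy.T)
measures = diagonal_measures(R_toy, l_min=2)
```

For the toy matrix, five of the six off-diagonal recurrence points in the upper triangle lie on lines of length two or more, so DET is 5/6 and the mean diagonal line length is 2.5.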

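Laminarity (LAM): LAM is the fraction of recurrence points that form vertical lines and it corresponds to the amount of laminar states in the system. It is given by

$$LAM = \frac{\sum_{v=v_{min}}^{N} v \, P(v)}{\sum_{v=1}^{N} v \, P(v)} \qquad (6)$$

where P(v) is the histogram of the lengths v of the vertical lines and v_min is the minimal vertical line length considered.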

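Trapping time (TT): TT indicates the mean length of the vertical lines. It measures the mean time that the system is trapped in one state or changes very slowly. It is given by

$$TT = \frac{\sum_{v=v_{min}}^{N} v \, P(v)}{\sum_{v=v_{min}}^{N} P(v)} \qquad (7)$$

with the sums taken over vertical lines of length at least v_min.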

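Longest vertical line: It is the length of the longest vertical line and is given by

$$V_{max} = \max\left(\{v_l\}_{l=1}^{N_v}\right) \qquad (8)$$

where N_v is the total number of vertical lines.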
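The vertical-line measures (LAM, TT and the longest vertical line) can be sketched in the same way. The code below is a minimal illustration, assuming NumPy and that every recurrence point, including those on the main diagonal, is counted as part of a vertical line. The function names and the 5 x 5 toy matrix are hypothetical.

```python
import numpy as np

def vertical_line_lengths(R):
    """Lengths of all vertical runs of recurrence points, column by column."""
    lengths = []
    for column in R.T:
        current = 0
        for value in column:
            if value:
                current += 1
            elif current:
                lengths.append(current)
                current = 0
        if current:
            lengths.append(current)
    return lengths

def vertical_measures(R, v_min=2):
    lengths = vertical_line_lengths(R)
    long_lengths = [v for v in lengths if v >= v_min]
    total_points = sum(lengths)
    lam = sum(long_lengths) / total_points if total_points else 0.0   # laminarity
    tt = sum(long_lengths) / len(long_lengths) if long_lengths else 0.0  # trapping time
    v_max = max(lengths) if lengths else 0                            # longest vertical line
    return {"LAM": lam, "TT": tt, "Vmax": v_max}

# Hand-built symmetric toy matrix; vertical runs are 3, 2, (1, 2), 3 and 2.
R_toy = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
vertical = vertical_measures(R_toy, v_min=2)
```

In the toy matrix, 12 of the 13 recurrence points belong to vertical lines of length two or more, so LAM is 12/13, TT is 12/5 and the longest vertical line has length 3.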

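Longest diagonal line: It is the length of the longest diagonal line and is given by

$$L_{max} = \max\left(\{l_i\}_{i=1}^{N_l}\right) \qquad (9)$$

where N_l is the total number of diagonal lines.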

Recurrence times of the 1st and 2nd Poincare recurrence points are given by:

(10)

(11)

Classification

Classifiers used

In this work, we have used seven classifiers, namely Support Vector Machine (SVM), Gaussian Mixture Model (GMM), Fuzzy Sugeno Classifier, K-Nearest Neighbor (KNN), Naive Bayes Classifier (NBC), Decision Tree (DT), and Probabilistic Neural Network (PNN). They are briefly explained below.

Support Vector Machine (SVM)

SVM is a supervised learning technique used for classification.21 It constructs a separating hyperplane that maximizes the margin between the input data classes which are viewed in an n-dimensional space (n is the number of features used as inputs). Two parallel hyperplanes are constructed to calculate the margin from training data, one on each side of the separating hyperplane.

Gaussian Mixture Model (GMM)

GMM is a probabilistic model for density estimation using a mixture distribution. It uses an unsupervised learning technique: for the input features, it refines the weights of each distribution through the expectation-maximization algorithm. 21

Fuzzy Sugeno Classifier

In this work, a subtractive clustering technique was used to generate a Fuzzy Inference System (FIS). 21 The clustering technique estimates the number of clusters and the cluster centers in the examined dataset. An FIS comprises inputs, outputs, and a set of rules that define the behavior of the fuzzy system. Each input and output has as many membership functions as the number of clusters. A radius parameter is used to indicate a cluster centre's range of influence in each of the data dimensions. An FIS structure containing a set of fuzzy rules that cover the feature space is generated after the training. This is used to perform fuzzy inference calculations on the test data.

K-Nearest Neighbor (KNN)

KNN is a supervised learning algorithm. To classify a new test instance, the K training points closest to it are evaluated. The class that is most common among these K nearest neighbours is assigned to the new test instance. Thus, the algorithm uses neighborhood classification to predict the class of new test data.
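The majority-vote rule described above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance, which the text does not specify. The function name `knn_predict` and the two-feature toy vectors are invented for the example and are not taken from the EEG data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Assign the class most common among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Invented two-feature toy vectors, three per class.
train_X = [(0.25, 0.35), (0.27, 0.33), (0.26, 0.36),
           (0.48, 0.61), (0.50, 0.60), (0.47, 0.58)]
train_y = ["normal", "normal", "normal", "ictal", "ictal", "ictal"]
predicted = knn_predict(train_X, train_y, (0.46, 0.59), k=3)
```

The query point lies next to the three "ictal" training points, so all three votes go to that class.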

Naïve Bayes Classifier (NBC)

NBC is a probabilistic classifier based on Bayes' theorem. NBC makes the strong assumption that the predictor variables are independent random variables.21 This assumption helps it to compute the probabilities required by the Bayes formula from even a small training data set.
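The independence assumption means that the class-conditional likelihood factorizes into one term per feature. The sketch below illustrates this with a Gaussian likelihood for each feature. The Gaussian form is an assumption made for the example, since the text does not state which likelihood model was used, and the function names and toy vectors are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior, per-feature mean and per-feature variance."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9   # small floor avoids division by zero
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, features):
    """Pick the class with the largest log prior plus summed per-feature log likelihoods."""
    def log_posterior(prior, means, variances):
        log_p = math.log(prior)
        for x, m, var in zip(features, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return log_p
    return max(model, key=lambda label: log_posterior(*model[label]))

# Invented two-feature toy vectors, three per class.
toy_X = [(0.25, 0.35), (0.27, 0.33), (0.26, 0.36),
         (0.48, 0.61), (0.50, 0.60), (0.47, 0.58)]
toy_y = ["normal", "normal", "normal", "ictal", "ictal", "ictal"]
nb_model = fit_gaussian_nb(toy_X, toy_y)
nb_prediction = predict_gaussian_nb(nb_model, (0.46, 0.59))
```

Only a mean and a variance per feature and class have to be estimated, which is why the classifier can be trained on a small data set.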

Decision Tree (DT)

In the case of decision trees, the input features are used to construct a tree. The resulting tree produces a series of rules that can be used to recognize the class of a test sample.

Probabilistic Neural Network (PNN)

PNN is a two-layer radial basis network used for classification. 21 The first layer is the radial basis layer, which computes distances from the input vector to the training input vectors and yields a distance vector. The second layer is the competitive layer, which sums these contributions for each input class and produces a vector of probabilities as its output. The 'compete' transfer function at the output of the second layer selects the maximum of these probabilities, and assigns a 1 to the selected class and a 0 to the other classes.
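The two layers described above can be sketched as follows. This is a minimal illustration, assuming NumPy and a Gaussian radial basis function. The spread `sigma`, the normalization of the class sums so that they add to one, the function name `pnn_predict` and the toy vectors are all assumptions made for the example.

```python
import numpy as np

def pnn_predict(train_X, train_y, query, sigma=0.1):
    """Return the class labels, class probabilities and one-hot 'compete' output."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    # Radial basis layer: one Gaussian activation per training vector.
    sq_dist = ((train_X - np.asarray(query, dtype=float)) ** 2).sum(axis=1)
    activations = np.exp(-sq_dist / (2 * sigma ** 2))
    # Competitive layer: sum the activations per class, then normalize (assumption).
    classes = np.unique(train_y)
    scores = np.array([activations[train_y == c].sum() for c in classes])
    probs = scores / scores.sum()
    # 'Compete' step: 1 for the most probable class, 0 for the others.
    one_hot = (np.arange(len(classes)) == probs.argmax()).astype(int)
    return classes, probs, one_hot

# Invented two-feature toy vectors, three per class.
pnn_X = [(0.25, 0.35), (0.27, 0.33), (0.26, 0.36),
         (0.48, 0.61), (0.50, 0.60), (0.47, 0.58)]
pnn_y = ["normal", "normal", "normal", "ictal", "ictal", "ictal"]
classes, probs, one_hot = pnn_predict(pnn_X, pnn_y, (0.46, 0.59))
```

The training vectors act as the weights of the first layer, so no iterative training is needed. The spread `sigma` controls how far the influence of each training vector reaches.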

Classification methodology

As indicated earlier, a total of 300 EEG time series (100 in each group), each of 23.6 second duration, were used. The ten RQA features that were extracted from these data were used to form a reduced dataset of the original EEG data. This reduced dataset was used to test the seven classifiers using ten-fold stratified cross validation. The data were split into ten parts such that each part contained approximately the same proportion of class samples as the original dataset. Nine parts of the data were used for training the classifier and the remaining part for testing. This procedure was repeated ten times using a different part for testing in each case. Sensitivity, specificity, and accuracy were calculated for all ten folds, and the average values were taken as the estimate of the classifier performance.
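The splitting procedure described above can be sketched in plain Python. This is a minimal illustration and not the code used in this work. The text does not say how sensitivity and specificity were defined for three classes, so the helper `binary_metrics` simply treats one chosen label as the positive class, which is an assumption. The function names and the random seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Return n_folds lists of indices with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)   # deal each class out evenly
    return folds

def binary_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity and accuracy with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }

# 300 labels arranged as in the dataset: 100 per class.
labels = ["normal"] * 100 + ["pre-ictal"] * 100 + ["ictal"] * 100
folds = stratified_folds(labels)
fold_sizes = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be fitted on train_idx and evaluated on test_idx here.
    fold_sizes.append((len(train_idx), len(test_idx)))
```

With 100 samples per class, each fold holds 10 samples of every class, so every round trains on 270 samples and tests on 30. The per-fold metrics would then be averaged over the ten rounds.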

Results

Figure 2 shows typical RPs for normal, pre-ictal and ictal signals. The delay and embedding dimension used for generating the RPs are 1 and 10 respectively.23 These figures indicate the dynamic behavior of the signal. It can be seen from Figure 2(a) that the RP of the normal EEG looks more random, with dots scattered throughout the plot. Figures 2(b) and 2(c) show some periodicity in the plot. Also, more rhythmic spikes can be seen in the RPs of the pre-ictal and ictal signals. These plots are unique for each class. Hence, they can be used to identify the unknown class easily, even from short data segments.

Fig.2. Recurrence plots (RP) of EEG signals: (a) normal, (b) pre-ictal, (c) ictal. Figure 2(a) indicates that the RP of normal EEG looks more random and dots are scattered throughout the plot. The RPs of pre-ictal and ictal signals (Figures 2(b) and 2(c)) show some periodicity in the plot.

Table 1 shows the range of RQA features for the normal, pre-ictal, and ictal classes. It can be seen from the table that all ten features are statistically significant, as indicated by the low p-values. Table 2 shows the results of accuracy, sensitivity, specificity, and positive predictive accuracy for the seven classifiers. Our results show that the SVM and Fuzzy classifiers perform better than the other classifiers.

Table 1 Results of RQA features for normal, pre-ictal, and ictal EEG classes.

| Features | Normal | Pre-ictal | Ictal | p-value |
|---|---|---|---|---|
| RR | 0.0575808 ± 0.004212 | 0.0617364 ± 0.01391 | 0.0671869 ± 0.008863 | < .0001 |
| DET | 0.26333 ± 0.04178 | 0.48499 ± 0.127 | 0.47018 ± 0.109 | < .0001 |
| <L> | 2.3101 ± 0.112 | 2.6712 ± 0.668 | 3.0637 ± 0.417 | < .0001 |
| Lmax | 7.685 ± 1.5 | 17.425 ± 26.6 | 30.33 ± 16.1 | < .0001 |
| ENTR | 0.70126 ± 0.156 | 1.0463 ± 0.317 | 1.373 ± 0.238 | < .0001 |
| LAM | 0.35446 ± 0.05464 | 0.61359 ± 0.124 | 0.57155 ± 0.134 | < .0001 |
| TT | 2.402 ± 0.126 | 2.9401 ± 0.964 | 3.2156 ± 0.535 | < .0001 |
| Vmax | 7.195 ± 1.62 | 12.47 ± 8.97 | 15.97 ± 7 | < .0001 |
| T1 | 16.383 ± 1.21 | 15.159 ± 2.15 | 14.294 ± 1.78 | < .0001 |
| T2 | 20.762 ± 1.32 | 26.371 ± 4.57 | 24.509 ± 4.31 | < .0001 |

Table 2 Results of sensitivity, specificity, accuracy and positive predictive accuracy for different classifiers.

| Classifiers | Accuracy (%) | PPV (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| SVM | 94.4 ± 2.41 | 96.3 ± 3.06 | 97.7 ± 1.53 | 94.7 ± 3.51 |
| GMM | 89.3 ± 2.94 | 90.3 ± 5.13 | 99.0 ± 0.00 | 87.3 ± 6.11 |
| Fuzzy | 94.4 ± 3.16 | 96.3 ± 3.06 | 97.7 ± 1.53 | 94.7 ± 3.51 |
| KNN | 92.4 ± 3.89 | 94.7 ± 2.08 | 95.3 ± 4.62 | 92.3 ± 2.31 |
| NBC | 78.4 ± 4.83 | 85.0 ± 5.62 | 98.7 ± 2.31 | 81.7 ± 6.13 |
| DT | 90.9 ± 2.54 | 95.0 ± 1.73 | 96.0 ± 4.36 | 92.7 ± 1.53 |
| PNN | 91.8 ± 5.04 | 94.7 ± 2.08 | 95.0 ± 5.20 | 92.3 ± 2.31 |

We have developed a simple Graphical User Interface (GUI) (Figure 3) to visualize the RP and to diagnose the unknown class of an EEG signal. A Browse button is provided in the top left corner of the GUI to select and display the unclassified EEG signal. On clicking the Obtain the Features button, the values of all ten RQA features are displayed below the button, and simultaneously the RP is displayed on the right side of the screen. When the Diagnose button is pressed, the class of the loaded EEG signal is displayed. In the illustration, the class is "Possible normal".

Fig.3. Snapshot of the Graphical User Interface that was developed to visualize the recurrence plots and classify the EEG signal.

Discussion

In this section, the studies related to automatic detection of epilepsy using non-linear features are summarized. All these studies used data that was selected from the Bonn University dataset. Table 3 shows the summary of these studies.

Table 3 Summary of automated identification of epilepsy work done so far.

| Authors | Method | Accuracy (%) |
|---|---|---|
| Nigam et al. 28 | Nonlinear preprocessing filter - ANN | 97.2 |
| Srinivasan et al. 6 | Time & frequency domain features - Recurrent neural network | 99.6 |
| Kannathal et al. 25 | Entropy measures - ANFIS | 92.2 |
| Polat et al. 32 | Fast Fourier Transform - Decision tree | 98.72 |
| Ocak et al. 30 | DWT and ApEn | 96 |
| Subasi 39 | Discrete wavelet transform - Mixture expert model | 94.5 |
| Ghosh-Dastidar et al. 19 | Spiking Neural Network | 92.5 |
| Guler et al. 20 | Lyapunov exponents - Recurrent neural networks | 96.79 |
| Sadati et al. 36 | Adaptive neural fuzzy network | 85.9 |
| Chua et al. 10 | HOS based features - SVM and GMM | 93.11 |
| Ghosh-Dastidar et al. 18 | Wavelet and non-linear features - 1) unsupervised k-means clustering; 2) linear and quadratic discriminant analysis; 3) radial basis function neural network; 4) Levenberg-Marquardt Back Propagation Neural Network (LMBPNN) | 96.7 |
| Acharya et al. 1 | Nonlinear method - SVM and GMM | 95 |
| Faust et al. 17 | Frequency domain method - SVM and GMM | 93.3 |
| This work | RQA features - SVM and Fuzzy | 94.4 |

A novel wavelet-chaos-neural network methodology was presented for classification of EEG signals into healthy, ictal, and inter-ictal signals.18 Wavelet analysis was used to decompose the EEG signals into delta, theta, alpha, beta, and gamma sub-bands. Three parameters were extracted for EEG representation: standard deviation (quantifying the signal variance), correlation dimension (CD), and largest Lyapunov exponent (LLE). A good classification accuracy of 96.7% was obtained using a back propagation neural network and a mixed-band feature space consisting of nine parameters. In another study, nonlinear parameters such as CD and LLE were extracted from wavelet based EEG sub-bands to distinguish 1) healthy subjects; 2) epileptic subjects during a seizure-free interval (inter-ictal EEG); 3) epileptic subjects during a seizure (ictal EEG).4 It was observed that while there may not be significant differences in the values of the parameters obtained from the original EEG signals, differences may be identified when the parameters are employed in conjunction with specific EEG sub-bands. The authors showed that for the higher frequency beta and gamma sub-bands, the CD feature differentiates between the three groups, whereas for the lower frequency alpha sub-band, the LLE feature differentiates between the three groups.

Discrete Wavelet Transform (DWT) was used to uncover the hidden complexities in epileptic EEG signals.3, 5 The authors used discrete Daubechies and harmonic wavelets to analyze and characterize epileptiform discharges in patients in the absence of seizure. Transient features were accurately captured and localized in both the time and frequency domains using their method. Approximate entropy (ApEn) and DWT were used to detect epileptic seizures from EEG data. 30 The detection rate of epilepsy was 96% when ApEn was extracted from the DWT signal and 73% when ApEn was calculated from raw EEG data. In another study, EEG signals were subjected to DWT to decompose them into frequency sub-bands. The DWT coefficients were fed into a modular neural network structure to identify normal and epileptic EEG signals.39 The network structure achieved an accuracy rate (94.5%) that was higher than that of the stand-alone neural network (93.2%).

Nonlinear features derived from the Higher Order Spectra (HOS) were used to differentiate normal, pre-ictal (background) and epileptic EEG signals.9 Their results show that the HOS based measures have unique ranges for the different signals with a high confidence level (p-value = 0.002). In another study by the same group10, the HOS features were fed to GMM and SVM classifiers which presented epilepsy detection accuracies of 93.11% and 92.67% respectively.

Sample entropy (SampEn) was used to analyze epileptic EEG signals, and its performance was compared with that of approximate entropy (ApEn).8 Both entropies, ApEn and SampEn, decreased significantly during epilepsy. However, the authors showed that SampEn was 15%-20% more sensitive than ApEn in detecting the changes in epilepsy. Various entropies were used to detect normal and epileptic EEG signals using an Adaptive Neuro Fuzzy Inference System (ANFIS). 25 A detection accuracy of more than 90% was obtained.

Nonlinear parameters namely Correlation Dimension (CD), Largest Lyapunov Exponent (LLE), Hurst exponent (H), and entropy were used to analyze epileptic EEG signals.24 It was shown that these chaotic measures had a good discriminatory power (accuracy of more than 90% in detecting epilepsy). Recently, chaotic features like CD, H, LLE, fractal dimension, and approximate entropy (ApEn) were used to extract the important hidden features from the normal, pre-ictal (background) and epileptic EEG signals.1 GMM and SVM classifiers were used for automatic identification. Their results show that the GMM classifier performed better with an average classification efficiency of 95%, sensitivity and specificity of 92.22% and 100% respectively.

Automated detection of epilepsy was performed using an Elman network.6 The results showed that the Elman network was able to detect epilepsy with an accuracy of 99.6% using a single input feature, which was better than the results obtained with other types of neural network classifiers using two or more input features. The performance of Recurrent Neural Networks (RNNs) was compared with that of feedforward neural network models in detecting early electroencephalographic changes.20 The proposed RNNs using the Lyapunov exponents were useful in analyzing long-term EEG signals for early detection of the changes, with an efficiency of more than 96%.

An efficient Spiking Neural Network (SNN) model was developed for epilepsy and epileptic seizure detection from EEGs, using three training algorithms (SpikeProp, QuickProp, and RProp) to classify the three EEG classes. 19 The model using RProp as the training algorithm yielded a high classification accuracy of 92.5%. A multistage nonlinear pre-processing filter in combination with an ANN was used to identify epilepsy. 28 The proposed system was able to detect seizures with an accuracy of up to 97.2%. Epileptic seizure detection in EEG signals was studied using a hybrid system based on a decision tree classifier and the fast Fourier transform based Welch method.32 The authors obtained 98.68% and 98.72% classification accuracies using 5- and 10-fold cross-validation, respectively.

Adaptive Neural Fuzzy Network (ANFN) was used to classify normal and epileptic EEG signals.36 Classification accuracy of about 85.9% was achieved using ANFN. Different modeling techniques and classifiers (ANN, GMM, and SVM) were used to identify pre-ictal, ictal, and normal EEG signals.17 They showed that four local maxima and four local minima values that were extracted from the power density spectrum obtained using Burg's method in combination with SVM classifier provided the highest classification rate of 93.33%, sensitivity and specificity of 98.33% and 96.67% respectively.

In this work, we used RQA features in seven classifiers. Our results show that Fuzzy and SVM performed better than the other classifiers, with a classification accuracy of 94.4%, and sensitivity and specificity of 97.7% and 94.7% respectively. The novelty in this work is the use of RQA features for epilepsy detection. It is to be noted that most studies with accuracies higher than the accuracy obtained in this work either classified only two classes of EEG signals (normal and epileptic) (Refs. 28, 6, 32, 39) or applied the non-linear techniques on time frames or sub-bands of data (Ref. 18), which implies an increased data analysis time. In this work, however, we extracted the RQA features from the entire time series data. We also showed that the RQA features have good discriminatory capability to classify all three types of EEG signals (normal, pre-ictal, and ictal).

Conclusion

The application of nonlinear methods like RQA helps in the automatic detection of hidden fluctuations in waveforms that are highly complex in nature. Moreover, it is evident from the numerous studies summarized in the discussion section of this paper that automatic detection of epilepsy with good accuracy is possible. In this work, ten RQA parameters were extracted from three classes of EEG signals (normal, pre-ictal, and ictal). These parameters were fed as inputs to seven different classifiers: (i) Fuzzy-Sugeno, (ii) SVM, (iii) GMM, (iv) KNN, (v) PNN, (vi) decision tree, and (vii) Naïve Bayes classifier, and a performance-based comparative study was carried out. Results show that the SVM and Fuzzy classifiers are able to differentiate the three classes with the best accuracy of 94.4%, and sensitivity and specificity of 97.7% and 94.7%, respectively. The accuracy of the proposed system can be improved by increasing the size of the training set.

Acknowledgements

The authors would like to thank the owners of the website http://www.agnld.uni-potsdam.de/~marwan/toolbox/ for the Cross Recurrence Plot toolbox.