Apneic Events Detection Using Different Features of Airflow Signals

Apneic-event based sleep disorders are very common and greatly affect people's daily lives. However, diagnosing these disorders by detecting apneic events is very difficult. Studies show that analyses of airflow signals are effective in the diagnosis of apneic-event based sleep disorders. According to these studies, diagnosis can be performed by detecting the apneic episodes of the airflow signals. This work deals with the detection of apneic episodes in airflow signals belonging to the Apnea-ECG (Electrocardiogram) and MIT (Massachusetts Institute of Technology)-BIH (Boston's Beth Israel Hospital) databases. To accomplish this task, three representative feature sets, namely the classic feature set, the amplitude feature set and the descriptive model feature set, were created. The performance of these feature sets was evaluated individually and in combination, using the random forest classifier to detect apneic episodes. Moreover, effective features were selected by the OneR Attribute Eval feature selection algorithm to obtain higher performance. The selected features (28 for the Apnea-ECG database and 31 for the MIT-BIH database, out of 54) were applied to the classifier to compare results. The highest classification accuracies were obtained with the selected features: 96.21% for the Apnea-ECG database and 92.23% for the MIT-BIH database. The kappa values (91.80% and 81.96%) are also high and support the classification accuracies for both databases. The results of the study are quite promising for determining apneic events on a minute-by-minute basis.


INTRODUCTION
experienced situations such as sleepiness, tiredness, carelessness and low concentration [3]. These situations can lead to traffic and work accidents, depression, impaired learning, etc. Moreover, apnea-induced sleep disorders can trigger heart disease, cardiovascular dysfunction, hypertension and myocardial infarction [2][3]. Therefore, the diagnosis and treatment of SBD are important. In a clinical environment, diagnosis is generally made by polysomnography. Through polysomnography, apneic events are identified and the number of events is counted [4]. Counting apneic events is important for diagnosis, but identifying the events and determining the time intervals at which they occur are also important, especially for treatment [4]. To determine the time intervals, the apneic episodes of a signal must be detected and separated from the normal episodes.
In this way, the events can be identified and the times at which they occur can be determined.
This study focuses on the detection of apneic episodes on a minute-by-minute basis, so the minutes at which apneic events occur can also be identified. Although several signals have been used in many studies [5][6][7][8], airflow signals were selected for this study, and apneic episodes were detected on these signals because they give the primary indication of apneic events [3]. Detection generally involves analyzing airflow signals and classifying them according to their specific characteristics. Therefore, the characteristics of the signals must be well defined. To define the signal characteristics, various feature sets, including the classic feature set, the amplitude feature set and the descriptive model feature set, were produced in this study. These sets were then used with the RF (Random Forest) classifier to detect the apneic events. The study aimed to determine how successfully the apneic episodes could be detected and which feature sets or features contributed most to this success.

MATERIALS AND METHOD
In this study experiments were performed with nasal airflow signals obtained from two separate databases, Apnea-ECG and MIT-BIH Polysomnographic [9][10].
These databases can be accessed on the PhysioNet website [11]. Using records obtained from these databases, this study was carried out in four distinct stages: preprocessing, feature extraction, feature selection and classification. The block diagram of the study is shown in Fig. 1.

Apnea-ECG Database
This database is described in Penzel et al. [9]. Data were recorded at Philipps University in Marburg, Germany. Since only 5 airflow signals in this database contain apneic events, these 5 signals were selected for this study.
The reference annotation file associated with each signal was created by a sleep expert to indicate the presence or absence of apnea during each 1-minute period. Each minute is labeled 'A' when apnea was in progress at the beginning of the associated minute; otherwise, it is labeled 'N' [2].

FIG. 1. BLOCK DIAGRAM OF STUDY
In this database, apneas were associated with 90% drops in airflow. Also, minutes containing hypopneas (defined as intermittent drops in airflow below 50%, accompanied by drops in oxygen saturation of at least 4%, and followed by compensating hyperventilation) were scored as minutes containing apnea [11].

MIT-BIH Polysomnographic Database
The MIT-BIH Polysomnographic Database consists of recordings of multiple physiologic signals during sleep.
The recordings were gathered in the BIH Sleep Laboratory for the evaluation of chronic OSA syndrome [10]. The database also includes an annotation file associated with the signals. This file contains apnea information and sleep stages. Annotation was made by sleep experts according to the presence or absence of apneic events during 30-second periods [11].
Because the rules of apneic event scoring were not specified for this database, the general apnea and hypopnea scoring rules were adopted: a 90% drop in the airflow signal for apnea, and a 50 or 70% drop in the airflow signal for hypopnea [12].

Pre-Processing
Proper pre-processing can lead to better results in all areas of signal processing. Filtering and segmentation constituted the pre-processing steps of our study.
In the various studies carried out with airflow signals, filters whose frequency range varies from 0.01 ... Fig. 2 shows airflow signals filtered by bandpass filters with various frequency ranges.
As seen in Fig. 2, the areas in the red circles are apneic events, and they are seen more clearly in certain frequency ranges.
Generally, airflow signals are expected to show a sinusoidal pattern. The period of these signals represents the respiratory cycle, and the amplitudes are always normalized between -1 and 1 [8]. As usual, all the signals used in our work were normalized between -1 and 1.
After the filtering and normalization processes, both ... For the MIT-BIH database, although some episodes included airflow cessation, they were ignored because the patient was awake at that time. Airflow cessations are not clinically valid for apnea scoring when the patient is awake.
At the end of the pre-processing stage, 2513 and 1699 episodes with 1-minute length were created from Apnea-ECG database and MIT-BIH database, respectively.
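The normalization and segmentation steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code used in the study: the sampling rate `fs`, the helper names `normalize` and `segment_minutes`, and the synthetic sine-wave "airflow" are all hypothetical, and the bandpass filtering step is omitted.

```python
import math

def normalize(signal):
    """Linearly rescale a signal to the range [-1, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:  # flat signal: nothing to scale
        return [0.0 for _ in signal]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]

def segment_minutes(signal, fs, seconds=60):
    """Split a signal into non-overlapping episodes of `seconds` length.

    A trailing partial episode is dropped.
    """
    n = int(fs * seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

# Synthetic 3.5-minute signal: assumed 10 Hz sampling, 0.25 Hz breathing
fs = 10
raw = [3.0 * math.sin(2 * math.pi * 0.25 * k / fs) + 1.0
       for k in range(int(3.5 * 60 * fs))]

episodes = segment_minutes(normalize(raw), fs)
print(len(episodes), len(episodes[0]))
```

In this sketch the whole record is normalized before it is cut into episodes; normalizing each episode separately would be an equally plausible reading of the text.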

Feature Extraction
When apneic events occur at night, airflow signals exhibit different characteristics. To separate apneic episodes from normal ones, these changing characteristics must be specified. In this study, the classic feature set, the amplitude feature set and the descriptive model feature set were created to define different characteristics of the airflow signals for both databases separately. Subsequently, the features that were more sensitive to apneic events were identified.

Classic Features Set (Set-1)
Set-1 generally includes descriptive statistic ... Table 1. The "//" sign in the tables separates features. For example, S9 represents the mean absolute deviation, and S10 represents the median absolute deviation.
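As an illustration of the two Set-1 features named above, the following sketch computes a mean absolute deviation and a median absolute deviation for one episode. The exact definitions used in the study are not given in the text, so the common definitions (deviations taken around the mean and around the median, respectively) are assumed here; the function names and sample values are hypothetical.

```python
import statistics

def mean_absolute_deviation(x):
    """Mean of the absolute deviations from the mean (assumed definition)."""
    m = statistics.mean(x)
    return statistics.mean([abs(v - m) for v in x])

def median_absolute_deviation(x):
    """Median of the absolute deviations from the median (assumed definition)."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])

# Toy episode with one outlying sample
episode = [1, 2, 3, 4, 10]
s9 = mean_absolute_deviation(episode)
s10 = median_absolute_deviation(episode)
print(s9, s10)
```

The median-based measure is far less affected by the outlying sample, which is one reason both statistics can carry different information about an episode.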

Amplitude Features (Set-2)
According to AASM criteria [12] and previous studies, the amplitudes [3,15] or peaks [13] of airflow signals vary during apneic events. An event is called an apnea if there is a decrease in the peak or amplitude of the signal by ≥90% [12]. AASM guidelines [12] define an event as a hypopnea if there is a decrease in the peak or amplitude of the signal by ≥50% or by ≥30%. This decrease should be accompanied by a 3% or 4% oxygen desaturation. These reductions must also last for at least 10 seconds. Based on these definitions of apneic events, Set-2 was created. Table 2 shows the features of Set-2.
To extract the amplitude features, the peak and trough points of the airflow signal and their corresponding times were first determined for each episode, as shown in Fig. 3. Then, features A1-A15 were calculated using these peak and trough points.
Amplitude was calculated as shown in Equation (1).
Here p(i) and t(i) are the ith peak and ith trough, respectively.

TABLE 2. FEATURES OF SET-2
The total number of signal peak values less than 70% of the baseline peak
The total number of signal peak values less than 50% of the baseline peak
The total number of signal peak values less than 10% of the baseline peak
(The baseline peak is the mean value of the highest 20% of the signal peaks)
The total number of signal amplitude values less than 70% of the baseline amplitude
The total number of signal amplitude values less than 50% of the baseline amplitude
The total number of signal amplitude values less than 10% of the baseline amplitude
(The baseline amplitude is the mean value of the highest 20% of the signal amplitudes)
Mean // Maximum of time between peaks higher than 70% of the baseline peak
Mean // Maximum of time between peaks higher than 50% of the baseline peak
Mean // Maximum of time between peaks higher than 10% of the baseline peak
A14: Average of absolute differences between two successive amplitudes of the 60 sec signal
A15: Average of absolute differences between two successive mean values of the 60 sec signal over 4 second intervals
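The count-based features and A14 listed in Table 2 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the helper names, the strict local-extremum rule used to find peaks and troughs, the toy signal, and in particular the amplitude definition (taken here as the difference p(i) - t(i) between the ith peak and the ith trough) are assumptions.

```python
def peaks_and_troughs(x):
    """Indices of strict local maxima and minima of a sampled signal."""
    peaks, troughs = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            peaks.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            troughs.append(i)
    return peaks, troughs

def baseline(values, top_fraction=0.2):
    """Mean of the highest `top_fraction` of the values (at least one value)."""
    k = max(1, round(len(values) * top_fraction))
    return sum(sorted(values, reverse=True)[:k]) / k

def count_below(values, fraction, base):
    """Number of values smaller than `fraction` times the baseline."""
    return sum(1 for v in values if v < fraction * base)

# Toy episode: breathing amplitude collapses in the middle, then recovers
signal = [0, 1.0, -1.0, 1.0, -1.0, 0.6, -0.6, 0.05, -0.05,
          0.4, -0.4, 1.0, -1.0, 0]
p_idx, t_idx = peaks_and_troughs(signal)
peaks = [signal[i] for i in p_idx]

# Assumed amplitude: ith peak minus ith trough
amplitudes = [signal[p] - signal[t] for p, t in zip(p_idx, t_idx)]

thresholds = (0.7, 0.5, 0.1)
peak_counts = [count_below(peaks, f, baseline(peaks)) for f in thresholds]
amp_counts = [count_below(amplitudes, f, baseline(amplitudes)) for f in thresholds]

# A14-style feature: mean absolute difference of successive amplitudes
mean_abs_diff = sum(abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])) \
    / (len(amplitudes) - 1)
print(peak_counts, amp_counts, mean_abs_diff)
```

On real recordings a more robust peak detector would be needed, since noise produces many spurious local extrema.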

Descriptive Model Features (Set-3)
Nowadays, probability distributions play an important role in various research areas [18]. A probability distribution is a mathematical function that gives the probability of each value of a variable, or the probability that the variable falls in a particular interval [19]. The most frequently used models, the parametric normal (Gaussian) model and the non-parametric kernel model, were preferred in this study because data can be described efficiently by a model consisting of probability distributions, which facilitates analysis and classification [20][21][22][23]. In addition to these models, the histogram model was used, since histograms give a graphical representation of the data distribution. These three models were applied to the peak values of every 1-minute signal episode. Fig. 5 summarizes this process. The upper part of Fig. 5 shows the normal, histogram and kernel models of an apneic episode. The lower part of Fig. 5 shows the kernel functions for both normal and apneic episodes.
The calculation of the M11 and M12 features mentioned in Table 3 can also be seen in the lower part of Fig. 5. Table 3 shows the features created by the descriptive models.
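The three descriptive models can be sketched for the peak values of one episode as follows. This is a minimal illustration under stated assumptions: the Gaussian kernel, the fixed bandwidth `h`, the number of histogram bins and the sample peak values are hypothetical choices, and the sketch does not reproduce the specific features of Table 3.

```python
import math
import statistics

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def histogram(values, bins, lo, hi):
    """Counts of values in `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in last bin
        counts[idx] += 1
    return counts

def kernel_density(values, h):
    """Kernel density estimate with a Gaussian kernel of bandwidth h."""
    n = len(values)
    return lambda x: sum(normal_pdf(x, v, h) for v in values) / n

# Hypothetical normalized peak values of one episode
peaks = [0.9, 0.95, 1.0, 0.85, 0.3, 0.2, 0.9, 0.25]

mu, sigma = statistics.mean(peaks), statistics.stdev(peaks)  # normal model
counts = histogram(peaks, bins=4, lo=0.0, hi=1.0)             # histogram model
kde = kernel_density(peaks, h=0.1)                            # kernel model
print(mu, sigma, counts, kde(0.9), kde(0.6))
```

With these values the kernel estimate is bimodal (a cluster of high peaks and a cluster of low peaks), a shape that a single normal model cannot capture.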

Feature Selection
Determining the features that represent the data better than others improves the discrimination of apneic episodes from normal ones [3]. In this study, 54 features were extracted in different categories, and the best features had to be determined to obtain the best performance. For this purpose, the OneR Attribute Eval feature selection algorithm, which calculates the value of a feature using the OneR classification algorithm, was used [24]. The OneR ...
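The idea of scoring each feature with a one-rule classifier can be sketched as follows. This is a simplified illustration, not Weka's implementation: equal-width binning, scoring on the training data itself, and the names `one_r_score` and `rank_features` are assumptions made for this sketch, and the feature values are invented.

```python
from collections import Counter, defaultdict

def one_r_score(values, labels, bins=3):
    """Accuracy of a one-rule classifier built on a single numeric feature.

    The feature is cut into equal-width bins and each bin predicts its
    majority class; the score is the fraction of correct predictions.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
    per_bin = defaultdict(Counter)
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)
        per_bin[b][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in per_bin.values())
    return correct / len(labels)

def rank_features(table, labels, bins=3):
    """Sort features by their one-rule score, best first."""
    scores = {name: one_r_score(col, labels, bins) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented data: 'A' = apneic minute, 'N' = normal minute
labels = ["A", "A", "A", "N", "N", "N"]
table = {
    "f_noise": [0.1, 0.9, 0.5, 0.2, 0.8, 0.5],    # unrelated to the label
    "f_good": [0.1, 0.2, 0.15, 0.9, 0.8, 0.95],   # separates the classes
}
ranking = rank_features(table, labels)
print(ranking)
```

Keeping the top-ranked features from such a list corresponds to the selection of 28 and 31 features out of 54 reported in this study.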

Classification
The classification process was carried out to detect apneic episodes of entire airflow signals using the RF classifier in Weka 3.9 [26]. The RF algorithm was proposed by Breiman [27]. It is an ensemble of decision tree classifiers [1]. RF has many advantages, such as speed, robustness to noise and outliers, and resistance to overfitting [1,28]. Moreover, the algorithm has very few parameters to be set, such as the number of features (m) to be used at each node and the number of trees (N) to be created [29]. In the literature, RF has also shown successful results for two-class problems [28][29][30]. Owing to these properties, the RF classification algorithm was found suitable for this study and was used to classify the 1-minute airflow signal episodes.
CA (Classification Accuracy), Prec (Precision), recall, the kappa statistic and the area under the ROC curve were used to evaluate the performance of the classifier and the study [31].

EXPERIMENTAL RESULTS
The aim of this study was to detect the apneic episodes of all airflow signals with RF using two different databases. To achieve this aim, the pre-processing, feature extraction, feature selection and classification stages were carried out.
In the pre-processing stage, filtering and segmentation were carried out. After the pre-processing stage, 2513 episodes from the Apnea-ECG database [9] and 1699 episodes from the MIT-BIH database [10] were ... Confusion matrices are also shown in Table 4. As the confusion matrices in Table 4 show, the total numbers of apneic and normal episodes are 1608 and 905, respectively. In the classification with the selected 28 features, only 46 normal and 49 apneic episodes were misclassified; 2418 episodes were correctly classified.
The number of misclassifications with the previous 4 feature sets is higher. Consequently, Table 4 shows that the best performance was obtained with the selected features when all results in Table 4 are compared in terms of all evaluation criteria.
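As a worked check, the evaluation measures can be recomputed from the counts reported for the 28 selected features on the Apnea-ECG database. The sketch below assumes that "apneic" is the positive class, so that the 49 misclassified apneic episodes are false negatives and the 46 misclassified normal episodes are false positives; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, from the row and column totals
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "kappa": kappa}

# 1608 apneic episodes (49 misclassified), 905 normal episodes (46 misclassified)
m = binary_metrics(tp=1608 - 49, fn=49, fp=46, tn=905 - 46)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumptions the accuracy is 2418/2513 and the kappa value is about 0.918, in line with the reported classification accuracy and kappa for the Apnea-ECG database.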
The classification results of Set-1, Set-2, Set-3, all features, the selected 28 features and the selected 31 features for the MIT-BIH database with RF are shown in Table 5.
According to Table 7 ... Literature studies, the results obtained from these studies and the comparison with this study are shown in ... do not contain apneic events. These signals are not important for our study. Therefore, 5 airflow signals were preferred. We think that the difference between the CAs is due to the number of signals used.