
Approximate entropy

In statistics, approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations in time-series data.[1] For example, consider two series of data:

Series A: (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...), which alternates 0 and 1.
Series B: (0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, ...), which has either a value of 0 or 1, chosen randomly, each with probability 1/2.

Moment statistics, such as mean and variance, will not distinguish between these two series. Nor will rank order statistics distinguish between these series. Yet series A is perfectly regular: knowing a term has the value of 1 enables one to predict with certainty that the next term will have the value of 0. In contrast, series B is randomly valued: knowing a term has the value of 1 gives no insight into what value the next term will have.

Regularity was originally measured by exact regularity statistics, which have mainly centered on various entropy measures.[1] However, accurate entropy calculation requires vast amounts of data, and the results are greatly influenced by system noise,[2] so it is not practical to apply these methods to experimental data. ApEn was developed by Steve M. Pincus to handle these limitations by modifying an exact regularity statistic, Kolmogorov–Sinai entropy. ApEn was initially developed to analyze medical data, such as heart rate,[1] and its applications later spread to finance,[3] physiology,[4] human factors engineering,[5] and climate sciences.[6]

Algorithm

A comprehensive step-by-step tutorial with an explanation of the theoretical foundations of Approximate Entropy is available.[7] The algorithm is:

Step 1
Assume a time series of data $u(1), u(2), \ldots, u(N)$. These are $N$ raw data values from measurements equally spaced in time.
Step 2
Let $m \in \mathbb{Z}^{+}$ be a positive integer, with $m \leq N$, which represents the length of a run of data (essentially a window).
Let $r \in \mathbb{R}^{+}$ be a positive real number, which specifies a filtering level.
Let $n = N - m + 1$.
Step 3
Define $\mathbf{x}(i) = \big(u(i), u(i+1), \ldots, u(i+m-1)\big)$ for each $i$ where $1 \leq i \leq n$. In other words, $\mathbf{x}(i)$ is an $m$-dimensional vector that contains the run of data starting with $u(i)$.
Define the distance between two vectors $\mathbf{x}(i)$ and $\mathbf{x}(j)$ as the maximum of the distances between their respective components, given by

$$d[\mathbf{x}(i), \mathbf{x}(j)] = \max_{k} \big( |\mathbf{x}(i)_k - \mathbf{x}(j)_k| \big) = \max_{k} \big( |u(i+k-1) - u(j+k-1)| \big)$$

for $1 \leq k \leq m$.
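
For concreteness, the maximum-norm distance above is the Chebyshev distance between two template vectors. A minimal Python sketch (the helper name maxdist is illustrative, not part of the original algorithm):

import numpy as np

def maxdist(x_i, x_j):
    # Chebyshev distance: the largest componentwise absolute difference.
    return float(np.max(np.abs(np.asarray(x_i) - np.asarray(x_j))))
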
Step 4
Define a count $C_i^m$ as

$$C_i^m(r) = \frac{\text{number of } j \text{ such that } d[\mathbf{x}(i), \mathbf{x}(j)] \leq r}{n}$$

for each $i$ where $1 \leq i, j \leq n$. Note that since $j$ takes on all values between $1$ and $n$, the match will be counted when $j = i$ (i.e. when the test subsequence, $\mathbf{x}(j)$, is matched against itself, $\mathbf{x}(i)$).
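
In code, this count amounts to the fraction of template vectors lying within the filtering level of $\mathbf{x}(i)$, self-match included. A sketch, reusing the illustrative maxdist helper above:

def count_ratio(x, i, r):
    # C_i^m(r): fraction of template vectors within distance r of x[i];
    # the self-match at j = i is included in the count.
    return sum(maxdist(x[i], x_j) <= r for x_j in x) / len(x)
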
Step 5
Define

$$\phi^m(r) = \frac{1}{n} \sum_{i=1}^{n} \log\big(C_i^m(r)\big)$$

where $\log$ is the natural logarithm, for a fixed $m$, $r$, and $n$ as set in Step 2.
Step 6
Define approximate entropy ($\mathrm{ApEn}$) as

$$\mathrm{ApEn}(m, r, N)(u) = \phi^m(r) - \phi^{m+1}(r)$$
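
Steps 3 through 6 combine into a few lines of Python. The sketch below reuses the illustrative helpers above; a fuller, self-contained implementation appears in the Python implementation section.

import numpy as np

def phi(U, m, r):
    # Step 5: average the natural log of the match ratios C_i^m(r)
    # over the n = N - m + 1 template vectors.
    n = len(U) - m + 1
    x = [U[i:i + m] for i in range(n)]
    return sum(np.log(count_ratio(x, i, r)) for i in range(n)) / n

def apen(U, m, r):
    # Step 6: ApEn(m, r, N)(u) = phi^m(r) - phi^(m+1)(r).
    return phi(U, m, r) - phi(U, m + 1, r)
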
Parameter selection
Typically, choose $m = 2$ or $m = 3$, whereas $r$ depends greatly on the application.

An implementation on Physionet,[8] which is based on Pincus,[2] uses $d[\mathbf{x}(i), \mathbf{x}(j)] < r$ instead of $d[\mathbf{x}(i), \mathbf{x}(j)] \leq r$ in Step 4. While this difference matters for artificially constructed examples, it is usually not a concern in practice.
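
The choice of $r$ is left application-dependent above. A common heuristic in the literature, offered here as an assumption rather than a rule from this article, is to scale $r$ to the variability of the data, typically 0.1 to 0.25 times the sample standard deviation:

import numpy as np

def choose_r(U, fraction=0.2):
    # Heuristic filtering level: a fixed fraction of the sample standard
    # deviation; fractions of roughly 0.1 to 0.25 are commonly used.
    return fraction * float(np.std(U))
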

Example

[Figure: Illustration of the heart rate sequence.]

Consider a sequence of $N = 51$ samples of heart rate equally spaced in time:

$$S_N = \{85, 80, 89, 85, 80, 89, \ldots\}$$

Note the sequence is periodic with a period of 3. Let's choose $m = 2$ and $r = 3$ (the values of $m$ and $r$ can be varied without affecting the result).

Form a sequence of vectors:

$$\begin{aligned} \mathbf{x}(1) &= [u(1), u(2)] = [85, 80] \\ \mathbf{x}(2) &= [u(2), u(3)] = [80, 89] \\ \mathbf{x}(3) &= [u(3), u(4)] = [89, 85] \\ \mathbf{x}(4) &= [u(4), u(5)] = [85, 80] \\ &\;\vdots \end{aligned}$$

Distance is calculated repeatedly as follows. In the first calculation,

$$d[\mathbf{x}(1), \mathbf{x}(1)] = \max_{k} |\mathbf{x}(1)_k - \mathbf{x}(1)_k| = 0,$$

which is less than $r$.

In the second calculation, note that $|u(2) - u(3)| > |u(1) - u(2)|$, so

$$d[\mathbf{x}(1), \mathbf{x}(2)] = \max_{k} |\mathbf{x}(1)_k - \mathbf{x}(2)_k| = |u(2) - u(3)| = 9,$$

which is greater than $r$.

Similarly,

$$\begin{aligned} d[\mathbf{x}(1), \mathbf{x}(3)] &= |u(2) - u(4)| = 5 > r \\ d[\mathbf{x}(1), \mathbf{x}(4)] &= |u(1) - u(4)| = |u(2) - u(5)| = 0 < r \\ &\;\vdots \\ d[\mathbf{x}(1), \mathbf{x}(j)] &= \cdots \\ &\;\vdots \end{aligned}$$

The result is a total of 17 terms $\mathbf{x}(j)$ such that $d[\mathbf{x}(1), \mathbf{x}(j)] \leq r$. These include $\mathbf{x}(1), \mathbf{x}(4), \mathbf{x}(7), \ldots, \mathbf{x}(49)$. In these cases, $C_i^m(r)$ is

$$C_1^2(3) = \frac{17}{50}$$
$$C_2^2(3) = \frac{17}{50}$$
$$C_3^2(3) = \frac{16}{50}$$
$$C_4^2(3) = \frac{17}{50} \;\cdots$$

Note in Step 4, $1 \leq i \leq n$ for $\mathbf{x}(i)$. So the terms $\mathbf{x}(j)$ such that $d[\mathbf{x}(3), \mathbf{x}(j)] \leq r$ include $\mathbf{x}(3), \mathbf{x}(6), \mathbf{x}(9), \ldots, \mathbf{x}(48)$, and the total number is 16.

At the end of these calculations, we have

$$\phi^2(3) = \frac{1}{50} \sum_{i=1}^{50} \log\big(C_i^2(3)\big) \approx -1.0982$$

Then we repeat the above steps for $m = 3$. First form a sequence of vectors:

$$\begin{aligned} \mathbf{x}(1) &= [u(1), u(2), u(3)] = [85, 80, 89] \\ \mathbf{x}(2) &= [u(2), u(3), u(4)] = [80, 89, 85] \\ \mathbf{x}(3) &= [u(3), u(4), u(5)] = [89, 85, 80] \\ \mathbf{x}(4) &= [u(4), u(5), u(6)] = [85, 80, 89] \\ &\;\vdots \end{aligned}$$

By calculating distances between vectors $\mathbf{x}(i), \mathbf{x}(j)$, $1 \leq i \leq 49$, we find the vectors satisfying the filtering level have the following characteristic:

$$d[\mathbf{x}(i), \mathbf{x}(i+3)] = 0 < r$$

Therefore,

$$C_1^3(3) = \frac{17}{49}$$
$$C_2^3(3) = \frac{16}{49}$$
$$C_3^3(3) = \frac{16}{49}$$
$$C_4^3(3) = \frac{17}{49} \;\cdots$$

At the end of these calculations, we have

$$\phi^3(3) = \frac{1}{49} \sum_{i=1}^{49} \log\big(C_i^3(3)\big) \approx -1.0982$$

Finally,

$$\mathrm{ApEn} = \phi^2(3) - \phi^3(3) \approx 0.000010997$$

The value is very small, implying the sequence is regular and predictable, which is consistent with the observation.

Python implementation

import numpy as np


def ApEn(U, m, r) -> float:
    """Approximate entropy of the series U for run length m and filtering level r."""

    def _maxdist(x_i, x_j):
        # Maximum (Chebyshev) distance between two template vectors.
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        # Build the N - m + 1 template vectors of length m, compute the match
        # ratios C_i^m(r) with self-matches included, then average their logs.
        x = [[U[j] for j in range(i, i + m)] for i in range(N - m + 1)]
        C = [
            len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0)
            for x_i in x
        ]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m + 1) - _phi(m))

Usage example:

>>> U = np.array([85, 80, 89] * 17)
>>> print(ApEn(U, 2, 3))
1.0996541105257052e-05

>>> randU = np.random.choice([85, 80, 89], size=17*3)
>>> print(ApEn(randU, 2, 3))
0.8626664154888908

MATLAB implementation

  • Fast Approximate Entropy from MATLAB Central
  • approximateEntropy

Interpretation

The presence of repetitive patterns of fluctuation in a time series renders it more predictable than a time series in which such patterns are absent. ApEn reflects the likelihood that similar patterns of observations will not be followed by additional similar observations.[9] A time series containing many repetitive patterns has a relatively small ApEn; a less predictable process has a higher ApEn.

Advantages

The advantages of ApEn include:[2]

  • Lower computational demand. ApEn can be designed to work for small data samples ($N < 50$ points) and can be applied in real time.
  • Less effect from noise. If data is noisy, the ApEn measure can be compared to the noise level in the data to determine what quality of true information may be present in the data.

Limitations

The ApEn algorithm counts each sequence as matching itself to avoid the occurrence of $\log(0)$ in the calculations. This step might introduce bias in ApEn, which causes ApEn to have two poor properties in practice:[10]

  1. ApEn is heavily dependent on the record length and is uniformly lower than expected for short records.
  2. It lacks relative consistency. That is, if ApEn of one data set is higher than that of another, it should, but does not, remain higher for all conditions tested.
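
Sample entropy, introduced in the reference above,[10] addresses this bias by excluding self-matches from the counts. A minimal sketch of that variant, with illustrative names and no claim to match any particular library:

import numpy as np

def SampEn(U, m, r):
    # Sample entropy (Richman & Moorman, 2000): count matching template
    # pairs of lengths m and m + 1, excluding self-matches (j > i only),
    # then take -log of the ratio of the two counts.
    U = np.asarray(U, dtype=float)
    N = len(U)

    def count_pairs(mm):
        # Use the same N - m starting indices for both template lengths.
        x = [U[i:i + mm] for i in range(N - m)]
        return sum(
            np.max(np.abs(x[i] - x[j])) <= r
            for i in range(len(x))
            for j in range(i + 1, len(x))
        )

    return -np.log(count_pairs(m + 1) / count_pairs(m))
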

Applications

ApEn has been applied to classify electroencephalography (EEG) in psychiatric diseases, such as schizophrenia,[11] epilepsy,[12] and addiction.[13]

See also

  • Recurrence quantification analysis
  • Sample entropy

References

  1. ^ a b c Pincus, S. M.; Gladstone, I. M.; Ehrenkranz, R. A. (1991). "A regularity statistic for medical data analysis". Journal of Clinical Monitoring and Computing. 7 (4): 335–345. doi:10.1007/BF01619355. PMID 1744678. S2CID 23455856.
  2. ^ a b c Pincus, S. M. (1991). "Approximate entropy as a measure of system complexity". Proceedings of the National Academy of Sciences. 88 (6): 2297–2301. Bibcode:1991PNAS...88.2297P. doi:10.1073/pnas.88.6.2297. PMC 51218. PMID 11607165.
  3. ^ Pincus, S.M.; Kalman, E.K. (2004). "Irregularity, volatility, risk, and financial market time series". Proceedings of the National Academy of Sciences. 101 (38): 13709–13714. Bibcode:2004PNAS..10113709P. doi:10.1073/pnas.0405168101. PMC 518821. PMID 15358860.
  4. ^ Pincus, S.M.; Goldberger, A.L. (1994). "Physiological time-series analysis: what does regularity quantify?". The American Journal of Physiology. 266 (4): 1643–1656. doi:10.1152/ajpheart.1994.266.4.H1643. PMID 8184944. S2CID 362684.
  5. ^ McKinley, R.A.; McIntire, L.K.; Schmidt, R; Repperger, D.W.; Caldwell, J.A. (2011). "Evaluation of Eye Metrics as a Detector of Fatigue". Human Factors. 53 (4): 403–414. doi:10.1177/0018720811411297. PMID 21901937. S2CID 109251681.
  6. ^ Delgado-Bonal, Alfonso; Marshak, Alexander; Yang, Yuekui; Holdaway, Daniel (2020-01-22). "Analyzing changes in the complexity of climate in the last four decades using MERRA-2 radiation data". Scientific Reports. 10 (1): 922. Bibcode:2020NatSR..10..922D. doi:10.1038/s41598-020-57917-8. ISSN 2045-2322. PMC 6976651. PMID 31969616.
  7. ^ Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. Bibcode:2019Entrp..21..541D. doi:10.3390/e21060541. PMC 7515030. PMID 33267255.
  8. ^ "PhysioNet". Archived from the original on 2012-06-18. Retrieved 2012-07-04.
  9. ^ Ho, K. K.; Moody, G. B.; Peng, C.K.; Mietus, J. E.; Larson, M. G.; Levy, D.; Goldberger, A. L. (1997). "Predicting survival in heart failure case and control subjects by use of fully automated methods for deriving nonlinear and conventional indices of heart rate dynamics". Circulation. 96 (3): 842–848. doi:10.1161/01.cir.96.3.842. PMID 9264491.
  10. ^ Richman, J.S.; Moorman, J.R. (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology. 278 (6): 2039–2049. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903. S2CID 2389971.
  11. ^ Sabeti, Malihe (2009). "Entropy and complexity measures for EEG signal classification of schizophrenic and control participants". Artificial Intelligence in Medicine. 47 (3): 263–274. doi:10.1016/j.artmed.2009.03.003. PMID 19403281.
  12. ^ Yuan, Qi (2011). "Epileptic EEG classification based on extreme learning machine and nonlinear features". Epilepsy Research. 96 (1–2): 29–38. doi:10.1016/j.eplepsyres.2011.04.013. PMID 21616643. S2CID 41730913.
  13. ^ Yun, Kyongsik (2012). "Decreased cortical complexity in methamphetamine abusers". Psychiatry Research: Neuroimaging. 201 (3): 226–32. doi:10.1016/j.pscychresns.2011.07.009. PMID 22445216. S2CID 30670300.
