
Independent component analysis

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other.[1] ICA is a special case of blind source separation. A common example application is the "cocktail party problem" of listening in on one person's speech in a noisy room.[2]

Introduction

ICA on four randomly mixed videos.[3] Top row: The original source videos. Middle row: Four random mixtures used as input to the algorithm. Bottom row: The reconstructed videos.

Independent component analysis attempts to decompose a multivariate signal into independent non-Gaussian signals. As an example, sound is usually a signal that is composed of the numerical addition, at each time t, of signals from several sources. The question then is whether it is possible to separate these contributing sources from the observed total signal. When the statistical independence assumption is correct, blind ICA separation of a mixed signal gives very good results.[citation needed] It is also used for signals that are not supposed to be generated by mixing for analysis purposes.

A simple application of ICA is the "cocktail party problem", where the underlying speech signals are separated from sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays or echoes. Note that a filtered and delayed signal is a copy of a dependent component, and thus the statistical independence assumption is not violated.

Mixing weights for constructing the M observed signals from the N components can be placed in an M × N matrix. An important thing to consider is that if N sources are present, at least N observations (e.g. microphones if the observed signal is audio) are needed to recover the original signals. When there are an equal number of observations and source signals, the mixing matrix is square (M = N). Other cases of underdetermined (M < N) and overdetermined (M > N) mixing have been investigated.

The success of ICA separation of mixed signals relies on two assumptions and three effects of mixing source signals. Two assumptions:

  1. The source signals are independent of each other.
  2. The values in each source signal have non-Gaussian distributions.

Three effects of mixing source signals:

  1. Independence: As per assumption 1, the source signals are independent; however, their signal mixtures are not. This is because the signal mixtures share the same source signals.
  2. Normality: According to the Central Limit Theorem, the distribution of a sum of independent random variables with finite variance tends towards a Gaussian distribution.
    Loosely speaking, a sum of two independent random variables usually has a distribution that is closer to Gaussian than any of the two original variables. Here we consider the value of each signal as the random variable.
  3. Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal.

Those principles contribute to the basic establishment of ICA. If the signals extracted from a set of mixtures are independent and have non-Gaussian distributions or have low complexity, then they must be source signals.[4][5]

Defining component independence

ICA finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. We may choose one of many ways to define a proxy for independence, and this choice governs the form of the ICA algorithm. The two broadest definitions of independence for ICA are

  1. Minimization of mutual information
  2. Maximization of non-Gaussianity

The minimization-of-mutual-information (MMI) family of ICA algorithms uses measures such as Kullback–Leibler divergence and maximum entropy. The non-Gaussianity family of ICA algorithms, motivated by the central limit theorem, uses kurtosis and negentropy.

Typical algorithms for ICA use centering (subtract the mean to create a zero mean signal), whitening (usually with the eigenvalue decomposition), and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm. Whitening and dimension reduction can be achieved with principal component analysis or singular value decomposition. Whitening ensures that all dimensions are treated equally a priori before the algorithm is run. Well-known algorithms for ICA include infomax, FastICA, JADE, and kernel-independent component analysis, among others. In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, nor the proper scaling (including sign) of the source signals.
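
As a rough sketch of these preprocessing steps (not any particular package's implementation), the following Python/NumPy snippet centers a data matrix X of shape (samples, features) and whitens it via the eigenvalue decomposition of its covariance matrix; the example data are synthetic.

```python
import numpy as np

def center_and_whiten(X):
    """Center and whiten data X of shape (n_samples, n_features).

    Returns the whitened data Z (identity covariance) and the
    whitening matrix V such that Z = (X - mean) @ V.T.
    """
    X_centered = X - X.mean(axis=0)           # centering: zero-mean columns
    cov = np.cov(X_centered, rowvar=False)    # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalue decomposition
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # whitening matrix
    Z = X_centered @ V.T
    return Z, V

# Example: two correlated signals become uncorrelated with unit variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])
Z, V = center_and_whiten(X)
print(np.round(np.cov(Z, rowvar=False), 2))   # approximately the identity matrix
```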

ICA is important to blind signal separation and has many practical applications. It is closely related to (or even a special case of) the search for a factorial code of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), but the code components are statistically independent.

Mathematical definitions

Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.

General definition

The data are represented by the observed random vector x = (x_1, ..., x_m)^T and the hidden components as the random vector s = (s_1, ..., s_n)^T. The task is to transform the observed data x, using a linear static transformation W, as s = Wx, into a vector of maximally independent components s, measured by some function F(s_1, ..., s_n) of independence.

Generative model

Linear noiseless ICA

The components x_i of the observed random vector x = (x_1, ..., x_m)^T are generated as a sum of the independent components s_k, k = 1, ..., n:

x_i = a_{i,1} s_1 + ... + a_{i,k} s_k + ... + a_{i,n} s_n

weighted by the mixing weights a_{i,k}.

The same generative model can be written in vector form as x = Σ_{k=1}^{n} s_k a_k, where the observed random vector x is represented by the basis vectors a_k = (a_{1,k}, ..., a_{m,k})^T. The basis vectors a_k form the columns of the mixing matrix A = (a_1, ..., a_n), and the generative formula can be written as x = As, where s = (s_1, ..., s_n)^T.

Given the model and realizations (samples) x_1, ..., x_N of the random vector x, the task is to estimate both the mixing matrix A and the sources s. This is done by adaptively calculating the w vectors and setting up a cost function which either maximizes the non-Gaussianity of the calculated s_k = w^T x or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.

The original sources s can be recovered by multiplying the observed signals x with the inverse of the mixing matrix W = A^{-1}, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square (n = m). If the number of basis vectors is greater than the dimensionality of the observed vectors, n > m, the task is overcomplete but is still solvable with the pseudo-inverse.
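
A minimal numerical sketch of this generative model, under the simplifying assumption that the square mixing matrix A is known (in practice A is unknown and W must be estimated, for example by FastICA): the sources are mixed as x = As and recovered exactly with W = A^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 2000

# Two non-Gaussian sources (uniform and Laplacian), shape (n, n_samples).
s = np.vstack([rng.uniform(-1, 1, n_samples),
               rng.laplace(size=n_samples)])

# Square mixing matrix A (assumed known here only for illustration).
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
x = A @ s                      # observed mixtures, x = A s

W = np.linalg.inv(A)           # unmixing matrix W = A^{-1}
s_hat = W @ x                  # recovered sources
print(np.allclose(s_hat, s))   # True: exact recovery when A is known
```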

Linear noisy ICA

With the added assumption of zero-mean and uncorrelated Gaussian noise n ~ N(0, diag(Σ)), the ICA model takes the form x = As + n.

Nonlinear ICA

The mixing of the sources does not need to be linear. Using a nonlinear mixing function f(·|θ) with parameters θ, the nonlinear ICA model is x = f(s|θ) + n.

Identifiability

The independent components are identifiable up to a permutation and scaling of the sources.[6] This identifiability requires that:

  • At most one of the sources s_k is Gaussian,
  • The number of observed mixtures, m, must be at least as large as the number of estimated components n: m ≥ n. Equivalently, the mixing matrix A must be of full rank for its inverse to exist.

Binary ICA

A special variant of ICA is binary ICA in which both signal sources and monitors are in binary form and observations from monitors are disjunctive mixtures of binary independent sources. The problem was shown to have applications in many domains including medical diagnosis, multi-cluster assignment, network tomography and internet resource management.

Let x_1, x_2, ..., x_m be the set of binary variables from m monitors and y_1, y_2, ..., y_n be the set of binary variables from n sources. Source-monitor connections are represented by the (unknown) mixing matrix G, where g_{ij} = 1 indicates that the signal from the j-th source can be observed by the i-th monitor. The system works as follows: at any time, if a source j is active (y_j = 1) and it is connected to the monitor i (g_{ij} = 1), then monitor i will observe some activity (x_i = 1). Formally we have:

x_i = ⋁_{j=1}^{n} (g_{ij} ∧ y_j),   i = 1, 2, ..., m

where ∧ is Boolean AND and ∨ is Boolean OR. Noise is not explicitly modelled; rather, it can be treated as independent sources.
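
The disjunctive mixing above is easy to simulate; the sketch below generates observations X from a small, made-up mixing matrix G and binary source matrix Y, implementing the OR-of-ANDs with an integer matrix product followed by thresholding.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 4, 3, 10                         # monitors, sources, time steps

G = rng.integers(0, 2, size=(m, n))        # g_ij = 1: monitor i sees source j
Y = rng.integers(0, 2, size=(n, T))        # y_j(t): binary source activity

# x_i(t) = OR_j ( g_ij AND y_j(t) ): the integer product counts active,
# connected sources, and thresholding at > 0 gives the Boolean OR.
X = ((G @ Y) > 0).astype(int)
print(X)
```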

The above problem can be heuristically solved[7] by assuming the variables are continuous and running FastICA on the binary observation data to get the mixing matrix G (real values), then applying rounding techniques to G to obtain the binary values. This approach has been shown to produce a highly inaccurate result.[citation needed]

Another method is to use dynamic programming: recursively breaking the observation matrix X into its sub-matrices and running the inference algorithm on these sub-matrices. The key observation which leads to this algorithm is the sub-matrix X^0 of X where x_{ij} = 0, ∀j, which corresponds to the unbiased observation matrix of hidden components that do not have a connection to the i-th monitor. Experimental results from[8] show that this approach is accurate under moderate noise levels.

The Generalized Binary ICA framework[9] introduces a broader problem formulation which does not require any knowledge of the generative model. In other words, this method attempts to decompose a source into its independent components (as much as possible, and without losing any information) with no prior assumption on the way it was generated. Although this problem appears quite complex, it can be accurately solved with a branch and bound search tree algorithm or tightly upper bounded with a single multiplication of a matrix with a vector.

Methods for blind source separation

Projection pursuit

Signal mixtures tend to have Gaussian probability density functions, and source signals tend to have non-Gaussian probability density functions. Each source signal can be extracted from a set of signal mixtures by taking the inner product of a weight vector and those signal mixtures where this inner product provides an orthogonal projection of the signal mixtures. The remaining challenge is finding such a weight vector. One type of method for doing so is projection pursuit.[10][11]

Projection pursuit seeks one projection at a time such that the extracted signal is as non-Gaussian as possible. This contrasts with ICA, which typically extracts M signals simultaneously from M signal mixtures, which requires estimating an M × M unmixing matrix. One practical advantage of projection pursuit over ICA is that fewer than M signals can be extracted if required, where each source signal is extracted from M signal mixtures using an M-element weight vector.

We can use kurtosis to recover the multiple source signals by finding the correct weight vectors with the use of projection pursuit.

The kurtosis of the probability density function of a signal, for a finite sample, is computed as

K = E[(y − ȳ)^4] / (E[(y − ȳ)^2])^2 − 3

where ȳ is the sample mean of y, the extracted signals. The constant 3 ensures that Gaussian signals have zero kurtosis, super-Gaussian signals have positive kurtosis, and sub-Gaussian signals have negative kurtosis. The denominator is the variance of y, and ensures that the measured kurtosis takes account of signal variance. The goal of projection pursuit is to maximize the kurtosis, and make the extracted signal as non-normal as possible.
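
A direct implementation of this sample kurtosis measure is a one-liner; the distributions used below are only illustrative checks of the sign convention.

```python
import numpy as np

def kurtosis(y):
    """Excess kurtosis: 0 for Gaussian, >0 super-Gaussian, <0 sub-Gaussian."""
    d = y - y.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
print(kurtosis(rng.normal(size=100000)))    # ~ 0    (Gaussian)
print(kurtosis(rng.laplace(size=100000)))   # ~ 3    (super-Gaussian)
print(kurtosis(rng.uniform(size=100000)))   # ~ -1.2 (sub-Gaussian)
```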

Using kurtosis as a measure of non-normality, we can now examine how the kurtosis of a signal y = w^T x extracted from a set of M mixtures x = (x_1, x_2, ..., x_M)^T varies as the weight vector w is rotated around the origin. Given our assumption that each source signal s is super-Gaussian, we would expect:

  1. the kurtosis of the extracted signal y to be maximal precisely when y = s.
  2. the kurtosis of the extracted signal y to be maximal when w is orthogonal to the projected axes S_1 or S_2, because we know the optimal weight vector should be orthogonal to a transformed axis S_1 or S_2.

For multiple source mixture signals, we can use kurtosis and Gram-Schmidt orthogonalization (GSO) to recover the signals. Given M signal mixtures in an M-dimensional space, GSO projects these data points onto an (M−1)-dimensional space by using the weight vector. We can guarantee the independence of the extracted signals with the use of GSO.

In order to find the correct value of w, we can use the gradient descent method. We first of all whiten the data, and transform x into a new mixture z, which has unit variance and z = (z_1, z_2, ..., z_M)^T. This process can be achieved by applying singular value decomposition to x,

x = U D V^T

Rescaling each vector U_i = U_i / E(U_i^2), and letting z = U, the signal extracted by a weight vector w is y = w^T z. If the weight vector w has unit length, then the variance of y is also 1, that is E[(w^T z)^2] = 1. The kurtosis can thus be written as:

K = E[y^4] / (E[y^2])^2 − 3 = E[(w^T z)^4] − 3

The updating process for w is:

w_new = w_old + η E[z (w_old^T z)^3]

where η is a small constant that guarantees w converges to the optimal solution. After each update, we normalize w_new = w_new / |w_new|, set w_old = w_new, and repeat the updating process until convergence. We can also use another algorithm to update the weight vector w.
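
A sketch of this procedure on whitened mixtures: gradient ascent on kurtosis with renormalization of w after each step, extracting a single source (further sources would be obtained by Gram-Schmidt deflation against the vectors already found). The step size, iteration count, and the Laplacian demo sources are illustrative choices, not prescribed by the text.

```python
import numpy as np

def extract_one(Z, eta=0.1, n_iter=500, seed=0):
    """Gradient ascent on kurtosis for whitened data Z of shape (M, T)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                                  # extracted signal y = w^T z
        w = w + eta * (Z * y ** 3).mean(axis=1)    # w_new = w_old + eta E[z (w^T z)^3]
        w /= np.linalg.norm(w)                     # renormalize to unit length
    return w

# Demo: mix two super-Gaussian (Laplacian) sources, whiten, extract one.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = np.diag(d ** -0.5) @ E.T @ X                   # whitened mixtures
w = extract_one(Z)
y = w @ Z
# The extracted signal is strongly correlated with one of the sources.
print(max(abs(np.corrcoef(y, S[0])[0, 1]), abs(np.corrcoef(y, S[1])[0, 1])))
```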

Another approach is to use negentropy[12][13] instead of kurtosis. Negentropy is a more robust measure than kurtosis, as kurtosis is very sensitive to outliers. The negentropy methods are based on an important property of the Gaussian distribution: a Gaussian variable has the largest entropy among all continuous random variables of equal variance. This is also the reason why we want to find the most non-Gaussian variables. A simple proof can be found in Differential entropy. The negentropy of a variable x is defined as

J(x) = S(y) − S(x)

where y is a Gaussian random variable with the same covariance matrix as x, and S denotes the differential entropy

S(x) = − ∫ p_x(u) log p_x(u) du

An approximation for negentropy is

J(x) = (1/12) E[x^3]^2 + (1/48) kurt(x)^2

A proof can be found in the original papers of Comon;[14][12] it has been reproduced in the book Independent Component Analysis by Aapo Hyvärinen, Juha Karhunen, and Erkki Oja.[15] This approximation also suffers from the same problem as kurtosis (sensitivity to outliers). Other approaches have been developed,[16] based on contrast functions G_1 and G_2:

J(y) = k_1 (E[G_1(y)])^2 + k_2 (E[G_2(y)] − E[G_2(v)])^2

where k_1 and k_2 are positive constants and v is a Gaussian variable of zero mean and unit variance. A common choice of G_1 and G_2 is

G_1(u) = (1/a_1) log cosh(a_1 u)   and   G_2(u) = −exp(−u^2/2)
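
As a concrete illustration, the sketch below evaluates the widely used one-term form of this approximation, J(y) ∝ (E[G(y)] − E[G(v)])^2, with each of the two contrast functions named above; the standardization step, the Monte Carlo estimate of the Gaussian reference term, and a_1 = 1 are illustrative simplifications rather than part of the formula as stated.

```python
import numpy as np

def negentropy_proxy(y, G, n_ref=200000, seed=0):
    """One-term negentropy proxy (E[G(y)] - E[G(v)])^2, with v standard Gaussian.

    y is standardized first; the Gaussian expectation E[G(v)] is estimated by
    Monte Carlo for simplicity (closed forms also exist for these G).
    """
    y = (y - y.mean()) / y.std()
    v = np.random.default_rng(seed).normal(size=n_ref)
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

a1 = 1.0
G1 = lambda u: np.log(np.cosh(a1 * u)) / a1     # G1(u) = (1/a1) log cosh(a1 u)
G2 = lambda u: -np.exp(-u ** 2 / 2)             # G2(u) = -exp(-u^2/2)

rng = np.random.default_rng(1)
gauss, laplace = rng.normal(size=50000), rng.laplace(size=50000)
print(negentropy_proxy(gauss, G1), negentropy_proxy(laplace, G1))  # ~0 vs > 0
print(negentropy_proxy(gauss, G2), negentropy_proxy(laplace, G2))  # ~0 vs > 0
```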

Based on infomax

Infomax ICA[17] is essentially a multivariate, parallel version of projection pursuit. Whereas projection pursuit extracts a series of signals one at a time from a set of M signal mixtures, ICA extracts M signals in parallel. This tends to make ICA more robust than projection pursuit.[18]

The projection pursuit method uses Gram-Schmidt orthogonalization to ensure the independence of the extracted signals, while ICA uses infomax and maximum likelihood estimation to ensure the independence of the extracted signals. The non-normality of the extracted signal is achieved by assigning an appropriate model, or prior, for the signal.

The process of ICA based on infomax, in short, is: given a set of signal mixtures x and a set of identical independent model cumulative distribution functions (cdfs) g, we seek the unmixing matrix W which maximizes the joint entropy of the signals Y = g(y), where y = Wx are the signals extracted by W. Given the optimal W, the signals Y have maximum entropy and are therefore independent, which ensures that the extracted signals y = g^{-1}(Y) are also independent. g is an invertible function, and is the signal model. Note that if the source signal model probability density function p_s matches the probability density function of the extracted signal p_y, then maximizing the joint entropy of Y also maximizes the amount of mutual information between x and Y. For this reason, using entropy to extract independent signals is known as infomax.

Consider the entropy of the vector variable Y = g(y), where y = Wx is the set of signals extracted by the unmixing matrix W. For a finite set of values sampled from a distribution with pdf p_y, the entropy of Y can be estimated as:

H(Y) = − (1/N) Σ_{t=1}^{N} ln p_Y(Y^t)

The joint pdf p_Y can be shown to be related to the joint pdf p_y of the extracted signals by the multivariate form:

p_Y(Y) = p_y(y) / |∂Y/∂y|

where J = ∂Y/∂y is the Jacobian matrix. We have |J| = g'(y), and g' is the pdf assumed for the source signals, g' = p_s; therefore,

p_Y(Y) = p_y(y) / |∂Y/∂y| = p_y(y) / p_s(y)

therefore,

H(Y) = − (1/N) Σ_{t=1}^{N} ln [ p_y(y^t) / p_s(y^t) ]

We know that when p_y = p_s, p_Y has a uniform distribution and H(Y) is maximized. Since

p_y(y) = p_x(x) / |∂y/∂x| = p_x(x) / |W|

where |W| is the absolute value of the determinant of the unmixing matrix W, we have

H(Y) = − (1/N) Σ_{t=1}^{N} ln [ p_x(x^t) / (|W| p_s(y^t)) ]

so,

H(Y) = (1/N) Σ_{t=1}^{N} ln p_s(y^t) + ln |W| + H(x)

Since H(x) = − (1/N) Σ_{t=1}^{N} ln p_x(x^t) and changing W does not affect H(x), we can maximize the function

h(Y) = (1/N) Σ_{t=1}^{N} ln p_s(y^t) + ln |W|

to achieve the independence of the extracted signals.

If the M marginal pdfs of the model joint pdf p_s are independent and we use the commonly chosen super-Gaussian model pdf for the source signals p_s = (1 − tanh(s)^2), then we have

h(Y) = (1/N) Σ_{i=1}^{M} Σ_{t=1}^{N} ln (1 − tanh(w_i^T x^t)^2) + ln |W|

In summary, given an observed signal mixture x, the corresponding set of extracted signals y, and the source signal model p_s = g', we can find the optimal unmixing matrix W and make the extracted signals independent and non-Gaussian. As in the projection pursuit situation, we can use the gradient descent method to find the optimal solution of the unmixing matrix.
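
A gradient-ascent sketch of this infomax objective with the super-Gaussian model p_s = 1 − tanh(s)^2: differentiating h(Y) with respect to W gives (W^T)^{-1} − 2 E[tanh(y) x^T], which yields the update used below. The learning rate, iteration count, whitening step, and demo data are illustrative choices; natural-gradient variants of this rule converge faster.

```python
import numpy as np

def infomax_ica(X, eta=0.01, n_iter=2000, seed=0):
    """Infomax ICA by plain gradient ascent on h(Y).

    X: mixtures of shape (M, T), assumed centered (and ideally whitened).
    Returns the estimated unmixing matrix W.
    """
    M, T = X.shape
    W = np.eye(M) + 0.01 * np.random.default_rng(seed).normal(size=(M, M))
    for _ in range(n_iter):
        Y = W @ X                                     # extracted signals y = Wx
        grad = np.linalg.inv(W.T) - 2.0 * (np.tanh(Y) @ X.T) / T
        W = W + eta * grad                            # ascend the objective h(Y)
    return W

# Demo: separate two Laplacian sources from their linear mixtures.
rng = np.random.default_rng(2)
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.5], [0.7, 1.0]])
X = A @ S
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
V = np.diag(d ** -0.5) @ E.T                          # whitening matrix
Z = V @ X
W = infomax_ica(Z)
print(np.round(W @ V @ A, 2))   # approximately a scaled permutation matrix
```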

Based on maximum likelihood estimation

Maximum likelihood estimation (MLE) is a standard statistical tool for finding parameter values (e.g. the unmixing matrix W) that provide the best fit of some data (e.g., the extracted signals y) to a given model (e.g., the assumed joint probability density function (pdf) p_s of the source signals).[18]

The ML "model" includes a specification of a pdf, which in this case is the pdf p_s of the unknown source signals s. Using ML ICA, the objective is to find an unmixing matrix that yields extracted signals y = Wx with a joint pdf as similar as possible to the joint pdf p_s of the unknown source signals s.

MLE is thus based on the assumption that if the model pdf p_s and the model parameters A are correct, then a high probability should be obtained for the data x that were actually observed. Conversely, if A is far from the correct parameter values, then a low probability of the observed data would be expected.

Using MLE, we call the probability of the observed data for a given set of model parameter values (e.g., a pdf p_s and a matrix A) the likelihood of the model parameter values given the observed data.

We define a likelihood function L(W) of W:

L(W) = p_s(Wx) |det W|

This equals the probability density at x, since s = Wx.

Thus, if we wish to find a W that is most likely to have generated the observed mixtures x from the unknown source signals s with pdf p_s, then we need only find that W which maximizes the likelihood L(W). The unmixing matrix that maximizes this equation is known as the MLE of the optimal unmixing matrix.

It is common practice to use the log likelihood, because this is easier to evaluate. As the logarithm is a monotonic function, the W that maximizes the function L(W) also maximizes its logarithm ln L(W). This allows us to take the logarithm of the equation above, which yields the log likelihood function

ln L(W) = Σ_i Σ_t ln p_s(w_i^T x^t) + N ln |det W|

If we substitute a commonly used high-kurtosis model pdf for the source signals, p_s = (1 − tanh(s)^2), then we have

ln L(W) = (1/N) Σ_{i=1}^{M} Σ_{t=1}^{N} ln (1 − tanh(w_i^T x^t)^2) + ln |det W|

The matrix W that maximizes this function is the maximum likelihood estimate.
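
The log likelihood above is straightforward to evaluate numerically for a candidate unmixing matrix; in the sketch below (with made-up Laplacian demo data) the true unmixing matrix scores higher than a random one, and log(1 − tanh(y)^2) is computed in a numerically stable form.

```python
import numpy as np

def log_likelihood(W, X):
    """Average log likelihood of unmixing matrix W for mixtures X of shape (M, T),
    under the high-kurtosis source model p_s(s) = 1 - tanh(s)^2."""
    Y = W @ X                                              # candidate extracted signals
    # log(1 - tanh(y)^2) = 2 * (log 2 - log(e^y + e^-y)), stable for large |y|
    log_ps = 2.0 * (np.log(2.0) - np.logaddexp(Y, -Y))
    return np.mean(np.sum(log_ps, axis=0)) + np.log(abs(np.linalg.det(W)))

# Demo: the true unmixing matrix scores higher than a random one.
rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.5], [0.7, 1.0]])
X = A @ S
print(log_likelihood(np.linalg.inv(A), X))          # higher (better fit)
print(log_likelihood(rng.normal(size=(2, 2)), X))   # typically lower
```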

History and background

The early general framework for independent component analysis was introduced by Jeanny Hérault and Bernard Ans from 1984,[19] further developed by Christian Jutten in 1985 and 1986,[20][21][22] and refined by Pierre Comon in 1991,[14] and popularized in his paper of 1994.[12] In 1995, Tony Bell and Terry Sejnowski introduced a fast and efficient ICA algorithm based on infomax, a principle introduced by Ralph Linsker in 1987.

There are many algorithms available in the literature which perform ICA. One that is widely used, including in industrial applications, is the FastICA algorithm, developed by Hyvärinen and Oja, which uses negentropy as the cost function.[23] Other examples are rather related to blind source separation where a more general approach is used. For example, one can drop the independence assumption and separate mutually correlated signals, thus, statistically "dependent" signals. Sepp Hochreiter and Jürgen Schmidhuber showed how to obtain non-linear ICA or source separation as a by-product of regularization (1999).[24] Their method does not require a priori knowledge about the number of independent sources.

Applications

ICA can be extended to analyze non-physical signals. For instance, ICA has been applied to discover discussion topics on a bag of news list archives.

Some ICA applications are listed below:[4]

 
Independent component analysis in EEGLAB
  • optical imaging of neurons[25]
  • neuronal spike sorting[26]
  • face recognition[27]
  • modelling receptive fields of primary visual neurons[28]
  • predicting stock market prices[29]
  • mobile phone communications[30]
  • colour based detection of the ripeness of tomatoes[31]
  • removing artifacts, such as eye blinks, from EEG data.[32]
  • predicting decision-making using EEG[33]
  • analysis of changes in gene expression over time in single cell RNA-sequencing experiments.[34]
  • studies of the resting state network of the brain.[35]
  • astronomy and cosmology[36]
  • finance[37]

Availability

ICA can be applied through the following software:

  • SAS PROC ICA
  • R ICA package
  • scikit-learn Python implementation sklearn.decomposition.FastICA
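
For example, the scikit-learn class sklearn.decomposition.FastICA listed above can be applied to a toy cocktail-party mixture roughly as follows (the synthetic signals and mixing matrix are invented for the demo):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic "speakers": a sine wave and a square wave.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
S += 0.05 * rng.normal(size=S.shape)          # small sensor noise

A = np.array([[1.0, 0.5], [0.4, 1.2]])        # unknown mixing matrix
X = S @ A.T                                   # observed mixtures, shape (n_samples, 2)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                  # estimated sources (up to order/scale/sign)
A_est = ica.mixing_                           # estimated mixing matrix
print(S_est.shape, A_est.shape)
```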

See also

  • Blind deconvolution
  • Factor analysis
  • Hilbert spectrum
  • Image processing
  • Non-negative matrix factorization (NMF)
  • Nonlinear dimensionality reduction
  • Projection pursuit
  • Varimax rotation

Notes

  1. ^ "Independent Component Analysis: A Demo".
  2. ^ Hyvärinen, Aapo (2013). "Independent component analysis: recent advances". Philosophical Transactions: Mathematical, Physical and Engineering Sciences. 371 (1984): 20110534. Bibcode:2012RSPTA.37110534H. doi:10.1098/rsta.2011.0534. ISSN 1364-503X. JSTOR 41739975. PMC 3538438. PMID 23277597.
  3. ^ Isomura, Takuya; Toyoizumi, Taro (2016). "A local learning rule for independent component analysis". Scientific Reports. 6: 28073. Bibcode:2016NatSR...628073I. doi:10.1038/srep28073. PMC 4914970. PMID 27323661.
  4. ^ a b Stone, James V. (2004). Independent component analysis : a tutorial introduction. Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-69315-8.
  5. ^ Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent component analysis (1st ed.). New York: John Wiley & Sons. ISBN 978-0-471-22131-9.
  6. ^ Theorem 11, Comon, Pierre. "Independent component analysis, a new concept?." Signal processing 36.3 (1994): 287-314.
  7. ^ Johan Himberg and Aapo Hyvärinen, Independent Component Analysis For Binary Data: An Experimental Study, Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2001), San Diego, California, 2001.
  8. ^ Huy Nguyen and Rong Zheng, Binary Independent Component Analysis with OR Mixtures, IEEE Transactions on Signal Processing, Vol. 59, Issue 7 (July 2011), pp. 3168–3181.
  9. ^ Painsky, Amichai; Rosset, Saharon; Feder, Meir (2014). "Generalized binary independent component analysis". 2014 IEEE International Symposium on Information Theory. pp. 1326–1330. doi:10.1109/ISIT.2014.6875048. ISBN 978-1-4799-5186-4. S2CID 18579555.
  10. ^ James V. Stone(2004); "Independent Component Analysis: A Tutorial Introduction", The MIT Press Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1
  11. ^ Kruskal, JB. 1969; "Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new "index of condensation", Pages 427–440 of: Milton, RC, & Nelder, JA (eds), Statistical computation; New York, Academic Press
  12. ^ a b c Pierre Comon (1994) Independent component analysis, a new concept? http://www.ece.ucsb.edu/wcsl/courses/ECE594/594C_F10Madhow/comon94.pdf
  13. ^ Hyvärinen, Aapo; Erkki Oja (2000). "Independent Component Analysis:Algorithms and Applications". Neural Networks. 4-5. 13 (4–5): 411–430. CiteSeerX 10.1.1.79.7003. doi:10.1016/s0893-6080(00)00026-5. PMID 10946390. S2CID 11959218.
  14. ^ a b P.Comon, Independent Component Analysis, Workshop on Higher-Order Statistics, July 1991, republished in J-L. Lacoume, editor, Higher Order Statistics, pp. 29-38. Elsevier, Amsterdam, London, 1992. HAL link
  15. ^ Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent component analysis (Reprint ed.). New York, NY: Wiley. ISBN 978-0-471-40540-5.
  16. ^ Hyvärinen, Aapo (1998). "New approximations of differential entropy for independent component analysis and projection pursuit". Advances in Neural Information Processing Systems. 10: 273–279.
  17. ^ Bell, A. J.; Sejnowski, T. J. (1995). "An Information-Maximization Approach to Blind Separation and Blind Deconvolution", Neural Computation, 7, 1129-1159
  18. ^ a b James V. Stone (2004). "Independent Component Analysis: A Tutorial Introduction", The MIT Press Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1
  19. ^ Hérault, J.; Ans, B. (1984). "Réseau de neurones à synapses modifiables : Décodage de messages sensoriels composites par apprentissage non supervisé et permanent". Comptes Rendus de l'Académie des Sciences, Série III. 299: 525–528.
  20. ^ Ans, B., Hérault, J., & Jutten, C. (1985). Architectures neuromimétiques adaptatives  : Détection de primitives. Cognitiva 85 (Vol. 2, pp. 593-597). Paris: CESTA.
  21. ^ Hérault, J., Jutten, C., & Ans, B. (1985). Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. Proceedings of the 10th Workshop Traitement du signal et ses applications (Vol. 2, pp. 1017-1022). Nice (France): GRETSI.
  22. ^ Hérault, J., & Jutten, C. (1986). Space or time adaptive signal processing by neural networks models. Intern. Conf. on Neural Networks for Computing (pp. 206-211). Snowbird (Utah, USA).
  23. ^ Hyvärinen, A.; Oja, E. (2000-06-01). "Independent component analysis: algorithms and applications". Neural Networks. 13 (4): 411–430. doi:10.1016/S0893-6080(00)00026-5. ISSN 0893-6080. PMID 10946390. S2CID 11959218.
  24. ^ Hochreiter, Sepp; Schmidhuber, Jürgen (1999). "Feature Extraction Through LOCOCODE" (PDF). Neural Computation. 11 (3): 679–714. doi:10.1162/089976699300016629. ISSN 0899-7667. PMID 10085426. S2CID 1642107. Retrieved 24 February 2018.
  25. ^ Brown, GD; Yamada,S; Sejnowski, TJ (2001). "Independent components analysis at the neural cocktail party". Trends in Neurosciences. 24 (1): 54–63. doi:10.1016/s0166-2236(00)01683-0. PMID 11163888. S2CID 511254.
  26. ^ Lewicki, MS (1998). "A review of methods for spike sorting: detection and classification of neural action potentials". Network: Computation in Neural Systems. 9 (4): 53–78. doi:10.1088/0954-898X_9_4_001. S2CID 10290908.
  27. ^ Barlett, MS (2001). Face image analysis by unsupervised learning. Boston: Kluwer International Series on Engineering and Computer Science.
  28. ^ Bell, AJ; Sejnowski, TJ (1997). "The independent components of natural scenes are edge filters". Vision Research. 37 (23): 3327–3338. doi:10.1016/s0042-6989(97)00121-1. PMC 2882863. PMID 9425547.
  29. ^ Back, AD; Weigend, AS (1997). "A first application of independent component analysis to extracting structure from stock returns". International Journal of Neural Systems. 8 (4): 473–484. doi:10.1142/s0129065797000458. PMID 9730022. S2CID 872703.
  30. ^ Hyvärinen, A.; Karhunen, J.; Oja, E. (2001a). Independent component analysis. New York: John Wiley and Sons.
  31. ^ Polder, G; van der Heijen, FWAM (2003). "Estimation of compound distribution in spectral images of tomatoes using independent component analysis". Austrian Computer Society: 57–64.
  32. ^ Delorme, A; Sejnowski, T; Makeig, S (2007). "Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis". NeuroImage. 34 (4): 1443–1449. doi:10.1016/j.neuroimage.2006.11.004. PMC 2895624. PMID 17188898.
  33. ^ Douglas, P (2013). "Single trial decoding of belief decision making from EEG and fMRI data using independent components features". Frontiers in Human Neuroscience. 7: 392. doi:10.3389/fnhum.2013.00392. PMC 3728485. PMID 23914164.
  34. ^ Trapnell, C; Cacchiarelli, D; Grimsby, J (2014). "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells". Nature Biotechnology. 32 (4): 381–386. doi:10.1038/nbt.2859. PMC 4122333. PMID 24658644.
  35. ^ Kiviniemi, Vesa J.; Kantola, Juha-Heikki; Jauhiainen, Jukka; Hyvärinen, Aapo; Tervonen, Osmo (2003). "Independent component analysis of nondeterministic fMRI signal sources". NeuroImage. 19 (2): 253–260. doi:10.1016/S1053-8119(03)00097-1. PMID 12814576. S2CID 17110486.
  36. ^ Wang, Jingying; Xu, Haiguang; Gu, Junhua; An, Tao; Cui, Haijuan; Li, Jianxun; Zhang, Zhongli; Zheng, Qian; Wu, Xiang-Ping (2010-11-01). "How to Identify and Separate Bright Galaxy Clusters from the Low-frequency Radio Sky?". The Astrophysical Journal. 723 (1): 620–633. arXiv:1008.3391. Bibcode:2010ApJ...723..620W. doi:10.1088/0004-637X/723/1/620. ISSN 0004-637X.
  37. ^ Moraux, Franck; Villa, Christophe (2003). "The dynamics of the term structure of interest rates: An Independent Component Analysis". Connectionist Approaches in Economics and Management Sciences. Advances in Computational Management Science. Vol. 6. pp. 215–232. doi:10.1007/978-1-4757-3722-6_11. ISBN 978-1-4757-3722-6.

References

  • Comon, Pierre (1994): "Independent Component Analysis: a new concept?", Signal Processing, 36(3):287–314 (The original paper describing the concept of ICA)
  • Hyvärinen, A.; Karhunen, J.; Oja, E. (2001): Independent Component Analysis, New York: Wiley, ISBN 978-0-471-40540-5 ( Introductory chapter )
  • Hyvärinen, A.; Oja, E. (2000): "Independent Component Analysis: Algorithms and Application", Neural Networks, 13(4-5):411-430. (Technical but pedagogical introduction).
  • Comon, P.; Jutten C., (2010): Handbook of Blind Source Separation, Independent Component Analysis and Applications. Academic Press, Oxford UK. ISBN 978-0-12-374726-6
  • Lee, T.-W. (1998): Independent component analysis: Theory and applications, Boston, Mass: Kluwer Academic Publishers, ISBN 0-7923-8261-7
  • Acharyya, Ranjan (2008): A New Approach for Blind Source Separation of Convolutive Sources - Wavelet Based Separation Using Shrinkage Function ISBN 3-639-07797-0 ISBN 978-3639077971 (this book focuses on unsupervised learning with Blind Source Separation)

External links

  • What is independent component analysis? by Aapo Hyvärinen
  • Independent Component Analysis: A Tutorial by Aapo Hyvärinen
  • A Tutorial on Independent Component Analysis
  • FastICA as a package for Matlab, in R language, C++
  • for Matlab, developed at RIKEN
  • High Performance Signal Analysis Toolkit provides C++ implementations of FastICA and Infomax
  • Matlab tools for ICA with Bell-Sejnowski, Molgedey-Schuster and mean field ICA. Developed at DTU.
  • Demonstration of the cocktail party problem
  • EEGLAB Toolbox ICA of EEG for Matlab, developed at UCSD.
  • FMRLAB Toolbox ICA of fMRI for Matlab, developed at UCSD
  • MELODIC, part of the FMRIB Software Library.
  • Discussion of ICA used in a biomedical shape-representation context
  • FastICA, CuBICA, JADE and TDSEP algorithm for Python and more...
  • Group ICA Toolbox and Fusion ICA Toolbox
  • Tutorial: Using ICA for cleaning EEG signals

independent, component, analysis, this, article, needs, additional, citations, verification, please, help, improve, this, article, adding, citations, reliable, sources, unsourced, material, challenged, removed, find, sources, news, newspapers, books, scholar, . This article needs additional citations for verification Please help improve this article by adding citations to reliable sources Unsourced material may be challenged and removed Find sources Independent component analysis news newspapers books scholar JSTOR October 2011 Learn how and when to remove this template message In signal processing independent component analysis ICA is a computational method for separating a multivariate signal into additive subcomponents This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other 1 ICA is a special case of blind source separation A common example application is the cocktail party problem of listening in on one person s speech in a noisy room 2 Contents 1 Introduction 2 Defining component independence 3 Mathematical definitions 3 1 General definition 3 2 Generative model 3 2 1 Linear noiseless ICA 3 2 2 Linear noisy ICA 3 2 3 Nonlinear ICA 3 3 Identifiability 4 Binary ICA 5 Methods for blind source separation 5 1 Projection pursuit 5 2 Based on infomax 5 3 Based on maximum likelihood estimation 6 History and background 7 Applications 8 Availability 9 See also 10 Notes 11 References 12 External linksIntroduction edit source source source source source source ICA on four randomly mixed videos 3 Top row The original source videos Middle row Four random mixtures used as input to the algorithm Bottom row The reconstructed videos Independent component analysis attempts to decompose a multivariate signal into independent non Gaussian signals As an example sound is usually a signal that is composed of the numerical addition at each time t of signals from several sources The question then is whether it is possible to separate these contributing sources from the observed total signal When the statistical independence assumption is correct blind ICA separation of a mixed signal gives very good results citation needed It is also used for signals that are not supposed to be generated by mixing for analysis purposes A simple application of ICA is the cocktail party problem where the underlying speech signals are separated from a sample data consisting of people talking simultaneously in a room Usually the problem is simplified by assuming no time delays or echoes Note that a filtered and delayed signal is a copy of a dependent component and thus the statistical independence assumption is not violated Mixing weights for constructing the M textstyle M nbsp observed signals from the N textstyle N nbsp components can be placed in an M N textstyle M times N nbsp matrix An important thing to consider is that if N textstyle N nbsp sources are present at least N textstyle N nbsp observations e g microphones if the observed signal is audio are needed to recover the original signals When there are an equal number of observations and source signals the mixing matrix is square M N textstyle M N nbsp Other cases of underdetermined M lt N textstyle M lt N nbsp and overdetermined M gt N textstyle M gt N nbsp have been investigated The success of ICA separation of mixed signals relies on two assumptions and three effects of mixing source signals Two assumptions The source signals are independent of each other The values in each source 
signal have non Gaussian distributions Three effects of mixing source signals Independence As per assumption 1 the source signals are independent however their signal mixtures are not This is because the signal mixtures share the same source signals Normality According to the Central Limit Theorem the distribution of a sum of independent random variables with finite variance tends towards a Gaussian distribution Loosely speaking a sum of two independent random variables usually has a distribution that is closer to Gaussian than any of the two original variables Here we consider the value of each signal as the random variable Complexity The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal Those principles contribute to the basic establishment of ICA If the signals extracted from a set of mixtures are independent and have non Gaussian distributions or have low complexity then they must be source signals 4 5 Defining component independence editICA finds the independent components also called factors latent variables or sources by maximizing the statistical independence of the estimated components We may choose one of many ways to define a proxy for independence and this choice governs the form of the ICA algorithm The two broadest definitions of independence for ICA are Minimization of mutual information Maximization of non GaussianityThe Minimization of Mutual information MMI family of ICA algorithms uses measures like Kullback Leibler Divergence and maximum entropy The non Gaussianity family of ICA algorithms motivated by the central limit theorem uses kurtosis and negentropy Typical algorithms for ICA use centering subtract the mean to create a zero mean signal whitening usually with the eigenvalue decomposition and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm Whitening and dimension reduction can be achieved with principal component analysis or singular value decomposition Whitening ensures that all dimensions are treated equally a priori before the algorithm is run Well known algorithms for ICA include infomax FastICA JADE and kernel independent component analysis among others In general ICA cannot identify the actual number of source signals a uniquely correct ordering of the source signals nor the proper scaling including sign of the source signals ICA is important to blind signal separation and has many practical applications It is closely related to or even a special case of the search for a factorial code of the data i e a new vector valued representation of each data vector such that it gets uniquely encoded by the resulting code vector loss free coding but the code components are statistically independent Mathematical definitions editLinear independent component analysis can be divided into noiseless and noisy cases where noiseless ICA is a special case of noisy ICA Nonlinear ICA should be considered as a separate case General definition edit The data are represented by the observed random vector x x 1 x m T displaystyle boldsymbol x x 1 ldots x m T nbsp and the hidden components as the random vector s s 1 s n T displaystyle boldsymbol s s 1 ldots s n T nbsp The task is to transform the observed data x displaystyle boldsymbol x nbsp using a linear static transformation W displaystyle boldsymbol W nbsp as s W x displaystyle boldsymbol s boldsymbol W boldsymbol x nbsp into a vector of maximally independent components s 
displaystyle boldsymbol s nbsp measured by some function F s 1 s n displaystyle F s 1 ldots s n nbsp of independence Generative model edit Linear noiseless ICA edit The components x i displaystyle x i nbsp of the observed random vector x x 1 x m T displaystyle boldsymbol x x 1 ldots x m T nbsp are generated as a sum of the independent components s k displaystyle s k nbsp k 1 n displaystyle k 1 ldots n nbsp x i a i 1 s 1 a i k s k a i n s n displaystyle x i a i 1 s 1 cdots a i k s k cdots a i n s n nbsp weighted by the mixing weights a i k displaystyle a i k nbsp The same generative model can be written in vector form as x k 1 n s k a k displaystyle boldsymbol x sum k 1 n s k boldsymbol a k nbsp where the observed random vector x displaystyle boldsymbol x nbsp is represented by the basis vectors a k a 1 k a m k T displaystyle boldsymbol a k boldsymbol a 1 k ldots boldsymbol a m k T nbsp The basis vectors a k displaystyle boldsymbol a k nbsp form the columns of the mixing matrix A a 1 a n displaystyle boldsymbol A boldsymbol a 1 ldots boldsymbol a n nbsp and the generative formula can be written as x A s displaystyle boldsymbol x boldsymbol A boldsymbol s nbsp where s s 1 s n T displaystyle boldsymbol s s 1 ldots s n T nbsp Given the model and realizations samples x 1 x N displaystyle boldsymbol x 1 ldots boldsymbol x N nbsp of the random vector x displaystyle boldsymbol x nbsp the task is to estimate both the mixing matrix A displaystyle boldsymbol A nbsp and the sources s displaystyle boldsymbol s nbsp This is done by adaptively calculating the w displaystyle boldsymbol w nbsp vectors and setting up a cost function which either maximizes the non gaussianity of the calculated s k w T x displaystyle s k boldsymbol w T boldsymbol x nbsp or minimizes the mutual information In some cases a priori knowledge of the probability distributions of the sources can be used in the cost function The original sources s displaystyle boldsymbol s nbsp can be recovered by multiplying the observed signals x displaystyle boldsymbol x nbsp with the inverse of the mixing matrix W A 1 displaystyle boldsymbol W boldsymbol A 1 nbsp also known as the unmixing matrix Here it is assumed that the mixing matrix is square n m displaystyle n m nbsp If the number of basis vectors is greater than the dimensionality of the observed vectors n gt m displaystyle n gt m nbsp the task is overcomplete but is still solvable with the pseudo inverse Linear noisy ICA edit With the added assumption of zero mean and uncorrelated Gaussian noise n N 0 diag S displaystyle n sim N 0 operatorname diag Sigma nbsp the ICA model takes the form x A s n displaystyle boldsymbol x boldsymbol A boldsymbol s n nbsp Nonlinear ICA edit The mixing of the sources does not need to be linear Using a nonlinear mixing function f 8 displaystyle f cdot theta nbsp with parameters 8 displaystyle theta nbsp the nonlinear ICA model is x f s 8 n displaystyle x f s theta n nbsp Identifiability edit The independent components are identifiable up to a permutation and scaling of the sources 6 This identifiability requires that At most one of the sources s k displaystyle s k nbsp is Gaussian The number of observed mixtures m displaystyle m nbsp must be at least as large as the number of estimated components n displaystyle n nbsp m n displaystyle m geq n nbsp It is equivalent to say that the mixing matrix A displaystyle boldsymbol A nbsp must be of full rank for its inverse to exist Binary ICA editA special variant of ICA is binary ICA in which both signal sources and 
monitors are in binary form and observations from monitors are disjunctive mixtures of binary independent sources The problem was shown to have applications in many domains including medical diagnosis multi cluster assignment network tomography and internet resource management Let x 1 x 2 x m displaystyle x 1 x 2 ldots x m nbsp be the set of binary variables from m displaystyle m nbsp monitors and y 1 y 2 y n displaystyle y 1 y 2 ldots y n nbsp be the set of binary variables from n displaystyle n nbsp sources Source monitor connections are represented by the unknown mixing matrix G textstyle boldsymbol G nbsp where g i j 1 displaystyle g ij 1 nbsp indicates that signal from the i th source can be observed by the j th monitor The system works as follows at any time if a source i displaystyle i nbsp is active y i 1 displaystyle y i 1 nbsp and it is connected to the monitor j displaystyle j nbsp g i j 1 displaystyle g ij 1 nbsp then the monitor j displaystyle j nbsp will observe some activity x j 1 displaystyle x j 1 nbsp Formally we have x i j 1 n g i j y j i 1 2 m displaystyle x i bigvee j 1 n g ij wedge y j i 1 2 ldots m nbsp where displaystyle wedge nbsp is Boolean AND and displaystyle vee nbsp is Boolean OR Noise is not explicitly modelled rather can be treated as independent sources The above problem can be heuristically solved 7 by assuming variables are continuous and running FastICA on binary observation data to get the mixing matrix G textstyle boldsymbol G nbsp real values then apply round number techniques on G textstyle boldsymbol G nbsp to obtain the binary values This approach has been shown to produce a highly inaccurate result citation needed Another method is to use dynamic programming recursively breaking the observation matrix X textstyle boldsymbol X nbsp into its sub matrices and run the inference algorithm on these sub matrices The key observation which leads to this algorithm is the sub matrix X 0 textstyle boldsymbol X 0 nbsp of X textstyle boldsymbol X nbsp where x i j 0 j textstyle x ij 0 forall j nbsp corresponds to the unbiased observation matrix of hidden components that do not have connection to the i displaystyle i nbsp th monitor Experimental results from 8 show that this approach is accurate under moderate noise levels The Generalized Binary ICA framework 9 introduces a broader problem formulation which does not necessitate any knowledge on the generative model In other words this method attempts to decompose a source into its independent components as much as possible and without losing any information with no prior assumption on the way it was generated Although this problem appears quite complex it can be accurately solved with a branch and bound search tree algorithm or tightly upper bounded with a single multiplication of a matrix with a vector Methods for blind source separation editProjection pursuit edit Signal mixtures tend to have Gaussian probability density functions and source signals tend to have non Gaussian probability density functions Each source signal can be extracted from a set of signal mixtures by taking the inner product of a weight vector and those signal mixtures where this inner product provides an orthogonal projection of the signal mixtures The remaining challenge is finding such a weight vector One type of method for doing so is projection pursuit 10 11 Projection pursuit seeks one projection at a time such that the extracted signal is as non Gaussian as possible This contrasts with ICA which typically extracts M signals 
simultaneously from M signal mixtures which requires estimating a M M unmixing matrix One practical advantage of projection pursuit over ICA is that fewer than M signals can be extracted if required where each source signal is extracted from M signal mixtures using an M element weight vector We can use kurtosis to recover the multiple source signal by finding the correct weight vectors with the use of projection pursuit The kurtosis of the probability density function of a signal for a finite sample is computed as K E y y 4 E y y 2 2 3 displaystyle K frac operatorname E mathbf y mathbf overline y 4 operatorname E mathbf y mathbf overline y 2 2 3 nbsp where y displaystyle mathbf overline y nbsp is the sample mean of y displaystyle mathbf y nbsp the extracted signals The constant 3 ensures that Gaussian signals have zero kurtosis Super Gaussian signals have positive kurtosis and Sub Gaussian signals have negative kurtosis The denominator is the variance of y displaystyle mathbf y nbsp and ensures that the measured kurtosis takes account of signal variance The goal of projection pursuit is to maximize the kurtosis and make the extracted signal as non normal as possible Using kurtosis as a measure of non normality we can now examine how the kurtosis of a signal y w T x displaystyle mathbf y mathbf w T mathbf x nbsp extracted from a set of M mixtures x x 1 x 2 x M T displaystyle mathbf x x 1 x 2 ldots x M T nbsp varies as the weight vector w displaystyle mathbf w nbsp is rotated around the origin Given our assumption that each source signal s displaystyle mathbf s nbsp is super gaussian we would expect the kurtosis of the extracted signal y displaystyle mathbf y nbsp to be maximal precisely when y s displaystyle mathbf y mathbf s nbsp the kurtosis of the extracted signal y displaystyle mathbf y nbsp to be maximal when w displaystyle mathbf w nbsp is orthogonal to the projected axes S 1 displaystyle S 1 nbsp or S 2 displaystyle S 2 nbsp because we know the optimal weight vector should be orthogonal to a transformed axis S 1 displaystyle S 1 nbsp or S 2 displaystyle S 2 nbsp For multiple source mixture signals we can use kurtosis and Gram Schmidt Orthogonalization GSO to recover the signals Given M signal mixtures in an M dimensional space GSO project these data points onto an M 1 dimensional space by using the weight vector We can guarantee the independence of the extracted signals with the use of GSO In order to find the correct value of w displaystyle mathbf w nbsp we can use gradient descent method We first of all whiten the data and transform x displaystyle mathbf x nbsp into a new mixture z displaystyle mathbf z nbsp which has unit variance and z z 1 z 2 z M T displaystyle mathbf z z 1 z 2 ldots z M T nbsp This process can be achieved by applying Singular value decomposition to x displaystyle mathbf x nbsp x U D V T displaystyle mathbf x mathbf U mathbf D mathbf V T nbsp Rescaling each vector U i U i E U i 2 displaystyle U i U i operatorname E U i 2 nbsp and let z U displaystyle mathbf z mathbf U nbsp The signal extracted by a weighted vector w displaystyle mathbf w nbsp is y w T z displaystyle mathbf y mathbf w T mathbf z nbsp If the weight vector w has unit length then the variance of y is also 1 that is E w T z 2 1 displaystyle operatorname E mathbf w T mathbf z 2 1 nbsp The kurtosis can thus be written as K E y 4 E y 2 2 3 E w T z 4 3 displaystyle K frac operatorname E mathbf y 4 operatorname E mathbf y 2 2 3 operatorname E mathbf w T mathbf z 4 3 nbsp The updating process for w 
displaystyle mathbf w nbsp is w n e w w o l d h E z w o l d T z 3 displaystyle mathbf w new mathbf w old eta operatorname E mathbf z mathbf w old T mathbf z 3 nbsp where h displaystyle eta nbsp is a small constant to guarantee that w displaystyle mathbf w nbsp converges to the optimal solution After each update we normalize w n e w w n e w w n e w displaystyle mathbf w new frac mathbf w new mathbf w new nbsp and set w o l d w n e w displaystyle mathbf w old mathbf w new nbsp and repeat the updating process until convergence We can also use another algorithm to update the weight vector w displaystyle mathbf w nbsp Another approach is using negentropy 12 13 instead of kurtosis Using negentropy is a more robust method than kurtosis as kurtosis is very sensitive to outliers The negentropy methods are based on an important property of Gaussian distribution a Gaussian variable has the largest entropy among all continuous random variables of equal variance This is also the reason why we want to find the most nongaussian variables A simple proof can be found in Differential entropy J x S y S x displaystyle J x S y S x nbsp y is a Gaussian random variable of the same covariance matrix as x S x p x u log p x u d u displaystyle S x int p x u log p x u du nbsp An approximation for negentropy is J x 1 12 E x 3 2 1 48 k u r t x 2 displaystyle J x frac 1 12 E x 3 2 frac 1 48 kurt x 2 nbsp A proof can be found in the original papers of Comon 14 12 it has been reproduced in the book Independent Component Analysis by Aapo Hyvarinen Juha Karhunen and Erkki Oja 15 This approximation also suffers from the same problem as kurtosis sensitivity to outliers Other approaches have been developed 16 J y k 1 E G 1 y 2 k 2 E G 2 y E G 2 v 2 displaystyle J y k 1 E G 1 y 2 k 2 E G 2 y E G 2 v 2 nbsp A choice of G 1 displaystyle G 1 nbsp and G 2 displaystyle G 2 nbsp are G 1 1 a 1 log cosh a 1 u displaystyle G 1 frac 1 a 1 log cosh a 1 u nbsp and G 2 exp u 2 2 displaystyle G 2 exp frac u 2 2 nbsp Based on infomax edit Infomax ICA 17 is essentially a multivariate parallel version of projection pursuit Whereas projection pursuit extracts a series of signals one at a time from a set of M signal mixtures ICA extracts M signals in parallel This tends to make ICA more robust than projection pursuit 18 The projection pursuit method uses Gram Schmidt orthogonalization to ensure the independence of the extracted signal while ICA use infomax and maximum likelihood estimate to ensure the independence of the extracted signal The Non Normality of the extracted signal is achieved by assigning an appropriate model or prior for the signal The process of ICA based on infomax in short is given a set of signal mixtures x displaystyle mathbf x nbsp and a set of identical independent model cumulative distribution functions cdfs g displaystyle g nbsp we seek the unmixing matrix W displaystyle mathbf W nbsp which maximizes the joint entropy of the signals Y g y displaystyle mathbf Y g mathbf y nbsp where y W x displaystyle mathbf y mathbf Wx nbsp are the signals extracted by W displaystyle mathbf W nbsp Given the optimal W displaystyle mathbf W nbsp the signals Y displaystyle mathbf Y nbsp have maximum entropy and are therefore independent which ensures that the extracted signals y g 1 Y displaystyle mathbf y g 1 mathbf Y nbsp are also independent g displaystyle g nbsp is an invertible function and is the signal model Note that if the source signal model probability density function p s displaystyle p s nbsp matches the probability density 
function of the extracted signal p y displaystyle p mathbf y nbsp then maximizing the joint entropy of Y displaystyle Y nbsp also maximizes the amount of mutual information between x displaystyle mathbf x nbsp and Y displaystyle mathbf Y nbsp For this reason using entropy to extract independent signals is known as infomax Consider the entropy of the vector variable Y g y displaystyle mathbf Y g mathbf y nbsp where y W x displaystyle mathbf y mathbf Wx nbsp is the set of signals extracted by the unmixing matrix W displaystyle mathbf W nbsp For a finite set of values sampled from a distribution with pdf p y displaystyle p mathbf y nbsp the entropy of Y displaystyle mathbf Y nbsp can be estimated as H Y 1 N t 1 N ln p Y Y t displaystyle H mathbf Y frac 1 N sum t 1 N ln p mathbf Y mathbf Y t nbsp The joint pdf p Y displaystyle p mathbf Y nbsp can be shown to be related to the joint pdf p y displaystyle p mathbf y nbsp of the extracted signals by the multivariate form p Y Y p y y Y y displaystyle p mathbf Y Y frac p mathbf y mathbf y frac partial mathbf Y partial mathbf y nbsp where J Y y displaystyle mathbf J frac partial mathbf Y partial mathbf y nbsp is the Jacobian matrix We have J g y displaystyle mathbf J g mathbf y nbsp and g displaystyle g nbsp is the pdf assumed for source signals g p s displaystyle g p s nbsp therefore p Y Y p y y Y y p y y p s y displaystyle p mathbf Y Y frac p mathbf y mathbf y frac partial mathbf Y partial mathbf y frac p mathbf y mathbf y p mathbf s mathbf y nbsp therefore H Y 1 N t 1 N ln p y y p s y displaystyle H mathbf Y frac 1 N sum t 1 N ln frac p mathbf y mathbf y p mathbf s mathbf y nbsp We know that when p y p s displaystyle p mathbf y p s nbsp p Y displaystyle p mathbf Y nbsp is of uniform distribution and H Y displaystyle H mathbf Y nbsp is maximized Since p y y p x x y x p x x W displaystyle p mathbf y mathbf y frac p mathbf x mathbf x frac partial mathbf y partial mathbf x frac p mathbf x mathbf x mathbf W nbsp where W displaystyle mathbf W nbsp is the absolute value of the determinant of the unmixing matrix W displaystyle mathbf W nbsp Therefore H Y 1 N t 1 N ln p x x t W p s y t displaystyle H mathbf Y frac 1 N sum t 1 N ln frac p mathbf x mathbf x t mathbf W p mathbf s mathbf y t nbsp so H Y 1 N t 1 N ln p s y t ln W H x displaystyle H mathbf Y frac 1 N sum t 1 N ln p mathbf s mathbf y t ln mathbf W H mathbf x nbsp since H x 1 N t 1 N ln p x x t displaystyle H mathbf x frac 1 N sum t 1 N ln p mathbf x mathbf x t nbsp and maximizing W displaystyle mathbf W nbsp does not affect H x displaystyle H mathbf x nbsp so we can maximize the function h Y 1 N t 1 N ln p s y t ln W displaystyle h mathbf Y frac 1 N sum t 1 N ln p mathbf s mathbf y t ln mathbf W nbsp to achieve the independence of extracted signal If there are M marginal pdfs of the model joint pdf p s displaystyle p mathbf s nbsp are independent and use the commonly super gaussian model pdf for the source signals p s 1 tanh s 2 displaystyle p mathbf s 1 tanh mathbf s 2 nbsp then we have h Y 1 N i 1 M t 1 N ln 1 tanh w i T x t 2 ln W displaystyle h mathbf Y frac 1 N sum i 1 M sum t 1 N ln 1 tanh mathbf w i T x t 2 ln mathbf W nbsp In the sum given an observed signal mixture x displaystyle mathbf x nbsp the corresponding set of extracted signals y displaystyle mathbf y nbsp and source signal model p s g displaystyle p mathbf s g nbsp we can find the optimal unmixing matrix W displaystyle mathbf W nbsp and make the extracted signals independent and non gaussian Like the projection pursuit situation 
we can use gradient descent method to find the optimal solution of the unmixing matrix Based on maximum likelihood estimation edit Maximum likelihood estimation MLE is a standard statistical tool for finding parameter values e g the unmixing matrix W displaystyle mathbf W nbsp that provide the best fit of some data e g the extracted signals y displaystyle y nbsp to a given a model e g the assumed joint probability density function pdf p s displaystyle p s nbsp of source signals 18 The ML model includes a specification of a pdf which in this case is the pdf p s displaystyle p s nbsp of the unknown source signals s displaystyle s nbsp Using ML ICA the objective is to find an unmixing matrix that yields extracted signals y W x displaystyle y mathbf W x nbsp with a joint pdf as similar as possible to the joint pdf p s displaystyle p s nbsp of the unknown source signals s displaystyle s nbsp MLE is thus based on the assumption that if the model pdf p s displaystyle p s nbsp and the model parameters A displaystyle mathbf A nbsp are correct then a high probability should be obtained for the data x displaystyle x nbsp that were actually observed Conversely if A displaystyle mathbf A nbsp is far from the correct parameter values then a low probability of the observed data would be expected Using MLE we call the probability of the observed data for a given set of model parameter values e g a pdf p s displaystyle p s nbsp and a matrix A displaystyle mathbf A nbsp the likelihood of the model parameter values given the observed data We define a likelihood function L W displaystyle mathbf L W nbsp of W displaystyle mathbf W nbsp L W p s W x det W displaystyle mathbf L W p s mathbf W x det mathbf W nbsp This equals to the probability density at x displaystyle x nbsp since s W x displaystyle s mathbf W x nbsp Thus if we wish to find a W displaystyle mathbf W nbsp that is most likely to have generated the observed mixtures x displaystyle x nbsp from the unknown source signals s displaystyle s nbsp with pdf p s displaystyle p s nbsp then we need only find that W displaystyle mathbf W nbsp which maximizes the likelihood L W displaystyle mathbf L W nbsp The unmixing matrix that maximizes equation is known as the MLE of the optimal unmixing matrix It is common practice to use the log likelihood because this is easier to evaluate As the logarithm is a monotonic function the W displaystyle mathbf W nbsp that maximizes the function L W displaystyle mathbf L W nbsp also maximizes its logarithm ln L W displaystyle ln mathbf L W nbsp This allows us to take the logarithm of equation above which yields the log likelihood functionln L W i t ln p s w i T x t N ln det W displaystyle ln mathbf L W sum i sum t ln p s w i T x t N ln det mathbf W nbsp If we substitute a commonly used high Kurtosis model pdf for the source signals p s 1 tanh s 2 displaystyle p s 1 tanh s 2 nbsp then we haveln L W 1 N i M t N ln 1 tanh w i T x t 2 ln det W displaystyle ln mathbf L W 1 over N sum i M sum t N ln 1 tanh w i T x t 2 ln det mathbf W nbsp This matrix W displaystyle mathbf W nbsp that maximizes this function is the maximum likelihood estimation History and background editThe early general framework for independent component analysis was introduced by Jeanny Herault and Bernard Ans from 1984 19 further developed by Christian Jutten in 1985 and 1986 20 21 22 and refined by Pierre Comon in 1991 14 and popularized in his paper of 1994 12 In 1995 Tony Bell and Terry Sejnowski introduced a fast and efficient ICA algorithm based on infomax 
History and background

The early general framework for independent component analysis was introduced by Jeanny Hérault and Bernard Ans from 1984,[19] further developed by Christian Jutten in 1985 and 1986,[20][21][22] and refined by Pierre Comon in 1991[14] and popularized in his paper of 1994.[12] In 1995, Tony Bell and Terry Sejnowski introduced a fast and efficient ICA algorithm based on infomax, a principle introduced by Ralph Linsker in 1987.

There are many algorithms available in the literature which perform ICA. A widely used one, including in industrial applications, is the FastICA algorithm, developed by Hyvärinen and Oja, which uses negentropy as its cost function.[23] Other examples are more closely related to blind source separation, where a more general approach is used. For example, one can drop the independence assumption and separate mutually correlated, and thus statistically dependent, signals. Sepp Hochreiter and Jürgen Schmidhuber showed how to obtain non-linear ICA or source separation as a by-product of regularization (1999).[24] Their method does not require a priori knowledge about the number of independent sources.

Applications

ICA can be extended to analyze non-physical signals. For instance, ICA has been applied to discover discussion topics in a bag of news list archives.

Some ICA applications are listed below:[4]

[Figure: Independent component analysis in EEGLAB]

- optical imaging of neurons[25]
- neuronal spike sorting[26]
- face recognition[27]
- modelling receptive fields of primary visual neurons[28]
- predicting stock market prices[29]
- mobile phone communications[30]
- colour-based detection of the ripeness of tomatoes[31]
- removing artifacts, such as eye blinks, from EEG data[32]
- predicting decision-making using EEG[33]
- analysis of changes in gene expression over time in single-cell RNA-sequencing experiments[34]
- studies of the resting state network of the brain[35]
- astronomy and cosmology[36]
- finance[37]

Availability

ICA can be applied through the following software (a short usage sketch for the scikit-learn implementation follows the list):

- SAS PROC ICA
- R ICA package
- scikit-learn Python implementation sklearn.decomposition.FastICA
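As a usage sketch of the scikit-learn implementation named above (the toy square-wave and Laplacian sources and the 2x2 mixing matrix are invented for illustration and are not taken from the article):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two toy non-Gaussian sources, linearly mixed (both invented for this example).
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)), rng.laplace(size=t.size)]  # shape (n_samples, n_sources)
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                                   # mixing matrix
X = S @ A.T                                                  # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)   # estimated sources, up to scale and permutation
A_est = ica.mixing_            # estimated mixing matrix
```

As with any ICA method, the recovered components are only determined up to permutation and scaling; this is inherent to the model rather than to a particular implementation.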
See also

- Mathematics portal
- Blind deconvolution
- Factor analysis
- Hilbert spectrum
- Image processing
- Non-negative matrix factorization (NMF)
- Nonlinear dimensionality reduction
- Projection pursuit
- Varimax rotation

Notes

1. "Independent Component Analysis: A Demo".
2. Hyvärinen, Aapo (2013). "Independent component analysis: recent advances". Philosophical Transactions: Mathematical, Physical and Engineering Sciences. 371 (1984): 20110534. Bibcode:2012RSPTA.37110534H. doi:10.1098/rsta.2011.0534. ISSN 1364-503X. JSTOR 41739975. PMC 3538438. PMID 23277597.
3. Isomura, Takuya; Toyoizumi, Taro (2016). "A local learning rule for independent component analysis". Scientific Reports. 6: 28073. Bibcode:2016NatSR...628073I. doi:10.1038/srep28073. PMC 4914970. PMID 27323661.
4. Stone, James V. (2004). Independent Component Analysis: A Tutorial Introduction. Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-69315-8.
5. Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent Component Analysis (1st ed.). New York: John Wiley & Sons. ISBN 978-0-471-22131-9. Theorem 11.
6. Comon, Pierre (1994). "Independent component analysis, a new concept?". Signal Processing. 36 (3): 287–314.
7. Johan Himberg and Aapo Hyvärinen, "Independent Component Analysis for Binary Data: An Experimental Study", Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2001), San Diego, California, 2001.
8. Huy Nguyen and Rong Zheng, "Binary Independent Component Analysis with OR Mixtures", IEEE Transactions on Signal Processing, Vol. 59, Issue 7 (July 2011), pp. 3168–3181.
9. Painsky, Amichai; Rosset, Saharon; Feder, Meir (2014). "Generalized binary independent component analysis". 2014 IEEE International Symposium on Information Theory. pp. 1326–1330. doi:10.1109/ISIT.2014.6875048. ISBN 978-1-4799-5186-4. S2CID 18579555.
10. James V. Stone (2004). Independent Component Analysis: A Tutorial Introduction. The MIT Press, Cambridge, Massachusetts / London, England. ISBN 0-262-69315-1.
11. Kruskal, J. B. (1969). "Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new 'index of condensation'". Pages 427–440 of: Milton, R. C. & Nelder, J. A. (eds.), Statistical Computation. New York: Academic Press.
12. Pierre Comon (1994). "Independent component analysis, a new concept?" http://www.ece.ucsb.edu/wcsl/courses/ECE594/594C_F10Madhow/comon94.pdf
13. Hyvärinen, Aapo; Oja, Erkki (2000). "Independent Component Analysis: Algorithms and Applications". Neural Networks. 13 (4–5): 411–430. CiteSeerX 10.1.1.79.7003. doi:10.1016/s0893-6080(00)00026-5. PMID 10946390. S2CID 11959218.
14. P. Comon, "Independent Component Analysis", Workshop on Higher-Order Statistics, July 1991; republished in J.-L. Lacoume (ed.), Higher Order Statistics, pp. 29–38. Elsevier, Amsterdam/London, 1992. HAL link.
15. Hyvärinen, Aapo; Karhunen, Juha; Oja, Erkki (2001). Independent Component Analysis (Reprint ed.). New York, NY: Wiley. ISBN 978-0-471-40540-5.
16. Hyvärinen, Aapo (1998). "New approximations of differential entropy for independent component analysis and projection pursuit". Advances in Neural Information Processing Systems. 10: 273–279.
17. Bell, A. J.; Sejnowski, T. J. (1995). "An Information-Maximization Approach to Blind Separation and Blind Deconvolution". Neural Computation. 7: 1129–1159.
18. James V. Stone (2004). Independent Component Analysis: A Tutorial Introduction. The MIT Press, Cambridge, Massachusetts / London, England. ISBN 0-262-69315-1.
19. Hérault, J.; Ans, B. (1984). "Réseau de neurones à synapses modifiables : Décodage de messages sensoriels composites par apprentissage non supervisé et permanent". Comptes Rendus de l'Académie des Sciences, Série III. 299: 525–528.
20. Ans, B.; Hérault, J. & Jutten, C. (1985). "Architectures neuromimétiques adaptatives : Détection de primitives". Cognitiva 85, Vol. 2, pp. 593–597. Paris: CESTA.
21. Hérault, J.; Jutten, C. & Ans, B. (1985). "Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé". Proceedings of the 10th Workshop Traitement du signal et ses applications, Vol. 2, pp. 1017–1022. Nice, France: GRETSI.
22. Hérault, J. & Jutten, C. (1986). "Space or time adaptive signal processing by neural networks models". Intern. Conf. on Neural Networks for Computing, pp. 206–211. Snowbird, Utah, USA.
23. Hyvärinen, A.; Oja, E. (2000-06-01). "Independent component analysis: algorithms and applications". Neural Networks. 13 (4): 411–430. doi:10.1016/S0893-6080(00)00026-5. ISSN 0893-6080. PMID 10946390. S2CID 11959218.
24. Hochreiter, Sepp; Schmidhuber, Jürgen (1999). "Feature Extraction Through LOCOCODE" (PDF). Neural Computation. 11 (3): 679–714. doi:10.1162/089976699300016629. ISSN 0899-7667. PMID 10085426. S2CID 1642107. Retrieved 24 February 2018.
25. Brown, G. D.; Yamada, S.; Sejnowski, T. J. (2001). "Independent components analysis at the neural cocktail party". Trends in Neurosciences. 24 (1): 54–63. doi:10.1016/s0166-2236(00)01683-0. PMID 11163888. S2CID 511254.
26. Lewicki, M. S. (1998). "A review of methods for spike sorting: detection and classification of neural action potentials". Network: Computation in Neural Systems. 9 (4): 53–78. doi:10.1088/0954-898X/9/4/001. S2CID 10290908.
27. Barlett, M. S. (2001). Face Image Analysis by Unsupervised Learning. Boston: Kluwer International Series on Engineering and Computer Science.
28. Bell, A. J.; Sejnowski, T. J. (1997). "The independent components of natural scenes are edge filters". Vision Research. 37 (23): 3327–3338. doi:10.1016/s0042-6989(97)00121-1. PMC 2882863. PMID 9425547.
29. Back, A. D.; Weigend, A. S. (1997). "A first application of independent component analysis to extracting structure from stock returns". International Journal of Neural Systems. 8 (4): 473–484. doi:10.1142/s0129065797000458. PMID 9730022. S2CID 872703.
30. Hyvärinen, A.; Karhunen, J. & Oja, E. (2001a). Independent Component Analysis. New York: John Wiley and Sons.
31. Polder, G.; van der Heijen, F. W. A. M. (2003). "Estimation of compound distribution in spectral images of tomatoes using independent component analysis". Austrian Computer Society: 57–64.
32. Delorme, A.; Sejnowski, T.; Makeig, S. (2007). "Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis". NeuroImage. 34 (4): 1443–1449. doi:10.1016/j.neuroimage.2006.11.004. PMC 2895624. PMID 17188898.
33. Douglas, P. (2013). "Single trial decoding of belief decision making from EEG and fMRI data using independent components features". Frontiers in Human Neuroscience. 7: 392. doi:10.3389/fnhum.2013.00392. PMC 3728485. PMID 23914164.
34. Trapnell, C.; Cacchiarelli, D.; Grimsby, J. (2014). "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells". Nature Biotechnology. 32 (4): 381–386. doi:10.1038/nbt.2859. PMC 4122333. PMID 24658644.
35. Kiviniemi, Vesa J.; Kantola, Juha-Heikki; Jauhiainen, Jukka; Hyvärinen, Aapo; Tervonen, Osmo (2003). "Independent component analysis of nondeterministic fMRI signal sources". NeuroImage. 19 (2): 253–260. doi:10.1016/S1053-8119(03)00097-1. PMID 12814576. S2CID 17110486.
36. Wang, Jingying; Xu, Haiguang; Gu, Junhua; An, Tao; Cui, Haijuan; Li, Jianxun; Zhang, Zhongli; Zheng, Qian; Wu, Xiang-Ping (2010-11-01). "How to Identify and Separate Bright Galaxy Clusters from the Low-frequency Radio Sky". The Astrophysical Journal. 723 (1): 620–633. arXiv:1008.3391. Bibcode:2010ApJ...723..620W. doi:10.1088/0004-637X/723/1/620. ISSN 0004-637X.
37. Moraux, Franck; Villa, Christophe (2003). "The dynamics of the term structure of interest rates: An Independent Component Analysis". Connectionist Approaches in Economics and Management Sciences. Advances in Computational Management Science. Vol. 6. pp. 215–232. doi:10.1007/978-1-4757-3722-6_11. ISBN 978-1-4757-3722-6.

References

- Comon, Pierre (1994). "Independent Component Analysis: a new concept?". Signal Processing. 36 (3): 287–314. (The original paper describing the concept of ICA.)
- Hyvärinen, A.; Karhunen, J.; Oja, E. (2001). Independent Component Analysis. New York: Wiley. ISBN 978-0-471-40540-5. (Introductory chapter.)
- Hyvärinen, A.; Oja, E. (2000). "Independent Component Analysis: Algorithms and Application". Neural Networks. 13 (4–5): 411–430. (A technical but pedagogical introduction.)
- Comon, P.; Jutten, C. (2010). Handbook of Blind Source Separation, Independent Component Analysis and Applications. Oxford, UK: Academic Press. ISBN 978-0-12-374726-6.
- Lee, T.-W. (1998). Independent Component Analysis: Theory and Applications. Boston, Mass.: Kluwer Academic Publishers. ISBN 0-7923-8261-7.
- Acharyya, Ranjan (2008). A New Approach for Blind Source Separation of Convolutive Sources: Wavelet Based Separation Using Shrinkage Function. ISBN 3-639-07797-0, ISBN 978-3639077971. (This book focuses on unsupervised learning with blind source separation.)

External links

- What is independent component analysis?, by Aapo Hyvärinen
- Independent Component Analysis: A Tutorial, by Aapo Hyvärinen
- A Tutorial on Independent Component Analysis
- FastICA as a package for Matlab, in R language, C++
- ICALAB Toolboxes for Matlab, developed at RIKEN
- High Performance Signal Analysis Toolkit, which provides C++ implementations of FastICA and Infomax
- ICA toolbox: Matlab tools for ICA with Bell–Sejnowski, Molgedey–Schuster and mean-field ICA, developed at DTU
- Demonstration of the cocktail party problem
- EEGLAB Toolbox: ICA of EEG for Matlab, developed at UCSD
- FMRLAB Toolbox: ICA of fMRI for Matlab, developed at UCSD
- MELODIC, part of the FMRIB Software Library
- Discussion of ICA used in a biomedical shape-representation context
- FastICA, CuBICA, JADE and TDSEP algorithm for Python and more
- Group ICA Toolbox and Fusion ICA Toolbox
- Tutorial: Using ICA for cleaning EEG signals
