Math from Finland, yes, it's fast: FastICA

in #steemstem, 6 years ago (edited)

If you have missed the great series of posts devoted to astrophotography by @terrylovejoy, consider reading them even if you have nothing to do with astrophotography.
The latest post was about noise reduction in stacked images, and it inspired me to write about a method for component analysis.

Here is the link to FastICA Toolbox for Matlab. If you want to use it for images, simply convert them to vectors. Keep in mind that the resolution can't be too high.
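The "convert them to vectors" step can be sketched in a few lines. This is only an illustration in plain Python, not the toolbox's own code: the helper names `image_to_vector` and `vector_to_image` and the tiny 2x3 images are made up for the example.

```python
def image_to_vector(image):
    """Flatten a 2-D image (a list of pixel rows) into one long row vector."""
    return [pixel for row in image for pixel in row]

def vector_to_image(vector, width):
    """Cut a flat vector back into rows of `width` pixels."""
    return [vector[i:i + width] for i in range(0, len(vector), width)]

# Two tiny hypothetical 2x3 "images" (grey levels), just for illustration.
img_a = [[0, 1, 2],
         [3, 4, 5]]
img_b = [[9, 8, 7],
         [6, 5, 4]]

# One image per row: a matrix like this is what a component-analysis
# routine would receive as input.
X = [image_to_vector(img) for img in (img_a, img_b)]

# Every extracted component has the same length as a row of X,
# so it can be reshaped and viewed as an image again.
restored = vector_to_image(X[0], width=3)
```

Each row of the matrix has width times height entries, which is why the resolution has to stay moderate.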

If you want to win, hire a Finn


This expression rings very true in motorsport, where names like Vatanen, Salonen, Kankkunen, Mäkinen and Häkkinen have become immortal.



Lancia Delta in Martini colors, from Wiki, CC-3.0. It has absolutely nothing in common with this post, but it looks cool

Our "Math Finn" for today is Aapo Hyvärinen, the man behind the FastICA algorithm (ICA = Independent Component Analysis).

The aim of ICA


The algorithm came about as a solution to the "cocktail party problem". Imagine that you are at a cocktail party where everyone is chatting and laughing, and there is music playing from the speakers.

What you hear is the sum of the individual sound sources, each multiplied by a scalar that plays the role of its "volume". Mathematically, this is a system of linear equations, something like this: $x = a_1 s_1 + a_2 s_2 + \dots + a_n s_n$. Here x is the signal we are picking up: the sum of all individual sources (s) multiplied by their amplitudes (a).

If either s or a were known, finding the other would be the simplest possible thing to do, but in this case we know nothing a priori.
This situation is known as BSS, or blind source separation.

And the tricky part is that a and s are directly connected. If you make a mistake and give wrong estimates for the source signals, their relative contributions will be wrong too, and vice versa.
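To make the mixing model concrete, here is a toy sketch in plain Python. The two source signals, the "volumes" in `A` and the helper names are all invented for illustration. It also shows the easy case: when the amplitudes are known, the sources follow from a simple matrix inversion.

```python
import math

def mix(A, S):
    """Compute X = A * S: every recording is a weighted sum of the sources."""
    n_samples = len(S[0])
    return [[sum(a * s[t] for a, s in zip(row, S)) for t in range(n_samples)]
            for row in A]

def invert_2x2(A):
    """Inverse of a 2x2 matrix, enough for this two-source toy case."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two hypothetical sources: a sine wave ("voice") and a square wave ("music").
T = 8
s1 = [math.sin(2 * math.pi * t / T) for t in range(T)]
s2 = [1.0 if t % 4 < 2 else -1.0 for t in range(T)]
S = [s1, s2]

# Hypothetical amplitudes: each row belongs to one microphone.
A = [[1.0, 0.5],
     [0.3, 1.0]]

X = mix(A, S)                        # what the microphones record
S_recovered = mix(invert_2x2(A), X)  # trivial only because A is known here
```

In the blind case neither `A` nor `S` is available, only `X`, and that is where ICA comes in.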

But how is this even possible?


Well, the idea is to look for sources that are statistically independent of each other.

You have probably heard about the FB scandal and the "new discoveries" (not new at all...) that there are 5 types of people. Or you have heard Jordan Peterson saying that it's a "multivariate" phenomenon.

Those things are very similar, so similar that the starting equation is the same for those methods (principal component analysis or factor analysis) and for ICA: $\mathbf{x} = \mathbf{A}\mathbf{s}$. For non-math people: the symbols are bold, which means we are talking about matrices (tables). The results of the measurements (x) are decomposed into "concentrations" (contributions, intensities...), A, and source components (s).

The assumptions are that the sources (s) are statistically independent and that they have non-Gaussian distributions.

Why is this good? Because noise is (mostly) Gaussian. Thus ICA is good at picking out something exceptional. And it is very robust when analyzing noisy signals (in contrast to PCA/FA, which want data as smooth as an F1 track). Another difference is that ICA components are not ranked "first", "second"...; all of them are equally important.

However, it is not perfect for sparse signals!


Mika Häkkinen, from Wiki. PCA/FA require perfectly smooth input data because those techniques are based on correlation, while ICA can handle rally-style dirt full of noise, because the assumption is that the components have non-Gaussian distributions

And there is yet another problem with ICA: because the components are assumed to have unit variance, the sign of each one is undetermined. It could be either + or -, because the variance only involves s squared.
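The sign ambiguity can be written out in one line. In this sketch $\mathbf{D}$ is a symbol introduced here for illustration: a diagonal matrix with entries $+1$ or $-1$, which flips the sign of some of the components.

```latex
\mathbf{x} = \mathbf{A}\mathbf{s}
           = (\mathbf{A}\mathbf{D})(\mathbf{D}\mathbf{s}),
\qquad
\mathbf{D} = \operatorname{diag}(\pm 1, \dots, \pm 1),
\quad
\mathbf{D}\mathbf{D} = \mathbf{I},
\qquad
E\!\left[(-s_i)^2\right] = E\!\left[s_i^2\right] = 1
```

Both $\mathbf{s}$ and $\mathbf{D}\mathbf{s}$ have unit variance and reproduce the same measurements, so the data alone cannot tell them apart.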

If your components are strictly positive, consider using the NMF instead.

Defining the Independence


If two variables $y_1$ and $y_2$ are uncorrelated, their covariance is zero: $\operatorname{cov}(y_1, y_2) = E[y_1 y_2] - E[y_1]E[y_2] = 0$
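Zero covariance is a weaker condition than independence. A small sketch with made-up data shows the difference: `y` is completely determined by `x`, yet their covariance is practically zero. The dependence only shows up once a nonlinear function (here the square) is applied. The variable names and sample size are arbitrary choices for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def covariance(u, v):
    """Sample covariance E[(u - E[u]) * (v - E[v])]."""
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
y = [xi * xi for xi in x]  # y depends on x in the strongest possible way

# Linear view: x and y look unrelated (covariance close to 0).
cov_linear = covariance(x, y)

# Nonlinear view: x squared and y are clearly related (covariance well above 0).
cov_nonlinear = covariance([xi * xi for xi in x], y)
```

So "uncorrelated" does not imply "independent", which is why ICA needs more than the covariance.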

Now you can easily understand why components with a Gaussian distribution are considered noise by ICA.
A joint Gaussian distribution is rotationally symmetric, so no matter how you rotate the data it looks the same, and there is nothing left to tell the independent directions apart.
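The symmetry argument can be checked numerically. In this sketch (plain Python, arbitrary sample size and seed) two independent Gaussian signals and two independent uniform signals are each rotated by 45 degrees. Excess kurtosis is used as a simple yardstick of non-Gaussianity: it is 0 for a Gaussian and -1.2 for a uniform distribution.

```python
import math
import random

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    m = sum(values) / len(values)
    centered = [v - m for v in values]
    var = sum(c * c for c in centered) / len(centered)
    fourth = sum(c ** 4 for c in centered) / len(centered)
    return fourth / var ** 2 - 3.0

def rotate(u, v, theta):
    """Rotate the 2-D point cloud (u, v) by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a - s * b for a, b in zip(u, v)],
            [s * a + c * b for a, b in zip(u, v)])

rng = random.Random(1)
n = 100_000
g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
g2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
u1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
u2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]

g_rot, _ = rotate(g1, g2, math.pi / 4)
u_rot, _ = rotate(u1, u2, math.pi / 4)

k_gauss_before, k_gauss_after = excess_kurtosis(g1), excess_kurtosis(g_rot)
k_unif_before, k_unif_after = excess_kurtosis(u1), excess_kurtosis(u_rot)
```

The Gaussian pair looks the same before and after the rotation (both values near 0), while the uniform pair moves from about -1.2 to about -0.6. Only the non-Gaussian data carries information about the rotation.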


Mika Häkkinen making doughnuts. Check the paragraph above to see why this is such a great nerdy joke.

But wait, according to the Central Limit Theorem, won't a mixture of components have a more Gaussian-like distribution?

Right, and so the (most likely) solution, following the ICA logic, is the mixing matrix that gives the least Gaussian components.

You can choose between various measures of non-Gaussianity, including kurtosis, negentropy, mutual information, maximum likelihood... The best approach is to test them with your own data.
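Here is a heavily simplified sketch of the "least Gaussian" idea, using kurtosis as the measure. It is not the FastICA algorithm itself. The assumptions are: two made-up uniform sources with unit variance, a mixing matrix that is a pure rotation (so the data is already white and no whitening step is needed), and a brute-force search over angles instead of FastICA's fixed-point iteration.

```python
import math
import random

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    m = sum(values) / len(values)
    centered = [v - m for v in values]
    var = sum(c * c for c in centered) / len(centered)
    fourth = sum(c ** 4 for c in centered) / len(centered)
    return fourth / var ** 2 - 3.0

def rotate(u, v, theta):
    """Rotate the 2-D point cloud (u, v) by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a - s * b for a, b in zip(u, v)],
            [s * a + c * b for a, b in zip(u, v)])

rng = random.Random(42)
n = 10_000
half_width = math.sqrt(3.0)  # uniform(-sqrt(3), sqrt(3)) has unit variance
s1 = [rng.uniform(-half_width, half_width) for _ in range(n)]
s2 = [rng.uniform(-half_width, half_width) for _ in range(n)]

# Hypothetical mixing: a rotation by 25 degrees.
true_deg = 25
x1, x2 = rotate(s1, s2, math.radians(true_deg))

def non_gaussianity(deg):
    """Total |excess kurtosis| of the components after undoing `deg` degrees."""
    y1, y2 = rotate(x1, x2, -math.radians(deg))
    return abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))

# Pick the unmixing angle whose components are the least Gaussian.
best_deg = max(range(90), key=non_gaussianity)
```

The search only covers 0 to 89 degrees on purpose: rotating by a further 90 degrees just swaps the components and flips a sign, which is the same ambiguity ICA always has.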

Practical application:


Probably the "most classic" application of ICA is the analysis of EEG. For FTIR it's so-so: not the perfect solution, but it works sometimes.

And it's solid for image analysis, link

References:


  • Easy to understand 30-page tutorial, link
  • And the colossal 500-page book, devoted to ICA, link
  • Personal page of Aapo Hyvärinen, with publications, link
  • there is a mistake, find it and win 1 SBD, (@lemouth 's style ;) )

Write some funny comments


Ooohhh.... I didn't find the mistake.... Except if it is connected to 2=0. Otherwise, things may get weirdly uncorrelated...

I liked the choice of illustrations btw :)

Yes covariance = 0, not 2, good spot @lemouth !

Ohhh.. I was not expecting this to be the stuff to find :D

Thank you for the shoutout @alexs1320 ! I had to go away and think about how this would be applied to image processing (for astronomical images of course) and there could be some interesting applications to explore. From what I read this is a technique of estimating the original contributions of multiple signals embedded in a single signal. I will definitely keep this and your links bookmarked!

(Probably) the best algorithm would be the SOBI-RO from the package ICALAB, by Andrzej Cichocki. That obscure method works with everything I've ever put as the input. It's especially good for sparse signals.

In your case, the workflow would be: each image --> image to vector (in Matlab) --> all the vectors packed into the input matrix --> SOBI-RO --> back to images (vector to matrix) --> and... see what is the signal and what is the noise :) --> discard the noise

From n images, n components can be extracted (both in ICA and SOBI), in contrast to PCA/FA where you need 2n+1 starting signals for only n components.
