As we know, t-SNE focuses on preserving local relationships. To compare two embedding results, we try to align their t-SNE clusters to the same places, which makes the comparison intuitive.
The basic idea is to add an L2 distance term that pulls the two t-SNE embeddings together.
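To make the idea concrete, here is a minimal sketch of what the combined objective could look like, assuming both embeddings cover the same points in matched order and the L2 term is the summed squared distance between matched points (the names are illustrative, not the actual implementation):

```python
import numpy as np

def tsne_kl(P, Y, eps=1e-12):
    """Standard t-SNE loss: KL(P || Q), with a Student-t kernel on the 2-D layout Y."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    num = 1.0 / (1.0 + sq)                                      # Student-t kernel
    np.fill_diagonal(num, 0.0)
    Q = num / num.sum()                                         # low-dimensional affinities
    return np.sum(P * np.log((P + eps) / (Q + eps)))

def yoked_loss(P1, P2, Y1, Y2, lam=1e-8):
    """KL loss for each embedding, plus an L2 term pulling the two layouts together."""
    align = np.sum((Y1 - Y2) ** 2)   # assumed form of the alignment term
    return tsne_kl(P1, Y1) + tsne_kl(P2, Y2) + lam * align
```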
Here is the result:
The following shows the original t-SNE for the two different embedding methods:
The following shows the yoked t-SNE for those two embedding results:
This is awesome! Did you run different iterations with different weights for the L2 loss or did you just pick one? What were the weights and how did they compare to the non-L2 loss? I worry about how much we can read into the similarity of the plots given that cranking that term up or down would make them more or less similar -- curious to hear your or Robert's take on why that's not a legit concern! 🙂
In this first experiment, I just picked a fixed weight, 10^(-8), for the L2 loss. As Dr. Pless said in the reply, measuring the extra alignment error is important, and that is the next step. Beyond that, trying different weights to find a sweet spot between the KL distance and the L2 distance is also a next step.
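For reference, a tiny sketch of what that alignment-error measurement could look like, where Y1 and Y2 are the two matched n x 2 layouts coming out of the yoked run (hypothetical names, not the actual code):

```python
import numpy as np

def alignment_report(Y1, Y2):
    """Report how far apart the two copies of each point ended up in the yoked layouts."""
    per_point = np.linalg.norm(Y1 - Y2, axis=1)   # distance between matched points
    total_sq = np.sum((Y1 - Y2) ** 2)             # the term the 10^(-8) weight multiplies
    print("total squared alignment error:", total_sq)
    print("mean / max per-point displacement:", per_point.mean(), per_point.max())
    return total_sq
```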
I love pictures! These are amazing. I know you showed them to me in the lab the other day as well.
Here are the things that I see in the figure:
(1) Both independent t-SNE figures have the same cluster of 4 groups (blue+green+light-blue+black), in mostly the same configuration.
(2) That grouping stays together in the yoked t-SNE, but is harder to notice because many of the groups are now closer together. I like that because it shows that we are keeping some of the local structure.
(3) The yoked t-SNE is *really* well aligned.
#3 is interesting. Of course we want things to be well aligned, but at what cost? Writing this out, we have:
t-SNELoss1 --- the t-SNE loss just from embedding 1.
t-SNELoss2 --- the t-SNE loss just from embedding 2.
t-SNELossAligned --- the t-SNE loss when solving for two aligned t-SNEs.
Expanding the t-SNELossAligned loss function we get:
Loss = t-SNELoss1a + t-SNELoss2a + lambda * alignment-error
Where
t-SNELoss1a is the part of the overall loss that arises from embedding 1,
t-SNELoss2a is the part of the overall loss that arises from embedding 2, and
alignment error is a comparison of our t-SNE outputs for the two embeddings.
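To make the pieces concrete, here is one way to write that expanded loss in standard t-SNE notation, assuming the alignment error is the summed squared L2 distance between matched points as the original post describes:

$$
\mathcal{L}_{\text{aligned}} \;=\; \underbrace{\mathrm{KL}\!\left(P^{(1)} \,\middle\|\, Q^{(1)}\right)}_{\text{t-SNELoss1a}} \;+\; \underbrace{\mathrm{KL}\!\left(P^{(2)} \,\middle\|\, Q^{(2)}\right)}_{\text{t-SNELoss2a}} \;+\; \lambda \underbrace{\sum_i \bigl\lVert y_i^{(1)} - y_i^{(2)} \bigr\rVert^2}_{\text{alignment error}}
$$

where P^(k) and Q^(k) are the high- and low-dimensional affinities for embedding k, and y_i^(k) is the 2-D position of point i in layout k.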
So: if lambda = 0, we would expect
t-SNELoss1a == t-SNELoss1
because then the overall optimization for t-SNELossAligned has no alignment term and the optimization should work on the two embeddings separately.
What I am most interested in is the case where lambda is small. Then:
t-SNELoss1a >= t-SNELoss1
because the aligned t-SNE is trying to do both (regular t-SNE optimization and alignment) --- trying to make two things aligned only makes it harder to do the regular part of the t-SNE optimization.
So I'm interested in seeing what happens when you try many values of lambda, compute the embedding for each lambda, and solve for:
error ratio1 = t-SNELoss1/t-SNELoss1a [this should be between 0 and 1]
and the alignment error.
If we plot (error ratio1, alignment error) on a scatter plot, where
alignment error is on the x-axis and
error ratio1 is on the y-axis
then we might get to see if we can get very good alignment with only a little extra error, because those are the plots that we eventually want to see.
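To sketch that sweep (run_tsne and run_yoked_tsne are hypothetical stand-ins for whatever optimizer is actually being used; tsne_kl is just the standard t-SNE KL loss, repeated here so the snippet is self-contained):

```python
import numpy as np
import matplotlib.pyplot as plt

def tsne_kl(P, Y, eps=1e-12):
    """Standard t-SNE loss: KL(P || Q), with a Student-t kernel on the 2-D layout Y."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + sq)
    np.fill_diagonal(num, 0.0)
    Q = num / num.sum()
    return np.sum(P * np.log((P + eps) / (Q + eps)))

def sweep_lambdas(P1, P2, lambdas, run_tsne, run_yoked_tsne):
    """For each lambda, compare the yoked solve against the independent t-SNE."""
    Y1 = run_tsne(P1)
    loss1 = tsne_kl(P1, Y1)                  # t-SNELoss1, independent of lambda
    ratios, align_errors = [], []
    for lam in lambdas:
        Y1a, Y2a = run_yoked_tsne(P1, P2, lam)
        loss1a = tsne_kl(P1, Y1a)            # t-SNELoss1a from the joint solve
        ratios.append(loss1 / loss1a)        # error ratio1, expected between 0 and 1
        align_errors.append(np.sum((Y1a - Y2a) ** 2))
    return ratios, align_errors

# One point per lambda: alignment error on x, error ratio1 on y, e.g.
# ratios, align_errors = sweep_lambdas(P1, P2, np.logspace(-10, -4, 7), run_tsne, run_yoked_tsne)
# plt.scatter(align_errors, ratios)
# plt.xlabel("alignment error"); plt.ylabel("error ratio1"); plt.show()
```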