
Yoking t-SNE by adding an L2 distance term to the original loss function

As we know, t-SNE focuses on local relationships. To compare two embedding results, we try to align the corresponding t-SNE clusters to the same places, which makes the comparison intuitive.

The basic idea is to add an L2 distance term that pulls the two t-SNE embeddings into alignment with each other.
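
To make the idea concrete, here is a minimal sketch of what such a yoked objective could look like (an illustration only, not the exact code used here): P1 and P2 are the high-dimensional affinity matrices of the two embeddings, Y1 and Y2 their 2-D t-SNE maps, and lam the weight on the alignment term.

import numpy as np

def tsne_kl_loss(P, Y, eps=1e-12):
    # Standard t-SNE loss: KL divergence between the high-dimensional
    # affinities P and the Student-t affinities Q of the low-dimensional map Y.
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + d2)
    np.fill_diagonal(num, 0.0)
    Q = num / np.sum(num)
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

def yoked_tsne_loss(P1, P2, Y1, Y2, lam=1e-8):
    # Sum of the two ordinary t-SNE losses plus a weighted L2 alignment term
    # that penalizes point-wise disagreement between the two maps.
    kl1 = tsne_kl_loss(P1, Y1)
    kl2 = tsne_kl_loss(P2, Y2)
    alignment = float(np.sum((Y1 - Y2) ** 2))
    return kl1 + kl2 + lam * alignment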

Here are the results:

Below are the original t-SNE plots for the two different embedding methods:

Below is the yoked t-SNE for the same two embedding results:

3 thoughts on “Yoking t-SNE by adding an L2 distance term to the original loss function”

  1. astylianou

    This is awesome! Did you run different iterations with different weights for the L2 loss or did you just pick one? What were the weights and how did they compare to the non-L2 loss? I worry about how much we can read into the similarity of the plots given that cranking that term up or down would make them more or less similar -- curious to hear your or Robert's take on why that's not a legit concern! 🙂

    Reply
    1. liuxiaotong2017

      In this first experiment, I just picked a fixed weight of 10^(-8) for the L2 loss. As Dr. Pless said in his reply, measuring the extra alignment error is important, and that is the next step. Trying different weights to find a sweet spot between the KL divergence and the L2 distance is also a next step.

      Reply
  2. pless

    I love pictures! These are amazing. I know you showed them to me in the lab the other day as well.

    Here are the things that I see in the figure:

    (1) Both independent t-SNE figures have the same cluster of 4 groups (blue+green+light-blue+black), in mostly the same configuration.

    (2) That grouping stays together in the yoked t-SNE, but it is harder to notice because many of the groups are now closer together. I like that because it shows that we are keeping some of the local structure.

    (3) The yoked t-sne is *really* well aligned.

    #3 is interesting. Of course we want things to be well aligned, but at what cost? Writing this out, we have:

    t-SNELoss1 --- the t-SNE loss just from embedding 1.
    t-SNELoss2 --- the t-SNE loss just from embedding 2.

    t-SNELossAligned --- the t-SNE loss when solving for two aligned t-SNEs.

    Expanding the t-SNELossAligned loss function we get:

    Loss = t-SNELoss1a + t-SNELoss2a + lambda * alignment-error

    Where
    t-SNELoss1a is the part of the overall loss that arises from embedding 1,
    t-SNELoss2a is the part of the overall loss that arises from embedding 2, and
    alignment error is a comparison of our t-SNE output for the two embeddings.
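
    (To be concrete, this breakdown could be computed along the following lines --- a sketch only, reusing the tsne_kl_loss helper sketched in the post above, with variable names that mirror the terms defined here.)

    def loss_breakdown(P1, P2, Y1, Y2, lam):
        # Mirrors the expansion: Loss = t-SNELoss1a + t-SNELoss2a + lambda * alignment-error
        tsne_loss_1a = tsne_kl_loss(P1, Y1)          # contribution from embedding 1
        tsne_loss_2a = tsne_kl_loss(P2, Y2)          # contribution from embedding 2
        alignment_error = float(np.sum((Y1 - Y2) ** 2))
        total = tsne_loss_1a + tsne_loss_2a + lam * alignment_error
        return tsne_loss_1a, tsne_loss_2a, alignment_error, total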

    So: if lambda = 0, we would expect

    t-SNELoss1a == t-SNELoss1

    because then the overall optimization for t-SNELossAligned has no alignment term and the optimization should work on the two embeddings separately.

    What I am most interested in is the case where lambda is small. Then:

    t-SNELoss1a >= t-SNELoss1

    because the aligned t-SNE is trying to do both (regular t-SNE optimization and alignment) --- trying to make two things aligned only makes it harder to do the regular part of the t-SNE optimization.

    So I'm interested in seeing what happens when you try many values of lambda, compute the embedding for each lambda, and solve for:

    error ratio1 = t-SNELoss1/t-SNELoss1a [this should be between 0 and 1]

    and the alignment error.

    If we plot (error ratio1, alignment error) on a scatter plot, where

    alignment error is on the x-axis and
    error ratio1 is on the y-axis,

    then we might get to see if we can get very good alignment with only a little extra error, because those are the plots that we eventually want to see.
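
    (A rough sketch of that sweep, assuming a hypothetical yoked_tsne(X1, X2, lam) that returns the two aligned maps, plus the tsne_kl_loss helper from the post above; X1 and X2 are the two original embeddings and P1 is the t-SNE affinity matrix for X1. None of these names come from actual code here.)

    import numpy as np
    import matplotlib.pyplot as plt

    lambdas = np.logspace(-12, -4, 9)                 # candidate alignment weights

    # Baseline: lam = 0 decouples the optimization, giving t-SNELoss1.
    Y1_indep, _ = yoked_tsne(X1, X2, lam=0.0)
    tsne_loss_1 = tsne_kl_loss(P1, Y1_indep)

    ratios, align_errors = [], []
    for lam in lambdas:
        Y1, Y2 = yoked_tsne(X1, X2, lam=lam)
        tsne_loss_1a = tsne_kl_loss(P1, Y1)           # t-SNELoss1a
        ratios.append(tsne_loss_1 / tsne_loss_1a)     # error ratio1, should lie in [0, 1]
        align_errors.append(float(np.sum((Y1 - Y2) ** 2)))

    # Alignment error on the x-axis, error ratio1 on the y-axis.
    plt.scatter(align_errors, ratios)
    plt.xlabel("alignment error")
    plt.ylabel("error ratio1")
    plt.show()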

    Reply
