This post has several purposes.
FIRST: We need a better name or acronym than yoked t-sne. it kinda sucks.
- Loosely Aligned t-Sne
- Comparable t-SNE (Ct-SNE)?
- t-SNEEZE? t-SNEs
SECOND: How can we "t-SNEEZE" many datasets at the same time?
Suppose you are doing image embedding, and you start from imagenet, then from epoch to epoch you learn a better embedding. It might be interesting to see the evolution of where the points are mapped. To do this you'd like to yoke (or align, or tie together, or t-SNEEZE) all the t-SNEs together so that they are comparable.
t-SNE is an approach to map high dimensional points to low dimensional points. Basically, it computes the similarity between points in high dimension, using the notation:
P(i,j) is (something like) how similar point i is to point j in high dimensions --- (this is measured from the data), and
Q(i,j) is (something like) how similar point i is to point j in low dimension.
the Q(i,j) is defined based on where the 2-D points are mapped in the t-SNE plot, and the optimization finds 2-D points that makes Q and P as similar as possible. Those points might be defined as (x(i), y(i)).
With "Yoked" t-SNE we have two versions of where the points go in high-dimesional space, so we have two sets of similarities. So there is a P1(i,j) and a P2(i,j)
yoked t-SNE solves for points x1, y1 and x2,y2 so that the
- Q1 defined by the x1, y1 points is similar to P1, and the
- Q2 defined by the x2,y2 points are similar to P2 and the
- x1,y1 points are similar to x2,y2
by adding this last cost (weight by something) to the optimization. If we have *many* high dimensional points sets (e.g. P1, P2, ... P7, for perhaps large versions of "7") what can we do?
Idea 1: exactly implement the above approach, with steps 1...7 talking about how each embedding should have Q similar to P, and have step 8 penalize all pairwise distances between the x,y embeddings for each point.
Idea 2: (my favorite?). The idea of t-SNE is to find which points are similar in high-dimensions and embed those close by. I wonder if we can find all pairs of points that are similar in *any* embedding. So, from P1... P7, make Pmax, so that Pmax(i,j) is the most i,j are similar in any high-dimensional space. Then solve for each other embedding so that it has to pay a penalty to be different from Pmax? [I think this is not quite the correct idea yet, but something like this feels right. Is "Pmin" the thing we should use?]
lol t-SNEEZE
Don't encourage him!