This experiment aims to measure, using yoked t-SNE, whether an embedding method has good generalization ability.
The basic idea is to compare how well points of the same category cluster in a training embedding versus a test embedding. We split the Stanford Cars dataset into dataset A (98 random categories) and dataset B (the remaining 98 categories). First, we train a ResNet-50 with N-pair loss on A and compute embedding points for the data in A. Second, we train a ResNet-50 with N-pair loss on B and use that model to compute embedding points for the data in A. Finally, we compare the two embeddings with yoked t-SNE, as sketched below.
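A minimal sketch of this protocol in PyTorch/torchvision follows. The `train` and `make_loader` helpers are hypothetical placeholders for the actual N-pair training loop and data pipeline, and the N-pair loss shown is the standard softmax formulation (Sohn, 2016), which may differ in detail from the implementation actually used.

```python
import torch
import torch.nn.functional as F
import torchvision

def resnet50_embedder(dim=128):
    # ResNet-50 backbone with the classifier replaced by an embedding head.
    model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, dim)
    return model

def n_pair_loss(anchors, positives):
    # N-pair loss: softmax cross-entropy over the anchor/positive similarity
    # matrix; the matching pair for each anchor sits on the diagonal.
    logits = anchors @ positives.t()
    targets = torch.arange(len(anchors), device=logits.device)
    return F.cross_entropy(logits, targets)

@torch.no_grad()
def embed(model, loader, device="cpu"):
    # Forward pass only: collect embedding points for every image in `loader`.
    model.eval().to(device)
    return torch.cat([model(x.to(device)) for x, _ in loader]).cpu()

# Disjoint 98/98 split of the 196 Stanford Cars classes.
perm = torch.randperm(196)
classes_A, classes_B = perm[:98].tolist(), perm[98:].tolist()

# Hypothetical helpers: `train` runs the N-pair training loop, `make_loader`
# builds a DataLoader restricted to the given classes.
model_A = train(resnet50_embedder(), make_loader(classes_A), n_pair_loss)
model_B = train(resnet50_embedder(), make_loader(classes_B), n_pair_loss)

emb_A_as_train = embed(model_A, make_loader(classes_A))  # left figure
emb_A_as_test  = embed(model_B, make_loader(classes_A))  # right figure
```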
The results are as follows:
The left figure shows the embedding of dataset A used as training data, and the right figure shows the embedding of dataset A used as test data. As we can see, the clusters in the left figure are tight, while those in the right figure are looser. Even so, the points in the right figure still form distinct groups, which suggests that the generalization ability of N-pair loss is not bad.
As a next step, I want to try some embedding methods that are believed to have poor generalization ability, to validate whether yoked t-SNE is a good tool for measuring generalization.
This is great! This is exactly the plot that I want to see: we have something that is mostly aligned, and you can see some variation, with groups that are more merged in one of the plots. Perfect.
I'm not exactly sure what you mean by generalization, but I think we need to loop back and re-evaluate the "what is the right lambda" question. Your figure shows that we can get an alignment that still exposes some differences. Can you make a version of this with 5 different values of lambda (one of which is 0), and show both the pair of embeddings and the alignment error and t-SNE error in each case?
Sure, I will work on finding the right lambda.
Eventually, I think the goal will be to have a good rule to find the "right" lambda.
Right now, I really want to see what happens for many different values of lambda, and understand how those values trade off between "a more aligned t-SNE" and "a t-SNE that better fits the high-dimensional data".
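Concretely, something like the sketch below is what I have in mind. It assumes the yoked objective is the sum of the two t-SNE KL terms plus lambda times a pointwise alignment penalty; `yoked_tsne` is a hypothetical stand-in for the actual yoked-t-SNE optimizer, and `X_train`, `X_test`, `P1`, `P2` (the two high-dimensional affinity matrices) are assumed to come from the existing pipeline.

```python
import numpy as np

def tsne_error(P, Y):
    # Standard t-SNE objective KL(P || Q), with Q from the Student-t kernel.
    D = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    W = 1.0 / (1.0 + D)
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    eps = 1e-12
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

def alignment_error(Y1, Y2):
    # Mean squared distance between corresponding points of the two maps.
    return float(np.mean(np.square(Y1 - Y2).sum(-1)))

# Sweep lambda, including 0 (two independent t-SNEs, no yoking).
for lam in [0.0, 0.01, 0.1, 1.0, 10.0]:
    # Hypothetical optimizer minimizing
    #   KL(P1||Q1) + KL(P2||Q2) + lam * alignment_error(Y1, Y2).
    Y1, Y2 = yoked_tsne(X_train, X_test, lam)
    print(lam,
          tsne_error(P1, Y1) + tsne_error(P2, Y2),  # "t-SNE error"
          alignment_error(Y1, Y2))                  # "alignment error"
```

Plotting the two error columns against lambda should make the trade-off visible at a glance.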