The car dataset I use contains 8131 images of 64 dimensions. These data have 98 classes which had been labeled form 0 to 97, there are about 60 to 80 images of each class. What I am trying to do is instead of using Euclidean Distance or Manifold Ranking to query only one image, use Manifold Ranking to query two different images but in the same class at same time, to improve the accuracy.
Example results of one node pair(Green border images means the image has the same class as the query nodes, red ones means not same class) :
Node pair [0, 1]:
Using Euclidean Distance query 2 nodes separately and their ensemble result:
Using Manifold Ranking query 2 nodes separately:
Manifold Ranking ensemble result and the result of using our Two-Node Query to query them at same time:
Node Pair [1003, 1004]:
Using Euclidean Distance query 2 nodes separately and their ensemble result:
Using Manifold Ranking query 2 nodes separately:
Manifold Ranking ensemble result and the result of using our Two-Node Query to query them at same time:
Node Pair [3500, 3501]:
Using Euclidean Distance query 2 nodes separately and their ensemble result:
Using Manifold Ranking query 2 nodes separately:
Manifold Ranking ensemble result and the result of using our Two-Node Query to query them at same time:
The improved algorithm is as follows:
1. Use faiss library and setting nlist=100, nprobe=10, to get the 50 nearest neighbors of the two query nodes. The query nodes are different but in the same class.(faiss library use Cluster Pruning Algorithm, to split the dataset into 100 nlist(cluster), each cluster has a leader, choose 10(nprobe) nearest clusters of the query point and find the K nearest neighbors.)
2. To simplify the graph, just use the MST as the graph for Manifold Ranking. In other words, now we have two adjacency matrices of the two query node.
3. Create a Link Strength Matrix, which at first is a 0 matrix has the same shape as the Adjacency Matrix, and if the there is a node near the 1st query point has the same class as the other node near the 2nd query point, then give a beta value to create an edge between these two nodes in the Link Strength Matrix.
4. Splicing Matrices, for example, the Adjacency Matrix of the 1st query point at top left, the Link Strength Matrix at top right and the transposed matrix of it at bottom left, the Adjacency Matrix of the 2nd query point at bottom right.
5. Normalizing the new big matrix as the Manifold Ranking does. However, when computing the Affinity Matrix, use the Standard Deviation of the non-zero values of the two pre-Adjacency Matrices only, to make the curve converge at ensemble result when the beta value is large.
6. Giving the two query nodes both 1 signal strength and others 0 when initialization. Then get the Manifold Ranking of all nodes.
The following plot shows the mAP Score of different beta values for all node pairs in the dataset, which means 8131 images in 97 classes, over 330k node pairs. I give the top ranked node score 1 if it has the same class as the query nodes and score 0 if not, the n-th top ranked node score 1/n if it has the same class as the query node and score 0 if not. As we can see, as the beta value increase, the mAP Score got the maximum value when the beta value at 0.7-0.8, which is better than only use one query node and the ensemble result of two query nodes.
My next step is to find if I can do some improvement to the time complexity of the algorithm, and try to improve the algorithm.