
(1) The one sentence scientific/engineering goal of your project

The current goal is to train a classifier that distinguishes the "Awned" and "Awnless" classes.

(2) Your target end-goal as you know it (e.g. paper title, possible venue).

Our longer-term goal is to predict the "Heading Percentage", and then to train a model that detects "Wheat Spikes", the seed-bearing part of the wheat, to demonstrate that this can be done from UAV data.

(3) Your favorite picture you’ve created relative to your project since you last posted on the blog.

In the dataset, there is only one plot with the phenotype value "Mix", which means the wheat in that plot shows both awned and awnless phenotypes. (It seems interesting, but we decided to remove it from the dataset for now.)

(4) Your specific plan for the next week.

We plan to finish training the simple classifier this week, and then decide whether it needs further improvement or we can move on to the next step.

The car dataset I use contains 8131 images, each represented by a 64-dimensional feature vector. The data covers 98 classes, labeled from 0 to 97, with about 60 to 80 images per class. What I am trying to do is, instead of using Euclidean Distance or Manifold Ranking to query a single image, use Manifold Ranking to query two different images of the same class at the same time, to improve the accuracy.

Example results for one node pair (green-bordered images share the class of the query nodes; red-bordered ones do not):

Node pair [0, 1]:

Querying the 2 nodes separately with Euclidean Distance, and their ensemble result:

Querying the 2 nodes separately with Manifold Ranking:

The Manifold Ranking ensemble result, and the result of our Two-Node Query applied to both nodes at the same time:

 

 

Node Pair [1003, 1004]:

Querying the 2 nodes separately with Euclidean Distance, and their ensemble result:

Querying the 2 nodes separately with Manifold Ranking:

The Manifold Ranking ensemble result, and the result of our Two-Node Query applied to both nodes at the same time:

 

Node Pair [3500, 3501]:

Querying the 2 nodes separately with Euclidean Distance, and their ensemble result:

Querying the 2 nodes separately with Manifold Ranking:

The Manifold Ranking ensemble result, and the result of our Two-Node Query applied to both nodes at the same time:

 

The improved algorithm is as follows:

1. Use the faiss library with nlist=100 and nprobe=10 to get the 50 nearest neighbors of the two query nodes. The query nodes are different but belong to the same class. (faiss uses a Cluster Pruning approach: it splits the dataset into nlist=100 clusters, each with a leader; it then picks the nprobe=10 clusters nearest to the query point and finds the K nearest neighbors within them.)

2. To simplify the graph, use only the MST as the graph for Manifold Ranking. In other words, we now have two adjacency matrices, one for each query node.

3. Create a Link Strength Matrix, initially a zero matrix with the same shape as the Adjacency Matrix. If a node near the 1st query point has the same class as a node near the 2nd query point, set the corresponding entry to a beta value, creating an edge between those two nodes in the Link Strength Matrix.

4. Splice the matrices: the Adjacency Matrix of the 1st query point at top left, the Link Strength Matrix at top right, its transpose at bottom left, and the Adjacency Matrix of the 2nd query point at bottom right.

5. Normalize the new block matrix as standard Manifold Ranking does. However, when computing the Affinity Matrix, use the standard deviation of the non-zero values of the two pre-splice Adjacency Matrices only, so that the curve converges to the ensemble result when the beta value is large.

6. At initialization, give both query nodes a signal strength of 1 and all other nodes 0. Then compute the Manifold Ranking of all nodes.
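The splicing and ranking steps above can be sketched in NumPy. This is a minimal sketch, not the exact implementation: the Gaussian affinity form, the placement of the query nodes at index 0 of each block, and the constants alpha and iters are assumptions on my part.

```python
import numpy as np

def two_node_manifold_ranking(D1, D2, link_mask, beta=0.8, alpha=0.99, iters=100):
    """D1, D2: (n, n) MST distance matrices around the two query nodes
    (zero where there is no edge); link_mask: (n, n) boolean, True where a
    node near query 1 shares a class with a node near query 2."""
    n = D1.shape[0]
    # Step 5 detail: sigma from the non-zero distances of the two
    # pre-splice matrices only.
    nz = np.concatenate([D1[D1 > 0], D2[D2 > 0]])
    sigma = nz.std()

    def affinity(D):
        W = np.exp(-D ** 2 / (2 * sigma ** 2))
        W[D == 0] = 0.0              # keep non-edges (and the diagonal) at zero
        return W

    # Steps 3-4: beta-weighted Link Strength Matrix, spliced into one block matrix.
    L = beta * link_mask.astype(float)
    W = np.block([[affinity(D1), L], [L.T, affinity(D2)]])
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))
    # Step 6: signal 1 on both query nodes (assumed at index 0 of each block).
    y = np.zeros(2 * n)
    y[0] = y[n] = 1.0
    f = y.copy()
    for _ in range(iters):             # iterate f = alpha*S*f + (1-alpha)*y
        f = alpha * S @ f + (1 - alpha) * y
    return f
```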

The following plot shows the mAP score for different beta values over all node pairs in the dataset: 8131 images in 98 classes, over 330k node pairs. The top-ranked node scores 1 if it has the same class as the query nodes and 0 if not; in general, the n-th ranked node scores 1/n if it has the same class as the query nodes and 0 if not. As the plot shows, the mAP score peaks when the beta value is around 0.7-0.8, which is better than using only one query node and better than the ensemble result of two query nodes.
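The per-pair scoring rule just described can be written down directly. Note that the final division by the ideal score (so a perfect ranking scores 1.0) is my assumption; the post does not state how the per-pair scores are normalized before averaging into mAP.

```python
import numpy as np

def pair_score(ranked_labels, query_label):
    """Score a ranked retrieval list: the node at rank n contributes 1/n
    if it shares the query class, 0 otherwise; the total is divided by the
    best achievable score (all relevant items ranked first)."""
    ranked_labels = np.asarray(ranked_labels)
    ranks = np.arange(1, len(ranked_labels) + 1)
    hits = (ranked_labels == query_label).astype(float)
    n_rel = int(hits.sum())
    if n_rel == 0:
        return 0.0
    ideal = (1.0 / ranks[:n_rel]).sum()
    return float((hits / ranks).sum() / ideal)
```

Averaging `pair_score` over all same-class node pairs then gives one point on the mAP-vs-beta curve.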

 

 


My next step is to see whether I can reduce the algorithm's time complexity, and to keep improving the algorithm itself.

The car dataset I use contains 8131 images, each represented by a 64-dimensional feature vector, i.e., an array of shape [8131, 64]. The data covers 98 classes, labeled from 0 to 97, with about 60 to 80 images per class.

The algorithm is as follows:

1. Use the faiss library with nlist=100 and nprobe=10 to get the 50 nearest neighbors of all nodes. (faiss uses a Cluster Pruning approach: it splits the dataset into nlist=100 clusters, each with a leader; it then picks the nprobe=10 clusters nearest to the query point and finds the K nearest neighbors within them.)

2. Get all node pairs that belong to the same class, without duplicates. For the 2 nodes in a node pair, use the longest edge of the MST to build a connected graph for Manifold Ranking for each node separately; these are the two Adjacency Matrices. Leave both Adjacency Matrices holding raw Euclidean distances, without normalization.

3. Create a Pipe Matrix, initially a zero matrix with the same shape as the Adjacency Matrix. If a node near the 1st query point has the same class as a node near the 2nd query point, set the edge between those two nodes in the Pipe Matrix to a beta value.

4. Splice the matrices: the Adjacency Matrix of the 1st query point at top left, the Pipe Matrix at top right and bottom left, and the Adjacency Matrix of the 2nd query point at bottom right.

5. Normalize the new matrix and run Manifold Ranking, taking the label of the highest-scored node as the prediction. In particular, give the two query points an initial signal weight of 1 and all other nodes 0.
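The cluster-pruning search in step 1 can be sketched in plain NumPy as a toy stand-in for faiss's IVF index. The function name and the pre-computed centroids are hypothetical; real faiss learns the nlist centroids with k-means during `train()`.

```python
import numpy as np

def ivf_search(data, centroids, query, nprobe=10, k=50):
    """Cluster-pruning k-NN: assign every point to its nearest centroid
    (the cluster "leaders"), then scan only the nprobe clusters whose
    centroids are closest to the query."""
    # One-time assignment of each point to its nearest centroid.
    assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    # Pick the nprobe clusters closest to the query point.
    probe = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probe))
    # Exact k-NN restricted to the probed clusters.
    d = ((data[cand] - query) ** 2).sum(-1)
    order = np.argsort(d)[:k]
    return cand[order], np.sqrt(d[order])
```

With nlist=100 and nprobe=10, only about a tenth of the dataset is scanned per query, which is the speed/recall trade-off faiss exposes.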

The following plot shows the accuracy for different beta values on the images in class 0. As the plot shows, the accuracy reaches its maximum when the beta value is around 0.8, which is better than using only one query point.

 


My next step is to run this process on all image classes to see the results, and to make another plot showing whether two close query points or two far-apart query points perform better.

This week, I used Manifold Ranking and Euclidean Distance to predict the labels of certain nodes, and compared the results of the two methods.

The data comes from Hong's work: a tensor of size 8131x64, i.e., image data projected onto 64 dimensions. I also have the ground truth for every node, stored as a dictionary that maps each node to its label. The data structure is shown below.

...continue reading "Comparison of Manifold Ranking and Euclidean Distance in Real Data"

Last week I reproduced the paper "Ranking on Data Manifolds".

I created this 2-moon-shaped data randomly and added some noise to it. The left plot has 50 random 2-moon nodes, while the right one has 100 (the following plots correspond to these two).
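One way to generate such data is a standard two-half-circle construction with Gaussian noise; this is a hypothetical reconstruction, since the post does not say exactly how its data was made.

```python
import numpy as np

def make_two_moons(n_per_moon, noise=0.1, seed=0):
    """Two interleaving half-circles ("moons") with Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0, np.pi, n_per_moon)
    upper = np.c_[np.cos(t), np.sin(t)]            # upper half-circle
    lower = np.c_[1 - np.cos(t), 0.5 - np.sin(t)]  # shifted, flipped half-circle
    X = np.vstack([upper, lower])
    X += rng.normal(scale=noise, size=X.shape)     # add noise
    y = np.r_[np.zeros(n_per_moon), np.ones(n_per_moon)]  # moon labels
    return X, y

X50, y50 = make_two_moons(25)     # ~50 nodes, as in the left plot
X100, y100 = make_two_moons(50)   # ~100 nodes, as in the right plot
```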

...continue reading "Recurring Ranking on Data Manifold and Refine with MST"

1. Nearest Neighbor Classifier

NN (Nearest Neighbor Classifier):

This classifier has nothing to do with Convolutional Neural Networks and is very rarely used in practice, but it will give us an idea of the basic approach to an image classification problem.

One of the simplest possibilities is to compare the images pixel by pixel and add up all the differences. In other words, given two images represented as vectors I1 and I2, a reasonable choice for comparing them is the L1 distance:
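A minimal sketch of a nearest-neighbor classifier built on this L1 distance (the function and variable names are mine):

```python
import numpy as np

def nn_predict(train_X, train_y, test_X):
    """Nearest-neighbor classification with the L1 distance
    d(I1, I2) = sum_p |I1_p - I2_p| over pixels."""
    preds = np.empty(len(test_X), dtype=train_y.dtype)
    for i, x in enumerate(test_X):
        dists = np.abs(train_X - x).sum(axis=1)  # L1 distance to every train image
        preds[i] = train_y[np.argmin(dists)]     # label of the closest train image
    return preds
```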

...continue reading "kNN, SVM, Softmax and SGD"

View Data

These images are labeled satellite image chips with atmospheric conditions and various classes of land cover/land use. Resulting algorithms will help the global community better understand where, how, and why deforestation happens all over the world - and ultimately how to respond.

...continue reading "Training Model use Dataset with Multi-Labal"

I tried creating my own dataset from internet image resources this week, and it worked well.

 

Get a list of URLs
Go to Google Images and search for the images you are interested in. The more specific you are in your Google Search, the better the results and the less manual pruning you will have to do.

Now you must run some JavaScript code in your browser to save the URLs of all the images you want for your dataset. Press Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on Mac, and a small window, the JavaScript 'Console', will appear. That is where you will paste the JavaScript commands.

You will need to get the URL of each of the images. You can do this by running the following commands:

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

...continue reading "Creating Your Own Dataset from Google Images and Training a Classifier"