
The car dataset I use contains 8131 images represented as 64-dimensional vectors, i.e., a matrix of shape [8131, 64]. The data fall into 98 classes, labeled 0 to 97, with roughly 60 to 80 images per class.

The algorithm is as follows:

1. Use the faiss library, setting nlist=100 and nprobe=10, to get the 50 nearest neighbors of all nodes (see the sketch after this list). Faiss uses a cluster-pruning approach: it splits the dataset into nlist=100 clusters, each with a leader, then chooses the nprobe=10 clusters nearest to the query point and searches them for the K nearest neighbors.

2. Get all node pairs that are in the same class, without duplicates. For the 2 nodes in a pair, separately build a connected graph for Manifold Ranking using the longest edge of the MST, and use each graph as that node's Adjacency Matrix. Keep the Euclidean distances in the two Adjacency Matrices without normalization.

3. Create a Pipe Matrix, which starts as a zero matrix of the same shape as the Adjacency Matrix. If a node near the 1st query point has the same class as a node near the 2nd query point, assign a beta value to the edge between these two nodes in the Pipe Matrix.

4. Splice the matrices into one block matrix: the Adjacency Matrix of the 1st query point at the top left, the Pipe Matrix at the top right and bottom left, and the Adjacency Matrix of the 2nd query point at the bottom right.

5. Normalize the new matrix and run Manifold Ranking, taking the label of the highest-scoring node as the prediction. In particular, the two query points are given an initial signal weight of 1 and all other nodes 0.
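Below is a minimal sketch of steps 1, 4, and 5 under my own naming: the faiss kNN search, the block-matrix splicing, and the standard Manifold Ranking iteration f ← αSf + (1−α)y. The `features` placeholder, the value of alpha, and the iteration count are assumptions for illustration, not values from my actual script.

```python
import numpy as np
import faiss

# Placeholder standing in for the real [8131, 64] car embeddings.
features = np.random.rand(8131, 64).astype("float32")

# Step 1: IVF index with nlist=100 clusters; search the nprobe=10 nearest clusters.
quantizer = faiss.IndexFlatL2(64)
index = faiss.IndexIVFFlat(quantizer, 64, 100)
index.train(features)
index.add(features)
index.nprobe = 10
dists, neighbors = index.search(features, 50)        # 50 nearest neighbors of every node

# Step 4: splice the two adjacency matrices and the pipe matrix into one block matrix.
def splice(adj1, adj2, pipe):
    return np.block([[adj1, pipe], [pipe.T, adj2]])

# Step 5: symmetric normalization, then iterate f <- alpha * S f + (1 - alpha) * y,
# with initial signal weight 1 on the two query points and 0 elsewhere.
def manifold_rank(W, query_idx, alpha=0.99, n_iter=50):
    d = W.sum(axis=1)
    d[d == 0] = 1e-12
    S = W / np.sqrt(np.outer(d, d))                  # D^{-1/2} W D^{-1/2}
    y = np.zeros(W.shape[0])
    y[query_idx] = 1.0
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f
```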

The following plot shows the accuracy for different beta values on images in class 0. As the beta value increases, accuracy reaches its maximum at beta = 0.8, which is better than using only one query point.



My next step is to apply this process to all image classes to see the results, and to make another plot showing whether two close query points or two far-apart query points perform better.

The Terra dataset contains 350,000 sorghum images from day 0 to day 57. Images from every 3 consecutive days are grouped into a class, forming 19 classes in total. The following shows samples from each class:

All images are randomly divided into a train set and a test set with an 8:2 ratio. A ResNet18 pre-trained on ImageNet is fine-tuned on the train set (lr = 0.01, epochs = 30). The training history of the network (with and without epoch zero) is the following:

  1. At epoch 0, train_acc and test_acc are both 5%, i.e., the ResNet predicts essentially at random (1/19 ≈ 5%).
  2. The first 3 epochs dramatically push train_acc and test_acc to 80%.
  3. The network converges to train_acc = 95% and test_acc = 90%.
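For reference, here is a minimal sketch of this fine-tuning setup in PyTorch. The placeholder data loader, batch size, and SGD momentum are assumptions; the real training uses the 8:2 split of Terra images described above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Placeholder data standing in for the real 80% training split.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 19, (64,))),
    batch_size=16, shuffle=True)

model = models.resnet18(pretrained=True)             # ImageNet pre-trained weights
model.fc = nn.Linear(model.fc.in_features, 19)       # 19 day-group classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```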

The confusion matrix on test set is the following:

When the network makes a wrong prediction, it mistakenly assigns the sorghum image to a neighboring class.

Several samples of wrong predictions are shown below:

At (2,4) and (5,5) the network does not even predict a neighboring class. These images are not very 'typical' of their classes, but the predictions are still hard to explain.

At (4,6) the image is 'typical' of class 1 but is predicted as class 5, which is mysterious.

DeepDream is applied to the network to reveal what it learns:

The structure of ResNet18 is given as follows:

An optimization of the outputs of the conv2_x, conv3_x, conv4_x, conv5_x, and fc layers is conducted:

original image:

conv2,3,4,5:

fc layer:

As the receptive field increases, the network appears to learn more complex local structure (each small patch becomes less similar to its neighbors) rather than global structure (a recognizable plant). Maybe local texture is good enough to classify the images?
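For completeness, a sketch of this kind of DeepDream-style optimization: gradient ascent on the response of one stage of the network. Here I hook torchvision's layer3 (conv4_x); the starting image, learning rate, and step count are assumptions for illustration.

```python
import torch
from torchvision import models

model = models.resnet18(pretrained=True).eval()
activations = {}
model.layer3.register_forward_hook(lambda m, i, o: activations.update(feat=o))

img = torch.rand(1, 3, 224, 224, requires_grad=True)    # start from noise (or a real image)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    model(img)
    loss = -activations["feat"].norm()                   # ascend the layer response
    loss.backward()
    optimizer.step()
```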


This week, I used Manifold Ranking and Euclidean distance to predict the labels of certain nodes and compared the results of these two methods.

The data come from Hong's work: a tensor of size 8131*64, i.e., image data embedded into 64 dimensions. I also have the ground truth for every node, stored as a dictionary that maps each node to its label. The data structure is shown below.

...continue reading "Comparison of Manifold Ranking and Euclidean Distance in Real Data"

Utilizing the scripts outlined in previous blog posts by Grady and me, we were able to create an optimization function to solve for the groove distance and orientation of the diffraction gratings on our holographic glitter sheet.

Using the data gathered from the plot pictured above, we were able to get the angles needed for our diffraction grating formula.

With the left half of the equation solved using the angles we got from our graph, and the wavelength solved using the method outlined in last week’s blog post about wavelengths, we were able to create an optimization function to solve for our missing information: the distance between the grooves and the orientation of the grooves.
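For reference, the standard first-order form of the grating equation we are working from is below; the sign convention may differ slightly from the one in our setup.

d\,(\sin\theta_m - \sin\theta_i) = m\lambda

where d is the groove spacing, θ_i and θ_m are the incident and diffracted angles, m is the diffraction order, and λ is the wavelength.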

However, there seem to be multiple combinations of orientation and groove distance that can produce the same angle combinations, so we need to use more information in our optimization function.

We decided to use all of the lines from a lit square on the monitor to one glitter board to see if acquiring more data to run our optimization on would provide more specific results.

However, there are still multiple combinations of groove distance and orientation that result in near-zero error values. To combat this, we are looking for more parameters for our optimization function that would add constraints to the answers we receive, such as a minimum or maximum value for the groove distance. We have begun working on a script that will look at all pixels and their angles to lit squares on the monitor, rather than just one pixel's. Hopefully this larger amount of data will produce more specific results from our optimization function.
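As a rough sketch of what a bounded fit like this could look like: the measurements, the simple first-order model, and the bounds below are placeholders for illustration, not our actual data or script.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder measurements: incident angles, observed diffraction angles, and wavelengths.
incident = np.radians([20.0, 25.0, 30.0])
observed = np.radians([36.0, 44.0, 52.0])
wavelengths = np.array([550e-9, 600e-9, 650e-9])      # metres

def grating_error(params):
    d = params[0]                                      # groove spacing in metres
    # First-order grating model: sin(theta_m) = sin(theta_i) + lambda / d
    s = np.clip(np.sin(incident) + wavelengths / d, -1.0, 1.0)
    return np.sum((np.arcsin(s) - observed) ** 2)

# The bounds constrain the groove spacing to a physically plausible range.
fit = minimize(grating_error, x0=[1.0e-6], bounds=[(0.5e-6, 5.0e-6)])
print(fit.x, fit.fun)
```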

In the past few days, I learned about SVM and IsoMap, and followed the link on Slack to read about CapsNet and using causal effects to explain classifiers.

Here is some intuition about CapsNet:

As far as I understand, CapsNet groups several neurons together so that the 'feature map' in CapsNet consists of vectors instead of scalars.

This design allows variation within a given representation in the feature map, so it encourages different views of the same object to be represented in the same capsule.

It also uses coupling coefficients to replace the max-pooling procedure of a traditional CNN (the routing from primary caps to digit caps corresponds to global pooling).

This design encourages CapsNet to explicitly encode part-whole relationships, so that lower-level features tend to be the spatial parts of higher-level features.
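To make the coupling-coefficient idea concrete, here is a small sketch of the squash nonlinearity and dynamic routing as I understand them from the paper; the tensor shapes and iteration count are my own convention.

```python
import torch

def squash(s, dim=-1):
    # CapsNet nonlinearity: keeps the vector's orientation, shrinks its norm into [0, 1).
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1 + norm_sq)) * s / torch.sqrt(norm_sq + 1e-8)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: [batch, in_caps, out_caps, out_dim] prediction vectors.
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)      # routing logits
    for _ in range(num_iters):
        c = torch.softmax(b, dim=2)                             # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)                # weighted sum over input caps
        v = squash(s)                                           # output capsules [batch, out_caps, out_dim]
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)            # agreement update
    return v
```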

The paper shows that CapsNet performs better than a traditional CNN at recognizing overlapping digits on the MNIST dataset.

Maybe CapsNet will perform better on datasets consisting of more complicated objects?

Last week I reproduced the paper Ranking on Data Manifold.

I created the two-moon-shaped data randomly and added some noise. The left plot has 50 random two-moon nodes, while the right one has 100 (the following plots correspond to these two cases).
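The toy data can be generated with something like the following; I use scikit-learn's make_moons here, and the noise level is my choice rather than necessarily the one used in the plots.

```python
from sklearn.datasets import make_moons

# 50-node and 100-node noisy two-moon sets, matching the left and right plots.
X_small, _ = make_moons(n_samples=50, noise=0.1, random_state=0)
X_large, _ = make_moons(n_samples=100, noise=0.1, random_state=0)
```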

...continue reading "Recurring Ranking on Data Manifold and Refine with MST"

Last week, I found a problem with UMAP: if the high-dimensional graph of an embedding representation is not connected, as with the NPair result on the CAR training dataset, the UMAP optimizer keeps pushing the clusters farther apart. This doesn't matter for visualization, but in TUMAP we need to measure the loss of each map.
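A quick way to check for this situation is to count the connected components of the high-dimensional fuzzy graph that UMAP builds; below is a sketch, with `features` standing in for the real NPair embeddings.

```python
import numpy as np
import umap
from scipy.sparse.csgraph import connected_components

features = np.random.rand(1000, 64).astype("float32")   # placeholder embeddings
reducer = umap.UMAP(n_neighbors=15).fit(features)
n_components, _ = connected_components(reducer.graph_, directed=False)
print("connected components in the high-dimensional graph:", n_components)
```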

So we tried some different ways to avoid or solve this problem.

First, we compute the KL divergence between the normal UMAP and TUMAP results instead of comparing their losses.

Second, we tried optimizing the repulsive gradient over the edges of the high-dimensional graph instead of over every pair of points, but the results of this method turned out weird.

Finally, I tried adding a zero vector to the high-dimensional vectors and making it equally very far from every point when constructing the high-dimensional graph. It didn't work.

Grady and I have been working on a script that takes in a large set of pictures of a sheet of holographic glitter, where a picture is taken at every location of a small white box moving along a monitor across from the glitter sheet. The graph we created shows the color of a single glitter pixel when the white box is at each location on the monitor. From this data, I combined the graph with the script I wrote a few weeks ago and created a graph that shows the same thing, but displaying the closest monochromatic wavelength to the RGB value shown and recording that value.

Any photo where the selected pixel was not lit was excluded from this process, as was any photo where the pixel was fully saturated or less than 15% saturated.

We plan to use these lines to compute the various angles we need, along with the wavelength, to determine the groove spacing of the diffraction grating on our specific holographic glitter sheet.

I spent a good amount of time in the last week planning and fine-tuning what would be best for the models (database tables) and how they need to be able to interact with each other. I also have been working on getting the project set up on the server and the web app moved over onto the server. 

Below is one of my diagrams of the keys (primary and foreign) that I will be using in the models. This took a couple of re-dos before I considered it efficient, but I learned some really important lessons about relations between tables and had to seriously think through how I will want to access and change these databases through the web app.
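As a rough illustration of the kind of relations the diagram describes, here is a hypothetical Django models sketch; the model names and fields are illustrative, not the actual rePhoto schema.

```python
from django.db import models

class Project(models.Model):
    name = models.CharField(max_length=200)

class Subject(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)   # foreign key to Project
    description = models.TextField(blank=True)

class Photo(models.Model):
    subject = models.ForeignKey(Subject, on_delete=models.CASCADE)   # each photo belongs to a subject
    image = models.FileField(upload_to="photos/")
    taken_at = models.DateTimeField(auto_now_add=True)
```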

Having never worked with a database before, let alone a framework like Django, I really have had to dedicate a great deal of time to understanding how different elements work with one another, and how to configure the whole set-up so it actually works. Not that I have completely mastered anything, or that I’ve done every Django and SQL tutorial out there (just... most of them), but I really am feeling more confident about using Django and creating a back-end for the web app than I ever anticipated.

A couple successes & updates in the last week:

Dr. Pless suggested an addition to the web app (for right now, as the database is not up yet) - storing the photo that was last taken on the app as the overlay for the next photo. I used the LocalStorage JavaScript API, which I had already used to display the photo after it was taken on the next page, to make this happen and while it will only show the last photo you took using the web app, it’s still a pretty cool thing until we can get the database up and running!

The photo on the left shows the overlay in action; the blurriness is due to my shaking hand while taking the screenshot, but it was really great to see a sneak peek of what the web app will eventually become. The photo on the right shows the page after the photo is taken, displaying the photo with the option to hold down to save it to your camera roll, along with sharing on social media and other options that I haven’t worked on yet.

I’ve been working on the plans for the flow as well as appearance of the site and have been drawing out my ideas, so there is a plan for the front-end! I’m currently more focused on the back-end, but when the time comes, I’m really excited to start working on giving Project rePhoto the eye-catching, modern look it deserves!

We’re still trying to come up with a Project rePhoto tagline- a couple of strong ones so far have been “chronicle change in your world” and “track what you care about”, and I’d love to hear more! There’s sticky notes by my desk if you ever think of an idea!

I would really appreciate any feedback that anyone has on what I’ve done so far; thanks for reading and happy Thursday!

This past week, I have been working on solving for the gaussian which fits the intensity plots of my glitter pieces, and using this to solve for the center of the receptive field of the glitter. Below is an example of an intensity plot of a centroid:


Below are the equations I am using to write the optimization function to solve for the receptive field. Here I am computing the integral of the response of a piece of glitter over every location on the monitor, where I(x,y) is a binary indicator of whether the monitor is lit at each location:
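The response is presumably of the form below (my reconstruction of the equation, writing the receptive field as a circular Gaussian centered at (x_0, y_0) with width σ):

R = \iint I(x, y)\, \exp\!\left( -\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2} \right) dx\, dy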

Right now, I am solving this using the circular Gaussian with a single sigma, and I spent a few days working through writing the optimization function. Yesterday I was able to write it in a way that speeds it up (each iteration looks at 200 frames and ~700 centroids, so it was very slow).
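A minimal sketch of this kind of fit is below; the grid size, frame count, and data are small placeholders, while the real script runs over 200 frames and ~700 centroids.

```python
import numpy as np
from scipy.optimize import least_squares

H, W, n_frames = 90, 160, 20                         # downsampled placeholder monitor grid
ys, xs = np.mgrid[0:H, 0:W]
rng = np.random.default_rng(0)
I_frames = rng.random((n_frames, H, W)) < 0.05       # placeholder binary lit/unlit frames
observed = rng.random(n_frames)                      # placeholder measured centroid brightness

def predicted(params):
    cx, cy, sigma = params
    gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    # Integral of the Gaussian receptive field over the lit region of each frame.
    return np.array([np.sum(frame * gauss) for frame in I_frames])

def residuals(params):
    pred = predicted(params)
    return pred / (pred.max() + 1e-12) - observed / observed.max()

fit = least_squares(residuals, x0=[W / 2, H / 2, 10.0])
print(fit.x)                                          # fitted center (cx, cy) and sigma
```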

My next step is to work on analyzing the final error that I am getting and looking at the optimal solution for the center of the receptive field and the sigma value (I am getting a solution, but have not spent too much time trying to understand its validity).