
In this post, I am going to run through everything that I have tried thus far and show some pictures of the results each attempt has rendered. I will first discuss the results of the experiments involving a set of 10 vertically co-linear glitter centroids. Then, I will discuss the results of the experiments involving 5 pieces of glitter placed in a circle around the camera & light, such that the camera & light are vertically co-linear.

Calculation of Coefficients

In order to account for the surface normals having varying magnitudes (information that is necessary in determining which ellipse a piece of glitter lies on), I use the ratios of the surface normal's components and the ratios of the gradient's components (see previous post for derivation).

 

I construct the matrix A with one row per piece of glitter:

A = [-2*SN_y * x, SN_x * x - SN_y * y, 2*SN_x * y, -SN_y, SN_x],

derived by expanding the equality of ratios and putting the equation in matrix form. So we are solving the equation Ax = 0, where x = [a, b, c, d, e]. To solve this homogeneous equation, I simply find the null space of A and then use a linear combination of the null-space vectors to get my coefficients:
Z = null(A);                  % basis vectors spanning the null space of A
temp = ones(size(Z,2), 1);    % equal weights for each basis vector
C = Z * temp;                 % coefficient vector [a, b, c, d, e]

Using these coefficients, I can then calculate the value of f in the implicit equation by solving for it directly at each centroid.
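For concreteness, here is a small numpy sketch of that last step. It assumes the implicit equation has the standard conic form a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0, with C = [a, b, c, d, e] coming from the null-space computation above; each centroid then gets its own f, which picks out the ellipse it lies on.

import numpy as np

def solve_f(coeffs, centroids):
    """coeffs = [a, b, c, d, e]; centroids = (N, 2) array of glitter centroid positions.

    Assuming the implicit ellipse is written as
        a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0,
    each centroid gets its own f, selecting which concentric ellipse it lies on.
    """
    a, b, c, d, e = coeffs
    x, y = centroids[:, 0], centroids[:, 1]
    return -(a * x**2 + b * x * y + c * y**2 + d * x + e * y)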

1. Glitter is vertically co-linear

For the first simulation, I placed the camera and light to the right of the glitter line, and vertically co-linear to each other, as seen in the figure to the left. Here, the red vectors are the surface normals of the glitter and the blue vectors are the tangents, or the gradients, as calculated based on the surface normals.

  1. Using the first 5 pieces of glitter:
    1. Coefficients are as follows:
      • a = -0.0333
      • b = 0
      • c = 0
      • d = 0.9994
      • e = 0
    2. No plot - all terms with a y are zeroed out, so there is nothing to plot. Clearly not right...
  2. Using the last 5 pieces of glitter:
    1. Same results as above.

This leads me to believe there is something unsuitable about the glitter being co-linear like this.
For the second simulation, I placed the camera and the light to the right of the glitter line, but here they are not vertically co-linear with each other, as you can see in the figure to the left.

  1. Using the first 5 pieces of glitter:
    1. Coefficients are as follows:
      • a = 0.0333
      • b = 0
      • c = 0
      • d = -0.9994
      • e = 0
    2. No plot - all terms with a y are zeroed out, so there is nothing to plot. Clearly not right...
  2. Using the last 5 pieces of glitter:
    1. Same results as above.

If we move the camera and light to the other side of the glitter, there is no change. Still same results as above.

2. Glitter is not vertically co-linear

In this experiment, the glitter is "scattered" around the light and camera as seen in the figure to the left.


I had a slight victory here - I actually got concentric ellipses in this experiment when I moved one of the centroids so that it was not co-linear with any of the others:

In the process of writing this post and running through all my previous failures, I found something that works; so, I am going to leave this post here. I am now working through different scenarios of this experiment and trying to understand how the linearity of the centroids affects the results (there is definitely something telling in the linearity and the number of centroids that are co-linear with each other). I will try to have another post up in the near future with more insight into this!


This experiment aims to measure whether an embedding method has good generalization ability, using yoke t-SNE.

The basic idea of this experiment is to compare how tightly the same categories cluster in the training embedding versus the test embedding. We split the Stanford Cars dataset into dataset A (98 random categories) and dataset B (the remaining 98 categories). First, we train a ResNet-50 with N-pair loss on A and compute embedding points for the data in A. Second, we train a ResNet-50 with N-pair loss on dataset B and use that model to compute embedding points for the data in A. Finally, we compare the two embeddings with yoke t-SNE.
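A rough sketch of this protocol is below; the class split is runnable, while the training and embedding steps are summarized as comments, and the seed and variable names are purely illustrative.

import numpy as np

# Split the 196 Stanford Cars classes into two disjoint halves of 98.
rng = np.random.default_rng(0)
classes = rng.permutation(196)
cats_A, cats_B = classes[:98], classes[98:]

# 1. Train ResNet-50 with N-pair loss on the images of cats_A; embed dataset A.
# 2. Train ResNet-50 with N-pair loss on the images of cats_B; embed dataset A
#    with that model, so A is unseen (test) data for it.
# 3. Compare the two embeddings of dataset A with yoke t-SNE.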

 

The results are as follows:

The left figure shows the embedding of dataset A when it is used as training data, and the right figure shows the embedding of dataset A when it is used as test data. As we can see, the clusters in the left figure are tight while the clusters in the right figure are looser. Even so, the points in the right figure still form clusters, which suggests the generalization ability of N-pair loss is not bad.

As a next step, I want to try some embedding methods that are considered to have poor generalization ability, to validate whether yoke t-SNE is a good tool for measuring generalization ability.


Step-by-step description of the process:

  • Load the data in the following format

  • Create an ID for each hotel based on hotel name for training purposes
  • Remove hotels without names and keep hotels with at least 50 reviews
  • Load GloVe
  • Create sequence of embeddings and select maximum sentence length (100)

  • Select anchors, positives, and negatives for the triplet loss. The anchor and positive have to be reviews from the same hotel, whereas the negative has to be a review from a different hotel (see the sketch after this list)
  • ||f(A) - f(P)||^2 <= ||f(A) - f(N)||^2, where A = anchor, P = positive, N = negative
    (equivalently, d(A,P) <= d(A,N))
  • LOSS = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + α, 0)
  • COST = sum of the losses over the training set of triplets
  • Train a model with the triplet loss for 5 epochs
  • Plot training and validation loss

  • Potentially, training for more epochs could reduce the loss
  • However, based on the test results we can see that the triplet loss doesn't produce any valuable results. Especially since we are looking at just one location, it is easier for reviews to cluster based on the things they mention rather than on the hotels they describe
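As a reference for the loss above, here is a minimal numpy sketch; the margin value alpha = 0.2, the batch size, and the embedding width are illustrative placeholders, not the values used in the actual experiment.

import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0), averaged over a batch."""
    d_pos = np.sum((f_a - f_p) ** 2, axis=1)    # squared anchor-positive distance
    d_neg = np.sum((f_a - f_n) ** 2, axis=1)    # squared anchor-negative distance
    return np.maximum(d_pos - d_neg + alpha, 0).mean()

# Random vectors standing in for the GloVe-based review embeddings.
f_a, f_p, f_n = (np.random.randn(32, 100) for _ in range(3))
print(triplet_loss(f_a, f_p, f_n))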

Last week, we tried our yoke t-SNE method (adding an L2 distance term to the t-SNE loss function). This week, we try different scales of this L2 distance term to see its effect on t-SNE.

The loss function of yoke t-SNE is:

C = KL_1(embedding 1, t-SNE 1) + KL_2(embedding 2, t-SNE 2) + λ * ||t-SNE 1 - t-SNE 2||^2

In this measurement, as λ changes, we record the ratio between the KL distance in yoke t-SNE and the KL distance in the original t-SNE, and we record the L2 alignment distance.
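For reference, here is a minimal runnable sketch of this cost. It uses fixed-bandwidth Gaussian affinities instead of the usual perplexity calibration, so it is only an approximation of the real objective, which optimizes Y1 and Y2 jointly by gradient descent on C.

import numpy as np

def _normalize(A):
    np.fill_diagonal(A, 0.0)          # ignore self-affinities
    return A / A.sum()

def kl_term(X, Y, sigma=1.0):
    """Simplified t-SNE KL term: Gaussian affinities in high-D, Student-t in low-D."""
    dX = ((X[:, None] - X[None]) ** 2).sum(-1)
    dY = ((Y[:, None] - Y[None]) ** 2).sum(-1)
    P = _normalize(np.exp(-dX / (2 * sigma ** 2)))   # fixed sigma instead of a perplexity search
    Q = _normalize(1.0 / (1.0 + dY))
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def yoke_cost(X1, X2, Y1, Y2, lam):
    """C = KL1 + KL2 + lambda * ||Y1 - Y2||^2 (the alignment term)."""
    return kl_term(X1, Y1) + kl_term(X2, Y2) + lam * np.sum((Y1 - Y2) ** 2)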

The results are as follows:

The KL ratio for the first embedding.

The KL ratio for the second embedding.

The alignment error (the L2 distance).

 

As we can see in the figures above, as the weight of the L2 distance term increases, the ratio increases, which implies that the more heavily we 'yoke' the t-SNE, the less the distribution in the t-SNE plane resembles the distribution in the high-dimensional embedding space. And the decreasing alignment error shows that the two t-SNE plots align better as λ increases.

In image embedding tasks, we usually focus on the design of the loss and pay little attention to the output/embedding space, because the high-dimensional space is hard to imagine and visualize. So I found an old tool that can help us understand what happens in our high-dimensional embedding space: SVD and PCA.

SVD and PCA

SVD:

Given a matrix A of size m by n, we can write it in the form:

A = U E V^T

where A is an m by n matrix, U is an m by m orthogonal matrix, E is an m by n diagonal matrix holding the singular values, and V is an n by n orthogonal matrix.
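A quick numpy check of these shapes (note that numpy returns V already transposed):

import numpy as np

m, n = 6, 4
A = np.random.randn(m, n)
U, S, Vt = np.linalg.svd(A)           # U: (m, m), S: (min(m, n),), Vt: (n, n)
E = np.zeros((m, n))
E[:len(S), :len(S)] = np.diag(S)      # embed the singular values into an m-by-n matrix
assert np.allclose(A, U @ E @ Vt)     # A = U E V^T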

PCA

What PCA does differently is to pre-process the data by subtracting the mean.

In particular, V is the high-dimensional rotation matrix that maps the embedding data into the new coordinates, and E gives the variance along each new coordinate.
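A minimal numpy sketch of this, with random data standing in for the embedding matrix:

import numpy as np

# PCA as mean subtraction followed by SVD; V rotates the embeddings into
# coordinates sorted by variance (the largest-variance direction comes first).
X = np.random.randn(1000, 64)              # stand-in for N embedding vectors
Xc = X - X.mean(axis=0)                    # subtract the mean (the PCA step)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
variances = S ** 2 / (len(X) - 1)          # variance along each new coordinate
X_rot = Xc @ Vt.T                          # rotated embeddings; neighbor relations are unchanged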

Experiments

The feature vectors come from the car dataset (train set), trained with the standard N-pair loss and L2 normalization.

For the set of training points obtained after training, I apply PCA to the points and get the high-dimensional rotation matrix V.

Then I use V to transform the training points, which gives the new representation of the embedding feature vectors.

Effects of applying V to the embedding points:

  • It does not change the neighbor relationships
  • It 'sorts' the dimensions by their variance/singular value

Now let's go back and look at the new feature vectors. The first dimension of a feature vector corresponds to the projection of V with the largest variance/singular value, and the last dimension corresponds to the projection with the smallest variance/singular value.

I scatter the first and last dimension values of the training-set feature vectors and get the following plots. The x-axis is the class ID and the y-axis is each point's value in the given dimension.

The largest variance/singular value projection dimension

The smallest variance/singular value projection dimension

We can see that the smallest variance/singular-value projection, i.e. the last dimension of the feature vector, has a very narrow distribution of values clustered around zero.

When comparing a pair of such feature vectors, the last dimension contributes very little to the overall dot product (for example, 0.1 * 0.05 = 0.005 in the last dimension). So we can neglect this kind of useless dimension, since it behaves like a null space.

Same test with various embedding sizes

I change the embedding size to 64, 32, and 16, and then check the singular value distribution.

 

Then I remove the dimensions with small variance and run a Recall@1 test to explore the degradation of the recall performance, as sketched below.
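A sketch of that truncation + Recall@1 test, with random arrays standing in for the rotated car-dataset features and their class labels:

import numpy as np

# Hypothetical stand-ins: `rotated` plays the role of the PCA-rotated feature matrix
# from the step above, and `labels` holds the class id of each point.
rotated = np.random.randn(500, 64)
labels = np.random.randint(0, 98, size=500)

def recall_at_1(features, labels):
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T                  # cosine similarity after re-normalizing
    np.fill_diagonal(sims, -np.inf)         # ignore each point's match with itself
    nearest = sims.argmax(axis=1)
    return (labels[nearest] == labels).mean()

for k in (64, 32, 16):                      # keep only the k highest-variance dimensions
    print(k, recall_at_1(rotated[:, :k], labels))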

Lastly, I apply the above process to our chunks method.

A quick recap: our problem is that we want to identify cars in traffic cams according to 2 categories (1. Color, 2. Car type). Each of these has 8 possible classes (making for a total of 64 possible combination classes).

Our preliminary approach is to simply create 2 object detectors, 1 for each category.

We successfully trained these 2 neural nets using the same RetinaNet implementation that worked well last semester for our corrective weight net.

We used the ~1700 labels from the SFM to train and got some results; however, they are definitely not as great as we would have hoped. Here are some of our test images:

Color:

Type:

 

As you can see, it is sometimes right and sometimes wrong, but it also just misses many of the vehicles in the image (like in the 1st 'Color' image). In addition, the confidence is pretty low, even when it gets it correct.

 

Clearly, something is wrong. We're thinking that it's probably just a hard problem due to the nature of the data. For the color, it's understandable that it might not be able to get rarer/intermediate colors such as red or green, but some cars which were clearly white were getting a black label, or vice versa, with the same confidence scores as when it was actually correct. We're not sure why this would be the case for some.

 

For the next week, we will work on getting to the root of the issue, as well as trying to brainstorm more creative ways to tackle this problem.

The last couple of days, I have focused on formally writing up the derivation in 2D of the constraints on the glitter that I am using in defining ellipses. I believe there is something wrong/incomplete in how I am thinking about the magnitude of the surface normals when using them to calculate the gradient vector. The difference in magnitude of the surface normals for each piece of glitter definitely has a bearing on the size of the ellipse associated with that piece of glitter.

I have attached my write-up to this post. In the write-up, there is a derivation of the constraints as well as my initial attempt at motivating this problem. I think I need to tie the motivation into the overall camera calibration problem instead of just talking about how the glitter can define ellipses.

Glitter_and_Ellipses-1xaxzep

My immediate next steps include re-working the last part of the derivation, the part which involves the magnitude of the surface normals (the ratio). I am also going to try to find other approaches to this problem. I REALLY believe the surface normals of the lit glitter are enough to determine the set of ellipses, so perhaps this implicit-equation approach isn't the correct one! In the next day or so, I will put up a more comprehensive post on what results (including pretty/not-so-pretty pictures) I have achieved so far using the technique outlined in the write-up attached to this post.

My priority this week has been implementing the system architecture for my EDanalysis senior project/research on Amazon Web Services (AWS). First, I'll briefly introduce the project, then dive into what I've been up to this week with AWS.

For this project, we trained an instance of the ResNet convolutional neural network to recognize pro-eating disorder images, with the aim of developing software tools (called EDanalysis) to improve eating disorder treatment and patient health outcomes. For more information, check out this video I made describing the project's vision, featuring a sneak peek of some of the software we're building!

This week, we had a 70% Project Demo for GW's CS Senior Design class (see more about the Senior Design aspects of my project here!). My 70% demo goals involved setting up my project on AWS, which is a first for me. My rationale for choosing AWS as a cloud service provider was simple: our project's goal is to publicly deploy the EDanalysis tools; hence, whatever system we make needs room to grow. To my knowledge, AWS offers unparalleled design flexibility--especially for machine learning systems--at web scale (wow, buzzword alert). Disclaimer: my current AWS system is optimized for cost-efficiency (for Senior design purposes ;-)), but I plan to someday use an AWS ECS instance and other beefier architectures/features.

The EDanalysis system has 3 main parts: the R&D server, the website architecture/backend, and the frontend components, which I refer to as parts 1, 2, and 3 below.

A detailed view of the EDAnalysis system with a focus on its AWS components
EDanalysis AWS System

This week, I completed the following:

  • part 1: communication from the R&D server to the S3 bucket
  • part 2: communication from the R&D server to the S3 bucket triggers a lambda function that forwards the passed data to the EC2 instance
  • part 2: a modification of the classifier testing script to download a single image from an input URL, run it through the classifier, and return its classification
  • part 2: a proof-of-concept script for the PyTorch EC2 instance that creates a Flask server adhering to the REST API; the server receives an image URL in JSON format, passes it to the classifier, runs the classifier on that URL, and returns the classification (a minimal sketch appears after this list)
  • the AWS design and architecture above
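For the proof-of-concept server in part 2, here is a minimal sketch of the shape of that Flask endpoint; the route name, JSON field, and the classify_url stub are illustrative placeholders, not the actual EDanalysis code.

from flask import Flask, request, jsonify

app = Flask(__name__)

def classify_url(image_url):
    # Hypothetical stand-in for the modified testing script that downloads the
    # image at `image_url` and runs it through the ResNet classifier.
    return "placeholder-label", 0.0

@app.route("/classify", methods=["POST"])
def classify():
    # Expects JSON like {"url": "https://..."} and returns the classification.
    image_url = request.get_json()["url"]
    label, confidence = classify_url(image_url)
    return jsonify({"url": image_url, "label": label, "confidence": confidence})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)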

For the above, I found the Designing a RESTful API using Flask-RESTful and AWSeducate tutorials to be most useful.

My goals for next week are the following:

  • containerizing the classifier environment so it's easier to deal with all the requirements
  • instantiating the pytorch EC2 instance on AWS and getting the classifier and Flask server working there
  • instantiating the user database with DynamoDB (first, modifying my old MySQL schema)
  • cleaning up the Flask server code and accompanying classifier test code
  • experimenting with (outgoing) communication from GW SEAS servers to my AWS S3 bucket

Here's to learning more about ~the cloud~ and tinkering around with code!

I made a visualization of the leaf length/width pipeline for the 3D scanner data.

First, the raw data (part of it):

Then the cropping:

With connected components, we get 6000+ regions. Then, after the heuristic search:

Then come the leaf length and width for each region. The blue lines are the paths for leaf length, and the orange lines are the paths for leaf width. The green dots are key points on the leaf-length path used for the leaf width; these key points are calculated by splitting the weighted length path into 6 equal parts (a sketch of this step is below). A width of zero means no good width path was found.
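Here is a rough numpy sketch of that key-point step, assuming the weights are the Euclidean step lengths along the path (the pipeline's actual weighting may differ):

import numpy as np

def key_points(path, n_parts=6):
    """path: (N, D) ordered points along the leaf-length path; returns the interior key points."""
    steps = np.linalg.norm(np.diff(path, axis=0), axis=1)   # per-step lengths (the weights)
    cum = np.concatenate([[0.0], np.cumsum(steps)])          # cumulative path length
    targets = np.linspace(0.0, cum[-1], n_parts + 1)[1:-1]   # split points for 6 equal parts
    # Linearly interpolate each coordinate along the cumulative length.
    coords = [np.interp(targets, cum, path[:, d]) for d in range(path.shape[1])]
    return np.stack(coords, axis=1)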

For the leaf width paths that still end up on the same side, I'm going to put a stronger restriction on the cosine distance instead of relying only on a positive cross product: