
For the last few years, I've been doing development by writing code, deploying it to the server, running stuff on the server without any GUI, and then downloading anything that I want to visualize. This has been a huge pain! We work with images, and having that many steps between debugging code and visualizing things is stupidly inefficient, but sorting out a better way of doing things just hadn't boiled up to the top of my priority list. But I've been crazy jealous of the awesome things Hong has been able to do with Jupyter notebooks that run on the server, but which he can operate through the browser on his local machine. So I asked him for a bit of help getting up and running so that I could work on lilou from my laptop, and it turns out it's crazy easy!

I figured I would detail the (short) set of steps in case other folks would benefit from this -- although maybe all you cool kids already know about running IPython and think I'm nuts for having been working entirely without a GUI for the last few years... 🙂

On the server:

  1. Install anaconda:

    https://www.anaconda.com/distribution/#download-section

    Get the link for the appropriate Anaconda version.

    On the server, run: wget [link to the download]

    Then run: sh downloaded_file.sh

  2. Install jupyter notebook:

    pip install --user jupyter

    Note: I first had to run pip install --user ipython (there was a Python version conflict that I had to resolve before I could install jupyter)

  3. Generate a jupyter notebook config file: jupyter notebook --generate-config
  4. In the python terminal, run:

    from notebook.auth import passwd

    passwd()

    This will prompt you to enter a password for the notebook and then output the sha1-hashed version of the password. Copy this down somewhere (a minimal sketch of this step is shown just after this list).

  5. Edit the config file (~/.jupyter/jupyter_notebook_config.py):

    Paste the sha1-hashed password into the line that sets c.NotebookApp.password (around line 276 of the generated file):

    c.NotebookApp.password = u'sha1:xxxxxxxxxxxxxx'

  6. Run “jupyter notebook” to start serving the notebook
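For reference, here's a minimal sketch of the password-hashing step (steps 4 and 5 above), assuming the classic notebook package; on newer Jupyter versions the passwd helper may live elsewhere:

    # Minimal sketch of steps 4-5, assuming the classic Jupyter Notebook package.
    from notebook.auth import passwd

    hashed = passwd()   # prompts for (and verifies) the notebook password
    print(hashed)       # something like 'sha1:...'; paste into c.NotebookApp.password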

Then to access this notebook locally:

  1. Open up the ssh tunnel: ssh -L 8000:localhost:8888 username@lilou.seas.gwu.edu
  2. In your local browser, go to localhost:8000
  3. Enter the password you created for your notebook on the server in step 4 above
  4. Create IPython notebooks and start running stuff! (A toy example of a first cell is sketched below.)
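And here is the kind of toy first cell I had in mind, just to confirm that server-side images render in the local browser (the path is a placeholder, not a real file):

    # Hypothetical first notebook cell: display an image that lives on the server.
    %matplotlib inline
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    img = mpimg.imread('/path/to/some_image_on_the_server.png')  # placeholder path
    plt.imshow(img)
    plt.axis('off')
    plt.show()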

Quick demo of what this looks like:

I spent a lot of the past week hacking with containers, AWS, and database code, making things work.

After finally getting permission from AWS to spin up EC2 instances with GPUs, I've been able to test the Docker container I made for the classifier and its dependencies, including the Flask server for URL classification. To do this, I created a Dockerfile that starts from an Ubuntu 16 image (with CUDA 10) and installs all the necessary dependencies for running the classifier. In AWS, I spun up an EC2 instance with a GPU and ran the container image there. The communication between that EC2 instance and the container works fine, as does the communication between the Flask server and the classifier, but I'm having an issue loading and using the model on the EC2 instance's GPU.
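To narrow the problem down, my first check is whether PyTorch inside the container can see the GPU at all. Here's a minimal sanity-check sketch (the endpoint and names are illustrative, not the actual service code):

    # Minimal sanity check, not the actual service code: can PyTorch in the
    # container see the GPU at all?
    import torch
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/health')
    def health():
        # If cuda_available is False inside the container, the problem is likely
        # the Docker runtime / driver setup (e.g. missing --gpus all or the
        # nvidia runtime), not the classifier code itself.
        return jsonify({
            'cuda_available': torch.cuda.is_available(),
            'device_count': torch.cuda.device_count(),
        })

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)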

I also refactored the Flask application code itself, modifying the format of the HTTP calls and improving how the JSON objects are structured, along with cleaning up the classifier code base substantially. Another feature I added to the system is a super simple DynamoDB database to keep track of all of the URLs each container receives, their predicted labels, and whether a human has labeled them. I used Python to write and test basic scripts to read, query, add to, and update the database.
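For a rough idea of what those scripts look like (the table and attribute names here are illustrative, not necessarily the real schema), boto3 keeps the basic operations short:

    # Sketch of the basic DynamoDB operations, assuming a table keyed by 'url'.
    # Table/attribute names are illustrative, not necessarily the real schema.
    import boto3

    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
    table = dynamodb.Table('url_predictions')

    # Add a record for a URL a container just classified.
    table.put_item(Item={
        'url': 'http://example.com/some-page',
        'predicted_label': 'pro-ED',
        'human_labeled': False,
    })

    # Read it back.
    item = table.get_item(Key={'url': 'http://example.com/some-page'}).get('Item')

    # Update it once a human has reviewed the prediction.
    table.update_item(
        Key={'url': 'http://example.com/some-page'},
        UpdateExpression='SET human_labeled = :h',
        ExpressionAttributeValues={':h': True},
    )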

Tasks for next week include successfully loading and running the classifier on AWS and forging ahead with developing the dynamic web server.

The last experiment used one particular lambda for yoke-tsne; in this experiment, we pick several lambdas to see how lambda affects yoke-tsne.

As in the last experiment, we split the Stanford Cars dataset into dataset A (a random 98 categories) and dataset B (the remaining 98 categories). First, we train a ResNet-50 with N-pair loss on A and get embedding points for the data in A. Second, we train a ResNet-50 with N-pair loss on dataset B and use this trained model to embed the data in dataset A. Finally, we compare the two embeddings with yoke-tsne.
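As a reference for what the N-pair objective looks like, here is a minimal sketch in its softmax form (my own sketch, not necessarily the exact implementation used to train these models):

    # Softmax form of the N-pair loss: one anchor and one positive per class in
    # the batch; every other positive acts as a negative for that anchor.
    import torch
    import torch.nn.functional as F

    def n_pair_loss(anchors, positives):
        # anchors, positives: (N, D) embeddings, row i of each from the same class
        logits = anchors @ positives.t()                  # (N, N) similarity matrix
        targets = torch.arange(anchors.size(0), device=anchors.device)
        return F.cross_entropy(logits, targets)           # pull matching pairs together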

In this measurement, as lambda changes, we record the ratio between the KL divergence in yoke t-sne and the KL divergence in the original t-sne, and we record the L2 alignment distance.

The results are as follows:

We can see the KL divergence ratio increases sharply when lambda reaches 1e-9. The yoke t-sne result figures show that when lambda is 1e-8, the two results look almost perfectly aligned.

When lambda is lowered to 1e-11, the yoking seems to have no effect.

Lambda values of 1e-9 and 1e-10 work well: the training t-sne picks up some of the local (between-cluster) relations of the testing one while keeping the within-cluster relations:

Here is the original t-sne:

Here is 1e-9:

Here is 1e-10:

 

With lambda at 1e-9 and 1e-10, as we expected, the location of each cluster is nearly the same, and the looseness of each cluster is similar to the original t-sne.

 

As for what a reasonable lambda is, I think it depends on the number of points and on the KL divergence between the two embedding spaces.

This week I worked on an odd combination of tasks aimed towards deploying EDanalysis: in-depth prototyping of the explainED web app I'm building and containerizing the classifier. 

So, first, let's discuss paper prototyping the explainED app. Disclaimer: I don't know too much about UI/UX design, I'm not an artist, and I haven't taken a human-computer interaction class.

Here is an (extremely rough) digital mockup I made of the app a while ago: explainED clinician tool mockup

In this iteration of my UX design, I focused on the key functionality I want the app to have: displaying pro-ED trends and statistics, a URL analyzer (backed by the classifier), pro-ED resources, and a patient profile tab. I sketched out different pages, interactions, and buttons to get a feel of what I need to design (process inspired by this Google video). I still have a ways to go (and a few more pages to sketch out) but at least I have a better idea of what I want to implement.

explainED prototype drawings

On a completely different note, I continued towards my long-term goal of getting EDanalysis on AWS.

I made a Docker container from a 64-bit Ubuntu image with CUDA pre-installed, installing all the classifier/system dependencies from source. The benefit of containerizing our pytorch application is that we can parallelize the system by spinning up multiple instances of the container. Then, on AWS, we could use Amazon's Elastic Container Service and throw in a load balancer to manage requests to each container, and boom: high concurrency.

In the upcoming week, I plan to spend time implementing the end-to-end functionality of the system on AWS: sending data from the SEAS server to the S3 bucket, communicating with the pytorch container, and starting to design the dynamic backend of the explainED webapp.

I think Robert and I will need to discuss what can be posted publicly due to the nature of this research, so this post reflects only part of the information shared in the GCA meeting held this week.

Phase 2 kickoff will be scheduled for March 12th or 13th. At that point, meetings will become bi-weekly rather than monthly. The proposed schedule is every other Monday at the same time as the monthly tag-ups.

Phase 1 has been pulling data from Digital Globe and Descartes Labs. These data sets include data covering typhoon- and hurricane-affected areas. Some of these data sets contain significant cloud cover.

There is consideration of using radargrammetry via Radar Sat data as a low resolution filter so that only optical data from areas of interest will need to be processed.

 


After our less than stellar results last week, we talked to Dr. Pless, and decided to pivot the goal of our project.

Instead, our new model will be designed to take 2 images from the same traffic camera and output the time that has passed between the 2 frames (e.g., if one is taken at 12:30:34 and the next at 12:30:36, we should get an output of 2 seconds).

The reasoning behind this is that, in order to distinguish time differences, the network must learn to recognize the moving objects in the images (i.e., cars, trucks, etc.). This way, we are forcing it to learn what vehicles look like, keep track of colors, keep track of vehicle sizes, etc., without having to label every single one individually.

In order to accomplish this, we need to learn about embeddings, so that we can create 2 feature vectors whose distance represents the similarity between images. These can then be used to train the network to actually detect time differences.
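A rough sketch of the architecture we have in mind is below: a shared backbone embeds each frame, and a small head regresses the elapsed time from the pair of embeddings (this is just a starting-point guess with placeholder layer sizes, not something we've trained yet).

    # Rough sketch of the idea: a shared backbone embeds each frame, and a small
    # head predicts the elapsed time between the two frames. Layer sizes are
    # placeholders, not a tested design.
    import torch
    import torch.nn as nn
    from torchvision import models

    class TimeDiffNet(nn.Module):
        def __init__(self, embed_dim=128):
            super().__init__()
            backbone = models.resnet50(pretrained=True)
            backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
            self.backbone = backbone
            self.head = nn.Sequential(
                nn.Linear(2 * embed_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),            # predicted seconds between the frames
            )

        def forward(self, img_a, img_b):
            ea = self.backbone(img_a)        # embedding of the first frame
            eb = self.backbone(img_b)        # embedding of the second frame
            return self.head(torch.cat([ea, eb], dim=1))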

What we know about embedding

We know that an embedding maps an image to a vector in a high-dimensional space, which represents different features of the image. Similar images map to nearby points, so we can use this to gauge similarity.

We found some examples of embedding networks online to learn a bit about how they work. One example used ResNet50 pretrained on ImageNet. To get the embedding vector, it passes an image through the network up to the 'avg_pool' layer, and that layer's output is taken as the embedding.

We understand that, because this net is trained on image classification, it must learn features of the images, and taking an intermediate layer's output should give us the 'high-dimensional space vector'.
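Concretely, the example we found does something along these lines (my reconstruction, assuming Keras, where pooling='avg' gives the avg_pool output directly):

    # Reconstruction of the example we found (assuming Keras): use ResNet50
    # pretrained on ImageNet and take the global-average-pooled features as the
    # embedding vector.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

    img = image.load_img('frame.jpg', target_size=(224, 224))   # placeholder file
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    embedding = model.predict(x)[0]   # 2048-dimensional embedding vector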

What we don't understand is: what do we train our embedding network on? It seems there needs to be some initial task that gives the network weights that relate to the objects in the image. Our final task is to get the time difference between 2 images, but I don't believe we can train our network initially on this task. If we did try this, and were successful in training a net that just takes 2 images as input, then we wouldn't need the embedding (maybe we would still use it for visualization?). But we believe we need some initial task to train a network on our images that makes it learn the features in some way first. Then, we can use some intermediate layer of this network to extract the embedding, which could then be passed to some other network that takes the vector as input and outputs the time difference.


We also gathered some images from a higher-framerate camera (we gathered at ~3 images/second). We needed these instead of AMOS cameras because we need to detect smaller-scale time differences; 30 minutes would be way too long, and any cars in the image would be long gone.

 


This experiment aims to measure whether an embedding method has good generalization ability, using yoke-tsne.

The basic idea of this experiment is to compare how well the same categories cluster in the training embedding and in the test embedding. We split the Stanford Cars dataset into dataset A (a random 98 categories) and dataset B (the remaining 98 categories). First, we train a ResNet-50 with N-pair loss on A and get embedding points for the data in A. Second, we train a ResNet-50 with N-pair loss on dataset B and use this trained model to embed the data in dataset A. Finally, we compare the two embeddings with yoke-tsne.

 

The results are as follows:

The left figure is the embedding of dataset A as training data, and the right figure is the embedding of dataset A as testing data. As we can see, the clusters in the left figure are tight while the clusters on the right are looser. Even so, the points on the right are still clustered into groups, which means the generalization ability of N-pair loss is not bad.

As a next step, I want to try some embedding methods that are considered to have poor generalization ability, to validate whether yoke-tsne is a good tool for measuring generalization ability.


Step-by-step description of the process:

  • Load the data in the following format

  • Create an ID for each hotel based on hotel name for training purposes
  • Remove hotels without names and keep hotels with at least 50 reviews
  • Load GloVe
  • Create sequences of embeddings and select a maximum sentence length (100)

  • Select anchors, positives, and negatives for the triplet loss. The anchor and positive have to be reviews from the same hotel, whereas the negative has to be a review from a different hotel
  • We want ||f(A) - f(P)||^2 <= ||f(A) - f(N)||^2, where A = anchor, P = positive, N = negative; equivalently, d(A,P) <= d(A,N)
  • LOSS = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + α, 0)
  • COST = sum of the losses over the training set of triplets
  • Train a model with the triplet loss for 5 epochs (a minimal sketch of the loss is given after this list)
  • Plot training and validation loss

  • Training for more epochs could potentially reduce the loss further
  • However, based on the test results, we can see that the triplet loss doesn't produce useful results. Especially since we are looking at just one location, it is easier for reviews to cluster based on the things they mention rather than on the hotels they come from
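Here is the minimal sketch of the triplet loss referred to above (written in PyTorch purely for illustration; the framework and names are not the actual training code):

    # Minimal sketch of the triplet loss described above; framework and names
    # are illustrative, not the actual training code.
    import torch

    def triplet_loss(f_a, f_p, f_n, alpha=0.2):
        # f_a, f_p, f_n: (N, D) embeddings of anchor, positive, negative reviews
        d_ap = torch.sum((f_a - f_p) ** 2, dim=1)   # squared distance anchor-positive
        d_an = torch.sum((f_a - f_n) ** 2, dim=1)   # squared distance anchor-negative
        return torch.clamp(d_ap - d_an + alpha, min=0).mean()  # LOSS, averaged as the COST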

Last week, we tried our yoke t-sne method (adding an L2 distance term to the t-sne loss function). This week, we try different scales of this L2 distance term to see its effect on the t-sne.

The loss function of the yoked t-sne is:

C = KL divergence 1 (embedding 1 vs. t-sne 1) + KL divergence 2 (embedding 2 vs. t-sne 2) + λ * ||t-sne 1 − t-sne 2||^2
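In simplified code, the combined objective looks roughly like this (a sketch only, not our actual implementation): P1 and P2 are the precomputed high-dimensional affinity matrices for the two embeddings, and Y1 and Y2 are the two 2-D layouts of the same points in the same order.

    # Sketch of the yoked t-sne objective (not our actual implementation).
    import numpy as np

    def tsne_kl(P, Y, eps=1e-12):
        # Student-t low-dimensional affinities, as in standard t-sne
        dist2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        inv = 1.0 / (1.0 + dist2)
        np.fill_diagonal(inv, 0.0)
        Q = inv / inv.sum()
        return np.sum(P * np.log((P + eps) / (Q + eps)))

    def yoked_loss(P1, Y1, P2, Y2, lam):
        kl1 = tsne_kl(P1, Y1)            # KL divergence 1
        kl2 = tsne_kl(P2, Y2)            # KL divergence 2
        align = np.sum((Y1 - Y2) ** 2)   # L2 alignment term
        return kl1 + kl2 + lam * align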

In this measurement, as lambda changes, we record the ratio between the KL divergence in yoke t-sne and the KL divergence in the original t-sne, and we record the L2 alignment distance.

The results are as follows:

This is the ratio (yoked KL / original KL) for the first embedding.

This is the ratio (yoked KL / original KL) for the second embedding.

This is the alignment error (the L2 distance).

 

As we can see in the above figures, as the weight of the L2 distance term increases, the ratio increases, which implies that the more heavily we 'yoke' the t-sne, the less the distribution in the t-sne plane resembles the distribution in the high-dimensional embedding space. And the decreasing alignment error shows that the two t-snes align more closely as lambda increases.