
Just a reminder of what we are doing: we are embedding images of a highway, where images within a small time window are mapped to a certain class.

Since our last post, we have mainly focused on creating a t-SNE visualization of our embedding.

Dr. Pless gave us an idea for our t-SNE: plot a line between all the points, where the line advances in temporal order.
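Here is a minimal sketch of how one might produce such a plot (this is illustrative, not our exact plotting code, and the file names are placeholders): run t-SNE on the embedding vectors, sort the 2-D points by timestamp, and draw one line through them in that order.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    embeddings = np.load("embeddings.npy")   # (N, D) embedding vectors (placeholder file)
    timestamps = np.load("timestamps.npy")   # (N,) capture times (placeholder file)

    order = np.argsort(timestamps)                                   # temporal order
    points = TSNE(n_components=2).fit_transform(embeddings)[order]   # 2-D t-SNE, sorted by time

    plt.scatter(points[:, 0], points[:, 1], s=5, c=timestamps[order], cmap="viridis")
    plt.plot(points[:, 0], points[:, 1], linewidth=0.5, alpha=0.5)   # the temporally-advancing line
    plt.show()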

First we created our t-SNE and observed that all the clusters were more or less in a line.

And we were like, "hey, wouldn't it be cool if these cluster-lines represented the timeline of the images within the class?" It looks like they actually do!

If these line-clusters represent the timeline of the images, we would expect our temporally-advancing line to enter a cluster at one end, pass through the points one by one in order, and exit out the other end.

This would be impressive to me, because the embedding is not only learning the general timeframe of the image (class), but also the specific time within that class, even though it has no knowledge of this time beforehand.

Here are the results on our training data (click on the image if you don't see the animation):

Here are the results on our validation data:

You can see this isn't quite as nice as our training data (of course). The line generally comes into one end and, most of the time, exits somewhere around the middle of the cluster.

(This was a model trained for only a couple of epochs, so we should get better results after we train it a bit more.)

 

For next week we plan to:

- Correct our training and test sets. Previously we mentioned taking every other time interval as part of our evaluation set; this is bad because our test set is too close to our training set (the end of one training time window is the start of one evaluation time window).

- Do some of the overfitting tests, etc. that we did for our last model

- Train more!


Last week, our confusion was: how were we going to use N-pairs loss without discrete classes?

With Dr. Pless and Abby, we figured out that we could create classes by grouping together a handful of images from a traffic camera that were all taken within a certain time interval (we decided to go with groups of 15 frames). We then skip the next 15 frames before creating the next class, so there isn't overlap between classes. We did this process on this video: https://www.youtube.com/watch?v=wqctLW0Hb_0

This gave us a total of 51,000 frames.
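A rough sketch of the grouping scheme described above (the 15-frame window and skip match what we used, but the code itself is just an illustration):

    GROUP_SIZE = 15

    def frames_to_classes(n_frames, group=GROUP_SIZE):
        """Map frame indices to classes, leaving a gap between consecutive classes."""
        classes = {}                       # class_id -> list of frame indices
        class_id, i = 0, 0
        while i + group <= n_frames:
            classes[class_id] = list(range(i, i + group))
            class_id += 1
            i += 2 * group                 # skip the next `group` frames before the next class
        return classes

    classes = frames_to_classes(51000)     # ~1,700 classes of 15 frames each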

We just threw these new classes into Hong's N-pairs model without changing anything about the model itself. It trained, and the loss decreased. We plan on doing the over-training check we talked about for our last model (train on a small number of classes and check that the loss goes to 0).

We got some results (we literally just started training, and it's taking quite a while, so these results are only from epoch #2):

(^ I feel like this is bad)

We will have to dig into this a bit more to see what these graphs actually mean; I figured I would include them because these are the graphs that Hong's code outputs.

We are also training on ResNet-18; we wish to try ResNet-50. Training is also really slow (40 min/epoch, so about 7 hours for 10 epochs). We are using ~25,000 images to train and ~25,000 images to validate (because we need to leave a gap between the classes, we can use the gaps as our validation data).

We plan to train the model more, and to run a t-SNE on the data to visualize our embedding.


We started this week attempting to fix our weird batch-size bug. We talked to Abby, and we determined that this is (probably) some weird Keras bug. Abby also recommended that we switch to PyTorch, to stay consistent within the lab and to avoid the bug.

Hong sent us his code for N-pairs loss, which we took a look at and started to modify to work with our dataset. However, it's not as easy as just swapping in our images. Hong's model works by saying "we have N classes with a bunch of samples in each; train so that class X is grouped together and is far away from the other N-1 classes." The problem for us is that each image by itself is not in any class; it's only in a class relative to some other image (near or far). We believe our options for the loss function are these:

  1. Change N-pairs loss to some sort of thresholded N-pairs loss. This would mean the negatives we push away from would be some fraction of the dataset that we determine to be far away in time (for now I'll say 3/4).

If these are the timestamps of the images we have, and we are on image [0]:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The loss would try to push [0] close to [1, 2, 3] (the images we defined to be close to it, time-wise), and far away from images [4, 5, 6, 7, 8, 9].

1a. We could make this continuous instead of discrete (which I think makes more sense), where the loss is proportional to the distance between the timestamps (a rough sketch of this idea appears below, after the list).

2. Implement triplet loss in pytorch.

(Please let us know if this doesn't make any sense/ we have a fundamental misunderstanding of what we are doing)
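To make options 1 and 1a more concrete, here is a very rough PyTorch sketch of a thresholded, contrastive-style version of the idea (this illustrates the concept only; it is not a modification of Hong's actual N-pairs code, and the margin and threshold values are arbitrary):

    import torch

    def temporal_threshold_loss(embeddings, timestamps, time_threshold, margin=0.2):
        """Pull together pairs that are close in time; push apart pairs that are far in time."""
        dists = torch.cdist(embeddings, embeddings)                     # pairwise embedding distances
        gaps = (timestamps[:, None] - timestamps[None, :]).abs().float()
        close = (gaps < time_threshold).float()                         # "near in time" mask
        far = 1.0 - close                                               # "far in time" mask
        close.fill_diagonal_(0)                                         # drop self-pairs
        pos_term = (close * dists).sum() / close.sum().clamp(min=1)     # near pairs: small distance
        neg_term = (far * torch.clamp(margin - dists, min=0)).sum() / far.sum().clamp(min=1)
        return pos_term + neg_term

The continuous variant (1a) would replace the hard near/far masks with weights proportional to the time gap.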

Since last week, we gathered 20,000 images. We visualized the triplets to ensure that our triplet-creation code was working correctly. We discovered that the last 5,000 images we were using were actually from a boat camera instead of the highway cam (we used a YouTube video, which must have autoplayed), so we had to cut the set down to 15,000 images. We visually verified that triplet creation was correct for the rest of the images.

We also realized that our code was extra slow because we were loading the original-resolution images and resizing all 15,000 of them every time we ran. We took some time to resize and save all of the images beforehand, so we don't have to waste time resizing on every run.
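For reference, the one-time preprocessing step looks roughly like this (the paths and target size are placeholders, not our actual directory layout):

    import os
    import cv2

    SRC, DST, SIZE = "frames_raw", "frames_resized", (224, 224)   # placeholder paths and size
    os.makedirs(DST, exist_ok=True)

    for name in os.listdir(SRC):
        img = cv2.imread(os.path.join(SRC, name))
        if img is None:                                            # skip unreadable files
            continue
        cv2.imwrite(os.path.join(DST, name), cv2.resize(img, SIZE))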

We also had a quick issue where cv2 wasn't importing. We have absolutely no idea why this happened. We just reinstalled cv2 in our virtual environment and it worked again.

We are getting some weird errors when training now, and we are a little confused as to why. For some reason, it appears that we need a batch size divisible by 8. This isn't so bad, because we can just choose a batch size that IS divisible by 8, but we aren't sure WHY. If we don't, we get an error that says: `tensorflow.python.framework.errors.InvalidArgumentError: Incompatible shapes: [8] vs. [<batch_size>]`. Has anyone seen this error before?

...some of them just look dynamic. Very convincingly so. They even have documentation on a "backend framework," etc.

I learned that this week. Twice over. The first frontend template, which I chose for its appearance and flexibility (in terms of frontend components), has zero documentation. Zero. So I threw that out the window, because I need some sort of backend foundation to start with.

After another long search, I finally found this template. Not only is it beautiful (and open source!) but it also has a fair amount of documentation on how to get it up and running and how to build in a "backend framework." The demo website even has features that appear to be dynamic. 4 hours and 5 AWS EC2 instances later, after repeatedly trying (in a containerized environment!) to re-route the dev version of the website hosted locally to my EC2's public DNS, I finally figured out that it isn't. Long story short, the dev part is dynamic (you run a local instance and the site updates automatically when you make changes), but the production process is not: you compile all the JavaScript/TypeScript into HTML/CSS/etc. and upload your static site to a server.

Now, after more searching, the template I'm using is this one, a hackathon starter template that includes both a configurable backend and a nice-looking (though less fancy) frontend. I've been able to install it on an EC2 instance and get it routed to the EC2's DNS, so it's definitely a step in the right direction.

My laundry list of development tasks for next week includes configuring the template backend to my liking (read: RESTful communication with the Flask server I built earlier) and building a functional button on the page where a user can enter a URL. Also, on a completely different note, writing an abstract about my project for GW Research Days, which I am contractually obligated to do.

This week I focused on designing the backend for the dynamic web app I'm developing. In my design, I tried to focus on the bare-bones functionality that I need to get the basic prototype working. I have never done dynamic web development before, so I spent a fair amount of time reading and looking into tutorials.

I picked a flexible, open-source frontend template that I really like, so I decided to start from there to choose the frameworks/languages I'll work with: Javascript and .NET. I read several introductions to both to get some background, then some introductions to dynamic web dev (links posted soon).

After a fair amount of thought, deliberation, and research into AWS's dynamic web development frameworks, here is the design I came up with for the dynamic backend:

explainED backend diagram

I also redesigned the schema for the user database to make it as simple as possible.

My tasks for the upcoming week are as follows:

  1. retrain the classifier and save the state_dict parameters (see the sketch after this list)
  2. learn about AWS Cognito and how it functions
  3. do some quick Javascript tutorials
  4. create a nodeJS Docker container for the template
  5. get the website template up and running locally
  6. make an HTML button and form for entering a URL for analysis
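For task 1, the save/restore pattern is minimal (the classifier object and file name below are placeholders):

    import torch

    # after retraining the (hypothetical) classifier:
    torch.save(classifier.state_dict(), "classifier_state.pt")

    # later, to restore it for inference:
    classifier.load_state_dict(torch.load("classifier_state.pt"))
    classifier.eval()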

 

We modified our code a bit to use ResNet without the softmax layer on the end. We then did our 'over-training check', training on a minimal number of triplets to ensure the loss went to 0:
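(As an aside, by 'ResNet without the softmax layer' we mean something like the following Keras sketch; this is illustrative rather than our exact code, and the 128-dimension projection is an arbitrary choice.)

    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras import layers, Model

    base = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # no softmax head
    embedding = layers.Dense(128)(base.output)     # optional projection down to a small embedding
    model = Model(inputs=base.input, outputs=embedding)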

We then tried a test with all of our data (~500 images), to see if our loss continued to decrease:

One thing we found interesting is that when we train with all of our data, the val_loss converges to the margin value we used in our triplet loss (in this example margin = 0.02, and in other examples where we set margin=0.2, val_loss -> 0.2).

We believe this could be troubling if you look at this equation for triplet loss (https://omoindrot.github.io/triplet-loss):

L = max(d(a, p) - d(a, n) + margin, 0)

It appears to me that the only way to get L = margin is if the distance from the anchor to the positive is exactly the same as the distance from the anchor to the negative (for example, if the network collapses and maps every image to the same point, then d(a, p) = d(a, n) = 0 and L = margin), whereas we want the positive to be closer to the anchor than the negative.

Dr. Pless recommended that we visualize our results to see if what we are training here actually means anything, and to use more data.

We set up our data-gathering script on lilou, which involved installing Firefox, Flash, and the Selenium drivers. The camera we were gathering from before happened to crash, so we spent some time finding a new camera. We are currently gathering ~20,000 images from a highway cam that we'll use for training.

After this we will visualize our results to see what the net is doing. However, we are a bit confused about how to do this. We believe that we could pass a triplet into the net and check the loss, but after that, how could we differentiate a false positive from a false negative? If we get a high loss, does this mean it is mapping the positive too far from the anchor, or the negative too close to the anchor? Do we care?

Is there some other element to 'visualization' other than simply looking at the test images ourselves and seeing what the loss is?

 

After the feedback we got last week we now have a solid understanding of the concept behind triplet loss, so we decided to go ahead and work on the implementation.

We ran into lots of questions about the way the data should be set up. We looked at Anastasija's implementation of triplet loss as an example and used a similar process, but with images as the data and ResNet as our model.

Our biggest concerns are making sure we are passing the data correctly and what the labels should be for the images. We grouped the images by anchor, positive, and negative, but other than that they don't have labels. We are considering using the time the image was taken as the label.

We have a theory that the labels we pass in to model.fit() don't matter (??). This is based on looking at Anastasija's triplet loss function, which takes parameters y_true and y_pred but only manipulates y_pred and doesn't touch y_true at all.

import tensorflow as tf

def triplet_loss(y_true, y_pred):
    # y_pred holds the concatenated [anchor | positive | negative] embeddings,
    # so each third of the vector is one member of the triplet.
    size = int(y_pred.shape[1]) // 3     # integer division so the slice indices are ints

    anchor = y_pred[:, 0:size]
    positive = y_pred[:, size:2 * size]
    negative = y_pred[:, 2 * size:3 * size]
    alpha = 0.2                          # margin

    # squared Euclidean distances from the anchor to the positive and to the negative
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)

    # hinge on (pos_dist - neg_dist + margin), averaged over the batch
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
    return loss

We are thinking that the loss function would be the one place where the labels (aka y_true) would matter. Thus, if the labels aren't used there, they can just be arbitrary.
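If that theory is right, the training call could look something like this sketch (the model and data variables here are hypothetical; the point is only that the targets passed to fit() are never read by the loss):

    import numpy as np

    # `model` outputs the concatenated [anchor | positive | negative] embeddings and
    # `triplet_batch` is the corresponding input batch (both hypothetical names).
    dummy_labels = np.zeros((len(triplet_batch), 1))       # arbitrary; triplet_loss ignores y_true
    model.compile(optimizer="adam", loss=triplet_loss)
    model.fit(triplet_batch, dummy_labels, batch_size=32, epochs=10)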

In Anastasija's model she adds an embedding layer, but since we are using ResNet and not our own model, we are not. We are assuming this will cause problems, but we aren't sure where we would add it. We are still a little confused about where the output of the embedding network is. Will the embedded vector simply be the output of the network, or do we have to grab the embedding from somewhere in the middle of the network? If the embedded vector is the net's output, why do we see an 'Embedding' layer at the beginning of the network Anastasija uses:

    model = Sequential()
    # Note: this 'Embedding' layer is a word-index -> vector lookup table for the text
    # input (initialized from word_embedding_matrix); it is a Keras layer type,
    # not the learned embedding that the triplet loss operates on.
    model.add(Embedding(words_len + 1,
                        embedding_dim,
                        weights=[word_embedding_matrix],
                        input_length=max_seq_length,
                        trainable=False,
                        name='embedding'))
    model.add(LSTM(512, dropout=0.2))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(out_dim, activation='sigmoid'))
    ...
    model.compile(optimizer=Adam(), loss=triplet_loss)
 

If the embedding vector that we want actually is in the middle of the network, then what is the net outputting?
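One way we think this could work in Keras, assuming the pretrained ResNet50 application (treat this as a sketch, not a definitive answer; 'avg_pool' is the name Keras gives its global-average-pooling layer): keep the full network, but build a second model whose output is that intermediate layer, and use it as the embedder.

    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras import Model

    base = ResNet50(weights="imagenet")                 # the full network, softmax on the end
    embedder = Model(inputs=base.input,
                     outputs=base.get_layer("avg_pool").output)   # 2048-d pooled features

    vectors = embedder.predict(images)                  # `images` is a hypothetical batch of frames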

We tried to fit our model but ran into an out-of-memory issue, which we believe we can solve.

This week we hope to clear up some of our misunderstandings and get our model to fit successfully.

I spent a lot of the past week hacking with containers, AWS, and database code, making things work.

After finally getting permission from AWS to spin up EC2 instances with GPUs, I've been able to test the Docker container I made for the classifier and its dependencies, including the Flask server for URL classification. To do this, I created a Dockerfile that starts from an Ubuntu 16 image (with CUDA 10) and installs all the necessary dependencies for running the classifier. In AWS, I spun up an EC2 instance with a GPU, where I ran an image of the container I made. The communication between that EC2 and the container works fine, as does the communication between the Flask server and the classifier, but I'm having an issue loading and using the model on the EC2's GPU.
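For reference, the loading pattern I'm trying to get working looks roughly like this (the model class, checkpoint path, and input batch are placeholders):

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = MyClassifier()                                            # hypothetical model class
    state = torch.load("classifier_state.pt", map_location=device)   # avoids CPU/GPU device mismatches
    model.load_state_dict(state)
    model.to(device).eval()

    with torch.no_grad():
        prediction = model(batch.to(device))                          # `batch` is a placeholder input tensor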

I also refactored the Flask application code itself, modifying the format of the HTTP calls and improving how the JSON objects are structured, along with cleaning up the classifier code base substantially. Another feature I added to the system is a super-simple DynamoDB database to keep track of all of the URLs each container receives, their predicted labels, and whether a human has labeled them. I used Python to write and test basic scripts to read, query, add to, and update the database.
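The database scripts boil down to a few boto3 calls; here is a simplified sketch (the table name and attribute names are invented for illustration, not the real schema):

    import boto3

    table = boto3.resource("dynamodb").Table("url_predictions")   # placeholder table name

    # record a URL, its predicted label, and whether a human has labeled it yet
    table.put_item(Item={"url": "http://example.com",
                         "predicted_label": "pro-ED",
                         "human_labeled": False})

    # read the record back
    item = table.get_item(Key={"url": "http://example.com"}).get("Item")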

Tasks for next week include successfully loading and running the classifier on AWS and forging ahead with developing the dynamic web server.

This week I worked on an odd combination of tasks aimed towards deploying EDanalysis: in-depth prototyping of the explainED web app I'm building and containerizing the classifier. 

So, first, let's discuss paper prototyping the explainED app. Disclaimer: I don't know too much about UI/UX design, I'm not an artist, and I haven't taken a human-computer interaction class.

Here is an (extremely rough) digital mockup I made of the app a while ago: explainED clinician tool mockup

In this iteration of my UX design, I focused on the key functionality I want the app to have: displaying pro-ED trends and statistics, a URL analyzer (backed by the classifier), pro-ED resources, and a patient profile tab. I sketched out different pages, interactions, and buttons to get a feel of what I need to design (process inspired by this Google video). I still have a ways to go (and a few more pages to sketch out) but at least I have a better idea of what I want to implement.

explainED prototype drawings

On a completely different note, I continued towards my long-term goal of getting EDanalysis on AWS.

I made a Docker container from a 64-bit Ubuntu image with pre-installed CUDA by installing all the classifier/system dependencies from source. The benefit of containerizing our PyTorch application is that we can parallelize the system by spinning up multiple instances of the container. Then, on AWS, we could use Amazon's Elastic Container Service and throw in a load balancer to manage requests to each container, and boom: high concurrency.

In the upcoming week, I plan to spend time implementing the end-to-end functionality of the system on AWS: sending data from the SEAS server to the S3 bucket, communicating with the PyTorch container, and starting to design the dynamic backend of the explainED web app.