After our less-than-stellar results last week, we talked to Dr. Pless and decided to pivot the goal of our project.

Our new model will instead be designed to take two images from the same traffic camera and output the time that has passed between the two frames (e.g., if one frame is taken at 12:30:34 and the next at 12:30:36, we should get an output of 2 seconds).
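As a tiny illustration of the label we'd train against, here's how we might compute it, assuming we can recover each frame's capture time from its filename or metadata (that part is an assumption):

```python
# Compute the elapsed-time label for a pair of frames from their capture times.
# Assumes we can read each frame's timestamp (e.g. from its filename).
from datetime import datetime

t1 = datetime.strptime("12:30:34", "%H:%M:%S")
t2 = datetime.strptime("12:30:36", "%H:%M:%S")
label_seconds = (t2 - t1).total_seconds()   # -> 2.0
```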

The reasoning behind this is that, in order to distinguish time differences, the network must learn to recognize the moving objects in the images (i.e., cars, trucks, etc.). This way, we force it to learn what vehicles look like and to keep track of their colors, sizes, etc., without having to label every single vehicle individually.

To accomplish this, we need to learn about embeddings, so that we can create two feature vectors that capture the content of each image and can be compared to gauge similarity. These can then be used to train the network to actually detect time differences.

What we know about embeddings

We know that an embedding maps an image to a vector in a high-dimensional space, where the vector represents different features of the image. Similar images map to nearby points, so we can use the distance between embeddings to gauge similarity.

We found some examples of embedding networks online to learn a bit about how they work. One example used ResNet50, pretrained on ImageNet. To get the embedding vector, it passes an image through this network up to the 'avg_pool' layer, and that layer's output is taken as the embedding.
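Here's a minimal sketch of what that example seems to boil down to, assuming the Keras version of ResNet50 (whose global-average-pooling layer is named 'avg_pool'); the preprocessing is just the standard ImageNet recipe:

```python
# Minimal sketch: cut ResNet50 at the 'avg_pool' layer and use that output
# as the embedding vector for an image.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

base = ResNet50(weights='imagenet')                 # pretrained classifier
embedder = Model(inputs=base.input,
                 outputs=base.get_layer('avg_pool').output)

def embed(path):
    """Return the 2048-d embedding for a single image file."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return embedder.predict(x)[0]                   # shape: (2048,)

# Similar images should land near each other in this space, so a distance
# (e.g. cosine or Euclidean) between two embeddings gauges similarity.
```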

We understand that, because this net is trained for image classification, it must learn some features of the image, so taking an intermediate layer's output should give us the 'high-dimensional space vector'.

What we don't understand is: what do we train our embedding network on? It seems there should be some initial task whose training gives the net weights that relate to the objects in the image. Our final task is to get the time difference between two images, but we don't believe we can train the network for this task from the start. If we did try that and succeeded in training a single net that takes two images as input, then we wouldn't need the embedding at all (though maybe we would still use it for visualization). So we believe we need some initial task that first teaches a network the features of our images in some way. Then we can use an intermediate layer of that network to extract the embedding, pass the resulting vector to some other network, and have that network's output be the time difference.
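To make the second half of that concrete, here's a rough sketch (in PyTorch) of the kind of two-stream network we have in mind; the class name, layer sizes, and head are placeholders, not a settled design:

```python
# Sketch of the idea: a shared embedding backbone processes each frame, and a
# small regression head maps the pair of embeddings to an elapsed time (seconds).
import torch
import torch.nn as nn
import torchvision.models as models

class TimeDiffNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(pretrained=True)   # newer torchvision: weights=...
        # Everything up to (and including) the average-pool layer is the embedder.
        self.embedder = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(
            nn.Linear(2 * 2048, 512), nn.ReLU(),
            nn.Linear(512, 1))                        # predicted time difference

    def forward(self, frame_a, frame_b):
        ea = self.embedder(frame_a).flatten(1)        # (N, 2048) embedding of frame A
        eb = self.embedder(frame_b).flatten(1)        # (N, 2048) embedding of frame B
        return self.head(torch.cat([ea, eb], dim=1)).squeeze(1)

# Training would regress the output against the known timestamp gap between the
# two frames, e.g. with nn.MSELoss().
```

Whether the backbone should be pretrained, frozen, or trained from scratch on some other initial task is exactly the open question above.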


We also gathered some images from a higher-framerate camera (we captured ~3 images/second). We needed these rather than the AMOS cameras because we need to detect smaller-scale time differences; 30 minutes between frames would be way too long, and any cars in the first image would be long gone.

 

A quick recap: our problem is that we want to identify cars in traffic cams according to two categories (1. color, 2. car type). Each of these has 8 possible classes (making for a total of 64 possible combined classes).

Our preliminary approach is to simply create two object detectors, one for each category.

We successfully trained these two neural nets using the same RetinaNet implementation that worked well last semester for our corrective weight net.

We used the ~1700 labels from the SFM to train and got some results; however, they were definitely not as great as we would have hoped. Here are some of our test images:

Color:

Type:

 

As you can see, it is sometimes right and sometimes wrong, but it also simply misses many of the vehicles in the image (like in the first 'Color' image). In addition, the confidence is pretty low even when it gets the label correct.

 

Clearly, something is wrong. We're thinking that it's probably just a hard problem due to the nature of the data. For color, it's understandable that the net might miss rarer or intermediate colors such as red or green, but some cars that were clearly white were getting a black label, or vice versa, with the same confidence scores as when it was actually correct. We're not sure why this happens in some cases.

 

Over the next week, we will work on getting to the root of the issue and brainstorming more creative ways to tackle this problem.

My priority this week has been implementing the system architecture for my EDanalysis senior project/research on Amazon Web Services (AWS). First, I'll briefly introduce the project, then dive into what I've been up to this week with AWS.

For this project, we trained an instance of the ResNet convolutional neural network to recognize pro-eating disorder images, with the aim of developing software tools (called EDanalysis) to improve eating disorder treatment and patient health outcomes. For more information, check out this video I made describing the project's vision, featuring a sneak peek of some of the software we're building!

This week, we had a 70% Project Demo for GW's CS Senior Design class (see more about the Senior Design aspects of my project here!). My 70% demo goals involved setting up my project on AWS, which is a first for me. My rationale for choosing AWS as a cloud service provider was simple: our project's goal is to publicly deploy the EDanalysis tools; hence, whatever system we make needs room to grow. To my knowledge, AWS offers unparalleled design flexibility--especially for machine learning systems--at web scale (wow, buzzword alert). Disclaimer: my current AWS system is optimized for cost-efficiency (for Senior design purposes ;-)), but I plan to someday use an AWS ECS instance and other beefier architectures/features.

The EDanalysis system has 3 main parts: the R&D server, the website architecture/backend, and the frontend components, which I refer to as parts 1, 2, and 3 below.

EDanalysis AWS System: a detailed view of the EDanalysis system with a focus on its AWS components

This week, I completed the following:

  • part 1: communication from the R&D server to the S3 bucket
  • part 2: communication from the R&D server to the S3 bucket triggers a Lambda function that forwards the passed data to the EC2 instance (a rough sketch of the Lambda handler follows this list)
  • part 2: a modification of the classifier testing script to download a single image from an input URL, run it through the classifier, and return its classification
  • part 2: a proof-of-concept script for the PyTorch EC2 instance that creates a Flask server adhering to a REST API: it receives an image URL in JSON format, passes the URL to the classifier, runs the classifier on that image, and passes the classification back to the server (see the stripped-down sketch below)
  • the AWS design and architecture above
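Here's a hypothetical sketch of that Lambda function; the EC2 endpoint, URL format, and JSON field name are placeholders for my actual setup:

```python
# S3-triggered Lambda: pull the uploaded object's key from the event and forward
# a URL for it to the Flask server running on the EC2 instance.
import json
import urllib.request

EC2_ENDPOINT = "http://<ec2-public-dns>:5000/classify"   # placeholder

def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    image_url = "https://{}.s3.amazonaws.com/{}".format(bucket, key)

    payload = json.dumps({"url": image_url}).encode("utf-8")
    req = urllib.request.Request(EC2_ENDPOINT, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```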

For the above, I found the Designing a RESTful API using Flask-RESTful and AWS Educate tutorials to be most useful.
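And here's a stripped-down version of the Flask proof of concept, using Flask-RESTful as in that tutorial; run_classifier stands in for the real PyTorch inference code, and the route and field names are placeholders:

```python
# Proof-of-concept REST endpoint: accept a JSON body with an image URL, fetch
# the image, run the classifier, and return the result as JSON.
import io

import requests
from PIL import Image
from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

def run_classifier(img):
    """Placeholder for the actual PyTorch ResNet inference."""
    return {"label": "placeholder", "confidence": 0.0}

class Classify(Resource):
    def post(self):
        url = request.get_json(force=True)["url"]
        img = Image.open(io.BytesIO(requests.get(url).content)).convert("RGB")
        return run_classifier(img)    # Flask-RESTful serializes the dict to JSON

api.add_resource(Classify, "/classify")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```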

My goals for next week are the following:

  • containerizing the classifier environment so it's easier to deal with all the requirements
  • instantiating the PyTorch EC2 instance on AWS and getting the classifier and Flask server working there
  • instantiating the user database with DynamoDB (first, modifying my old MySQL schema)
  • cleaning up the Flask server code and accompanying classifier test code
  • experimenting with (outgoing) communication from GW SEAS servers to my AWS S3 bucket

Here's to learning more about ~the cloud~ and tinkering around with code!