This lab is trying an experiment --- a distributed approach to exploring the following idea:

"Given many images of one scene, predicting the time an image was taken is very useful. Because you have to have learned a lot about the scene to do that well, the network must learn good representations to tell time, and those are likely to be useful for a wide variety of other tasks."

So far we've made some progress (which you can partially follow in the #theworldisfullofclocks Slack channel), with a start on framing the problem: given many images of a scene, how can you tell what time it is?

This Google doc already lays out reasonable approaches to this problem. Here I want to share some visualizations that I want to make as we try to debug these approaches --- visualizations of the data, and of results on the data.

  • An annual summary montage, with rows organized as "day of the year" and columns organized as "time of day" (maybe subselecting days and times to keep the montage feasible)
  • A daily summary montage with *all* the images from one day of a camera shown in a grid.
  • An "average day" video/gif that shows the average 7:00am image (averaged over all days of the year), the average 7:10am image, etc. (a minimal sketch of this one follows the list)
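Here is a minimal sketch of the "average day" gif, assuming (hypothetically) that the frames are stored as ./frames/YYYYMMDD_HHMM.jpg and are all the same size; the directory layout and naming are mine, not an existing convention:

```python
# Sketch: build an "average day" sequence by averaging all images that fall in
# the same 10-minute time-of-day bin, across all days of the year.
import glob
import os
from collections import defaultdict
from datetime import datetime

import numpy as np
from PIL import Image

bins = defaultdict(list)                       # (hour, 10-minute slot) -> image paths
for path in glob.glob("frames/*.jpg"):
    stamp = datetime.strptime(os.path.basename(path)[:13], "%Y%m%d_%H%M")
    bins[(stamp.hour, stamp.minute // 10)].append(path)

frames = []
for (hour, slot), paths in sorted(bins.items()):
    stack = np.stack([np.asarray(Image.open(p), dtype=np.float32) for p in paths])
    frames.append(Image.fromarray(stack.mean(axis=0).astype(np.uint8)))

# One gif frame per 10-minute slot, each averaged over the whole year.
frames[0].save("average_day.gif", save_all=True, append_images=frames[1:], duration=200)
```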

Kudos to everyone who has started to work on this; I think we have some good ideas of directions to go!

Hi everyone,

Here is a list of 20 fantastic blogs about current machine learning research!

David Ahmadov: https://blogs.gwu.edu/ahmedavid/
Farida Aliyeva: https://blogs.gwu.edu/ffaliyeva2022
Leyla Aliyeva: https://blogs.gwu.edu/leylaaliyeva
Ibrahim Alizada: https://blogs.gwu.edu/ibrahim_alizada
Mustafa Aslanov: https://blogs.gwu.edu/aslanovmustafa
Aydin Bagiyev: https://blogs.gwu.edu/abagiyev
Aygul Bayramova: https://blogs.gwu.edu/abayramova99/
Samir Dadash-zada: https://blogs.gwu.edu/samirdadashzada
Habil Gadirli: https://blogs.gwu.edu/hgadirli
Farid Jafarov: https://blogs.gwu.edu/fjafarov
Narmin Jamalova: https://blogs.gwu.edu/njamalova54
Steve Kaisler: https://blogs.gwu.edu/skaisler/
Ilyas Karimov: https://blogs.gwu.edu/ilyaskarimov
Kheybar Mammadnaghiyev: https://blogs.gwu.edu/mammadnaghiyevk
Fidan Musazade: https://blogs.gwu.edu/fmusazade
Natavan: https://blogs.gwu.edu/ntakhundova/
Aykhan Nazimzade: https://blogs.gwu.edu/anazimzada2020
Robert Pless: https://blogs.gwu.edu/pless
Jalal Rasulzade: https://blogs.gwu.edu/jrasulzade
Kamran Rzayev: https://blogs.gwu.edu/kamran_rzayev
Ismayil Shahaliyev: https://blogs.gwu.edu/shahaliyev

  1. There is a camera at the Tufandag Ski resort:

https://www.tufandag.com/en/skiing-riding/webcam/#webcam1

I think it shows video, and maybe there is a way to get a "live" still image from it. What can you do with many videos from this webcam? For example: can you predict the live weather parameters (wind speed or direction)? Can you highlight anomalous behaviors? Can you make a 3D model of that scene? For each of these problems, can you answer the Heilmeier questions?

What could you do with many images from bird nest cameras?

There are YouTube streams of one box: https://www.youtube.com/watch?v=56wcz_Hl9RM and pages where you could write a program to save images over time: http://horgaszegyesulet.roszkenet.hu/node/1
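For the "write a program to save images over time" part, a minimal polling sketch; the still-image URL below is a placeholder you would replace after inspecting the webcam page:

```python
# Sketch: poll a webcam still-image URL every few minutes and archive the frames.
# STILL_URL is a placeholder -- inspect the webcam page to find the real image URL.
import os
import time
from datetime import datetime

import requests

STILL_URL = "http://example.com/webcam/current.jpg"   # hypothetical
INTERVAL_SECONDS = 300                                 # one frame every 5 minutes

os.makedirs("frames", exist_ok=True)
while True:
    try:
        response = requests.get(STILL_URL, timeout=30)
        response.raise_for_status()
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        with open(f"frames/{stamp}.jpg", "wb") as f:
            f.write(response.content)
    except requests.RequestException as err:
        print("fetch failed:", err)
    time.sleep(INTERVAL_SECONDS)
```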

2. Some live cameras give streams of audio and video:

(Many examples)

https://hdontap.com/index.php/video/stream/pa-farm-country-bald-eagle-live-cam

https://www.youtube.com/watch?v=2uabwdYMzV

Live Bar Scene

https://www.webcamtaxi.com/en/sound.html (Tropical Murphy's Bar is good).

There is relatively little deep learning work that tries to think about one camera over very long time periods. Can you predict the sound from the video stream? Can you predict the video stream from the sound? Can you show the part of the image that is most correlated with the sound? Can you suppress the part of the sound that is unrelated to the video?
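To make the "which part of the image is most correlated with the sound" question concrete, one simple baseline is a per-pixel correlation map. A minimal sketch, assuming you have already extracted aligned arrays of grayscale frames and per-frame audio energy (both names are mine):

```python
# Sketch: per-pixel correlation between pixel brightness and audio energy.
# `frames` is a (T, H, W) array of grayscale frames; `audio_energy` is a
# length-T array of RMS audio energy for the same time steps.
import numpy as np

def correlation_map(frames: np.ndarray, audio_energy: np.ndarray) -> np.ndarray:
    frames = frames.astype(np.float64)
    audio = audio_energy.astype(np.float64)
    f_centered = frames - frames.mean(axis=0)
    a_centered = audio - audio.mean()
    numerator = (f_centered * a_centered[:, None, None]).sum(axis=0)
    denominator = np.sqrt((f_centered ** 2).sum(axis=0) * (a_centered ** 2).sum()) + 1e-8
    return numerator / denominator          # (H, W) map of Pearson correlations in [-1, 1]
```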

3. Some places give live video + text.

Twitch feeds have chat windows that are loosely aligned with the video. Live YouTube feeds also have a text chat.

https://www.youtube.com/watch?v=EEIk7gwjgIM

There is *lots* of work right now trying to merge the analysis of text and video, but very little of it is done for one specific viewpoint or event. Can you build a system that:
(a) predicts the chat comments that will come up from a video stream (given that you can train on *lots* of video from that specific stream)?

(b) identifies times in the video that will have more or less text?

(c) shows what part of the video is related to a text comment?

4. COVID image datasets

https://datascience.nih.gov/covid-19-open-access-resources

https://wiki.cancerimagingarchive.net/display/Public/COVID-19

I'm Robert Pless, chair of the Computer Science Department at GWU, and I'd like to briefly introduce myself.

I was born in Baltimore, Maryland, and have also lived in Columbus, Ohio; Washington, D.C.; and Warsaw, Poland (although I was 4 at that time).

Within Computer Science, I work mostly on problems in Computer Vision (trying to automatically understand images), Computational Geometry (building data structures for points and lines and shapes in space), and Machine Learning. A few of my favorite papers that I've written are here.

I'm especially interested in problem domains where new algorithms can help improve social justice, support healthier interactions with social media, and advance medical image understanding.

Outside of Computer Science, I have a four-and-a-half-year-old daughter who is learning to argue more and more effectively, and a grumpy dog. I'm interested in ultimate frisbee and modern art. My favorite artists are Dan Flavin and David Hockney, and I've written papers about the art of Hajime Ouchi and Isia Leviant.

Glitter pieces which are measured off but predicted on, where the prediction follows what we would actually expect

There are quite a few pieces of glitter which are measured to have an intensity of 0 (or < 0.1 normalized) in the test image, but then predicted to have an intensity > 0.6 normalized. If we look at the intersection of the scan lines which correspond to the intensity plots of this centroid, we see that they intersect right around where the light was displayed on the monitor for the test image. So we would expect to see this centroid in our test image.


Below, I have shown the centroid on the image (pink dot), and we can verify that we don't see a glitter piece lit there, nor are there any lit pieces nearby. If there were another lit piece close to this centroid, we might believe that we had just matched the centroid found in the test image to the wrong centroid from our glitter characterization. This does not seem to be the case, though, so it's a bit of a mystery.

UPDATE: we have ideas 🙂

  1. threshold intensities in the camera calibration error function the same way I did in my Gaussian optimization (finding receptive fields)
  2. throw out Gaussians that are "bad"; "bad" can mean a low eigenvalue ratio or intensity plots that are not well-matched (see the sketch after this list)
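For idea 2, a small sketch of what "throw out bad Gaussians" could look like, assuming each receptive field is summarized by a 2x2 covariance matrix; the 0.2 threshold is only a guess:

```python
# Sketch: keep only receptive-field Gaussians whose covariance is not too
# elongated, i.e. whose eigenvalue ratio is above some threshold (0.2 is a guess).
import numpy as np

def keep_gaussian(cov: np.ndarray, min_eig_ratio: float = 0.2) -> bool:
    """cov: 2x2 covariance of one receptive-field Gaussian."""
    small, large = np.sort(np.linalg.eigvalsh(cov))    # ascending eigenvalues
    ratio = small / (large + 1e-12)                    # 1.0 = isotropic, ~0 = degenerate
    return ratio >= min_eig_ratio
```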

Aperture vs. Exposure

I played around with the aperture of the camera and took many pictures with varying aperture. As the f-number decreases (the aperture becomes larger), we see that a lot more light is being let in. As this happens, it seems that the center of a glitter piece slowly shifts, probably because, if the exposure remains the same for all aperture settings, some images will have saturated pixels. The image below shows multiple apertures (f/1.8, f/2, f/2.2, ..., f/9), where the exposure was kept the same in all of the images. We can see that in some of the smaller f-number images there seem to be some saturated pixels.


After realizing that the exposure needs to change as the aperture varies, I re-took the pictures at f/1.8, f/4, and f/8 while varying the shutter speed. It looks like the shift happens less when an image taken at a low f-number looks similar in brightness to an image taken with a different exposure at a higher f-number.
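As a rough sanity check on "similar in brightness": the light reaching the sensor scales approximately with exposure time divided by the square of the f-number, so f/8 at 1 second is matched at f/1.8 by roughly a 0.05-second exposure. A tiny sketch of that rule (ignoring ISO, vignetting, and sensor nonlinearity):

```python
# Sketch: shutter time needed at a new f-number to keep image brightness roughly
# constant, using the rule brightness ~ exposure_time / f_number**2.
def matched_exposure(time_ref: float, f_ref: float, f_new: float) -> float:
    return time_ref * (f_new / f_ref) ** 2

print(matched_exposure(1.0, 8.0, 1.8))   # ~0.05 s at f/1.8 to match f/8 at 1 s
```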


Next Steps

  1. fix the intensity plots in the camera calibration and re-run
  2. try throwing out weird-Gaussian centroids and re-run the calibration
  3. take iPhone pictures with the XR, and figure out which exposure works best with its aperture (f/1.8) so that we get images that look similar to our camera images (f/8, 1-second exposure)

Goal: I am working on accurately calibrating a camera using a single image of glitter.

Paper Title: SparkleCalibration: Glitter Imaging Model for Single-Image Calibration ... maybe

Venue: CVPR 2020

For the last week, I have been specifically working on getting the calibration pipeline and error function up and running using the receptive fields of the glitter. Now, in the error function, I am predicting the intensity of each piece of glitter and comparing this predicted intensity to the actual intensity (measured from the image we capture with the camera). I am also using a single receptive field for all of the glitter pieces instead of a different receptive field for each glitter piece, because we found that there were enough 'bad' receptive fields to throw off the normalization of the predicted intensities in the optimization.
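To show the shape of that computation, here is a rough sketch only, not the actual pipeline: each glitter piece's predicted intensity comes from one shared receptive-field Gaussian, and the error is the mismatch with the measured intensities. `predict_monitor_point` is a hypothetical placeholder for the geometry that maps camera parameters and a glitter piece to the monitor location it reflects, which isn't spelled out in this post.

```python
# Rough sketch only -- not the real SparkleCalibration error function.
# `predict_monitor_point(params, piece)` is a hypothetical placeholder for the
# geometry mapping camera parameters + a glitter piece to the monitor point it
# reflects toward the camera.
import numpy as np

def intensity_error(params, glitter_pieces, measured, light_xy, sigma):
    """Sum-of-squares mismatch between predicted and measured glitter intensities."""
    predicted = np.array([
        np.exp(-np.sum((predict_monitor_point(params, piece) - light_xy) ** 2)
               / (2.0 * sigma ** 2))            # one shared receptive-field Gaussian
        for piece in glitter_pieces
    ])
    predicted /= predicted.max() + 1e-12        # normalize predictions like the measurements
    return np.sum((predicted - measured) ** 2)
```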


This plot shows the predicted vs. measured intensities of all of the glitter pieces we "see" in our image (many have very low intensities, since most glitter pieces we "see" are not actually lit). Here we see that there is a slightly noticeable trend along the diagonal, as we expect. The red points are glitter pieces which are incorrectly predicted to be off, the green points are glitter pieces which are correctly predicted to be on, the black points are glitter pieces which are incorrectly predicted to be on, and the rest are all of the other glitter pieces.

I also tried using the old on/off method for the error function (distance from the light location as the error) and found that the results were quite a bit worse than with the receptive-field method (yay!).

Goal: My next tasks are the following:

  • search for the right Gaussian to use as the receptive field for all the glitter pieces
  • run the checkerboard calibration for my current physical setup
  • re-take all images in the base and shifted orientations, re-measure the physical setup, take 'white-square' test images in this setup, and maybe some iPhone pictures


The goal is to have the website completely functional, with an attractive, effective, efficient, and user-friendly interface. The website should allow people such as CS PhD students and climate scientists to find the data they need.

Currently, I have redone the "History" and "Dataset Info" pages, and have plans for the "About Us" page. Some other changes were made by the students during the group session that took place three weeks ago, including some work on the "Home" page. I have made further changes to the "Home" page. I have been looking through the camera images on the local server, and have selected certain images that I plan to use on some of the web pages.

My plan for the rest of this week and the next is to add the chosen images to the website, make additional changes, start the new "Publications" page, and hopefully finally get to see the updates on the website. I also would like the links to the "Cameras" pages to be functional. Then there is a list of other goals to be met.

(1) The one sentence scientific/engineering goal of your project

The current goal is to train a classifier for the "Awned/Awnless" classes.

(2) Your target end-goal as you know it (e.g. paper title, possible venue).

The further goal we imagine is to predict the "Heading Percentage", and then to train a model to find the "Wheat Spikes" (the seed-bearing part of the wheat) to prove that we can do this from UAV data.

(3) Your favorite picture you’ve created relative to your project since you last posted on the blog.

In the dataset, there is only one plot that has the phenotype value "Mix", which means that in this plot the wheat has both the awned and awnless phenotypes. (It seems interesting, but we decided to remove it from the dataset first.)

(4) Your specific plan for the next week.

We plan to finish training the simple classifier this week, and then see whether we need to do more work to improve it or whether we can move on to the next step.

The car dataset I use contains 8131 images, each represented by a 64-dimensional feature vector. The data have 98 classes, labeled from 0 to 97, with about 60 to 80 images per class. What I am trying to do is, instead of using Euclidean distance or Manifold Ranking with only one query image, use Manifold Ranking with two different query images of the same class at the same time, to improve the accuracy.

Example results for one node pair (green-bordered images mean the image has the same class as the query nodes; red borders mean a different class):

Node pair [0, 1]:

Using Euclidean Distance to query the 2 nodes separately, and their ensemble result:

Using Manifold Ranking to query the 2 nodes separately:

The Manifold Ranking ensemble result, and the result of using our Two-Node Query to query both nodes at the same time:


Node Pair [1003, 1004]:

Using Euclidean Distance to query the 2 nodes separately, and their ensemble result:

Using Manifold Ranking to query the 2 nodes separately:

The Manifold Ranking ensemble result, and the result of using our Two-Node Query to query both nodes at the same time:


Node Pair [3500, 3501]:

Using Euclidean Distance to query the 2 nodes separately, and their ensemble result:

Using Manifold Ranking to query the 2 nodes separately:

The Manifold Ranking ensemble result, and the result of using our Two-Node Query to query both nodes at the same time:


The improved algorithm is as follows:

1. Use the faiss library, setting nlist=100 and nprobe=10, to get the 50 nearest neighbors of each of the two query nodes. The query nodes are different but belong to the same class. (The faiss index uses a cluster-pruning approach: it splits the dataset into nlist=100 clusters, each with a leader, then searches the nprobe=10 clusters nearest to the query point to find the K nearest neighbors.) A code sketch of the whole procedure follows step 6.

2. To simplify the graph, just use the MST as the graph for Manifold Ranking. In other words, we now have two adjacency matrices, one for each query node.

3. Create a Link Strength Matrix, which starts as a zero matrix with the same shape as the adjacency matrices. If a node near the 1st query point has the same class as a node near the 2nd query point, set the corresponding entry to a beta value to create an edge between these two nodes in the Link Strength Matrix.

4. Splice the matrices together: the adjacency matrix of the 1st query point at the top left, the Link Strength Matrix at the top right and its transpose at the bottom left, and the adjacency matrix of the 2nd query point at the bottom right.

5. Normalize the new big matrix as Manifold Ranking does. However, when computing the affinity matrix, use the standard deviation of only the non-zero values of the two original adjacency matrices, so that the curve converges to the ensemble result when the beta value is large.

6. At initialization, give both query nodes a signal strength of 1 and all other nodes 0. Then compute the Manifold Ranking of all nodes.
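Putting steps 1-6 together, a compact sketch under the description above. Assumptions: `features` is an (N, 64) float32 array of the image descriptors, `labels` is the (N,) class array used in step 3, each query is returned among its own 50 neighbors, and the parameter values are illustrative:

```python
# Sketch of the two-node query described above (simplified, not the exact code).
import faiss
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_distances(features, query_idx, index, k=50):
    """Steps 1-2: k nearest neighbors from the IVF index, then the MST over them."""
    _, nbrs = index.search(features[query_idx:query_idx + 1], k)
    nbrs = nbrs[0]
    mst = minimum_spanning_tree(squareform(pdist(features[nbrs]))).toarray()
    return np.maximum(mst, mst.T), nbrs            # symmetric matrix of MST distances

def two_node_query(features, labels, q1, q2, beta=0.7, alpha=0.99, iters=50):
    d = features.shape[1]
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, 100)  # nlist = 100
    index.train(features)
    index.add(features)
    index.nprobe = 10                              # search the 10 nearest clusters

    D1, n1 = mst_distances(features, q1, index)
    D2, n2 = mst_distances(features, q2, index)
    k = len(n1)

    # Step 5 (part): affinities use the std of the non-zero MST distances only.
    sigma = np.std(np.concatenate([D1[D1 > 0], D2[D2 > 0]]))
    A1 = np.where(D1 > 0, np.exp(-D1 ** 2 / (2 * sigma ** 2)), 0.0)
    A2 = np.where(D2 > 0, np.exp(-D2 ** 2 / (2 * sigma ** 2)), 0.0)

    # Step 3: link-strength block joining same-class nodes across the two graphs.
    L = beta * (labels[n1][:, None] == labels[n2][None, :]).astype(float)
    # Step 4: splice the four blocks into one (2k x 2k) matrix.
    W = np.block([[A1, L], [L.T, A2]])
    # Step 5: symmetric normalization, as in standard manifold ranking.
    deg = W.sum(axis=1) + 1e-12
    S = W / np.sqrt(np.outer(deg, deg))
    # Step 6: both query nodes start with signal strength 1, all others 0.
    y = np.zeros(2 * k)
    y[np.where(n1 == q1)[0][0]] = 1.0              # assumes q1 is among its neighbors
    y[k + np.where(n2 == q2)[0][0]] = 1.0
    f = y.copy()
    for _ in range(iters):                         # f <- alpha * S f + (1 - alpha) y
        f = alpha * (S @ f) + (1 - alpha) * y
    return f, np.concatenate([n1, n2])             # scores and the node ids they score
```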

The following plot shows the mAP score for different beta values over all node pairs in the dataset (8131 images in 97 classes, over 330k node pairs). I give the top-ranked node a score of 1 if it has the same class as the query nodes and 0 if not, and the n-th ranked node a score of 1/n if it has the same class as the query nodes and 0 if not. As we can see, the mAP score reaches its maximum when the beta value is around 0.7-0.8, which is better than using only one query node or the ensemble result of two query nodes.
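For reference, one literal reading of that per-pair scoring rule as code; whether the query nodes themselves are excluded and how the per-pair scores are averaged and normalized into the reported mAP is not spelled out here, so those parts are my assumptions:

```python
# One reading of the per-pair score: the n-th ranked node contributes 1/n if it
# shares the query class and 0 otherwise. Excluding the query nodes is an assumption.
import numpy as np

def pair_score(ranking_scores, labels, query_class, query_indices):
    order = np.argsort(-ranking_scores)                 # best-ranked node first
    ranked = [i for i in order if i not in set(query_indices)]
    return sum(1.0 / (n + 1) for n, i in enumerate(ranked)
               if labels[i] == query_class)
```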



My next step is to see whether I can improve the time complexity of the algorithm, and to try to improve the algorithm itself.

This week, I ran deep dream on the Resnet18 trained on Terra data at a single scale, to get rid of the confusion of multi-scaling (i.e., I set the number of levels in the Gaussian pyramid to 1 in the deep dream algorithm and fed the network images of size 224x224). Here are the results of using different criteria:

L2 norm criterion (amplifying large outputs in all channels within a certain layer):

Block 1 / Block 2

Block 3 / Fc layer


It can be observed that the size of the repetitive pattern becomes larger as the network goes deeper (because deeper layers have larger receptive fields), and the pattern becomes more complex (because deeper layers undergo more non-linearity).
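For concreteness, here is a minimal single-scale dream step of the kind described above, written PyTorch-style; the model handle, layer choice, and hyperparameters are placeholders rather than the exact settings used:

```python
# Sketch: single-scale deep dream with an L2-norm objective on one layer's
# activations (no Gaussian pyramid). `model` and the chosen `layer` are placeholders.
import torch

def dream(model, layer, steps=200, lr=0.05, size=224, device="cpu"):
    model.eval().to(device)
    activations = {}
    handle = layer.register_forward_hook(
        lambda module, inp, out: activations.__setitem__("feat", out))

    img = torch.rand(1, 3, size, size, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        model(img)                               # forward pass fills activations["feat"]
        loss = -activations["feat"].norm()       # maximize the L2 norm of the activations
        loss.backward()
        optimizer.step()
        img.data.clamp_(0, 1)                    # keep the image in a valid range
    handle.remove()
    return img.detach()
```

With a torchvision-style ResNet, `layer` could be `model.layer1` through `model.layer4`; for the one-hot criterion below, the loss would instead be the negative of a single fc output, e.g. `-model(img)[0, class_index]`.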

One-hot criterion (maximizing a single neuron in the fc layer, where each neuron represents a class; classes 0, 8, and 17 from left to right):

It can be observed that, without the confusion of multi-scaling, there is still no recognizable difference between the classes. In order to verify the validity of the algorithm, the same experiment was done on a Resnet-50 trained on ImageNet:

fish (middle) / bird (up-right)

long hair dog (bottom-left) / husky (middle right)

It is shown that the one-hot criterion is capable of revealing what the network encodes for different classes. Therefore, it is very likely that classes 0, 8, and 17 in the previous figure actually represent different features, but those features lack semantic meaning and are thus hard to recognize.

One possible reason behind this phenomenon is that the Terra dataset is relatively monotonic and the differences between the classes are subtle, so the network does not have to encode semantically meaningful high-level features to achieve good results. Instead, those unrecognizable features may best represent the data distribution of each class.

The following experiments can be used as next steps to verify these hypotheses:

  1. Mix the data in ImageNet with Terra to make the classification harder. It is expected that high-level structures such as sorghum will be learned.
  2. Only include class 0 and class 18 in the dataset to make classification easier. The features for each class should then show greater differences.
  3. Visualize the embedding of the dream picture and the data points. The dream picture should be located at the center of the data points.