
Hi everyone,

Here is a list of fantastic blogs about current machine learning research.

David Ahmadov: https://blogs.gwu.edu/ahmedavid/
Farida Aliyeva: https://blogs.gwu.edu/ffaliyeva2022
Leyla Aliyeva: https://blogs.gwu.edu/leylaaliyeva
Ibrahim Alizada: https://blogs.gwu.edu/ibrahim_alizada
Mustafa Aslanov: https://blogs.gwu.edu/aslanovmustafa
Aydin Bagiyev: https://blogs.gwu.edu/abagiyev
Aygul Bayramova: https://blogs.gwu.edu/abayramova99/
Samir Dadash-zada: https://blogs.gwu.edu/samirdadashzada
Habil Gadirli: https://blogs.gwu.edu/hgadirli
Farid Jafarov: https://blogs.gwu.edu/fjafarov
Narmin Jamalova: https://blogs.gwu.edu/njamalova54
Steve Kaisler: https://blogs.gwu.edu/skaisler/
Ilyas Karimov: https://blogs.gwu.edu/ilyaskarimov
Kheybar Mammadnaghiyev: https://blogs.gwu.edu/mammadnaghiyevk
Fidan Musazade: https://blogs.gwu.edu/fmusazade
Natavan: https://blogs.gwu.edu/ntakhundova/
Aykhan Nazimzade: https://blogs.gwu.edu/anazimzada2020
Robert Pless: https://blogs.gwu.edu/pless
Jalal Rasulzade: https://blogs.gwu.edu/jrasulzade
Kamran Rzayev: https://blogs.gwu.edu/kamran_rzayev
Ismayil Shahaliyev: https://blogs.gwu.edu/shahaliyev

  1. There is a camera at the Tufandag Ski resort:

https://www.tufandag.com/en/skiing-riding/webcam/#webcam1

I think it shows video, and there may be a way to get a "live" still image from it. What could you do with many videos from this webcam? For example: can you predict live weather parameters (wind speed or direction)? Can you highlight anomalous behaviors? Can you make a 3D model of that scene? For each of these problems, can you answer the Heilmeier questions?

What could you do with many images from bird nest cameras?

There are YouTube streams of one nest box: https://www.youtube.com/watch?v=56wcz_Hl9RM and pages where you could write a program to save images over time: http://horgaszegyesulet.roszkenet.hu/node/1
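A minimal sketch of such a scraper, which saves a still frame at regular intervals; the image URL is a placeholder, and the real page may require parsing the HTML or grabbing frames from a video stream instead:

```python
import time
from datetime import datetime
import requests

# Placeholder: the actual still-image URL has to be found by inspecting the page.
IMAGE_URL = "http://example.com/webcam/current.jpg"

def save_snapshot():
    """Download the current webcam frame and save it with a timestamped filename."""
    response = requests.get(IMAGE_URL, timeout=30)
    response.raise_for_status()
    filename = datetime.now().strftime("frame_%Y%m%d_%H%M%S.jpg")
    with open(filename, "wb") as f:
        f.write(response.content)

if __name__ == "__main__":
    while True:
        try:
            save_snapshot()
        except requests.RequestException as e:
            print("snapshot failed:", e)
        time.sleep(600)  # one frame every 10 minutes
```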

2. Some live cameras give streams of audio and video:

(Many examples)

https://hdontap.com/index.php/video/stream/pa-farm-country-bald-eagle-live-cam

https://www.youtube.com/watch?v=2uabwdYMzV

Live Bar Scene

https://www.webcamtaxi.com/en/sound.html (Tropical Murphy's Bar is good).

There is relatively little Deep Learning done that tries to think about one camera over very long time periods. Can you predict the sound from the video stream? Can you predict the video stream from the sound? Can you show the part of the image that is most correlated with the sound? Can you suppress the part of the sound that is unrelated to the video?

3. Some places give live video + text.

Twitch feeds have chat windows that are loosely aligned with the video. Live YouTube feeds also have a text chat.

https://www.youtube.com/watch?v=EEIk7gwjgIM

There is *lots* of work right now trying to merge the analysis of text and video, but very little that is done for one specific viewpoint or event. Can you build a system to:
(a) predict the chat comments that will come up from a video stream (given that you can train on *lots* of video from that specific video stream),

(b) Can you identify times in the video that will have more or less text?

(c) Can you show what part of the video is related to a text comment?

4. COVID image datasets

https://datascience.nih.gov/covid-19-open-access-resources

https://wiki.cancerimagingarchive.net/display/Public/COVID-19

I'd like to briefly introduce myself: I'm Robert Pless, chair of the Computer Science Department at GWU.

I was born in Baltimore, Maryland, and have also lived in Columbus, Ohio; Washington, D.C.; and Warsaw, Poland (although I was 4 at the time).

Within Computer Science, I work mostly on problems in Computer Vision (trying to automatically understand images), Computational Geometry (building data structures for points, lines, and shapes in space), and Machine Learning. A few of my favorite papers that I've written are here.

I'm especially interested in problem domains where new algorithms can help improve social justice, support healthier interactions with social media, and advance medical image understanding.

Outside of Computer Science, I have a four-and-a-half-year-old daughter who is learning to argue more and more effectively, and a grumpy dog. I'm interested in ultimate frisbee and modern art. My favorite artists are Dan Flavin and David Hockney, and I've written papers about the art of Hajime Ouchi and Isia Leviant.

The goal is to have the website completely functional with an attractive, effective, efficient, and user-friendly interface. The website should allow people such as CS PhD students and climate scientists to find the data they need.

Currently, I have redone the "History" and "Dataset Info" pages and have plans for the "About Us" page. Some other changes were made by the students during the group session three weeks ago, including some work on the "Home" page, and I have made further changes to the "Home" page since then. I have been looking through the camera images on the local server and have selected certain images that I plan to use on some of the web pages.

My plan for the rest of this week and the next is to add the chosen images to the website, make additional changes, start the new "Publications" page, and hopefully get to see the updates live on the website. I would also like the links to the "Cameras" pages to be functional. Beyond that, there is a list of other goals to be met.

This week, Deep Dream was run on the ResNet18 trained on Terra data at a single scale, to remove the confusion introduced by multi-scaling (i.e., the Gaussian pyramid was set to a single level in the Deep Dream algorithm, and the network was fed an image of size 224x224). Here are the results of using different criteria:

L2-norm criterion (amplifying large outputs across all channels within a given layer):

[Figures: Deep Dream results for Block 1, Block 2, Block 3, and the fc layer]

It can be observed that the size of the repetitive patterns becomes larger as the network goes deeper (because deeper layers have larger receptive fields), and the patterns become more complex (because deeper layers undergo more non-linearities).
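A minimal sketch of this single-scale Deep Dream step, assuming a PyTorch ResNet18; the chosen block, step size, and iteration count are placeholders, and the one-hot variant discussed next is included as an option:

```python
import torch
import torchvision.models as models

# Placeholder: in the experiment this would be the ResNet18 fine-tuned on Terra data.
model = models.resnet18(pretrained=True).eval()

# Capture the activations of one residual block with a forward hook (block choice is a placeholder).
activations = {}
model.layer2.register_forward_hook(lambda module, inp, out: activations.update(feat=out))

def dream_step(image, lr=0.01, class_idx=None):
    """One gradient-ascent step at a single scale (224x224, no Gaussian pyramid)."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)
    if class_idx is None:
        loss = activations["feat"].norm()   # L2-norm criterion: amplify all channels of the block
    else:
        loss = logits[0, class_idx]         # one-hot criterion: maximize a single fc neuron
    loss.backward()
    grad = image.grad / (image.grad.abs().mean() + 1e-8)  # normalize the step size
    return (image + lr * grad).detach()

image = torch.rand(1, 3, 224, 224)  # start from noise (or from a dataset image)
for _ in range(200):
    image = dream_step(image)
```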

One-hot criterion (maximizing a single neuron in the fc layer, where each neuron represents a class; classes 0, 8, and 17 from left to right):

It can be observed that even without the confusion of multi-scaling, there is still no recognizable difference between the classes. To verify the validity of the algorithm, the same experiment was done on a ResNet-50 trained on ImageNet:

[Figures: Deep Dream results for ImageNet classes: fish (middle), bird (upper right), long-haired dog (bottom left), husky (middle right)]

This shows that the one-hot criterion is capable of revealing what the network encodes for different classes. Therefore, it is very likely that classes 0, 8, and 17 in the previous figure actually represent different features, but those features lack semantic meaning and are thus hard to recognize.

One possible reason behind this phenomenon is that the Terra dataset is relatively monotonous and the differences between classes are subtle, so the network does not have to encode semantically meaningful high-level features to achieve good results. Instead, those unrecognizable features may best represent the data distribution of each class.

The following experiments could be used as next steps to verify these hypotheses:

  1. Mix ImageNet data with Terra data to make the classification harder. High-level structures such as sorghum would then be expected to be learned.
  2. Only include class 0 and class 18 in the dataset to make classification easier. The features for each class should then differ more strongly.
  3. Visualize the embeddings of the dream pictures and the data points. Each dream picture should lie near the center of its class's data points.

 

The Terra dataset contains 350,000 sorghum images from day 0 to day 57. Images from 3 consecutive days are grouped into a class, forming 19 classes in total. The following shows samples from each class:

All images are randomly divided into a train set and a test set with an 8:2 ratio. A ResNet18 pre-trained on ImageNet is fine-tuned on the train set (lr = 0.01, 30 epochs). The training history of the network (with and without epoch zero) is the following:

  1. At epoch 0, train_acc and test_acc are both about 5%: the ResNet is predicting each of the 19 classes essentially at random.
  2. The first 3 epochs dramatically push train_acc and test_acc to 80%.
  3. The network converges to train_acc = 95% and test_acc = 90%.
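A minimal sketch of this fine-tuning setup, assuming PyTorch and an ImageFolder-style layout; the data path and transform are placeholders:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Placeholder path; one folder per class (19 classes of 3-day groups).
dataset = datasets.ImageFolder("terra_crops/", transform=transform)
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# ResNet18 pre-trained on ImageNet, with the fc layer replaced for 19 classes.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 19)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```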

The confusion matrix on the test set is the following:

When the network makes a wrong prediction, it usually assigns the sorghum image to a neighboring class.

Several samples of wrong predictions are shown in the following:

At (2,4) and (5,5) the network does not even predict neighboring classes. It can be seen that these images are not very 'typical' of their class, but the predictions are still hard to explain.

At (4,6) the image is 'typical' of class 1 but is predicted as class 5, which is mysterious.

Deep Dream is applied to the network to reveal what it has learned:

The structure of ResNet18 is given as follows:

An optimization of the outputs of conv2_x, conv3_x, conv4_x, conv5_x, and the fc layer is conducted:

[Figures: original image; Deep Dream results for conv2_x, conv3_x, conv4_x, conv5_x; fc layer]

As the receptive field increases, it can be observed that the network learns more complex local structure (each small patch becomes less similar to its neighbors) rather than global structure (a recognizable plant). Maybe the local texture is good enough to classify the image?

 

 

In the past few days, I learned about SVM and IsoMap, and followed the link on Slack to read about CapsNet and about using causal effects to explain classifiers.

Here is some intuition about CapsNet:

As far as I understand, CapsNet groups several neurons together so that the 'feature map' in CapsNet consists of vectors instead of scalars.

This design allows variation within a single entry of the feature map, so it encourages different views of the same object to be represented by the same capsule.

It also uses coupling coefficients to replace the max-pooling procedure of a traditional CNN (the routing from primary capsules to digit capsules corresponds to global pooling).

This design encourages CapsNet to explicitly encode part-whole relationships, so that lower-level features tend to be the spatial parts of higher-level features.
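A minimal sketch of the capsule "squash" non-linearity from the CapsNet paper, which keeps each capsule's output as a vector whose length lies in [0, 1); this NumPy version assumes the last axis holds the capsule dimensions:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash a capsule vector s: keep its direction, shrink its length into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

# Example: a long vector keeps its direction but ends up with length close to 1.
v = squash(np.array([3.0, 4.0]))
print(np.linalg.norm(v))  # ~0.96
```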

The paper shows that CapsNet performs better than a traditional CNN at recognizing overlapping digits on the MNIST dataset.

Maybe CapsNet will perform better on datasets consisting of more complicated objects?

This week:

  1. I finished training my first network on the Terra dataset. The network was trained on 1000 random samples from each class, with data augmentation.

The result looks like this:

The fluctuation of the validation accuracy suggests either a too-small dataset or a too-large learning rate. For the first issue, I intend to oversample the minority classes and use all data from the majority classes. For the second issue, I intend to use learning rate decay. I have finished the coding, but training the network with all of the data will take longer.

My question about the above is: based on experience, how different do class sizes have to be before the data should be described as imbalanced? My class sizes range from 1k to 30k. Is it reasonable to oversample the minority classes so that all classes stay in the range of 15k to 30k?
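A minimal sketch of the oversampling and learning-rate-decay plan, assuming PyTorch and that `dataset` and `model` are defined as in the earlier fine-tuning sketch; the decay schedule is a placeholder:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Collect each sample's class index (for an ImageFolder, dataset.targets would be faster).
labels = torch.tensor([y for _, y in dataset])
class_counts = torch.bincount(labels).float()

# Sample each image with probability inversely proportional to its class size,
# so minority classes are effectively oversampled.
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
train_loader = DataLoader(dataset, batch_size=64, sampler=sampler)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Learning rate decay: multiply the lr by 0.1 every 10 epochs (placeholder schedule).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```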

 

  2. I have finished the code for the confusion matrix, but it could not generate meaningful results because I could not previously differentiate between the train set and the test set. I have solved this problem by fixing the seed in the random split of the train and test sets. I hope we can use
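A minimal sketch of fixing the seed for a reproducible train/test split, assuming PyTorch's random_split and the 8:2 ratio from earlier; `dataset` is assumed to be defined as before:

```python
import torch
from torch.utils.data import random_split

# Fixing the generator seed makes the split identical on every run,
# so the confusion matrix is always computed on the same held-out test set.
generator = torch.Generator().manual_seed(42)
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train], generator=generator)
```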

 

  3. While waiting for the program to run, I studied the paper about finding the best bin and learned the rationale behind PCA. (I finally understand why we need those mathematical procedures.)

 

The next step will be:

Waiting for the training of the next network with all of the above improvements. Reading the paper about IsoMap and the other papers introduced in previous paper-discussion sections. (I found the record on Slack.)

The Terra dataset contains about 300,000 sorghum images of size 3000x2000, taken over 57 days from April to June.

I group 3 days into a class. Here are some samples from each class:

We want to train a network to predict the growing stage (i.e., the date) of the plant based on the structure of the leaves. Therefore we need to crop the images to fit the input size of the network.

I used two ways to crop the image:

  1. Use a fixed-size bounding box to crop out the part of the image that is most likely to be a plant. Here are some samples:

This method gives you images with the same resolution, but it ignores the global structure of the larger plants that we may be interested in (such as flowers).

2. Make the bounding box the size of the whole plant, then rescale the cropped image to a fixed size:

This method allows the network to cheat: it can predict the date based on the resolution instead of the structure of the plant.
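A minimal sketch of the two cropping strategies, assuming the plant location or bounding box has already been found; Pillow, the file path, and the coordinates are placeholders:

```python
from PIL import Image

CROP_SIZE = 224  # network input size

def crop_fixed_window(image, center_x, center_y):
    """Method 1: cut a fixed-size window around the plant; resolution is preserved."""
    half = CROP_SIZE // 2
    return image.crop((center_x - half, center_y - half, center_x + half, center_y + half))

def crop_and_rescale(image, box):
    """Method 2: crop the whole-plant bounding box, then rescale to the fixed input size."""
    return image.crop(box).resize((CROP_SIZE, CROP_SIZE))

image = Image.open("sorghum_sample.png")                    # placeholder path
patch1 = crop_fixed_window(image, 1500, 1000)               # placeholder plant center
patch2 = crop_and_rescale(image, (1200, 700, 1800, 1300))   # placeholder bounding box
```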

Another issue is noise: both methods will give you images like these:

I don't know how frequently this noise appears in the whole dataset, or whether it is necessary to improve the pre-processing to get rid of it.

The next step will be to improve these methods and train a network following one of them.

 

 

One of the fundamental challenges and requirements for the GCA project is to determine where water is, especially when water is flooding into areas where it is not normally present. To this end, I have been studying the flooding in Houston that resulted from Hurricane Harvey in 2017. One of the specific areas of interest (AOI) is centered around the Barker flood control station on Buffalo Bayou.

To get an understanding of the severity of the flooding in this area, this is what the Barker flood control station looked like on December 21, 2018...

And this is what the Barker flood control station looked like on August 31, 2017...

Our project specifically explores how to determine where transportation infrastructure is rendered unusable by flooding. Our first step in the process is to detect where the water is. I have been able to generate a water mask by using the near-infrared band available on the satellite that took these overhead photos. This rather simple water detection algorithm produces a water mask that looks like this...
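The exact rule isn't spelled out above; a minimal sketch of one simple approach, thresholding the near-infrared band (water absorbs NIR strongly, so it appears dark), assuming the band is already loaded as a NumPy array and the threshold is a placeholder:

```python
import numpy as np

def water_mask_from_nir(nir_band, threshold=0.1):
    """Return a boolean mask marking pixels whose NIR reflectance is low (likely water)."""
    return nir_band < threshold

# Example with a synthetic band; in practice nir_band comes from the satellite image.
nir_band = np.random.rand(512, 512)
water_mask = water_mask_from_nir(nir_band)
```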

If the mask is overlaid onto the flooded August 31, 2017 image, it suggests that this water detection approach is sufficient for detecting deep water...

There are specific areas of shallow water that are not detected by the algorithm; however, parameter tuning increases the frequency of false positives. There are other approaches available to us; however, our particular contribution to the project is not water detection per se, and other contributors are working on this problem. Our contribution instead depends on water detection, so this algorithm appears to be good enough for the time being. We have already run into some issues with this simplified water detection, namely that trees obscure the sensor, which causes water to go undetected in some areas.

Besides water detection, our contribution also depends on road networks. Again, this is not a chief contribution of our project and others are working on it; however, we require road information to meet our goals. To this end, we used OpenStreetMap (OSM) to pull the road information near the Barker Flood Control AOI and to generate a mask of the road network.
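A minimal sketch of pulling an OSM road network and rasterizing it into a mask, assuming the OSMnx, GeoPandas, and rasterio libraries; the place query, raster size, and alignment with the satellite image are placeholders:

```python
import osmnx as ox
from rasterio import features
from rasterio.transform import from_bounds

# Pull the drivable road network near the AOI (place query is a placeholder;
# a bounding-box query could be used instead).
G = ox.graph_from_place("Barker, Texas, USA", network_type="drive")
edges = ox.graph_to_gdfs(G, nodes=False)

# Rasterize the road geometries into a binary mask on a placeholder grid.
out_shape = (1024, 1024)
west, south, east, north = edges.total_bounds
transform = from_bounds(west, south, east, north, out_shape[1], out_shape[0])
road_mask = features.rasterize(
    ((geom, 1) for geom in edges.geometry),
    out_shape=out_shape,
    transform=transform,
    fill=0,
).astype(bool)
```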

By overlaying the road network onto other imagery, we can start to see the extent of the flooding with respect to road access.

Our contribution looks specifically at road traversability in flooded areas, so by intersecting the water mask generated by the water detection algorithm with the road network, we can determine where water covers the road significantly, and we can generate masks for the passable and impassable roads.
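A minimal per-pixel sketch of that intersection step, assuming the water and road masks are boolean NumPy arrays on the same pixel grid; grouping by road segment and applying a coverage threshold would be the natural refinement:

```python
import numpy as np

def classify_roads(road_mask, water_mask):
    """Split the road mask into passable and impassable pixels using the water mask."""
    impassable = road_mask & water_mask   # road pixels covered by detected water
    passable = road_mask & ~water_mask    # road pixels with no detected water
    return passable, impassable

# Example using the masks from the earlier sketches (shapes must match).
passable_mask, impassable_mask = classify_roads(road_mask, water_mask)
```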

 

The masks can be combined and layered over other images of the area to provide a visualization of which roads are traversable.

The big caveat to the above representations is that we assume all roads are passable and disqualify only the roads that are covered with detected water. This means that the quality of our classification is heavily dependent on the quality of the water detection. You can see many areas that are indicated as passable that should not be. For example, the shaded box in the following image illustrates where this assumption breaks down...

The highlighted neighborhood in the above example is almost entirely flooded; however, the tree cover in the neighborhood has masked much of the water where it would intersect the road network. There is a lot of false-negative information, i.e., water that is not detected and therefore not intersected, so these roads remain classified as traversable even though expert analysis of the overhead imagery suggests the opposite.

We are also combining our data with Digital Elevation Models (DEMs), which are heightmaps of the area that can be derived from a number of different types of sensors. Here is a sample heightmap of the larger AOI that we are studying, derived from the Shuttle Radar Topography Mission (SRTM) conducted late in the NASA shuttle program...

Unfortunately, the devil is in the details, and the resolution of the heightmap within the small Barker Flood Control AOI is very poor...

A composite of our data shows that the SRTM data omits certain key features (for example, the bridge across Buffalo Bayou is non-existent) and that our water detection is insufficient (the dark channel should show a continuous band of water due to the overflow).

The SRTM data is unusable for our next steps, so we are exploring DEM data from a variety of sources.  Our goal for the next few days is to assess more of the available and more current DEM data sources and to bring this information into the pipeline.