
This week:

  1. I finished training my first network on the TERRA dataset. The network is trained on 1,000 random samples from each class, with data augmentation.

The result looks like this:

The fluctuation of the validation accuracy suggests that the dataset is too small or the learning rate is too large. For the first issue, I intend to oversample the minority classes and use all of the data from the majority classes. For the second issue, I intend to use learning rate decay. I have finished coding, but training the network with all of the data will take longer.
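A minimal sketch of both fixes, assuming PyTorch (the dummy dataset, class counts, batch size, model, and decay schedule below are placeholders, not the values I will actually use):

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder stand-ins for the real TERRA images and date-class labels.
images = torch.randn(200, 3, 224, 224)
labels = torch.randint(0, 19, (200,))
train_dataset = TensorDataset(images, labels)

# Oversampling: draw each sample with probability inversely proportional to
# its class size, so minority classes show up as often as majority classes.
class_counts = torch.bincount(labels, minlength=19).float().clamp(min=1)
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
train_loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)

# Learning rate decay: multiply the learning rate by 0.1 every 10 epochs.
model = torch.nn.Linear(3 * 224 * 224, 19)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x.flatten(1)), y)
        loss.backward()
        optimizer.step()
    scheduler.step()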

My question about the above is: based on experience, how different do the class sizes have to be before the data is described as imbalanced? My class sizes range from 1k to 30k. Is it reasonable to oversample the minority classes so that all classes stay in the range of 15k to 30k?

 

  1. I have finished the code for the confusion matrix, but it could not generate meaningful results because I could not differentiate the train set from the test set. I have solved this problem by fixing the seed in the random split of the train set and test set. I hope we can use
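For reference, a minimal sketch of the fixed-seed split, assuming PyTorch's random_split (the dummy dataset and the 80/20 ratio are placeholders):

import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset standing in for the real image dataset.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))

# Fixing the generator seed makes the train/test split reproducible, so the
# confusion matrix is always computed on the same held-out test set.
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(
    dataset,
    [n_train, len(dataset) - n_train],
    generator=torch.Generator().manual_seed(42),
)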

 

  1. While waiting for the program to run, I studied the paper about finding the best bin and learned the rationale behind PCA. (I finally understand why we need those mathematical procedures.)

 

The next step will be:

Wait for the training of the next network with all of the above improvements. Read the paper about Isomap, and other papers introduced in previous paper discussion sections. (I found the record on Slack.)

1. Nearest Neighbor Classifier

NN (Nearest Neighbor Classifier):

This classifier has nothing to do with Convolutional Neural Networks and it is very rarely used in practice, but it will allow us to get an idea about the basic approach to an image classification problem.

One of the simplest possibilities is to compare the images pixel by pixel and add up all the differences. In other words, given two images and representing them as vectors I1, I2, a reasonable choice for comparing them might be the L1 distance:
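The L1 distance is the sum of absolute pixel-wise differences, $d_1(I_1, I_2) = \sum_p \lvert I_1^p - I_2^p \rvert$. A minimal NumPy sketch of the comparison (the image sizes and labels below are placeholder data):

import numpy as np

# Placeholder training data: 50 flattened 32x32x3 images with random labels.
train_images = np.random.randint(0, 256, size=(50, 32 * 32 * 3))
train_labels = np.random.randint(0, 10, size=(50,))
test_image = np.random.randint(0, 256, size=(32 * 32 * 3,))

# L1 distance to every training image: sum of absolute pixel-wise differences.
distances = np.sum(np.abs(train_images - test_image), axis=1)

# The nearest neighbor classifier predicts the label of the closest training image.
prediction = train_labels[np.argmin(distances)]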

...continue reading "kNN, SVM, Softmax and SGD"

In the past few days I've been working on finding a way to approximate the wavelengths of light in photos. This isn't technically possible to do with complete accuracy, but I was able to write a script that can approximate the wavelength with some exceptions (mentioned later in the post).

Using data from a chart that has the XYZ colorspace values of all monochromatic wavelengths of visible light, I was able to convert my RGB image to the XYZ colorspace and find the closest wavelength value to the XYZ values of each pixel. I've produced a histogram of this script in action on an entire image; I also produced a smaller version of the script that can be used as a function to convert one pixel at a time from an RGB value to a wavelength value.
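A minimal sketch of the per-pixel conversion, assuming sRGB input (the three-row CIE table is just a placeholder for the full chart, and the function name is illustrative):

import numpy as np

# Placeholder excerpt of a CIE 1931 chart: wavelength (nm) -> XYZ of the
# monochromatic stimulus. The real script uses the full table.
CIE_TABLE = np.array([
    [450.0, 0.3362, 0.0380, 1.7721],
    [550.0, 0.4334, 0.9950, 0.0087],
    [650.0, 0.2835, 0.1070, 0.0000],
])

# Linear-sRGB -> XYZ matrix (D65 white point).
RGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def pixel_to_wavelength(rgb):
    # Approximate a single RGB pixel (0-255) by the closest monochromatic wavelength.
    srgb = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma so the matrix applies to linear values.
    linear = np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)
    xyz = RGB_TO_XYZ @ linear
    # Nearest table entry in XYZ space.
    idx = np.argmin(np.linalg.norm(CIE_TABLE[:, 1:] - xyz, axis=1))
    return CIE_TABLE[idx, 0]

print(pixel_to_wavelength([255, 0, 0]))  # maps to the closest red-ish table entry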

Here is the histogram of wavelengths of light found in the photo:
Here is a histogram of the hues present in the same photo:

As you can see, there are a few issues with the wavelength approximations. Some color hues (most notably pinks and purples) cannot be produced using a single wavelength. My script matches the RGB value of the pixel to the closest monochromatic wavelength, and therefore colors that cannot be created with one wavelength are approximated to the closest component wave, which can explain why there are larger amounts of blue and red light shown in the wavelength histogram.

This behavior seems fairly consistent across images, but in the future I might have to figure out a way to approximate these more complex colors if we find that more accuracy is needed. For now, this script gives us an accurate enough idea of the wavelengths of light present in the image to begin using holographic diffraction grating equations to approximate other information, such as the angle of the incoming light before diffraction occurred.
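For reference, the standard grating equation relates the groove spacing $d$, the diffraction order $m$, the wavelength $\lambda$, and the incidence and diffraction angles $\theta_i$ and $\theta_m$ (sign conventions vary by source):

$d\,(\sin\theta_i + \sin\theta_m) = m\lambda$

so once a pixel's wavelength is approximated, the incoming angle can be recovered as $\theta_i = \arcsin(m\lambda/d - \sin\theta_m)$, given the grating's groove spacing and the observed diffraction angle.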


These images are labeled satellite image chips with atmospheric conditions and various classes of land cover/land use. Resulting algorithms will help the global community better understand where, how, and why deforestation happens all over the world - and ultimately how to respond.

...continue reading "Training Model use Dataset with Multi-Labal"

The TERRA dataset contains about 300,000 sorghum images of size 3000x2000, taken over 57 days from April to June.

I group every 3 days into one class. Here are some samples from each class:

We want to train a network to predict the growing stage (i.e., date) of the plant based on the structure of the leaves. Therefore we need to crop the images to fit the input size of the network.

I used two ways to crop the image:

  1. Use a fixed-size bounding box to crop out the part of the image that is most likely to be a plant. Here are some samples:

This method gives you images with the same resolution, but it ignores the global structure of a large plant that we may be interested in (such as flowers).

2. Make the bounding box the size of the whole plant, then rescale the cropped image to a fixed size:

This method allows the network to cheat: it can predict the date based on resolution instead of the structure of the plant. A minimal sketch of both cropping approaches is given below.
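Both crops can be done with PIL; here is the sketch referred to above (the input size, box coordinates, and plant-center location are placeholders, and the blank image stands in for a real 3000x2000 photo):

from PIL import Image

CROP_SIZE = 224  # placeholder network input size

# Placeholder blank image standing in for a 3000x2000 TERRA photo.
image = Image.new("RGB", (3000, 2000))
center = (1500, 1000)                 # hypothetical "most plant-like" location
plant_box = (1200, 700, 1800, 1300)   # hypothetical whole-plant bounding box

# Method 1: fixed-size box around the most plant-like location.
# Resolution is preserved, but large plants get cut off.
half = CROP_SIZE // 2
fixed_crop = image.crop((center[0] - half, center[1] - half,
                         center[0] + half, center[1] + half))

# Method 2: crop the whole plant, then rescale to the input size.
# Global structure is kept, but effective resolution changes with plant size.
rescaled_crop = image.crop(plant_box).resize((CROP_SIZE, CROP_SIZE))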

Another issue is noise: both methods will give you images like these:

I don't know how frequently this kind of noise appears in the whole dataset, or whether it is necessary to improve the preprocessing method to get rid of it.

The next step will be improving these methods and training a network using one of them.

 

 

The last week has seen a few dozen changes and improvements to the flow and functionality of the web app for Project rePhoto. Some changes of note: I added the ability to create new projects and subjects (requiring one subject with one entry to be created when the project is), the overlay will always be the first photo that was uploaded to the subject, and you can upload photos to a subject as a URL, which will then be converted to a .png image by a Python script I wrote this week.
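A minimal sketch of that URL-to-.png step, assuming requests and Pillow are available (the URL, filename, and function name are placeholders rather than the exact script):

import io
import requests
from PIL import Image

def url_to_png(url, out_path):
    # Download the image bytes, decode whatever format they arrive in,
    # and re-save them as PNG so every upload is stored uniformly.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    image = Image.open(io.BytesIO(response.content))
    image.convert("RGB").save(out_path, format="PNG")

# Example (placeholder URL and filename):
# url_to_png("https://example.com/photo.jpg", "entry_0001.png")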

Here’s a diagram I made to map out the flow of the web app, which is reflected in the current version of the web app (though it is a skeleton version without any backend functionality): rePhoto-flow

I also spent a decent amount of time earlier this week taking a deeper look at and getting more comfortable with Django. This is my current plan for the file management for the functionality of the web app: rePhoto-file-mgmt

I’ve also been working, without too much luck, on saving the photos locally on the devices they are taken on, and am working today on getting the “hold down to save” functionality working (which would be clunky but users are relatively familiar with doing this on their phones).

A few questions that I’ve been working towards answering:

  • When making the choice to require each new project to have at least one subject, with one entry, I thought about projects like RinkWatch (a collection of ice rinks around North America) and Scenic Overlooks (a compilation of views from various hikes), which have had a lot of subjects, but very few entries. Should we require the user to add an entry to each subject when it is created, even though we would no longer be able to allow for projects like RinkWatch?
  • Instead of providing a list of subjects, which would require our users to know the exact name of the subject they’re looking to add to, I thought it might be useful to use an Open Street Maps API, like the one that already exists in the View Projects section of the rePhoto website, to allow users to be able to view the subjects on a map before choosing one to contribute to. Kartograph (https://kartograph.org/) also seems to be another option, so I’m going to do some research in the next week about which one fits the goals of our project better, and would love any input on which might be best or why Open Street Maps was chosen for the Project rePhoto site a few years ago.
  • Is it best to keep the Projects -> Subjects -> Entries pipeline? Otherwise, we could use tags to connect subjects to each other, which would serve the organizational purpose of projects. Why I’m excited about tags: many people, across geographic boundaries, can see each other’s work; for example, with a “home-improvement” tag, someone in Alaska and another in Brazil could both be building outdoor fireplaces, and get ideas from watching the evolution of each other’s projects. My only concern: if a user is not required to put any tags on their subjects, we could have a lot of subjects that wouldn’t be tied to anything, making the process of organization and re-finding subjects a bit more tedious.

I’d love any feedback on what I’ve done and what should be done next, and happy Friday! 🙂

I tried creating my own dataset from internet image resources this week, and it worked well.

 

Get a list of URLs
Go to Google Images and search for the images you are interested in. The more specific you are in your Google Search, the better the results and the less manual pruning you will have to do.

Now you must run some JavaScript code in your browser, which will save the URLs of all the images you want for your dataset. Press Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on Mac, and a small window, the JavaScript 'Console', will appear. That is where you will paste the JavaScript commands.

You will need to get the URLs of each of the images. You can do this by running the following commands:

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

...continue reading "Creating Your Own Dataset from Google Images and Training a Classifier"

We've been playing around with different pooling strategies recently -- what regions to average over when pooling from the final convolutional layer to the pooled layer (which we sometimes use directly in embedding, or which gets passed into a fully connected layer to produce output features). One idea that we were playing with for classification was to use class activation maps to drive the pooling. Class activation maps are a visualization approach that visualize which regions contributed to a particular class.

Often we make these visualizations to understand what regions contributed to the predicted class, but you can actually visualize which regions "look" like any of the classes. So for ImageNet, you can produce 1000 different activation maps ('this is the part that looks like a dog', 'this is the part that looks like a table', 'this is the part that looks like a tree').

The CAM Pooling idea is to then create 1000 different pooled features, where each filter of the final conv layer is pooled over only the 'active' regions from the CAM for each respective class. Each of those CAM pooled features can then be pushed through the fully connected layer, giving 1000 different 1000 element class probabilities. My current strategy is to then select the classes which have the highest probability over any of the CAM pooled features (a different approach would be to sum over all of the probabilities for each of the 1000 CAM pooled features and sort the classes that way -- I think this approach to how we combine 'votes' for a class together is actually probably very important, and I'm not sure what the right strategy is).
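A rough NumPy sketch of the CAM pooling computation (the feature dimensions, the random weights, and the 'active region' threshold are placeholders; the max-over-features vote is the first combination strategy described above):

import numpy as np

C, H, W, NUM_CLASSES = 512, 7, 7, 1000    # placeholder dimensions
conv_features = np.random.rand(C, H, W)   # final conv layer output for one image
fc_weights = np.random.rand(NUM_CLASSES, C)
fc_bias = np.zeros(NUM_CLASSES)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Class activation maps: one HxW map per class, a weighted sum of the
# conv filters using that class's fully connected weights.
cams = np.einsum('kc,chw->khw', fc_weights, conv_features)

# For each class, pool the conv features only over that CAM's 'active'
# region (here: above an arbitrary per-map threshold), then push the
# pooled feature through the fully connected layer.
probs = np.zeros((NUM_CLASSES, NUM_CLASSES))
for k in range(NUM_CLASSES):
    mask = cams[k] >= 0.8 * cams[k].max()         # placeholder definition of 'active'
    pooled = conv_features[:, mask].mean(axis=1)  # masked average pooling per filter
    probs[k] = softmax(fc_weights @ pooled + fc_bias)

# One way to combine the 1000 sets of votes: take, for each class, the
# highest probability it received from any CAM pooled feature.
scores = probs.max(axis=0)
ranking = np.argsort(-scores)   # predicted classes, best first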

So does this help? So far, not really. It actually hurts a bit, although there are examples where it helps:

The following pictures show examples where the CAM pooling helped (top) and where it hurt (bottom). (In each case, I'm only considering examples where one of the final results was in the top 5 -- there might be cases where CAM pooling improved from predicting the 950th class to 800th, but those aren't as interesting).

In each picture, the original query image is shown in the top left, then the CAM for the correct class, followed by the CAMs for the top 5 classes predicted from the original feature, and then, in the bottom row, the CAMs for the top 5 classes predicted by the CAM pooled features.

Original index of correct class: 21
CAM Pooling index of correct class: 1

 

Original index of correct class: 1
CAM Pooling index of correct class: 11

More examples can be seen in: http://zippy.seas.gwu.edu/~astylianou/images/cam_pooling

One of the fundamental challenges and requirements for the GCA project is to determine where water is, especially when water is flooding into areas where it is not normally present.  To this end, I have been studying the flooding in Houston that resulted from Hurricane Harvey in 2017.  One of the specific areas of interest (AOI) is centered around the Barker flood control station on Buffalo Bayou.

To get an understanding of the severity of the flooding in this area, this is what the Barker flood control station looked like on December 21, 2018...

And this is what the Barker flood control station looked like on August 31, 2017...

Our project specifically explores how to determine where transportation infrastructure is rendered unusable by flooding.  Our first step in the process is to detect where the water is.  I have been able to generate a water mask by using the near infrared band available on the satellite that took these overhead photos.  This rather simple water detection algorithm produces a water mask that looks like this...

If the mask is overlaid onto the flooded August 31, 2017 image, it suggests that this water detection approach is sufficient for detecting deep water...

There are specific areas of shallow water that are not detected by the algorithm; however, parameter tuning increases the frequency of false positives.  There are other approaches available to us; however, our particular contribution to the project is not water detection per se and other contributors are working on this problem.  Our contribution is instead dependent on water detection, so this algorithm appears to be good enough for the time being.  We have already run into some issues with this simplified water detection, namely that trees obscure the sensor's view, which causes water not to be detected in some areas.
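As an illustration of this kind of simple NIR-based masking (not necessarily the exact algorithm used here), one common variant thresholds a normalized difference water index computed from the green and near-infrared bands; the band indices and threshold below are placeholders:

import numpy as np

# Placeholder multispectral image: (bands, height, width), reflectance in [0, 1].
image = np.random.rand(4, 256, 256)
GREEN, NIR = 1, 3          # placeholder band indices for this sensor

# NDWI = (green - NIR) / (green + NIR); water reflects very little NIR,
# so NDWI tends to be high over open water.
green, nir = image[GREEN], image[NIR]
ndwi = (green - nir) / (green + nir + 1e-6)

# Thresholding NDWI gives a boolean water mask; the threshold trades
# missed shallow water against false positives, as described above.
water_mask = ndwi > 0.0    # placeholder threshold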

Besides water detection, our contribution also depends on road networks.  Again, this is not a chief contribution of our project and others are working on it; however, we require road information to meet our goals.  To this end, we used OpenStreetMap (OSM) to pull the road information near the Barker Flood Control AOI and to generate a mask of the road network.
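A rough sketch of that step, assuming the osmnx and rasterio packages (the center point, search distance, and raster size are placeholders, and function signatures differ slightly between library versions):

import osmnx as ox
from rasterio import features, transform

# Pull drivable roads near a point of interest (placeholder coordinates
# standing in for the Barker Flood Control AOI).
center = (29.77, -95.55)
graph = ox.graph_from_point(center, dist=2000, network_type="drive")
edges = ox.graph_to_gdfs(graph, nodes=False)

# Rasterize the road geometries into a binary mask aligned with the AOI.
height, width = 512, 512
bounds = edges.total_bounds                      # (minx, miny, maxx, maxy)
affine = transform.from_bounds(*bounds, width, height)
road_mask = features.rasterize(
    ((geom, 1) for geom in edges.geometry),
    out_shape=(height, width),
    transform=affine,
    fill=0,
    dtype="uint8",
)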

By overlaying the road network onto other imagery, we can start to see the extent of the flooding with respect to road access.

Our contribution looks specifically at road traversability in flooded areas, so by intersecting the water mask generated through a water detection algorithm with the road network, we can determine where water covers the road significantly, and we can generate a mask for each of the passable and impassable roads.
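Once both binary masks are aligned on the same grid, the intersection itself is a pair of boolean operations (a sketch with placeholder NumPy masks):

import numpy as np

# Placeholder aligned binary masks (True = road pixel / water pixel).
road_mask = np.zeros((512, 512), dtype=bool)
water_mask = np.zeros((512, 512), dtype=bool)
road_mask[200:210, :] = True
water_mask[:, 300:400] = True

# Road pixels covered by water are impassable; the rest are assumed passable.
impassable = road_mask & water_mask
passable = road_mask & ~water_mask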

 

The masks can be combined together and layered over other images of the area to provide a visualization of what roads are traversable.

The big caveat to the above representations is that we are assuming that all roads are passable and we are disqualifying roads that are covered with water.  This means that the quality of our classification is heavily dependent on the quality of water detection.  You can see many areas that are indicated to be passable that should not be.  For example, the shaded box in the following image illustrates where this assumption breaks down...

The highlighted neighborhood in the above example is almost entirely flooded; however, the tree cover in the neighborhood has masked much of the water where it would intersect the road network.  There is a lot of false negative information, i.e. water that is not detected and therefore not intersected, so these roads are still considered traversable even though expert analysis of the overhead imagery suggests the opposite.

We are also combining our data with Digital Elevation Models (DEMs), which are heightmaps of the area that can be derived from a number of different types of sensors.  Here is a sample heightmap of the larger AOI we are studying, derived from the Shuttle Radar Topography Mission (SRTM) conducted late in the NASA shuttle program...

Unfortunately, the devil is in the details and the resolution of the heightmap within the small Barker Flood Control AOI is very bad...

A composite of our data shows that the SRTM data omits certain key features (for example, the bridge across Buffalo Bayou is non-existent) and that our water detection is insufficient (the dark channel should show a continuous band of water due to the overflow).

The SRTM data is unusable for our next steps, so we are exploring DEM data from a variety of sources.  Our goal for the next few days is to assess more of the available and more current DEM data sources and to bring this information into the pipeline.

 


These are the plots of the final results of leaf length/width:

I looked up the hand-measured result for 6/1/2017; the leaf length value is around 600 (probably mm).

But according to Abby's botanist colleagues, 600 mm at that stage is unreasonable. The growth rate trends and values in this plot, on the other hand, seem reasonable.

So the next step is to upload these results to BETYdb.