
Hodgepodge

[incoming: terrible blog post that's just a wall of text and no pictures! boo!]

My last couple of weeks have been extremely scattered, between the job stuff @ SLU, the talk @ UMSL, travel, CVPPP/ICCV paper thoughts/discussions, etc. The specific significant things I worked on in the last week were:

  1. Creating a test AWS instance for the Temple University students that has a backup of the TraffickCam data and database for next-gen API development
  2. Drafting milestones and deliverables for OPEN
  3. Making a summary of our current extractor statuses (statii?) for TERRA

We have a lot of TERRA extractors that can currently be run at either Danforth or GWU but cannot be deployed as full extractors, either because they require a GPU or because our processing pipeline doesn't align with the TERRA data pipeline @ NCSA. For scan data, for instance, we process a full scan of the field and can then produce per-plot statistics, whereas the only "trigger" we can get that there's new data is when a single strip is complete, which prevents us from knowing when we've seen all strips for a particular plot. We've come up with a solution for the latter, but we're waiting for NCSA to produce a "scan completeness trigger" that will tell us when to run our extractor/aggregator.
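
To make the mismatch concrete, here's a minimal sketch of the per-plot aggregation logic we'd like to run. The plot/strip names and functions are hypothetical, not the actual extractor code, and note that the expected-strips-per-plot information the sketch assumes is exactly what the "scan completeness trigger" from NCSA would give us:

```python
from collections import defaultdict

# Placeholder layout: which strips make up each plot. In reality we don't know
# this from the per-strip trigger alone, which is the whole problem.
expected_strips = {"plot_17": {"strip_a", "strip_b", "strip_c"}}
seen_strips = defaultdict(set)

def aggregate_plot(plot_id):
    # Stand-in for our per-plot statistics extractor/aggregator.
    print(f"all strips present for {plot_id}; running per-plot aggregation")

def on_strip_complete(plot_id, strip_id):
    """Called per strip -- the only trigger the pipeline currently gives us."""
    seen_strips[plot_id].add(strip_id)
    if seen_strips[plot_id] >= expected_strips[plot_id]:
        aggregate_plot(plot_id)  # only now can we compute per-plot statistics

on_strip_complete("plot_17", "strip_a")
on_strip_complete("plot_17", "strip_b")
on_strip_complete("plot_17", "strip_c")  # fires the aggregation
```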

I've additionally been working to get the training code from our AAAI paper cleaned up and ready to be released. We had previously released our trained snapshot, along with code for downloading the dataset and running the evaluation, but I was contacted by someone who couldn't reproduce our results. We identified several places where her setup differs from our training that might account for the difference in accuracy, but we want to release our training code so that there's no question about the legitimacy of our results.

Most of this cleanup work just involves things like making sure I don't have weird hard-coded paths or egregiously bad code. But I was reminded of one thing that slipped through the cracks in the rush to write the AAAI paper, that I don't currently have an explanation for, that I'm not thrilled to be releasing in our training code, and that I want to investigate: the best accuracy we achieved with batch-all was when the triplet loss during training was computed on non-L2-normalized features with Euclidean distance, but evaluation was done with L2-normalized dot product similarity. This mismatch is strange. I don't currently have a snapshot that was trained with L2-normalization, so I need to train one, but in the meantime I'm comparing the evaluation for Euclidean distance and L2-normalized dot product similarity with our current snapshot to understand how big the discrepancy is. I was hoping to have that ready for this post/lab meeting, but forgot that I first have to save out all million features from the training gallery, so I'll just have to post an update once that's finished running.
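
For reference, the comparison I'm running amounts to something like the sketch below, with random arrays standing in for the real query and gallery features (the sizes, dimensions, and names are placeholders, not our actual evaluation code):

```python
import numpy as np

# Stand-ins for the saved gallery and query features (unnormalized).
gallery = np.random.randn(10000, 256).astype(np.float32)
queries = np.random.randn(100, 256).astype(np.float32)

def rank_euclidean(q, g):
    # Squared Euclidean distance on the raw features; smaller is better.
    d = (q ** 2).sum(1)[:, None] + (g ** 2).sum(1)[None, :] - 2.0 * q @ g.T
    return np.argsort(d, axis=1)

def rank_cosine(q, g):
    # L2-normalize, then dot product similarity; larger is better.
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    gn = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.argsort(-(qn @ gn.T), axis=1)

# How often do the two metrics agree on the top-1 retrieval?
top1_euc = rank_euclidean(queries, gallery)[:, 0]
top1_cos = rank_cosine(queries, gallery)[:, 0]
print("top-1 agreement:", (top1_euc == top1_cos).mean())
```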

I'm coming into the lab next week, Tuesday-Thursday, so I'll be around for the CVPPP/ICCV push. I don't have super high expectations for what will get done on my own research front next week, but the week after that I really want to get back to TraffickCam research. That means first sorting out this weird L2-normalization issue. Then I want to get Hong's nearest neighbor loss implemented for Hotels-50K and see (1) what our improvement in accuracy is, and (2) whether it yields significantly more interesting visualizations, since we won't be trying to push bedrooms and bathrooms from the same hotel to the same place.

On the object search front, I need to re-write my visualization code to use the lower-dimensional fully connected feature (the first one, so everything is still linear) rather than the 2048-D GAP layer, and then evaluate how object search performs on those lower-dimensional visualizations that we might actually be able to deploy at scale.
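
A minimal sketch of where that feature comes from, assuming a ResNet-50-style backbone with a linear embedding layer after global average pooling (the layer names and the 256-D output size are placeholders, not our actual network definition):

```python
import torch
import torch.nn as nn
from torchvision import models

class EmbeddingNet(nn.Module):
    def __init__(self, embedding_dim=256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Conv stack + global average pooling, dropping the classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        # First fully connected layer: a purely linear map from 2048-D to 256-D.
        self.fc = nn.Linear(2048, embedding_dim)

    def forward(self, x):
        gap = self.features(x).flatten(1)  # 2048-D GAP feature
        emb = self.fc(gap)                 # lower-dimensional linear feature
        return gap, emb

net = EmbeddingNet().eval()
with torch.no_grad():
    gap_feat, fc_feat = net(torch.randn(1, 3, 224, 224))

# Because fc is linear, a spatial similarity map computed against the conv
# activations can be projected through fc.weight, so the visualization can be
# driven by the 256-D feature instead of the 2048-D GAP feature.
print(gap_feat.shape, fc_feat.shape)  # torch.Size([1, 2048]) torch.Size([1, 256])
```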
