Last week I decided to pursue the optimization route to try and find the light & camera locations simultaneously. This post will focus on the progress and results thus far in the 3D optimization simulation!
In my simulation, there are 10,000 centroids all arranged in a grid on a plane (the pink shaded plane in the image below). There is a camera (denoted by the black dot) and a light (denoted by the red dot). I generate a random screen map - a list of positions on the monitor (blue shaded plane) such that each position on the monitor corresponds to a centroid. I use this screen map and the centroid locations to calculate the actual surface normals of each centroid - we will refer to these as the ground truth normals.
Then, I assume that all of the centroids are reflecting the point light (red dot), and calculate the surface normals of the centroids under this assumption - we will refer to these as the calculated normals. The centroids considered to be "lit" are those whose ground truth normals are very close to their calculated normals (using the dot product: the two normals are within roughly 2.5 degrees of each other, i.e. dot product > 0.999).
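This geometry can be sketched in a few lines of NumPy. Everything below is a stand-in, not my actual simulation values: a 10x10 toy grid instead of the 10,000 centroids, made-up screen-map positions, and the `reflection_normals` helper is just the standard half-vector construction for specular reflection.

```python
import numpy as np

def reflection_normals(centroids, source, viewer):
    """Normal each centroid would need in order to reflect a ray from
    `source` toward `viewer`: the normalized half-vector of the two
    unit directions. `source` may be one point (3,) or one per centroid (N, 3)."""
    to_src = np.asarray(source, float) - centroids
    to_view = np.asarray(viewer, float) - centroids
    to_src /= np.linalg.norm(to_src, axis=-1, keepdims=True)
    to_view /= np.linalg.norm(to_view, axis=-1, keepdims=True)
    half = to_src + to_view
    return half / np.linalg.norm(half, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy 10x10 grid of centroids on the z = 0 plane (stand-in for the 10,000).
xs, ys = np.meshgrid(np.linspace(0.0, 90.0, 10), np.linspace(0.0, 90.0, 10))
centroids = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(100)])

camera = np.array([100.0, 80.0, 110.0])
light = np.array([110.0, 30.0, 57.5])

# Random screen map: one monitor position per centroid (monitor plane at z = 120).
screen_map = np.column_stack([rng.uniform(0, 100, 100),
                              rng.uniform(0, 100, 100),
                              np.full(100, 120.0)])

# Ground truth: normals that reflect each centroid's screen-map point to the camera.
ground_truth = reflection_normals(centroids, screen_map, camera)
# Calculated: normals under the assumption that every centroid reflects the point light.
calculated = reflection_normals(centroids, light, camera)

# "Lit" centroids: calculated normal within ~2.5 degrees of the ground truth.
dots = np.sum(ground_truth * calculated, axis=1)
lit = dots > 0.999
print(lit.sum(), "of", len(centroids), "centroids are lit")
```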
This visualization shows the centroids which are "lit" by the light and the rays from those centroids to their corresponding screen map location. As expected, all of these centroids have screen map locations which are very close to the light.
To optimize, I initialize my camera and light locations to something reasonable, and then minimize my error function.
Error Function
In each iteration of the optimization, I have some current best camera location and current best light location. Using these two locations, I can calculate the surface normals of each lit centroid - call these calculated normals. I then take the dot product of the ground truth normals and these calculated normals, and take the sum over all centroids. Since these normals are normalized, I know each centroid's dot product can contribute no more than 1 to the final sum. So, I minimize the function:
numCentroids - sum(dot(ground truth normals, calculated normals))
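A sketch of this error function and one way to minimize it, assuming NumPy/SciPy. The grid, the `reflection_normals` half-vector helper, and the Nelder-Mead choice are all illustrative stand-ins; the ground truth here is synthesized from the "actual" camera and light so that the true answer has zero error.

```python
import numpy as np
from scipy.optimize import minimize

def reflection_normals(centroids, source, viewer):
    """Half-vector normal needed at each centroid to reflect source to viewer."""
    a = np.asarray(source, float) - centroids
    b = np.asarray(viewer, float) - centroids
    a /= np.linalg.norm(a, axis=-1, keepdims=True)
    b /= np.linalg.norm(b, axis=-1, keepdims=True)
    h = a + b
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

def error(params, centroids, ground_truth):
    """numCentroids - sum(dot(ground truth, calculated)); zero exactly
    when every calculated normal matches its ground-truth normal."""
    camera, light = params[:3], params[3:]
    calc = reflection_normals(centroids, light, camera)
    return len(centroids) - np.sum(ground_truth * calc)

# Toy grid of centroids on z = 0 (stand-in for the real simulation).
xs, ys = np.meshgrid(np.linspace(0.0, 90.0, 10), np.linspace(0.0, 90.0, 10))
centroids = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(100)])

true_camera = np.array([100.0, 80.0, 110.0])
true_light = np.array([110.0, 30.0, 57.5])
ground_truth = reflection_normals(centroids, true_light, true_camera)

# Initialize to something reasonable and minimize.
x0 = np.array([50.0, 60.0, 80.0, 80.0, 50.0, 50.0])  # camera, then light
res = minimize(error, x0, args=(centroids, ground_truth), method="Nelder-Mead")
print("final error:", res.fun)
```

Note that the error is symmetric in the camera and light (swapping the two halves of `params` gives the same value), which is exactly the flipping ambiguity discussed below.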
Results
No pictures yet - my current visualizations don't do this justice. I need to figure out better ways to visualize the results, both after the optimization finishes and while it is running (as a movie or something).
Initial Light: [80 50 50]
Initial Camera: [50 60 80]
Final Light: [95.839 80.2176 104.0960]
Final Camera: [118.3882 26.4220 61.7301]
Actual Light: [110 30 57.5]
Actual Camera: [100 80 110]
Final Error: 0.0031
Error in Lit Centroids: 0.0033
Discussion/Next Steps
1. Sometimes the light and camera locations get flipped in the optimization - this is to be expected because right now there is nothing constraining which is which. Is there something I can add to my error function to actually constrain this, ideally using only the surface normals of the centroids?
2. The optimization still does not perform as well as I would want or expect. It may be falling into a local minimum and stopping there, so this is something I need to look at more closely.
3. It is unclear how much inaccuracy in the surface normals affects the error. I want to perturb the ground truth surface normals by some small amount (to mimic the measurement error that the real system almost certainly has) and then see how the optimization does. I'm not entirely sure of the best way to do this, and I'm also not sure how to measure its effect.
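A first sketch of one way to do the perturbation (my guess, assuming NumPy): jitter each unit normal in its tangent plane by Gaussian noise of a chosen angular scale, then renormalize, which is a good approximation for small angles.

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb_normals(normals, sigma_deg):
    """Tilt each unit normal by a small random angle: add Gaussian noise
    of angular scale sigma_deg (in degrees) in the tangent plane, then
    renormalize."""
    noise = rng.normal(scale=np.deg2rad(sigma_deg), size=normals.shape)
    # Remove the component of the noise along the normal itself.
    noise -= np.sum(noise * normals, axis=1, keepdims=True) * normals
    tilted = normals + noise
    return tilted / np.linalg.norm(tilted, axis=1, keepdims=True)

# Example: perturb 1,000 copies of the +z normal by ~0.5 degrees.
n = np.tile([0.0, 0.0, 1.0], (1000, 1))
noisy = perturb_normals(n, 0.5)
mean_angle = np.degrees(np.mean(np.arccos(np.clip(np.sum(n * noisy, axis=1), -1, 1))))
print("mean tilt (deg):", mean_angle)
```

Re-running the optimization on `perturb_normals(ground_truth, sigma)` for a range of `sigma` values, and plotting final camera/light error against `sigma`, would be one way to measure the sensitivity.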
Nice post.
A few comments:
1. Your error function is:
numCentroids - sum(dot(ground truth normals, calculated normals))
which is totally legit, but also makes it hard to know what the error means.
In this case, if your error function was:
1 - mean(dot(ground truth normals, calculated normals))
Then the error function would have a direct meaning: "how far is the average dot product from 1".
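For example (a quick NumPy sketch with made-up unit normals; the function name is mine):

```python
import numpy as np

def mean_dot_error(ground_truth, calculated):
    """1 - mean(dot): zero when all normals agree, and comparable
    across runs regardless of how many centroids are in the sum."""
    return 1.0 - np.mean(np.sum(ground_truth * calculated, axis=1))

# Identical normals give zero error.
n = np.tile([0.0, 0.0, 1.0], (5, 1))
print(mean_dot_error(n, n))  # -> 0.0
```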
2. I don't think that you can use only the surface normals of the centroids to disambiguate the camera and the light. They really are ambiguous: if you swapped the camera and the light, you would have the same set of lit glitter pieces. This doesn't bother me:
(a) it is still beautiful, and
(b) if you look at *where* on the image you see the lit glitter pixels, you can figure out which is the camera (usually/almost always, unless you get really unlucky and all the lit glitter pieces are on one line or in a perfect square or some other very strange configuration).
3. The final light and camera positions are quite far from the actual ones. What error do you get if you plug in the correct camera and light positions, and how does that error relate to the final error?
Also, I'm not sure I understand the definition of these two:
Final Error: 0.0031
Error in Lit Centroids: 0.0033
1. Makes sense - I can implement that for the error function instead.
2. That is what I thought - I just wasn't sure if there was some aspect of this that I wasn't thinking of.
3. My name for it wasn't very clear, but 'Error in Lit Centroids' is the error I get (using the error function described in my post) when the camera and light are in their correct positions. I realized that because this error is not 0 (since we allow 'lit' centroids to be within some angular range of the light location), it is possible for the optimization to find a camera/light pair with a smaller error than the true one. I think this makes sense, but it is a somewhat bigger problem, since it can cause the optimization to home in on an incorrect final set of locations.