Last week I decided to pursue the optimization route to try and find the light & camera locations simultaneously. This post will focus on the progress and results thus far in the 3D optimization simulation!

In my simulation, there are 10,000 centroids arranged in a grid on a plane (the pink shaded plane in the image below). There is a camera (denoted by the black dot) and a light (denoted by the red dot). I generate a random screen map - a list of positions on the monitor (blue shaded plane) such that each position on the monitor corresponds to a centroid. I use this screen map and the centroid locations to calculate the actual surface normals of each centroid - we will refer to these as the ground truth normals.

Then, I assume that all of the centroids are reflecting the point light (red dot), and calculate the surface normals of the centroids under this assumption - we will refer to these as the calculated normals. The centroids which are considered to be "lit" are those whose ground truth normals are very close to their calculated normals: I take the dot product of the two normals for each centroid and keep the centroids where it is greater than 0.999, i.e. the normals are within ~2.5 degrees of each other.

This visualization shows the centroids which are "lit" by the light and the rays from those centroids to their corresponding screen map location. As expected, all of these centroids have screen map locations which are very close to the light.
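The "lit" test described above can be sketched in a few lines of NumPy. This is only an illustration, not the simulation code itself: the function names are made up, and it assumes each normal is the half-vector between the unit direction to the camera and the unit direction to the reflected point (the screen map position for ground truth normals, the light for calculated normals).

```python
import numpy as np

def unit(v):
    """Normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def half_vector_normals(centroids, camera, targets):
    """Assumed mirror model: the normal at each centroid bisects the
    directions to the camera and to its target point."""
    to_camera = unit(camera - centroids)
    to_target = unit(targets - centroids)
    return unit(to_camera + to_target)

def lit_mask(gt_normals, calc_normals, threshold=0.999):
    """True where the ground truth and calculated normals nearly agree."""
    dots = np.sum(gt_normals * calc_normals, axis=1)
    return dots > threshold

# Tiny hypothetical scene: the first two centroids map to the light position.
centroids = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
camera = np.array([0.0, 5.0, 10.0])
light = np.array([3.0, -4.0, 10.0])
screen_map = np.array([[3.0, -4.0, 10.0], [3.0, -4.0, 10.0], [-50.0, 40.0, 10.0]])

gt_normals = half_vector_normals(centroids, camera, screen_map)
calc_normals = half_vector_normals(centroids, camera, light)
mask = lit_mask(gt_normals, calc_normals)
```

With 10,000 centroids the same calls work unchanged, since everything is vectorized over the first axis.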

To optimize, I initialize my camera and light locations to something reasonable, and then minimize my error function.

Error Function

In each iteration of the optimization, I have some current best camera location and current best light location. Using these two locations, I can calculate the surface normals of each lit centroid - call these calculated normals. I then take the dot product of the ground truth normals and these calculated normals, and take the sum over all centroids. Since these normals are normalized, I know each centroid's dot product can contribute no more than 1 to the final sum. So, I minimize the function:

numCentroids - sum(dot(ground truth normals, calculated normals))

Results

No pictures because my current visualizations don't do this justice - I need to work on figuring out better ways to visualize this after the optimization is done running/while the optimization is happening (as a movie or something).

Initial Light: [80 50 50]
Initial Camera: [50 60 80]

Final Light: [95.839 80.2176 104.0960]
Final Camera: [118.3882 26.4220 61.7301]

Actual Light: [110 30 57.5]
Actual Camera: [100 80 110]

Final Error: 0.0031
Error in Lit Centroids: 0.0033

Discussion/Next Steps

1. Sometimes the light and camera locations get flipped in the optimization (in the run above, the final light is close to the actual camera and the final camera is close to the actual light) - this is to be expected because right now there is nothing constraining which is which. Is there something I can add to my error function to actually constrain this, ideally using only the surface normals of the centroids?

2. The optimization still does worse than I would want/expect. It is possible that it is falling into a local min and stopping there, so this is something I need to look at more.

3. It is unclear how much the accuracy (or lack thereof) of the surface normals affects the error. I want to try to perturb the ground truth surface normals by some small amount (pretend like we know there is some amount of error in the surface normals, which in real life there probably is), and then see how the optimization does. I'm not entirely sure what the best way to do this is, and I am also not sure how to go about measuring this.
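For the third point, one possible way to perturb the normals (an assumption, not something already tried here) is to tilt every unit normal by a fixed angle in a random direction. The noise level is then a single number in degrees, which makes it easy to plot against the resulting error. The function name is hypothetical.

```python
import numpy as np

def perturb_normals(normals, angle_deg, rng):
    """Tilt each unit normal by exactly angle_deg in a random direction."""
    theta = np.radians(angle_deg)
    random_vecs = rng.normal(size=normals.shape)
    # Remove the component along each normal to get a tangent direction.
    along = np.sum(random_vecs * normals, axis=1, keepdims=True)
    tangents = random_vecs - along * normals
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    # Normal and tangent are orthonormal, so the result is still unit length.
    return np.cos(theta) * normals + np.sin(theta) * tangents
```

One way to measure the effect would be to rerun the optimization for several values of `angle_deg` and record the distance between the recovered and actual light and camera positions each time.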

Over the last couple of days, I have focused on formally writing up the 2D derivation of the constraints on the glitter that I am using to define ellipses. I believe there is something wrong/incomplete in how I am thinking about the magnitude of the surface normals when using them to calculate the gradient vector. The difference in magnitude of the surface normals for each piece of glitter definitely has a bearing on the size of the ellipse associated with that piece of glitter.

I have attached my write-up to this post. In the write-up, there is a derivation of the constraints as well as my initial attempt at motivating this problem. I think I need to tie the motivation into the overall camera calibration problem instead of just talking about how the glitter can define ellipses.

Glitter_and_Ellipses-1xaxzep

My immediate next steps include re-working the last part of the derivation, the part which involves the magnitude of the surface normals (the ratio). I am also going to try to find other approaches to this problem. I REALLY believe the surface normals of the lit glitter are enough to determine the set of ellipses, so perhaps this implicit equation approach isn't the correct one! In the next day or so, I will put up a more comprehensive post on what results (including pretty/not-so-pretty pictures) I have achieved so far using the technique outlined in the write-up attached to this post.

The implicit equation for an ellipse looks like f(x, y) = ax^2 + bxy + cy^2 + dx + ey + f = 0. The idea here is that if we have at least five pieces of glitter that are "on" for some location of the camera and light (unknown), and we know the surface normals of those pieces of glitter, then we can use that information to determine the values of the coefficients in the implicit equation, thus defining a set of concentric ellipses associated with our set of "on" glitter. Then, we think the two foci of these concentric ellipses (which will be the same for each ellipse in the set) will define the camera and light locations.

10 pieces of glitter are "on", and the light and camera lie along a vertical line.

In this image, we can see a sample simulation in which there are 10 pieces of glitter, all of which are "on", and the camera and light are located along a vertical line. Here, we expect to see a set of concentric, vertically oriented (major axis aligned with the y-axis) ellipses such that each ellipse is tangent to at least one piece of glitter.
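The tangency expectation can be checked numerically with a small sketch. The setup is hypothetical: the camera and light sit at (0, 3) and (0, -3), a piece of "on" glitter sits at (2, 1), and its normal is assumed to bisect the directions to the camera and the light. The ellipse through the glitter with those two points as foci should then have a gradient parallel to the glitter normal.

```python
import numpy as np

def glitter_normal(point, camera, light):
    """Unit normal that reflects the light into the camera (assumed mirror model)."""
    to_camera = (camera - point) / np.linalg.norm(camera - point)
    to_light = (light - point) / np.linalg.norm(light - point)
    n = to_camera + to_light
    return n / np.linalg.norm(n)

def ellipse_gradient(point, focal_dist, dist_sum):
    """Gradient of x^2/B^2 + y^2/A^2 - 1 at point, for foci (0, +/-focal_dist)
    and a sum of focal distances dist_sum = 2A (vertical major axis)."""
    semi_major_sq = (dist_sum / 2.0) ** 2
    semi_minor_sq = semi_major_sq - focal_dist ** 2
    return np.array([2.0 * point[0] / semi_minor_sq,
                     2.0 * point[1] / semi_major_sq])

camera = np.array([0.0, 3.0])
light = np.array([0.0, -3.0])
point = np.array([2.0, 1.0])

# The ellipse through the glitter is fixed by its summed distance to the foci.
dist_sum = np.linalg.norm(camera - point) + np.linalg.norm(light - point)
n = glitter_normal(point, camera, light)
g = ellipse_gradient(point, 3.0, dist_sum)
g = g / np.linalg.norm(g)

# 2D cross product: zero when the two directions are parallel.
cross = g[0] * n[1] - g[1] * n[0]
print(cross)
```

The gradient points out of the ellipse while the glitter normal points toward the camera and light, so the two are antiparallel rather than equal. Only the direction matters for tangency.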

Previously, we thought that this set of concentric ellipses may be defined by some a, b, c, d, and e that are fixed for the whole set of ellipses, and an f which is different for each of the ellipses. In other words, a, b, c, d, and e defined the shape, orientation and location of the ellipses and f defined the "size" of each ellipse.

I am starting to believe that this is not quite true, and that the "division of labor" of the coefficients is not so clearly defined. Perhaps it is the case that there is some function which defines how the coefficients are related to each other for a given set of concentric ellipses, but I am not sure what that function or relationship is.
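A quick check supports this suspicion, assuming the ellipses in the set share the same two foci (the camera and light). The hypothetical helper below writes out the implicit coefficients for foci at (0, +/-focal_dist) and a given semi-major axis, scaled so that a = 1. The focal distance is called `focal_dist` to avoid clashing with the coefficient c.

```python
def confocal_conic_coeffs(focal_dist, semi_major):
    """Coefficients (a, b, c, d, e, f) of ax^2 + bxy + cy^2 + dx + ey + f = 0,
    scaled so a = 1, for the ellipse with foci (0, +/-focal_dist) and the
    given semi-major axis (major axis along y)."""
    semi_minor_sq = semi_major ** 2 - focal_dist ** 2
    if semi_minor_sq <= 0:
        raise ValueError("semi_major must exceed focal_dist")
    # x^2/B^2 + y^2/A^2 = 1, multiplied through by B^2.
    return (1.0, 0.0, semi_minor_sq / semi_major ** 2, 0.0, 0.0, -float(semi_minor_sq))

# Three ellipses of different sizes sharing the same foci.
for semi_major in (4.0, 5.0, 6.0):
    print(semi_major, confocal_conic_coeffs(3.0, semi_major))
```

In this family the coefficient c equals 1 - focal_dist^2 / semi_major^2, so it changes along with f as the ellipse grows. Holding a through e fixed and varying only f instead gives scaled copies of one ellipse, whose foci move apart as the size increases.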