Week 3

What we have done:

Avi received a new strategy from Marios for classifying faces that should train much faster than the current strategy, which we did not have enough data to make work. This week, apart from presentations, Avi is implementing that strategy (and will have done so by midnight Sunday).

Claudia is in the process of collecting photos of individuals to be used as training data. This involves collecting around 100 photos each of about 20 people.

Dylan has tested the lens distortion calibration and the homography estimation between the camera and the projector. The lens distortion actually becomes much worse after the calibration, so Dylan is looking into why that is the case. However, since we use the center of the camera image to find the face and the center of the projector image to project onto the face, both of which are the places with the least lens distortion, there may be no need to get the lens distortion correction working at all.
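
One first check in that debugging is the RMS reprojection error OpenCV reports during calibration: if it is large, the calibration itself is bad, which would explain distortion getting worse after undistortion. Below is a minimal sketch of that calibration loop, assuming chessboard calibration images; the board size, file paths, and subpixel window are placeholder values, not our actual setup.

```python
# Minimal sketch of the OpenCV lens-distortion calibration we are debugging.
# File paths and board dimensions are placeholders for our actual setup.
import glob

import cv2
import numpy as np

BOARD = (9, 6)  # inner chessboard corners (placeholder for our board)

# 3D reference points for one board view: (0,0,0), (1,0,0), ... in board units
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

# RMS reprojection error is the first return value: if it is large, the
# calibration itself is bad, which could make "undistorted" images worse.
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)

undistorted = cv2.undistort(cv2.imread("test.png"), K, dist)
```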


What we are planning to do:

Over the next week we will collect a significant amount of training data so that we can start implementing and training our adversarial neural networks.

We can also start projecting objects onto a person using the homography estimate between the projector and the camera, as in the sketch below.
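
As a rough sketch of that step, assuming a homography H has already been estimated and saved (the file name is hypothetical), OpenCV can map face points detected in the camera frame into projector pixels:

```python
# Sketch: map camera-space face points into projector pixels with a 3x3
# homography H, assumed already estimated during calibration.
import cv2
import numpy as np

# Homography from camera pixels to projector pixels (file name hypothetical)
H = np.load("camera_to_projector_H.npy")

# e.g. the corners of a face bounding box detected in the camera image
face_pts = np.array([[[200, 150]], [[400, 150]],
                     [[400, 420]], [[200, 420]]], dtype=np.float32)

# perspectiveTransform expects (N, 1, 2) points and applies H to each one
proj_pts = cv2.perspectiveTransform(face_pts, H)
print(proj_pts.reshape(-1, 2))  # where to draw in the projector image
```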

Week 2

What we have done:

Claudia has written a script for taking photos using the webcam (a sketch of such a script is below) and is in the midst of collecting photos to be used as training data. This involves taking about 100 photos each of about 20 individuals.
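
For reference, a minimal capture script of this kind could look like the following; the key bindings, folder layout, and photo count are our assumptions for illustration, not Claudia's actual script.

```python
# Minimal sketch of a webcam capture script of the kind described above.
# Key bindings and file layout are illustrative assumptions.
import os

import cv2

person = "person_01"   # one folder of ~100 photos per individual
os.makedirs(person, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while count < 100:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord(" "):          # space saves the current frame
        cv2.imwrite(os.path.join(person, f"{count:03d}.png"), frame)
        count += 1
    elif key == ord("q"):        # q quits early
        break
cap.release()
cv2.destroyAllWindows()
```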

Avi has delved into neural network architecture design to figure out how to build the adversarial network. The current plan is to reuse part of the architecture from the OpenFace network so that the adversarial network can learn to identify the same important features, and then use an architecture similar to a GAN to reconstruct a 96×96 image to feed into the OpenFace network. The details are still being worked out; we hope to have three promising architectures set up to try by next week.
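
As a very rough illustration of the GAN-style reconstruction idea (not one of the candidate architectures; every layer size here is invented for the sketch), a PyTorch encoder-decoder that takes a 96×96 face and emits a 96×96 image could look like:

```python
# Rough sketch of an encoder-decoder that could sit in front of OpenFace:
# it takes a 96x96 face, perturbs it, and emits a 96x96 image to feed back
# into OpenFace. All layer sizes are guesses, not our final design.
import torch
import torch.nn as nn

class AdversarialModifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: downsample 96x96x3 to a small feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 96 -> 48
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 48 -> 24
            nn.ReLU(),
        )
        # Decoder: GAN-generator-style transposed convs back to 96x96
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 24 -> 48
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 48 -> 96
            nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

img = torch.randn(1, 3, 96, 96)
out = AdversarialModifier()(img)
assert out.shape == img.shape  # same dimensionality in and out
```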

Dylan has been working on the calibration between the camera and the projector. For camera and projector lens distortion, we will correct using OpenCV's distortion-correction routines. Instead of correcting for distortion on the camera and then on the projector separately, we are going to calibrate them together, since that seems to minimize error. We may not actually need much distortion correction, since we only use the center of the photo and the center of the projector image, where distortion is least. To find the relationship between the camera and the projector, we will project chessboard corners with the projector and then solve for the homography matrix with RANSAC in OpenCV, as in the sketch below. Dylan will test the camera-projector calibration, but needs a better way to drive the projector than mirroring displays.
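
A sketch of that planned homography step, with the projector corner positions and file name invented for illustration (real code would also need to verify that the detected corner ordering matches the projected one):

```python
# Sketch of the planned camera-projector homography step: we know where the
# chessboard corners are in projector pixels (we drew them), find them in
# the camera image, and solve for H with RANSAC. Sizes are placeholders.
import cv2
import numpy as np

BOARD = (9, 6)

# Corner locations in projector coordinates, known because we render them
proj_corners = np.array(
    [[x * 80 + 200, y * 80 + 120] for y in range(BOARD[1])
     for x in range(BOARD[0])], dtype=np.float32)

# The same corners as seen by the camera
gray = cv2.imread("camera_view.png", cv2.IMREAD_GRAYSCALE)
found, cam_corners = cv2.findChessboardCorners(gray, BOARD)
assert found

# NOTE: real code must check that the detected corner order matches
# proj_corners. RANSAC discards detections that disagree with consensus H.
H, inliers = cv2.findHomography(
    cam_corners.reshape(-1, 2), proj_corners, cv2.RANSAC, 5.0)
```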


What we are planning to do:

  • Finish up design review presentation for Monday
  • Research GAN network architectures

Week 1

What we have found out:

  • Information about the projector we are using:
    • The minimum distance at which the projected image stays in focus is 1.4 meters from the projector
    • At that 1.4 meters, a face is about 200 pixels wide, which exceeds the 80-pixel minimum we wanted
    • When we projected a black square over just the eyes, we were comfortable looking into the projector
      • Though we will want to increase the current size of the black square


What we have done:

  1. We selected and received a projector and 2 cameras from the 18500 inventory. All of them exceed the requirements we had for a projector and camera, though the projector is much bigger than ideal. We do not have a requirement for projector size, so that is fine.
  2. Avi set up OpenFace to convert images into simple 128-dimensional embeddings. He then pulled data from a database of celebrity images and ran it through the OpenFace network. The output was used to train a simple classification network that can distinguish between 20 different celebrities. The data had about 100 images of each celebrity; about 60 per celebrity were used for training and the remaining 40 for validation, on which the network was about 40% accurate. He concluded that it is probably a working model that does not have enough data. A sketch of this pipeline appears after this list.
  3. Claudia used dlib to extract facial landmarks, so that projections can be scaled to these landmarks. This can also be used to black out the eye area, making the experience more comfortable (see the second sketch after this list).
  4. Dylan used Claudia’s facial landmark extractions to create a function that finds the pixels that need to be blacked out so that the projector does not project anything onto someone’s eyes.
  5. Dylan also used Claudia’s facial landmarks to create a height map across a person’s face. He did not finish the calculation he and Claudia were working on of how far away each part of the face is from the camera, because we found out that the depth map may not actually be important.
  6. We tested putting the black square over each group member’s eyes, and everyone was comfortable (no immediate eye strain) with the projection on their face.
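
As promised above, here is a sketch of item 2's classification step: a small classifier trained on the 128-dimensional OpenFace embeddings with a 60/40 train/validation split. The file names and the choice of classifier are illustrative assumptions, not Avi's actual code.

```python
# Sketch of a classifier over 128-D OpenFace embeddings with a 60/40 split.
# File names and classifier choice are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

embeddings = np.load("openface_embeddings.npy")  # (N, 128), placeholder file
labels = np.load("celebrity_labels.npy")         # (N,), 20 classes

X_train, X_val, y_train, y_val = train_test_split(
    embeddings, labels, train_size=0.6, stratify=labels)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```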
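
And here is a sketch of the landmark and eye-blackout steps from items 3 and 4, using dlib's standard 68-point model (indices 36-47 are the eye landmarks); the dilation size that enlarges the blacked-out region is a guess:

```python
# Sketch of dlib landmark extraction plus an eye-blackout mask.
# The dilation kernel size is a guess at the planned margin around the eyes.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for face in detector(gray):
    shape = predictor(gray, face)
    pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.int32)
    mask = np.zeros(img.shape[:2], np.uint8)
    for eye in (pts[36:42], pts[42:48]):   # left and right eye landmarks
        cv2.fillConvexPoly(mask, cv2.convexHull(eye), 255)
    mask = cv2.dilate(mask, np.ones((25, 25), np.uint8))  # widen the margin
    img[mask > 0] = 0   # pixels the projector should leave black
```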


What we are planning to do:

  1. Avi is going to spend the next week researching neural network architectures that change an image without reducing its dimensionality, and deciding on appropriate loss functions for our future adversarial training. If this goes well, he will also try to implement an image-modifying network.
  2. Claudia will create the program to help collect the images and will start collecting them. She will also work on translating the image we want to appear on the face into the image the projector must actually emit.
  3. Dylan will expand the black eye rectangle and make it cover each of the eyes. He will also complete the camera calibration for finding the geometric relationship between the camera and the projector.

Updates on design:

After talking with Marios and Emily, we found out that we may not even need depth information, since we can just correlate where we project with the facial features that dlib finds. We may decide to use depth information in the future, but for now we will see how well projecting without depth information works, and our backup will be using depth information. So for now we will only use 1 camera.