Team B3: 2D23D – Carnegie Mellon ECE Capstone, Spring 2020 – Alex Patel, Chakara Owarang, Jeremy Leung

Alex’s Status Update for 2/22/2020

This week we had a breakthrough moment for the project. Instead of choosing a sensor based on what we thought would be interesting, we derived detailed, quantitative requirements for the sensor, which made the choice straightforward.

We started from our requirements. Take the most extreme case, an object whose longest axis is 5cm. To meet the requirement that 90% of reconstructed points are within 2% of that 5cm, our points must be within 1mm of the ground truth model. Considering the number of samples we need across the surface, we must therefore resolve detail at 1mm in each direction along the surface (X and Y). Since the surface itself is a continuous signal we are sampling from, the Nyquist criterion requires a sample every 0.5mm in each direction along the surface.

Now consider the rotational mechanism. The largest radius of the object from the center of the rotating platform will be within 15cm. Each rotation step between data captures must therefore move the surface past the sensor by at most 0.5mm. Since arc length is radius times angle, this gives 0.5mm / 150mm ≈ 0.0033 radians of rotation per sample.
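The derivation above can be checked with a few lines of Python. This is a minimal sketch. The function name and its defaults are our own, chosen to match the numbers in the text: a 5cm smallest object, a 2% tolerance, and a 15cm maximum radius.

```python
import math

def sampling_requirements(longest_axis_mm=50.0, tolerance_frac=0.02, max_radius_mm=150.0):
    """Derive the point tolerance, Nyquist spacing, and per-capture rotation step."""
    # 2% of the longest axis of the smallest supported object (5cm) is the allowed error.
    tolerance_mm = tolerance_frac * longest_axis_mm
    # Nyquist: sample twice per finest feature we must resolve.
    spacing_mm = tolerance_mm / 2.0
    # Arc length s = r * theta, so the outermost surface point (r = 15cm) limits the step.
    step_rad = spacing_mm / max_radius_mm
    return tolerance_mm, spacing_mm, step_rad

tol, spacing, step = sampling_requirements()
print(f"tolerance {tol:.2f} mm, spacing {spacing:.2f} mm, "
      f"step {step:.4f} rad ({math.degrees(step):.3f} deg)")
```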

There are five main types of depth sensors applicable to 3D scanning that we considered in depth:

Contact sensors, widely used in manufacturing. A Coordinate Measuring Machine (CMM) or similar device may be used. It generally uses a probing arm to touch the object, and the coordinates of each probed point are computed from the angular rotations of the joints. This is not an option for our application, both because of price and because we should not allow a large machine to touch timeless archeological artifacts.
Time-of-flight sensors. By recording the time between sending a beam of light and receiving its reflection, the distance to a single point can be computed. The disadvantage of this approach is that time can only be measured so precisely, and the speed of light is very high. Light travels only about 1mm in 3.3 picoseconds. Even a timer with 3.3 picosecond resolution therefore resolves depth only to roughly 0.5mm, because the light travels out and back. That leaves no margin below our 0.5mm sampling requirement, and timing hardware precise enough to provide real margin is not reasonable for this project. In the domain of 3D scanning, time-of-flight sensors are better suited to scanning large outdoor environments.
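The time-of-flight argument rests on the round-trip relation Δd = c·Δt/2. The short sketch below applies it, assuming an ideal timer with no other noise sources. The helper names are hypothetical.

```python
SPEED_OF_LIGHT_MM_PER_PS = 299_792_458 * 1e3 / 1e12  # ~0.3 mm per picosecond

def tof_depth_resolution_mm(timer_resolution_ps):
    # Light travels to the surface and back, so depth = c * t / 2.
    return SPEED_OF_LIGHT_MM_PER_PS * timer_resolution_ps / 2.0

def timer_resolution_needed_ps(depth_resolution_mm):
    # Inverse of the relation above: t = 2 * depth / c.
    return 2.0 * depth_resolution_mm / SPEED_OF_LIGHT_MM_PER_PS

print(f"3.3 ps timer -> {tof_depth_resolution_mm(3.3):.3f} mm depth resolution")
for target_mm in (0.5, 0.1):
    print(f"{target_mm} mm depth -> {timer_resolution_needed_ps(target_mm):.2f} ps timer")
```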
Laser triangulation sensors. In such a sensor, the emitting laser diode and the corresponding CMOS sensor are offset from each other, so they view the object from slightly different angles. Depth can then be computed from the position on the sensor where the reflected laser spot lands. Generally, the position of the laser on the surface is controlled by one rotating mirror or a pair of them (https://www.researchgate.net/publication/236087175_A_Two-Dimensional_Laser_Scanning_Mirror_Using_Motion-Decoupling_Electromagnetic_Actuators). Assume that such a sensor is affordable, can easily measure with a resolution better than 0.5mm, and does not encounter any mechanical issues. By our calculations above, the total number of distance measurements we must record is (2π / 0.0033) × (300mm / 0.5mm) ≈ 1.14 million. Assuming a sampling rate of about 10kHz (common for such a sensor), 1.14 million points / 10,000 points per second ≈ 114.2 seconds of theoretical minimum capture time with one sensor. From our timing requirement, assume that half of our time can be attributed to data collection (30 seconds). With perfect parallelization, we could then achieve our goals with 114.2s / 30s ≈ 3.81, rounded up to 4 sensors collecting concurrently. With our budget, this is not achievable. Even if we had the budget, systematic errors (for example, in the mounting angle of a sensor) could propagate throughout our data with no means of correcting them. Adding a second set of sensors to mitigate this error would put us even further over budget: 8 sensors × $300 per sensor (low-end price) = $2400. Note that these calculations are independent of any mechanical components and follow directly from the required number of data points. To stay within budget, we could instead adopt cheaper sensors, such as those with sampling rates under 1kHz.
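The sample-count and sensor-count arithmetic above can be written as a short sketch. It assumes perfect parallelization and the 30-second data collection budget. `capture_plan` is a hypothetical helper, not part of any library.

```python
import math

def capture_plan(sample_rate_hz, step_rad=0.0033, height_mm=300.0, spacing_mm=0.5,
                 capture_budget_s=30.0):
    """Points needed, single-sensor capture time, and sensors needed to fit the time budget."""
    positions_per_rev = 2 * math.pi / step_rad     # angular capture positions
    points_per_position = height_mm / spacing_mm    # samples along the vertical extent
    total_points = positions_per_rev * points_per_position
    single_sensor_s = total_points / sample_rate_hz
    # Assumes perfect parallelization across identical sensors.
    sensors = math.ceil(single_sensor_s / capture_budget_s)
    return total_points, single_sensor_s, sensors

PRICE_PER_SENSOR = 300  # low-end price quoted above
for rate_hz in (10_000, 1_000):
    points, seconds, n = capture_plan(rate_hz)
    print(f"{rate_hz} Hz: {points:,.0f} points, {seconds:.1f} s with one sensor, "
          f"{n} sensors (x2 for redundancy: {2 * n}, ${2 * n * PRICE_PER_SENSOR})")
```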
Performing the same calculations as above with a 1kHz sampling rate shows that we would require 39 sensors to meet our timing requirement. That is well outside the realm of possibility, and it does not account for error reduction, which could require 78 sensors. With fewer sensors, we would drastically underperform on our timing requirement. Of course, there is an alternative to single-point laser triangulation: a laser stripe depth sensor, which measures depth for points along a fixed-width stripe (https://www.researchgate.net/publication/221470061_Exploiting_Mirrors_for_Laser_Stripe_3D_Scanning). This would significantly improve our ability to meet our timing requirement. High-accuracy devices of this kind are not easily available to consumers and are usually intended for industry and manufacturing, so we would be responsible for constructing one ourselves. We have considered the risk of building our own sensor. Since none of our team members are experts in sensors and electronics, there is a high potential for error in the accuracy of a self-built laser stripe depth sensor. However, we are up to the task. A laser stripe sensor consists of a projected linear light source paired with a CCD camera. A calibration process determines the intrinsic camera parameters as well as the exact angle and distance between the camera and the laser projector. After that, each point can be mapped from screen space to world space coordinates using the calibrated geometry. The general issue here, even if we construct a perfect laser stripe depth sensor, is that micro-vibrations from the environment may introduce significant jitter into the data acquisition process.
Because of this, laser sensors for 3D reconstruction are generally used in tandem with other sources of information, such as a digital camera or a structured light depth sensor, to minimize vibrational error (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4279470/#b8-sensors-14-20041). An alternative solution is to add a second laser stripe sensor (https://www.sciencedirect.com/science/article/pii/S1051200413001826), which would help eliminate such errors. We plan to construct a laser stripe triangulation depth sensor using a CCD camera. This approach makes sub-millimeter accuracy relatively easy to achieve, and it is relatively insensitive to the lighting conditions and materials that harm photogrammetry.
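One common way to implement the screen-to-world mapping for a laser stripe sensor is to back-project each stripe pixel as a camera ray and intersect that ray with the calibrated laser plane. The sketch below assumes a pinhole camera with intrinsic matrix K, no lens distortion, and a laser plane expressed in camera coordinates. The calibration values in the example are hypothetical.

```python
import numpy as np

def stripe_pixel_to_point(u, v, K, plane_normal, plane_d):
    """Triangulate one laser-stripe pixel by intersecting its camera ray with the laser plane.

    K is the 3x3 pinhole intrinsic matrix from calibration; the laser plane satisfies
    plane_normal . X = plane_d in camera coordinates (also from calibration).
    """
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # viewing-ray direction with z = 1
    denom = plane_normal @ ray
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the laser plane")
    return (plane_d / denom) * ray  # scale the ray until it meets the plane

# Hypothetical calibration values, for illustration only.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.radians(30.0)
normal = np.array([np.cos(angle), 0.0, np.sin(angle)])  # laser plane tilted about the y axis
true_point = np.array([20.0, -10.0, 300.0])               # mm, camera frame
d = normal @ true_point                                    # plane passes through the point

u = K[0, 0] * true_point[0] / true_point[2] + K[0, 2]
v = K[1, 1] * true_point[1] / true_point[2] + K[1, 2]
print(stripe_pixel_to_point(u, v, K, normal, d))  # recovers approximately [20, -10, 300]
```

The division by `denom` is why this mapping is projective rather than purely linear. In practice, lens distortion would also have to be removed before back-projecting.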
Digital camera. By taking multiple digital photographs from many perspectives around the object, computer vision techniques can be used to match features between pairs of images, and linear transforms can be computed to align those features. After feature alignment and depth estimation, point clouds can be generated. Computer vision techniques that accomplish this include Structure from Motion (SfM) and Multi-View Stereo reconstruction (MVS) (http://vision.middlebury.edu/mview/seitz_mview_cvpr06.pdf). This approach has a fundamental flaw: concavities in the object to be scanned cannot be resolved, since cameras do not capture raw depth data. Surface points within the convex hull of an object cannot be easily distinguished from points on the convex hull. This immediately eliminates the approach for our project, since archeological objects may have concavities, and we do not want to limit the types of objects that can be captured.
Structured/coded light depth sensor (RGB-D camera). Such a sensor projects light in specific patterns onto the object and computes depth from the distortion of those patterns. These devices have become very popular in the 3D reconstruction research community since the Microsoft Kinect became available to consumers (https://www.researchgate.net/publication/233398178_Kinect_and_RGBD_Images_Challenges_and_Applications). The original Microsoft Kinect provides depth images only 320 pixels wide. With the upper bound of 30cm across the surface of the object, this gives 30cm / 320px ≈ 0.094cm between pixels. That does not meet our sensor requirement of detecting differences within 0.5mm (0.05cm). The newer Microsoft Kinect v2 uses a time-of-flight sensor, and thus does not achieve better than 1mm depth resolution. Intel RealSense has recently released consumer and developer lines of structured light depth sensors that do not require industrial or manufacturing budgets. Most notably for short-range coded light depth sensing, the Intel RealSense SR305 offers 640×480 pixel depth maps. Horizontally, this corresponds to 30cm / 640px ≈ 0.047cm, which is within our requirements. Vertically, the 480px give 30cm / 480px ≈ 0.063cm. If this becomes a problem, we can rotate the camera up or use a mirror to take two captures for each perspective. To ensure that the camera’s view covers most of the object, the camera needs to be about 30cm away, given its minimum fields of view of 66 degrees horizontal and 52 degrees vertical. This works well with the camera’s minimum depth capture distance of 20cm. Finally, depth error in structured light sensing grows with distance, so as long as our object is close to the sensor, we would hopefully achieve sub-millimeter depth resolution. Intel does not advertise any specific depth resolution for the device.
We would need to perform extensive testing to determine the actual depth accuracy before further implementation. The lighting environment and the material of the object can also influence depth resolution, which we will investigate with a variety of objects. With laser triangulation, most of the effort is up-front in the sensing devices and mechanical components. Structured light depth sensors instead require significant algorithmic effort after data collection to correlate the views. We have already designed an algorithmic pipeline that goes from multiple depth maps, to localized point clouds, to a global point cloud, to a triangular mesh. This heavy computation must meet our timing requirement of one minute, so we will program the algorithms from scratch to run on the GPU of the device’s embedded system.
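The pixel-spacing comparison for the Kinect and the SR305 fits in a few lines. This sketch assumes the depth image spans exactly the 30cm object extent, the same simplification used in the text.

```python
REQUIRED_SPACING_MM = 0.5
FIELD_WIDTH_MM = 300.0  # assumes the field of view spans exactly the 30cm object extent

def pixel_footprint_mm(field_width_mm, pixels):
    # Surface width covered by one depth pixel when the image spans the whole field.
    return field_width_mm / pixels

for name, pixels in [("Kinect, 320 px wide", 320),
                     ("SR305, 640 px horizontal", 640),
                     ("SR305, 480 px vertical", 480)]:
    footprint = pixel_footprint_mm(FIELD_WIDTH_MM, pixels)
    verdict = "meets" if footprint <= REQUIRED_SPACING_MM else "misses"
    print(f"{name}: {footprint:.3f} mm per pixel, {verdict} the 0.5 mm requirement")
```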

In summary, our sensor research this week has eliminated contact sensors, time-of-flight sensors, and digital cameras. Both laser triangulation sensors and structured light depth sensors show promise. Because of the potential for vibrational and mechanical error in the laser triangulation sensor, we would likely need to use it in tandem with a second laser triangulation sensor. Purchasing the Intel SR305 would speed up our development, but we do not know whether it actually achieves sub-millimeter accuracy, which is a requirement. As a product, our project may also be more cost-effective if we build a laser triangulation sensor with two stripes and a CCD camera (depending on the cost of the CCD camera, we may buy a CMOS one instead).

Because of this, we will begin by searching for and buying a CCD camera and two laser stripe projectors. From there we can begin implementing our calibration routines and measuring the accuracy of the scanner. We are most interested in the accuracy of the calibration routine. It gives us both the intrinsic parameters of the camera and the computed distances and angles between the camera and the laser stripes, and each of these outputs can be verified independently.

Next, we need to consider how we will evaluate our metrics and validate our design. The pipeline has several components. We will unit test each component individually and integration test our final project’s results.

For accuracy of our sensor setup, we will verify the calibration procedure as mentioned above. We can also use early point clouds of a flat target to determine the standard deviation from a plane fit, normalized RMS, subpixel accuracy, distance accuracy, and fill rate. Of these values, we are most interested in sub-millimeter Z-accuracy, computed as the median of the differences between the measured depths and the actual distance of the plane. If the sensor does not achieve this goal, we may be forced to add an RGB-D camera to aid in data collection.
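The sketch below computes three of these metrics for a depth image of a flat wall facing the sensor: fill rate, standard deviation from a plane fit, and Z-accuracy. It assumes a pinhole model with intrinsic matrix K and z-depth in millimeters. The function name and the convention that 0 or NaN marks a missing pixel are our own choices.

```python
import numpy as np

def flat_target_metrics(depth_mm, K, true_distance_mm):
    """Plane-fit metrics for a depth image of a flat target facing the camera.

    depth_mm holds z-depth per pixel; 0 or NaN marks pixels with no data.
    """
    valid = np.isfinite(depth_mm) & (depth_mm > 0)
    fill_rate = valid.mean()                      # fraction of pixels with a depth value
    v, u = np.nonzero(valid)                      # row (v) and column (u) of valid pixels
    z = depth_mm[valid]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts = np.column_stack(((u - cx) * z / fx, (v - cy) * z / fy, z))
    centroid = pts.mean(axis=0)
    # The singular vector with the smallest singular value is the best-fit plane normal.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    residuals = (pts - centroid) @ vt[-1]         # signed point-to-plane distances
    return {
        "fill_rate": fill_rate,
        "plane_fit_std_mm": residuals.std(),
        "z_accuracy_mm": np.median(z - true_distance_mm),
    }
```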

For accuracy of our software, we will write a simple sequential version of the code and check that, at each stage of the pipeline, the GPU-accelerated code produces matching results.

For accuracy of our rotating platform, we will verify that each rotation step of the motor is less than or equal to our intended rotation of 0.0033 radians.
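To see what this target implies for motor selection, the sketch below assumes a common 200-step-per-revolution (1.8°) stepper motor. The motor, the microstepping levels, and the gear ratios are hypothetical. Microstep positions are not always accurate under load, which is exactly why the measured step should still be checked.

```python
import math

TARGET_STEP_RAD = 0.0033

def platform_step_rad(full_steps_per_rev=200, microsteps=1, gear_ratio=1.0):
    # Platform rotation per (micro)step for a stepper driving the platform through a reduction.
    return 2 * math.pi / (full_steps_per_rev * microsteps * gear_ratio)

for microsteps in (1, 2, 4, 8, 16):
    step = platform_step_rad(microsteps=microsteps)
    print(f"{microsteps:>2} microsteps: {step:.5f} rad, meets target: {step <= TARGET_STEP_RAD}")

# Smallest whole gear reduction that meets the target with full steps only.
print("minimum reduction:", math.ceil(2 * math.pi / (200 * TARGET_STEP_RAD)))
```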

For verification of our project as a whole, we will test each of the requirements. Timing can be measured with a stopwatch, and price from our order receipts. Accuracy will be tested with a mix of known objects, such as Coke bottles and measured cubes, and 3D-printed archeological objects. For the printed objects, we will allow an additional 1mm of error due to the inconsistencies of 3D printing.
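The accuracy requirement is that 90% of points fall within 2% of the longest axis, with an extra 1mm for printed objects. It could be checked roughly as follows. This sketch assumes the reconstruction is already aligned to the ground truth. It also approximates the point-to-surface distance by the distance to the nearest point of a densely sampled reference model.

```python
import numpy as np

def accuracy_check(reconstructed, ground_truth, longest_axis_mm, printed=False,
                   required_fraction=0.9):
    """Fraction of reconstructed points within 2% of the longest axis of the ground truth.

    ground_truth is a dense point sampling of the reference model; printed objects get
    the extra 1mm allowance for 3D-printing inconsistencies.
    """
    tolerance_mm = 0.02 * longest_axis_mm + (1.0 if printed else 0.0)
    # Brute-force nearest-neighbour distances; adequate for small test clouds.
    diff = reconstructed[:, None, :] - ground_truth[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2).min(axis=1))
    fraction = float((nearest <= tolerance_mm).mean())
    return fraction >= required_fraction, fraction
```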

Chakara’s Status Update for 2/15/2020

This week, I worked on two main parts: finalizing the project requirements with the team, and researching the sensor capsule and platform. For the project requirements, I mainly worked on the Usability requirement, since the other parts were mostly done together with the other team members. At the beginning of the week, we decided we would potentially use the Intel RealSense depth camera (an RGB-D camera using coded light) instead of other cameras and sensors, though this may change after the feedback from Professor Mukherjee. First, we updated the Usability requirement to specify object dimensions from 5cm to 25cm along each axis. After getting the sensor specifications from Jeremy, I computed the platform dimensions. The Intel RealSense depth camera SR305 has a depth field of view (horizontal × vertical) of 69° ± 3° by 54° ± 2°. From this, I computed that the camera has to be around 26cm away from the side of the object. The image below shows the basic calculation and our initial design of the platform.

Our initial design centers the camera on the largest possible object. This placement may have to be adjusted depending on the reconstruction technique we choose and the data we receive from the sensor. We will make that adjustment once the basic implementation of the system is done.
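The working-distance calculation can be reproduced with basic trigonometry. This sketch assumes the camera’s optical axis is centered on the object and uses the smallest field of view within the stated tolerances. It gives about 25.6cm for a 25cm object, consistent with the 26cm above, and about 30.8cm for a 30cm extent.

```python
import math

def min_working_distance_cm(object_extent_cm, fov_deg):
    # Half the object must fit inside half the field of view: d = (w / 2) / tan(fov / 2).
    return (object_extent_cm / 2) / math.tan(math.radians(fov_deg) / 2)

# Worst case within the spec tolerances: 69 - 3 = 66 deg horizontal, 54 - 2 = 52 deg vertical.
for extent_cm in (25.0, 30.0):
    horizontal = min_working_distance_cm(extent_cm, 66)
    vertical = min_working_distance_cm(extent_cm, 52)
    print(f"{extent_cm:.0f} cm object: {horizontal:.1f} cm (H), {vertical:.1f} cm (V), "
          f"use at least {max(horizontal, vertical):.1f} cm")
```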

We also added a weight limit for the input object to the Usability requirement. Since our use case is archaeological documentation, I used baked clay pottery as our reference. The average pottery wall thickness is around 0.6-0.9cm, which I rounded up to 1cm. With this, I computed the volume based on our maximum input object dimensions and used the density of baked clay to compute the mass of such an object, 6.91kg. We rounded this up and used 7kg as our input object weight limit. Since we will also test with 3D-printed objects, I used the density of plastic to confirm that our test objects would fall under the weight limit, which they do (around 4.3kg).
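One model that reproduces these numbers is a closed, hollow 25cm cube with 1cm walls. The sketch below assumes that shape, with densities of about 2.0 g/cm³ for baked clay and 1.24 g/cm³ for PLA plastic. These are our assumptions, not measured values. The model gives roughly 6.92kg and 4.29kg, in line with the 6.91kg and 4.3kg above.

```python
def box_shell_mass_kg(outer_cm=25.0, wall_cm=1.0, density_g_per_cm3=2.0):
    """Mass of a closed cubic shell: outer volume minus the hollow interior."""
    inner_cm = outer_cm - 2 * wall_cm
    volume_cm3 = outer_cm ** 3 - inner_cm ** 3
    return volume_cm3 * density_g_per_cm3 / 1000.0

print(f"baked clay (assumed 2.0 g/cm^3): {box_shell_mass_kg():.2f} kg")
print(f"PLA plastic (assumed 1.24 g/cm^3): {box_shell_mass_kg(density_g_per_cm3=1.24):.2f} kg")
```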

With the weight requirement and platform dimensions settled, I started researching how to design a rotational mechanism.

My progress is a little behind schedule. This is mainly because we still have not fully decided on the sensor type, so the platform design may have to change in the future. Our team has to finalize the sensor by this Monday at the latest so that I have time to design the platform and rotational mechanism. I am also behind schedule because I underestimated the complexity of the rotational mechanism. I will have to do a lot more research and potentially ask for advice from someone with expertise in the area.

For next week, I hope to finish the draft design of the rotational mechanism and the platform. After doing some research, I now realize that I have to take the object’s weight distribution into account. I need it to calculate moments and determine the ratio of base size to platform size needed to ensure stability. In addition, I need to compute how much torque the motor requires so that I can order the right mechanical components. I also need to design a feature extractor system that, based on the weight of the object, can control the rotational mechanism to give the sensor the rotational velocity it needs. Finally, I need to do more research on the material of the platform itself.
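A first-pass torque estimate could follow τ = Iα plus a friction term. Every default in this sketch is hypothetical: platform mass and radius, acceleration time, bearing friction, and safety factor. Only the 7kg weight limit, the 15cm maximum radius, and one revolution in the 30-second data collection window come from our requirements.

```python
import math

G = 9.81  # m/s^2

def required_torque_nm(object_kg=7.0, object_radius_m=0.15, platform_kg=2.0,
                       platform_radius_m=0.2, rev_time_s=30.0, accel_time_s=0.5,
                       bearing_friction_coeff=0.01, bearing_radius_m=0.05,
                       safety_factor=2.0):
    """Rough motor torque estimate for the turntable (all defaults are hypothetical)."""
    # Worst case: all object mass at the maximum radius (thin ring); platform as a solid disc.
    inertia = object_kg * object_radius_m ** 2 + 0.5 * platform_kg * platform_radius_m ** 2
    omega = 2 * math.pi / rev_time_s          # target angular speed, rad/s
    alpha = omega / accel_time_s              # angular acceleration to reach it, rad/s^2
    friction = bearing_friction_coeff * (object_kg + platform_kg) * G * bearing_radius_m
    return safety_factor * (inertia * alpha + friction)

print(f"estimated motor torque: {required_torque_nm():.3f} N*m")
```

An off-center object also produces a tipping moment about the platform edge, which matters for the base-to-platform ratio but is not modeled here.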

Alex’s Status Update for 2/15/2020

This week I spent time researching the common pipeline used in 3D mesh reconstruction from raw depth data. The core of the pipeline is the Pairwise Registration stage, which aims to join localized point clouds from each perspective into a global 3D coordinate space. A common technique for this is the Iterative Closest Point (ICP) algorithm. ICP estimates the rigid transformation (rotation and, in general, translation) between two consecutive views of depth information, and iteratively refines this estimate until corresponding points line up with a low mean squared error. Once the rotations between consecutive views have been computed, rotation matrices can be applied to each view to transform it into a global coordinate space, producing a final point cloud. We can provide an initial estimate of the rotation from our rotational mechanism. We will still use an iterative approach, however, since this initial reading may not agree exactly with the collected depth data.
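Below is a minimal point-to-point ICP sketch in NumPy. At each iteration it finds brute-force nearest neighbours and solves for the best rigid transform with the SVD-based (Kabsch) method. It is illustrative only: the function names are ours, there is no outlier rejection, and the optional `R0` stands in for the platform’s rotation reading as an initial estimate.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src[i] + t ~= dst[i] (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def icp(src, dst, R0=None, max_iters=50, tol=1e-12):
    """Point-to-point ICP; R0 is an optional initial rotation (e.g. from the platform angle)."""
    R = np.eye(3) if R0 is None else R0.copy()
    t = np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = src @ R.T + t
        # Brute-force closest points: O(N*M) work per iteration.
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        err = d2[np.arange(len(src)), nearest].mean()  # mean squared distance this round
        R, t = best_rigid_transform(src, dst[nearest])
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t, err

# Synthetic check: a small turntable-like rotation about the vertical (y) axis plus a shift.
rng = np.random.default_rng(0)
src = rng.uniform(-50, 50, size=(300, 3))
theta = 0.03
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
t_true = np.array([0.5, -0.3, 0.2])
dst = src @ R_true.T + t_true
R, t, err = icp(src, dst)
print(np.abs(R - R_true).max(), np.abs(t - t_true).max())
```

The small rotation between views is what lets the very first nearest-neighbour matches be mostly correct. With large angles, this same loop can settle into a wrong local minimum.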

Problems with ICP include the following. Its time complexity is quadratic in the number of points, since each pair of points is considered; an octree, k-d tree, or similar data structure can bring this down to O(n log n). Some points may not have corresponding points in the subsequent view. And if two views are far enough apart in angle, the algorithm can become trapped in a local minimum. Many variants of ICP exist, differing in how they select points from each view, match points between views, weight matched pairs, and define the error metric. These choices aim to eliminate systematic biases that could leave the algorithm stuck in a local minimum. For example, if our sensor captured RGB data with each depth point, point matching could be biased using this information, likely resulting in an algorithm that matches correct points more often. Based on the requirements stated in our project proposal, we prioritize accuracy over performance. Our choice of ICP variant will therefore be driven by avoiding local minima that would skew the accuracy of our final point cloud. This pushes us to have our rotational mechanism rotate by a small angle between views, and possibly to have our sensor capture RGB data to ease point matching.
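One simple way to bias matching with color is to search for nearest neighbours in a joint space of position and scaled RGB. This sketch rests on our own assumptions: RGB values lie in [0, 1], and a hypothetical `color_weight` trades millimeters of distance against color difference.

```python
import numpy as np

def match_with_color(src_xyz, src_rgb, dst_xyz, dst_rgb, color_weight=10.0):
    """Index of the nearest dst point for each src point in joint (xyz, weighted rgb) space."""
    a = np.hstack([src_xyz, color_weight * src_rgb])
    b = np.hstack([dst_xyz, color_weight * dst_rgb])
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# A red source point with a slightly closer blue neighbour and a slightly farther red one.
src_xyz = np.array([[0.0, 0.0, 0.0]])
src_rgb = np.array([[1.0, 0.0, 0.0]])
dst_xyz = np.array([[1.0, 0.0, 0.0], [-1.1, 0.0, 0.0]])
dst_rgb = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(match_with_color(src_xyz, src_rgb, dst_xyz, dst_rgb, color_weight=0.0))  # geometry only: [0]
print(match_with_color(src_xyz, src_rgb, dst_xyz, dst_rgb))                    # with color: [1]
```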

Other steps we will need to consider are pre-filtering and point-cloud generation for each view. These steps largely depend on the sensor data we receive, and thus on the devices we choose to capture the data, which Jeremy researched this week. Finally, mesh construction from the point cloud and optional (stretch goal) texture mapping must be done to produce a usable 3D format for the user. We must choose a format for the output mesh (volume-based or triangular surface mesh) and then choose a corresponding algorithm. Triangular surface meshes are supported by more software, so if we choose a volumetric approach, we may use a public library to triangulate the result. Directly implementing the construction of a volumetric mesh from point-cloud data will be easier. For the sake of time, and to avoid reinventing triangulation algorithms, we will most likely choose this approach.
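As a minimal illustration of a volumetric representation, the sketch below converts a point cloud into a boolean occupancy grid. It is not a full surface reconstruction method; approaches such as signed distance fields are more robust. After conversion to float, a grid like this could be handed to a public triangulation routine such as scikit-image’s `skimage.measure.marching_cubes`. The voxel size and padding are hypothetical.

```python
import numpy as np

def occupancy_grid(points, voxel_mm=0.5, pad=1):
    """Boolean voxel grid marking every voxel that contains at least one point."""
    origin = points.min(axis=0) - pad * voxel_mm
    idx = np.floor((points - origin) / voxel_mm).astype(int)
    shape = tuple(int(n) for n in idx.max(axis=0) + 1 + pad)
    grid = np.zeros(shape, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid, origin

points = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.1, 1.1, 1.1]])
grid, origin = occupancy_grid(points)
print(grid.shape, int(grid.sum()))
```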

Below is a general diagram of our initial algorithmic pipeline. Many of the details still need to be sorted out based on our data collection mechanism. Our ability to provide texture mapping also depends on whether our sensor captures RGB data.

Algorithmic Pipeline Diagram

In terms of developing our algorithmic approach, we are on the schedule set by our Gantt chart. Next week we will choose a specific sensor approach, which will let us narrow down our pre-filtering and point-cloud generation techniques. At that point we will also be able to choose a variant of the ICP algorithm for pairwise registration based on the expected metric trade-offs. Finally, with our sensor chosen, we can determine whether we will be able to perform texture mapping as a stretch goal. At the end of next week, once our algorithmic pipeline has largely been determined, we can plan the specific technologies we will use to implement each stage.

For now, to manage risk, we are choosing techniques general enough that we can accommodate sensor-side error from any source by adding a filtering step during pre-processing. This assumes, of course, that the sensor data does not have egregiously high noise.
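A common example of such a pre-processing filter is statistical outlier removal. It drops points whose mean distance to their k nearest neighbours is unusually large. This brute-force sketch is our own, and k and the standard-deviation ratio are hypothetical tuning parameters.

```python
import numpy as np

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Keep points whose mean distance to their k nearest neighbours is not unusually large."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                      # ignore each point's distance to itself
    mean_knn = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    keep = mean_knn <= threshold
    return points[keep], keep

rng = np.random.default_rng(0)
cloud = np.vstack([rng.uniform(0, 10, size=(200, 3)), [[100.0, 100.0, 100.0]]])
filtered, keep = remove_statistical_outliers(cloud)
print(len(cloud), "->", len(filtered))
```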

Jeremy’s Status Update for 2/15/2020

This week I researched different 3D reconstruction options, along with a bit of research on texturing 3D scans. There are several possible scanning options, some of which were suggested by Professor Tamal. These include an RGB-D camera (depth for each pixel using coded light), a time-of-flight single point (one laser point with depth from time-of-flight), a time-of-flight vertical line (like a barcode scanner), and a time-of-flight laser 2D depth map.

The time-of-flight single point laser scanner was the lowest-priced option. However, it was difficult to find many papers that used this method, because it is very prone to mechanical errors as well as rather time-costly and mechanically complex. There were a few possible ways of executing this method, including the spiral method, where the laser point slowly moves down vertically while the object rotates. Depending on the controlling mechanism, this method would be prone to missing many scan points, especially if the laser shudders or other mechanical errors occur. One way to make this more efficient would be to use several laser points, since each one is not very costly; however, the same issues would still arise.

The RGB-D camera using coded light was one idea we were very interested in, especially since the camera would already do some of the processing needed to get the depth data. It would also allow for texture mapping, which the time-of-flight sensors lack unless we combine them with camera data. Depending on our depth camera, this method would be less prone to error. Among the few options we looked at, we would most likely use one that gives depth data within 1mm accuracy. Color data could also potentially let us correct for bias. The price is also not too high (less than $100 for the coded light depth camera), which fits our requirements. However, we may need to work out confidence intervals for the accuracy ranges and determine whether this reconstruction method can measure depth accurately enough to meet our accuracy requirements.

The laser-based approaches are still intriguing, since time-of-flight lasers can usually give micrometer accuracy because the exact distance is determined using wavelength and time-of-flight data. This led to an idea from Professor Tamal to use LIDAR (1D/2D) for the scanning. The 1D LIDAR would behave like a barcode scanner, scanning a vertical line while the object rotates. There may be certain complexities to explore with this method, and there has not been much previous work using it. The 2D LIDAR would be even more accurate and would give a full depth map, but it would cost quite a bit more. This method is certainly promising and deserves extensive research to compare it with the RGB-D camera method.

All of these methods would potentially require some filtering or smoothing to remove noise from the data. The RGB-D camera and the 2D LIDAR would probably make it easiest to manage the data and convert it into a 3D point cloud. Since the data is 2D, however, we would need to cross-reference points and map several scans from different angles back onto the same object, which would be one of the main algorithmic complexities of our project. We would also be able to leverage the rotational position of the platform to help determine which pixel maps to which exact 3D point in space.

Thus, in the coming week, I will dive deep into researching the RGB-D and 2D LIDAR methods, compare the two more extensively, and relate their qualities back to our requirements. So far, much of our research has been breadth-based, since we were considering a large variety of options, such as our earlier idea of using two cameras and computer vision for the scanning. My research goal this week is to narrow down to the specific approach we will use and justify it with the qualities we need based on our requirements. I will also do more research on piecing together scans from different angles. In addition, I will work out the math to compute a 3D point given a pixel, depth, and camera position. This will be necessary for our algorithm regardless of which scanning mechanism we choose, since both will output depth data.
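The pixel-to-3D math can be sketched with a pinhole camera model, followed by undoing the platform rotation. This sketch makes several assumptions. Depth is measured along the optical axis. The optical axis passes through the platform’s vertical rotation axis at a known distance. The object frame’s origin lies on that axis at the camera’s height, and the platform angle is known. The rotation sign convention depends on the turntable direction and is also an assumption.

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    # Pinhole back-projection: depth is z along the optical axis.
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def rot_y(angle):
    # Rotation about the camera's vertical (y) axis, which is parallel to the platform axis.
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def camera_to_object(p_cam, platform_angle, axis_distance):
    # Shift the origin onto the rotation axis, then undo the platform's rotation.
    p = p_cam - np.array([0.0, 0.0, axis_distance])
    return rot_y(-platform_angle) @ p

# Round trip with hypothetical values: place a point on the object, rotate the
# platform, project the point into the camera, then recover it.
fx = fy = 600.0
cx, cy = 320.0, 240.0
axis_distance = 260.0                       # mm from camera to rotation axis
q_object = np.array([30.0, -20.0, 10.0])    # mm, object frame
angle = 0.7                                 # rad
p_cam = rot_y(angle) @ q_object + np.array([0.0, 0.0, axis_distance])
u = fx * p_cam[0] / p_cam[2] + cx
v = fy * p_cam[1] / p_cam[2] + cy
recovered = camera_to_object(pixel_to_camera(u, v, p_cam[2], fx, fy, cx, cy), angle, axis_distance)
print(recovered)  # approximately [30, -20, 10]
```

Applying `camera_to_object` to every valid pixel of every view, each with its own platform angle, places all views in one shared object frame. That result could also serve as the initial alignment for ICP.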

Table for Comparing Possible Sensors

Sensor	Cost	Mechanical Complexity	Pre-Filtering Complexity (estimated)	Potential Sources of Error	Texture Mapping	Algorithmic Implications
RGB-D Camera (structured/coded light)	~$70	Low	High	Less accurate than laser time-of-flight approaches, noise	Possible	Color may allow better matching with ICP
Time-of-Flight single point	~$10 per sensor	High	Medium	High risk of mechanical errors	No	Direct computation of point cloud, no ICP
Time-of-Flight vertical line	~$130?	Medium	Low	Noise (but less error than 2D?)	No	Direct computation of point cloud, no ICP
Time-of-Flight 2D depth map	~$200?	Low	Medium	Noise	No	Direct ICP available

Team Status Update for 2/15/2020

This week, we did most of the work as a team. After getting feedback from the Proposal Presentation, we realized that our user story was too vague and that some of our requirements were not as clear and quantifiable as they could be. After doing more research, we narrowed the scope of our use case to archaeological documentation only and refined our requirements to fit it. For example, we made the accuracy error tolerance scale with object size instead of being a fixed measurement. We redefined our usability requirement to include the input object size and weight limit, and redefined our time requirement to be one minute. We also refined some other requirements. As a team, we also discussed the different sensors we had researched for the project.

Currently, our most significant risk is that we have not finalized the type of sensor to use, so we might fall a little behind schedule in ordering parts. We are managing this risk by starting tasks that can be done in parallel, such as researching the common pipeline used in 3D mesh reconstruction from raw depth data. We also assigned everyone on the team to research the different sensors available. We first agreed on using the Intel RealSense depth camera (an RGB-D camera using coded light). Professor Mukherjee then suggested that we look into a 1D/2D laser array or LIDAR sensor, which could be more accurate and time-efficient. All of our team members are working to finalize the sensor this upcoming week and get back on schedule. Another risk factor is our lack of expertise in mechanical engineering and robotics. The rotational mechanism and platform design seem to be more complex than we initially thought, and the rotational mechanism is a crucial part of our system no matter which type of sensor we use. We are managing this risk by having Chakara take charge of this part, so that he can focus solely on it for the upcoming week. We have also started seeking advice from someone with more expertise in the area.

As mentioned earlier, we made some changes to the requirements and use case. These changes were necessary to narrow down the project scope so that we can finish it in the given time. They were also necessary because we need clearer, more quantifiable requirements to use as metrics. These changes cost us time last week but made our project goals and scope clearer, which will benefit us in the future.

Below is the link to our updated schedule; you can also refer to our “Project Schedule” section.

https://docs.google.com/spreadsheets/d/1GGzn30sgRvBdlpad1TIZRK-Fq__RTBgIKN7kDVB3IlI/edit?usp=sharing