Chen’s Status Report for Dec 10

At the start of the week, I rewrote the front-end script, so the front end now parses the data the server sends back, updates the list of rooms, and then updates the map with the currently selected room. With this, switching between rooms works.

As explained in the previous report, due to the difficulty, what we currently have is a pseudo-depth concept: we view the room as a 2D plane and assign a depth to each coordinate based on its y coordinate, to simulate straightening the image vertically. I believe this is similar to what cv2.warpPerspective provides.

 

Chen’s Status Update, Dec 3 and Nov 19

This week, I wrote the function that handles requests from the CV module and stores the data in a JSON file, and integrated my code with Mehar's. Unlike before, where the server told the CV module to run when the client requested and then read the data file the CV module output, now the server only has to handle requests sent from the CV module (updating the JSON database file with the JSON data the CV module sends) and requests from the user (sending the whole database back). This way, the server and the CV module can run separately. This has been tested and verified to work; we spent hours debugging to make sure we can send data over a local network.
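A minimal sketch of this request-handling split, assuming Django (as the views.py mentioned later suggests); the view names and file path are illustrative, not the project's actual code:

```python
import json
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

DB_FILE = "rooms.json"  # illustrative path for the JSON database file

@csrf_exempt
def cv_update(request):
    """Handle a POST from the CV module and merge it into the JSON database."""
    incoming = json.loads(request.body)  # e.g. {"room": "A", "seats": [...]}
    try:
        with open(DB_FILE) as f:
            db = json.load(f)
    except FileNotFoundError:
        db = {}
    db[incoming["room"]] = incoming
    with open(DB_FILE, "w") as f:
        json.dump(db, f)
    return JsonResponse({"status": "ok"})

def room_data(request):
    """Handle a GET from the user and return the whole database."""
    try:
        with open(DB_FILE) as f:
            db = json.load(f)
    except FileNotFoundError:
        db = {}
    return JsonResponse(db)
```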

Additionally, for the spatial accuracy metric, as explained below, it is better for us to change the metric. The loss function we are going to use is the quadratic loss function, i.e., the mean squared error:

MSE = (1/n) * Σ (y_i − ŷ)²

where, as suggested by Adnan, y_i is defined as the ratio of "the y or x coordinate of each chair relative to the table, in pixel values of the predicted image" divided by "the y or x coordinate of each chair relative to the table in the overhead ground-truth image". These ratios should all be similar, so we use ŷ to stand in for the true value, defined as the average of all the ratios above. Thus, the larger the MSE, the more the prediction deviates from the "normal" value, and the greater the loss.
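A small sketch of how this metric could be computed; the ratio values are illustrative:

```python
import numpy as np

# Each ratio is a chair-to-table offset in the predicted image divided by the
# same offset in the overhead ground-truth image; y_hat is their mean.
ratios = np.array([1.02, 0.97, 1.10, 0.95])  # illustrative values
y_hat = ratios.mean()
mse = np.mean((ratios - y_hat) ** 2)
print(mse)
```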

Perspective Adjustment:

To apply a perspective warp to a picture, the basic thing we need is the coordinates of the four corners of the shape we want to warp.

OpenCV provides ways to do this: for example, cv2.getPerspectiveTransform together with cv2.warpPerspective, or functions such as cv2.findHomography.
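A minimal sketch of the getPerspectiveTransform/warpPerspective route; the file path, corner coordinates, and output size below are illustrative:

```python
import cv2
import numpy as np

img = cv2.imread("room.jpg")  # illustrative input image
# Four corners of the region to straighten, in order TL, TR, BR, BL.
src = np.float32([[120, 200], [520, 200], [640, 480], [0, 480]])
# Where those corners should land in the output rectangle.
dst = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, (640, 480))
cv2.imwrite("warped.jpg", warped)
```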

The problem with this is that the output is an image, so the perspective adjustment has to happen before feeding the image into the image-recognition algorithm, i.e., as pre-processing. In that case, it becomes hard to recognize the objects. If we apply it as post-processing instead, it is hard to get the exact coordinates in the warped image.

Moreover, to apply the warp in the first place we need the four corners, and to get those we need to find the four corners of the ground. Additionally, cropping the image to include only the ground before warping is not enough, because chairs have height; if we crop to the ground only, some chairs will be cut off. We have to crop so that both the upper and lower chairs are included, which means some of the wall must be included. This means some error is inevitable when applying the perspective adjustment, as shown below.

Since the lower two points are not visible, we assume they are the lower two corners of the image. For the upper two, there are basically three ways to obtain them:

  1. Hardcode it when setting it up
  2. Detectron2 image segmentation
  3. Edge detection

1. The first option is what we have now. I updated the algorithm this week. Previously, the x coordinates were stretched to the sides to form a rectangle, but for the y coordinates, since the same number of pixels represents more distance the farther into the image you go, I applied a linear function mapping a point's y coordinate to the real distance it represents. This week, I realized that,

according to the above image, the point "P" is always the real center of the plane. That means that when the y coordinate is located at that point, i.e., at the ratio of the upper and lower edges of the trapezoid times the height of the image, the adjusted y coordinate should be 1/2. Based on this, we form a quadratic y = ax^2 + bx + c and solve for the coefficients to obtain a function that maps image y coordinates to real y coordinates, using the constraints (x, y) = (0, 0), (P, 1/2), and (1, 1).
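A small sketch (not necessarily the project's exact code) of solving for the coefficients from those three constraints; the value of P used below is illustrative:

```python
import numpy as np

# Fit y = a*x^2 + b*x + c through (0, 0), (P, 1/2), (1, 1), where P is the
# normalized y position of the perceived center of the plane.
def fit_depth_curve(p):
    A = np.array([
        [0.0, 0.0, 1.0],    # x = 0 -> y = 0
        [p * p, p, 1.0],    # x = P -> y = 1/2
        [1.0, 1.0, 1.0],    # x = 1 -> y = 1
    ])
    rhs = np.array([0.0, 0.5, 1.0])
    return np.linalg.solve(A, rhs)   # returns [a, b, c]

a, b, c = fit_depth_curve(0.4)       # e.g. P = 0.4, illustrative only
```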

2. Detectron2, however, might be overkill for our case. I spent a lot of time trying to implement Detectron2 but was stuck on CUDA. In any case, since Detectron2 is trained on the COCO dataset, it does not include a ground/floor category, so to use it we would have to collect training data of floors to detect the ground. Additionally, the standard output,

as shown here, is similar to YOLOv5's: it only includes bounding-box coordinates. To access the exact outline, we have to access the variable where the outline is stored, and it is stored as a Tensor. Then, to use it to obtain the corners of the ground, we have to convert it into a format cv2 can read, i.e., a NumPy array; some analysis can then be applied to simplify it and potentially extract the corners. For example, cv2.approxPolyDP() could help us reduce the outline to a simple shape, after which the four corners can be easily retrieved. However, a more fatal problem is that, even if we had all of the above working, our ground doesn't have a clear boundary, as shown in the next section.
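A hedged sketch of this mask-to-corners idea, assuming the instance mask has already been pulled out of Detectron2's output tensor and converted to a binary NumPy array; the epsilon factor is illustrative:

```python
import cv2
import numpy as np

# `mask` is assumed to be a binary (H, W) uint8 array, e.g. one instance of
# Detectron2's outputs["instances"].pred_masks converted to NumPy.
def mask_to_corners(mask: np.ndarray) -> np.ndarray:
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)           # largest region
    epsilon = 0.02 * cv2.arcLength(contour, True)           # simplification tolerance
    approx = cv2.approxPolyDP(contour, epsilon, True)       # polygon approximation
    return approx.reshape(-1, 2)                            # (x, y) vertices; ideally four
```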

This also rules out the possibility of applying feature recognition, a common approach when working with homographies.

3.  

As shown in the image, I played around with OpenCV methods, trying to obtain the upper two corners of the table or the ground from its outline.

We first convert the image to grayscale, then apply cv2.bilateralFilter, which accentuates edges while blurring the rest, making it better suited than a Gaussian blur for edge detection.

Then, I ran Canny edge detection and Harris corner detection (cv2.Canny and cv2.cornerHarris, respectively).

Then, I looked for the longest edge. However, I was stuck here: no matter how I adjusted the parameters, the edge of the table was not continuous. To fix this, I tuned the Canny edge detection to pick up even the smallest edges, but that still does not form a closed edge around the table. Another option is to merge nearby edges, but then all the chairs and bags would also be merged into the edge.
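A minimal sketch of this pipeline; the file path and numeric parameters are illustrative, not final tuned values:

```python
import cv2
import numpy as np

img = cv2.imread("room.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
smooth = cv2.bilateralFilter(gray, 9, 75, 75)               # edge-preserving blur
edges = cv2.Canny(smooth, 50, 150)                           # Canny edge map
harris = cv2.cornerHarris(np.float32(smooth), 2, 3, 0.04)    # Harris corner response

# Look for the longest edge among the detected contours.
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
longest = max(contours, key=lambda c: cv2.arcLength(c, False))
```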

 

Canny Edge Detection

Canny Edge Detection Longest Edge

Harris Corner Detection

Additionally, as you can see, the ground doesn't have a clear boundary. Also, since the table is only part of the room, rotating the table changes the corners of the table, which changes the ratio of the upper and lower edges of the trapezoid; the perspective adjustment then changes as well, leading to unpredictable results.

Given the above, the most stable way is definitely hardcoding the ratio of the upper and lower edges; another way to think about it is to obtain the cosine of the angle of the table's edge line, since we are applying the perspective shift to the whole image rather than just the ground or just the table anyway.

Chen’s Status Report for Nov 12, Nov 5 & Oct 30

Updated the perspective algorithm so that it can now (1) crop the image, (2) better adjust the perspective by applying an offset (the divide ratio / offset ratio), and (3) perform better boundary checks. The current function still requires the cropping dimensions to be set manually, but automatic cropping is expected soon; this simply means the code will crop the image based on the left-, right-, top-, and bottom-most objects. The code still needs cleaning up.

The parameters needed for the perspective algorithm also include the ratio of the lengths of the upper and lower edges of the table (or something similar); this parameter is currently set manually. The algorithm not only adjusts the x coordinates according to their y coordinates (the top two vertices of the trapezoid are stretched to the sides), it also adjusts the y coordinates (the farther toward the back of the image, the more real distance the same number of pixels represents). More work could be done so that this ratio is calculated automatically, for example with an image segmentation algorithm, but that does not handle cases where there is no table, the table is angled to the left or right, or the table is only a small part of the room.
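A rough sketch (not the project's actual function) of the x-coordinate stretch described above, assuming normalized coordinates with y = 0 at the top of the image and `ratio` defined as the trapezoid's upper-edge length divided by its lower-edge length:

```python
def stretch_x(x_norm, y_norm, ratio):
    # A row at height y_norm spans only part of the image width: the full
    # width at the bottom (y_norm = 1) and `ratio` of it at the top.
    row_width = ratio + (1.0 - ratio) * y_norm
    # Push the point outward from the horizontal center by the inverse of
    # that row width, so the trapezoid is stretched into a rectangle.
    return 0.5 + (x_norm - 0.5) / row_width
```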

One thing we can do is simply narrow the scope of the project so that this parameter is provided when setting up the camera.

Figure 1: original plan without straightening

Figure 2: adding perspective adjustment (not effective due to how much white space there is at the top of the image, so adding cropping is necessary)

Figure 3: adding cropping (focus on the seats: they are more aligned, but the table is a problem)

Figure 4: Applying offset

Another thing we realized during the demo is that we are currently only using portrait images to test the results. Landscape images will still work, but they will be stretched to fit portrait mode. We could either restrict imports to a specific orientation or accept it, since from the user's perspective the "stretched" result still accurately reflects where the seats and tables are and all other vital information needed to make a decision. Thus, this is not too big a problem.

I also updated table handling during the past week. Tables will now be displayed. But there were two problems I encountered:

  1. Tables might look "fat". This is because YOLO cannot give the outline of the table, only a bounding rectangle rather than a trapezoid. This is partly solved as of today: when calculating the length of the table and feeding the bottom-left and top-right coordinates into the straightening algorithm, for the top-right coordinate we use its own x value but the y value of the bottom-left coordinate. I understand this sounds confusing, but it counters the problem that the top-right point, being farther "back", gets stretched out more, making the table look "fat".

    Figure 5: after using only the y value of the bottom vertex of the table's bounding box (since it is a rectangle)

  2. Seats might appear inside the table. As a side effect of the fix above, seats can end up inside the table. This can be countered by either
  • a. shortening the table: the table is reduced in width and height based on the y and x coordinates of the seats that are inside it. This can be risky when there actually is a seat at the table, making the table look too thin.
  • b. moving the chairs out: this method is implemented (see the sketch after this list). However, the problem is that we can only move the seat to the left, right, top, or bottom. The direction is currently determined by the distance from the chair to the closest border of the table: whichever border is closest, the chair is moved out in that direction. This distance is measured as a pixel ratio, i.e., a number between 0 and 1, so it does not perfectly reflect actual distance. For example, if the image is very narrow, the same change in the x coordinate produces a much larger ratio than the same change in the y coordinate.
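A minimal sketch of option (b), assuming normalized [0, 1] coordinates; the names are illustrative, not the project's actual code:

```python
def move_seat_out(seat_x, seat_y, left, top, right, bottom):
    # Distances (as pixel ratios) from the seat to each border of the table.
    distances = {
        "left": seat_x - left,
        "right": right - seat_x,
        "top": seat_y - top,
        "bottom": bottom - seat_y,
    }
    # Move the seat out through whichever border is closest.
    direction = min(distances, key=distances.get)
    if direction == "left":
        return left, seat_y
    if direction == "right":
        return right, seat_y
    if direction == "top":
        return seat_x, top
    return seat_x, bottom
```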

Figure 6: after moving out the seats

Figure 7: code for parsing the file and storing seat and table information

Figure 8: code for calculating the availability of seats and moving seats out of tables

Availability is calculated based on the seat closest to each person (a minimal sketch follows the list below). However, there are a few problems I encountered:

  • We want a seat not to be marked as occupied if the person is far away from it, hence a distance threshold. This was applied at first, but the threshold value varies from room to room, so a fixed threshold is likely impossible.
  • The distance is inaccurate due to perspective issues.
  • We cannot differentiate between a standing and a sitting person, so a standing person would be marked as sitting.
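A minimal sketch of the availability rule described above; the threshold default is illustrative, and as noted a single fixed value is hard to choose:

```python
import math

# seats and people are lists of (x, y) positions in normalized coordinates.
def mark_occupied(seats, people, threshold=0.15):
    occupied = set()
    for person in people:
        if not seats:
            break
        nearest = min(range(len(seats)), key=lambda i: math.dist(person, seats[i]))
        if math.dist(person, seats[nearest]) <= threshold:
            occupied.add(nearest)   # mark the closest seat as occupied
    return occupied
```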

Other parts of the views.py file are not included, since they are not the core/critical parts and only perform systematic functions.

The database will be implemented this or next week, so that we can proceed with multiple rooms. The database model:

Figure 9: database model
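For illustration only, a minimal sketch of what such a Room model might look like with the Django ORM; the field names are assumptions, not the actual schema in Figure 9:

```python
from django.db import models

class Room(models.Model):
    name = models.CharField(max_length=100, unique=True)
    # Latest layout sent by the CV module: seat/table coordinates and
    # availability, stored as JSON.
    layout = models.JSONField(default=dict)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return self.name
```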

sn.js file

This basically sends requests to the backend and runs on page load. Related functions include sendRoomRequest.

Figure 10: initialize (the setInterval function might change afterwards)

Figure 11: parsing the JSON response, putting a dot wherever there is a seat, and drawing the table

Other functions include updatePage.

Figure 12: Other files. (html/css)

Plan: once we have data from different rooms, we can easily parse through all the files and store them in the database. Whenever the frontend makes a request, it will just be a "GET"; the backend will respond with all the information from the database, and then filter/search functions for rooms will be implemented.

This Friday we will collect more training data and label images with LabelMe.

Chen 10/22 Status Report

The basic framework has been built, creating a full pipeline: image -> YOLOv5 -> coordinates and size information written to a txt file (YOLOv5's default output format) -> the server parses the file -> post-processing adjusts the perspective -> display on the website interface.
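A small sketch of parsing YOLOv5's default label format (one detection per line: class, then normalized x-center, y-center, width, height, optionally followed by a confidence score); the function name is illustrative:

```python
def parse_yolo_labels(path):
    detections = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue
            cls = int(parts[0])
            x, y, w, h = map(float, parts[1:5])
            conf = float(parts[5]) if len(parts) > 5 else None
            detections.append({"class": cls, "x": x, "y": y,
                               "w": w, "h": h, "conf": conf})
    return detections
```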

The current server works as follows: whenever the front end sends a POST request, the server parses the output file. In a later update, once the database is complete, this will change so that whenever the front end sends an HTTP request, the server responds with the information for the room being requested, fulfilling the goal of scaling the project to multiple rooms.

Additionally, the distance threshold will be applied in a later update to calculate availability.

This report also covers the week before fall break.

Chen’s Status Report for 10/9

I was the presenter for the design review this week, so that was part of my focus. During the preparation, we were able to settle the implementation details. This included deciding between overhead cameras and side cameras: overhead cameras are easier to transform into coordinates but may be less accurate for the CV algorithm, while side cameras are more accurate at recognizing seats but much harder to translate into a 2D map. We decided on the latter, as accuracy is the most important. My main progress this week was setting up the environment for the web app and finishing the front end and part of the back end.

I expect to finish and have a pseudo map showing by next week, and I expect to finish displaying seats and tables soon, assuming we have the coordinate data. Further work after this would be assisting Mehar with the CV algorithm and data processing.

Chen’s Status Report 9/24

After the presentation, we received various questions and suggestions from professors and peers regarding the user interface. Together, we decided to change from fixed/dynamic seat mapping to dynamic seat mapping that is insensitive to small movements (<1 m), which reflects positions accurately while being friendlier to users, since small seat movements are not informative to them at all. To do this, at least for now, we will number the seats from top-left to bottom-right and compare each seat's current position to its last position. This also addresses the case where a seat is removed or added, and avoids frequent refreshes. With this, we have our initial solution for the user interface. Other functions such as reserving seats or notifications could also be implemented. I also looked a little into how exactly the CV will be implemented; hopefully I will have the chance to participate in the CV part if time permits and it works out with my team.