(Semi-)Transparent Drinking Glasses Detection Using Real-World & Synthetic Imagery.

Background/Problem Statement
Over the years, detecting semi-transparent objects in generalized contexts has proven to be a difficult task. Traditional image-processing techniques exist, but because transparent objects lack stable color, texture, and edge cues, these techniques are largely ineffective.

Challenge
Several attempts have been made to detect translucent or semi-transparent objects, but they have been limited to well-known, controlled settings. Moreover, these systems operate in carefully prepared environments and require highly specialized, expensive equipment.

Research Goal
This project aims to detect semi-transparent objects in both specific and generalized environments without any special equipment such as time-of-flight cameras or X-ray tomography, overcoming the limitations of previous solutions.

Proposed Methodology
Dataset Used

To achieve this goal, more than 40,000 synthetic images were generated in different environments, alongside 8,000 real-world images of multiple types of drinking glasses in ambient surroundings.
The data was collected in two ways: manual collection, in which real photographs were captured and labeled by hand, and automatic generation, in which labeled data was produced with a computer-graphics tool. In total, 25 distinct semi-transparent objects (drinking glasses) were used. The real-world photographs were captured under a variety of lighting conditions, with light sources positioned in different places relative to the camera, while a new Blender plugin was created to help the user mass-produce synthetic images.
With the plugin, a user tags the relevant models in a scene, after which the plugin takes care of varying the rendering conditions. Below are some real-world and synthetically generated image examples.
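The plugin's core idea, randomizing rendering conditions for each tagged model, can be sketched in plain Python. This is an illustrative sketch only: the function names, parameter ranges, and configuration fields are assumptions, and in the actual plugin these values would be applied through Blender's bpy API before each render.

```python
import random

def sample_render_conditions(rng):
    """Sample one randomized rendering setup for a tagged glass model.

    All ranges below are illustrative assumptions, not the project's
    actual values. In Blender, these numbers would drive light and
    camera placement before rendering a single synthetic image.
    """
    return {
        # Light placed somewhere on a hemisphere around the scene.
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "light_elevation_deg": rng.uniform(10.0, 80.0),
        "light_intensity_w": rng.uniform(100.0, 1000.0),
        # Camera orbits the object at a random angle and distance.
        "camera_azimuth_deg": rng.uniform(0.0, 360.0),
        "camera_distance_m": rng.uniform(0.5, 2.0),
    }

def generate_batch(n_images, seed=0):
    """Produce n_images randomized render configurations."""
    rng = random.Random(seed)
    return [sample_render_conditions(rng) for _ in range(n_images)]
```

Because every configuration is sampled independently, a single tagged scene can yield thousands of distinct training images, which is what makes the synthetic pipeline scale to 40,000+ images.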

Real-World images from the dataset

Synthetic Images from the dataset

Experimental Setup
Following data collection, various combinations of real-world and synthetic images were used to train five different CNNs: Faster R-CNN Inception, Faster R-CNN ResNet-101, R-FCN ResNet-101, SSD Inception, and SSD MobileNet. TensorFlow's Object Detection API was employed for training. The CNNs were trained on high-performance computing systems for several days to produce accurate results in a generalized setting. To evaluate detection accuracy, two separate test-set benchmarks were analyzed, comparing performance across different proportions of synthetic and real-world photos.
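Training with TensorFlow's Object Detection API is driven by a pipeline configuration file. The fragment below is a minimal sketch for the Faster R-CNN ResNet-101 variant; all paths, step counts, and hyperparameters are illustrative placeholders, not the project's actual values.

```
model {
  faster_rcnn {
    num_classes: 1  # a single "drinking glass" class
    feature_extractor { type: "faster_rcnn_resnet101" }
  }
}
train_config {
  batch_size: 1
  fine_tune_checkpoint: "path/to/pretrained/model.ckpt"
  num_steps: 200000
}
train_input_reader {
  tf_record_input_reader { input_path: "path/to/train.record" }
  label_map_path: "path/to/label_map.pbtxt"
}
```

Swapping the feature extractor type and checkpoint is enough to reuse the same setup for the other four architectures, which is why the API suits this kind of multi-model comparison.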

Results

The findings show that models first trained on large quantities of synthetic data and then fine-tuned on a small number of real-world photographs outperform models trained solely on large quantities of real-world images. IoU values against each benchmark were calculated for the five CNNs. Faster R-CNN ResNet-101 achieved the best detection accuracy on the home test dataset, with 87% at IoU = 0.3 and 76% at IoU = 0.5. On the aged-care test dataset, R-FCN ResNet-101 performed better, with a detection accuracy of 84% at IoU = 0.3 and 80% at IoU = 0.5.
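The IoU (intersection-over-union) metric behind these numbers compares a predicted bounding box with a ground-truth box. A minimal sketch of the computation, with boxes as `(x1, y1, x2, y2)` tuples and a hypothetical greedy matcher for per-threshold accuracy:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_accuracy(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction
    with IoU >= threshold (greedy one-to-one matching)."""
    matched, used = 0, set()
    for gt in gt_boxes:
        for i, pred in enumerate(pred_boxes):
            if i not in used and iou(pred, gt) >= threshold:
                used.add(i)
                matched += 1
                break
    return matched / len(gt_boxes) if gt_boxes else 0.0
```

Lowering the threshold from 0.5 to 0.3 accepts looser localizations as correct, which is why the reported accuracy figures are higher at IoU = 0.3 than at IoU = 0.5.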

Results depicting the detection accuracy

Limitation

The major obstacle this project faced was the lack of a sufficiently large set of training images to train a convolutional neural network for the best detection accuracy in a generalized setting.


Conclusion

Given the challenging nature of semi-transparent objects, which certain angles and light reflections can render nearly invisible even to the human eye, these results are strong. Using real-world and synthetic photos, this project was able to recognize semi-transparent objects in a generalized setting.
