Humans possess a remarkable ability to parse images simply by looking at them. In the blink of an eye, we can fully analyze a scene and separate all the components in it. Furthermore, humans can easily generalize from observing a set of objects to recognize which ones share properties or are similar to a given sample, even when some have never been seen before.
Why should we care about automatically detecting these objects? They are a primary source of information for building a real understanding of a scene, and the detections can be used to reason further about the world. Hence, these algorithms find use in robotics, medical imaging, and surveillance, among many other applications.
Segmentation is commonly divided into four subproblems: image classification, object detection, semantic segmentation, and instance segmentation. Image classification attempts to identify the objects present in an image, independent of their location. Object detection specifies the location of each object with a bounding box. Semantic segmentation aims to densely label each pixel with a class. Finally, instance segmentation distinguishes different objects belonging to the same class. Nevertheless, the computer vision models that perform semantic and instance segmentation have proved particularly difficult to build, train, and test.
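The difference between the dense-labeling subproblems can be made concrete with a toy example. The sketch below (my own illustration, not from any particular model or dataset) contrasts a semantic label map, which assigns each pixel a class only, with an instance id map, which additionally separates two objects of the same class:

```python
import numpy as np

# Toy 4x4 scene: two objects of class 1 ("cat") on background (class 0).
# Semantic segmentation labels each pixel with its class only.
semantic = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
])

# Instance segmentation additionally separates objects of the same class:
# the two cats receive distinct instance ids (1 and 2).
instance = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 2, 2, 0],
])

# Both maps are dense (every pixel is labeled), unlike bounding boxes,
# which only give a coarse rectangle around each object.
num_classes = len(np.unique(semantic)) - 1    # exclude background
num_instances = len(np.unique(instance)) - 1  # exclude background
print(num_classes, num_instances)
```

Note that the semantic map alone cannot tell how many cats there are; recovering that count is exactly what instance segmentation adds.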
I explore semantic and instance segmentation, with an emphasis on video understanding. This task comprises the coherent and consistent identification of the boundaries and classes of the different objects within a set of images. The ultimate goal is to use these detections as a source of information for reasoning about the world.
- Semantic Segmentation on Videos. PhD Scholarship, São Paulo Research Foundation (FAPESP). 2017