Penn Researchers Win Best Paper at International Robotics Conference
A team of robotic vision experts from the GRASP Lab has won the “Best Conference Paper” award at the 2017 IEEE International Conference on Robotics and Automation in Singapore. Their paper, “Probabilistic Data Association for Semantic SLAM,” describes a technique for helping robots identify and label navigational landmarks in the maps they build for themselves.
The paper’s authors are George J. Pappas, Joseph Moore Professor and Chair of the Department of Electrical and Systems Engineering; Kostas Daniilidis, Ruth Yalom Stone Professor of Computer and Information Science; graduate student Sean Bowman; and postdoctoral researcher Nikolay Atanasov.
SLAM, or simultaneous localization and mapping, is a computer vision task that many autonomous robots must perform to navigate unknown environments.
“SLAM research has provided robots with algorithms so that they can localize themselves indoor and without GPS. Location and maps were described with paths and point clouds, but not a single label, like looking at Google maps without any text on it,” Daniilidis says. “Semantic mapping recognizes objects and landmarks around and creates maps that are easy to comprehend and help in giving directions a human would do. They also help to better localize when you revisit the same place. Driverless cars, for example, need to label buildings, shops, traffic lights, stop signs and other vehicles.”
This kind of labeling is challenging because, from a computer vision perspective, many of these features look identical to one another. For a robot to determine its own location, it needs to be able to correctly associate a specific landmark with that landmark’s location on a map.
“It’s like as if you entered an office corridor where all the doors looked the same,” Daniilidis says. “You can have a mental map for them, but you still have to find the best correspondence between doors in your brain and doors you see.”
The team’s insight was a probabilistic approach that takes the average over all such correspondences rather than committing to a single best match. Their technique allows computer vision systems to assign semantic labels to features in both indoor and outdoor environments.
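The idea of averaging over correspondences can be illustrated with a toy sketch. This is not the authors’ algorithm. It assumes a single 2D observation, a handful of identical-looking doors at known map positions, and a simple Gaussian measurement model. The names association_weights, expected_landmark, doors and sigma are hypothetical and chosen only for this illustration.

```python
import math


def association_weights(measurement, landmarks, sigma=1.0):
    """Return one weight per landmark, summing to 1.

    Each weight is the normalized Gaussian likelihood that the observed
    point was produced by that landmark (an assumed, simplified model).
    """
    sq_dists = [
        (measurement[0] - lx) ** 2 + (measurement[1] - ly) ** 2
        for lx, ly in landmarks
    ]
    # Subtract the smallest distance so the largest likelihood is exactly 1,
    # which keeps the normalization from dividing by zero.
    nearest = min(sq_dists)
    likelihoods = [math.exp(-(d - nearest) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def expected_landmark(measurement, landmarks, sigma=1.0):
    """Average the landmark positions, weighted by association probability."""
    weights = association_weights(measurement, landmarks, sigma)
    x = sum(w * lx for w, (lx, _) in zip(weights, landmarks))
    y = sum(w * ly for w, (_, ly) in zip(weights, landmarks))
    return (x, y)


# Three identical doors along a corridor (hypothetical map coordinates).
doors = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]

# An observation exactly halfway between the first two doors.
observation = (1.0, 0.0)

weights = association_weights(observation, doors)
soft_match = expected_landmark(observation, doors)

# A hard assignment would instead pick a single door, here the first of the
# two equally close ones, and discard the ambiguity entirely.
hard_match = doors[weights.index(max(weights))]
```

In this toy setup the first two doors receive equal weight and the distant third door receives very little, so the ambiguity is carried forward instead of being resolved by an arbitrary choice.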