Advancements in reliable navigation and mapping rest to a large extent on robust, efficient, and scalable semantic understanding of the surrounding environment. The successes of recent years have been propelled by the use of machine learning techniques for capturing the geometry and semantics of the environment from video and range sensors.
I will discuss approaches for object detection, pose recovery, 3D reconstruction, and detailed semantic parsing using deep convolutional neural networks (CNNs), as well as the challenges of deploying these systems in real-world settings.
While perception and decision making are often considered separately, I will outline a few strategies for jointly optimizing perception and decision-making algorithms in the context of elementary navigation tasks. The presented explorations open interesting avenues for the control of embodied physical agents.