1 Overview
As my self-proposed project for CSCI2951K: Topics in Collaborative Robotics, I worked on mapping and reconstructing remote environments as 3D virtual environments for exploration. For this project, I used the Kinova Movo as the robot.
2 Motivation
Robots are powerful tools for exploring remote and hostile environments inaccessible to humans. From conducting rescue operations in disaster zones to surveying unexplored areas in combat situations to space exploration, robots present a safe, viable alternative to direct human engagement. Using robots for remote exploration first requires mapping the remote environment.
The current approach to mapping an unknown environment is to use a simultaneous localization and mapping (SLAM) algorithm. However, the resulting maps are often not virtual environments that can be explored. Instead, the 3D environment has to be projected onto a 2D surface like a screen, which is unintuitive for exploration because it differs from how humans view the world. Previous approaches also focus mostly on VR teleoperation, meaning they only visualize the robot's immediate surroundings at that point in time instead of generating a persistent map.
Therefore, the motivation of this project is to map and reconstruct environments in a way that is intuitive for human exploration. I believe that a 3D virtual environment that is explorable via a virtual reality (VR) system best answers this.
3 Technical Approach
As off-the-shelf SLAM algorithms are already mature and robust, we used RTAB-Map as the SLAM algorithm for mapping the environment.
With this, we essentially need to facilitate the following feedback loop:
- The robot collects and sends sensor data to RTAB-Map
- RTAB-Map generates and transmits map data to Unity
- Unity renders the map as a virtual environment for the user to see
- The user sees the virtual environment that is updated in real-time and sends the appropriate movement inputs to the robot (back to step 1)
To facilitate the above feedback loop, we created the software stack outlined below:
On the ROS side (right), the robot publishes 4 ROS topics: color image (sensor_msgs/CompressedImage) and depth image (sensor_msgs/CompressedImage) from the Movo's Kinect2, the transform tree (tf2_msgs/TFMessage), and the base scan (sensor_msgs/LaserScan) from the Movo's LiDAR.
RTAB-Map subscribes to these 4 topics and performs SLAM, publishing two ROS topics that constitute the map: grid map (nav_msgs/OccupancyGrid) and global map (rtabmap_ros/mapData).
We then use ROS-Sharp to communicate between the ROS side and the Unity side; 5 streams of data are sent over: transform, color image, depth image, grid map, and global map. The Unity scene has 4 GameObjects that render these 5 data streams:
- TF Listener: Uses transform to position the virtual Movo within the scene
- Grid Map Renderer: Renders grid map as a 2D mesh covering the floor
- RGB-D Renderer: Uses color image and depth image to generate and render a RGB-D point cloud that represents the current view from the Movo’s Kinect2
- Global Map Renderer: Renders map data as a global point cloud representing the map of the remote environment.
As the Unity scene is continuously updated in near real-time, the user views the scene using an Oculus Quest VR system. The user then uses the Oculus Quest controllers to send controller inputs (sensor_msgs/Joy), which are sent over ROS-Sharp and published as movement inputs (geometry_msgs/Twist) to the Movo. The Movo moves according to these movement inputs. This completes the feedback loop.
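As a rough illustration of the control side of the loop, the sketch below maps joystick axes to velocity commands in a minimal rospy node. The topic names (/joy, /cmd_vel) and the axis-to-velocity scaling are assumptions for illustration, not the exact mapping used on the Movo.

#!/usr/bin/env python
# Minimal sketch of a Joy -> Twist relay. Topic names and the
# axis-to-velocity scaling are illustrative assumptions.
import rospy
from sensor_msgs.msg import Joy
from geometry_msgs.msg import Twist

class JoyToTwist(object):
    def __init__(self):
        self.pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('/joy', Joy, self.on_joy)

    def on_joy(self, msg):
        twist = Twist()
        # Assume axis 1 is forward/back and axis 0 is rotation.
        twist.linear.x = 0.5 * msg.axes[1]   # m/s, scaled down for safety
        twist.angular.z = 1.0 * msg.axes[0]  # rad/s
        self.pub.publish(twist)

if __name__ == '__main__':
    rospy.init_node('joy_to_twist')
    JoyToTwist()
    rospy.spin()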
3.1 Rendering in Unity
Here, I will elaborate on how each of the 5 data streams is rendered and the problems encountered.
3.1.1 Transform
A robot has many different coordinate frames - the transform describes how they are arranged in relation to each other.
To position the virtual Movo within the Unity scene, we parse the transform messages and arrange the Movo's frames accordingly. However, we encountered bandwidth issues: the Movo has many frames, which creates a flood of tf2_msgs/TFMessage messages.
Optimization: As we only need the /map->/odom and /odom->base_link transforms, we wrote a ROS node (tf_minimal.py) that filters out the unneeded tf2_msgs/TFMessage messages from /tf and only publishes the /map->/odom and /odom->base_link messages on the /tf_minimal topic.
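A minimal sketch of what such a filter node might look like follows; the exact frame-name strings (with or without a leading slash) are assumptions rather than the exact values used by tf_minimal.py.

#!/usr/bin/env python
# Sketch of a tf filter: republish only map->odom and odom->base_link
# transforms on /tf_minimal. Frame-name strings are assumptions.
import rospy
from tf2_msgs.msg import TFMessage

KEEP = {('map', 'odom'), ('odom', 'base_link')}

def on_tf(msg, pub):
    kept = [t for t in msg.transforms
            if (t.header.frame_id.lstrip('/'), t.child_frame_id.lstrip('/')) in KEEP]
    if kept:
        pub.publish(TFMessage(transforms=kept))

if __name__ == '__main__':
    rospy.init_node('tf_minimal')
    pub = rospy.Publisher('/tf_minimal', TFMessage, queue_size=10)
    rospy.Subscriber('/tf', TFMessage, on_tf, callback_args=pub)
    rospy.spin()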
3.1.2 Grid Map
The grid map is a 2D map of the environment taken at the height of the Movo’s LiDAR sensor.
We initially transferred nav_msgs/OccupancyGrid messages published by RTAB-Map and manually created a texture with the data stored in these messages. However, we encountered bandwidth issues when sending nav_msgs/OccupancyGrid messages across ROS-Sharp, which would cause the Unity scene to freeze for a moment every time a grid map was sent.
Optimization: We created a ROS node (slam/grid_image_node.cpp) that first converts the grid map into an image, then compresses it into a JPEG using OpenCV. As the grid map grows larger, the compression ratio grows from ~2x to ~15x. Therefore, compression of the grid map image scales well with the size of the grid map.
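The actual node is written in C++; the Python sketch below illustrates the same idea of turning an OccupancyGrid into a JPEG-compressed image. The topic names and the mapping of occupancy values to gray levels are assumptions for illustration.

#!/usr/bin/env python
# Sketch: convert nav_msgs/OccupancyGrid into a JPEG-compressed image.
# Topic names and the value-to-gray mapping are illustrative assumptions.
import rospy
import numpy as np
import cv2
from nav_msgs.msg import OccupancyGrid
from sensor_msgs.msg import CompressedImage

def on_grid(grid, pub):
    h, w = grid.info.height, grid.info.width
    data = np.array(grid.data, dtype=np.int8).reshape(h, w)
    # Map occupancy values to gray levels:
    # unknown (-1) -> 128, free (0) -> 255, occupied (100) -> 0.
    img = np.full((h, w), 128, dtype=np.uint8)
    img[data == 0] = 255
    img[data == 100] = 0
    ok, jpeg = cv2.imencode('.jpg', img)
    if ok:
        msg = CompressedImage()
        msg.header = grid.header
        msg.format = 'jpeg'
        msg.data = jpeg.tobytes()
        pub.publish(msg)

if __name__ == '__main__':
    rospy.init_node('grid_image_node')
    pub = rospy.Publisher('/grid_image/compressed', CompressedImage, queue_size=1)
    rospy.Subscriber('/rtabmap/grid_map', OccupancyGrid, on_grid, callback_args=pub)
    rospy.spin()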
3.1.3 RGB-D Image
An RGB-D image is produced by the Movo's Kinect2 as a color (RGB) image and a depth (D) image, providing the current view from the Movo.
We initially transferred sensor_msgs/PointCloud2 messages that were generated by RTAB-Map as they are easier to render, but again faced bandwidth issues.
Optimization: We send compressed color and depth images, then manually integrate them to generate a point cloud on the Unity side (RGBDRenderer.cs). This overcomes the bandwidth issue as the images have been compressed.
To generate a point cloud from the color and depth images (a back-projection sketch follows this list):
- Decompress images using OpenCVSharp
- Generate a single Unity mesh instead of one GameObject per pixel, as this is less computationally expensive
- Use a shader instead of setting the depth and color for each vertex of the mesh manually so as to leverage the GPU for faster rendering
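The integration itself happens in C# inside Unity (RGBDRenderer.cs); the Python sketch below only shows the underlying pinhole back-projection that turns depth pixels into 3D points, assuming known camera intrinsics (fx, fy, cx, cy) and a depth image in meters.

import numpy as np

def rgbd_to_pointcloud(color, depth, fx, fy, cx, cy):
    """Back-project an RGB-D image into an Nx6 array (x, y, z, r, g, b).

    Assumes `depth` is an HxW array in meters, `color` is an HxWx3 array
    aligned with the depth image, and fx, fy, cx, cy are the intrinsics
    of the depth camera.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    valid = z > 0           # drop pixels with no depth reading
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = color[valid].astype(np.float32)
    return np.hstack([points, colors])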
3.1.4 Global Map
The global map is a map of all the areas of the remote environment that RTAB-Map has mapped so far.
We initially transferred sensor_msgs/PointCloud2 messages published by RTAB-Map as they are far easier to render, but encountered bandwidth issues again. This is because each sensor_msgs/PointCloud2 message contains the whole map instead of just the newly added points, transferring a lot of redundant data. Furthermore, these messages do not contain any information about the orientation of each point, so we are unable to draw a quad for each point, resulting in a rather sparse rendering of the environment.
Optimization: We transferred the rtabmap_ros/mapData messages published by RTAB-Map, which represent RTAB-Map's internal graph. As each rtabmap_ros/mapData message contains only the latest graph node's RGB-D image, no redundant data is sent. Additionally, each node has a position and orientation, allowing us to generate a point cloud of quads for each node from its RGB-D image. This results in a denser rendering of the environment.
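As an aside on the quad rendering: given a point's position and an orientation (treated here as a surface normal), four corner vertices can be generated for a small oriented quad. The sketch below illustrates that geometry only; the quad size and the choice of reference "up" vector are arbitrary assumptions, not values taken from our renderer.

import numpy as np

def quad_vertices(position, normal, size=0.01):
    """Return the four corners of a quad of side `size` centered at
    `position` and facing along `normal`. The reference 'up' vector and
    the quad size are arbitrary choices for illustration."""
    n = normal / np.linalg.norm(normal)
    up = np.array([0.0, 1.0, 0.0])
    if abs(np.dot(n, up)) > 0.99:        # avoid a degenerate cross product
        up = np.array([1.0, 0.0, 0.0])
    right = np.cross(up, n)
    right /= np.linalg.norm(right)
    down = np.cross(n, right)            # completes the orthonormal basis
    half = size / 2.0
    return [position + half * ( right + down),
            position + half * ( right - down),
            position + half * (-right - down),
            position + half * (-right + down)]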
3.2 User View and Control
As we want the user to be able to freely explore the Unity scene without being restricted to the Movo’s physical location, we separated the user and the Movo as two distinct entities.
4 Evaluation
4.1 Successes
In regards to mapping, we successfully mapped the 8th floor of the SciLi using RTAB-Map as our SLAM algorithm. Mapping was done close to real-time in an Ubuntu 16.04 virtual machine running ROS Kinetic, playing back a rosbag recorded while driving the Movo around the 8th floor of the SciLi.
In terms of rendering, we were able to transfer all 5 streams of data from ROS to Unity in close to real-time. We achieved this by overcoming significant bandwidth constraints, optimizing the transmission of data from ROS (in the Ubuntu 16.04 VM) to Unity over ROS-Sharp.
We were also able to render these streams of data in the Unity scene to form a generally cohesive virtual environment for the user to explore. We believe this is an improvement on current approaches for mapping and reconstructing a virtual environment.
4.2 Shortcomings
One shortcoming is that we have not been able to run the project on the actual Movo. For some yet-unknown reason, RTAB-Map does not receive any messages published by the Movo, despite us being able to echo the messages on the command line. As such, we have only run the project on an Ubuntu 16.04 VM with ROS Kinetic and a rosbag recording.
Another thing we can improve is the rendering of the global map. As seen below, some of the nodes are not positioned correctly, resulting in an inconsistent scene.
5 Future Work
There are a few paths for future work:
- Improve transmission of data
  - Parallelize the transmission of data or find a better alternative to ROS-Sharp
  - This would allow us to transmit higher-quality data or operate under even greater bandwidth constraints
- Automate exploration and pathfinding
  - If the bandwidth constraints cannot be overcome and real-time mapping and control of the robot remains impossible, the robot could autonomously explore the environment instead
  - Use pathfinding so that the robot can move to selected locations within the map autonomously
- Integrate with VR teleoperation
  - Allow the robot to manipulate objects in the environment
  - This could even enhance exploration by opening up new areas (e.g. opening doors or removing obstacles)