A Very Big Step Forward towards City Scale HD Maps
At CES 2019, we launched a new product called City Scale HD Maps and demonstrated some of the core functionality needed for city scale HD mapping operations. Looking back, we have reached several major milestones since then.
At Civil Maps, we saw a disconnect between the proof-of-concept HD map requirements of R&D teams at car manufacturers and the scalable production requirements of procurement. If self-driving vehicles are to go to market, they need a scalable map operation. Having decided that we needed to create a city scale map to prove scalability, Civil Maps allocated new resources to a division called Team MapX.
Team MapX stands for Team Map Expansion. Its focus is to take the existing tooling that we used for small proof-of-concept contracts and determine whether it is feasible to upgrade those tools for large city scale maps.
In early 2018, we embarked on this journey and quickly realized that the unit economics of self-driving cars do not add up unless we reduce the cost of mapping. Based on our customer development, HD maps cannot be a significant share of the cost of a self-driving vehicle. We heard various price points being established in the market, ranging in the thousands per kilometer. This will not scale from a budgeting perspective, so we went back to the drawing board to find some answers.
Our cost breakdown showed that the majority of the costs of mapping operations come from driving logistics, data management, loop closure, feature extraction, semantic data, and the cloud processing fees needed to scale.
A complex data pipeline can run into four types of computing bottlenecks: memory, CPU, network, or storage. Previously, the cloud infrastructure powering our mapping operations ran on the Amazon Web Services platform, but we were racking up bills of $80K-$100K per neighborhood every month. Once again, this did not match the unit economics we were trying to establish. After a cost analysis, we realized that second-hand servers from eBay were a viable way to quickly ramp up the compute capacity we needed. Using a mix of Airflow, Iron Workers and cheap compute, we were able to launch a cluster of 2,000 containers with 6GB of memory per container. We are currently running an exploratory study in which we use Arm-based servers for computational benchmarking. We are excited to share the results of this study in early Q3 2019.
The storage framework that supports city scale HD Maps is an abstraction over three different types of systems. We use Redis as an in-memory cache, which is one of the fastest ways to move data into the CPU’s L1 and L2 cache or the GPU’s memory. The second layer above Redis is an edge distributed file system that uses Ceph to move data between servers over the network in our private cloud. Lastly, we synchronize with Amazon Simple Storage Service (S3) to have persistent distributed storage across all edge sites. The storage abstraction gives developers a seamless interface for reading from and writing to our data management layer. This allows us to quickly change technologies or data sync architectures without impacting the application layer. Our developers love the abstraction since it means less code to refactor.
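The layering described above can be sketched as a single read/write interface over ordered tiers. This is only an illustration, not our production code: the `DictStore` and `TieredStorage` classes are hypothetical names, each tier is an in-process dictionary standing in for Redis, the Ceph-backed file system and S3, and the write-through and read-backfill policies are assumptions made to keep the example small.

```python
from typing import Dict, List, Optional


class DictStore:
    """In-process stand-in for one storage tier (hypothetical helper)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class TieredStorage:
    """One interface over several tiers, ordered fastest to slowest."""

    def __init__(self, tiers: List[DictStore]) -> None:
        self.tiers = tiers

    def write(self, key: str, value: bytes) -> None:
        # Assumed policy: write through to every tier.
        for tier in self.tiers:
            tier.put(key, value)

    def read(self, key: str) -> Optional[bytes]:
        for depth, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                # Backfill the faster tiers that missed, so the next read is cheaper.
                for faster in self.tiers[:depth]:
                    faster.put(key, value)
                return value
        return None


cache = DictStore("cache")        # stands in for Redis
edge = DictStore("edge-fs")       # stands in for the Ceph-backed file system
persistent = DictStore("object")  # stands in for S3
storage = TieredStorage([cache, edge, persistent])

# Simulate data that so far exists only in the persistent tier,
# for example because it was uploaded from another edge site.
persistent.put("tiles/0001", b"point-cloud-bytes")
first_read = storage.read("tiles/0001")
```

Application code only ever calls `storage.read` and `storage.write`, so swapping the technology behind a tier means replacing one adapter rather than touching every caller.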
After our drivers come back from their mapping mission each day, the sensor data from the mission is uploaded directly to S3 and persistently saved. The first stage of the mapping operation is loop closure. Loop closure is a pipeline that makes multiple intersecting trips consistent with each other and increases the resolution of the HD map.
Traditionally, loop closure pipelines use ground control points to anchor the point cloud to reference points that establish the ground truth location of a particular feature or object. These ground control points are usually collected by professional surveyors. This approach was not going to scale in a city like San Francisco, where urban canyons persist. At Civil Maps, we built technology that automates loop closure using machine vision and achieves 5cm global georeferencing accuracy. It works in high-drift, urban canyon environments such as the SOMA or NOMA neighborhoods. Our current capacity is 500km per day, and we are working on expanding this processing capacity to handle multiple city workloads in the near future.
RTK GPS sensors do not produce reliable location information in complex urban environments. The sensor data drifts while passing through urban canyons, where tall buildings create multipath issues for the GPS, and this complicates loop closure. Even with differential GPS antennas, performance is quite poor, especially in cities like San Francisco.
In addition to RTK GPS, there are sensor fusion techniques that use visual odometry, fusing camera, LiDAR and IMU sensors to create a corrected trajectory. However, the quality of the incoming IMU data is itself quite poor due to the bad GPS signals. This approach has its limitations and is at best an estimate of the ground truth position in the real world.
At Civil Maps, we have introduced a novel loop closure technique that leverages machine vision while using aerial imagery and aerial LiDAR as the shared reference coordinate system. This is beneficial because the inertial system is onboard the plane instead of on the ground next to tall buildings. GPS error and IMU drift are drastically reduced when the inertial system is in the sky with a clear line of sight to the satellite constellations above. This allows us to obtain very accurate trajectory information that can be corrected with 30–50 ground control points for roughly 1,000 linear kilometers of road network, compared to terrestrial-only loop closure, which needs one every 100 meters. That is a reduction of more than two orders of magnitude in the ground truthing requirements. We use aerial datasets that go through aerotriangulation to create a very accurate shared coordinate system. The shared coordinate system is the ground truth georeference dataset that allows us to fuse point clouds from multiple sources.
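To make the ground control comparison concrete, here is the arithmetic behind it as a short calculation. It uses only the figures quoted above and assumes that "once every 100 meters" means one ground control point per 100 meters of road; the variable names are ours.

```python
ROAD_KM = 1_000              # linear kilometers of road network quoted above
TERRESTRIAL_SPACING_M = 100  # assumed: one control point per 100 m, terrestrial only
AERIAL_GCPS = (30, 50)       # control points quoted for the aerial-referenced approach

# Control points needed over the same network with terrestrial-only loop closure.
terrestrial_gcps = ROAD_KM * 1_000 // TERRESTRIAL_SPACING_M

# Reduction factor at each end of the 30-50 range, rounded to whole numbers.
reduction = [round(terrestrial_gcps / n) for n in AERIAL_GCPS]

print(terrestrial_gcps)  # 10000
print(reduction)         # [333, 200]
```

Under that assumption the surveying effort drops by a factor of roughly 200 to 333.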
It takes 8–12 hours to cover 100 sq. kilometers with a 40–50 points per square meter (ppsqm) point cloud via airplane, versus multiple months of driving to collect data. This is a clear example of the type of simple ingenuity we celebrate and prioritize at Civil Maps. Not all problems need to be complex; we can often simplify the problem so that the solution is tangible and elegant. We flew the airplane to get highly accurate, low resolution point cloud data. Using a combination of imagery and LiDAR, we obtain XYZ corrections by finding common features between our aerial and terrestrial point clouds. Once the corrections are obtained, we fuse them into the odometry trajectory to create a corrected trajectory that is accurate to 5cm.
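The correction step can be illustrated with a deliberately simplified sketch. It assumes the common features have already been matched between the two point clouds, and it estimates a single translation-only XYZ offset for a trajectory segment; a real pipeline would also have to handle rotation, outliers and corrections that vary along the trajectory. The function names `mean_offset` and `apply_correction` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def mean_offset(terrestrial: List[Point], aerial: List[Point]) -> Point:
    """Average XYZ difference (aerial minus terrestrial) over matched feature pairs."""
    if not terrestrial or len(terrestrial) != len(aerial):
        raise ValueError("need equally sized, non-empty lists of matched features")
    n = len(terrestrial)
    sums = [0.0, 0.0, 0.0]
    for t, a in zip(terrestrial, aerial):
        for axis in range(3):
            sums[axis] += a[axis] - t[axis]
    return (sums[0] / n, sums[1] / n, sums[2] / n)


def apply_correction(trajectory: List[Point], offset: Point) -> List[Point]:
    """Shift every trajectory position by the estimated offset."""
    return [(x + offset[0], y + offset[1], z + offset[2]) for x, y, z in trajectory]


# The same two features as seen from the ground (drifted) and from the air (reference).
terrestrial_features = [(10.0, 5.0, 1.0), (20.0, 5.0, 1.0)]
aerial_features = [(10.5, 4.75, 1.25), (20.5, 4.75, 1.25)]

offset = mean_offset(terrestrial_features, aerial_features)
corrected = apply_correction([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], offset)

print(offset)     # (0.5, -0.25, 0.25)
print(corrected)  # [(0.5, -0.25, 0.25), (1.5, -0.25, 0.25)]
```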
Very few mapping companies can claim accuracies of 5cm; most base maps today range between 15cm and 25cm accuracy, based on ground control and complex RTK trajectories. However, accuracy measured using terrestrial RTK cannot be trusted in complex urban environments. The terrestrial-only approach doesn’t work: there are too many variables that can disrupt the quality of the sensor fusion or the validation methodology. Misreporting of the RTK covariance is the number one cause of bad sensor fusion quality. The covariance from RTK cannot be trusted because the solution generating the trajectory is essentially measuring its own accuracy.
In our case, we compare independent datasets against each other, one of which is certified to be highly accurate and not prone to GPS drift due to trees, buildings or rapid changes in acceleration. With this approach, we can confidently measure our mapping accuracy and rapidly scale by integrating each terrestrial trip into the shared coordinate system that was created from the aerial imagery and aerial LiDAR datasets. So, with a high degree of confidence, our HD maps are certified to be 5cm accurate, and we are proud to say that we implemented and brought to fruition a methodology for creating best-in-class base maps. This opens up new possibilities for creating scalable city scale HD maps. We call this a vision positioning system, and it is a viable option to replace GPS in any city where we have base map coverage.
After loop closure, we proceed to feature extraction. Our feature extraction pipeline divides the map into separate geospatial areas. Using map reduce via Airflow, individual Docker containers pick up the reference point cloud data published after loop closure and extract features from it. This process extracts the relevant lane features and geometry to create the vector geometry required by the map.
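The divide-and-process pattern above can be sketched in a few lines. This is an in-process illustration only: it does not use the Airflow API or containers, the square tiling scheme and the 100-meter tile size are assumptions, and `extract_tile` is a placeholder that computes simple statistics where a real worker would extract lane geometry.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
TileKey = Tuple[int, int]


def partition(points: List[Point], tile_size_m: float) -> Dict[TileKey, List[Point]]:
    """Group points into square tiles keyed by integer (column, row) indices."""
    tiles: Dict[TileKey, List[Point]] = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / tile_size_m), math.floor(y / tile_size_m))
        tiles[key].append((x, y, z))
    return dict(tiles)


def extract_tile(points: List[Point]) -> dict:
    """Map step (placeholder): each tile can be processed by an independent worker."""
    return {"point_count": len(points), "max_z": max(p[2] for p in points)}


def merge(results: Dict[TileKey, dict]) -> dict:
    """Reduce step: combine the per-tile outputs into one map-level summary."""
    return {
        "tiles": len(results),
        "point_count": sum(r["point_count"] for r in results.values()),
    }


points = [(1.0, 2.0, 0.1), (3.0, 4.0, 0.2), (105.0, 2.0, 0.3), (-1.0, 2.0, 0.4)]
tiles = partition(points, tile_size_m=100.0)
per_tile = {key: extract_tile(pts) for key, pts in tiles.items()}  # map
summary = merge(per_tile)                                          # reduce

print(sorted(tiles))  # [(-1, 0), (0, 0), (1, 0)]
print(summary)        # {'tiles': 3, 'point_count': 4}
```

Because tiles do not depend on each other in the map step, adding workers shortens the wall-clock time until the reduce step becomes the limit.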
After the vector layer is created, we create triggers, which are connectors between individual lane centerlines. Once the triggers are created, a separate job using 3D raycasting in Unity creates the neighboring lane semantics. This automates the semantic relationships for lane departures through intersections and lane merges.
Tooling constraints become a big issue as the toolchain migrates from small scale PoC projects to neighborhood scale maps to city scale maps. The data needs to be reorganized and presented to the visualization layer using different indexing methodologies. The backend also needs changes to quickly filter and serve content on demand. Additionally, syncing and incremental updates between edge devices and the cloud infrastructure are important for quickly sending change events from the database to all client devices or vehicles.
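One common way to deliver incremental updates of this kind is a sequence-numbered change log, where each client remembers the last event it applied and asks only for newer ones. The sketch below shows that general pattern and is not a description of our actual sync protocol; `ChangeEvent`, `ChangeLog` and `EdgeClient` are hypothetical names, and the log is held in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeEvent:
    sequence: int     # 1-based, contiguous position in the log
    feature_id: str
    payload: dict


@dataclass
class ChangeLog:
    events: List[ChangeEvent] = field(default_factory=list)

    def append(self, feature_id: str, payload: dict) -> ChangeEvent:
        event = ChangeEvent(len(self.events) + 1, feature_id, payload)
        self.events.append(event)
        return event

    def since(self, sequence: int) -> List[ChangeEvent]:
        # Sequences are contiguous from 1, so a slice returns everything newer.
        return self.events[sequence:]


@dataclass
class EdgeClient:
    last_sequence: int = 0
    features: Dict[str, dict] = field(default_factory=dict)

    def sync(self, log: ChangeLog) -> int:
        """Apply only the events this client has not seen; return how many."""
        pending = log.since(self.last_sequence)
        for event in pending:
            self.features[event.feature_id] = event.payload
            self.last_sequence = event.sequence
        return len(pending)


log = ChangeLog()
vehicle = EdgeClient()

log.append("lane-1", {"speed_limit": 40})
log.append("lane-2", {"speed_limit": 25})
first_sync = vehicle.sync(log)   # both events are new to this client

log.append("lane-1", {"speed_limit": 30})
second_sync = vehicle.sync(log)  # only the latest edit is transferred

print(first_sync, second_sync)   # 2 1
```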
As our scale expanded, several of our tools needed to be refactored to handle the compute, memory and network constraints. Team MapX worked with our product and engineering teams to profile and optimize each tool within our stack. In addition to performance, we profiled the user friction involved in using each tool in order to improve the efficiency of the individuals QC’ing the map. Different incentives, along with double-blind and triple-blind experiments, were put in place to guarantee the quality of the map. As the scale increased, small gains in reducing user friction or improving the product experience paid huge dividends in team productivity and map creation speed.
Quality control is the last stage of the map creation process. We have several tools that stochastically sample the map data to certify the accuracy of the map.
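A minimal sketch of sampling-based quality control follows. It assumes an independent reference position is available for each sampled feature and checks horizontal error only; the function name `sample_qc`, the feature ids and the 0.05-meter tolerance (chosen to mirror the 5cm figure) are illustrative assumptions rather than our actual tooling.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def sample_qc(
    map_features: Dict[str, Point],
    reference: Dict[str, Point],
    sample_size: int,
    tolerance_m: float,
    seed: int = 0,
) -> dict:
    """Check a random sample of map features against an independent reference."""
    rng = random.Random(seed)  # seeded so that an audit can be reproduced
    ids = sorted(map_features)
    sampled = rng.sample(ids, min(sample_size, len(ids)))
    failed: List[str] = []
    for feature_id in sampled:
        mx, my = map_features[feature_id]
        rx, ry = reference[feature_id]
        if math.hypot(mx - rx, my - ry) > tolerance_m:
            failed.append(feature_id)
    pass_rate = 1.0 - len(failed) / len(sampled) if sampled else 0.0
    return {"sampled": len(sampled), "failed": failed, "pass_rate": pass_rate}


map_features = {
    "sign-1": (0.0, 0.0),
    "sign-2": (5.0, 5.0),
    "sign-3": (9.0, 1.0),
    "sign-4": (10.2, 0.0),  # deliberately 20cm away from its reference position
}
reference = {
    "sign-1": (0.0, 0.0),
    "sign-2": (5.0, 5.0),
    "sign-3": (9.0, 1.0),
    "sign-4": (10.0, 0.0),
}

# Sampling every feature makes this small example deterministic.
report = sample_qc(map_features, reference, sample_size=4, tolerance_m=0.05)
print(report)  # {'sampled': 4, 'failed': ['sign-4'], 'pass_rate': 0.75}
```

On a city scale map the sample would be a small fraction of all features, and the pass rate would be read as an estimate for the whole map rather than an exact count.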
Once quality control is done, the map and its features are published to our web services for consumption. A city scale HD Map can be several terabytes of data. At Civil Maps, we convert the raw point cloud data within the HD map into what we call a fingerprint base map. The fingerprint base map is an encoded version of the raw point cloud data that is 10,000 times smaller. Fingerprint base maps can be used for planning, routing, localization and various other applications. In summary, this new pipeline allows us to reduce costs by a factor of 20, and we are excited to share those savings with the industry.
On the edge, we support two localization methodologies in our vision positioning system.
The first approach uses fingerprint base map technology to localize the car, and we are continuing to improve and refine it. Our newer localization approach uses camera imagery to compute the pose estimate. We will expand on this topic in our next blog post.
As always, we are by no means done. There is a lot of work ahead of us, but this is a good milestone at which to stop and reflect on our accomplishments to date.
Special thanks to Anuj Gupta, Nicholas Stanley, Naresh Sunkara, Saket Sonekar, Satya Vakkaleri, Rajashekar Bhuma, Scott Harvey and other members of Civil Maps for supporting this Team MapX initiative.