How we can accelerate orthomosaic generation with GPUs
One of the most common outputs in photogrammetry is a 2D map, also known as an orthomosaic. It is used in agriculture, public safety, and surveying amongst other applications. Pix4D wanted to generate this output as fast as possible. Our team explored hardware acceleration as a solution.
The information in this article is provided by Vincent Dercksen, the R&D team manager for our Core-Vision unit.
Why use hardware acceleration?
Many consumer-grade computers nowadays are equipped with powerful Graphics Processing Units (GPUs): extra processors in addition to the regular CPUs. GPUs are primarily used for graphics, but they can also be used for deep learning, image processing, bitcoin mining, and other computational tasks. Because this type of processor is so powerful at number crunching, significant acceleration can potentially be achieved by transferring processing tasks from the CPU to the GPU.
We wanted to leverage this processing power to reduce the time users wait for the orthomosaic image to be created. Faster results would be especially beneficial for our PIX4Dfields and PIX4Dreact customers, who process on laptops in the field. An orthomosaic is composed of many input images, which have to be projected, color-blended, and merged. Since creating an orthomosaic is ultimately a rendering task, using the GPUs of consumer-grade PCs and laptops to accelerate it is a promising approach.
Why choose Vulkan?
There are several different technologies (APIs) for using GPUs from C++ code, but support differs between operating systems and hardware manufacturers: Direct3D (Windows only), CUDA (NVIDIA only), Metal (Apple only), and OpenGL (reaching end-of-life, with support getting worse on Windows and Mac). We had to decide which one to pick.
Vulkan is supported by all major operating systems and hardware manufacturers and is also available on mobile devices. It is intended to be the successor of OpenGL, and is defined and supported by a wide range of companies united in the Khronos Group.
However, it is a relatively new technology. Learning resources are not as abundant as for OpenGL or CUDA, which are more mature technologies. It is not 100% clear if the industry will adopt Vulkan on a scale comparable to OpenGL. So there is a certain risk that this will not stand the test of time. Nevertheless, the benefit of cross-platform support was so big that we chose to pursue it.
What was the process?
There were three key obstacles to overcome:
1. Learning how to use Vulkan
The first obstacle was that we knew we had a powerful piece of technology in our hands, but needed to learn how to use it. The Vulkan API is very low-level. This means that your code has to specify precisely what you want the GPU to do, which results in a lot of detail-heavy code. At the same time, this gives you a lot of flexibility to tweak the behavior and performance. Resources for learning Vulkan are currently not as abundant as for some other APIs: there are the specification, a handful of books, some online tutorials, and example code.
This was enough to get us started. The rest, including how to manage a large piece of GPU code (examples usually only show a small aspect), was learnt on the job. Vulkan comes with quite a steep learning curve, but once you are past that, it opens up many great opportunities.
2. Adapting the algorithms
The second obstacle was an algorithmic one: we already had quite a fast algorithm for generating an orthomosaic from potentially thousands of images that runs on the CPU, but how could we transfer that to the GPU? The GPU programming model is quite different: you have to think about shaders, render passes, textures, uploading/downloading images between main memory and GPU memory, etc. We needed to rethink the algorithm and frame it in the GPU model, considering sub-problems like image selection, multi-image blending, and occlusion detection.
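To make the multi-image blending sub-problem more concrete, here is a minimal CPU-side sketch in Python. It is not Pix4D's algorithm: the function name blend_pixel and the weighting scheme are assumptions made for illustration. Each output pixel receives color samples from the images that see it, each with a weight (for example, higher when the pixel is near the image center), and occluded samples are given a weight of zero. On a GPU, the same per-pixel computation would typically run in a shader for all pixels in parallel.

```python
from typing import Optional, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def blend_pixel(samples: Sequence[Tuple[Pixel, float]]) -> Optional[Pixel]:
    """Weighted average of the color samples that cover one output pixel.

    Each sample is (color, weight). Samples with weight <= 0 (for example,
    occluded views) are ignored. Returns None if no image contributes.
    Hypothetical helper for illustration only.
    """
    visible = [(color, weight) for color, weight in samples if weight > 0]
    total = sum(weight for _, weight in visible)
    if total <= 0:
        return None
    return tuple(
        sum(color[channel] * weight for color, weight in visible) / total
        for channel in range(3)
    )


# Two images see the same ground point; the second is trusted three times more.
blended = blend_pixel([((100.0, 0.0, 0.0), 1.0), ((200.0, 0.0, 0.0), 3.0)])
print(blended)
```

Because every output pixel is computed independently of the others, this kind of computation maps naturally onto the parallelism of a GPU.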
Additionally, the algorithm has different variants, e.g. for RGB as well as radiometrically corrected multispectral images, or deghosting for moving object removal. We wanted GPU implementations of those as well. Framing the algorithm in the GPU programming model has to be done very carefully, exploiting its strengths (computation, parallelism) and avoiding its weaknesses (limited memory, data transfer to and from the GPU). Otherwise, the performance improvement may not be significant, and the effort of implementing and maintaining a GPU version of the code would not be worth it.
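One common way to cope with limited GPU memory is to process the output raster in tiles that each fit within a memory budget. The article does not say whether Pix4D does this, so the sketch below is an assumption, and the names max_tile_size and iter_tiles are hypothetical.

```python
import math
from typing import Iterator, Tuple


def max_tile_size(budget_bytes: int, bytes_per_pixel: int) -> int:
    """Edge length of the largest square tile whose pixel buffer fits the budget."""
    if budget_bytes <= 0 or bytes_per_pixel <= 0:
        raise ValueError("budget_bytes and bytes_per_pixel must be positive")
    return math.isqrt(budget_bytes // bytes_per_pixel)


def iter_tiles(width: int, height: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) rectangles that cover a width x height raster.

    Tiles on the right and bottom edges are clipped to the raster bounds.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


# A 5 x 3 raster split into tiles of at most 2 x 2 pixels.
for rect in iter_tiles(5, 3, 2):
    print(rect)
```

Each tile can then be uploaded, rendered, and downloaded on its own. Larger tiles mean fewer transfers but more memory per pass, so the tile size is a trade-off between the two weaknesses mentioned above.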
3. Product-proof code
The final obstacle was making the code product-proof. How could we ensure that it works on the many different GPUs that are out there, varying between models and vendors, all with differing performance, memory size, and driver versions? Does the MoltenVK translation layer that is required on Apple platforms work as expected? We invested in a representative set of GPU models from different vendors to be able to cover a wide variety. We also involved the entire company, spread across the world, in running tests for our team on all their different machines, which was a great help. Finally, we made sure that if a problem occurs, the software seamlessly falls back to the CPU implementation. However, after running in production for about a year, issues have been minimal and we can say that the Vulkan API lives up to its promise of being a widely supported cross-platform industry standard.
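The fallback idea can be sketched as a small wrapper: try the GPU path first and, if it fails, log the problem and run the CPU path instead. This is only an illustration of the pattern, not Pix4D's code. The names GpuUnavailableError and run_with_cpu_fallback are hypothetical, and a real implementation would translate Vulkan error codes (for example, device loss or out-of-memory results) into such an exception.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("orthomosaic")


class GpuUnavailableError(RuntimeError):
    """Hypothetical error raised when the GPU path cannot complete."""


def run_with_cpu_fallback(gpu_impl: Callable[[], T], cpu_impl: Callable[[], T]) -> T:
    """Run gpu_impl; if it raises GpuUnavailableError, run cpu_impl instead."""
    try:
        return gpu_impl()
    except GpuUnavailableError as exc:
        log.warning("GPU path failed (%s); falling back to CPU", exc)
        return cpu_impl()


def failing_gpu() -> str:
    # Stand-in for a GPU run that hits a driver or memory problem.
    raise GpuUnavailableError("no suitable device")


def cpu() -> str:
    # Stand-in for the existing CPU implementation.
    return "cpu result"


print(run_with_cpu_fallback(failing_gpu, cpu))
```

Catching only a dedicated exception type keeps genuine programming errors visible instead of silently hiding them behind the slower CPU path.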
What comes next?
The GPU implementation of the orthomosaic generation is now present in several Pix4D products. The performance improvement we attained is significant: we measured speedups of 4 to 5 times, and up to 8 times in some scenarios, saving the user precious time. Given this success, we see the benefit of using Vulkan for other computational or rendering parts of our software products in the near future.
Got questions, or want to get involved? Join the conversation on LinkedIn and follow our hashtag #Pix4DLabs