DeviationAnalysis: going for speed.
This is the first post of a series about technical insights from the development of the new version of DeviationAnalysis, one of the main tools of cloud2model.
The performance of DeviationAnalysis has been radically improved in the new 1.13 release, making it up to 15X faster than the previous version. To achieve this level of performance, the internal architecture of the tool has been fully reworked.
Cloud extractions.
The previous system was based on making a separate point cloud extraction for each individual deviation point to study (red points in the image above). The extraction volume is determined by the deviation radius and the max. distance. Typically it is a small prismatic volume oriented with the face normal at the point (in the image above the volume sizes are exaggerated for clarity). If the volume is very small, only a few points are extracted, even with high density point clouds. And the fewer points extracted, the fewer analytical calculations are needed when comparing the original deviation point with all the extracted points. From the point of view of the deviation calculations, this approach looks reasonable, performance-wise.
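As a rough illustration of that per-point approach, the sketch below builds a small multi-plane filter around one deviation point and runs a single GetPoints call with the Revit API. The helper name, the average-distance value and the point limit are illustrative assumptions, not the actual cloud2model code, and coordinate transforms between model and cloud space are glossed over.

```csharp
// Hypothetical sketch of the previous per-point extraction (illustrative only).
using System.Collections.Generic;
using Autodesk.Revit.DB;
using Autodesk.Revit.DB.PointClouds;

static class PerPointExtraction
{
    public static PointCollection ExtractAround(
        PointCloudInstance cloud,
        XYZ deviationPoint,      // deviation point on the model face
        XYZ faceNormal,          // face normal at that point (unit length)
        double deviationRadius,
        double maxDistance)
    {
        // Two in-plane directions perpendicular to the face normal.
        XYZ u = faceNormal.CrossProduct(XYZ.BasisZ);
        if (u.IsZeroLength()) u = faceNormal.CrossProduct(XYZ.BasisX);
        u = u.Normalize();
        XYZ v = faceNormal.CrossProduct(u).Normalize();

        // Six planes bounding a small prism: deviationRadius sideways,
        // maxDistance along the normal. The outward plane orientation used
        // here is an assumption of this sketch.
        var planes = new List<Plane>
        {
            Plane.CreateByNormalAndOrigin(u,                   deviationPoint + u * deviationRadius),
            Plane.CreateByNormalAndOrigin(u.Negate(),          deviationPoint - u * deviationRadius),
            Plane.CreateByNormalAndOrigin(v,                   deviationPoint + v * deviationRadius),
            Plane.CreateByNormalAndOrigin(v.Negate(),          deviationPoint - v * deviationRadius),
            Plane.CreateByNormalAndOrigin(faceNormal,          deviationPoint + faceNormal * maxDistance),
            Plane.CreateByNormalAndOrigin(faceNormal.Negate(), deviationPoint - faceNormal * maxDistance),
        };

        PointCloudFilter filter = PointCloudFilterFactory.CreateMultiPlaneFilter(planes);

        // One expensive API call per deviation point: this is what adds up.
        return cloud.GetPoints(filter, 0.001, 10000);
    }
}
```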
But there is a big issue with this system: point cloud extractions are “expensive” operations in the context of the Revit API. It does not matter that the extraction volume is very small and only a few points are found. There is some improvement if they are parallelized, but it is small.
Making a separate point cloud extraction for each deviation point implies hundreds or thousands of these expensive operations per analysis. All these point cloud extractions were the performance bottleneck of the tool.
Sectors.
The new approach is based on greatly reducing the number of point cloud extractions needed. We divide the face into big areas or sectors, with enough overlap to ensure that any deviation point lies far enough inside a sector to accommodate the deviation radius. And we make just one point cloud extraction for each sector.
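A hypothetical sketch of the sector layout is shown below: the UV bounding box of the face is split into a grid of cells, and each cell is expanded by the deviation radius so neighbouring sectors overlap. Names and types are illustrative, and the code assumes a planar face whose UV parametrization is in model units.

```csharp
// Illustrative sketch of the overlapping sector grid (not the actual code).
using System;
using System.Collections.Generic;
using Autodesk.Revit.DB;

readonly struct SectorUV
{
    public SectorUV(UV min, UV max) { Min = min; Max = max; }
    public UV Min { get; }
    public UV Max { get; }
}

static class SectorLayout
{
    public static List<SectorUV> Split(Face face, double sectorSize, double deviationRadius)
    {
        BoundingBoxUV bb = face.GetBoundingBox();
        int cols = (int)Math.Ceiling((bb.Max.U - bb.Min.U) / sectorSize);
        int rows = (int)Math.Ceiling((bb.Max.V - bb.Min.V) / sectorSize);

        var sectors = new List<SectorUV>(cols * rows);
        for (int i = 0; i < cols; i++)
        {
            for (int j = 0; j < rows; j++)
            {
                // Each cell is grown by the deviation radius: that overlap keeps
                // the full neighbourhood of any deviation point inside one sector.
                double u0 = bb.Min.U + i * sectorSize - deviationRadius;
                double v0 = bb.Min.V + j * sectorSize - deviationRadius;
                double u1 = Math.Min(bb.Max.U, bb.Min.U + (i + 1) * sectorSize) + deviationRadius;
                double v1 = Math.Min(bb.Max.V, bb.Min.V + (j + 1) * sectorSize) + deviationRadius;
                sectors.Add(new SectorUV(new UV(u0, v0), new UV(u1, v1)));
            }
        }
        return sectors;
    }
}
```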
From the point of view of the deviation calculations, the new system can look a bit scary at first sight. The extraction volumes are comparatively huge, and the number of points extracted is several orders of magnitude bigger. That implies many more calculations when checking each deviation point against all the extracted points of the sector that contains it. But these calculations can be optimized and parallelized very efficiently.
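As a minimal illustration of that per-sector pass, the sketch below checks every deviation point against all the extracted points of its sector, keeping the signed offset along the face normal of the closest cloud point within the deviation radius, and parallelizes over the deviation points. This is a simplified interpretation of the deviation metric, not the actual implementation.

```csharp
// Simplified per-sector deviation pass (illustrative only).
using System.Threading.Tasks;
using Autodesk.Revit.DB;

static class SectorDeviation
{
    // cloudPoints: the points extracted once for the whole sector.
    // Returns the signed deviation along the face normal for each analysis
    // point, or double.NaN when no cloud point lies within the radius.
    public static double[] Compute(
        XYZ[] deviationPoints, XYZ faceNormal, XYZ[] cloudPoints, double deviationRadius)
    {
        double radiusSq = deviationRadius * deviationRadius;
        var result = new double[deviationPoints.Length];

        // Each deviation point is independent, so the outer loop parallelizes trivially.
        Parallel.For(0, deviationPoints.Length, i =>
        {
            XYZ p = deviationPoints[i];
            double best = double.NaN;
            double bestLateralSq = double.MaxValue;

            foreach (XYZ c in cloudPoints)
            {
                XYZ d = c - p;
                double along = d.DotProduct(faceNormal);            // offset along the normal
                double lateralSq = d.DotProduct(d) - along * along; // distance measured on the face
                if (lateralSq <= radiusSq && lateralSq < bestLateralSq)
                {
                    bestLateralSq = lateralSq;
                    best = along;
                }
            }
            result[i] = best;
        });
        return result;
    }
}
```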
A lot of research was focused on finding the best balanced size for the sectors. Not too big, which would imply too many deviation calculations. But not too small either, to avoid too many point cloud extractions. The result is a great reduction of the needed point cloud extractions, while keeping the processing time of the calculations small.
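A toy cost model makes the trade-off visible. The per-face time has one term that shrinks with bigger sectors (fewer extractions) and one that grows with them (more comparisons per sector); all names and the cost structure below are assumptions for illustration only.

```csharp
// Toy cost model for the sector size trade-off (assumed structure, not measured data).
static class SectorCostModel
{
    public static double EstimatedTimePerFace(
        double faceArea,                // area of the analysed face
        double sectorSize,              // side length of a (square) sector
        double extractionCostPerSector, // fixed overhead of one point cloud extraction
        double cloudDensity,            // cloud points per unit area
        double analysisDensity,         // deviation points per unit area
        double costPerComparison)       // one deviation point vs one cloud point
    {
        double sectors = faceArea / (sectorSize * sectorSize);
        double cloudPtsPerSector = cloudDensity * sectorSize * sectorSize;
        double devPtsPerSector = analysisDensity * sectorSize * sectorSize;

        double extraction = sectors * extractionCostPerSector;                                  // falls as sectors grow
        double calculation = sectors * devPtsPerSector * cloudPtsPerSector * costPerComparison; // grows with sector area
        return extraction + calculation;
    }
}
```

In this model the extraction term falls as 1/sectorSize² while the calculation term grows as sectorSize², which is exactly the balance described above.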
Batches.
The previous system processed each individual face separately, including a viewport update to show the analysis result of that face. This was interesting from the point of view of user feedback. But in cases with many small faces, like an irregular floor slab, the processing was clearly slower than with a few big faces.
This has been addressed in the new system by combining the sectors of all the different faces into homogeneous batches to be processed. In this way we maintain a similar CPU load during the whole processing time. The efficiency is very similar for big faces, small faces, or a combination of both.
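One simple way to build such homogeneous batches is a greedy balance by estimated point count, as in the purely illustrative sketch below; the post does not describe the actual batching logic in detail.

```csharp
// Illustrative greedy batching of sectors by estimated work (not the actual code).
using System;
using System.Collections.Generic;
using System.Linq;

static class SectorBatching
{
    // sectors: every sector of every face, paired with its estimated point count.
    public static List<List<T>> BuildBatches<T>(
        IEnumerable<(T Sector, int PointCount)> sectors, int batchCount)
    {
        var batches = new List<List<T>>();
        var loads = new int[batchCount];
        for (int i = 0; i < batchCount; i++) batches.Add(new List<T>());

        // Biggest sectors first, always into the currently lightest batch,
        // so every batch ends up with a similar amount of work.
        foreach (var s in sectors.OrderByDescending(x => x.PointCount))
        {
            int target = Array.IndexOf(loads, loads.Min());
            batches[target].Add(s.Sector);
            loads[target] += s.PointCount;
        }
        return batches;
    }
}
```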
Additionally, we make just one global viewport update at the end of the command. It is true that we have lost the partial viewport update for each individual face, but the new system is so much faster with many faces that it is more than worth it.
GPU kernels.
One great advantage of the new system is that the point cloud extraction is dissociated from the division into analysis points. As the image above shows, we have the same sector layout for medium density (left) and high density (right). Now, using the high density of points no longer implies processing times as long as before. With the old system you could typically expect 3-4X longer times compared with the default medium density. With the new system it is just around 1.25X.
With medium density the actual deviation calculations are just a very small part of the processing time; most of it is the point cloud extraction itself. But with high density we start to notice their influence a bit more. Is there a way to improve this? Yes, using the GPU.
The deviation calculations can be organised very well to be processed in a GPU kernel. But we have the issue of the amount of data that needs to be copied to GPU memory. The transfer from CPU memory to GPU memory is relatively slow, and if we need to copy huge amounts of data, the time involved can offset any performance gain in the actual calculations.
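To make the idea concrete, here is a hedged sketch of such a kernel written with ILGPU, chosen here purely for illustration since the post does not say which GPU runtime is actually used. One GPU thread per deviation point scans all the cloud points of its sector, mirroring the CPU loop sketched earlier; all names are illustrative.

```csharp
// Illustrative GPU version of the deviation pass using ILGPU (not the actual code).
using ILGPU;
using ILGPU.Runtime;

static class GpuDeviation
{
    // devPts and cloudPts are flat XYZ arrays: x0, y0, z0, x1, y1, z1, ...
    static void DeviationKernel(
        Index1D i,
        ArrayView<float> devPts, ArrayView<float> cloudPts, ArrayView<float> result,
        int cloudCount, float nx, float ny, float nz, float radiusSq)
    {
        int p = i.X * 3;
        float px = devPts[p], py = devPts[p + 1], pz = devPts[p + 2];
        float best = float.NaN;
        float bestLateralSq = float.MaxValue;

        for (int j = 0; j < cloudCount; j++)
        {
            float dx = cloudPts[j * 3] - px;
            float dy = cloudPts[j * 3 + 1] - py;
            float dz = cloudPts[j * 3 + 2] - pz;
            float along = dx * nx + dy * ny + dz * nz;                     // signed deviation
            float lateralSq = dx * dx + dy * dy + dz * dz - along * along; // distance on the face
            if (lateralSq <= radiusSq && lateralSq < bestLateralSq)
            {
                bestLateralSq = lateralSq;
                best = along;
            }
        }
        result[i] = best;
    }

    public static float[] Run(float[] devPts, float[] cloudPts, float[] normal, float radius)
    {
        using var context = Context.CreateDefault();
        using var accelerator = context.GetPreferredDevice(preferCPU: false).CreateAccelerator(context);

        using var devBuf = accelerator.Allocate1D(devPts);
        using var cloudBuf = accelerator.Allocate1D(cloudPts); // the big CPU-to-GPU transfer
        using var outBuf = accelerator.Allocate1D<float>(devPts.Length / 3);

        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<float>, ArrayView<float>, ArrayView<float>,
            int, float, float, float, float>(DeviationKernel);

        kernel(devPts.Length / 3, devBuf.View, cloudBuf.View, outBuf.View,
               cloudPts.Length / 3, normal[0], normal[1], normal[2], radius * radius);
        accelerator.Synchronize();
        return outBuf.GetAsArray1D();
    }
}
```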
In this case, we need to copy the point cloud extraction of each sector, which can be up to 1 million points. Exceptionally, the Revit API exposes the internal native pointer of the extracted points, which allows copying them fast enough. The actual calculations are blazing fast even on a low-tier dedicated GPU with only around 1000 CUDA cores.
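A hedged sketch of that fast copy path could look like the one below. It assumes the buffer returned by PointCollection.GetPointBufferPointer() holds CloudPoint structs; the point count is passed in by the caller, and the resulting flat array is exactly what a kernel like the one sketched above would take in a single upload.

```csharp
// Illustrative fast copy from the native point buffer exposed by the Revit API.
// Requires <AllowUnsafeBlocks> in the project.
using Autodesk.Revit.DB.PointClouds;

static class CloudPointMarshal
{
    // count: number of points in the collection (how it is obtained is omitted here).
    public static unsafe float[] ToFlatXyz(PointCollection points, int count)
    {
        var xyz = new float[count * 3];

        // Read the coordinates straight from native memory instead of
        // enumerating the collection point by point.
        CloudPoint* buffer = (CloudPoint*)points.GetPointBufferPointer().ToPointer();
        for (int i = 0; i < count; i++)
        {
            xyz[i * 3]     = buffer[i].X;
            xyz[i * 3 + 1] = buffer[i].Y;
            xyz[i * 3 + 2] = buffer[i].Z;
        }
        return xyz; // ready to be handed to the GPU in one transfer
    }
}
```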
If we use the GPU, there is almost no difference between the processing times of medium and high point density.