GPU Aggregation Algorithm Scales Well with Multiple GPUs

A few weeks ago, we had the chance to test the Palo GPU Accelerator on a HighStation 550 XLR8 with 8 Tesla C1060 GPUs, provided by the French company Carri Systems. Eight is currently the maximum number of GPUs that fit on a single motherboard (and only one type of motherboard can hold 8 GPUs). Our main interest was how well the GPU aggregation algorithm scales beyond 4 GPUs. The upper part of the picture below shows the 8 GPU boards fitted tightly next to each other.

[Photo: the 8 GPU boards fitted side by side in the HighStation 550 XLR8]

The figure below shows how query performance scales with the number of GPUs for a real-world report and database from a prospective Jedox customer. Scaling is almost linear up to 4 GPUs and continues up to a speedup factor of 5.7 with 8 GPUs.

[Figure: query speedup versus number of GPUs for the full test report]
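
One common way to express how close a result is to linear scaling is parallel efficiency, E = S / n: the measured speedup S divided by the number of GPUs n. The short Python sketch below applies this to the 8-GPU measurement; the helper name `parallel_efficiency` is ours, chosen for illustration, and is not part of Palo.

```python
def parallel_efficiency(speedup: float, n_gpus: int) -> float:
    """Fraction of ideal (linear) speedup achieved: E = S / n."""
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    return speedup / n_gpus

# Overall speedup reported for the full test report on 8 GPUs.
print(f"{parallel_efficiency(5.7, 8):.0%}")  # prints 71%
```

Perfectly linear scaling would give E = 1.0 (100%); a speedup of 5.7 on 8 GPUs corresponds to roughly 71% of that ideal.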

We believe that Palo GPU performance scaling has no theoretical limit and is bounded only by:
(a) hardware limitations, i.e. the number of GPUs that can be used in one system;
(b) the cube size: cubes with very few filled cells will not benefit much from multiple GPUs;
(c) the specific queries sent to the server: queries that are “simple”, in the sense that they are already fast, cannot be sped up much further.
The last point is most likely responsible for the slightly decreasing slope above 4 GPUs in our test. To see whether different queries scale differently, we broke the analysis above down by each of the four individual PALO.DATAC queries in the test report (each querying a different cube).

[Figure: speedup versus number of GPUs for each of the four PALO.DATAC queries (Cube 1 to Cube 4)]

As expected, some queries and/or cubes benefit significantly more from multiple GPUs than others. The first query (Cube 1) scales extremely well up to 7 GPUs, where it reaches a speedup factor of 6.4, but seems to level off there; the query on Cube 2 scales well all the way to 8 GPUs, whereas for Cube 3 and Cube 4 scaling essentially stops after 4 GPUs (the last value for Cube 3 may be due to a measurement error). Note, however, that these last two queries are already answered very quickly (< 100 ms) on 4 GPUs, so any further speedup would not be as noticeable as for the other two.
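
Why already-fast queries stop scaling can be illustrated with a simple model, which is our assumption and was not derived from the measurements above: each query has a fixed cost t_fixed that does not shrink with more GPUs, plus aggregation work t_parallel that is split evenly across n GPUs, so t(n) = t_fixed + t_parallel / n. This is Amdahl's law written in terms of time. The timings and the helper names `query_time_ms` and `speedup` below are hypothetical.

```python
# Hypothetical fixed-overhead model (Amdahl's law in time form):
#   t(n) = t_fixed + t_parallel / n
# t_fixed:    per-query cost that does not shrink with more GPUs (assumed)
# t_parallel: aggregation work split evenly across n GPUs (assumed)

def query_time_ms(t_fixed: float, t_parallel: float, n_gpus: int) -> float:
    """Modeled query time in milliseconds on n_gpus GPUs."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return t_fixed + t_parallel / n_gpus

def speedup(t_fixed: float, t_parallel: float, n_gpus: int) -> float:
    """Modeled speedup on n_gpus relative to a single GPU."""
    return query_time_ms(t_fixed, t_parallel, 1) / query_time_ms(t_fixed, t_parallel, n_gpus)

if __name__ == "__main__":
    # Made-up timings: same fixed cost, very different amounts of parallel work.
    heavy = (50.0, 5000.0)  # large cube, lots of aggregation work
    light = (50.0, 200.0)   # query that is already fast
    for n in (1, 2, 4, 8):
        print(f"{n} GPUs: heavy {speedup(*heavy, n):.2f}x, light {speedup(*light, n):.2f}x")
```

In this model the light query can never exceed (50 + 200) / 50 = 5x, however many GPUs are added, while the heavy query keeps gaining. This is qualitatively similar to the difference between Cubes 1 and 2 and Cubes 3 and 4, although the real overheads in Palo were not measured here.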
