Archive for the 'Business Intelligence' Category

Olap4j talks Palo – finally

A lot happens behind the scenes here in R&D at Jedox. Besides keeping pace with the release plan and supporting customer projects, our team also makes sure we stay compatible with outside trends. A few years ago, Julian Hyde, founder of the Mondrian project, asked whether we would like to join his initiative to develop olap4j, an open Java API for accessing OLAP data. You can think of it as the JDBC of the OLAP world.

[Image]

We agreed to join, although we had little, if any, connection to it back then. Time has passed: we eventually developed our ODBO/MDX Provider on top of Palo OLAP Server, and olap4j slowly but steadily grew into an unwritten standard. Last year, we saw a couple of applications emerge (e.g. Wabit) that already use olap4j and are able to connect to Palo.

Recently the olap4j project announced that it will soon reach version 1.0. Vlado, our Head of R&D, used this opportunity to test it against Palo. As it turned out, there was not much work to do. He checked out the latest source code, built it, and was ready to write his first olap4j-based application. He used the provided “hello world” sample. All he had to do was adjust the connection string to match the properties of the Palo XMLA endpoint.

This is what the sample looks like:

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;

import org.olap4j.CellSet;
import org.olap4j.OlapConnection;
import org.olap4j.OlapStatement;
import org.olap4j.OlapWrapper;
import org.olap4j.driver.xmla.XmlaOlap4jDriver;
import org.olap4j.layout.RectangularCellSetFormatter;

public class PaloConnection {

  public static void main(String[] args) throws Exception {
    // Register the olap4j XMLA driver with the JDBC DriverManager.
    Class.forName("org.olap4j.driver.xmla.XmlaOlap4jDriver");

    // Connection string adjusted to the Palo XMLA endpoint.
    Connection connection =
      DriverManager.getConnection(
        "jdbc:xmla:Server=http://localhost:4242;"
          + "User='admin';"
          + "Password='admin';"
          + "Catalog=FoodMart2005Palo;"
          + "Cube=Budget");

    // Unwrap the JDBC connection to get the OLAP-specific interface.
    OlapWrapper wrapper = (OlapWrapper) connection;
    OlapConnection olapConnection = wrapper.unwrap(OlapConnection.class);

    OlapStatement statement = olapConnection.createStatement();

    // Run an MDX query against the Budget cube.
    CellSet cellSet =
      statement.executeOlapQuery(
        "SELECT {[store].[USA]} ON COLUMNS , {[Account].[1000]} ON ROWS\n"
          + "FROM [Budget]");

    // Print the result as a text grid on standard output.
    RectangularCellSetFormatter formatter =
      new RectangularCellSetFormatter(false);

    PrintWriter writer = new PrintWriter(System.out);
    formatter.format(cellSet, writer);
    writer.flush();
    statement.close();
    connection.close();
  }
}
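
Since the connection string was the only part that needed adapting, it can be handy to assemble it from parameters rather than hard-coding it. The following is only a minimal sketch: the class PaloXmlaUrl and its build method are hypothetical helpers and not part of olap4j, and the property names are simply the ones used in the sample above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of olap4j): builds the XMLA connection URL
// from the same properties that the sample above hard-codes.
class PaloXmlaUrl {

    static String build(String serverUrl, String user, String password,
                        String catalog, String cube) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Server", serverUrl);
        props.put("User", "'" + user + "'");
        props.put("Password", "'" + password + "'");
        props.put("Catalog", catalog);
        props.put("Cube", cube);

        StringBuilder url = new StringBuilder("jdbc:xmla:");
        boolean first = true;
        for (Map.Entry<String, String> entry : props.entrySet()) {
            if (!first) {
                url.append(';');
            }
            url.append(entry.getKey()).append('=').append(entry.getValue());
            first = false;
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Reproduces the connection string used in the sample above.
        System.out.println(build("http://localhost:4242", "admin", "admin",
                                 "FoodMart2005Palo", "Budget"));
    }
}
```

The returned string can be passed to DriverManager.getConnection in place of the concatenated literal.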

GPU Aggregation Algorithm Scales Well with Multiple GPUs

A few weeks ago, we had the chance to test the Palo GPU Accelerator on a HighStation 550 XLR8 with 8 Tesla C1060 GPUs provided by the French company Carri Systems. Eight is the current maximum number of GPUs that can fit on one motherboard (and there is only one type of motherboard capable of holding 8 GPUs). Our main interest was how well the GPU aggregation algorithm would scale when more than 4 GPUs are used. In the upper part of the picture below you can see the 8 GPU Boards tightly fitted next to each other.

[Image]

The figure below shows how query performance scales for a real-world report and database from a prospective Jedox customer. As can be seen, performance scaling is almost linear for up to 4 GPUs and continues up to a speedup factor of 5.7 for 8 GPUs.

[Image]

We think that Palo GPU performance scaling has no theoretical limit and is bounded only by:
(a) hardware limitations, i.e. the number of GPUs that can be used in one system
(b) the cube size: cubes with very few filled cells will not benefit much from multiple GPUs
(c) the specific queries sent to the server: queries that are “simple” in the sense that they are already fast cannot be sped up much further
The last point is most probably responsible for the slightly decreasing slope above 4 GPUs in our test. To see whether different queries scale differently, we broke the above analysis down by each of the four individual PALO.DATAC queries in the test report (each querying a different cube).

[Image]

As expected, some queries and/or cubes benefit significantly more from multiple GPUs than others. The first query (Cube 1) scales extremely well up to 7 GPUs, where a speedup factor of 6.4 is achieved, but seems to end there. The query on Cube 2 scales well up to 8 GPUs, whereas for Cube 3 and Cube 4 scaling essentially stops after 4 GPUs (the last value of Cube 3 might be attributed to a measuring error). Note, however, that the latter two queries are already answered very fast (< 100 ms) on 4 GPUs, so any further speedup will not be as noticeable as for the other two.
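
One way to put these speedup factors into perspective is parallel efficiency, i.e. the speedup divided by the number of GPUs. This is a standard metric that we add here for illustration only. The sketch below uses the two speedup figures quoted in this post; the class and method names are made up.

```java
// Illustrative only: parallel efficiency = speedup / number of GPUs.
// The class and method names are hypothetical.
class GpuScaling {

    static double efficiency(double speedup, int gpus) {
        if (gpus <= 0) {
            throw new IllegalArgumentException("gpus must be positive");
        }
        return speedup / gpus;
    }

    public static void main(String[] args) {
        // Speedup figures taken from the measurements described above.
        System.out.printf("Whole report, 8 GPUs: %.0f%%%n", 100 * efficiency(5.7, 8));
        System.out.printf("Cube 1 query, 7 GPUs: %.0f%%%n", 100 * efficiency(6.4, 7));
    }
}
```

This gives roughly 71% for the whole report on 8 GPUs and roughly 91% for the Cube 1 query on 7 GPUs.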

Palo Web on Android Smartphone

Our IT department got an Android-powered touchscreen smartphone for testing. We used this opportunity to check how well Palo Web runs on it. The answer: very well. WebKit and a 1 GHz CPU make it possible to run Palo Web just as well as on a desktop PC, without any changes. It would be nice to see some “optimized for mobile devices” applications.

[Image]

Gartner waves white flag on Excel in BI

Around 500 million people around the world work with spreadsheets, most of them with Excel. Many of them use Excel workbooks for planning, analysis and reporting. However, most BI vendors have tried for decades to convince users to give up Excel in favor of what they considered more functional and manageable BI tools. Gradually, some BI vendors changed their policy and offered integration with Excel, accepting Excel as a front end. At the recent Gartner BI Summit in Las Vegas, the adversaries of Excel admitted failure: Gartner analysts and BI managers said that efforts to stop Excel BI use in its tracks were bound to fail and urged vendors and IT managers to make their peace with Excel as a BI tool. Managers who presented case stories stated “…the vast majority of end users – perhaps 90% – take data from the BI tools and export the information to Excel so they can work on it there”. Everybody is familiar with Excel, and it is very flexible and easy to handle. Furthermore, Excel allows users to model ad-hoc queries or plan scenarios without technical skills and with fast results.

It was also stated that “the number of Excel BI users easily outstrips the combined total of people who are using the regular BI applications”.

That’s why Palo is designed not only for the so-called BI market but also for the many people doing BI with Excel. Most of them are not even aware that producing reports and analyses with Excel could be called BI. So the BI market is much larger than it is usually thought to be. The circle with the dotted line in the graph below shows the combined market of Palo, both in the genuine BI market and in the spreadsheet market.

[Image]

Of course, nobody would propose Excel as a stand-alone BI solution. But Microsoft Excel add-ins like Palo offer central data storage and avoid the well-known spreadsheet hell. Palo Suite allows Excel workbooks to be converted into web-based database applications. This helps combine the efficient development permitted by Excel applications with the advantages of web applications, which can otherwise take a lot of time and effort to implement.

Besides increased performance, what’s new in Palo 3.1?

Palo Suite 3.1

We are proud of yesterday’s release of the new version, Palo 3.1, which brought a strong improvement in performance. Users have already called it “a much much (!) increase in speed” and “a dramatic increase in speed.” The release included Palo for Excel 3.1 (including Palo OLAP 3.1) and Palo Suite 3.1 (including Palo OLAP 3.1, Palo ETL 3.1 and Palo Web 3.1). Also part of the release was the first ramp-up of Palo OLAP GPU. I have already written several times about the incredible gain in speed from using GPUs in Business Intelligence, so today I would like to focus on Palo 3.1 and pick a few highlights from the more than 50 new features Palo 3.1 offers:

  • New Styles for Palo for Excel. We refreshed the look and feel of the Paste View style sheet. Choose your style (the new styles are also available at the MyPalo community).
  • Palo Spreadsheet Auto-save and recovery. As in Microsoft Office, changed reports are saved automatically and can be recovered from a previously saved version, e.g. if the connection to a Palo Spreadsheet is lost during editing.
  • Enhanced User Management of Palo Web. New roles can be defined and used to set up security for Palo Web components and reports. For example: if IT is supposed only to monitor and maintain the Palo ETL processes, just assign Palo ETL manager access to a group defined as IT, and all other Palo Web components will not be visible to them.
  • MyPalo Community integration. Palo Web and Palo for Excel are now able to store your MyPalo Community account. This lets you access advanced information through our MyPalo website while working in Palo.

There are loads of other features that make working with Palo more comfortable: Palo now fully supports Microsoft Internet Explorer 8, contains a new homepage button to get back to the homepage with one click, and offers extended chart configuration for meter charts, German and French interfaces for Palo Pivot, automatic calendar generation for time dimensions, etc.

All new features are listed in the “What’s new in Palo 3.1” guide, which is available at http://www.jedox.com/en/community/mypalo/my-palo-installation-first-steps.html.

Ribbon Toolbars in Palo Web 3.1

Palo Web 3.1 contains a lot of new features. Even though the final version of Palo Web 3.1 will not be published before the end of March, I would like to draw your attention to some of them. In this post I will show the new ribbon elements, an optional replacement for the “old school” toolbar/menu bar. They are part of Palo Web 3.1; to see them now, you have to download the Palo Suite 3.1 Ramp Up version from www.jedox.com.

[Image]

The ribbon user interface was made popular by the release of MS Office 2007. It was developed with the intention of increasing productivity by organising features into sets, so that frequently used ones are easier to reach. The benefits of the ribbon are somewhat controversial, though. They are largely a matter of taste, since the ribbon changes a long tradition of menu- and toolbar-driven user interfaces. Therefore, unlike in MS Office, users of Palo Web can easily switch between the two interfaces (Options/Spreadsheet/Toolbar) and use the one they like best: the classic menu with a toolbar or the new ribbon.

In the following screenshots you can see a selection of the ribbons you find in Palo Web.

[Image]

[Image]

[Image]

[Image]

You can download Palo Suite (Ramp Up) here.

Parallel algorithms for Palo Cube Rules

Over the past few weeks, several people have asked me why Jedox is so far the only BI company investing in GPU technology. GPUs make sense when speed of execution matters. And speed does matter for Palo users, especially when it comes to financial planning and simulation.

Whenever planning data or planning assumptions are changed at the base level, all aggregations have to be recalculated as quickly as possible to get new consolidated results for a new planning scenario. To deliver this speed, the Palo developers decided back in 2005 to use in-memory technology for Palo, which by itself delivers more speed than a disk-based or relational approach.

Choosing in-memory was a wise decision and a lucky one as well, because GPU acceleration is only effective in an in-memory architecture (which includes the graphics memory of a GPU). GPUs do not help much with data on a hard disk or inside a relational database.

Recently I had an interesting conversation with Dr. Tobias Lauer from the Institute of Computer Science at the University of Freiburg. Tobias is one of the research geniuses behind Palo GPU, and he explained how Palo benefits from the parallel algorithms that run on today’s GPUs. This is what I understood from him:

A parallel algorithm utilizes hardware architectures with multiple processing units (processors or processor cores) by executing simultaneously (= in parallel) individual steps of a program that would otherwise be computed sequentially. Depending on the number of available processors, one can distinguish multi-core moderate parallelism (e.g., 2-16 cores) and massive parallelism (hundreds or more processors).

The latter category includes modern GPUs, each consisting of several hundred processing units. Since all the individual processors of a GPU usually execute the same code at the same time, this architecture is suitable for data-parallel (the same operation on many different data) rather than task-parallel applications (different things to be executed simultaneously).

A very simple example from the business intelligence context would be the function

turnover(P) = quantity(P) x price(P)

for a product P. Instead of storing all three figures in the OLAP database, it is sufficient (and, for reasons of memory requirements and data consistency, even desirable) to save only the quantity and price for a product P and to calculate the turnover dynamically (by a Cube Rule) from those.

For the calculation of the total turnover of a whole group W of products, the equation turnover(W) = quantity(W) x price(W) will lead to a wrong result if quantity(W) is the cumulative total number of all goods and price(W) is the aggregated price. Hence, the individual turnover for each product in the group W must be calculated first, before the results can finally be summed up (or, using Palo terminology: we have to use an N-rule). Sequential programs need to run these calculations one after another, roughly like this:

1. For each product P in W do (sequentially):
a. Find the quantity and price of P
b. Multiply the two values
c. Add that product to the result
2. Return result
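
The sequential steps above can be sketched in Java as follows. This is an illustration only, not Palo code: the Product class, the method names and the sample figures are made up. The second method shows the shortcut that the text warns about, namely multiplying the aggregated quantity by the aggregated price.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only; class, method names and figures are hypothetical.
class SequentialTurnover {

    // Stored base figures of one product.
    static final class Product {
        final String name;
        final double quantity;
        final double price;

        Product(String name, double quantity, double price) {
            this.name = name;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // Correct: compute turnover per product, then sum (steps 1a to 1c, 2).
    static double totalTurnover(List<Product> group) {
        double result = 0.0;
        for (Product p : group) {                  // 1. one product after another
            double turnover = p.quantity * p.price; // a. and b.
            result += turnover;                     // c.
        }
        return result;                              // 2.
    }

    // Wrong: aggregated quantity times aggregated price.
    static double naiveTurnover(List<Product> group) {
        double quantity = 0.0;
        double price = 0.0;
        for (Product p : group) {
            quantity += p.quantity;
            price += p.price;
        }
        return quantity * price;
    }

    public static void main(String[] args) {
        List<Product> w = Arrays.asList(
            new Product("A", 10, 2.0),
            new Product("B", 5, 4.0));
        System.out.println(totalTurnover(w)); // 10*2 + 5*4 = 40.0
        System.out.println(naiveTurnover(w)); // (10+5) * (2+4) = 90.0
    }
}
```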

Our new approach is to do these individual calculations in parallel, i.e. to calculate simultaneously. Graphics processors (GPUs) are ideal architectures for this: the same operation (here: multiplication) is executed on many different data sets (here: quantities and prices of all products). A bit over-simplified, our algorithm performs the following steps:

1. Find quantities and prices for all the products P in W simultaneously.
2. Match these records so that quantity and price of the same product are placed next to each other (very quickly, through a parallel sort)
3. Multiply all related pairs (quantity, price) simultaneously and store the results in an array.
4. Add up the array to get the overall result (very quickly, through a parallel reduce)
5. Return result
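
The same sort, multiply and reduce pattern can be sketched on an ordinary multi-core CPU with Java’s parallel array and stream utilities. This is only a stand-in to show the structure of the steps; it is neither GPU code nor the actual Palo implementation. The Cell class is hypothetical, the records are assumed to be already fetched (step 1), and each product is assumed to have exactly one quantity record and one price record.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// CPU-based illustration of the data-parallel steps; names are hypothetical.
class ParallelTurnover {

    // One stored value (a quantity or a price) for a given product.
    static final class Cell {
        final int productId;
        final double value;

        Cell(int productId, double value) {
            this.productId = productId;
            this.value = value;
        }
    }

    static double totalTurnover(Cell[] quantities, Cell[] prices) {
        if (quantities.length != prices.length) {
            throw new IllegalArgumentException("need one price per quantity");
        }
        // Step 2: sort both record sets by product in parallel, so that
        // index i refers to the same product in both arrays.
        Cell[] q = quantities.clone();
        Cell[] p = prices.clone();
        Comparator<Cell> byProduct = Comparator.comparingInt(c -> c.productId);
        Arrays.parallelSort(q, byProduct);
        Arrays.parallelSort(p, byProduct);

        // Step 3: multiply all (quantity, price) pairs in parallel.
        double[] turnovers = new double[q.length];
        IntStream.range(0, q.length).parallel()
                 .forEach(i -> turnovers[i] = q[i].value * p[i].value);

        // Step 4: parallel reduction (sum) over the result array.
        return Arrays.stream(turnovers).parallel().sum();
    }

    public static void main(String[] args) {
        Cell[] quantities = { new Cell(2, 5), new Cell(1, 10) };
        Cell[] prices = { new Cell(1, 2.0), new Cell(2, 4.0) };
        System.out.println(totalTurnover(quantities, prices)); // 10*2 + 5*4 = 40.0
    }
}
```

On a GPU, each of these steps would run across hundreds of processing units instead of a handful of CPU threads, but the shape of the computation is the same.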

Unlike the above sequential algorithm, our parallel approach can perform two steps – finding data and multiplication – for all data sets almost simultaneously. The sorting and the final summation are accomplished by standard algorithms of parallel computing which are also very fast.

In initial tests we have seen very promising results, where our parallel approach has achieved significant speedups compared to the sequential algorithm currently used in Palo.

Palo adds “light” to BW

Even if SAP, Germany’s dominant software vendor, is not pursuing an Open Source strategy (yet*), SAP clients take a different position. Money matters, especially in midsized firms, in manufacturing, in commerce and, most of all, in public administration. Lots of SAP users are looking for affordable, flexible alternatives to SAP BW, especially for planning, but also for reporting and analysis. In plain language, they are scanning the market for something like a “BW light”.

Palo OLAP Server can play this role. The latest release of the Palo Suite now includes an SAP interface for SAP R/3 ERP (in addition to the SAP BW Connector from the previous release). With the new and enhanced Palo SAP connectivity, it is no longer necessary to go through SAP BW for OLAP analysis of SAP data. SAP R/3 and ERP system users who do not require full BW functionality can now use the Palo Suite and Palo SAP Connectivity as an easy and very flexible alternative BI platform that can be installed quickly and is ideal for use by professionals.

[Image]

With access to SAP BW and SAP R/3 ERP systems, Palo can now be integrated optimally into SAP landscapes. SAP data is extracted simply and effectively at the table level or through a generic RFC/BAPI interface. The ETL process is fully modelled using a graphical web front end. Details about the new Palo SAP Connectivity are available at: http://www.jedox.com/en/products/palo-sap-connectivity.html

* which it could, since SAP makes 75% of its revenues with software-related services