Table of Contents
This chapter will introduce the core features of MOLAP and is intended to help new users get started with analyzing their data.
A multidimensional data model is typically organized around a central theme, such as
            sales. This theme is represented by a table that we call DataFrame. The
            data for the data frame can typically come from CSV or Excel file, an SQL database, a
            Swing TableModel, or from an array as in the example below:
DataFrame<Integer,String,?> dataFrame = DataFrameFactory.fromRowMajorArray(
    new String[]{"Item", "Origin", "Quantity", "Price"},
    new Class[]{String.class, String.class, Integer.class, Double.class},
    new Object[][]{
            {"Apple", "Switzerland", 12, 112.0},
            {"Pear", "Switzerland", 24, 125.0},
            {"Pear", "Switzerland", 6, 130.0},
            {"Tomato", "Switzerland", 50, 122.0},
            {"Tomato", "Italy", 40, 44.0},
            {"Tomato", "Italy", 100, 80.0}
    }
);MOLAP provides a good range of methods to modifying, slicing, joining, and rearranging
            data frames. All of this can be made elegantly, through an indexing mechanism that
            allows stitching data together similarly to what can be made with
                R or the Pandas libraries. We
            won't cover this in this introduction, but most of these methods can be found in the
                DataFrame interface as well its column and row elements that can be
            extracted as Series.  
        
The cube models the data through a multidimensional data model that allows the
            efficient analysis of the data. The cube is automatically instanciated as soon as you use
            the data structure returned by DataFrame.aggregate().
        
A cube at the highest level of abstraction is the apex cuboid. When a query is placed, this the the default cuboid being used.
Aggregation quantity = dataFrame.getSum("Quantity");
Aggregation revenue = dataFrame.getSum("Price").as("Revenue");
AggregateDataFrame<String> apexQuery = dataFrame.aggregate(quantity, revenue);Drill Down/Up allows the user to navigate among levels of data ranging from the most summarized (up) to the most detailed (down).
AggregateDataFrame<String> byItemQuery = dataFrame.aggregate(quantity, revenue)
            .drillDown("Item");Table 1.2. Quantities and revenues by item
| Item | Sum(Quantity) | Revenue | 
|---|---|---|
| Apple | 12.0 | 112.0 | 
| Pear | 30.0 | 255.0 | 
| Tomato | 190.0 | 246.0 | 
Sorting can be defined through one or more aggregation value.
AggregateDataFrame<String> orderQuery = dataFrame.aggregate(quantity, revenue)
            .drillDown("Item").order(revenue);Table 1.3. Quantities and revenues by item ordered by quantity
| Item | Sum(Quantity) | Revenue | 
|---|---|---|
| Apple | 12.0 | 112.0 | 
| Tomato | 190.0 | 246.0 | 
| Pear | 30.0 | 255.0 | 
An existing query can be further refined.
AggregateDataFrame<String> byOriginQuery = apexQuery.drillDown("Origin");Table 1.4. Quantities and revenues by origin
| Origin | Sum(Quantity) | Revenue | 
|---|---|---|
| Switzerland | 92.0 | 489.0 | 
| Italy | 140.0 | 124.0 | 
Drilling can be done on any number of levels, the cube will adapt automatically.
AggregateDataFrame<String> byItemOriginQuery = byItemQuery.drillDown("Origin");Table 1.5. Quantities and revenues by item and origin
| Item | Origin | Sum(Quantity) | Revenue | 
|---|---|---|---|
| Apple | Switzerland | 12.0 | 112.0 | 
| Pear | Switzerland | 30.0 | 255.0 | 
| Tomato | Switzerland | 50.0 | 122.0 | 
| Tomato | Italy | 140.0 | 124.0 | 
A different drilling path should lead to the same results!
AggregateDataFrame<String> byOriginItemQuery = byOriginQuery.drillDown("Item");Table 1.6. Quantities and revenues by origin and item
| Origin | Item | Sum(Quantity) | Revenue | 
|---|---|---|---|
| Switzerland | Apple | 12.0 | 112.0 | 
| Switzerland | Pear | 30.0 | 255.0 | 
| Switzerland | Tomato | 50.0 | 122.0 | 
| Italy | Tomato | 140.0 | 124.0 | 
Pivot allows an analyst to rotate the cube in space to see its various faces.
AggregateDataFrame<String> pivotQuery = byItemOriginQuery.pivot(revenue);
Table 1.7. Revenues by item and origin (pivot)
| Item | Italy | Switzerland | 
|---|---|---|
| Apple | n.a. | 112.0 | 
| Pear | n.a. | 255.0 | 
| Tomato | 124.0 | 122.0 | 
Summary information (e.g. totals) can be added by navigating up the the cuboid hierarchy.
AggregateDataFrame<String> byItemWithTotalQuery = dataFrame.aggregate(quantity, revenue)
            .on(byItemQuery.getCuboid(), byItemQuery.getCuboid().drillUp());Table 1.8. Quantities and revenues by item with totals
| Item | Sum(Quantity) | Revenue | 
|---|---|---|
| Apple | 12.0 | 112.0 | 
| Pear | 30.0 | 255.0 | 
| Tomato | 190.0 | 246.0 | 
| (all) | 232.0 | 613.0 | 
A view of all the possible combinations provides interesting analytic possibilities.
AggregateDataFrame<String> cubeQuery = dataFrame.aggregate(quantity, revenue)
            .on(byItemOriginQuery.getCuboid(),
                    byItemQuery.getCuboid(),
                    byOriginQuery.getCuboid(),
                    apexQuery.getCuboid());Table 1.9. Quantities and revenues by all combinations
| Origin | Item | Sum(Quantity) | Revenue | 
|---|---|---|---|
| Switzerland | Apple | 12.0 | 112.0 | 
| Switzerland | Pear | 30.0 | 255.0 | 
| Switzerland | Tomato | 50.0 | 122.0 | 
| Italy | Tomato | 140.0 | 124.0 | 
| (all) | Apple | 12.0 | 112.0 | 
| (all) | Pear | 30.0 | 255.0 | 
| (all) | Tomato | 190.0 | 246.0 | 
| Switzerland | (all) | 92.0 | 489.0 | 
| Italy | (all) | 140.0 | 124.0 | 
| (all) | (all) | 232.0 | 613.0 | 
The cube allow for computing advanced statistics without having to compute everything at every level.
Series unitPriceColumn = dataFrame.getColumn("Price")
            .divide(dataFrame.getColumn("Quantity"));
Series squareUnitPriceColumn = unitPriceColumn.pow(2.0)
            .multiply(dataFrame.getColumn("Quantity"));
Aggregation meanPrice = dataFrame.getMean("Price");
Aggregation stdDevPrice = dataFrame.getStdDev("Price");
Aggregation meanUnitPrice = revenue.dividedBy(quantity)
            .as("Mean unit price");
Aggregation stdDevUnitPrice = new StdDevAggregation(squareUnitPriceColumn,
            dataFrame.getColumn("Price"), dataFrame.getColumn("Quantity"))
            .as("Standard deviation of unit price");
AggregateDataFrame<String> statQuery = dataFrame
            .aggregate(meanPrice, stdDevPrice, meanUnitPrice, stdDevUnitPrice)
            .drillDown("Item")
            .drillDown("Origin");Table 1.10. Mean and standard deviation of unit price by item
| Item | Origin | Mean(Price) | StdDev(Price) | Mean unit price | Standard deviation of unit price | 
|---|---|---|---|---|---|
| Apple | Switzerland | 112.0 | 0.0 | 9.333333333333334 | 0.0 | 
| Pear | Switzerland | 127.5 | 2.5 | 8.5 | 6.583333333333334 | 
| Tomato | Switzerland | 122.0 | 0.0 | 2.44 | 0.0 | 
| Tomato | Italy | 62.0 | 18.0 | 0.8857142857142857 | 0.1355261854357885 |