OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company’s beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate “dimension.” OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into subattributes.
Online Analytical Processing(OLAP)
- OLAP (Online Analytical Processing) provides the support for the multidimensional view of the data.
- It provides easy and efficient access to the various views of information to the users.
- The complex queries are also processed by using OLAP. It is easy to analyze information by processing multidimensional views of the data.
- The data warehouse is used to analyze the information, where the ample amount of historical data is stored.
OLAP systems have been traditionally categorized using the following taxonomy.
MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database.
Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion.
Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache.
Read Also: Apache Hadoop Tutorial
Advantages of MOLAP
- Fast query performance due to optimized storage, multidimensional indexing and caching.
- Smaller on-disk size of data compared to data stored in relational database due to compression techniques.
- Automated computation of higher level aggregates of the data.
- It is very compact for low dimension data sets.
- Array models provide natural indexing.
- Effective data extraction achieved through the pre-structuring of aggregated data.
Disadvantages of MOLAP
- Within some MOLAP systems the processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data which have changed (usually new data) instead of reprocessing the entire data set.
- Some MOLAP methodologies introduce data redundancy.
Examples of commercial products that use MOLAP are Cognos Powerplay, Oracle Database OLAP Option, MicroStrategy, Microsoft Analysis Services, Essbase, TM1, Jedox, and icCube.
ROLAP works directly with relational databases and does not require pre-computation. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP’s slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a “WHERE” clause in the SQL statement. ROLAP tools do not use pre-calculated data cubes but instead pose the query to the standard relational database and its tables in order to bring back the data required to answer the question. ROLAP tools feature the ability to ask any question because the methodology does not limit to the contents of a cube. ROLAP also has the ability to drill down to the lowest level of detail in the database.
While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database which was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate the database.
Read Also: Apache Hadoop Ecosystem
Advantages of ROLAP
- ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality (i.e., millions of members).
- With a variety of data loading tools available, and the ability to fine-tune the extract, transform, load (ETL) code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.
- The data are stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
- ROLAP tools are better at handling non-aggregatable facts (e.g., textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.
- By decoupling the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.
- The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users (SQL WHERE clause).
- There is a consensus in the industry that ROLAP tools have slower performance than MOLAP tools. However, see the discussion below about ROLAP performance.
- The loading of aggregate tables must be managed by custom ETL code. The ROLAP tools do not help with this task. This means additional development time and more code to support.
- When the step of creating aggregate tables is skipped, the query performance then suffers because the larger detailed tables must be queried. This can be partially remedied by adding additional aggregate tables, however it is still not practical to create aggregate tables for all combinations of dimensions/attributes.
- ROLAP relies on the general purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of latest improvements in SQL language such as CUBE and ROLLUP operators, DB2 Cube Views, as well as other SQL OLAP extensions. These SQL improvements can mitigate the benefits of the MOLAP tools.
- Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don’t translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.
Multi-dimensional model has two types of tables:
1. Dimension tables: It contains the attributes of dimensions.
2. Fact tables: It consists of the facts or measures.
The following OLAP operations are performed to implement OLAP:
1. Roll up or consolidation
- Multi-dimensional databases have hierarchies with respect to the dimensions.
- Consolidation is rolling up or adding data relationship with respect to one or more dimensions.
- This hierarchy can be as, the total order street<city<State<country.
- The roll up operation aggregates the data by city to country by location hierarchy as shown in the following diagram.
In drill down operation, the view is changed to a greater level of detail.
For example: In the diagram shown below, the drill operations are performed on the upper cube by stepping down a concept hierarchy for time. It can be defined as day<month<quarter<year.
3. Slice Operation
- Slicing is referred to an ability to look at a database from various points of view.
- Slice operation selects one dimension of the given cube and creates a sub-cube, as illustrated in the following diagram.
4. Dice operation
Two or more dimensions are selected from the given cube to produce a sub cube as shown in the following diagram.
5. Pivot / Rotate
Pivot technique is used for visualization of data. It rotates the data axis to give another presentation of the data, as shown in the following schematic.
Read Also: Apache Cassandra Tutorial
Approaches to OLAP
The OLAP systems are categorized using the different techniques mentioned below:
In MOLAP, data is stored in a multidimensional cube(array storage) rather than the relational database.
- ROLAP works with relational databases. The base data and the dimension table are stored as relational tables and new table are created to hold an aggregated information.
- Each action of slicing and dicing operation is equivalent to adding a ‘WHERE’ clause in the SQL statement.
- HOLAP has the combine advantages of MOLAP and ROLAP.
- When detail information is required, the database is divided into relational database and specialized storage.
- The Microsoft SQL server 7.0 OLAP services support a hybrid OLAP server.