Wait a minute – why don’t we have that?

Of course, you look at the other players in your industry, right?

You want to know what they are doing and how they do it, right?
Maybe you could learn a thing or two as well.

Sometimes it’s how not to do things.

Sometimes it’s how to do things better.

I recently came across an example for the later.
Oracles online product documentation has grown and changed a lot over the last 12 years that I have any use for it.
The first non-printed documentation I saw from the guys in Redwood were rather clumsy and had terrible search/index functionality.
Not too surprising, given that this was 1998.

Nowadays you can not only use the very good online documentation website but you can also get notified when they release a revised version.
Yes, that’s right!
Obviously, these people, in fact, are able to revise and correct their product documentation between major release cycles!
Above that, they even offer an RSS feed (check the link I marked in the picture) to have you notified in your preferred feed reader.

This leaves me with just one big question: why don’t we at SAP have that?

MaxDB weekend magic – save 50% total storage space!

People use assumptions to make decisions.

They do this all the time and insofar, developers are people as well 🙂

To be a bit more specific, also database developers make assumptions,
One of them is for example that when you query data then you ask for data that is actually there.
You want to get some data back.
Therefore, index-structures are optimized to answer this kind of question and not the “i-want-to-check-whether-this-really-does-not-exist” kind of query.
Go and try to optimize a SELECT * FROM TABLE WHERE COLUMN != xyz statement!

Another assumption is the following:
Most tables have primary keys that allow the unique identification of every row in the table.
AND (!) this primary key is rather short compared to the whole row size.

In MaxDB, we find these assumptions represented in the way how indexes use primary keys as logical row references.
Given this logical referencing one can observe an interesting effect.

Let’s take SAP standard table WBCROSSI (Index for Includes – Where-Used List Workbench).

On a standard installation this table can take up some space:

--------------------------- ----------------------
          Total Size in KB|      Number of Entries
Entire Table                
                    108168|                 455083
Index WBCROSSI~INC          
                    116832|                  59182
Index WBCROSSI~MAS          
                    139136|                  39533
TOTAL              364136
--------------------------- ----------------------

Now I’ve made a copy of this table and added yet another index.
Check the sizes:

--------------------------- ----------------------
          Total Size in KB|      Number of Entries
--------------------------- ----------------------
Entire Table                
                     41216|                 455083
Index WBCROSSI_LB~INC       
                     10512|                  59182
Index WBCROSSI_LB~MAS       
                      8872|                  39533
Index PK_INDEX              
                    108464|                 455083
TOTAL              169.064
--------------------------- ----------------------

We see a difference of (364136 – 169064 = 195072, thanks calculator.exe!) 195072 KB or 190 MB or nearly 50% savings!!
I added an index and SAVED storage!
And no, I didn’t use some unreleased super efficient compression technology here.

The same effect can easily be observed even with MaxDB 7.5 or earlier versions.

And? Curious now?

Like all good magic, the solution is simple (and a bit boring) once you know it.
So stop reading now, if you want to keep the magic 🙂

Ok, looking back at the initial table and index sizes gives the first hint:
In the original table, all secondary indexes are actually LARGER than the table itself.
Why is that?
Let’s check the table definition to answer this:

------------- ----------------------------------------
Column Name   |Data Type |Code Typ|Len  |Dec  |Keypos
------------- ----------------------------------------
OTYPE         |VARCHAR   |ASCII   |    2|     |    1
NAME          |VARCHAR   |ASCII   |  120|     |    2
INCLUDE       |VARCHAR   |ASCII   |   40|     |    3
MASTER        |VARCHAR   |ASCII   |   40|     |    4
STATE         |VARCHAR   |ASCII   |    1|     |    5
------------- ----------------------------------------

The important part here is the Keypos column.
You may notice, that ALL columns form the primary key of this table!
Although semantically correct and allowed in SQL and the relational model this is a rather seldom situation.

It’s so seldom that it even contradicts one of the mentioned assumptions:
“the primary key is rather short compared to the whole row size.”

With this (also pretty long 2+120+40+40+1 = 203 bytes) primary key the logical referencing is a game of keeping the same data over and over again.
An index entry e.g. for Index WBCROSSI~INC will look like this:

Used: Yes               Access Permitted: Yes      Consistent: Yes
Column Name                     |Type  |Sort
INCLUDE                         |      |ASC
STATE                           |      |ASC
Index key        Primary key

There we have it: since all columns are part of the primary key, we always double store the data for the index keys.

This makes it pretty obvious why the secondary indexes are larger than the table.

But what did I change to save the space?

I dropped the primary key!
In MaxDB a table gets a system generated surrogate primary key if it does not have a defined one.
This generated primary key (hidden column SYSKEY, CHAR(8) BYTE!) is rather small and we don’t have to copy the whole row into every index entry.

But we have to make sure that the primary key constraint features are still provided:
ALL of the columns have to be NOT NULLable and any combination of these columns need to be UNIQUE.

Nothing as easy as this!
I defined a NOT NULL constraint for every column and created a new unique index over all columns.
This is, by the way, the way how the ABAP primary key definition is mapped to the database on Oracle systems all the time!

Column Name                     |Data Type |Code Typ|Len  |Dec  |Keypos
OTYPE                           |VARCHAR   |ASCII   |    2|     |    0
NAME                            |VARCHAR   |ASCII   |  120|     |    0
INCLUDE                         |VARCHAR   |ASCII   |   40|     |    0
MASTER                          |VARCHAR   |ASCII   |   40|     |    0
STATE                           |VARCHAR   |ASCII   |    1|     |    0
Indexes of Table: WBCROSSI_LB

Index Name PK_INDEX
Column Name                     |Type  |Sort
OTYPE                           |UNIQUE|ASC
NAME                            |UNIQUE|ASC
INCLUDE                         |UNIQUE|ASC
MASTER                          |UNIQUE|ASC
STATE                           |UNIQUE|ASC

Column Name                     |Type  |Sort
INCLUDE                         |      |ASC
STATE                           |      |ASC

Column Name                     |Type  |Sort
MASTER                          |      |ASC

By replacing the full row primary key (203 bytes) with the SYKEY (8 bytes) we save enough space in the secondary indexes that even the full table data copy in the new index does not make the size in total much larger.

Before you now go off and look for other tables where this ‘compression‘ could be applied wait a minute.
As nothing comes for free in life, this of course also has its price.

With the new setup, a primary key lookup now may lead to two separate B*tree accesses (primary key index + table).
This will be especially true when the optimizer cannot use the index only optimization (e.g. during joins).

Also, the ABAP dictionary check will complain about this and the transportation of this setup will likely lead to problems.

Hope you enjoyed this piece of weekend magic with MaxDB!