Category Archives: SAP MaxDB

Small changes with big effects

For many performance-relevant DB features in BW, the actual effect depends entirely on the details of their usage and implementation.

The following are two examples of rather small things that went wrong and had a big impact on system performance.

Two examples for BW on MaxDB

The following two examples both illustrate the inherent assumptions that developers make during development.
Number 1 goes like this:

A customer has chosen MaxDB as the database platform for its SAP BW instance.
In addition to that, the customer decided to go for a BIA, which is quite a clever choice, if you ask me.
Instead of running a super-expensive, maintenance-intensive BW main database that might still require a BIA on top, this customer now runs a low-cost, low-maintenance main database and does the performance-intensive reporting out of the expensive but equally low-maintenance BIA.

Unfortunately, it looks like nobody anticipated that this combination would become popular.
Otherwise, I assume report RSDDTREX_MEMORY_ESTIMATE would have been tested with MaxDB as well.

This report is used to get an estimation of the required memory for the BIA usage.
It’s not too complicated and merely consists of taking the number of rows in an InfoCube and multiplying it by the InfoObjects’ data lengths and some “magic” constants.
So far nothing special.

What’s “special” is that this report still makes use of the nowadays abandoned fact-views from BW3.x-times.
Fact-views make it possible to access the data in both the E- and the F-fact table at once, by concatenating the two sets with a UNION ALL.
That means, fact-views basically look like this:

CREATE VIEW "/BIC/V..." AS
(
  SELECT col1, col2, ...
    FROM "/BIC/F...."
  UNION ALL
  SELECT col1, col2, ...
    FROM "/BIC/E...."
)

From the ABAP side, this eases access since one query is enough to reach all data in an InfoCube.
Our report does the same and runs this statement:

SELECT
 count(*)
FROM
  "/BIC/VMYCUBE"

The readers with some MaxDB experience might think now:
“That’s great! MaxDB has its file counter statistics and a special COUNT(*) optimization that avoids table/index access for counting!”
And those readers are correct!

Unfortunately the COUNT(*) optimization has a severe limitation: it only works for simple statements.
That means:

  • no WHERE condition (!),
  • no JOINs (!),
  • no UNIONS/SET OPERATIONS (!),
  • no GROUP BY/ORDER BY (!)

In reality, it means NO NOTHING, just the COUNT(*).
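
If in doubt, EXPLAIN tells you whether a statement still qualifies: only plans that show the COUNT OPTIMIZATION line use the file counter shortcut. A minimal sketch (the cube table is taken from above, the WHERE column is just a made-up example):

EXPLAIN SELECT COUNT(*) FROM "/BIC/FMYCUBE"
--> strategy shows COUNT OPTIMIZATION: the file counter is used

EXPLAIN SELECT COUNT(*) FROM "/BIC/FMYCUBE" WHERE KEY_MYCUBE1 <> 0
--> any WHERE condition (or JOIN, UNION, GROUP BY/ORDER BY) makes the
    COUNT OPTIMIZATION line disappear and forces a real scan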

The fact-view used here, therefore, couldn’t take advantage of this optimization and had to do the counting the brute-force, traditional way: read the whole first table, read the whole second table, combine the results and count the rows.

The execution plan for such an IO/CPU burning process looks like this:

OWNER    TABLENAME         STRATEGY                            PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
INTERNAL TEMPORARY RESULT  TABLE SCAN                                  1
         SHOW                RESULT IS COPIED, COSTVALUE IS     10653812
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

The runtime of this little monster was 3 days and counting … until the database couldn’t hold the huge temporary result set of approx. 10 million pages (ca. 76 GB) anymore. The report finally dumped with the infamous

“POS(1) Space for result tables exhausted”

Ouch!

Fortunately, the report was already prepared to handle the request without a fact view – it just wasn’t enabled for MaxDB yet.
This was quickly done after a short discussion with the responsible IMS colleague and correction note
#1533676 – Long runtime of program RSDDTREX_MEMORY_ESTIMATE
was created.

The execution plans afterwards looked like this:

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

and

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

And the total runtime of the report went down to a few hours (there is other stuff in there that just takes some time).

(Remark: it’s important to understand that in MaxDB execution plans only the COSTVALUE represents an optimizer estimation. All the PAGECOUNT values refer to the TOTAL number of pages that the table or index of the respective line actually allocates in the database!)

If you look at the SAP Note with the correction, you’ll find that it was a very small change that made the difference:

From this:

     IF sy-dbsys <> 'DB400'.
       APPEND g_v_tablnm_v TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.

to this:

     IF sy-dbsys = 'DB400' OR sy-dbsys = 'ADABAS D'.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_v TO g_t_tablnm.

Knock, knock, any data in there?

The second example is ‘special’ not only on the MaxDB port but on all database platforms.
However, for MaxDB, the effect was the worst, due to certain limitations of SQL optimization.

SAP BW is a data warehouse and therefore a lot of the functionality is there to handle data, to store and move data and to get rid of data.
These tasks bring with them the necessity to sometimes drop a table and rebuild it, e.g. when you change an InfoObject-definition.

But before merely dropping tables, BW is cautious and asks “Hey, any data in this table?”.
And indeed, there is a function module called RSDU_DATA_EXISTS_TABLE that answers this question.

Now, before proceeding, ask yourself: how would YOU try to answer this question in SQL?
A common first approach would be: count the number of rows in the table, and if the count is larger than 0, there is some data in the table.
Correct!
But given the fact that counting the actual number of rows in a table really can take ages (see the example above), this is the second-worst idea for approaching the issue (and I admit that it was also the first one I came up with).

The worst idea I’ve seen so far is what was actually implemented in the function module:

SELECT bname FROM usr01 CLIENT SPECIFIED UP TO 1 ROWS INTO :test
     WHERE EXISTS ( SELECT * FROM (i_tablnm) CLIENT SPECIFIED ).
  ENDSELECT.

Let’s see if we can figure out what this statement should do.
In English it means:

  • Give me the column BNAME
  • from the table USR01 for at most one row
  • for which the set of all rows in table I_TABLNM (the table we want to check for emptiness) contains at least one row.

This is just amazing!

As you can imagine, MaxDB will first create a temporary result set for the EXISTS clause (that is, a full copy of the table) and only then return just one row.
If the I_TABLNM table is not empty, this can easily become a similar problem as the example above.

Now, of course, there is a much better way to do this.
If you think about it, all we want is a YES (there’s data in there) or a NO (nope, all empty), and this can be had far more cheaply, as SAP Note #1542839 – “Performance Optimization in RSDU_DATA_EXISTS_TABLE_ADA” nicely demonstrates:

SELECT 'X' FROM (i_tablnm) WHERE ROWNUM <= 1

This means: “Database, go and get me an X for the first row that you hit on the table and stop afterwards!”
Regardless of how you process this statement, in the worst case it will end after a few (1-4) page visits.
The database may even use an index-only strategy since NO data from the table needs to be fetched (just a constant).
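
If you want to see the difference for yourself, comparing the two variants with EXPLAIN should show the huge temporary result in the first plan and a tiny cost value in the second. This is just a sketch – the table name is a placeholder, the statements themselves are the ones quoted above:

EXPLAIN SELECT bname FROM usr01 WHERE EXISTS (SELECT * FROM mytable)
--> full scan of MYTABLE into a temporary result, costs grow with the table size

EXPLAIN SELECT 'X' FROM mytable WHERE ROWNUM <= 1
--> stops after the first row it touches, costs stay minimal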

There are of course similar examples for other DBMS as well, but for the sake of a digestible blog post size, I’ll save them for later posts.

MaxDB weekend magic – save 50% total storage space!

People use assumptions to make decisions.

They do this all the time and, in that respect, developers are people as well 🙂

To be a bit more specific, database developers make assumptions as well.
One of them, for example, is that when you query data, you ask for data that is actually there.
You want to get some data back.
Therefore, index-structures are optimized to answer this kind of question and not the “i-want-to-check-whether-this-really-does-not-exist” kind of query.
Go and try to optimize a SELECT * FROM TABLE WHERE COLUMN != xyz statement!
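
Just as a sketch of what’s meant here (table and column are placeholders): even if there is an index on the column, a negated predicate gives the optimizer nothing to narrow the search down with, so you’ll typically end up with a full scan:

EXPLAIN SELECT * FROM mytable WHERE mycolumn <> 'xyz'
--> usually a TABLE SCAN, since "everything except one value" cannot be located via the index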

Another assumption is the following:
Most tables have primary keys that allow the unique identification of every row in the table.
AND (!) this primary key is rather short compared to the whole row size.

In MaxDB, we find these assumptions reflected in the way indexes use primary keys as logical row references.
Given this logical referencing one can observe an interesting effect.

Let’s take SAP standard table WBCROSSI (Index for Includes – Where-Used List Workbench).

On a standard installation this table can take up some space:

---------------------------  ----------------  -----------------
                              Total Size in KB  Number of Entries
---------------------------  ----------------  -----------------
Entire Table                            108168             455083
Index WBCROSSI~INC                      116832              59182
Index WBCROSSI~MAS                      139136              39533
---------------------------  ----------------  -----------------
TOTAL                                   364136
---------------------------  ----------------  -----------------

Now I’ve made a copy of this table and added yet another index.
Check the sizes:

---------------------------  ----------------  -----------------
                              Total Size in KB  Number of Entries
---------------------------  ----------------  -----------------
Entire Table                             41216             455083
Index WBCROSSI_LB~INC                    10512              59182
Index WBCROSSI_LB~MAS                     8872              39533
Index PK_INDEX                          108464             455083
---------------------------  ----------------  -----------------
TOTAL                                   169064
---------------------------  ----------------  -----------------

WOW!
We see a difference of 364136 – 169064 = 195072 KB (thanks, calculator.exe!), or roughly 190 MB – nearly 50% savings!!
I added an index and SAVED storage!
And no, I didn’t use some unreleased super efficient compression technology here.

The same effect can easily be observed even with MaxDB 7.5 or earlier versions.

And? Curious now?

Like all good magic, the solution is simple (and a bit boring) once you know it.
So stop reading now, if you want to keep the magic 🙂

Ok, looking back at the initial table and index sizes gives the first hint:
In the original table, all secondary indexes are actually LARGER than the table itself.
Why is that?
Let’s check the table definition to answer this:

------------- ----------------------------------------
Column Name   |Data Type |Code Typ|Len  |Dec  |Keypos
------------- ----------------------------------------
OTYPE         |VARCHAR   |ASCII   |    2|     |    1
NAME          |VARCHAR   |ASCII   |  120|     |    2
INCLUDE       |VARCHAR   |ASCII   |   40|     |    3
MASTER        |VARCHAR   |ASCII   |   40|     |    4
STATE         |VARCHAR   |ASCII   |    1|     |    5
------------- ----------------------------------------

The important part here is the Keypos column.
You may notice that ALL columns form the primary key of this table!
Although semantically correct and allowed both in SQL and in the relational model, this is a rather rare situation.

It’s so rare that it even contradicts one of the assumptions mentioned above:
“the primary key is rather short compared to the whole row size.”

With this (also pretty long: 2+120+40+40+1 = 203 bytes) primary key, the logical referencing becomes a game of storing the same data over and over again.
An index entry e.g. for Index WBCROSSI~INC will look like this:

Index Name WBCROSSI~INC
Used: Yes               Access Permitted: Yes      Consistent: Yes
------------------------------------------------------------------------
Column Name                     |Type  |Sort
------------------------------------------------------------------------
INCLUDE                         |      |ASC
STATE                           |      |ASC
------------------------------------------------------------------------ 
Index key        Primary key
INCLUDE/STATE -> [OTYPE/NAME/INCLUDE/MASTER/STATE]

There we have it: since all columns are part of the primary key, the data is always stored twice for every index entry.

This makes it pretty obvious why the secondary indexes are larger than the table.

But what did I change to save the space?

I dropped the primary key!
In MaxDB a table gets a system generated surrogate primary key if it does not have a defined one.
This generated primary key (hidden column SYSKEY, CHAR(8) BYTE!) is rather small and we don’t have to copy the whole row into every index entry.

But we have to make sure that the primary key constraint features are still provided:
ALL of the columns have to be NOT NULLable and any combination of these columns need to be UNIQUE.

Nothing easier than that!
I defined a NOT NULL constraint for every column and created a new unique index over all columns.
This is, by the way, exactly how the ABAP primary key definition is mapped to the database on Oracle systems all the time!

WBCROSSI_LB
Column Name                     |Data Type |Code Typ|Len  |Dec  |Keypos
------------------------------------------------------------------------
OTYPE                           |VARCHAR   |ASCII   |    2|     |    0
NAME                            |VARCHAR   |ASCII   |  120|     |    0
INCLUDE                         |VARCHAR   |ASCII   |   40|     |    0
MASTER                          |VARCHAR   |ASCII   |   40|     |    0
STATE                           |VARCHAR   |ASCII   |    1|     |    0
------------------------------------------------------------------------
Indexes of Table: WBCROSSI_LB
------------------------------------------------------------------------

Index Name PK_INDEX
Column Name                     |Type  |Sort
------------------------------------------------------------------------
OTYPE                           |UNIQUE|ASC
NAME                            |UNIQUE|ASC
INCLUDE                         |UNIQUE|ASC
MASTER                          |UNIQUE|ASC
STATE                           |UNIQUE|ASC
------------------------------------------------------------------------

Index Name WBCROSSI_LB~INC
Column Name                     |Type  |Sort
------------------------------------------------------------------------
INCLUDE                         |      |ASC
STATE                           |      |ASC
------------------------------------------------------------------------

Index Name WBCROSSI_LB~MAS
Column Name                     |Type  |Sort
------------------------------------------------------------------------
MASTER                          |      |ASC
------------------------------------------------------------------------
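
For illustration, the DDL behind this copy could look roughly like this – a sketch reconstructed from the listings above (note the absence of a PRIMARY KEY clause, which is what makes MaxDB generate the hidden SYSKEY):

CREATE TABLE WBCROSSI_LB
( OTYPE    VARCHAR(2)   ASCII NOT NULL,
  NAME     VARCHAR(120) ASCII NOT NULL,
  INCLUDE  VARCHAR(40)  ASCII NOT NULL,
  MASTER   VARCHAR(40)  ASCII NOT NULL,
  STATE    VARCHAR(1)   ASCII NOT NULL )
--> no PRIMARY KEY clause: MaxDB uses the generated SYSKEY instead

CREATE UNIQUE INDEX PK_INDEX ON WBCROSSI_LB (OTYPE, NAME, INCLUDE, MASTER, STATE)
--> takes over the uniqueness guarantee of the former primary key

CREATE INDEX "WBCROSSI_LB~INC" ON WBCROSSI_LB (INCLUDE, STATE)
CREATE INDEX "WBCROSSI_LB~MAS" ON WBCROSSI_LB (MASTER)

INSERT INTO WBCROSSI_LB SELECT * FROM WBCROSSI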

By replacing the full-row primary key (203 bytes) with the SYSKEY (8 bytes) we save enough space in the secondary indexes that even the full copy of the table data in the new index does not make the total size much larger.

Before you now go off and look for other tables where this ‘compression‘ could be applied, wait a minute.
As nothing comes for free in life, this of course also has its price.

With the new setup, a primary key lookup now may lead to two separate B*tree accesses (primary key index + table).
This will be especially true when the optimizer cannot use the index only optimization (e.g. during joins).

Also, the ABAP dictionary check will complain about this setup, and transporting it will likely lead to problems.

Hope you enjoyed this piece of weekend magic with MaxDB!

Cheers,
Lars

MaxDB Optimizer Statistics handling without NetWeaver

Ok, MaxDB is most often used with a NetWeaver on top of it, so this blog is about a niche topic of a niche product.
Wow – that should be enough understatement and un-buzzing for now.

The question of how and when to collect new optimizer statistics pops up every now and then.
Most people accept that a cost-model-based query optimizer depends on statistics that fit the data and storage usage of the tables and indexes involved in a query to come up with the best possible query execution plan.
But how can we know when the statistics no longer fit well enough?

The reactive strategy would be to monitor the execution runtimes of every query in the system, wait for runtime increases to show up, and then check whether the execution plan has changed compared to the times when the query ran quickly enough.

Obviously, this strategy is pretty labour- and time-intensive.

Another way would be to say: “ok, maybe I’m doing some update statistics too often, but at least this does not make execution plans worse”.

This approach (and yes, sometimes the execution plan can become worse, but that’s a different story) is the one employed by recommendations like “Update statistics at least once a week”.
One improvement to this approach is to carefully choose the tables for which new statistics should be collected.
A possibly reasonable criterion for that is the change in data volume since the last update of the statistics.
That means we need to compare the current size of a table (in pages) against the size it had when the statistics were last collected.

Fortunately, MaxDB provides two system tables containing this information:

1. SYSINFO.FILES
Shows the current size of the table.
This is even true for uncommitted transactions.
So if you load data into your table in session A you’ll be able to monitor the table growth via session B even before session A commits.

2. SYSDBA.OPTIMIZERSTATISTICS
Contains the stored optimizer statistics.

Doing this comparison manually for all tables in your database would be a bunch of monkey work, so MaxDB development decided to deliver a built-in monkey in the form of a stored procedure:
SUPERDBA.SYSCHECKSTATISTICS (IN CHANGETHRESHOLD INTEGER)

This procedure does the comparison for us.
Via the CHANGETHRESHOLD parameter, we can specify the percentage of data volume change that should lead to new statistics.

The procedure then loops over all tables of the current user and the ‘SYSDBA’ schema and performs the check.
Once a table qualifies for new statistics (another reason may be that a table does not have any optimizer statistics at all), its name is recorded in a system table:
SYSDBA.SYSUPDSTATWANTED

If you’re familiar with the automatic statistics update feature of MaxDB, then this table is already known to you.
It’s the same table into which the MaxDB kernel puts table names when it realizes during a join that the optimizer statistics were wrong and more data was processed than expected.
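
A quick look into this table shows which tables have been flagged so far – a plain query is enough (sketch):

SELECT * FROM SYSDBA.SYSUPDSTATWANTED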

Anyhow, apart from the automatic statistics update, there is also a command for manually processing the recorded tables:
UPDATE STATISTICS AS PER SYSTEM TABLE

This command will read the entries from SYSUPDSTATWANTED and run a parallelized, non-blocking update statistics without sampling.

You may, of course, choose to use the sampling size stored for each table in the database catalogue via
UPDATE STATISTICS AS PER SYSTEM TABLE ESTIMATE
but this will lead to table locks, so it’s not exactly what we want to see in production systems.

Once the statistics collection is finished, you can check the result in the table
SYSDBA.SYSUPDSTATLOG

SELECT * FROM SYSDBA.SYSUPDSTATLOG

SCHEMANAME|TABLENAME|TABLEID          |COLUMNNAME|INDEXNAME|EXECUTED_AT        |IMPLICIT|SAMPLE_PCT|SAMPLE_ROW|EXECUTION_START    |EXECUTION_END      |SESSION|TERMID            |SEQNO|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
LARS      |T1       |000000000001A560 |          |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:35|2010-05-27 13:19:35|14692  |7200@VIENXXXXXXXXA|0    |
LARS      |T1       |000000000001A560 |N1        |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:35|2010-05-27 13:19:35|14692  |7200@VIENXXXXXXXXA|1    |
LARS      |T1       |000000000001A560 |N2        |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:35|2010-05-27 13:19:35|14692  |7200@VIENXXXXXXXXA|2    |
LARS      |T2       |000000000001A561 |          |         |2010-05-10 15:05:36|NO      |100       |0         |2010-05-10 15:05:36|2010-05-10 15:05:36|14299  |3532@VIENXXXXXXXXA|0    |
LARS      |T2       |000000000001A561 |N1        |         |2010-05-10 15:05:36|NO      |100       |0         |2010-05-10 15:05:36|2010-05-10 15:05:36|14299  |3532@VIENXXXXXXXXA|1    |
LARS      |ZTEST6   |000000000001A52E |          |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|0    |
LARS      |ZTEST6   |000000000001A52E |MANDT     |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|1    |
LARS      |ZTEST6   |000000000001A52E |OTHID     |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|3    |
LARS      |ZTEST6   |000000000001A52E |UNIID     |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|2    |
...

As we saw above, this procedure depends on having the SYSINFO.FILES information at hand.
Unfortunately, for databases that had been upgraded from an SAP DB/MaxDB version <=7.5 this information might not yet be available.

You can figure out for which tables this information is missing by checking the table
SYSDBA.SYSUPDATECOUNTERWANTED

As long as the file counters are not present, the SYSCHECKSTATISTICS procedure consults the SYSDBM.ESTIMATED_PAGES table to get an estimation of the current table size.
This might take much longer and does not deliver precise results, but rather an estimate of the total table size.

Summing this up:

Given a MaxDB >= 7.6 at a recent patch level, you can easily implement a statistics maintenance strategy by running these two commands, say, once a week:

–> as your SQL Schema owner:
CALL SYSCHECKSTATISTICS (40)

–> as the SYSDBA (SUPERDBA) of the database:
UPDATE STATISTICS AS PER SYSTEM TABLE

So there would be two commands to be scheduled:

dbmcli -U DB770W -USQL DB770LARS sql_execute "call SUPERDBA.SYSCHECKSTATISTICS (40)"
and
dbmcli -U DB770W -USQL DB770W sql_execute "UPDATE STATISTICS AS PER SYSTEM TABLE"

Note: DB770W is my XUSER entry for SUPERDBA and DB770LARS is for my SQL-User.
Make sure to remember that XUSER entries are case sensitive!
db770w would not work here!

I hope you like this blog, and maybe this technique can be an alternative to the one shown in the SDN thread “doubt about UPDATE STAT COLUMN” (http://forums.sdn.sap.com/thread.jspa?threadID=1668683&tstart=0).

MaxDB ST05-trace fallacy – when sometimes the trace is wrong…

One of the most important analytical tools used to investigate slow running SQL statements is the well-known ST05 – SQL trace.

The idea of it is that the ABAP database interface notes down what SQL statement it sends to the database, how long it took to get the result, what selection criteria were used and so on.

Obviously, a key point here is that the trace contains exactly the SQL that was actually sent to the database.

Recently I came across a speciality that must be considered when using ST05 trace results on MaxDB databases. Otherwise, one will end up with totally wrong execution paths and thus with wrong conclusions about how to improve performance.

Let’s look at the following SQL Explain plan:

We see that the estimated costs are quite high, but this is not what I want to point out here today. Instead, keep an eye on the execution path – it’s a RANGE CONDITION FOR KEY that uses four key columns (MANDT, BZOBJ, KALNR, KALKA).

Next – also a common step in performance analysis – we take the statement, have the ST05 fill in the ?’s with the real variable values and use the ST05 or SQL Studio to explain the statement again (and possibly modify it):

Now we still see a RANGE CONDITION FOR KEY, but only two columns are used, making the plan even less efficient.
Now – WHAT IS THE REASON FOR THIS?

The values had been taken directly from the ST05 trace. Let’s double check this:

Sure enough, the bind values are there.
Notable, however, is the fact that the values for KALNR and KADKY (now missing in the used KEY COLUMN list) are NUMBER type values.

The leading zeroes in A2/A3 and the date-like information in A5/A6 might give an indication that this might not be totally correct.
Let’s check the table column definition:

Surprisingly we find both columns to be CHARACTER types.
Well, not exactly surprisingly – many number-datatypes from the ABAP world are mapped to character columns.

For the application and the database interface this is rather transparent, but for the database query optimizer, this is an issue as it cannot use non-matching datatypes for KEY or INDEX accesses. If we want to run or explain the statement manually, we have to take care of this.

Therefore, we have to enclose the numbers into apostrophes to mark them as character strings:
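
Since the screenshots are not reproduced here, here is a sketch of what the two variants boil down to (the table name and the values are made up for illustration; only the quoting matters):

SELECT * FROM some_costing_table
 WHERE mandt = '100' AND bzobj = '1' AND kalnr = 000012345678
   AND kalka = '01' AND kadky <= 20091231
--> KALNR and KADKY passed as numbers: the key access stops after MANDT and BZOBJ

SELECT * FROM some_costing_table
 WHERE mandt = '100' AND bzobj = '1' AND kalnr = '000012345678'
   AND kalka = '01' AND kadky <= '20091231'
--> all values passed as character strings: MANDT, BZOBJ, KALNR and KALKA are used for the key range again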

And here we are, back with the original execution plan.

Now the real performance analysis can begin!

P.S.: Although the KADKY column is not used for the execution plan, it’s still important to get the data type right – otherwise the optimizer has no chance to estimate the selectivity of the conditions correctly.

MaxDB Event Dispatcher Intro

Infrastructure software like RDBMS often tends to become feature-rich in many directions.
MaxDB is no exception to this, so by reading the documentation there’s a pretty good chance to dig out some features that are rarely seen or used.

One example for this is the MaxDB database event dispatcher.
It has been around for quite a while now but hadn’t been used in the NetWeaver scenario.
It has got no frontend, and the documentation for it is – let’s say – a bit “skinny” …

Anyhow, it’s still a piece of MaxDB software that is available on all installations starting with 7.6.

Let’s see how it works in a few easy steps!

Events – what are they?

The first thing to learn is obvious: what is meant by “database event”?
For MaxDB these are certain, predefined (a.k.a. you cannot change them yourself, they are hard-wired!) runtime situations of a database instance.
For example, the startup of a database instance would be such an event.
Or the completion of a log segment. Or the successful creation of a backup.

There’s a bunch of those events defined in the MaxDB kernel.
Once the situation occurs, the MaxDB kernel basically puts a message about this event to a message queue.

To get a list of what events are available, simply run ‘event_list’ in DBMCLI:

dbmcli on db760>event_list
OK
Name                Priority Value Description
DBFILLINGABOVELIMIT LOW      70    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT MEDIUM   80    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT MEDIUM   85    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT HIGH     90    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT HIGH     95    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT HIGH     96    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT HIGH     97    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT HIGH     98    Filling level of the data area exceeds the given percentage
DBFILLINGABOVELIMIT HIGH     99    Filling level of the data area exceeds the given percentage
DBFILLINGBELOWLIMIT LOW      70    Filling level of the data area has fallen short of the given percentage
DBFILLINGBELOWLIMIT LOW      80    Filling level of the data area has fallen short of the given percentage
DBFILLINGBELOWLIMIT LOW      85    Filling level of the data area has fallen short of the given percentage
DBFILLINGBELOWLIMIT LOW      90    Filling level of the data area has fallen short of the given percentage
DBFILLINGBELOWLIMIT LOW      95    Filling level of the data area has fallen short of the given percentage
LOGABOVELIMIT       LOW      50    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       HIGH     66    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       LOW      75    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       MEDIUM   90    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       HIGH     94    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       MEDIUM   95    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       HIGH     96    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       HIGH     97    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       HIGH     98    Filling of the log area exceeds the given percentage
LOGABOVELIMIT       HIGH     99    Filling of the log area exceeds the given percentage
AUTOSAVE            LOW            The state of the automatic log backup process has changed.
BACKUPRESULT        LOW            THIS FEATURE IS NOT YET IMPLEMENTED.
CHECKDATA           LOW            The event CHECKDATA is always transmitted when the database check using CHECK DATA or CHECK DATA WITH UPDATE is completed.
EVENT               LOW            An event was switched on or off
ADMIN               LOW            Operational state was changed to ADMIN
ONLINE              LOW            Operational state was changed to ONLINE
UPDSTATWANTED       LOW            At least one table needs new optimizer statistics
OUTOFSESSIONS       HIGH           Maximum number of parallel sessions is running
ERROR               HIGH           A error occurred which has been written to database diagnostic message file.
SYSTEMERROR         HIGH           A severe system error occured, see knldiag.err
DATABASEFULL        LOW            The event DATABASEFULL is transmitted at regular intervals when the data area is filled to 100 percent.
LOGFULL             LOW            The log area is full and has to be saved.
LOGSEGMENTFULL      LOW            One log segment is full and can be saved
STANDBY             LOW            Operational state was changed to STANDBY

With the command ‘event_list_categories’ a description of the events can be displayed, e.g.:

[...]
AUTOSAVE

    AUTOSAVE events give information about the state of the automatic log
    backup and are triggered by changes of this state.

    The events of category AUTOSAVE are active by default.

    An actual event of category AUTOSAVE contains usable information within the
    following data fields:

    PRIORITY:
        This data field contains the priority of the event. The following
        value can occur:
            LOW

    VALUE1:
        This data field contains the reason that triggered the event. The
        following values can occur:
            0, The automatic log backup task has been started.
            1, The automatic log backup task has been stopped.
            2, Automatic log backup has been enabled.
            3, Automatic log backup has been disabled.
            4, A log backup was successfully finished.

    TEXT:
        If data field VALUE1 has the value 1 or 4, data field TEXT contains the file
        name of the log backup medium that is used by the automatic log backup.
        Otherwise data field TEXT contains no information.
[...]

ATTENTION: the names and parameters of events changed between version 7.6 and 7.7 – so be sure to check the current event names for the MaxDB release you are using!

Now there needs to be somebody who takes the event messages (you can also call them notifications) out of the queue and reacts to them.
That’s what the event dispatcher is for.

The event dispatcher

With MaxDB 7.6 the event dispatcher is a separate executable that needs to be started via command line. In versions >= 7.7 this event dispatcher has been built-in to the DBMServer.

To allow the event dispatcher to react to events, the reaction has to be defined by the user.

This configuration is also done via the event dispatcher executable (7.6) or the DBMServer client program DBMCLI (>= 7.7).
The executable can be found in the version-dependent path:
/sapdb//db/bin/dbmevtdisp.exe

Just calling this executable produces a short usage list:

add <cfgFile> Name == <value> [Priority == (LOW|MEDIUM|HIGH)] [Value1 (==|>=|<=|>|<) <value>] [Value2 (==|>=|<=|>|<) <value>] Command == <command>

delete <entryID> <cfgFile>

list <cfgFile>

start [-remoteaccess] <cfgFile> -l <logFile> -d <dbName> (-u <user,pwd>|-U <userkey>) [-n <node> [-e SSL]]

state -d <dbName> (-u <user,pwd>|-U <userkey>) [-n <node> [-e SSL]]

stop <instanceID> -d <dbName> (-u <user,pwd>|-U <userkey>) [-n <node> [-e SSL]]

version

With MaxDB >=7.7 the same set of commands is available via DBMCLI:

dbmcli on db770>help event
OK
event_available
event_create_testevent
event_delete             <event_category> [<value>]
event_dispatcher         ADD NAME == <event_name> [PRIORITY == <priority>]
                         [VALUE1 (==|>=|<=|>|<) <value1>] [VALUE2
                         (==|>=|<=|>|<) <value2>] COMMAND == <command> |
                         DELETE <entry_ID> |
                         SHOW |
                         ON |
                         OFF
event_list
event_list_categories    [<event_category>]
event_receive
event_release
event_set                <event_category> LOW|MEDIUM|HIGH [<value>]
event_wait

---
dbmcli on db770>

Defining a reaction to an event

Now let’s create a event reaction that simply writes out a message to a log file when the event occurs.
This information is stored in a configuration file that will be created with the first use of ‘dbmevtdisp.exe’.
To keep things easy, it’s best to store it in the RUNDIRECTORY of the database instance, where all the other configuration and log files are stored anyhow.
In this example this would be “C:\sapdb\data\wrk\DB760” and we’ll call the file just ‘evtdisp.cfg’.

Let’s say there should be an entry to the log file whenever an AUTOSAVE backup was successfully taken.
This is covered by the event “AUTOSAVE” with VALUE1 == “4” (these VALUEx fields are simply additional pieces of information about the event).

dbmevtdisp add C:\sapdb\data\wrk\DB760\evtdisp.cfg
           Name == "AUTOSAVE"
           Value1 == 4
           Command == "C:\\Windows\\System32\\cmd.exe \/q \/c C:\\sapdb\\data\\wrk\\DB760\\myeventscript.cmd $EVTTEXT$"

The whole command must be entered in one line (I inserted the line breaks for readability here) and it’s important to have spaces around the double equal signs (==)!
For the COMMAND part, it’s also necessary to escape slash characters (/ and \) with a backslash (\).
That’s the reason for the double backslashes in the example!
Also, make sure that the ‘add’ command is written in lower case.

The command used here should be just a shell script (Windows). To run this, we need to call the shell (CMD.EXE) first and provide the necessary flags /q (= quiet shell action) and /c (= run the command and exit the shell afterwards).

As a parameter to the script certain event dispatcher runtime variables can be used.
$EVTTEXT$, for example, contains the full path and filename of the log backup that had been created with AUTOSAVE.
A complete list of these variables can be found in the documentation (http://maxdb.sap.com/doc/7_6/9d/0d754252404559e10000000a114b1d/content.htm)

So basically we add an event reaction into the configuration file of our choice for the successful completion of the AUTOSAVE log backup and call a script ‘myeventscript.cmd’ and hand over the logbackup filename as a parameter.

To state that this command syntax is a bit awkward would be fully acknowledged by the author.

What’s missing now is, of course, the script file.
Let’s make it a simple one like this

echo %1 >> C:\sapdb\data\wrk\DB760\myeventscript.log

Start the event dispatcher

Having this in place all we need to do now is to start the event dispatcher:

c:\sapdb\db760\db\bin>dbmevtdisp start C:\sapdb\data\wrk\DB760\evtdisp.cfg -l C:\sapdb\data\wrk\DB760\evtdisp.log -d db760 -U db760ED
Event Dispatcher instance 0 running
using configuration file C:\sapdb\data\wrk\DB760\evtdisp.cfg
event with name DISPINFO:DISPSTART not dispatched (count 0)

Note that I’ve used pre-configured XUSER data (key db760ED) for this, so that I don’t have to specify the logon credentials here.
Anyhow, the connect user can either be the CONTROL or the SUPERDBA user.

Also, with the -l parameter I specified a log file for the event dispatcher where it will keep track of its actions.

… and stop it again

The event dispatcher will now keep the shell open and print out status messages.
Stopping it is NOT possible via CTRL+C, but instead the same executable must be used to send a stop command:

c:\sapdb\db760\db\bin>dbmevtdisp stop 0 -d db760 -U db760ED
OK

Note that it’s necessary to provide the correct event dispatcher instance number (0 in this case) to stop the event dispatcher.
It’s possible to have multiple event dispatchers attached to one MaxDB instance – but let’s keep things simple for now!

Test the dispatcher

So, restart the dispatcher and create some events!

c:\sapdb\db760\db\bin>dbmevtdisp start C:\sapdb\data\wrk\DB760\evtdisp.cfg -l C:\sapdb\data\wrk\DB760\evtdisp.log -d db760 -U db760ED
Event Dispatcher instance 0 running
using configuration file C:\sapdb\data\wrk\DB760\evtdisp.cfg

To trigger some AUTOSAVE events I’m simply using the ‘load_tutorial’ command.
Pretty soon there will be messages like the following in the event dispatcher shell:

[...]
event with name AUTOSAVE not dispatched (count 3)
Event with name AUTOSAVE dispatched (count 4)
event with name AUTOSAVE not dispatched (count 5)
event with name LOGSEGMENTFULL not dispatched (count 6)
event with name AUTOSAVE not dispatched (count 7)
Event with name AUTOSAVE dispatched (count 8)
event with name AUTOSAVE not dispatched (count 9)
Event with name AUTOSAVE dispatched (count 10)
event with name AUTOSAVE not dispatched (count 11)
Event with name AUTOSAVE dispatched (count 12)
event with name AUTOSAVE not dispatched (count 13)
[...]

We see that there are some AUTOSAVE events that are dispatched (these are the ones we created our event reaction for) and some are not dispatched.
The latter are the events that are triggered when the AUTOSAVE action is started (Value1 == 1).

So this is completely OK.

Let’s check the content of the script logfile myeventscript.log:

C:\sapdb\backup\db760log.822
C:\sapdb\backup\db760log.823
C:\sapdb\backup\db760log.824
C:\sapdb\backup\db760log.825
C:\sapdb\backup\db760log.826
C:\sapdb\backup\db760log.827
C:\sapdb\backup\db760log.828
C:\sapdb\backup\db760log.829
C:\sapdb\backup\db760log.830
C:\sapdb\backup\db760log.831
C:\sapdb\backup\db760log.833
C:\sapdb\backup\db760log.834
C:\sapdb\backup\db760log.835
C:\sapdb\backup\db760log.836
C:\sapdb\backup\db760log.837
C:\sapdb\backup\db760log.838
[...]

Well done … !?

As we can see, this worked pretty well.
You can, of course, make up more complicated scripts.
E.g. the documentation for MaxDB 7.7 has an example where log files are copied to a different location.
However, it’s NOT advisable to use the event dispatcher for critical database maintenance tasks (like backups).
There is no automatic monitoring for the dispatcher functionality and it’s rather seldom used until now.
For lightweight monitoring or notification tasks, it may nevertheless be a nice feature.

Since this example for MaxDB 7.6 was already quite complex (with many odd details), I’ll leave the 7.7 implementation for the next blog.

MaxDB – On locking mechanisms and how we learn something new each day… Part II

As promised, here’s the second part of my short excursion into the shoals of lock management with MaxDB databases. (The first part can be found here: MaxDB – On locking mechanisms and how we get to know our MaxDB a bit better each day… Part I)

2) When deadlocks go unnoticed…

Ok, big buzz word in the title – I’m sure that just the word ‘deadlock’ will get me the audience this time 😉

Before we start with the actual point, let’s get clear about what is meant by ‘deadlock’.
Among the huge variety of possible locking and hanging situations parallel working systems can get into, deadlocks are very specific.
The point of deadlocks is not that the system is hanging for a long time, but that it is impossible for the processes involved in the deadlock to resolve it by themselves.

Since locking situations can be thought of as (mathematical) graphs, a deadlock can be defined as a closed, circular graph.
The simplest example (with the minimum number of vertices) would look like this:

Process A [PA] holds a lock on Resource A [RA]
Process B [PB] holds a lock on Resource B [RB]

    ([PA],[RA]) <--- lock request --- [PB]
    [PA] --- lock request ---> ([PB],[RB])
In this case, PA and PB need to wait for each other for the release of the requested resource. But since they both wait, no process can actually release a lock – this is a deadlock.

Of course, deadlocks can be way more complex, including many resources, more processes and sometimes even multiple application layers (these are really nasty since usually there is no coherent view to these cross-layer locks).

One advantage of this rather abstract view of deadlocks is that it makes them easier to recognize.
That is what’s behind the deadlock detection feature of current DBMS.

Whenever a process needs to wait for a resource for a long time (say 1 second or so), the DBMS looks for such a deadlock graph and, if it finds one, ‘resolves’ the situation by telling one of the waiting processes that it won’t get the lock.

The general idea behind the feature is of course not to prevent deadlocks.
Deadlocks are usually design errors, bugs in the application program. That cannot be fixed automatically.

However, it is important for heavy-duty databases to keep running as long as possible.

To make this possible, the deadlock detection and resolution helps a great deal.
Once a deadlock is removed, the whole system can continue its work, while only one transaction gets an error.

So far the story is rather nice, isn’t it?

The DBMS checks for deadlocks and makes sure that the system will stay responsive even if the application designers made a mistake.

Unfortunately, nothing is perfect – and neither is the deadlock detection in MaxDB.
As you may know (or learn now) MaxDB knows different kinds of SQL locks:

  • Table locks
  • Row locks
  • Dictionary/Catalog locks

As long as the deadlock is just between table/row-locks everything works just as expected:

#### Session 1 (Isolation level 1, Autocommit off)
select * from locka
COLA  COLB
1     X
2     X
select * from lockb
COLA  COLB
1     Y
2     Y
update lockb set colb='YX' where cola=1
#### Session 2 (Isolation level 1, Autocommit off)
update locka set colb='XY' where cola=1

 

#### Monitoring session
select session, tablename, lockmode, lockstate, rowidhex from locks
SESSION  TABLENAME  LOCKMODE       LOCKSTATE  ROWIDHEX
8459     LOCKB      row_exclusive  write      00C1100000000...
8460     LOCKA      row_exclusive  write      00C1100000000...

Nothing special up to here – let’s create a deadlock:

#### Session 1
 update locka set cola='XY' where cola=1
#### Session 2
update lockb set colb='YX' where cola=1
Auto Commit: Off, SQL Mode: Internal, Isolation Level: Committed
 General error;600 POS(1) Work rolled back
update lockb set colb='YX' where cola=1

*** corrected the update statements 20.10.09 22:02 ***

As we see the crosswise row lock request (for the update an exclusive lock is required) is recognized and one session is rolled back.

Now let’s do this again, but let’s use shared (catalogue) locks as well…

#### Session 1
update lockb set colb='YX' where cola=1
#### Session 2
 update locka set colb='XY' where cola=1
#### Session 1
alter table locka add (colc varchar(10))
--> hangs !
#### Monitoring session
select session, tablename, lockmode, lockstate, rowidhex from locks
SESSION  TABLENAME      LOCKMODE       LOCKSTATE  ROWIDHEX
8459     LOCKA          row_exclusive  write      00C110000000000...
8460     SYS%CAT2       row_share      ?          FFFF00000000000...
8460     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8460     LOCKB          row_exclusive  write      00C110000000000...

 

Wow!

Besides our two already known row_exclusive locks on tables LOCKA and LOCKB we also find one for SYSDDLHISTORY and a row_share lock for SYS%CAT2.

What are those about?
Well, the lock for SYSDDLHISTORY is for an insert statement that is automatically done with MaxDB >= 7.7 whenever a DDL statement is issued.
By that, the SYSDDLHISTORY table contains all committed DDL statements – a neat feature, but it has nothing to do with what we want to do here.
The SYS%CAT2, in turn, is the mentioned catalogue lock.

Now let’s create the deadlock:
#### Session 2
alter table lockb add (colc varchar(10))
--> hangs !
#### Monitoring session
select session, tablename, lockmode, lockstate, rowidhex from locks
SESSION  TABLENAME      LOCKMODE       LOCKSTATE  ROWIDHEX
8459     SYS%CAT2       row_share      ?          FFFF00000000000...
8459     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8459     LOCKA          row_exclusive  write      00C110000000000...
8460     SYS%CAT2       row_share      ?          FFFF00000000000...
8460     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8460     LOCKB          row_exclusive  write      00C110000000000...
select tablename, h_applprocess as holder, h_lockmode,
r_applprocess as requestor, r_reqmode from lock_waits
TABLENAME  HOLDER  H_LOCKMODE     REQUESTOR  R_REQMODE
LOCKB      4132    row_exclusive  1904       sys_exclusive
LOCKA      1904    row_exclusive  4132       sys_exclusive

Now, this is, in fact, a deadlock but MaxDB does not do anything about it.

The reason for that is simple:
The deadlock detection does not include the share locks!

To be precise, for share locks the kernel does not maintain a list of session IDs, but only a single counter.
Based on this counter it’s not possible to find out which session is holding/waiting for a specific share lock and in consequence, the kernel cannot tell which tasks to roll back.
In this case, one user task needs to be manually cancelled or the lock timeout will deny the first request.

Although this is an ugly limitation of the deadlock detection, it’s not really that bad in day-to-day DB usage.
The reason is simply that there are usually only a few DDL commands running in parallel – especially when it’s not the upgrade weekend.

3) The dead walk – how deleted rows reappear

Ok, one last thing 🙂

It’s a simple effect that I found to be surprising while I was playing around with locks during the ‘research’ phase for this blog.

#### Session 1
select * from locktest
THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x
delete from locktest where the_row >='5'
More than one row updated or deleted. Affected Rows:  5
-> SEE: no commit here!
#### Session 2
select * from locktest
THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?

 

Where is the data?

#### Session 1
rollback
#### Session 2
select * from locktest
THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x

 

There it is!

This is a really nasty feature if you come from other DBMS like Oracle.
MaxDB currently (!) does not support consistent view concurrency and it does not reconstruct deleted rows.
Since deletions are done in-place during the statement execution (and not at commit time) the deleted rows are really just gone when the second session looks into the table.
There’s nothing there to tell the second session to look for old data, the data is just gone.

If your application really relies on a consistent view of the data without data access phenomena like ‘dirty reads’, ‘non-repeatable reads’ etc., then you either need to use a higher transaction isolation mode (but lose scalability by that) or make your application aware of this.

Looking back

As we’ve seen locking is not really something that is ‘just there’.
It can become pretty important to be able to differentiate between what locking can do for you and what it won’t do.

One important thing I haven’t mentioned explicitly yet: I’ve only been writing about SQL locks. But MaxDB (and other DBMS as well) relies on multiple different shared resources that need to be protected/serialized as well.

For that task, MaxDB uses B*Tree-locks, critical regions, semaphores & mutexes, filesystem locks and the like.

So there’s plenty of topics to write about …

Resources

For more information on this area of MaxDB please check these resources:

MaxDB Internals Course – Locking

SAP Note #1243937 – FAQ: MaxDB SQL-Locks

MaxDB Dokumentation – Locks

Marketing

If you’re not already booked for October 27-29 this year and you happen to stay in Vienna and you keep asking yourself what to do … then get your ticket for SAP TechED 2009 and make sure to attend my MaxDB session !

In addition to the presentation, there will be an expert session in the afternoon, where I’ll await your questions that I hopefully can answer.
It’s session EXP349 MaxDB Q&A, Tuesday, 2:30 P.M., that you should register for.

 

MaxDB – On locking mechanisms and how we get to know our MaxDB a bit better each day… Part I

One of the most boring topics within the area of DBMS surely is locking.
It’s also one of the topics that most DB users try to avoid as much as possible.

Still, it’s of the utmost importance to correctly understand how locking works in the database of choice in order to work with it successfully and without bad surprises.

Here are two cases of MaxDB locking behaviour that are not obvious at all (and also not well documented yet).
Note: the first part “grew” while I was researching about it, so I split this blog into two parts. The second part will be released within the next few days.

1) When not to trust count(*)

With the advent of the file directory counters a neat query optimization for the popular ‘select count(*) …‘ was implemented.
Whenever the optimizer realizes that the user just wants to know the total number of rows in a table this number is retrieved from the file directory counter of the table and not from the table itself.

The performance gain is immense: instead of reading potentially thousands of pages and counting the rows found, MaxDB just looks up the counter and returns the number to the user.

But what happens when another transaction has uncommitted changes on the table while we run this query?
Let’s see:

####Session 1:
select * from locktest
THE_ROW
-------
1
10
2
3
4
5
6
7
8
9
select count(*) as cnt from locktest
CNT
---
10
explain select count(*) as cnt from locktest
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                        COUNT OPTIMIZATION                      
                                             RESULT IS COPIED   , COSTVALUE IS           2
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1
####Session 2:
insert into locktest (select the_row+20 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select count(*) as cnt from locktest
CNT
---
20
####Session 2:
rollback
Statement successfully executed. No Result
####Session 1:
select count(*) as cnt from locktest
CNT
---
10

As we see select count(*) returned the count including the additional 10 rows, although they never were committed!
Luckily the explain tells us about the optimization by printing out the “COUNT OPTIMIZATION” line.

A way out?

But how can we force MaxDB to actually deliver the result for committed rows?
Until writing this blog I thought the solution would simply be to trick the optimizer out of the COUNT OPTIMIZATION, e.g. by adding a dummy predicate that will evaluate to TRUE for all rows, let’s say 1=1.

Let’s see how this works:

####Session 1:
explain select count(*) as cnt from locktest where 1=1
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                             RESULT IS COPIED   , COSTVALUE IS           3
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

Ok, the “COUNT OPTIMIZATION” is gone, so we should expect the ‘correct’ counting of data…

####Session 2:
insert into locktest (select the_row+20 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select count(*) as cnt from locktest where 1=1
CNT
---
20

WOW – wait a minute!
Still the same wrong result?
YES!
So I’ve written nonsense above?

Nope, what I wrote is actually true.
So how come that we still see the uncommitted data?

The point here is that there are some more optimizations built in that are anything but obvious.

Clever tricks

When applying the dummy-predicate MaxDB actually will walk down through the whole table.
So far the explain is completely correct.

But what happens with the rows read is what makes up the odd behaviour.
For each row found the kernel now checks whether the row is qualified against the predicates.
Part of this is to recognize that just counting is wanted and no actual data from the table should be returned.
So the kernel takes a shortcut and reports back that the row did not fit but increments a counter instead. This is then done with all rows in the table.

The key point here is to know that the kernel only checks the locks on a row once it has been recognized as a “qualified row”.
Therefore this (obviously old) count(*) optimization takes its shortcut before the lock handling and thereby always delivers uncommitted-read results.

Phew… weird isn’t it? But things are getting creepier!

Another way to fool the optimizer out of the set of optimizations is to use “count(column_name)” instead of “count(*)”.
The problem with that is that for “count(column_name)” only those rows are counted, for which the column is not a NULL value.
This might not be a problem for most SAP tables, but surely can be an issue for others.

Basically, only two “safe” ways exist to force the committed read-count.
In both of the following cases, the kernel really reads all rows and really needs to check locking by looking for entries in the lock lists.
The two ways are:

1) using “count(pk_column_name)”

In our example column “THE_ROW” forms the primary key, so let’s check this counting approach:

####Session 2:
insert into locktest (select the_row+20 from locktest)
More than one row updated or deleted. Affected Rows:  10
###Session 1:
select  count(the_row) as cnt from locktest
CNT
---
10
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                             RESULT IS COPIED   , COSTVALUE IS           3
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

If we change the table and add a new column which might contain NULL values we can observe how fragile this approach can become:

####Session 2:
rollback
alter table locktest add (the_row2 varchar(20))
commit

Now set some values…

update locktest set the_row2='x' where the_row >='5'
More than one row updated or deleted. Affected Rows:  5
commit
select * from locktest
THE_ROW  THE_ROW2
-------  --------
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x

If we now just use THE_ROW2 for counting, we get the number of rows where THE_ROW2 is NOT NULL, regardless of the locking state of the rows:

select count(the_row2) as cnt from locktest
CNT
---
5

And, of course, locking is honoured in this case.
That means that without the commit after the update, a second session running count(the_row2) would simply be blocked until the commit has been issued (assuming we work at least in isolation level 1 – committed read).
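
To make this concrete, here is a minimal sketch of that blocking behaviour. It is not part of the test run above but only an assumption of what you would see, provided session 1 works in isolation level 1:

####Session 2:
update locktest set the_row2='y' where the_row >= '5'
-- the updated rows are now locked; no commit yet
####Session 1:
select count(the_row2) as cnt from locktest
-- this statement now has to wait until session 2 commits or rolls back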

2) using multiple aggregate-functions at once e.g. “count(*), count(*)”

####Session 2:
insert into locktest (select the_row+20, the_row2 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select count(*) as cnt_1, count(*) as cnt_2 from locktest
CNT_1   CNT_2
-----   -----
   10      10
   
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                             RESULT IS COPIED   , COSTVALUE IS           3
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

 

NOTE: Here, session 1 also hangs until session 2 commits or rolls back!

That’s it (for now)

That’s it about the count(*) locking effects that may occur with MaxDB.
Quite strange stuff going on here, isn’t it?

So why aren’t there problems with MaxDB and the SAP products all the time?
That’s simply because the NetWeaver application servers use uncommitted read (a.k.a. isolation level 0) all the time!

The data access layer is designed with the knowledge that rows found in the database may already be outdated by the time they arrive. Since there is a whole business transaction handling engine implemented in the NetWeaver stack ("above" the DBMS), this really doesn't matter at all for the work processes.
Whenever a user needs to 'really' put his hands on a row, the application has to acquire an enqueue anyhow, and that ensures that the correct values are retrieved and that no other users can change them while the user is working on the data.

And oh, yes, this is just another big argument against direct database access…

Continue reading about locking behaviour in part II of this blog, coming up this week.

Questions to SAP-Support (MaxDB): “Data is gone after reboot – where the heck is it?”

From time to time I start to believe that there are certain problems that would not come up anymore, once the topic is documented well enough.

My colleagues from development know that all too well, as I keep asking them for documentation and notes for everything.
However, written documentation doesn't help at all if it's not read…

BACK IN TIME

Last week I got this message and I believe it will be interesting for many MaxDB users “out there”.

Customer:

“Hello SAP,

The Server XYZ has been rebooted yesterday morning.
After the reboot, I started SAP, and everything seemed ok.
Unfortunately, the system is now in a state that it was two weeks ago when a client copy was done.
[…]
The system administrator has looked on OS-level and can’t find anything wrong.
We need to know what could have happened and why there is an SAP system with a database which is two weeks old?
[…]

Best regards, …”

Ok, what’s that about?

The server got rebooted, and now the database is back at a previous state?

When I read the description of the customer I already suspected what turned out to be the actual reason for the loss of data.

To verify my assumption I only needed to check the DBM.PRT file.

This is what I found:

[...]
command db_admin -f
command auto_extend show
command auto_update_statistics show
command util_execute GET RESTART VERSION
command auto_extend show
command auto_update_statistics show
command util_execute GET RESTART VERSION
command util_execute SET LOG WRITER OFF                  <<<<<< !!!!
command util_execute GET RESTART VERSION
command auto_extend show
command auto_update_statistics show
[...]

There it was: the log writer of the database had been turned off.
So why is this something to note?

The short answer is:
As soon as you disable the log writer, no automatic savepoints are written anymore.

If you now ask yourself "Wait a minute – what do savepoints have to do with the log writer and why does this cause a huge loss of data?" then you may want to read on.

THE CONVERTER – KEEPER OF THE DATABASE STATES… A LITTLE EXCURSION…

Ok, the first thing to understand here is the way MaxDB stores and retrieves data to and from the disks.

Unlike most other databases, MaxDB does NOT store data at a specific physical location.
Instead, a row is stored on a logical page and that logical page changes its physical location in the data volumes every time the page is changed and written to disk.
This concept is called “COPY-ON-WRITE”.

One advantage of this is that, by writing out data to the disks, MaxDB can automatically even out the filling of the data volumes and thereby guarantees an equal distribution of I/O over all data volumes.

Of course, something needs to keep track of this so that the database can actually find the correct location of each page. This something is the so-called converter.

You put a logical page number into it and get a physical block location out (or vice versa).
Basically you could say when a page is in the converter, the database knows it – otherwise, it does not know it.

Seen from that angle, the converter holds the state of the database at all times.

“But, wait a minute, where is the information of the converter stored?”

Good question!
In fact, it’s the very key point of the customer’s problem here, so pay attention! 😉

Since the converter is so utterly important to the database (without it all the pages in the data volumes are just nonsense), MaxDB needs to save it regularly, so that it’s there the next time the database is started.
Therefore, at the end of each savepoint, MaxDB writes the converter down into the data volumes, to places that are currently not allocated by the actual user data or the old converter.
These places, in turn, are again spread dynamically over all data volumes.

STILL AWAKE?

Did you notice something?

We have data volumes full of pages that can only be brought into context and meaning by the converter. And this converter itself is also stored in the same cryptic way in the data volumes.

“How the heck can MaxDB figure out which pages belong to the converter when it starts up?”

Again, good question!
The trick here is the data structure used by the converter.
It’s a B-tree, so all we need to have is the root-page location to find all other pages that belong to the converter.

Unlike normal data pages, converter pages on disk do contain pointers to the physical locations of the other converter pages. This enables MaxDB to find and read all converter pages even before the converter itself is available for the mapping of logical to physical pages.

Now the last missing link, the one to the root page of the converter, comes in:

The restart record.

This is a ‘magic’ data structure that is always located in the first block of the first data volume.
In that restart record, the link to the converter root page is stored as a combined volume/offset-information.

(Just for those super-techies among you readers: it’s the crConvRootBA value you see in x_diagnose.
It contains the information about the volume in the lower-byte and the offset in the upper bytes. E.g. crConvRootBA: 1164033 = 0x11C301 => Offset 0x11C3, Volume 0x01.)

Fine, to summarize:

  • We have a data structure, called the CONVERTER, that brings meaning to the randomly scattered pages in the data volumes.
  • MaxDB only knows the pages that are present in the CONVERTER, since user data pages reference each other solely via logical page numbers.
  • We find this converter during the restart by following the link from the restart record and the B-Tree links of the CONVERTER itself.

So whenever the converter is written down to the data volumes, this link to the converter root needs to be updated as well. The restart record has to point to the new converter root.
In fact, this is done as the very last action during the savepoint.
If this fails, the old converter will be found the next time the database starts up.

By the way: this is precisely the way that SNAPSHOTS work in MaxDB.
We simply store multiple converters and protect the pages that belong to them from being overwritten.
When we then want to 'restore' one of the snapshots, all we have to do is change the restart record and load that converter by restarting the database.

UPPS!

Yes, here we have it.

During all the time the database had been running since the client copy, there seems to have been no SAVEPOINT at all.
How can that be?

Isn’t a SAVEPOINT expected to happen AT LEAST every 600 seconds (or in whatever interval is configured by the _RESTART_TIME parameter)?
Indeed it should, but for the SAVEPOINT to be actually triggered based on this timing, another condition has to be met:
A certain number (5,000) of log I/O actions have to have taken place since the last savepoint.

DOUBLE UUPS!

By disabling the log writer, the customer effectively turned off the automatic writing of savepoints!

Of course, he could have triggered savepoints manually or by taking backups, creating indexes or by just stopping the database via a DB_OFFLINE.
All these actions would have written a savepoint and thereby saved the precious converter information in the data area, ready for the next restart.

Unfortunately, the customer decided that the server needed a reboot and just ran "shutdown -r now".
This command, of course, knows nothing about MaxDB or DB_OFFLINE but simply KILLS all still-running processes.
As a consequence, the restart record still pointed to the root of the converter that had last been written to disk – back when the client copy was done.
No wonder the database appeared to be in the state of two weeks earlier when it was restarted afterwards.

The outcome was that the customer simply lost two weeks' worth of work, and there was no technical option to rescue the data.

WHAT KIND OF MALICIOUS FUNCTION IS THAT?

One could argue that such a function should have no place in a DBMS, because one of the main jobs of a DBMS is to keep the users' data safe.
Well, this feature is in fact not meant for normal operations.

It’s a benchmark feature that allows certain performance tests without the effect of log writing. Also, there are specific situations where you can greatly benefit from this function.
One example is the loading of data via R3Trans during the installation of a system.
Of course the R3Trans application is aware of the effects that switching off the log writer has and triggers savepoints manually.

All in all it’s just another feature that can be used correctly or the wrong way.
And unfortunately the user this time used it the extremely wrong way.

The only morale here is: understand what a command or feature does before using it!
Read the documentation and try things out beforehand!

Links

Important concepts like the CONVERTER are of course explained and documented, so here is a list of links to the relevant resources (for those of you that still cannot get enough of tech-talking):

No-Reorganization Principle: Data Storage Without I/O Bottlenecks

Documentation: Savepoint

Documentation: Converter

SAP support note #869267 – FAQ: MaxDB LOG area (see questions 35 and 36)

 

A LITTLE ADVERTISING

I was asked to blog about the TechEd Session I’m going to deliver, so here it is: If you’re not already booked for October 27-29 this year and you happen to stay in Vienna/Austria and you keep asking yourself what to do … then get your ticket for SAP TechED 2009 and make sure to attend my MaxDB Session

Questions to SAP Support: “How to prevent MaxDB from creating backups to the RUNDIRECTORY?”

Hello, community!

Just a minute ago I got to work on a message where a customer asked:

"Hello support team, could you please tell me how to teach MaxDB
not to save backups to the default RUNDIRECTORY but to a path specified by me?"

Since it’s actually not that difficult to figure out how this works, the customer solved this issue himself.

Nevertheless, I read this question one time too often, so here is the answer, public and available for everybody.

To be able to take backups of a MaxDB database (or a liveCache, ContentServer or OneDB instance respectively), you have to define backup templates (or backup media, as they were called in earlier releases).

These templates describe the kind of data to be backed up (DATA, LOG, INCREMENTAL DATA), the type of backup medium (TAPE, PIPE, EXTERNAL TOOL, FILE), the degree of parallelism and so on.
For a backup to the filesystem, a file name needs to be specified. The Database Manager GUI (DBMGUI) automatically fills the respective textbox when you try to add a new medium:

As you can see, we find just a file name there but no folder/path.

If we leave it like that, the MaxDB kernel cannot know where to put this file, so it uses the current directory it runs in – the RUNDIRECTORY.
To change that, all you have to do is *supply the full path plus the file name*, as shown in the picture below:
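
Since the screenshots are not reproduced here, a purely hypothetical example of the difference (file names and paths below are made up):

File name only in the template:  CompleteData
                                 -> backup file ends up in the RUNDIRECTORY
Full path in the template:       /backup/MYDB/CompleteData
                                 -> backup file ends up exactly where you want it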

If you know it it’s obvious, isn’t it?

best regards,
Lars

‘Mind the gap…’ (#4 – reading query execution plans)

In my mini-series “Mind the gap…” I will try to shed some light on where the little or big differences between MaxDB and Oracle databases are and what to keep in mind when working with them.

Today I’d like to write a bit about the pitfalls you can fall into when trying to apply your query-tuning knowledge from Oracle to your MaxDB instance.

Basically, both databases do employ a so-called cost-based optimizer to figure out what access strategy will deliver the results as fast as possible.

For those of you who don’t know what ‘optimizer’ or ‘execution plan’ means, I recommend grabbing the documentation of either DBMS and reading up on it there. Here I will focus on the differences between the two systems.

The test schema

To make it easier to compare the output of the explain-statements for both DBMS I use the same test data on both databases.

Actually, it is just the demo schema ‘HOTEL’ that comes with every MaxDB.

Here’s how it looks from a modelling point of view:


To make things a bit more equal for both databases, I created unique indexes on Oracle wherever there is a primary key on MaxDB. As you see, it’s a pretty straightforward schema with no specialities.

Also, to keep it simple, I will just focus on the CUSTOMER and CITY tables, as this should be enough to make the points.

1. Query, single table – get used to the explain commands

select * from city

This first query is really just for getting in touch with the commands available to display information about the execution plan.

While for Oracle there is one command ‘EXPLAIN PLAN FOR’ to generate the execution plan and another one ‘SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);’ to display it, MaxDB directly shows the execution plan as the result of ‘EXPLAIN’ or ‘EXPLAIN JOIN’.

And, yes, there are also other options to generate execution plans for both databases, but the ones mentioned above are the most commonly used ones.

One word about the output of the Oracle EXPLAIN PLAN: when used from within SAP CCMS (ST04/ST05) you’ll see SAP’s own representation of the execution plan.

It contains less information but is more graphically oriented – anyhow, since newer releases of the SAP_BASIS also provide the Oracle output formatting I’ll stick with this for this blog.

So why does MaxDB need two commands if it displays the execution plan automatically? Because the information in the two outputs differs.

In fact, MaxDB’s explain facilities needed to be extended and fitted to some old coding – so, for now, we need to live with two EXPLAINs to see all information (remark: a completely new implementation of EXPLAIN is on its way but will likely not be available before MaxDB 7.8).

We’ll see what both commands do later on, so let’s get started with the simple query above.

Oracle
SQL> explain plan for
2  select * from city;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 3973142328

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1765 | 31770 |     5  (20)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| CITY |  1765 | 31770 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

8 rows selected.

MaxDB
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
LARS        CITY                        TABLE SCAN                                      11
                                           RESULT IS NOT COPIED , COSTVALUE IS          11
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

So, what do we see here?

Oracle

Oracle shows us the access strategy (a full table scan), the accessed table (CITY) and for each line, we see the expected number of rows, the bytes to be processed, a cost value and an estimation for the execution time.

An important thing to know here is that the costs basically add up over the levels of execution. We’ll see below what ‘level of execution’ means here. Anyhow, this level is shown by how far the entry in ‘OPERATION’ is indented to the right.

Note that we don’t see how many blocks Oracle expects to touch – this information is encoded into the COST of each line. Finally, the ‘TIME’ column is derived from the COSTs and some system-wide performance statistics.

MaxDB

MaxDB, in turn, shows us the used table (CITY in schema LARS) and the chosen operation on the table (TABLE SCAN here as well). Instead of COSTS or TIME we see a PAGES column.

ATTENTION: this DOES NOT contain the estimated number of pages to be accessed at this line, but it shows the SIZE of the table/index accessed by the operation in the current line. Therefore the PAGES of a multi-step execution plan DON’T add up like they do in Oracle.
Instead, the MaxDB explain provides a separate line ‘COSTVALUE IS’. This and only this line gives us information about the estimated effort of the query.

The lines below ‘QUERYREWRITE…’ are just additional information – the numbers shown in the PAGES column here simply tell how often a specific rewrite rule has been applied to the query. So these aren’t costs either.

One word about the COSTVALUE line: it comes in two flavours:

1) ‘RESULT IS NOT COPIED, COSTVALUE IS’ – this means the result rows are delivered to the client on-the-fly. They aren’t copied to a temporary result buffer.

2) ‘RESULT IS COPIED, COSTVALUE IS’ – the result has been copied to an internal temporary storage and will be delivered from there to the client when it fetches the data.
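
If you want to see both flavours yourself, here is a small sketch against the HOTEL demo schema. Note that this only describes the typical behaviour – the actual strategy depends on your data and indexes, so treat the comments as assumptions, not as guaranteed output:

explain
select * from city
-- typically 'RESULT IS NOT COPIED': rows can be streamed to the client while scanning

explain
select * from city order by state
-- typically 'RESULT IS COPIED': the rows first have to be sorted into an internal
-- temporary result before the client can fetch them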

Ok, now that we know these basics, let’s move on to a bit more complicated statement:

2. Query, single table with selection criteria

Oracle
SQL> explain plan for

  2  select * from city where name='Dallas';

Explained.
SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
Plan hash value: 2402407757

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |    21 |   378 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CITY            |    21 |   378 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_CITY_NAME_ZIP |    21 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME"='Dallas')

14 rows selected.

MaxDB
explain
select * from city where name='Dallas'
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
LARS        CITY       I_CITY_NAME      EQUAL CONDITION FOR INDEX                       11
                       NAME                  (USED INDEX COLUMN)                
                                           RESULT IS NOT COPIED , COSTVALUE IS           2
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

Oracle

Here we see a bit better how the ‘execution level’ works. Oracle first accesses the index I_CITY_NAME_ZIP (you guessed it – it covers the NAME and ZIP columns). From the index, Oracle gets all ROWIDs that fit the WHERE condition.

We also note that Oracle nicely tells us, for each step in the execution plan, which filter and/or access conditions are used. ‘ACCESS’ means that Oracle can use the condition to locate the data – just as it does here by probing the index with the search condition. If Oracle has to browse through all the data to find the fitting rows, it prints “filter” in the predicate information instead.

MaxDB

The MaxDB output also gains information: the name of the accessed index plus all used columns of the index are printed below the table access. Again, the PAGES number in the line for the index access does NOT tell us the expected costs but simply shows how large the index is in total. There is one exception to this rule (when a key access is done), but we leave that out for now.
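
For the curious, a hypothetical sketch of such a key access (the ZIP value is made up and the exact strategy text may differ between releases):

explain
select * from city where zip = '12203'
-- with ZIP being the primary key of CITY, you would typically see a key strategy
-- like 'EQUAL CONDITION FOR KEY COLUMN' here; this is the case where the PAGES
-- figure is the exception mentioned above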

Up to here, the differences between both explain tools are remarkable but don’t lead to a big difference in understanding a query’s execution strategy.

But up to here we have only looked at the simplest statements there are.

Let’s see how things turn out if we apply some pressure 🙂 and do joins.

3. Query, join

Oracle
SQL> explain plan for
  2   select c.title, c.firstname, c.name, c.zip, ci.name, c.address
  3   from customer c inner join city ci on c.zip=ci.zip;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 4175067910
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  3731 |   229K|    14   (8)| 00:00:01 |
|*  1 |  HASH JOIN         |          |  3731 |   229K|    14   (8)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| CITY     |  1765 | 28240 |     4   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CUSTOMER |  3731 |   171K|     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."ZIP"="CI"."ZIP")
15 rows selected.
MaxDB
explain
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip

TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
C                           TABLE SCAN                                      40
CI         ZIP              JOIN VIA KEY COLUMN                             11
                              TABLE HASHED
                              NO TEMPORARY RESULTS CREATED
                              RESULT IS COPIED   , COSTVALUE IS            290

-------------
explain join
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip

STRATEGY             SCHEMANAME  TABLENAME  MAX ADDNL. ROWS  MULTIPLIER  REVERSE MULTIPLIER  ADDITIONAL PAGES  ADDITIONAL ROWS  ACCUMULATED_COSTS
                                 C          4480             1           1                   106.666666666667  4480             40
JOIN VIA SINGLE KEY              CI         2574             1           19.9332591768632    248.888888888889  4480             289.846383079439


Oracle

Now we really see how the ‘execution level’ works in Oracle.

The two tables are joined via the HASH JOIN. For this, the first table (the upper one in the explain plan output) is read completely into memory and HASHED.

Then the second (lower) table is read completely as well and the fitting rows (the ones that fulfil the join condition) are found by HASHING the rows against the existing HASH SET.

As you see, both table scans are on the same ‘level of execution’ since both scans deliver rows to the same operation (the HASH JOIN).

You also note that the number of rows does not add up here; instead, the HASH JOIN line really contains the number of rows expected to come out of the join. Since the demo schema constrains all ZIP values in the CUSTOMER table to be present in the CITY table, and since ZIP values are unique there (in fact ZIP is the primary key of the CITY table), Oracle correctly figures that there is exactly one fitting row from the CITY table for each row of the CUSTOMER table. Therefore the HASH JOIN will produce exactly the number of rows of the CUSTOMER table.

The COSTs, in turn, do add up here. So the query costs are the cost for scanning table CUSTOMER (9) + the cost for scanning table CITY (4) + an overhead for performing the actual join operation (1) = 14.

And again we see what predicates Oracle used here to find the fitting rows (yes, a HASH function IS an access method).

MaxDB

The MaxDB output for the same join is somewhat more complicated – or less condensed, depending on how you look at it.

The ‘explain’ statement just informs us about the order of the execution, specifically the JOIN ORDER.
Although all the other information provided by the Oracle ‘explain plan’ can be very useful, the join order is the most important piece of information about join processing.
Along with this, we get the usual information about the table and index sizes as well as the total cost estimation.

To get more information, like what we get from Oracle, we have to use the second explain command – ‘explain join’.

This finally gives us the estimation of rows and pages to be processed within each step.

Anyhow, what we can rely on here is mainly the ADDITIONAL ROWS column, as the other columns are filled by undisclosed internal calculations that usually don’t match intuition.

Nevertheless we see that, just like in the Oracle example, a HASH JOIN is chosen and that the number of rows in the final result set equals the number of rows in the CUSTOMER table (the difference to the number of rows in the Oracle example is simply because there ARE more customers in my MaxDB … ).

This time the result set has been copied – which is also incorporated into the cost calculation.

Ok, we’re nearly done. Let’s have a look at the most complicated case (for today ;-)):

4. Query, join with selection criteria

Oracle
SQL> explain plan for
  2   select c.title, c.firstname, c.name, c.zip, ci.name, c.address
  3   from customer c inner join city ci on c.zip=ci.zip
  4   where ci.name ='Dallas';

Explained.

SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1286193604

---------------------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |    48 |  3024 |    10  (10)| 00:00:01 |
|*  1 |  HASH JOIN         |                 |    48 |  3024 |    10  (10)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN | I_CITY_NAME_ZIP |    21 |   336 |     0   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CUSTOMER        |  3731 |   171K|     9   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ZIP"="CI"."ZIP")
   2 - access("CI"."NAME"='Dallas')

16 rows selected.

MaxDB
explain
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip
 where ci.name ='Dallas'
TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
CI         I_CITY_NAME      EQUAL CONDITION FOR INDEX                       11
                            ONLY INDEX ACCESSED                     
           NAME                  (USED INDEX COLUMN)                
C          I_CUSTOMER_ZIP   JOIN VIA INDEXED COLUMN                         40
           ZIP                   (USED INDEX COLUMN)                
                                 NO TEMPORARY RESULTS CREATED       
                                 RESULT IS COPIED   , COSTVALUE IS           5
------------- 
explain join
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip
 where ci.name ='Dallas'

STRATEGY                      SCHEMANAME  TABLENAME  MAX ADDNL. ROWS   MULTIPLIER  REVERSE MULTIPLIER  ADDITIONAL PAGES  ADDITIONAL ROWS   ACCUMULATED_COSTS
                                          CI         25.9194422927963  1           1                   1                 25.9194422927963  1
JOIN VIA MORE THAN ONE FIELD              C          4480              2           1                   2.87993803253292  51.8388845855926  4.47963144734404

Oracle

Ok, the ‘explain plan’ output is not much more complicated now. In fact, even with the new WHERE condition, Oracle has again decided for a HASH JOIN.
Nevertheless, you notice that now table CITY is not touched anymore. Instead, the index I_CITY_NAME_ZIP is read (see the access predicate 2) and since the index contains the ZIP data as well, it is used to build up the HASH SET for the join.

The other join part is now processed like before – a simple full table scan that is hashed to create the result set.

If you wonder why Oracle decides for a HASH JOIN here (although there is an index on ZIP on the CUSTOMER table): it’s because the estimated COSTS are lower.

Remember that this is a rather small table and Oracle uses a multiblock I/O for table scans. So, in this case, it’s faster to simply load all data and do the HASHING than look up ZIP number by ZIP number in the index on CUSTOMER and visit the table for each fitting record.

But attention: just these cost differences can lead to severe performance issues when Oracle gets them wrong. Especially in OLTP systems we usually don’t want to see HASH JOINs…
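
If you want to compare the two plans on your own system, you can nudge the optimizer with a hint. This is only a sketch – USE_NL is a standard Oracle hint, but whether the nested loop variant really ends up cheaper is something the resulting cost figures have to tell you:

explain plan for
 select /*+ use_nl(c) */ c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip
 where ci.name ='Dallas';

select * from table(dbms_xplan.display);
-- check whether the NESTED LOOPS plan really shows a higher cost than the HASH JOIN above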

MaxDB

MaxDB decides for a different plan this time: first, it also performs an INDEX ONLY access on the I_CITY_NAME index.

Due to the way indexes work in MaxDB, every index always contains the full primary key – so this index also contains the ZIP values.

With this information, the join can already be done. This time via the I_CUSTOMER_ZIP index on the CUSTOMER table.

It’s not written out here, but this join is a nested loop join.

Again we see: the numbers in the PAGES column DON’T add up to the COSTS line.

Instead, we notice that the COSTS are the rounded ‘ACCUMULATED COSTS’ of the ‘explain join’ command.

Cheat Sheet

------------------- 8< ------------ cut here ------------ 8< ------------

Oracle: ONE explain statement; the result needs to be displayed separately via
        select * from table(dbms_xplan.display);
MaxDB:  TWO explain statements; the result is visible immediately.

Oracle: Dense information; the CBO arithmetic can be followed.
MaxDB:  Information focuses on the resulting plan – the optimizer arithmetic can barely be followed.

Oracle: "Costs" add up in the execution plan.
MaxDB:  "Costs" appear in the COSTVALUE line ONLY.
        All other values in the PAGES column are the statistics of the accessed tables/indexes.

Notes to know (Oracle):

618868 – FAQ: Oracle performance
766349 – FAQ: Oracle SQL optimization
772497 – FAQ: Oracle Hints
176754 – Problems with CBO and RBO

Notes to know (MaxDB):

819324 – FAQ: MaxDB SQL optimization
832544 – FAQ: MaxDB Hints

Documentation to read (Oracle):

Oracle Documentation – Overview of the Optimizer
Oracle Scratchpad (Oracle optimizer and performance related blog by Jonathan Lewis)

Documentation to read (MaxDB):

MaxDB Wiki
MaxDB Documentation – SQL Optimizer

---------------------- 8< ------------ cut here ------------ 8< --------------------
Former editions of ‘Mind the gap…’:

‘Mind the gap…’ (#3 – Backup/Restore – Part 2)

‘Mind the gap…’ (#3 – Backup/Restore – Part 1)

‘Mind the gap…’ (#2 – Data storage)

‘Mind the gap …’ (#1 – Logon mechanisms)