Category Archives: SAP Business Warehouse

Automatic testing of BW queries, anyone? Here you go!

Dear readers

This time I’m not presenting you any of my own work but instead want to advertise what my CSA colleague Tobias Kaufmann just published here on SCN:

Automatic Test of Queries with RS Trace Tool. (really well done Tobias!)

Given how important correct results and good performance of SAP BW queries are to many users, in my eyes this is one of the most important and most underused features around.

With the trace tool environment (transaction RSTT) you are able to set up test scenarios for the key queries of your BW solution and re-execute and evaluate them whenever this is required.

Just installed a new revision of SAP HANA?

Run the tests and see if everything still works as it should.

Upgraded application server hardware and want to know if the performance improved for any of your queries?

After running the tests you’ll know.

Imported a new SAP note correction or SP stack?

After executing the tests you’ll be sure whether or not this broke any of your business reports.

Although this does not eliminate the need for end-user acceptance tests or the final approval of the report users after a critical change in the system, it does make testing your BW solution a lot easier, cheaper and more reliable.

It may not be completely effortless to set up a whole test battery that covers all of the important queries, but it surely pays off to do so.

In fact, SAP BW development uses this exact functionality for testing changes in the coding.

Personally, I really like this tool and wish we had something similar for queries and models that are developed directly in SAP HANA.

So, if you’re running SAP BW powered by SAP HANA, don’t miss out on this feature – it’s already present in your system and is just waiting to be used.

Cheers, Lars

How most BW performance support messages are solved

As the fourth quarter of the year has started, everybody seems to be checking the goals that were set for this year and what has already been achieved. Looking back on the messages I worked on, I found a very clear pattern whenever a message was about BW on Oracle and performance.

Most messages were solved by the very same simple five steps:

  • ensuring the database software is up to date
  • ensuring the database parameters are set as recommended
  • ensuring that the CBO statistics are, in fact, current and correct
  • ensuring that all indexes are present and usable
  • ensuring that the f-fact table doesn’t have too many partitions

Yes, these steps are boring and not very popular.
Not a lot of special analysis is required to recommend these steps.
In fact, it’s a rather quick thing to check them and tell whether they need to be applied. It reminds me of the flossing discussions with the dentist… Anyhow, let’s review these points briefly, one by one:

Up to date Oracle database software

BW systems regularly present the Oracle query optimizer with highly complex issues to solve.
And the Oracle cost-based optimizer (CBO) is by far the most complex piece of coding in the whole DBMS software.
So, even without the long history of epic optimizer errors (check note #176754), it should be obvious that it’s practically impossible for this code to be bug-free.
For example, one issue that struck often with the early 11.2 patches was that the optimizer started to use the infamous MERGE JOIN CARTESIAN (MJC) operation.
Due to bugs and/or outdated statistics, the CBO believed it would get only one or zero rows back from a row source, and with that assumption an MJC can indeed look like a good choice.
Once you get more rows than that you easily end up with PSAPTEMP overflows (ORA-1652) and horrid runtimes.

And sorry, but Oracle 8i, 9i and 10g are all outdated and not supported anymore.
See note #1110995 (Extended maintenance for Oracle Version 10.2) on this.
Of course, nobody can forbid you to use Oracle 10g – it’s your system landscape after all.
But be prepared to have to install Oracle 10.2.0.5 PLUS the latest SAP Bundle Patch before any in-depth analysis (e.g. via CBO trace) is performed.
We’re not chasing bugs in these old versions anymore.
Besides: very likely the problems just won’t be there with the latest patch anyhow. It’s pretty mature nowadays.

Database parameters

Well, this pretty much goes together with the paragraph before: without the correct parameter settings for the current version and patch of the database software, bugs and issues cannot be avoided – even those that are already known.

Current/fitting CBO statistics

Still, one of the links that I give to my customers most often is the one to my own old blog post

BW on Oracle: a performance hitch I see more and more often…

Even today it seems that there are still many systems where the DBSTATC entries need to be cleaned up.
Today I’d just add a small detail: if you are still using report SAP_ANALYZE_ALL_INFOCUBES to collect statistics, you really should stop doing that.
It basically brings back the DBSTATC entries…

Another aspect of this topic is that the standard heuristic for gathering new statistics is based on a threshold of changed data in a table. Usually, this threshold is 50% of changed (that is: added, updated or deleted) rows. For BW systems this can be suboptimal. For example, time dimension tables might receive new data, so that new values for, say, 0CALMONTH are suddenly present in the table but not yet reflected in the column histograms. Relying on the standard heuristic, the CBO won’t notice them, as no new statistics have been gathered. One way to overcome this would be to force new statistics on single tables.
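A minimal sketch of how forcing new statistics on a single table could look directly on the database – schema owner and table name here are hypothetical, and in an SAP system you would normally let BRCONNECT handle statistics collection:

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'SAPR3',                        -- hypothetical BW schema owner
    tabname          => '/BIC/DZSALEST',                -- hypothetical time dimension table
    estimate_percent => dbms_stats.auto_sample_size,
    cascade          => TRUE);                          -- refresh the index statistics as well
END;
/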

Another step to take would be to force the re-collection of statistics for tables that have very old (say older than a year) statistics.
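Candidates for that can be found via the LAST_ANALYZED timestamps in the data dictionary, for example like this (a sketch restricted to the BW standard naming scheme):

SELECT table_name, num_rows, last_analyzed
  FROM user_tables
 WHERE (table_name LIKE '/BIC/%' OR table_name LIKE '/BI0/%')
   AND last_analyzed < ADD_MONTHS(SYSDATE, -12)   -- statistics older than one year
 ORDER BY last_analyzed;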

Unfortunately, there is no such automatism built into BR*Tools or BW yet, but internal discussions on this have started!

Indexes present and usable

Indexes are critical for the optimizer. They speed up queries, they provide additional data access paths and sometimes even allow skipping the table access altogether.
But only if they are actually present on the database and usable.
For BW systems, where table partitioning is used by default, it’s important to have the indexes partitioned as well.
Otherwise partition-level operations like DROP PARTITION or TRUNCATE PARTITION will render the indexes on the table UNUSABLE or INVALID.
And they cannot be used for queries in this state.
The most common cases are described or at least referenced in note
#1513510 – 900 index in unusable state with SAP Basis 7.00/7.01/7.02
Although I initially wrote the note to cover a specific bug in an SAP Basis SP, it contains extensive explanations, a check SQL statement and links to other related SAP Notes.
Make sure to check it out and to get your indexes straight.
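The official check statement is in the note; a simplified sketch that lists unusable indexes and index partitions in the current schema could look like this:

SELECT index_name, partition_name, status
  FROM user_ind_partitions
 WHERE status = 'UNUSABLE'
UNION ALL
SELECT index_name, NULL, status
  FROM user_indexes
 WHERE status = 'UNUSABLE'
 ORDER BY 1, 2;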

Too many f-table partitions

F-fact tables can be seen as the data inbox of an InfoCube.
The request-wise partitioning allows quick and easy data removal in case the loaded data is not OK.
Usually this shouldn’t happen once your data mart has been properly set up and has stabilized.
Anyhow, keeping a good 30 partitions (e.g. a month’s worth of data) won’t do any harm.
But more than that makes the optimizer’s work much more difficult.
Also, nearly nobody using business reporting checks on specific data load requests.
Instead, queries are usually built on a time reference.
The E-fact table can be partitioned by time and that way the query performance can be improved by factors.

So, make sure to check blog post
Partition count contest!
and note #590370 – Too many uncompressed requests (f table partitions)
and start compressing your cubes today.

Is this it? Yep, kinda…

Are there other possible performance problems in the BW on Oracle world?
Sure there are. But they aren’t nearly as frequent as the ones that are solved for good by implementing the five steps discussed here.
As a little extra, now that you know about these five steps, you can plan and decide when to implement them – you don’t have to wait until you hit a serious performance issue and open a support message for it 🙂

That’s it for now. Have a great weekend!

Cheers, Lars

SAP Support case “FOR ALL ENTRIES disaster”

Last week a colleague from the WHM (data warehouse management) support team called me to get some assistance with a customer message he was processing.

The customer complained about short dumps with the infamous “TSV_TNEW_BLOCKS_NO_ROLL_MEMORY” error, which is basically the way an ABAP work process reports that it ran out of memory.

Checking the short dump

Inspecting the short dump a little further gave some hints on what happened:

[...]
|Error analysis
|    The internal table "\\PROGRAM=GP4MHJ3Z311GXR3HBZEIH3F768L\\DATA=IT_AZSDSOO0600"
|     could not be enlarged further.
|
|    Memory location: "Session memory"
|
|    You attempted to extend the data structure for the management of the
|    memory blocks for table "\\PROGRAM=GP4MHJ3Z311GXR3HBZEIH3F768L\\DATA=IT_AZSDSOO06
|    00". However, the 2696208 bytes required for
|    this were no longer available in the specified memory area.
|
|    The amount of memory requested is no longer available.
[...]

OK, apparently an attempt had been made to load a lot of data into an internal table (IT_AZSDSOO0600), but there hadn’t been enough free memory available for that. The error message mentions that approx. 2.5 MB had been requested, which is not really much, but it doesn’t tell us how much space was already allocated.

We also found information about what program caused this issue:

[...]
|Information on where terminated
|    Termination occurred in the ABAP program "GP4MHJ3Z311GXR3HBZEIH3F768L" - in
|     "EXPERT_ROUTINE".
|    The main program was "RSBATCH_EXECUTE_PROZESS ".
[...]

The BW-savvy folks will already have noticed that this is a generated ABAP program.
In SAP BW many activities that are specific to a certain InfoSource/Provider (e.g. loading data, transferring data…) are realized by generated ABAP programs.
These generated programs are usually based on a generic template and can often be extended with so-called expert routines to implement customer-specific requirements.
As we see in the error message above, this had been done here as well.
The short dump occurred right in the expert routine of the report.

This is also confirmed by the stack backtrace, i.e. the list of active ABAP calls:

[...]
|Active Calls/Events
-------------------------------------------------------------------------------------------------
|No.   Ty.          Program                             Include                             Line
|      Name
-------------------------------------------------------------------------------------------------
|    7 METHOD       GP4MHJ3Z311GXR3HBZEIH3F768L         GP4MHJ3Z311GXR3HBZEIH3F768L           557
|      LCL_TRANSFORM=>EXPERT_ROUTINE
|    6 METHOD       GP4MHJ3Z311GXR3HBZEIH3F768L         GP4MHJ3Z311GXR3HBZEIH3F768L           504
|      LCL_TRANSFORM=>EXECUTE
[...]

The dump also nicely includes a snippet from the ABAP source where the error occurred (line 557):

[...]
|Source Code Extract
---------------------------------------------------------------------------
|Line |SourceCde
---------------------------------------------------------------------------
|  545|  METHOD expert_routine.
|  546|*=== Segments ===
|  547|
|  548|    FIELD-SYMBOLS:
|  549|      <SOURCE_FIELDS>    TYPE _ty_s_SC_1.
|  550|
|  551|    DATA:
|  552|      RESULT_FIELDS      TYPE _ty_s_TG_1.
|  553|    ... "insert your code here
|  554|
|  555|    if not SOURCE_PACKAGE is initial.
|  556|
|>>>>>|      select DOC_NUMBER DOC_CATEG REASON_REJ /BIC/ZCPP_NO  GROSS_WGT
|  558|        BASE_UOM NET_VALUE ITEM_CATEG UNIT_OF_WT /BIC/ZCASE_DAT
|  559|      from /BIC/AZSDSOO0600
|  560|              into corresponding fields of table it_AZSDSOO0600
|  561|        for all entries in SOURCE_PACKAGE
|  562|               where /BIC/ZCPP_NO = SOURCE_PACKAGE-/BIC/ZCPP_NO .
[...]

A hot lead

There we have it: A simple SELECT statement should have loaded data from the active
data table of a PSA/ODS into the internal table it_AZSDSOO0600.
If you check and compare this with the information a few paragraphs above, you’ll find that this is the internal table for which not enough memory was available.

So, we have a pretty hot lead here!

The next thing to check is, of course, how many rows we expect to be returned from the source table /BIC/AZSDSOO0600.
The table itself isn’t that big: a quick look at the CBO statistics reveals that it contains 268.006 rows.
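Such a look at the statistics is quickly done, for example like this (table name taken from the short dump above):

SELECT num_rows, last_analyzed
  FROM user_tables
 WHERE table_name = '/BIC/AZSDSOO0600';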

Could this already be too much?

By reproducing the issue we had been able to get some more details about the size of the internal table before the dump occurs:

[...]
IT_AZSDSOO0600
Table IT_24841[51189146x104]
[...]

WOW!

For some reason, we managed to select 51.189.146 rows from a table that only contains 268.006 rows.
How is that possible?
Besides that, there is also a WHERE condition with a restriction in place, so we would have expected fewer rows.
Checking the WHERE condition is in fact key to the solution of this problem.

The customer used the FOR ALL ENTRIES clause in this case, which is an SQL feature specific to ABAP.
It allows you to fill an internal table, like SOURCE_PACKAGE in this case, with the values you want to select for and hand it over to the database without having to build an IN list yourself.
This is a nice convenience for the ABAP developer and a good thing for the database, since the database interface layer plays a clever trick with this statement.

Instead of just creating a possibly super large IN LIST, the db interface automatically loops over the internal table and creates many SQL statements with small IN LISTS, like this:

SELECT
[...]
FROM
"/BIC/AZSDSOO0600"
WHERE
"/BIC/ZCPP_NO" IN ( :A0 , :A1 , :A2 , :A3 , :A4 )

During the execution of the ABAP SQL statement the database interface grabs chunks of 5 values from the internal table (SOURCE_PACKAGE) and executes the same statement over and over again, until all values have been used.
Since the same statement is reused very often, the parsing effort on database level is minimized and due to the short IN LIST the memory consumption for the parsing is also kept very small.

As the documentation for the FOR ALL ENTRIES clause explains, the database interface also takes care of making the lines of the total result unique.

So far so good – but still we don’t know why there was a multiplication of result rows in our case.

Maybe checking the SOURCE_PACKAGE list gives additional insight:

[...]
|SOURCE_PACKAGE
|    Table IT_1638[954x1260]
[...]

Obviously there had been quite a number of IN LIST items to be used here.
Divided by our 5 IN LIST items per statement, we can approximate that the SELECT needs to be executed 954/5 -> 190 times (+ 1 time with the remaining 4 items).
Let’s do a quick cross-check and divide our unexpected super large result set by this number:
51.189.146 / 190 => 269.416

Hmm… pretty close to our total number of rows in the table!

At this point it looks like the whole table was read into the internal table with every new chunk of five IN LIST items.
In fact this is exactly what happened here!

To be able to de-duplicate the result set before returning it to the ABAP program, the database interface first has to temporarily store all returned rows. Since there wasn’t enough memory to do this, the de-duplication hadn’t happened yet when the short dump occurred.

But why did we read the whole table over and over in the first place?
This was due to the contents of the SOURCE_PACKAGE table.
In there, the field used to create the IN LIST items (/BIC/ZCPP_NO) was far from unique.
For the vast majority of rows, the field was just the initial value (” “).
Therefore, the IN LISTs created effectively looked like this:

1st chunk (IN LIST values 1 - 5)
---------------------------------
"/BIC/ZCPP_NO" IN ( " "  ,  " "  ,  " "  ,  " "  ,  " "  )
2nd chunk (IN LIST values 6 - 10)
---------------------------------
"/BIC/ZCPP_NO" IN ( " "  ,  " "  ,  " "  ,  " "  ,  " "  )
3rd chunk (IN LIST values 11 -15)
---------------------------------
"/BIC/ZCPP_NO" IN ( " "  ,  " "  ,  " "  ,  " "  ,  " "  )

For each single chunk, the repetition of an IN list entry does not change the result set at all. The duplicate conditions are simply ignored once a row matches the first entry of the IN list (remember: an IN list is just a shorthand for OR-combined equality conditions).

Checking the source data in this case proved it: most of the rows in the source table are selected by this condition, so running just one of these statements already produces a result of around 260.000 rows.
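That check itself is a one-liner on the database – assuming, as is usual on Oracle-based SAP systems, that the initial ABAP value is stored as a single blank:

SELECT COUNT(*)
  FROM "/BIC/AZSDSOO0600"
 WHERE "/BIC/ZCPP_NO" = ' ';   -- rows matching the initial value found in SOURCE_PACKAGE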

And there is our memory “eater”!

Due to the duplicate initial values in the field of the SOURCE_PACKAGE table, the database interface executed statements over and over again which effectively selected the whole table.
In addition to that it had to store all the single result sets into the internal table before removing the duplicates from it.

The solution to this problem is obvious and simple:
Before calling the SELECT … FOR ALL ENTRIES statement, the list of values to be fed into the FOR ALL ENTRIES statement should be made unique.

In this specific case, this would have made the list a lot shorter (down to ca. 20 entries).
This would not only have prevented the short dump, but also reduced the number of SQL executions on the database from 191 to 5 – and it would have saved the time and memory needed to get rid of the unwanted duplicate rows.

As usual, once you know and understand the dependencies, the effect is easy to explain – but at first sight I was pretty surprised myself by a SELECT statement that multiplies the result set.

 

Partition count contest!

Let’s have a contest!

One of my last blog posts started off a discussion (e.g. Bala posted his thought in a blog) about the importance of the partitioning feature for an SAP BW installation.

No doubt about it – partitioning is something we use heavily for SAP BW.
Breaking the work down into smaller pieces that are easier to handle is one of the great common problem-solving heuristics and it really works wonders in many cases.

Still, partitioning is something that needs to be done right and unfortunately also something that can be done wrong.

Beware, it moves!

What many DBAs and SAP Basis admins don’t seem to fully get is that partitioning is not a static thing. Of course, the handling of partitions is very much like dealing with rather static objects such as tables and indexes. For the latter, you usually set them up, use them and leave them alone for the rest of their lifetime. Sometimes a reorg may be required, but that’s pretty much it.

Partitioned objects, on the other hand, are usually way more ‘vivid’ (dynamic, volatile, changing… hard to find a good matching word for this).
These objects change with the data you store in them.
And this data changes over time.
So your partitioned table from yesterday will be a different one than the one of today.

In SAP BW we use partitioning for InfoCubes in two ways:
1. the F-fact tables are partitioned by request.
Every new request that gets loaded into the InfoCube is stored in its own partition.
That way, we can easily remove requests e.g. if the data is not correct or during compression/condensation.

2. the E-fact table CAN be partitioned by a time InfoObject.
With that, we can improve query and archiving performance when these actions are based on a time dimension InfoObject (which is most often the case).
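A quick way to check on the database which column an E-fact table is actually partitioned by is a look into Oracle’s data dictionary, e.g. along these lines (a sketch for the standard /BIC/ and /BI0/ naming scheme):

SELECT name AS table_name, column_name
  FROM user_part_key_columns
 WHERE object_type = 'TABLE'
   AND name LIKE '/BI_/E%';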

So far so good.

The problem now is that the first kind of partitioning happens fully automatically.
Whenever a new request is loaded into an InfoCube, the F-fact table gets a new partition and the data is stored in it.
What doesn’t happen fully automatically is the removal of these partitions.

To remove the partitions from the F-fact table the corresponding request (and all requests that have been loaded before that) needs to be compressed or better condensed into the E-fact table.
Basically, this operation does nothing else than add up the numbers from the F-fact table partition and the E-fact table, store the result in the E-fact table and then drop the partition from the F-fact table.

Of course, now you cannot remove the data anymore based on the loading request since it has been summed together with the other data in the E-fact table. On the other hand, now this addition doesn’t need to be performed at query runtime anymore and the database can use the partitioning scheme of the E-fact table for a more efficient execution plan.

Our performance is good – so what’s the hassle about?

Besides performance issues, having many partitions can lead to multiple problems:

  • usually, aggregate tables tend to have even more partitions than their basic cubes (for technical reasons), so there is a multiplication effect
  • DDL statements that are generated for the F-fact tables can become too large for export/import/migrations or reorganisations on DB level.
  • Index creation can become very slow for so many partitions, since all indexes on F-fact tables are also locally partitioned, again a multiplication factor.
  • during attribute change runs a high number of partitions can lead to crashes as explained in notes
    #1388570 – BW Change Run
    #903886 – Hierarchy and attribute change run
  • It may even happen that it’s no longer possible at all to perform change runs or compression of requests if there are too many partitions!

For all these reasons there’s a recommendation out for a long time now:

COMPRESS! COMPRESS! COMPRESS!

Note #590370 – Too many uncompressed requests (f table partitions)

I really don’t know how many support messages have already been closed by simply compressing the requests.
And because of that and because it’s so simple to figure out whether or not there are F-fact tables with too many partitions (usually not more than 20 – 30 are recommended) I decided to start a little competition here.

Just run the following little SELECT command on your BW database to get a list of F-fact tables that have more than 50 partitions:

select table_name, substr(table_name, 7) infocube_name, partition_count
from user_part_tables
where table_name like '/BI_/F%'
and partition_count >50
order by partition_count desc;

-----------------------------------------------
|TABLE_NAME     |INFOCUBE_NAME|PARTITION_COUNT|
-----------------------------------------------
|/BIC/FZ123456  |Z123456      |         8.279 |  <<< come on, beat this :-)
|/BIC/F123456784|123456784    |           999 |
|/BIC/FTPEDBIF5X|TPEDBIF5X    |           636 |
|/BI0/F0RKG_IC3 |0RKG_IC3     |           375 |
|/BIC/F100197   |100197       |           334 |
|/BIC/FRSTTREP01|RSTTREP01    |           281 |
|/BIC/FRS_C5    |RS_C5        |           253 |
|/BIC/F100184   |100184       |           238 |
|/BIC/F100183   |100183       |           238 |
[...]
-----------------------------------------------

(be aware that this statement obviously only works for InfoCube tables in the standard naming scheme /BIC/, /BI0/, /BI… – you can, of course, adapt it to your naming scheme).

If you like, just post your TOP partition count in the comments section – it would be interesting to see what extreme examples come up…

Although there’s no prize to win, you might at least become aware that there is something to keep an eye on in your BW database.

Short note on Line Item and High Cardinality dimensions

One of the quotes I like most is one I learned from Tom Kyte of Oracle.
It’s from Artemus Ward:

“It ain’t so much the things we don’t know that get us into trouble.
It’s the things we know that just ain’t so.”

This blog post is about one of those cases.

A customer complained about the performance of the index creation process in one of his process chains.

I had a look at the process’s messages and found this:

[...]
28.03.2011 06:10:55 SQL: 28.03.2011 06:10:55 ALE_BWP_010
28.03.2011 06:10:55  CREATE BITMAP INDEX
28.03.2011 06:10:55 "SAPBWP"."/BIC/ECUBENAME~040" ON "/BIC/ECUBENAME"
28.03.2011 06:10:55 ( "KEY_SDIMENS1" ) PCTFREE 10 INITRANS
28.03.2011 06:10:55    20 MAXTRANS 255 STORAGE ( INITIAL 16 K NEXT
28.03.2011 06:10:55 81920 K MINEXTENTS 1 MAXEXTENTS UNLIMITED
28.03.2011 06:10:55 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
28.03.2011 06:10:55 BUFFER_POOL DEFAULT ) LOCAL PARALLEL TABLESPACE
28.03.2011 06:10:55 "PSAPBWP" NOLOGGING COMPUTE STATISTIC
 --- > 6 hours !!!!????
28.03.2011 12:47:30 SQL-END: 28.03.2011 12:47:30 06:36:35
[...]

There was one index that actually took 6 hours and 36 minutes to create, although the table scan was done in parallel on Oracle level.

The other indexes on the same table only took some minutes (except one other that took something around 30 minutes).
So what could be the reason for that?

The next thing I looked up was the number of distinct values for the different dimension KEY columns (since these are the only ones getting indexes on fact tables):

OBJECT                      DIST        NUM_ROWS   (LF) BLK
Table: /BIC/ECUBENAME                   783908640   14192040
 
    KEY_SDIMENS1            445965129             <<<<<<
    KEY_SDIMENS2                 1529
    KEY_SDIMENS3                 3223
    KEY_SDIMENS4                  795
    KEY_SDIMENS5                  439
    KEY_SDIMENS6                28578
    KEY_SDIMENS7              1126484
    KEY_SDIMENS8                83675
    KEY_SDIMENS9                53296
    KEY_SDIMENSA              2465897
    KEY_SDIMENSB                  170
    KEY_SDIMENSC            194594206             <<<<<<
    KEY_SDIMENSD                    1
    KEY_SDIMENSP                    1
    KEY_SDIMENST                  631
    KEY_SDIMENSU                   43

And this was already the hint required to solve this mystery.

Bitmap indexes are extremely efficient during lookup and when used to combine selection criteria.
What they don’t handle very well are large numbers of distinct values (always relative to the total number of rows in the table, of course).
In this case, the dimension column for which the index creation took more than six hours (KEY_SDIMENS1) had more than half as many distinct values as there are rows in the table.
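If you want to check your own cubes for such columns, the column statistics in the data dictionary already contain the relevant numbers, e.g. like this (table name as in the – anonymized – example above; replace it with your own E-fact table):

SELECT column_name, num_distinct
  FROM user_tab_columns
 WHERE table_name = '/BIC/ECUBENAME'
   AND column_name LIKE 'KEY_%'
 ORDER BY num_distinct DESC;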

If you’re used to data modelling in BW, you know that it’s recommended not to have that many distinct values in a dimension, and that if this cannot be avoided, the dimension should be marked as a line item dimension.
This can be done in the Infocube modelling in transaction RSA1 by right-clicking on the dimension and opening the properties dialogue.

Flagging a dimension as a line item dimension changes the data model in that it removes the intermediate SID table that is normally put between the fact table and the actual dimension tables (for flexibility reasons) and joins the fact table directly with the dimension table.

It changes this :

[DIMENSION-TAB1] >-----<  (SID-TAB1) >---< [[FACT TABLE]] >---<  (SID-TAB2) >-----< [DIMENSION-TAB2]

to this:

 [DIMENSION-TAB1] >-----< [[FACT TABLE]]  >---<  (SID-TAB2) >-----< [DIMENSION-TAB2]

Besides this change in the data model the index for the line item dimension should not be a BITMAP index anymore, but a B*TREE index instead. Of course, the SAP BW developers know about this and provided a way to avoid the creation of BITMAP indexes on such dimension key columns.

The fallacy the customer and I fell victim to was the belief that flagging a dimension as line item automatically includes the setting for the B*Tree index. After all, that would totally make sense, wouldn’t it?

Unfortunately, but correctly, the BW developers separated these two functions and provided two flags for infocube dimensions that can be set independently:

[ x ] Line Item
and
[ x ] High Cardinality

Only setting the High Cardinality flag leads to the creation of B*Tree indexes!
Fortunately, this can be done anytime, even when the infocube is already in use and filled with data.

This is of course documented (here), but as so often, things get mixed up in memory and we end up believing the wrong stuff.

Before I forget to mention it: as far as I know the High Cardinality flag really only does something on Oracle-based BW systems as the Bitmap Indexes are exclusively available on Oracle.
Maybe there are similar features on other platforms, but I’m not aware of them, so I leave them out for now (and avoid believing the wrong stuff once more…).

So, make sure to check your line item dimensions to avoid long indexing runs.

Best regards,

Lars

Question to SAP Support “Why is partition pruning not working in my BW on Oracle?”

Welcome to the problem

As usual, understanding the meaning and the scope of the question is key to answering it – in this case a question recently filed by a customer.

So, what is the customer complaining about? What does he mean by “partition pruning“?

It’s the ability of the query optimizer to figure out which partitions of a table or an index can safely be left aside when running the query.
This reduces work and thereby makes queries faster.

Easy example of this

Think of a table containing sales information together with the date of the recorded sales transactions.
Now this table is partitioned by the month (say, column “0CALMONTH”) and you want to see the sales data of the first quarter of 2011.
You would probably come up with a SQL statement like this:

SELECT "0CALMONTH" as MONTH, SUM("0AMOUNT") as SUM_AMOUNT
FROM SALES
WHERE "0CALMONTH" BETWEEN 201101 AND 201103
GROUP BY "0CALMONTH";
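(Just for illustration and not taken from a real BW system: such a SALES table could be range-partitioned by month roughly like this; names and partition boundaries are made up.)

CREATE TABLE sales (
  "0CALMONTH" NUMBER(6),
  "0AMOUNT"   NUMBER(17,2)
)
PARTITION BY RANGE ("0CALMONTH") (
  PARTITION p201012 VALUES LESS THAN (201101),
  PARTITION p201101 VALUES LESS THAN (201102),
  PARTITION p201102 VALUES LESS THAN (201103),
  PARTITION p201103 VALUES LESS THAN (201104),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);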

With the information given, the optimizer knows that all relevant data can be found in the partitions that contain the data for 201101, 201102 and 201103 – all other partitions can be ignored.

Nice, isn’t it?

So, how can we know that such stuff is happening in our database?
The explain plan contains this information:

SQL> explain plan for
  2  SELECT "0CALMONTH" as MONTH, SUM("0AMOUNT") as SUM_AMOUNT
  3  FROM SALES
  4  WHERE   "0CALMONTH"  between 201101 and 201103
  5  GROUP BY "0CALMONTH";
Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
Plan hash value: 3524420984
--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    11 |     4  (25)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    11 |     4  (25)| 00:00:01 |     8 |    10 |
|   2 |   HASH GROUP BY          |       |     1 |    11 |     4  (25)| 00:00:01 |       |       |
|*  3 |    TABLE ACCESS FULL     | SALES |  3056 | 33616 |     3   (0)| 00:00:01 |     8 |    10 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------
 3 - filter("0CALMONTH"<=201103)

15 rows selected.

 

The Pstart/Pstop columns tell us precisely which partitions are used for the single steps – and we see that only 3 of them (8, 9, 10) are actually touched.

Back to the customer

Ok, so my customer was telling me that exactly THIS was NOT happening on his system.
Since we have been using partitioning in BW on Oracle very successfully for a long time, I asked for a reproduction example, which the customer swiftly provided.
Based on what the customer told me I should run a BW query in RSRT and activate the checkbox to display the execution plans for the SQL statements (interestingly this one is called “Display Runschedule” in BW 7.0 onwards…).

A first mistake (or “to know your checkboxes and save everybody’s time”)

When I first ran the query, this was the SQL against the E-fact table which I assumed to be the problematic one (I removed some uninteresting stuff to make reading a bit easier):

====> QUERY 1 

SELECT "DT"."SID_0FISCVARNT"      AS "S____034",
 "DT"."SID_0CALDAY"         AS "S____003",
 "D1"."SID_0PLANT"          AS "S____066",
 "X1"."S__0MATL_GROUP"      AS "S____170",
 "DU"."SID_0LOC_CURRCY"     AS "S____239",
 SUM ("F"."ISSVS_VAL")      AS "Z____565",
 SUM ("F"."RECVS_VAL")      AS "Z____571",
 COUNT(*)                   AS "Z____149",
 [ ... ]
 FROM   "/BIC/EZICSTKBAL" "F"                     // <<-- our partitioned E-fact table
 [ ... ]
 WHERE  (( ((     "DT"."SID_0CALDAY" = 20100101
 AND "DT"."SID_0CALMONTH" = 201001
 AND "F"."SID_0CALMONTH" = 201001 )) // <<-- a selection criteria on the partitioning column
 AND (( "DP"."SID_0CHNGID" = 0 ))
 AND (( "DT"."SID_0FISCVARNT" = 14 ))
 AND (( "X1"."S__0MATL_GROUP" = 28 ))
 AND (( "D1"."SID_0PLANT" = 14 ))
 AND (( "DP"."SID_0RECORDTP" = 0 ))     // <<-- we select a certain kind of data here
 AND (( "DP"."SID_0REQUID" <= 72505 )) ))
 AND "X1"."OBJVERS" = 'A'

GROUP  BY "DT"."SID_0FISCVARNT",
 "DT"."SID_0CALDAY",
 "D1"."SID_0PLANT",
 "X1"."S__0MATL_GROUP",
 "DU"."SID_0LOC_CURRCY"

Execution Plan

---------------------------------------------------------------------------------------------------------------------------------
 | Id  |Operation                                  | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
 ---------------------------------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                           |                    |     1 |   166 |    15   (7)| 00:00:01 |       |       |
 |   1 |  HASH GROUP BY                             |                    |     1 |   166 |    15   (7)| 00:00:01 |       |       |
 |*  2 |   TABLE ACCESS BY INDEX ROWID              | /BIC/DZICSTKBALP   |     1 |     9 |     1   (0)| 00:00:01 |       |       |
 |   3 |    NESTED LOOPS                            |                    |     1 |   166 |    14   (0)| 00:00:01 |       |       |
 |   4 |     NESTED LOOPS                           |                    |     1 |   157 |    13   (0)| 00:00:01 |       |       |
 |   5 |      NESTED LOOPS                          |                    |     1 |   151 |    12   (0)| 00:00:01 |       |       |
 |   6 |       NESTED LOOPS                         |                    |     1 |   141 |    11   (0)| 00:00:01 |       |       |
 |   7 |        NESTED LOOPS                        |                    |     1 |   132 |    10   (0)| 00:00:01 |       |       |
 |   8 |         NESTED LOOPS                       |                    |     1 |   125 |     9   (0)| 00:00:01 |       |       |
 |   9 |          TABLE ACCESS BY INDEX ROWID       | /BIC/DZICSTKBALT   |     1 |    18 |     0   (0)| 00:00:01 |       |       |
 |* 10 |           INDEX RANGE SCAN                 | /BIC/DZICSTKBALT01 |     1 |       |     0   (0)| 00:00:01 |       |       |
 |  11 |          PARTITION RANGE SINGLE            |                    |    54 |  6156 |     9   (0)| 00:00:01 |    54 |    54 |
 |* 12 |           TABLE ACCESS BY LOCAL INDEX ROWID| /BIC/EZICSTKBAL    |    54 |  6156 |     9   (0)| 00:00:01 |    54 |    54 |
 |  13 |            BITMAP CONVERSION TO ROWIDS     |                    |       |       |            |          |       |       |
 |* 14 |             BITMAP INDEX SINGLE VALUE      | /BIC/EZICSTKBAL~02 |       |       |            |          |    54 |    54 |
 |* 15 |         TABLE ACCESS BY INDEX ROWID        | /BIC/DZICSTKBAL1   |     1 |     7 |     1   (0)| 00:00:01 |       |       |
 |* 16 |          INDEX UNIQUE SCAN                 | /BIC/DZICSTKBAL1~0 |     1 |       |     0   (0)| 00:00:01 |       |       |
 |  17 |        TABLE ACCESS BY INDEX ROWID         | /BIC/DZICSTKBAL2   |     1 |   141 |     1   (0)| 00:00:01 |       |       |
 |* 18 |         INDEX UNIQUE SCAN                  | /BIC/DZICSTKBAL2~0 |     1 |       |     0   (0)| 00:00:01 |       |       |
 |* 19 |       TABLE ACCESS BY INDEX ROWID          | /BI0/XMATERIAL     |     1 |    10 |     1   (0)| 00:00:01 |       |       |
 |* 20 |        INDEX UNIQUE SCAN                   | /BI0/XMATERIAL~0   |     1 |       |     0   (0)| 00:00:01 |       |       |
 |  21 |      TABLE ACCESS BY INDEX ROWID           | /BIC/DZICSTKBALU   |     1 |     6 |     1   (0)| 00:00:01 |       |       |
 |* 22 |       INDEX UNIQUE SCAN                    | /BIC/DZICSTKBALU~0 |     1 |       |     0   (0)| 00:00:01 |       |       |
 |* 23 |     INDEX RANGE SCAN                       | /BIC/DZICSTKBALP02 |     1 |       |     0   (0)| 00:00:01 |       |       |
 ---------------------------------------------------------------------------------------------------------------------------------

As I saw (and as you see now) partition pruning was fully operational in this query (check the PARTITION RANGE SINGLE line and the Pstart/Pstop columns!) – so what was the customer talking about?

Well, the tricky part of a BW query started via RSRT is that the OLAP layer runs multiple SQL statements in parallel to reduce the total runtime of the query.
That was the reason why the customer had initially told me to check the “No parallel processing” box in the Execute+Debug dialogue of RSRT.
A hint that I simply overlooked at first.

Et voilà, the problem!

Once I re-ran the query with parallel execution disabled, I was able to see the execution plan the customer was complaining about:

===> QUERY 2

SELECT "DT"."SID_0FISCVARNT"      AS "S____034",
 "DP"."SID_0RECORDTP"       AS "S____011",
 "DT"."SID_0CALDAY"         AS "S____003",
 "D1"."SID_0PLANT"          AS "S____066",
 "X2"."S__0MATL_GROUP"      AS "S____170",
 "DU"."SID_0LOC_CURRCY"     AS "S____239",
 "DU"."SID_Z_UOM_EA"        AS "S____157",
 "DU"."SID_Z_UOM_FT"        AS "S____158",
 "DU"."SID_Z_UOM_LB"        AS "S____159",
 'ZICSTKBAL'                AS "K____002",
 188                        AS "S____002",
 SUM ("F"."ISSVS_VAL")      AS "Z____565",
 [ ... ]

FROM   "/BIC/EZICSTKBAL" "F"
 [ ... ]

WHERE  (( (( "DT"."SID_0FISCVARNT" = 14 ))          // <<-- no selection criteria on the partitioning column anymore
 AND (( "X2"."S__0MATL_GROUP" = 28 ))
 AND (( "D1"."SID_0PLANT" = 14 ))
 AND (( "DP"."SID_0REQUID" <= 72505 )) ))
 AND (((( "DP"."SID_0RECORDTP" = 1            // <<-- we also select some other type of data here...
 OR "DP"."SID_0REQUID" > 0 ))))
 AND "X2"."OBJVERS" = 'A'
 GROUP  BY "DT"."SID_0FISCVARNT",
 "DP"."SID_0RECORDTP",
 "DT"."SID_0CALDAY",
 "D1"."SID_0PLANT",
 "X2"."S__0MATL_GROUP",
 "DU"."SID_0LOC_CURRCY",
 "DU"."SID_Z_UOM_EA",
 "DU"."SID_Z_UOM_FT",
 "DU"."SID_Z_UOM_LB"

Execution Plan
 -----------------------------------------------------------------------------------------------------------------------------------------
 | Id  | Operation                                 | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
 -----------------------------------------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                          |                             |   440 | 84040 |   128K  (1)| 00:25:39 |       |       |

[ ... ]

|* 21 |        HASH JOIN                          |                             |   440 | 64680 |   127K  (1)| 00:25:28 |       |       |
 |* 22 |         TABLE ACCESS FULL                 | /BIC/DZICSTKBALT            |  3309 | 59562 |     7   (0)| 00:00:01 |       |       |
 |  23 |         PARTITION RANGE ALL               |                             | 48550 |  6116K|   127K  (1)| 00:25:28 |     1 |   90  |
 |  24 |          TABLE ACCESS BY LOCAL INDEX ROWID| /BIC/EZICSTKBAL             | 48550 |  6116K|   127K  (1)| 00:25:28 |     1 |   90  |
 |  25 |           BITMAP CONVERSION TO ROWIDS     |                             |       |       |            |          |       |       |
 |  26 |            BITMAP AND                     |                             |       |       |            |          |       |       |
 |  27 |             BITMAP MERGE                  |                             |       |       |            |          |       |       |
 |  28 |              BITMAP KEY ITERATION         |                             |       |       |            |          |       |       |
 |  29 |               BUFFER SORT                 |                             |       |       |            |          |       |       |
 |* 30 |                VIEW                       | index$_join$_025            |     3 |    21 |     3  (34)| 00:00:01 |       |       |
 |* 31 |                 HASH JOIN                 |                             |       |       |            |          |       |       |
 |* 32 |                  INDEX RANGE SCAN         | /BIC/DZICSTKBAL101          |     3 |    21 |     2  (50)| 00:00:01 |       |       |
 |  33 |                  INDEX FAST FULL SCAN     | /BIC/DZICSTKBAL1~0          |     3 |    21 |     1   (0)| 00:00:01 |       |       |
 |* 34 |               BITMAP INDEX RANGE SCAN     | /BIC/EZICSTKBAL~04          |       |       |            |          |     1 |   90  |
 |  35 |             BITMAP MERGE                  |                             |       |       |            |          |       |       |
 |  36 |              BITMAP KEY ITERATION         |                             |       |       |            |          |       |       |
 |  37 |               BUFFER SORT                 |                             |       |       |            |          |       |       |
 |* 38 |                TABLE ACCESS BY INDEX ROWID| /BIC/DZICSTKBALP            |     1 |     8 |     3   (0)| 00:00:01 |       |       |
 |* 39 |                 INDEX RANGE SCAN          | /BIC/DZICSTKBALP03          |     2 |       |     1   (0)| 00:00:01 |       |       |
 |* 40 |               BITMAP INDEX RANGE SCAN     | /BIC/EZICSTKBAL~01          |       |       |            |          |     1 |   90  |
 |  41 |             BITMAP MERGE                  |                             |       |       |            |          |       |       |
 |  42 |              BITMAP KEY ITERATION         |                             |       |       |            |          |       |       |
 |  43 |               BUFFER SORT                 |                             |       |       |            |          |       |       |
 |  44 |                TABLE ACCESS FULL          | SYS_TEMP_0FD9DDDB8_15EC3B48 |     1 |    13 |     2   (0)| 00:00:01 |       |       |
 |* 45 |               BITMAP INDEX RANGE SCAN     | /BIC/EZICSTKBAL~05          |       |       |            |          |     1 |   90  |
 |  46 |     TABLE ACCESS FULL                     | SYS_TEMP_0FD9DDDB8_15EC3B48 | 10839 | 86712 |     7   (0)| 00:00:01 |       |       |
 -----------------------------------------------------------------------------------------------------------------------------------------

Here we can see what the customer was complaining about: Oracle decides to read through ALL partitions, although we’ve seen that the BW query clearly contained the information about the time dimension.

So why is this happening?

Oracle does nothing wrong here!
If you look closely, then you see that the second statement didn’t contain a selection criterion on the partitioning column.

Do we have a bug in the BW OLAP code here?

Actually not!
If these were BW queries on standard InfoCubes, we would have a problem here.
Especially since we have TWO queries against the SAME table!!!

Fact is: this is not a query on a standard infocube.
This is a query against a so-called inventory or non-cumulative cube.

Inventory Cubes?

Inventory InfoCubes are a special data design implemented in SAP BW that allows analyzing stock information. Since you cannot just add up the amount of stock over time, such information requires special handling by the OLAP processor.

So, what is done in this case is to store reference stock information (“markers”) and in addition to these the relative stock movements. That way OLAP can take the marker numbers and calculate backwards the amount of stock that was present for the given reference point in time.

This is of course far too short as an explanation, so make sure to check out SAP Note #1548125 for a far better one.

The consequence of this data design is that the OLAP engine needs to read the data in two steps, and indeed it does.
That’s why we see four SQL queries in this case (instead of the usual two SELECT statements against the F- and E-fact table):

1. Read of movement data from F- and E-fact table

and

2. Read of “marker” data from F- and E-fact table

Due to the nature of the “marker” data, OLAP needs to fetch ALL of them, regardless of the time restriction.
Therefore, no WHERE condition on the time dimension is created in the second kind of statement.

Since the E-fact table, in this case, was partitioned by the time dimension, partition pruning would require having such a restricting WHERE condition.
But as this is missing, no partition pruning can be done  – which is correct in this case.

Let’s have a closer look at the statements and the execution plans.
Review query 1 above. This is one of the queries issued by OLAP for gathering all the movement data.

Please note the restrictions on the time dimension, especially the explicit condition on “F”.”SID_0CALMONTH”, which allows the partition pruning to take place.
Also note that OLAP is reading the movement data here, denoted by the condition “DP”.”SID_0RECORDTP” = 0.

As we can see in the execution plan above, Oracle uses the restriction on the partition key and only reads a single partition.

Now, let’s check the query 2 again.

With this query, OLAP reads the “marker” data (or “reference points”) and we can see this by the predicate “DP”.”SID_0RECORDTP” = 1.

The important thing to see here is that there aren’t any restrictions on the time dimension left (except the fiscal year variant, for rather formal reasons). So, there is no way for Oracle to skip any partition of the E-fact table.
 

Room for improvement!?

Ok, so far we’ve already learned a lot about the way OLAP gets data from the database when dealing with NCUM cubes.
But, as you may have already guessed, there’s still more to it.

The marker information always represents the most current state of the data. No movement is recorded after the marker. So basically, if a query should return the stock amount for a day in the past, OLAP takes the marker and calculates backwards.
Another way to put this is: there cannot be any information more recent or with a ‘higher‘ value in the time dimension than the marker. And in fact, marker information is always stored with the maximum/infinite date 31.12.9999.

Now, since the E-fact table can only be partitioned by the time dimension, it’s assured that the marker data will always end up being stored in the MAXVALUE partition.

So, in theory, it would be possible to create a statement that delivers the very same data to OLAP and still allows for the partition pruning to take place. For that it would ‘just’ be necessary to add the somewhat artificial selection predicate “F”.”SID_0CALMONTH” = 99991231 to the query.
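Sketched on query 2 from above, that idea would look roughly like this – to be clear, this is not what the OLAP layer generates today, just an illustration of the additional predicate:

WHERE  "DT"."SID_0FISCVARNT" = 14
  AND  "X2"."S__0MATL_GROUP" = 28
  AND  "D1"."SID_0PLANT" = 14
  AND  "DP"."SID_0REQUID" <= 72505
  AND  ( "DP"."SID_0RECORDTP" = 1 OR "DP"."SID_0REQUID" > 0 )
  AND  "F"."SID_0CALMONTH" = 99991231    -- the additional, artificial predicate on the partitioning column
  AND  "X2"."OBJVERS" = 'A'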

I write “in theory” because this feature is not present in the OLAP-DBIF layer right now. Of course, we’re discussing this internally, but whether or not this is going to be realized is completely open (for many good and valid reasons).

Once again, a rather lengthy blog post about some rather not so simple topics, but I still hope that you’ve enjoyed reading it and that you’ve learned a bit or byte.

 

BW/DB Support 101 – Get to know your failing SQL statement

Lately, I’ve been confronted with several customer messages concerning problems with Infocube compression, change runs or roll-ups.
For some reason, these standard BW actions, which had worked the day before, suddenly started to fail.

The first step in cases like these is, of course, to find out why the actions fail.
The usual starting point for the investigation, besides the application messages or logs, is the system log (SM21).

In most cases I found messages like the following in the system log:

Time     Type Nr  Clt User       TCode   Priority Grp N Text
[...]
21:52:46 BTC  020 001 ALEREMOTE                   BY  2 Database error 12801 at EXE
21:52:46 BTC  020 001 ALEREMOTE                   BY  0 > ORA-12801: error signaled in parallel query server
21:52:46 BTC  020 001 ALEREMOTE                   BY  0 > P000#ORA-01438: value larger than specified precision allowe
21:52:46 BTC  020 001 ALEREMOTE                   BY  0 > for this column
21:52:46 BTC  020 001 ALEREMOTE                   BY  1 Database error 12801
[...]

There was a database problem for batch work process no. 20 shortly before 10 at night. And we can already see what kind of problem it was – but we have to look closely to notice that there are in fact two error messages present here:
ORA-12801: error signaled in parallel query server P000
ORA-01438: value larger than specified precision allowed for this column

The first line tells us that there was a problem during a parallel query which led to the abort of one of the parallel query server processes, P000 in this case.
The second line gives us more information about the kind of problem: an attempt was made to insert or update a table column with a value that exceeds the data type definition of this column.
So this is the root cause of the failing compression run.
There was some kind of overflow.
But in which table and for which column?

For that to know, we must dig deeper and take a look at the work process developer trace for work process 20.
In transaction ST11 we open the file dev_w20 and look out for our error by searching for the timestamp “21:52:46”.
Sure enough, we find this in there:

C Wed Feb  2 21:52:46 2011
C     OCIStmtExecute() failed with -1=OCI_ERROR
C     SQL error 12801:
C  *** ERROR => Error 12801 in stmt_execute() from oci_execute_stmt(), orpc=0
 [dbsloci.c    13103]
C  *** ERROR => ORA-12801 occurred when executing SQL stmt (parse error offset=0)
 [dbsloci.c    13105]
C  sc_p=0x10751b950,no=214,idc_p=(nil),con=0,act=1,slen=2245,smax=2304,#vars=0,stmt=0x1094c2cb0,table=
C  MERGE/*+ USE_NL ( FACT E ) INDEX( E,"/BIC/E100481~P" ) */ INTO "/BIC/E100481" E USING ( SELECT  /*+ 
C  PARALLEL ( FACT , 3 ) */0 "PDIMID" , "KEY_100481T" , "KEY_1004811" , "KEY_1004812" , "KEY_1004813" ,
C   "KEY_1004814" , "KEY_1004815" , "KEY_1004816" , "KEY_1004817" , "KEY_1004818" , "FACTCOUNT" , "TCTD
C  BSEL" , "TCTDBTRANS" , "TCTQUCOUNT" , "TCTTIMEALL" , "TCTTIMEDM" , "TCTTIMEFE" , "TCTTIMENA" , "TCTT
C  MEOLAP" , "TCTTMEPLAN" , "TCTWTCOUNT" FROM "/BIC/F100481" FACT WHERE "KEY_100481P" =1476 ) F ON ( E.
C  "KEY_100481P" = "PDIMID" AND E."KEY_100481T" = F."KEY_100481T" AND   E."KEY_1004811" = F."KEY_100481
C  1" AND   E."KEY_1004812" = F."KEY_1004812" AND   E."KEY_1004813" = F."KEY_1004813" AND   E."KEY_1004
C  814" = F."KEY_1004814" AND   E."KEY_1004815" = F."KEY_1004815" AND   E."KEY_1004816" = F."KEY_100481
C  6" AND   E."KEY_1004817" = F."KEY_1004817" AND   E."KEY_1004818" = F."KEY_1004818" ) WHEN NOT MATCHE
C  D THEN INSERT ( E."KEY_100481P" , E."KEY_100481T" , E."KEY_1004811" , E."KEY_1004812" , E."KEY_10048
C  13" , E."KEY_1004814" , E."KEY_1004815" , E."KEY_1004816" , E."KEY_1004817" , E."KEY_1004818" , E."F
C  ACTCOUNT" , E."TCTDBSEL" , E."TCTDBTRANS" , E."TCTQUCOUNT" , E."TCTTIMEALL" , E."TCTTIMEDM" , E."TCT
C  TIMEFE" , E."TCTTIMENA" , E."TCTTMEOLAP" , E."TCTTMEPLAN" , E."TCTWTCOUNT" ) VALUES ( "PDIMID" , F."
C  KEY_100481T" , F."KEY_1004811" , F."KEY_1004812" , F."KEY_1004813" , F."KEY_1004814" , F."KEY_100481
C  5" , F."KEY_1004816" , F."KEY_1004817" , F."KEY_1004818" , F."FACTCOUNT" , F."TCTDBSEL" , F."TCTDBTR
C  ANS" , F."TCTQUCOUNT" , F."TCTTIMEALL" , F."TCTTIMEDM" , F."TCTTIMEFE" , F."TCTTIMENA" , F."TCTTMEOL
C  AP" , F."TCTTMEPLAN" , F."TCTWTCOUNT" ) WHEN MATCHED THEN UPDATE /*+ INDEX("/BIC/E100481" "/BIC/E100
C  481~P") */ SET E."FACTCOUNT" = E."FACTCOUNT" + F."FACTCOUNT", E."TCTDBSEL" = E."TCTDBSEL" + F."TCTDB
C  SEL", E."TCTDBTRANS" = E."TCTDBTRANS" + F."TCTDBTRANS", E."TCTQUCOUNT" = E."TCTQUCOUNT" + F."TCTQUCO
C  UNT", E."TCTTIMEALL" = E."TCTTIMEALL" + F."TCTTIMEALL", E."TCTTIMEDM" = E."TCTTIMEDM" + F."TCTTIMEDM
C  ", E."TCTTIMEFE" = E."TCTTIMEFE" + F."TCTTIMEFE", E."TCTTIMENA" = E."TCTTIMENA" + F."TCTTIMENA", E."
C  TCTTMEOLAP" = E."TCTTMEOLAP" + F."TCTTMEOLAP", E."TCTTMEPLAN" = E."TCTTMEPLAN" + F."TCTTMEPLAN", E."
C  TCTWTCOUNT" = E."TCTWTCOUNT" + F."TCTWTCOUNT";
C  sc_p=0x10751b950,no=214,idc_p=(nil),con=0,act=1,slen=2245,smax=2304,#vars=0,stmt=0x1094c2cb0,table=
C  prep=0,lit=0,nsql=1,lobret=0,#exec=1,dbcnt=0,upsh_p=(nil),ocistmth_p=0x1082889b0
C  IN : cols=0,rmax=1,xcnt=1,rpc=0,rowi=0,rtot=1,upto=4294967295,rsize=0,vmax=32,bound=0,iobuf_p=(nil),vda_p=0x108289000
C       lobs=0,lmax=0,lpcnt=0,larr=(nil),lcurr_p=(nil),rret=0
C  OUT: cols=4,rmax=16384,xcnt=0,rpc=0,rowi=0,rtot=1,upto=0,rsize=16,vmax=32,bound=0,iobuf_p=(nil),vda_p=0x108289790
C       lobs=0,lmax=0,lpcnt=0,larr=(nil),lcurr_p=(nil),rret=0
C  MERGE/*+ USE_NL ( FACT E ) INDEX( E,"/BIC/E100481~P" ) */ INTO "/BIC/E100481" E USING ( SELECT  /*+ 
C  PARALLEL ( FACT , 3 ) */0 "PDIMID" , "KEY_100481T" , "KEY_1004811" , "KEY_1004812" , "KEY_1004813" ,
C   "KEY_1004814" , "KEY_1004815" , "KEY_1004816" , "KEY_1004817" , "KEY_1004818" , "FACTCOUNT" , "TCTD
C  BSEL" , "TCTDBTRANS" , "TCTQUCOUNT" , "TCTTIMEALL" , "TCTTIMEDM" , "TCTTIMEFE" , "TCTTIMENA" , "TCTT
C  MEOLAP" , "TCTTMEPLAN" , "TCTWTCOUNT" FROM "/BIC/F100481" FACT WHERE "KEY_100481P" =1476 ) F ON ( E.
C  "KEY_100481P" = "PDIMID" AND E."KEY_100481T" = F."KEY_100481T" AND   E."KEY_1004811" = F."KEY_100481
C  1" AND   E."KEY_1004812" = F."KEY_1004812" AND   E."KEY_1004813" = F."KEY_1004813" AND   E."KEY_1004
C  814" = F."KEY_1004814" AND   E."KEY_1004815" = F."KEY_1004815" AND   E."KEY_1004816" = F."KEY_100481
C  6" AND   E."KEY_1004817" = F."KEY_1004817" AND   E."KEY_1004818" = F."KEY_1004818" ) WHEN NOT MATCHE
C  D THEN INSERT ( E."KEY_100481P" , E."KEY_100481T" , E."KEY_1004811" , E."KEY_1004812" , E."KEY_10048
C  13" , E."KEY_1004814" , E."KEY_1004815" , E."KEY_1004816" , E."KEY_1004817" , E."KEY_1004818" , E."F
C  ACTCOUNT" , E."TCTDBSEL" , E."TCTDBTRANS" , E."TCTQUCOUNT" , E."TCTTIMEALL" , E."TCTTIMEDM" , E."TCT
C  TIMEFE" , E."TCTTIMENA" , E."TCTTMEOLAP" , E."TCTTMEPLAN" , E."TCTWTCOUNT" ) VALUES ( "PDIMID" , F."
C  KEY_100481T" , F."KEY_1004811" , F."KEY_1004812" , F."KEY_1004813" , F."KEY_1004814" , F."KEY_100481
C  5" , F."KEY_1004816" , F."KEY_1004817" , F."KEY_1004818" , F."FACTCOUNT" , F."TCTDBSEL" , F."TCTDBTR
C  ANS" , F."TCTQUCOUNT" , F."TCTTIMEALL" , F."TCTTIMEDM" , F."TCTTIMEFE" , F."TCTTIMENA" , F."TCTTMEOL
C  AP" , F."TCTTMEPLAN" , F."TCTWTCOUNT" ) WHEN MATCHED THEN UPDATE /*+ INDEX("/BIC/E100481" "/BIC/E100
C  481~P") */ SET E."FACTCOUNT" = E."FACTCOUNT" + F."FACTCOUNT", E."TCTDBSEL" = E."TCTDBSEL" + F."TCTDB
C  SEL", E."TCTDBTRANS" = E."TCTDBTRANS" + F."TCTDBTRANS", E."TCTQUCOUNT" = E."TCTQUCOUNT" + F."TCTQUCO
C  UNT", E."TCTTIMEALL" = E."TCTTIMEALL" + F."TCTTIMEALL", E."TCTTIMEDM" = E."TCTTIMEDM" + F."TCTTIMEDM
C  ", E."TCTTIMEFE" = E."TCTTIMEFE" + F."TCTTIMEFE", E."TCTTIMENA" = E."TCTTIMENA" + F."TCTTIMENA", E."
C  TCTTMEOLAP" = E."TCTTMEOLAP" + F."TCTTMEOLAP", E."TCTTMEPLAN" = E."TCTTMEPLAN" + F."TCTTMEPLAN", E."
C  TCTWTCOUNT" = E."TCTWTCOUNT" + F."TCTWTCOUNT";
B  ***LOG BY2=> sql error 12801      performing EXE        [dbds#2 @ 810] [dbds    0810 ]
B  ***LOG BY0=> ORA-12801: error signaled in parallel query server P000
ORA-01438: value larger than specified precision allowed for this column [dbds#2 @ 810] [dbds    0810 ]
B  RECONNECT: added code: 3123
B  RECONNECT: added code: 3127
B  ***LOG BY1=> sql error 12801      [dbacds#2 @ 1433] [dbacds  1433 ]

The problem with an output like this is: it’s completely unreadable for any human being.
Thus, we need to do some re-formatting to get the pure single SQL statement.
In this case, the statement starts with line

C  MERGE/*+ USE_NL ( FACT E ) INDEX( E,"/BIC/E100481~P" ) */ INTO "/BIC/E100481" E USING ( SELECT  /*+ 

and ends with

C  TCTWTCOUNT" = E."TCTWTCOUNT" + F."TCTWTCOUNT";

Everything else does not belong to the statement and should be removed.
The next step is to remove the three characters at the start of each line.
For that, I use the block-selection capability that some text editor programs offer.
I use the free Notepad++ for that.

With it, you can just hold down the ALT key while selecting text to use the block-selection.
Just select the first three characters of all rows and delete them (backspace key).

By now the statement should look like this:

MERGE/*+ USE_NL ( FACT E ) INDEX( E,"/BIC/E100481~P" ) */ INTO "/BIC/E100481" E USING ( SELECT  /*+ 
PARALLEL ( FACT , 3 ) */0 "PDIMID" , "KEY_100481T" , "KEY_1004811" , "KEY_1004812" , "KEY_1004813" ,
 "KEY_1004814" , "KEY_1004815" , "KEY_1004816" , "KEY_1004817" , "KEY_1004818" , "FACTCOUNT" , "TCTD
BSEL" , "TCTDBTRANS" , "TCTQUCOUNT" , "TCTTIMEALL" , "TCTTIMEDM" , "TCTTIMEFE" , "TCTTIMENA" , "TCTT
MEOLAP" , "TCTTMEPLAN" , "TCTWTCOUNT" FROM "/BIC/F100481" FACT WHERE "KEY_100481P" =1476 ) F ON ( E.
"KEY_100481P" = "PDIMID" AND E."KEY_100481T" = F."KEY_100481T" AND   E."KEY_1004811" = F."KEY_100481
1" AND   E."KEY_1004812" = F."KEY_1004812" AND   E."KEY_1004813" = F."KEY_1004813" AND   E."KEY_1004
814" = F."KEY_1004814" AND   E."KEY_1004815" = F."KEY_1004815" AND   E."KEY_1004816" = F."KEY_100481
6" AND   E."KEY_1004817" = F."KEY_1004817" AND   E."KEY_1004818" = F."KEY_1004818" ) WHEN NOT MATCHE
D THEN INSERT ( E."KEY_100481P" , E."KEY_100481T" , E."KEY_1004811" , E."KEY_1004812" , E."KEY_10048
13" , E."KEY_1004814" , E."KEY_1004815" , E."KEY_1004816" , E."KEY_1004817" , E."KEY_1004818" , E."F
ACTCOUNT" , E."TCTDBSEL" , E."TCTDBTRANS" , E."TCTQUCOUNT" , E."TCTTIMEALL" , E."TCTTIMEDM" , E."TCT
TIMEFE" , E."TCTTIMENA" , E."TCTTMEOLAP" , E."TCTTMEPLAN" , E."TCTWTCOUNT" ) VALUES ( "PDIMID" , F."
KEY_100481T" , F."KEY_1004811" , F."KEY_1004812" , F."KEY_1004813" , F."KEY_1004814" , F."KEY_100481
5" , F."KEY_1004816" , F."KEY_1004817" , F."KEY_1004818" , F."FACTCOUNT" , F."TCTDBSEL" , F."TCTDBTR
ANS" , F."TCTQUCOUNT" , F."TCTTIMEALL" , F."TCTTIMEDM" , F."TCTTIMEFE" , F."TCTTIMENA" , F."TCTTMEOL
AP" , F."TCTTMEPLAN" , F."TCTWTCOUNT" ) WHEN MATCHED THEN UPDATE /*+ INDEX("/BIC/E100481" "/BIC/E100
481~P") */ SET E."FACTCOUNT" = E."FACTCOUNT" + F."FACTCOUNT", E."TCTDBSEL" = E."TCTDBSEL" + F."TCTDB
SEL", E."TCTDBTRANS" = E."TCTDBTRANS" + F."TCTDBTRANS", E."TCTQUCOUNT" = E."TCTQUCOUNT" + F."TCTQUCO
UNT", E."TCTTIMEALL" = E."TCTTIMEALL" + F."TCTTIMEALL", E."TCTTIMEDM" = E."TCTTIMEDM" + F."TCTTIMEDM
", E."TCTTIMEFE" = E."TCTTIMEFE" + F."TCTTIMEFE", E."TCTTIMENA" = E."TCTTIMENA" + F."TCTTIMENA", E."
TCTTMEOLAP" = E."TCTTMEOLAP" + F."TCTTMEOLAP", E."TCTTMEPLAN" = E."TCTTMEPLAN" + F."TCTTMEPLAN", E."
TCTWTCOUNT" = E."TCTWTCOUNT" + F."TCTWTCOUNT";

The last step is to get the whole statement onto one single line. To do this, simply mark all lines and run two “search-and-replace” actions on this block.

  1. Replace all “” with an empty string, a.k.a. nothing.
  2. Switch to the extended search mode and replace all “\r\n” with an empty string.
     In Notepad++, “\r\n” stands for CR+LF!

Now we have the whole statement in one single line.

MERGE/*+ USE_NL ( FACT E ) INDEX( E,"/BIC/E100481~P" ) */ INTO "/BIC/E100481" E USING ( SELECT  /*+ PARALLEL ( FACT , 3 ) */0 "PDIMID" , "KEY_100481T" , "KEY_1004811" , "KEY_1004812" , "KEY_1004813" , "KEY_1004814" , "KEY_1004815" ...

This is still not really readable.
But now we can feed it into an automatic SQL formatting tool.
Most SQL editor programs offer this function (e.g. Oracle SQL Developer or SAP MaxDB Database Studio) and there are also free web tools like Instant SQL Formatter that can be used for this.

Finally, we end up with the formatted statement (due to the crappy text formatting options of the SDN blog editor I had to paste this as picture files):

Now we have a large, but readable MERGE statement.
And with this statement, we can start to figure out for which columns an overflow can possibly happen.
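By the way, the ORA-01438 itself is easy to reproduce with a tiny made-up example (table and column names are invented and have nothing to do with the customer system):

CREATE TABLE overflow_demo (factcount NUMBER(5));
INSERT INTO overflow_demo VALUES (99999);
UPDATE overflow_demo SET factcount = factcount + 1;

The UPDATE fails with ORA-01438, because 100000 no longer fits into a NUMBER(5) column. And that is exactly the kind of thing that can happen to a key figure column when the compression MERGE keeps adding up values.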

How this is done, is explained in our new SAP note
#1557321 – Numeric Overflow during SAP BW-standard actions

Ok, I hope you liked this blog and that the re-formatting technique is useful for you the next time you have to analyze SQL-related problems.

Best regards,
Lars

Small changes with big effects

For many performance-relevant DB features in BW, the effect they actually have depends very much on the details of how they are used and implemented.

The following are two examples of rather small things that went wrong which had a big impact on the system performance.

Two examples for BW on MaxDB

Both examples nicely illustrate the implicit assumptions that developers make during development.
Number 1 goes like this:

A customer has chosen MaxDB as the database platform for its SAP BW instance.
In addition to that, the customer decided to go for a BIA, which is quite a clever choice, if you ask me.
Instead of running a super-expensive and maintenance-intensive BW main database that maybe still would require a BIA on top, this customer now runs a low-cost, low-maintenance main database and serves the performance-intensive reporting out of the expensive but also low-maintenance BIA.

Unfortunately, it looks like nobody anticipated that this combination would become popular.
Otherwise, I assume report RSDDTREX_MEMORY_ESTIMATE would have been tested with MaxDB as well.

This report is used to get an estimation of the required memory for the BIA usage.
It’s not too complicated and merely consists of taking the number of rows in an InfoCube and multiplying this by the InfoObjects’ data lengths and some “magic” constants.
So far nothing special.

What’s “special” is that this report still makes use of the nowadays abandoned fact-views from BW 3.x times.
Fact-views make it possible to access the data in both the E- and the F-fact table at once, by concatenating the two sets with a UNION ALL.
That means fact-views basically look like this:

CREATE VIEW "/BIC/V..." AS
(
SELECT col1, col2, ...
  FROM "/BIC/F...."
UNION ALL
  SELECT col1, col2, ...
  FROM "/BIC/F...."
)

From the ABAP side, this eases the access since you now just have to run one query to get access to all data in an InfoCube.
Our report does the same and runs this statement:

SELECT
 count(*)
FROM
  "/BIC/VMYCUBE"

The readers with some MaxDB experience might think now:
“That’s great! MaxDB has its file counter statistics and a special COUNT(*) optimization that avoids table/index access for counting!”
And those readers are correct!

Unfortunately the COUNT(*) optimization has a severe limitation: it only works for simple statements.
That means:

  • no WHERE condition (!),
  • no JOINs (!),
  • no UNIONS/SET OPERATIONS (!),
  • no GROUP BY/ORDER BY (!)

In reality, it means NO NOTHING, just the COUNT(*).
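To make this tangible with the tables from this example: a plain

SELECT COUNT(*) FROM "/BIC/FMYCUBE"

can use the COUNT(*) optimization, while the seemingly harmless

SELECT COUNT(*) FROM "/BIC/VMYCUBE"

cannot, because the fact-view hides a UNION ALL of the two fact tables behind it.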

The fact-view used here therefore couldn’t take advantage of this optimization and had to do the counting via the traditional brute-force approach: read the whole first table, read the whole second table, combine the results and count the number of rows.

The execution plan for such an IO/CPU burning process looks like this:

OWNER    TABLENAME         STRATEGY                            PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
INTERNAL TEMPORARY RESULT  TABLE SCAN                                  1
         SHOW                RESULT IS COPIED, COSTVALUE IS     10653812
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

The runtime of this little monster was 3 days and running … until the database couldn’t hold the huge temporary result set of approx. 10 million pages (ca. 76 GB) anymore. The report finally dumped with the infamous

“POS(1) Space for result tables exhausted”

Ouch!

Fortunately, the report was already prepared to handle the request without a fact-view, but this wasn’t enabled for MaxDB yet.
This was quickly done after a short discussion with the responsible IMS colleague and correction note
#1533676 – Long runtime of program RSDDTREX_MEMORY_ESTIMATE
was created.

The execution plans afterwards looked like this:

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

and

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

And the total runtime of the report went down to a few hours (there is other stuff in there that just takes some time).

(Remark: what’s important to understand about MaxDB execution plans is that only the COSTVALUE represents an optimizer estimate. All PAGECOUNT values refer to the TOTAL number of pages that the table or index in that specific line allocates in the database!)

If you look at the SAP note with the correction, you’ll find that it was a very small change that made the difference:

From this:

     IF sy-dbsys <> 'DB400'.
       APPEND g_v_tablnm_v TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.

to this:

     IF sy-dbsys = 'DB400' OR sy-dbsys = 'ADABAS D'.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_v TO g_t_tablnm.

Knock, knock, any data in there?

The second example is not only ‘special’ on the MaxDB port but on all databases.
However, for MaxDB, the effect was the worst, due to certain limitations of SQL optimization.

SAP BW is a data warehouse and therefore a lot of the functionality is there to handle data, to store and move data and to get rid of data.
These tasks bring with them the necessity to sometimes drop a table and rebuild it, e.g. when you change an InfoObject-definition.

But before merely dropping tables, BW is cautious and asks “Hey, any data in this table?”.
And indeed, there is a function module called RSDU_DATA_EXISTS_TABLE that answers this question.

Now, before proceeding, ask yourself: how would YOU try to answer this question in SQL?
A common first approach would be: count the number of rows in the table and if it’s larger than 0, then there is some data in the table.
Correct!
But given the fact that counting the actual number of rows in a table can really take ages (see the example above), this is the second-worst way to approach the issue (and I admit that it was also the first one I thought of).

The worst idea I’ve seen so far is what was actually implemented in the function module:

SELECT bname FROM usr01 CLIENT SPECIFIED UP TO 1 ROWS INTO :test
     WHERE EXISTS ( SELECT * FROM (i_tablnm) CLIENT SPECIFIED ).
  ENDSELECT.

Let’s see if we can figure out what this statement is supposed to do.
In English it means:

  • Give me the column BNAME
  • from the table USR01 for at most one row
  • for which the set of all rows in table I_TABLNM (this is the one we want to know whether it’s empty or not) contains something.

This is just amazing!

As you can imagine, MaxDB will first create a temporary result set for the EXISTS clause (that is, a full copy of the table) and then return just one row.
If the I_TABLNM table is not empty, this can easily turn into a problem similar to the example above.

Now, of course, there is a much better way to do this.
If you think about it, all we want is a YES (there’s data in there) or a NO (nope, all empty) and this can be done far more efficiently, as SAP note #1542839 – “Performance Optimization in RSDU_DATA_EXISTS_TABLE_ADA” nicely demonstrates:

SELECT 'X' FROM (i_tablnm) WHERE ROWNUM <= 1

This means: “Database, go and get me an X for the first row that you hit on the table and stop afterwards!”
Regardless of how you process this statement, in the worst case it will end after a few (1-4) page visits.
The database may even use an index-only strategy since NO data from the table needs to be fetched (just a constant).

There are of course similar examples for other DBMS as well, but for the sake of a digestible blog post size, I’ll save them for later posts.

BW on Oracle: a performance hunch I see more and more often…

Ok, once again reproducing a problem means a lot of (waiting) time until the query finishes.

In the meantime I can also just write a super-short-blog post about a pretty common BW-on-Oracle performance hunch:
Inappropriate database optimizer statistics.
(yes, ‘inappropriate’ not ‘outdated’!)

Usually that means: if everything runs as it is supposed to, BW will take care of collecting the CBO statistics whenever large changes to the data are made (e.g. data has been loaded or condensed, etc.).
Of course, the BW admin has to set this up in process chains or as attributes of the InfoProviders, but usually this works well.

If these BW-measures somehow “forget” to update the statistics when new statistics would be required, there is still the daily “Update Optimizer Statistics” (brconnect -f stats) run that should cover this.

The big exception

Unfortunately, there is one setting available that prevents the update of new statistics:
the exception table DBSTATC.

In this table, you can set up statistics-gathering parameters that deviate from the default built-in behaviour of brconnect. (This behaviour is not too complex, but really does cover most SAP specifics. See the documentation for details here.)

One of the most used exceptions here is to avoid gathering new statistics at all.
This can be useful, e.g. when a table is very volatile and you don’t want to have your access plans changed all the time because the statistics sometimes report a small/empty and sometimes a large table.
(For such cases SAP even provides pre-defined table statistics, check SAP note 756335 on that.)

But what does this have to do with BW?

Well, before brconnect there was a different tool called SAPDBA, and this tool wasn’t able to cope with BW objects.
Therefore BW development decided to build its own statistics update coding and to prevent SAPDBA from messing around with the BW tables. And for that, they automatically created entries in DBSTATC with ACTIVE=I.
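By the way, if you want to check whether your system still contains such entries, a quick look into DBSTATC is enough. A simple sketch (I only assume here that the object name column is called DBOBJ, so please verify this against the table definition; SE16 on DBSTATC with the same filter works just as well):

SELECT *
  FROM dbstatc
 WHERE dbobj LIKE '/BIC/%'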

A comfortable solution

This is, of course, all long past now, but these entries seem to have survived many system upgrades.
In order to get rid of them, there’s a report available: SAP_DBSTATC_CLEANUP.
With it you can display and delete the DBSTATC entries for single infocubes or all of them.
Looking at the SAP note that introduces this report (SAP note 129252) you’ll find that this has been around for quite a while now 🙂

The recommendation is to remove all the DBSTATC entries for BW objects so that the standard statistic gathering can be performed.
SAP note 1013912 – “FAQ: Oracle BW Performance” already contained this recommendation and from today on, it also contains the report name …

And yes, the solution is to remove the entries and not to set ACTIVE=A or something like that. DBSTATC really is all about defining an exception. If the tables should be handled the standard way, then DBSTATC should not contain any entry for them!

Ok, the customer report is through… back to the real work!

Best regards,
Lars

+++ UPDATE added 22.10.2011 +++

In addition to the SAP_DBSTATC_CLEANUP report mentioned above, it’s also important to make sure that report SAP_ANALYZE_ALL_INFOCUBES isn’t used for statistics gathering anymore.

This report calls the function RSDU_DBSTATC_EXIT for each analyzed infocube, thereby recreating the DBSTATC entries!

+++ UPDATE added 22.10.2011 +++

A first (BW) issue…

In my second week working in BW support, I noticed a problem that was reported by customers again and again.
The customers complained about “database statistics update fails in process chains” or “brconnect run returns errors”.
When checking the relevant log file entries for these actions, there was always something like

ERROR at line 1:
ORA-20000: index "SAPR3"."/BIC/F100069~900"  or partition of such index is in unusable state
ORA-06512: at "SYS.DBMS_STATS", line 15181
ORA-06512: at "SYS.DBMS_STATS", line 15203
ORA-06512: at line 1

The more database-savvy customers or DBAs sometimes had a quick resolution at hand: rebuild the index.

And, indeed, after rebuilding the affected index (as I realized it was nearly always a 900 index) the statistics could be correctly gathered.
Problem solved, or not?

Well, only until the next compression run for the infocube happened.

Infocube compression, wait, like infocube.zip or what?

The term ‘compression’ is a bit misleading here and ‘to condense’ seems more appropriate.
To understand this action, we’ve to take a look at how fact table data is managed in SAP BW.
If you’re not yet familiar with the BW star schema (extended star schema), it might be a good idea to check the documentation or the BW-related blogs and wikis here at SDN.

The fact tables are used to store the actual business data that should be reported on. In the simplest case, this data is loaded from the source system (e.g. the ERP system) on a regular basis, say daily.
So, we get new data in more or less large chunks or packets.
The BW term for this is ‘request’.
A data request is a specific chunk of data whose rows have just one thing in common: they have been loaded into the fact table together. [more on this]

Requests and what they are good for

This request-wise processing provides several options that are otherwise quite hard to achieve in data warehouse systems:

  • we can load just the data that had been changed since the last load (delta loading)
  • we can check the data after each load for technical and quality issues and decide whether or not it should occur in the reporting
  • in case some of the data was not imported correctly, we can easily delete it again, without impairing the reports that use the already available data.

If you think about it, this means that information about the same business objects (let’s say: direct sales to high-level customers in southern Germany) can easily be loaded several times.

If you load the sales transaction data every day to later report on it on a weekly basis, then you’ll get the sum of all the sales aggregated over a week – and use 7 requests’ worth of data (1 request per day, 7 days a week).

But as we see, for our reporting requirement (sales on a weekly basis) it’s actually not necessary to keep all the data load packets (requests).
Once we are sure that the data is technically and quality-wise OK, we might just as well sum up the data to the weekly level, store this and throw away the 7 requests.
This is what the compression of Infocubes pretty much is about.
In SAP BW this is implemented based on two tables:
the F-fact table to which all data load requests go and the E-fact table, which contains the information for the pre-aggregated (condensed) data. [more on this]
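In (heavily simplified) SQL terms, condensing a request boils down to something like the big MERGE statement we re-formatted in the earlier post above. A sketch with invented table, column and bind variable names:

MERGE INTO "/BIC/EMYCUBE" E
USING ( SELECT 0 AS PDIMID, "KEY_MYCUBET", SUM("AMOUNT") AS "AMOUNT"
          FROM "/BIC/FMYCUBE"
         WHERE "KEY_MYCUBEP" = :request_pdimid
         GROUP BY "KEY_MYCUBET" ) F
ON ( E."KEY_MYCUBEP" = F.PDIMID AND E."KEY_MYCUBET" = F."KEY_MYCUBET" )
WHEN MATCHED THEN
  UPDATE SET E."AMOUNT" = E."AMOUNT" + F."AMOUNT"
WHEN NOT MATCHED THEN
  INSERT ( E."KEY_MYCUBEP", E."KEY_MYCUBET", E."AMOUNT" )
  VALUES ( F.PDIMID, F."KEY_MYCUBET", F."AMOUNT" );

After that, the already condensed request can be removed from the F-fact table.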

Getting closer…

At reporting time, the SAP BW OLAP engine knows that our data is stored in two tables.
So for every BW query against an Infocube, we usually see TWO nearly identical SQL statements that only differ in the fact table that is actually used.

Now we have:

  • two tables,
  • we have data that needs to be read and aggregated request-wise from the F-fact table,
  • we have the aggregation result that has to be saved in the E-fact table
we have data that afterwards needs to be deleted from the F-fact table – otherwise some numbers would be included twice in our report!

Looking at these database requirements, there’s an Oracle feature available that really is made for this (it really is ;-)): PARTITIONING!

Without partitioning, the final deletion of already condensed requests would require us to

  • scan at least one full index in order to find all rows matching the requests to be deleted,
  • remove them
  • and update the indexes afterwards.

With partitioning, all we have to do is drop the partitions that contain our request.
That’s the reason why the F-fact tables are always partitioned based on the request dimension (on DBMSes that support partitioning, of course).

So, we can easily get rid of data, when the table is partitioned the right way.
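In SQL terms, getting rid of a condensed request then is a single DDL command instead of a mass DELETE (partition and table names invented):

ALTER TABLE "/BIC/FMYCUBE" DROP PARTITION "MYCUBE_REQ_0001476";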

But what about the indexes?

There’s a treat available in Oracle for that as well: local partitioning of indexes.
Simply put, this means: for every table partition a corresponding partition of the index is created.

With this, we don’t even have to rebuild the indexes after dropping a table partition.
All we have to do is drop the corresponding index partition together with the table partition.
The remaining index will still be completely correct and will still cover all data in the table.
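As a sketch (names invented once more): the index is simply created with the LOCAL keyword, and the resulting index partitions can be checked in the data dictionary afterwards.

CREATE INDEX "/BIC/FMYCUBE~900" ON "/BIC/FMYCUBE" ("KEY_MYCUBEP") LOCAL;

SELECT partition_name, status
  FROM user_ind_partitions
 WHERE index_name = '/BIC/FMYCUBE~900';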

Ok, now we arrived at the start of the problem 😉

All this sounds quite good.
In fact, it’s great!
And (of course) here begins the problem.

This great combination of clever data design, implementation and database feature exploitation only works properly if the indexes really are partitioned exactly like the table.
So BW has to take care of this, since Oracle also allows creating indexes with a different partitioning scheme or without any partitioning.
If that is the case and we drop a table partition, Oracle would have to read every row of the dropped partition to know which rows were deleted and use this information to maintain the index data.
Obviously, this would render the partitioning advantage null and void.
So, Oracle simply flags all indexes for which the corresponding partition drop cannot be performed as UNUSABLE.

Such UNUSABLE indexes can be repaired simply by rebuilding them.
The Oracle cost-based optimizer is clever enough to ignore those indexes (also see Oracle parameter “skip_unusable_indexes“), so queries will not fail because of this.
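By the way, such flagged indexes are easy to spot yourself via the standard Oracle dictionary views, and the quick fix mentioned at the beginning is a plain rebuild, here with the index name from the error message above (why a plain rebuild is not the whole story will become clear further below):

SELECT owner, index_name, status
  FROM dba_indexes
 WHERE status = 'UNUSABLE';

SELECT index_owner, index_name, partition_name, status
  FROM dba_ind_partitions
 WHERE status = 'UNUSABLE';

ALTER INDEX "SAPR3"."/BIC/F100069~900" REBUILD;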

Except… except we force Oracle to use the broken index by using a hint.

“Where the heck do we do such stuff?” – is that the question you’ve got in mind right now?

Well, we do it every time the update statistics job runs.
And collecting CBO statistics after deleting lots of data from a central fact table in BW is usually done as part of the data loading process (chain).

In Oracle, update statistics means calling a PL/SQL stored procedure in the DBMS_STATS package. And in there, Oracle will run SQL statements like this:

select /*+ no_parallel_index(t,"/BIC/F100069~900") dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_expand index(t,"/BIC/F100069~900") */ count(distinct sys_op_lbid(15948,'R',t.rowid)) as nrw ,count(distinct sys_op_lbid(15948,'L',t.rowid)) as nlb ,count(distinct "KEY_IC_0025P") as ndk ,null as clf from "SAPR3"."/BIC/F100069" t where TBL$OR$IDX$PART$NUM("SAPR3"."/BIC/F100069",0,4,0,"ROWID") = :objn

No reason to understand the whole statement for now, but the important part here is the INDEX hint.
With this hint, Oracle is forced to use the index (for which statistics should be collected).
If our index is UNUSABLE at this point in time, then Oracle has no other choice than to report:

ERROR at line 1:
ORA-20000: index "SAPR3"."/BIC/F100069~900"  or partition of such index is in unusable state
ORA-06512: at "SYS.DBMS_STATS", line 15181
ORA-06512: at "SYS.DBMS_STATS", line 15203
ORA-06512: at line 1

LOOOONG story short.

Mashing all this together:

WHEN
        we've a partitioned table with a non-partitioned index
AND
       we drop a partition of the table, so that the non-partitioned index is flagged UNUSABLE
AND
       when we finally run an update statistics on this table,
THEN
       we end up with our ORA-20000 error message.

Why do I tell you this, when BW does take care of the database objects so well?

Because sometimes it fails to do so.
One reason for such a failure is a bug that has been introduced with an SAP Basis SP.
This bug is already fixed with SAP note 1479683, but there are many customers around who haven’t installed the note yet and who are just facing mysterious errors in their infocube compression process chains.

As you can see, the connection between the ORA-20000 error message about the unusable index and the real cause is rather a long-distance one, although it is straightforward once you understand it.

The solution (ahem… TATAAA!)

To finally get rid of this problem, I wrote note #1513510.

In there you’ll find that you have to
a) import the fix for the SAP Basis bug (note #1479683)
b) recreate (NOT just rebuild) the indexes

For step b) the easiest and quickest way is to use the same function modules that SAP BW itself uses for this task.

Excursion… Aggregates are just little infocubes…

In note #1513510 I included an SQL statement to find all partitioned tables for which there are non-partitioned indexes, so that the DBA or BW power user can look up the problematic ones without having to wait for process chains to fail.
The statement looks like this:

select
    /*+
    no_parallel_index(t,"/BIC/F100069~900")
    dbms_stats
    cursor_sharing_exact
    use_weak_name_resl
    dynamic_sampling(0)
    no_monitoring
    no_expand
    index(t,"/BIC/F100069~900")
    */
 count(distinct sys_op_lbid(15948,'R',t.rowid)) as nrw
,count(distinct sys_op_lbid(15948,'L',t.rowid)) as nlb
,count(distinct "KEY_IC_0025P") as ndk
,null as clf
from
    "SAPR3"."/BIC/F100069" t
where
    TBL$OR$IDX$PART$NUM("SAPR3"."/BIC/F100069",0,4,0,"ROWID") = :objn
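If you just want a quick overview yourself, the standard Oracle dictionary views can deliver the same kind of information. Here is a simple sketch of my own (not the exact statement from the note):

SELECT i.table_owner, i.table_name, i.index_name
  FROM dba_indexes i, dba_part_tables p
 WHERE p.owner       = i.table_owner
   AND p.table_name  = i.table_name
   AND i.partitioned = 'NO'
   AND i.table_name LIKE '/BIC/F%';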

If you run this statement you may come across tables like /BIC/F100234.
But you don’t have any Infocube named “100234” – so what are those tables about?
They belong to aggregate infocubes. [more on this here]
These are (small) subsets of the data that the OLAP processor can choose in order to deliver the reporting result much quicker. In this respect, aggregates are very much like database indexes.
Since the aggregates really are subsets of the actual large infocube, they have F- and E-fact tables as well, and the same problem can occur with them too.

If you now want to know to which infocube a specific aggregate belongs, you can easily look it up in the RSDDAGGRDIR table.
For every aggregate, you’ll find an entry within the table that maps the aggregate to the infocube.

Checking this table in SE16 delivers an output similar to this:

Table:          RSDDAGGRDIR

   AGGRUID                   OBJVERS INFOCUBE   AGGRCUBE ...
                                                         ...
   03T89W5IUUEFPARPLPZ29YZF3 A       0BWVC_AGV  100108   ...
   200H0IWIR23WASLOESNT8BKJZ A       0BWVC_AGV  100099   ...
   3VILFVABC6MYNF9R10M0WYVHR A       ICD05      100772   ...
   3VIVFYBTHUCH8HF58HQCRGJXS A       ICD03      100778   ...
   3VIVG4AW8R8BQ0JPRVJWKZJZK A       ICD03      100779   ...
   3VIVG8ZVTWHX3SFLC8ZEQ6RQO A       ICD03      100780   ...
   3VP09ETI53LHVKWQHLL79RK5X A       RSDRICUBE  100032   ...
   40VXFTXRAJ6NNT88CWOEA3LYN A       0BWVC09CP  100071   ...
   40VXFU60ZFEEYGCAFPLSI952N A       0BWVC09CP  100072   ...
   40VXK7M8IUTH0IH052QGOF94F A       0BWVC09CP  100073   ...
   [...]
  

Ok, I hope some of you really made it to the end of this rather lengthy first BW blog post. I’ll try to keep them shorter in the future 😉

regards,

Lars