Small changes with big effects

For many performance-relevant database features in BW, the effect really comes down to the details of how the feature is used and implemented.

The following are two examples of rather small things that went wrong and had a big impact on system performance.

Two examples for BW on MaxDB

Both are good illustrations of the inherent assumptions that developers make during development.
Number 1 goes like this:

A customer has chosen MaxDB as the database platform for its SAP BW instance.
In addition to that, the customer decided to go for a BIA, which is quite a clever choice, if you ask me.
Instead of having a super-expensive, maintenance-intensive BW main database that might still require a BIA on top, this customer now runs a low-cost, low-maintenance main database and does the performance-intensive reporting out of the expensive but equally low-maintenance BIA.

Unfortunately, it looks like nobody anticipated that this combination would become popular.
Otherwise, I assume report RSDDTREX_MEMORY_ESTIMATE would have been tested with MaxDB as well.

This report is used to get an estimation of the required memory for the BIA usage.
It’s not too complicated and merely consists of taking the number of rows in an InfoCube and multiplying it by the InfoObjects’ data lengths and some “magic” constants.
So far nothing special.

What’s “special” is that this report still makes use of the fact-views from BW 3.x times, which are abandoned nowadays.
Fact-views make it possible to access the data in both the E- and the F-fact table at once by concatenating the two sets with a UNION ALL.
That means, fact-views basically look like this:

CREATE VIEW "/BIC/V..." AS
(
SELECT col1, col2, ...
  FROM "/BIC/F...."
UNION ALL
  SELECT col1, col2, ...
  FROM "/BIC/F...."
)

From the ABAP side, this eases access, since you now just have to run one query to reach all the data in an InfoCube.
Our report does the same and runs this statement:

SELECT
 count(*)
FROM
  "/BIC/VMYCUBE"

The readers with some MaxDB experience might think now:
“That’s great! MaxDB has its file counter statistics and a special COUNT(*) optimization that avoids table/index access for counting!”
And those readers are correct!

Unfortunately the COUNT(*) optimization has a severe limitation: it only works for simple statements.
That means:

  • no WHERE condition (!),
  • no JOINs (!),
  • no UNIONS/SET OPERATIONS (!),
  • no GROUP BY/ORDER BY (!)

In reality, it means NO NOTHING, just the COUNT(*).
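A quick sketch of what that means in practice (cube and column names hypothetical):

-- Eligible: the bare COUNT(*) on a single table can be answered from
-- MaxDB's file counter statistics without touching table or index data
SELECT COUNT(*) FROM "/BIC/FMYCUBE"

-- Not eligible: any predicate forces a real scan
SELECT COUNT(*) FROM "/BIC/FMYCUBE" WHERE "KEY_MYCUBEP" = 42

-- Not eligible: the fact-view hides a UNION ALL of two tables
SELECT COUNT(*) FROM "/BIC/VMYCUBE"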

The fact-view used here therefore couldn’t take advantage of this optimization and had to do the counting the brute-force, traditional way: read the whole first table, read the whole second table, combine the results and count the number of rows.

The execution plan for such an IO/CPU burning process looks like this:

OWNER    TABLENAME         STRATEGY                            PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
INTERNAL TEMPORARY RESULT  TABLE SCAN                                  1
         SHOW                RESULT IS COPIED, COSTVALUE IS     10653812
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

The runtime of this little monster was 3 days and running … until the database couldn’t hold the huge temporary result set of approx. 10 million pages (ca. 76 GB) anymore. The report finally dumped with the infamous

“POS(1) Space for result tables exhausted”

Ouch!

Fortunately, the report already was prepared to handle the request without a fact view, but it wasn’t enabled for MaxDB yet.
This was quickly done after a short discussion with the responsible IMS colleague and correction note
#1533676 – Long runtime of program RSDDTREX_MEMORY_ESTIMATE
was created.

The execution plans afterwards looked like this:

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

and

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

And the total runtime of the report went down to a few hours (there is other stuff in there that just takes some time).

(Remark: important to understand for MaxDB execution plans is that only the COSTVALUE represents an optimizer estimation. All other PAGECOUNT values refer to the TOTAL number of pages the table or index of this specific line allocates in the database!)

If you look at the SAP note with the correction, you’ll find that it was a very small change that made the difference:

From this:

     IF sy-dbsys <> 'DB400'.
       APPEND g_v_tablnm_v TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.

to this:

     IF sy-dbsys = 'DB400' OR sy-dbsys = 'ADABAS D'.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_v TO g_t_tablnm.

Knock, knock, any data in there?

The second example is not only ‘special’ on the MaxDB port but on all databases.
However, for MaxDB, the effect was the worst, due to certain limitations of SQL optimization.

SAP BW is a data warehouse and therefore a lot of the functionality is there to handle data, to store and move data and to get rid of data.
These tasks bring with them the necessity to sometimes drop a table and rebuild it, e.g. when you change an InfoObject-definition.

But before merely dropping tables, BW is cautious and asks “Hey, any data in this table?”.
And indeed, there is a function module called RSDU_DATA_EXISTS_TABLE that answers this question.

Now, before proceeding, ask yourself: how would YOU try to answer this question in SQL?
A common first approach would be: count the number of rows in the table, and if the count is larger than 0, then there is some data in the table.
Correct!
But given the fact that counting the actual number of rows in a table really can take ages (see the example above), this is the second-worst way to approach the issue (and I admit that it was also the first one I thought of).

The worst idea I’ve seen so far is what was actually implemented in the function module:

SELECT bname FROM usr01 CLIENT SPECIFIED UP TO 1 ROWS INTO :test
  WHERE EXISTS ( SELECT * FROM (i_tablnm) CLIENT SPECIFIED ).
ENDSELECT.

Let’s see if we can figure out what this statement should do.
In English it means:

  • Give me the column BNAME
  • from the table USR01 for at most one row
  • for which the set of all rows in table I_TABLNM (this is the one we want to know whether it’s empty or not) contains something.

This is just amazing!

As you can imagine, MaxDB will first create a temporary result set for the EXISTS clause (that is, a full copy of the table) and only then return a single row.
If the I_TABLNM table is not empty, this can easily become a similar problem as the example above.

Now, of course, there is a much better way to do this.
If you think about it, all we want is a YES (there’s data in there) or a NO (nope, all empty), and this can be answered far more cheaply, as SAP note #1542839 – “Performance Optimization in RSDU_DATA_EXISTS_TABLE_ADA” nicely demonstrates:

SELECT 'X' FROM (i_tablnm) WHERE ROWNUM <= 1

This means: “Database, go and get me an X for the first row that you hit on the table and stop afterwards!”
Regardless of how you process this statement, in the worst case it will end after a few (1-4) page visits.
The database may even use an index-only strategy since NO data from the table needs to be fetched (just a constant).

There are of course similar examples for other DBMS as well, but for the sake of a digestible blog post size, I’ll save them for later posts.

BW on Oracle: a performance hunch I see more and more often…

Ok, once again reproducing a problem takes a lot of (waiting-)time for the query to finish.

In the meantime, I can also just write a super-short blog post about a pretty common BW-on-Oracle performance hunch:
Inappropriate database optimizer statistics.
(yes, ‘inappropriate’ not ‘outdated’!)

Usually, that means: if everything runs as it is supposed to, BW will take care of collecting the CBO statistics whenever large changes to the data are performed (e.g. data had been loaded or condensed, etc.).
Of course, the BW admin has to set this up in process chains or as attributes of InfoProviders, but usually this works well.

If these BW-measures somehow “forget” to update the statistics when new statistics would be required, there is still the daily “Update Optimizer Statistics” (brconnect -f stats) run that should cover this.

The big exception

Unfortunately, there is one setting available that prevents the update of new statistics:
the exception table DBSTATC.

In this table, you can set up statistics gathering parameters that deviate from the default built-in behaviour of brconnect. (This behaviour is not too complex, but really does cover most SAP specifics. See the documentation for details here).

One of the most used exceptions here is to avoid gathering new statistics at all.
This can be useful, e.g. when a table is very volatile and you don’t want to have your access plans changed all the time because the statistics sometimes report a small/empty and sometimes a large table.
(For such cases SAP even provides pre-defined table statistics; check SAP note 756335 on that.)
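To check whether such blocking entries exist in your system, you can look at the table directly. A minimal sketch, assuming the flag field is named ACTIV (verify the actual DBSTATC structure in SE11 first):

-- Sketch: list exception entries that suppress statistics updates entirely
-- (the field name ACTIV is an assumption - check the table definition in SE11)
SELECT * FROM dbstatc WHERE activ = 'I'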

But what does this have to do with BW?

Well, before brconnect there had been a different tool called SAPDBA and this tool wasn’t able to cope with BW objects.
Therefore, BW development decided to build its own statistics update coding and to prevent SAPDBA from messing around with the BW tables. For that, entries with ACTIVE=I were created in DBSTATC automatically.

A comfortable solution

This is, of course, all long past now, but these entries seem to have survived many system upgrades.
In order to get rid of them, there’s a report available: SAP_DBSTATC_CLEANUP.
With it, you can display and delete the DBSTATC entries for single InfoCubes or for all of them.
Looking at the SAP note that introduced this report (SAP note 129252), you’ll find that it has been around for quite a while now 🙂

The recommendation is to remove all the DBSTATC entries for BW objects so that the standard statistic gathering can be performed.
SAP note 1013912 – “FAQ: Oracle BW Performance” already contained this recommendation and from today on, it also contains the report name …

And yes, the solution is to remove the entries and not to set ACTIVE=A or something like that. DBSTATC really is all about defining an exception. If the tables should be handled the standard way, then DBSTATC should not contain any entry for them!

Ok, the customer report is through… back to the real work!

Best regards,
Lars

+++ UPDATE added 22.10.2011 +++

In addition to the SAP_DBSTATC_CLEANUP report mentioned above, it’s also important to make sure that report SAP_ANALYZE_ALL_INFOCUBES isn’t used for statistics gathering anymore.

This report calls the function module RSDU_DBSTATC_EXIT for each analyzed InfoCube, and thereby the DBSTATC entries are created all over again!

+++ UPDATE added 22.10.2011 +++

A first (BW) issue…

In the second week working in BW support, I noticed that there was a problem reported by customers again and again.
The customers complained about “database statistics update fails in process chains” or “brconnect run returns errors”.
When checking the relevant log file entries for these actions, there was always something like:

ERROR at line 1:
ORA-20000: index "SAPR3"."/BIC/F100069~900"  or partition of such index is in unusable state
ORA-06512: at "SYS.DBMS_STATS", line 15181
ORA-06512: at "SYS.DBMS_STATS", line 15203
ORA-06512: at line 1

The more database savvy customers or DBAs sometimes had a quick resolution at hand: to rebuild the index.

And, indeed, after rebuilding the affected index (as I realized it was nearly always a 900 index) the statistics could be correctly gathered.
Problem solved, or not?

Well, only until the next compression run for the infocube happened.

Infocube compression, wait, like infocube.zip or what?

The term ‘compression’ is a bit misleading here and ‘to condense’ seems more appropriate.
To understand this action, we’ve to take a look at how fact table data is managed in SAP BW.
If you’re not yet familiar with the BW star schema (extended star schema), it might be a good idea to check the documentation or the BW-related blogs and WIKIs here at SDN.

The fact tables are used to store the actual business data that should be reported on. In the simplest case, this data is loaded from the source system (e.g. the ERP system) on a regular basis, say daily.
So, we get new data in more or less large chunks or packets.
The BW term for this is ‘request’.
A data request is a specific chunk of data whose rows have just one thing in common: they were loaded into the fact table together. [more on this]

Requests and what they are good for

This request-wise processing provides several options that are otherwise quite hard to achieve in data warehouse systems:

  • we can load just the data that had been changed since the last load (delta loading)
  • we can check the data after each load for technical and quality issues and decide whether or not it should show up in the reporting
  • in case some of the data was not imported correctly, we can easily delete it again, without impairing the reports that use the already available data.

If you think about it, this means that it’s totally possible for information about the same business objects (let’s say: direct sales to high-level customers in southern Germany) to be loaded several times.

If you load the sales transaction data every day to later report on it on a weekly basis, you’ll get the sum of all the sales aggregated over a week – and use 7 requests’ worth of data (1 request per day, 7 days a week).

But as we see, for our reporting requirement (sales on a weekly basis) it’s actually not necessary to keep all the data load packets (requests).
Once we are sure that the data is technically and quality-wise OK, we might just as well sum up the data to the weekly level, store that and throw away the 7 requests.
This is pretty much what the compression of InfoCubes is about.
In SAP BW this is implemented based on two tables:
the F-fact table to which all data load requests go and the E-fact table, which contains the information for the pre-aggregated (condensed) data. [more on this]
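Logically, condensing a set of verified requests boils down to something like the following sketch (table and column names are hypothetical, and the real implementation differs in many details, e.g. it merges into already condensed E-fact rows):

-- Aggregate the verified requests from the F-fact table into the E-fact table
INSERT INTO fact_e (key_time, key_cust, amount)
SELECT key_time, key_cust, SUM(amount)
  FROM fact_f
 WHERE key_pkg <= :last_verified_request  -- request/package dimension key
 GROUP BY key_time, key_cust

-- ... and get rid of the now redundant request data in the F-fact table
DELETE FROM fact_f
 WHERE key_pkg <= :last_verified_request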

Getting closer…

At reporting time, the SAP BW OLAP engine knows that our data is stored in two tables.
So for every BW query against an InfoCube, we usually see TWO nearly identical SQL statements that differ only in the fact table that is actually used.

Now we have:

  • two tables,
  • we have data that needs to be read and aggregated request-wise from the F-fact table,
  • we have the aggregation result that has to be saved in the E-fact table
  • we’ve data that afterwards needs to be deleted from the F-fact table – otherwise some numbers would be included twice in our report!

Looking at these database requirements, there’s an Oracle feature available that really is made for them (it really is ;-)): PARTITIONING!

Without partitioning, the final deletion of already condensed requests would require the database to

  • scan at least one full index in order to find all rows matching the requests to be deleted,
  • remove them
  • and update the indexes afterwards.

With partitioning, all we’ve to do is drop the partitions that contain our requests.
That’s the reason why the F-fact tables are always partitioned by the request dimension (on DBMSes that support partitioning, of course).

So, we can easily get rid of data, when the table is partitioned the right way.

But what about the indexes?

There’s a treat in Oracle available for that as well: local partitioning of indexes.
Simply put, this means: for every table partition, a corresponding partition is created for the partitioned index.

With this, we don’t even have to rebuild the indexes after dropping a table partition.
All we’ve to do is to also drop the corresponding index partition together with the table partition.
The remaining index will still be completely correct and will still cover all data in the table.
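Put together, the pattern looks roughly like this sketch (all names hypothetical and heavily simplified compared to real BW fact tables):

-- A range-partitioned F-fact table: one partition per request/package
CREATE TABLE fact_f (
  key_pkg  NUMBER(10) NOT NULL,  -- request/package dimension key
  key_cust NUMBER(10),
  amount   NUMBER(17,2)
)
PARTITION BY RANGE (key_pkg) (
  PARTITION p_req_1 VALUES LESS THAN (2),
  PARTITION p_req_2 VALUES LESS THAN (3),
  PARTITION p_rest  VALUES LESS THAN (MAXVALUE)
)

-- LOCAL: the index gets one partition per table partition
CREATE INDEX fact_f_cust ON fact_f (key_cust) LOCAL

-- Deleting a condensed request is a pure dictionary operation; the matching
-- index partition is dropped along with it and the rest stays valid
ALTER TABLE fact_f DROP PARTITION p_req_1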

Ok, now we arrived at the start of the problem 😉

All this sounds quite good.
In fact, it’s great!
And (of course) here begins the problem.

This great combination of clever data design, implementation and database feature exploitation only works properly if the indexes really are partitioned exactly like the table.
BW has to take care of this, since Oracle also allows creating indexes with a different partitioning scheme, or without any partitioning at all.
If such an index exists and we drop table partitions, then Oracle would have to visit every row of the dropped partition and use this information to maintain the index data.
Obviously, this would render the partitioning advantage null and void.
So Oracle simply flags all indexes for which the partition drop cannot be mirrored as UNUSABLE.

Such UNUSABLE indexes can be repaired simply by rebuilding them.
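Continuing the sketch from above, this is how the flagging and the quick repair look (again, all names hypothetical):

-- A non-partitioned (global) index on the partitioned table
CREATE INDEX fact_f_amount ON fact_f (amount)

-- Dropping a table partition would leave this index pointing to deleted
-- rows, so Oracle flags it instead of maintaining it row by row
ALTER TABLE fact_f DROP PARTITION p_req_2

SELECT index_name, status
  FROM user_indexes
 WHERE table_name = 'FACT_F'
-- FACT_F_AMOUNT now shows STATUS = 'UNUSABLE'
-- (the LOCAL index partitions of FACT_F_CUST remain usable)

-- The quick repair:
ALTER INDEX fact_f_amount REBUILD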
The Oracle cost-based optimizer is clever enough to ignore those indexes (also see Oracle parameter “skip_unusable_indexes“), so queries will not fail because of this.

Except… except we force Oracle to use the broken index by using a hint.

“Where the heck do we do such stuff?” – is that the question you’ve got in mind right now?

Well, we do it every time you run the update statistics job.
And collecting CBO statistics after deleting lots of data from a central fact table in BW is usually done as part of the data loading process (chain).

In Oracle, update statistics means calling a PL/SQL stored procedure in the DBMS_STATS package. And in there, Oracle will run SQL statements like this:

select /*+ no_parallel_index(t,"/BIC/F100069~900") dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_expand index(t,"/BIC/F100069~900") */ count(distinct sys_op_lbid(15948,'R',t.rowid)) as nrw ,count(distinct sys_op_lbid(15948,'L',t.rowid)) as nlb ,count(distinct "KEY_IC_0025P") as ndk ,null as clf from "SAPR3"."/BIC/F100069" t where TBL$OR$IDX$PART$NUM("SAPR3"."/BIC/F100069",0,4,0,"ROWID") = :objn

No reason to understand the whole statement for now, but the important part here is the INDEX hint.
With this hint, Oracle is forced to use the index (for which statistics should be collected).
If our index is UNUSABLE at this point in time, then Oracle has no choice other than to report:

ERROR at line 1:
ORA-20000: index "SAPR3"."/BIC/F100069~900"  or partition of such index is in unusable state
ORA-06512: at "SYS.DBMS_STATS", line 15181
ORA-06512: at "SYS.DBMS_STATS", line 15203
ORA-06512: at line 1
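For reference: the entry point that ends up issuing such index-hinted statements is the table-level statistics call with cascading index statistics. A sketch (the exact parameters brconnect uses may differ):

BEGIN
  -- CASCADE => TRUE also gathers statistics for every index of the table;
  -- this is where the index-hinted COUNT queries shown above come from
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SAPR3',
    tabname => '/BIC/F100069',
    cascade => TRUE);
END;
/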

LOOOONG story short.

Mashing all this together:

WHEN
    we've a partitioned table with a non-partitioned index
AND
    we drop a partition of the table, so that the non-partitioned index is flagged UNUSABLE
AND
    we finally run an update statistics on this table,
THEN
    we end up with our ORA-20000 error message.

Why do I tell you this, when BW does take care of the database objects so well?

Because sometimes it fails to do so.
One reason for such a failure is a bug that has been introduced with an SAP Basis SP.
This bug is already fixed with SAP note 1479683, but there are many customers around who haven’t yet installed the correction and who are just facing miraculous errors in their InfoCube compression process chains.

As you can see the connection between the ORA-20000 error message about the unusable index and the real cause is rather a long distance one, although straightforward once you understand it.

The solution (ahem… TATAAA!)

To finally get rid of this problem, I wrote note #1513510.

In there you’ll find that you have to
a) import the fix for the SAP Basis bug (note #1479683)
b) recreate (NOT just rebuild) the indexes

For step b) the easiest and quickest way is to use the same function modules that SAP BW does use for this task.
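To see why a plain rebuild isn’t enough, consider the index from the error message (a sketch; the column list is inferred from the statistics statement above):

-- REBUILD repairs the index but keeps its non-partitioned definition,
-- so the next DROP PARTITION flags it UNUSABLE again:
ALTER INDEX "SAPR3"."/BIC/F100069~900" REBUILD

-- Recreating it as a LOCAL index removes the root cause:
DROP INDEX "SAPR3"."/BIC/F100069~900"
CREATE INDEX "SAPR3"."/BIC/F100069~900"
  ON "SAPR3"."/BIC/F100069" ("KEY_IC_0025P") LOCAL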

Excursion… Aggregates are just little infocubes…

In note #1513510 I included an SQL statement to find all partitioned tables for which there are non-partitioned indexes so that the DBA or BW-Poweruser can look up the problematic ones without having to wait for process chains to fail.
The exact statement is in the note; a simplified sketch of such a check against the Oracle data dictionary looks like this:

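-- Sketch: find partitioned tables that have non-partitioned indexes
-- (the dictionary views and columns exist as shown; the statement in
-- the note may differ in its details)
SELECT t.table_name, i.index_name
  FROM user_part_tables t, user_indexes i
 WHERE i.table_name  = t.table_name
   AND i.partitioned = 'NO'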

If you run this statement you may come across tables like /BIC/F100234.
But you don’t have any Infocube named “100234” – so what are those tables about?
They belong to aggregate infocubes. [more on this here]
These are (small) subsets of the data that the OLAP processor can choose from to deliver the reporting result much quicker. In this respect, aggregates are very much like database indexes.
Since aggregates really are subsets of the actual large InfoCube, they have an F- and an E-fact table as well, and the same problem can occur for them, too.

If you now want to know to which InfoCube a specific aggregate belongs, you can easily look it up in the RSDDAGGRDIR table.
For every aggregate, you’ll find an entry within the table that maps the aggregate to the infocube.
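For a single aggregate, a small lookup sketch (columns as in the SE16 output shown below; e.g. for the fact table /BIC/F100234 from above):

-- Which InfoCube does aggregate 100234 belong to (active version)?
SELECT infocube
  FROM rsddaggrdir
 WHERE aggrcube = '100234'
   AND objvers  = 'A'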

Checking this table in SE16 delivers an output similar to this:

Table:          RSDDAGGRDIR

   AGGRUID                   OBJVERS INFOCUBE   AGGRCUBE ...
                                                         ...
   03T89W5IUUEFPARPLPZ29YZF3 A       0BWVC_AGV  100108   ...
   200H0IWIR23WASLOESNT8BKJZ A       0BWVC_AGV  100099   ...
   3VILFVABC6MYNF9R10M0WYVHR A       ICD05      100772   ...
   3VIVFYBTHUCH8HF58HQCRGJXS A       ICD03      100778   ...
   3VIVG4AW8R8BQ0JPRVJWKZJZK A       ICD03      100779   ...
   3VIVG8ZVTWHX3SFLC8ZEQ6RQO A       ICD03      100780   ...
   3VP09ETI53LHVKWQHLL79RK5X A       RSDRICUBE  100032   ...
   40VXFTXRAJ6NNT88CWOEA3LYN A       0BWVC09CP  100071   ...
   40VXFU60ZFEEYGCAFPLSI952N A       0BWVC09CP  100072   ...
   40VXK7M8IUTH0IH052QGOF94F A       0BWVC09CP  100073   ...
   [...]
  

Ok, I hope some of you really made it to the end of this rather lengthy first BW blog post. I’ll try to keep them shorter in the future 😉

regards,

Lars

New shores in sight…

As some of you might have noticed, I haven’t been too active in the DB/OS forums lately.
This is because I moved on to a new support team and a new subject for me back in September.
Yes, I took the giant leap and moved two offices to the right and am now running with the SAP BW support superheroes 😉

As in-depth database knowledge is as rare in the (rather application/business-minded) BW area as top-notch BW expertise is in the (rather technology/BASIS-focussed) database group, we all found that this is going to be a perfect mix, and after seven years in database support I’m enjoying seeing so much ‘new’ stuff.
Well, at least I get a quite new angle on many issues that come up between BW and databases and that’s a very good thing.

So what’s in there for you then?

I think I’m going to continue writing about questions, issues, messages and situations that come up in support and for which the usual reaction is: hand it over to ‘the other’ component.
Among these cases are classics like ‘BW load fails due to unusable Oracle indexes’, ‘BW performance is slow’ or ‘that SQL does not seem to make any sense, although it comes right out of the data manager layer of BW…‘.

Well, I guess we will see, what comes up.
For now, I start with a rather typical issue that nicely highlights how looking just a little beyond one’s own backyard really can make a difference.

To make this blog post not too long, I put the case in another one, accessible >here<.

I hope to see some of my usual readers on that new topic as well.
Best regards,
Lars