Tag Archives: performance

How to know “all these things”

There is one predominant reason for why databases and database technology get a spot in the conversation of “the business”: performance.
Understanding system performance and coming up with ideas for improving it typically requires a form of end-to-end understanding with different levels of detail insight on how a query from a client application is processed by SAP HANA. Before long, one looks into “explain plans“, “execution paths”, SQL, PlanViz diagrams, data distributions, table definitions, stack traces and even NUMA-process management on Linux/Hypervisor-level. That’s a lot of concepts and layered technology and most of those concepts come with a development history of at least 30 years if not more.
This means that prior knowledge is not only implied but required to get a proper understanding of what concepts are put to work in your specific setup. Intuition about how the system works is not your friend here.

But where do you learn about all these things? The product documentation? A university SQL 101 class? Blog posts from strangers on the Internet? Books? Or a mooc with a certificate?

Personally, I’m always happy to find good collections of material, maybe even curated into topics. One of those collections is The Red Book (Readings in Database Systems), which covers all important topics of Database Management Systems. It even features a chapter “3. Techniques Everyone Should Know” in case you really only want to have a quick overview.

I read about this awesome (and free) online book in a blog that I can only recommend: The morning paper by Adrian Coyler. He linked to the red book in his post (Database) Techiques Everyone Should Know.
There are also many more recent posts about database related papers like Learned Index Structures or the design and implementation of modern column-oriented database systems which explain current developments in DBMS technology that will probably end up as a feature in one of the big DBMS systems. The great thing about Adrian’s blog is that it is a kind of Cliff’s Notes for computer science papers. One does not have to actually read the paper to get the core ideas; but Adrian always makes the paper available to read for oneself.
While I mention this blog, I like to say “Thank you!” to Mr. Coyler whom I never met, but I certainly benefitted a lot from reading his outstanding blog.

Merging right. Some tests with data loading in HANA from Down Under.

Based on SAP HANA SPS 12 (HCP trial)

When Jody Hesch recently send me an email with a question about HANA, he received what everyone receives when sending email questions about HANA. The reply is a version of

“Usually I don’t do email Q&A as this simply doesn’t help with knowledge sharing.

Instead, I advise everyone to post the question in one of the HANA related forums
(SAP Community https://answers.sap.com/questions/metadata/23925/sap-hana.html ,
JAM https://jam4.sapjam.com/groups/about_page/6UHzR2Fxra4quFAbACtxFD
or even stackoverflow http://stackoverflow.com/questions/tagged/hana ) so that the question and its answers are search- and findable.

That way everyone can benefit from this and you even might get faster and/or better answers than from just writing to me.”

I have not given up on my personal fight against GEIGOKAI and this it what it looks like in daily real-life.


You might be inclined to call this reply rude but it turns out that most people accept this very easily and manage to copy&paste their question into one of the forums – and changing culture is always a bit painful … drip drip drip

And just as many others, Jody managed to post his question for all to see, read and discuss here: Manually managing Delta Merge

While I was doing the due diligence of checking my own ideas against a HCP trial HANA instance (SPS 12.04), the whole explanation and the example got larger than what would be nice for a simple answer, which is why you now read this is a blog post format.

On loading data and merging

Ok, let’s get to it, then.
Jody’s question was “how can manually handling delta merge make a data load faster?”
To answer this I set up a small test case:

drop table merge_test;
     "AAA" NVARCHAR(20),
     "BBB" INTEGER ,
     PRIMARY KEY ("ID"));

declare ts_start timestamp = current_timestamp;
declare ts_end timestamp;
declare loops integer = 100;
declare j integer;
declare i integer;
declare with_commit integer = 1;

    truncate table merge_test;
    for j in 0 .. :loops do
        for i in 0 .. :loops*50 do
            upsert merge_test
                values (:i, :i || ' - ' || :j, :i*:j)
                with primary key;

        end for;
        if with_commit = 1 then
        end if;
    end for;

    ts_end = current_timestamp;    
    -- now show some infos
    select seconds_between(:ts_start, :ts_end) as duration from dummy;
    select count(*) as num_rows, min(id), max(id) from merge_test;
    select column_name, memory_size_in_main, memory_size_in_delta, count, distinct_count
    from m_cs_all_columns
    where table_name ='MERGE_TEST'
    and schema_name =current_user;
    select merge_count, read_count, write_count, last_merge_time
    from m_cs_tables
    where table_name ='MERGE_TEST'
    and schema_name =current_user;
    select start_time, motivation, execution_time, memory_merge, merged_delta_records, success
    from m_delta_merge_statistics
    where table_name='MERGE_TEST'
    and start_time >= :ts_start
    and schema_name = current_user;

What we have here is a simple table and a data loading simulation. The data gets inserted or updated via the UPSERT command and, if the parameter

with_commit = 1

a commit is done after each iteration of the outer loop.
After both loops have been left, the code prints the runtime in seconds and selects some statistics from system views.

Next, I ran three experiments.

The first two used the SAP HANA AUTOMERGE feature, where HANA decides when to perform a delta merge.
The third one followed the approach SAP BW uses for data loading: the AUTOMERGE is disabled and the target tables get merged after all data has been loaded.




5001        0       5000  

ID          1064                3733044                 505101  5001          
AAA         0                   17607386                505101  -1            
BBB         0                   5571068                 505101  -1            
$trex_udiv$ 232                 8889016                 0       0            
$rowid$     768                 15676428                505101  505101        

0           7           505101      2016-12-20 03:15:41.986


Easy to see, the whole thing ran for 386 seconds and managed to create a total of 505.101 record versions in the delta store.
Note that during this load NO delta merge had happened. This is because we never committed the transaction at all and the delta merge won’t “relocate” open update transactions to a new delta store.



5001        0       5000  

ID          723716              9132                    5001    5001          
AAA         741076              10052                   5001    5001          
BBB         711828              9132                    5001    5001          
$trex_udiv$ 273432              1912                    0       0            
$rowid$     1504                9132                    5001    5001          

5           30          505111      2016-12-20 03:40:22.85

2016-12-20 03:36:22.166 AUTO        54              FALSE           80016                   TRUE  
2016-12-20 03:37:22.24  AUTO        88              FALSE           90018                   TRUE  
2016-12-20 03:38:22.349 AUTO        119             FALSE           85017                   TRUE  
2016-12-20 03:39:22.49  AUTO        157             FALSE           85017                   TRUE  
2016-12-20 03:40:22.67  AUTO        186             FALSE           85017                   TRUE  

Here we see that with commits in between the whole thing only took 349 seconds.
As we’ve committed the transaction in between a couple of times, the automerge had a chance to be active – and did so roughly every minute (5 times).

Now, why is this variant a bit faster overall? The effort to find the currently active record versions grows with the number of versions in the delta store.
With the automerge in between, this number stayed consistently below 100.000 versions, while the first version had to face an increasing number of versions to check between round 100.000 and 505.101.


Finally the version of data loading that is similar to SAP BW’s approach.
I disabled the automerge via

alter table merge_test disable automerge;

and ran the test again:


5001        0       5000  

ID          1064                3733044                 505101  5001          
AAA         0                   17607386                505101  -1            
BBB         0                   5571068                 505101  -1            
$trex_udiv$ 232                 8810536                 0       0            
$rowid$     768                 15676428                505101  505101        

0           15          505101      2016-12-20 03:49:36.914



merge delta of merge_test;

2016-12-20 03:56:09.435 HARD        46              FALSE           5001                    TRUE  

With 325 seconds this is the fastest run.
Looking at the time the manual delta merge took, we find that it’s still faster than the last delta merge of the automerge example.

This is – again – due to the fact that now the are a lot less concurrent record versions in the table.
Note, MERGED_DELTA_RECORDS shows the number of valid (committed) records that have been moved from delta to main storage.
If I update the same record a hundred times before I commit, then I would only get one record merged into the main store.

Now, why do we see a better performance in this case?

To illustrate the resource usage during our loads, see the following diagrams:

In our first case, we see that the memory usage for the delta store as well as the CPU usage for the UPSERT gets larger over time.
Sometime after the last commit is done, an automerge gets triggered, leading to additional CPU and a lot more memory usage.
The result is a relatively well-compressed table (last little blue square).

Note that the number of uncomitted record versions (little orange squares) increases during the whole time.

With the second approach (commits in between and automerge on), we see that CPU and memory usage during the load stay moderate. Only when the merge is performed, memory and CPU usage increase and the whole process has to wait for the merge to switch over to the delta2 in order to continue.

Noteworthy here is that the resulting table can easily be larger than the table sizes produced by the other methods. This is because the column compression algorithms are determined during the first delta merge and won’t change that easily.
This can lead to a situation where the compression algorithms are determined on a subset of data that is not representative of the total (think of loading data with timestamps or dates in sequential order – the first merge may only see one specific date).

With the last option, we see a very similar pattern than in the first case.
The only difference here is that the final merge was manually triggered.

So with AUTOMERGE disabled and regular commits we get the best of everything:

  • no waits due to delta merges
  • no increased runtimes due to a high number of outdated uncommitted record versions
  • well-compressed table (since the compression optimisation can work on the whole data set).

This works well, as long as the resulting delta merge can be done within the available memory.
And even if other memory structures would need to be unloaded to allow for the delta merge to happen, then this would have to happen only once and not several times during the loading process.

Smarter merge

I mentioned that this third approach is similar to what SAP BW does but this is only half the truth.
Instead of manually sending a

merge delta of merge_test;

which gives the hard instruction to perform a delta merge now, SAP BW uses a “smart merge”.
The idea here is that instead of forcing a delta merge to let HANA evaluate whether a merge would be granted for, given the current memory statistics and given the merge decision functions.

With this flexible approach, the application does not need to care about the specifics of how to decide when a merge should take place, but can yet trigger HANA to take care about that.


Manually controlling the delta merge can improve performance for the load process as well as reduce the overall system load during the data loading, since multiple merges are avoided. Delta merges still are extremely resource intensive operations that can have a massive impact to the whole HANA system. Therefore it’s a good idea to have them done as seldom as possible.
This, in combination with a controlled approached to COMMITTING loaded data can, as shown above, have quite an effect to the runtime of the data loading.

There you go, now you know.

The 7th Niceness of new toys’n’tools

If you’re not living behind some red moon and you’re reading this blog, you will likely be aware that SAP HANA SPS 10 had just been released.

With it also the SAP HANA Studio had been improved and extended.

Since I personally like eye-candy and enjoy exploring new features when they come with a button to click, I have installed the new version on my laptop and played with it; against a SPS 9 SAP HANA database!

That’s right – everything I describe here works just well on a Rev. 96, which means you can use these features even without upgrading your server.

(This is one of the benefits of the de-coupling of SAP HANA Studio releases and SAP HANA Server releases – indicated by the different version numbers.)

So here we go with the nice things I found in SAP HANA Studio 2.1.4.

Drop and drop of design time objects

Since basically ever I wished I could just drag and drop information models into the SQL editor so that I could execute SELECT statements against them.

No idea what took the dev-colleagues so long, but now that’s actually possible.

Simply dragging the design time object (you know, the object as it appears under the Content folder) over into the SQL editor and the corresponding runtime object name (yes, the _SYS_BIC.”<package_name>… <object_name>”) is placed right at your cursor.

Alternatively, there is a context menu entry “Copy Column View Name” that puts the name into the clipboard.


SELECT statement generation incl. input parameters

Similar to the above, you might have stumbled over models with input parameters.

The usual workaround to use the SQL generated by the data preview is alright, but a bit clumsy.

With SAP HANA Studio 2.1.4 you can now simply right-click and select “Generate Select SQL“.

After a few moments of checking the metadata, you get a new SQL editor with the basic SELECT SQL:


As you can tell, someone automatically pressed the “Format” button as well.


Compare two plans in PlanViz

More than once I got the question on how to perform comparisons between two PlanViz traces and the usual answer had to be: you put the two PlanViz editors next to each other and compare the Plan OPerator boxes one by one. This is of course tedious.

Now, there is the (once again – anyone remember Show me the timelines, baby! ? 😉 ) well hidden feature to compare two plans.

I found it in the context menu of an executed PlanViz by clicking on “Compare With Another Plan…” – not sure if there is another way to this feature.

Next you’re greeted by a little selection dialogue window, that I found rather self-explanatory:

Seems to be straight forward – so I selected one of the other open PlanViz displays by clicking “Other visualized plan“.

DISCLA(I)MER: for the next step, you should have a wide monitor. Or two monitors. Wide ones. Really! (It’s better for you anyhow, if you use it right.)

See? I told you so!

As I haven’t really worked with the feature there’s little I can report beyond that it’s there.

Go play around yourself!

In my books, this feature really is:

Nice!  🙂

Immediate execution for PlanViz

Alright, you have your SQL statement. You want the PlanViz for it.

Up to now you were always confronted with some “Prepare” step and a graphical version of the Explain Plan.

How useful…not!” you might have thought (so did I most of the times 😉 ).

No more of that!

Modern days SAP HANA Studio comes with this shortcut in the context menu:


Identify plan operations in PlanViz by name

Ok, the following is not actually a new feature, but I have just found this and decided to put it in this anyhow.

A major difficulty in analysing PlanViz outputs often is to map the model design to the actual execution.

This is inherent to the calculation view instantiation process but you can make it a bit easier for you by naming the nodes in your model with your own names.

To understand what I mean, just look at those two models, that do pretty much the same thing:

Default node names Custom node names

At this point there is already a benefit as the intended function for the nodes is much clearer.

Now, when the model gets executed and a PlanViz is created, we can find those node names in several places:

1. In the POP node details:

2. In the “Tables Used” list:

3. In the “Operator List”:

You will still find a lot of things that you cannot easily match back to the model, but with the custom node names, you should be able to find some anchors in the PlanViz.

Not sure about you, but I think that’s Nice!.

Comments in information models

One of the things you wish you had when working with complex models, that you or someone else created a while ago typically is: a documentation.

  • Why, just why is this condition there?
  • What exactly is the idea behind this calculated column?
  • Where should this data go?

Wish no more!

With the SPS 10 SAP HANA Studio you can now add comments/notes to virtually anything in the model.

Click the little “Maintain Comment” button in the model, when hovering over a node…

… then enter some amazingly useful comments!

Once you close the comments editor you’ll see a little indicator at the node-icon to show that there’s a note for you:

Let’s say it together:


Data Lineage in Modeler

When working with the modeler it can get quite difficult to understand where the data of a certain output column actually comes from.

To help with that, there is now the Data Lineage feature available.

The red borders around the model nodes indicate that the data is “touched in there” and the little annotation boxes show you the column name per node.


Alright, given that I only had a few hours to look into it, I’m personally quite happy about the result here.

Seven “NICE” for the current major release of SAP HANA Studio (and there are more, like the on-demand loading of the Contents sub-folders, but I couldn’t really capture this with a screen shot 😀 ).


There you go, now you know!

  • Lars