What HANA needs now is some…

… logging!

Yes, you just got click-baited into reading this advertisement for a product enhancement idea that I posted: Standard logging facility for SQLScript

The idea here is simple: when you write application logic in SQLScript then you very likely want to have the option to log what your program does.

This is not the logging I am talking about

http://www.imdb.com/character/ch0009715/mediaviewer/rm2931612928
Photo by ABC Photo Archives – © 2011 American Broadcasting Companies, Inc. – Image courtesy gettyimages.com

What I am talking about is a language feature in SQLScript that would allow to write out arbitrary messages during the execution of a SQLScript read/write procedure to some logging target. A target could really be anything, in this case, that can receive the logging information, e.g. a database table, the OS syslog, a logging server, a CSV file, an OS pipe, a kafka topic…

HANA SQLScript should provide the API for such a logging facility so that developers can focus on their business logic instead.

Especially with more and more application logic spread out over different micro-services, js apps and SQLScript an open and flexible logging feature could help to make sense of what an application does from one end to the other.

Another big technical benefit for a built-in logging would be that it could avoid the use of dynamic SQL and all the issues around it (security, performance, …).
Instead of pasting together an INSERT statement, a simple function call could not only invoke a potentially buffered log writing but also automatically include common technical information like current code location, timestamp, session context, user name, etc.

If you agree that such a feature is just what is missing in SAP HANA SQLScript, then go and vote for the idea Standard logging facility for SQLScript.

Note, that you have to register for the idea place first (https://ideas.sap.com/SAPHANAPlatform) and that you have to be logged on to vote.

SAP Community? Why?

If you’re reading this, you are likely aware of the ongoing debates, complaints, failures and the general situation around the SAP Community platform and the state of this very community in general, which is why I am not going to point out the many technical and organisational deficiencies. Others have done that well and often.
Very unfortunately, this, sometimes tedious, feedback did not lead to the expected changes but instead to the perception that “SAP is not hearing it” (why-oh-why.html, what-weve-got-here…)

Brian Ellefritz, the new Head of SAP Community , promised betterment and received sceptical encouragement and “good lucks”.
Many, including myself, believe that he means what he writes and that he will try to improve the situation.

Do I trust him with this, though? No, I do not and cannot.
And that is not because of Brian or what he has done, not at all. Frankly, I don’t personally know him so the trust that I could feel would be for his role and this role does not mean a thing to me.
I believe it does not mean much to the vast majority of community members either.

In my case, the distrust reaches further and covers the whole area of people, processes and decision making behind the organisation of SAP Community (let me refer to it as SAP-Community-ORG for this piece).
I simply don’t understand, on a very basic level, what the motivation of SAP-Community-ORG for providing and operating a platform for the community is.
What drives the actions and plans to win back active and engaged members?
Why does SAP-Community-ORG want to host the community platform?

I don’t know or feel their goals at all.

Don’t rush for answers

Now, before this question gets addressed with answers, let me point out that at this point I am not looking for quick and easy answers, but that there simply does not seem to be any explicit statement that explains to me and everyone else who might be interested in this community, what SAP-Community-ORG stands for.
“About SAP Community” does not provide this but is rather the digital version of a product leaflet (“look feature A, B, C…“, “this is what we have, this is how we do“).

I believe that the lack of this core “Why Community” is pretty much at the heart of all current problems.
The reason for me to believe this is how I think I pick the communities I am active in or which ones to not engage with.
I very much value knowledge sharing and exchange.
I have profited a lot of other people’s knowledge sharing in my life so far and I want to “give back” and contribute myself.
I learned and improved on public writing, about the different kind of people you find in forums and to try and see what someone like to achieve first instead of handing out a copy-from-the-manual-answer to a question.
I like the recognition I receive and feel very excited when I find out about things that I did not know and that I find interesting. These questions take me away from my normal business as usual problems and give me the chance to learn a lot more about the technology that I spent most of my waking hours with and on.
I also like to see how other members get similar benefits and sometimes develop from “mostly receiving/reading” to “active contributors”. That is probably one of the most motivating aspects for me: helping others to have positive experiences with communication in a forum.

When I find that these are the things that I can get from a community, that gets me interested.

Much noise, little signal

For several years now I have been active in different forums (SCN, stackoverflow, SAP internal JAM site around HANA) and became dormant in others when I felt that the forums did not follow the same goals and did not share the same values as I do.
In the case of SCN, I have been considering for the last two years what I get out of it and how many of my own values I still find a match.
Coming up with an answer for this question turned out to be difficult. There was so much noise due to the technical platform change, the bugs and the aftermath of all of this, that the usual signals for the values were and are very hard to see.
The few “signals” I find (a big shout out to the active moderators and the few remaining non-corporate contributors in the HANA-tag space!) are washed over by a wave of noise.
That frustrates and disappoints me and is the major reason for me for reducing my efforts in SCN and rather spending time elsewhere.

I cannot know that this is the same or similar for most members, but I believe many will have values and reasons to actually be active in the community beyond “what is the answer to my question?“. The latter can be solved be googling long enough, but no amount of search engine usage can provide any of the outcomes I described before.

Lost values

It is pretty clear that the technical failures annoyed and keep on annoying the users of the platform. But I don’t think that this is the reason for anyone who was interested in active participation to turn around and leave for good.
There are many comments where the author mentioned to have returned after a hiatus of 1, 3 or 6 months to see if the “toothing pains” were gone.
What I think, made them turn away again, was the impression that not only had the feedback not lead to enough improvement and that they could not find their own values in this community anymore. It is maybe just hidden behind too much noise, but the effect is the same as if the values are not there in the first place.

It looks, tastes and feels like these values are gone.

This lack of values, the “Why” of the community, is what prevents people to engage and to feel that this is a place where their own beliefs and values are shared.

Come with your vision and your values

What I see so far in terms of “community recovery” is very much focussed on organisational matters and in reducing the technical noise.

But what is needed, I believe, is that SAP-Community-ORG finds their own values and makes those unmistakable clear.
These values and the motivation behind SAP-Community-ORG need to lead the decisions made for the community and its platform.

Don’t hand out merchandise and gimmicks (everyone will take it, sure) and expect that this can buy engaged members.
Come with the vision and the values for what this community should be in your eyes instead.
That would be something one can relate to – or not. But without it, there is just nothing behind a lot of noise.

HANA 2 and two riddle errors

HANA 2 has been out for a while now – even SP01 of it is available already – so I thought it is about time to get more familiar with it.
Using my trusty “skullbox” (HANA in a pocket… ) I installed HANA 2 and realised that I now also will have to have a HANA Cockpit 2.0 installation, to be fully able to use all of the glorious new features.

Downloading the HANA Cockpit 2.0 package is a breeze as long as you do it via a proper broadband connection.
The installation archive for the database administration tool just takes about 2.5 GB (sic!).

> ll -h
-rw-r--r-- 1 lars sdba 2.5G May 12 19:58 SAPHANACOCKPIT02P_1-70002299.SAR

Compare this with, say, the current OpenSuse Leap Linux distribution (4.7 GB) and one cannot help having certain expectations around usability, the range of features and general awesomeness.
BTW: OpenSuse can be downloaded via the torrent network – given the size of the installation binaries for typical SAP application, that would be nice to have for the SAP Service Marketplace downloads, too.
Instead, in 2017, we still have to use the “Download Manager” tool, that sequentially downloads files from one server at a time.

Once the archive is on my skullbox, the interesting part begins.
Using the graphical installer “hdblcmgui” I triggered off the installation.
Being a “consumer” type of computer user as long as the situation allows, I simply accepted all the default values concerning file paths, instance numbers or names.

Click next, click next

And off the installation went, pushing a full blown HANA 2 instance onto my machine.
“WHAT?” I hear you say… which I have to respond to with “That’s right, the HANA Cockpit 2.0″ comes with its very own HANA 2 installation.”
That approach probably makes sense if one is managing many, many instances across your organisation in a centralised fashion (yep, that’s the old business user – IT – DBA/Admin way of thinking), but in times where “DevOps” is THE thing and development teams look after their “own” DBs themselves, this looks like bloat to me.
If anything, this sure will tick up the counter for productive HANA instances…

Surprisingly, the installation of HANA 2 went rather quickly. I suspect that having an SSD in your system really pays off for this sort of workload.
When I was just on the homestretch to successful-installation-celebration, a dialogue box announced:
Installation failed – Error occurred while executing SAP HANA Cockpit Components.”

Fortunately, the hdblcmgui tool makes it really easy to look at the log files in order to find out what was the cause for the error.
Kudos to the development colleagues for that!

This is what I found in the error log file:
[...]
18:02:19.875 - INFO: Output line 89: [26] --- Installation step: Set the HANA broker default DB ---
18:02:19.875 - INFO: Output line 90: [26] Set the default mapping to : '594863bc-d5f5-0b77-e100-00007f000002'(SYSTEMDB)
18:02:19.933 - INFO: Output line 91: [26] Map org: space: to DB: 594863bc-d5f5-0b77-e100-00007f000002
18:02:19.985 - INFO: Output line 92: [27] --- Installation step: Audit-Log Service (DB) ---
18:02:19.994 - INFO: Output line 93: [27] Creating application auditlog-db
18:02:19.994 - INFO: Output line 94: [27] Creating service auditlog-db-container
18:02:23.161 - INFO: Output line 95: [27] Uploading files for application auditlog-db
18:02:23.417 - INFO: Output line 96: [27] Staging application auditlog-db
18:02:24.454 - INFO: Output line 97: [27] 6/20/17 6:02:23.601 PM [STG/1] ERR /bin/sh: mc: line 1: syntax error: unexpected end of file
18:02:24.454 - INFO: Output line 98: [27] 6/20/17 6:02:23.601 PM [STG/1] ERR /bin/sh: error importing function definition for `mc'
18:02:24.454 - INFO: Output line 99: [27] 6/20/17 6:02:23.604 PM [STG/1] ERR bash: mc: line 1: syntax error: unexpected end of file
18:02:24.454 - INFO: Output line 100: [27] 6/20/17 6:02:23.604 PM [STG/1] ERR bash: error importing function definition for `mc'
18:02:24.454 - INFO: Output line 101: [27] 6/20/17 6:02:24.160 PM [STG/1] ERR bash: mc: line 1: syntax error: unexpected end of file
18:02:24.455 - INFO: Output line 102: [27] 6/20/17 6:02:24.161 PM [STG/1] ERR bash: error importing function definition for `mc'
[...]

First off: why is the error message (indicated by ERR) marked as a non-error INFO output line?
That is unnecessarily confusing.

Then looking at the error messages really didn’t help all that much:

/bin/sh: mc: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `mc'

bash: mc: line 1: syntax error: unexpected end of file
bash: error importing function definition for `mc'

“What is function ‘mc’?” you might ask and so did I.
Turns out, many moons ago, when I first installed my NUC system, I opted for some convenience in the dire straits of the Linux terminal console and installed Midnight Commander (a fantastic tool, everyone should have that!).
MidnightCommander’s program file happens to be called ‘mc‘… so, this seems to be a hot lead.

Unfortunately, I’m not the super-duper-Linux-expert, which is why I haven’t been aware of the way that Linux shells technically implement their startup.
Why would I need to know this? Well, after some lengthy research, I found that the error message was caused by a shell initialisation script stored in system folder

/etc/profile.d

Without going into too many details here (look here instead), there were two symbolic links (mc.sh and mc.csh) in this folder, pointing to files from the MidnightCommander installation.

After deleting those links (you can always recreate them via `ln /usr/share/mc/mc.sh /etc/profile.d/mc.sh` and `ln /usr/share/mc/mc.csh /etc/profile.d/mc.csh`), I thought the problem was fixed and I could just continue the installation.
For some reason though, this does not seem to be possible. Instead, I uninstalled the whole half-installed HANA Cockpit setup and started from scratch.

This time around, no script failed due to the “syntax error” and the “unexpected end of file”.

Awesome, until…

Which would have been awesome, if the installation would not have run into a different error right after that:

[...]
18:22:34.688 - INFO: Output line 156: [31] [DEPLOY_SERVICE] Getting multi-target apps in org "HANACockpit" / space "SAP" as COCKPIT_ADMIN...
18:22:34.689 - INFO: Output line 157: [31] [DEPLOY_SERVICE] No multi-target apps found
18:22:34.689 - INFO: Output line 158: [31] [DEPLOY_SERVICE] 
18:22:34.697 - INFO: Output line 159: [31] Checking for existing service lcm-view-grantor
18:22:34.713 - INFO: Output line 160: [31] Creating service lcm-view-grantor
18:22:35.947 - INFO: Output line 161: [31] Checking for existing service lm-service-credentials
18:22:35.960 - INFO: Output line 162: [31] Creating service lm-service-credentials
18:22:36.024 - INFO: Output line 163: [31] Deploying application Product Installer
18:22:36.281 - INFO: Output line 164: [31] [DEPLOY_SERVICE] 
18:22:36.707 - INFO: Output line 165: [31] [DEPLOY_SERVICE] Uploading 1 files:
18:22:36.707 - INFO: Output line 166: [31] [DEPLOY_SERVICE] /hana/shared/H4C/xs/installdata/apps/product-installer/product-installer.mtar
18:22:38.243 - INFO: Output line 167: ERR Failed to upload files
18:22:38.243 - INFO: Output line 168: ERR Error occured during communication with HTTP server
18:22:38.244 - INFO: Output line 169: ERR Broken pipe (Write failed) (local port 63255 to address 127.0.0.1 (localhost), remote host unknown)
18:22:38.255 - INFO: Output line 170: [31] Removing service lcm-view-grantor
18:22:38.258 - INFO: Output line 171: [31] Service binding lcm-view-grantor not found
18:22:38.312 - INFO: Output line 172: [31] ERROR: 
18:22:38.312 - INFO: Output line 173: [31] com.sap.xs2rt.installation.util.InstallationException: 
Installing the Product Installer application failed: 
com.sap.cloud.lm.sl.slp.client.communication.CommunicationException: 
Error occured during communication with HTTP server
18:22:38.312 - INFO: Output line 174: [31] at 
[...]

Once again, error messages are labelled as non-error INFO lines, but good to know each timestamp to the millisecond…

This time around, the DEPLOY_SERVICE complains about a “Broken pipe” when writing to a port on localhost, claiming the “remote host is unknown“.
“How odd!” I thought, because “localhost” is not “remote” at all and not “unknown” at all.

A quick ping localhost proved that “localhost” was indeed known and reachable, so what could be the problem?

The key hint – which I was not able to pick up, instead I found a side comment in one of few open customer incidents – is that this is not about communication via UDP or TCP, but via HTTP.
HTTP does use cookies for a variety of purposes.
Once again, I am not an expert of HTTP standards, but intensive googling led me to these pages:

Count your dots!

The insight from above web pages was that proper cookie domains have to have at least 2 (two!) dots/periods in them.
That’s right, a fully-qualified-hostname like `skullbox.cat5` is not suitable for proper cookies.

> hostname -f
skullbox.cat5

Alright, so I changed the domain name of my system (that has been running HANA and other software in different versions for over a year now and never ever had an issue with the domain name, but hey…) to `skullbox.lab.cat5`.
See?


That’s two dots in that name now:

> hostname -f
skullbox.lab.cat5

To be frank, this error is the one that really frustrated me.

Nowhere in the current or past HANA documentation does it say:
“Make sure to have two dots in your fully-qualified-hostname or we’ll throw random cryptic error messages at you.”

Even with the knowledge of what went wrong, it was a task to find any SAP documentation in relation to this.
The closest I found are

“In accordance with the Netscape cookie specification, cookies can be set for one domain only, and a domain must contain two or three dots (.) due to security restrictions.
Each of the seven top level domains (.COM,.EDU,.NET,.ORG,.GOV,.MIL,.INT) must contain at least one further domain component (usually the name of the company or organization), amounting to two dots.
Each domain with a different ending (this includes the top level domains for countries, such as UK, DE, FR, and so on) must consist of two further domain components, that is, these domains must contain at least three dots. For more information see the cookie specification.

Tip
Examples of valid domains:

<host>.sap.com Top level domain -> two domain components
<host>.portal.sap.de No top level domain -> three domain components”

Rounding off

After another round of reboot and uninstallation of the broken HANA Cockpit setup, hdblcmgui finally went through all the steps, finishing with “green lights”.

So, here we are. 157 lines into the installation of a database monitoring and administration tool that apparently is required to use HANA 2.

Given the information available (before I wrote this blog post), it took me several weeks to get it up and running and nearly two days straight when I finally decided to get to the bottom of the errors and finally have a proper HANA 2 setup.
I cannot say that I enjoyed the experience or felt that it was time well spent.
Hopefully, having these two nasty problems documented and searchable saves others some time.
There you go, now you know!

Eco-friendly slide recycling with HANA

Hey there!

I recently presented a short session on my experiences with developing and debugging non-classic-SAP HANA solutions at the Leading Insights conference in Melbourne (27./28.3.2017).

And since the slides are prepared in a way that they could be used without my talking as well, I thought, I might just post them. So, there they are on slideshare:

if you feel this is a shameless content plug, you’re probably right, but I got the feeling, that this might still be interesting to some readers of the “SAP HANA” tag.

Content-wise, this is what you can expect (just in case you don’t want to swap to page 2 which shows the table of contents):

  • an exciting example for a pure HANA development without any NetWeaver/ERP background
  • some practical tips for finding performance issues with SQL statements
  • and tidbits of performance gone bad and how to fix it.

There you go; now you know.

Cheers,
Lars

Make query hints great again…

In case you ever got frustrated by how 80’s like the SAP HANA hints infrastructure looks like, go and support my improvement idea on SAP Idea place: https://ideas.sap.com/D40094 

Go and upvote! Now! 🙂

[Update 23.1.2017]: After some of the folks that I really liked to review my idea reported that they don’t have access to it, I decided to put the idea text as a copy in here as well.
So here it is:


Currently (HANA 1 SPS12/HANA 2 SP0) SAP HANA supports the usage of query hints via information view parameters or – most commonly used – the WITH HINT query parameter.

Most of the times, the hints are used to avoid a specific bug in SAP HANA and are only applicable to a specific version/revision or application state.

For all practical matters, the hints don’t serve a functional purpose for any application but are necessary band-aids to enable the application to run successfully on SAP HANA.

As many bugs do get fixed with later SPS/revisions, the hints could and should be reviewed in order to allow the application leverage the formerly disabled feature again.

Right now, this means, that a developer/system operator has to manually document and keep track of all hints used in the system, the reason for the usage, the HANA revision the hint applies, maybe a incident numbers where the hint usage had been discussed and probably a target HANA version when the hint shouldn’t be required anymore.

This task is obviously not delivering a lot of value for any developer, so in reality, a hint, once introduced, will likely stay in place forever, as nobody remembers what it was used for and if it’s safe to remove it now.

The proposal is as follows:

Instead of a simple WITH HINT syntax, allow for a more comprehensive hint management that allows for identifying a specific statement (e.g. provide a custom statement identifier in a statement annotation or comment) and provide HANA function to specify a set of hints for any statement ID, including the already mentioned metadata.

Upon SQL query compilation SAP HANA would automatically match the statement ID against the maintained statement hints (this could be a single table structure) and include the hint into the statement execution.

The fact that this happened (and against which statement id hint setting this match had been performed) should be visible in the usual query execution analysis tools like EXPLAIN PLAN or PLANVIZ.

The proposed solution has the following benefits:

Application code does not need to be modified to add hints, as long as the statements can be uniquely identified.

The hint application can be controlled centrally, without the involvement of the original developer – essentially the hint can be applied

by

  • operations/support staff
  • The query compiler in HANA can automatically check for the valid HANA versions and ignore the hint for non-matching versions
  • The hints for a given installation can be transported like a configuration data set across multiple servers
  • The central list of statements with hints settings allows an efficient follow-up of HANA issues and – given an option to ignore the hint list during plan compilation – also allows for very efficient testing of whether a hint is still required or not.

I haven’t found an obvious way to contact support for the SAP Ideas Place, so my best bet is that in order to view any idea a registration is required.

  • Lars

Merging right. Some tests with data loading in HANA from Down Under.

Based on SAP HANA SPS 12 (HCP trial)

When Jody Hesch recently send me an email with a question about HANA, he received what everyone receives when sending email questions about HANA. The reply is a version of

“Usually I don’t do email Q&A as this simply doesn’t help with knowledge sharing.

Instead, I advise everyone to post the question in one of the HANA related forums
(SAP Community https://answers.sap.com/questions/metadata/23925/sap-hana.html ,
JAM https://jam4.sapjam.com/groups/about_page/6UHzR2Fxra4quFAbACtxFD
or even stackoverflow http://stackoverflow.com/questions/tagged/hana ) so that the question and its answers are search- and findable.

That way everyone can benefit from this and you even might get faster and/or better answers than from just writing to me.”

I have not given up on my personal fight against GEIGOKAI and this it what it looks like in daily real-life.

RUDE!

You might be inclined to call this reply rude but it turns out that most people accept this very easily and manage to copy&paste their question into one of the forums – and changing culture is always a bit painful … drip drip drip

And just as many others, Jody managed to post his question for all to see, read and discuss here: Manually managing Delta Merge

While I was doing the due diligence of checking my own ideas against a HCP trial HANA instance (SPS 12.04), the whole explanation and the example got larger than what would be nice for a simple answer, which is why you now read this is a blog post format.

On loading data and merging

Ok, let’s get to it, then.
Jody’s question was “how can manually handling delta merge make a data load faster?”
To answer this I set up a small test case:

drop table merge_test;
CREATE COLUMN TABLE  "MERGE_TEST" ("ID" INTEGER  ,
     "AAA" NVARCHAR(20),
     "BBB" INTEGER ,
     PRIMARY KEY ("ID"));

do
begin
declare ts_start timestamp = current_timestamp;
declare ts_end timestamp;
declare loops integer = 100;
declare j integer;
declare i integer;
declare with_commit integer = 1;

    truncate table merge_test;
    
    for j in 0 .. :loops do
        for i in 0 .. :loops*50 do
            
            upsert merge_test 
                values (:i, :i || ' - ' || :j, :i*:j)
                with primary key;

        end for;
        
        if with_commit = 1 then
            commit;
        end if;
        
    end for;

    ts_end = current_timestamp;    
 
    -- now show some infos
    select seconds_between(:ts_start, :ts_end) as duration from dummy;
    
    select count(*) as num_rows, min(id), max(id) from merge_test;
    
    select column_name, memory_size_in_main, memory_size_in_delta, count, distinct_count
    from m_cs_all_columns 
    where table_name ='MERGE_TEST'
    and schema_name =current_user;
    
    select merge_count, read_count, write_count, last_merge_time
    from m_cs_tables
    where table_name ='MERGE_TEST'
    and schema_name =current_user;
    
    select start_time, motivation, execution_time, memory_merge, merged_delta_records, success
    from m_delta_merge_statistics
    where table_name='MERGE_TEST'
    and start_time >= :ts_start
    and schema_name = current_user;
    
end;
 

What we have here is a simple table and a data loading simulation. The data gets inserted or updated via the UPSERT command and, if the parameter with_commit = 1 a commit is done after each iteration of the outer loop.
After both loops have been left, the code prints the runtime in seconds and selects some statistics from system views.

Next, I ran three experiments.

The first two used the SAP HANA AUTOMERGE feature, where HANA decides when to perform a delta merge.
The third one followed the approach SAP BW uses for data loading: the AUTOMERGE is disabled and the target tables get merged after all data has been loaded.

No commit, AUTOMERGE ON

 

/*
DURATION
386     

NUM_ROWS    MIN(ID) MAX(ID)
5001        0       5000   

COLUMN_NAME MEMORY_SIZE_IN_MAIN MEMORY_SIZE_IN_DELTA    COUNT   DISTINCT_COUNT
ID          1064                3733044                 505101  5001          
AAA         0                   17607386                505101  -1            
BBB         0                   5571068                 505101  -1            
$trex_udiv$ 232                 8889016                 0       0             
$rowid$     768                 15676428                505101  505101        

MERGE_COUNT READ_COUNT  WRITE_COUNT LAST_MERGE_TIME        
0           7           505101      2016-12-20 03:15:41.986

START_TIME  MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS
*/

Easy to see, the whole thing ran for 386 seconds and managed to create a total of 505.101 record versions in the delta store.
Note that during this load NO delta merge had happened. This is because we never committed the transaction at all and the delta merge won’t “relocate” open update transactions to a new delta store.

Commit, AUTOMERGE ON

/*
DURATION
349     

NUM_ROWS    MIN(ID) MAX(ID)
5001        0       5000   

COLUMN_NAME MEMORY_SIZE_IN_MAIN MEMORY_SIZE_IN_DELTA    COUNT   DISTINCT_COUNT
ID          723716              9132                    5001    5001          
AAA         741076              10052                   5001    5001          
BBB         711828              9132                    5001    5001          
$trex_udiv$ 273432              1912                    0       0             
$rowid$     1504                9132                    5001    5001          

MERGE_COUNT READ_COUNT  WRITE_COUNT LAST_MERGE_TIME       
5           30          505111      2016-12-20 03:40:22.85

START_TIME              MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS
2016-12-20 03:36:22.166 AUTO        54              FALSE           80016                   TRUE   
2016-12-20 03:37:22.24  AUTO        88              FALSE           90018                   TRUE   
2016-12-20 03:38:22.349 AUTO        119             FALSE           85017                   TRUE   
2016-12-20 03:39:22.49  AUTO        157             FALSE           85017                   TRUE   
2016-12-20 03:40:22.67  AUTO        186             FALSE           85017                   TRUE   
*/   

Here we see that with commits in between the whole thing only took 349 seconds.
As we’ve committed the transaction in between a couple of times, the automerge had a chance to be active – and did so roughly every minute (5 times).

Now, why is this variant a bit faster overall? The effort to find the currently active record versions grows with the number of versions in the delta store.
With the automerge in between, this number stayed consistently below 100.000 versions, while the first version had to face an increasing number of versions to check between round 100.000 and 505.101.

Commit, AUTOMERGE OFF

Finally the version of data loading that is similar to SAP BW’s approach.
I disabled the automerge via

alter table merge_test disable automerge;

and ran the test again:

/*
DURATION
325     

NUM_ROWS    MIN(ID) MAX(ID)
5001        0       5000   

COLUMN_NAME MEMORY_SIZE_IN_MAIN MEMORY_SIZE_IN_DELTA    COUNT   DISTINCT_COUNT
ID          1064                3733044                 505101  5001          
AAA         0                   17607386                505101  -1            
BBB         0                   5571068                 505101  -1            
$trex_udiv$ 232                 8810536                 0       0             
$rowid$     768                 15676428                505101  505101        

MERGE_COUNT READ_COUNT  WRITE_COUNT LAST_MERGE_TIME        
0           15          505101      2016-12-20 03:49:36.914

START_TIME  MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS

*/

merge delta of merge_test;

/*
START_TIME              MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS
2016-12-20 03:56:09.435 HARD        46              FALSE           5001                    TRUE   
*/

With 325 seconds this is the fastest run.
Looking at the time the manual delta merge took, we find that it’s still faster than the last delta merge of the automerge example.

This is – again – due to the fact that now the are a lot less concurrent record versions in the table.
Note, MERGED_DELTA_RECORDS shows the number of valid (committed) records that have been moved from delta to main storage.
If I update the same record a hundred times before I commit, then I would only get one record merged into the main store.

Now, why do we see a better performance in this case?

To illustrate the resource usage during our loads, see the following diagrams:

In our first case, we see that the memory usage for the delta store as well as the CPU usage for the UPSERT gets larger over time.
Sometime after the last commit is done, an automerge gets triggered, leading to additional CPU and a lot more memory usage.
The result is a relatively well-compressed table (last little blue square).

Note that the number of uncomitted record versions (little orange squares) increases during the whole time.


With the second approach (commits in between and automerge on), we see that CPU and memory usage during the load stay moderate. Only when the merge is performed, memory and CPU usage increase and the whole process has to wait for the merge to switch over to the delta2 in order to continue.

Noteworthy here is that the resulting table can easily be larger than the table sizes produced by the other methods. This is because the column compression algorithms are determined during the first delta merge and won’t change that easily.
This can lead to a situation where the compression algorithms are determined on a subset of data that is not representative of the total (think of loading data with timestamps or dates in sequential order – the first merge may only see one specific date).

With the last option, we see a very similar pattern than in the first case.
The only difference here is that the final merge was manually triggered.

So with AUTOMERGE disabled and regular commits we get the best of everything:

  • no waits due to delta merges
  • no increased runtimes due to a high number of outdated uncommitted record versions
  • well-compressed table (since the compression optimisation can work on the whole data set).

This works well, as long as the resulting delta merge can be done within the available memory.
And even if other memory structures would need to be unloaded to allow for the delta merge to happen, then this would have to happen only once and not several times during the loading process.

Smarter merge

I mentioned that this third approach is similar to what SAP BW does but this is only half the truth.
Instead of manually sending a

merge delta of merge_test;

which gives the hard instruction to perform a delta merge now, SAP BW uses a “smart merge”.
The idea here is that instead of forcing a delta merge to let HANA evaluate whether a merge would be granted for, given the current memory statistics and given the merge decision functions.

With this flexible approach, the application does not need to care about the specifics of how to decide when a merge should take place, but can yet trigger HANA to take care about that.

TL;DR

Manually controlling the delta merge can improve performance for the load process as well as reduce the overall system load during the data loading, since multiple merges are avoided. Delta merges still are extremely resource intensive operations that can have a massive impact to the whole HANA system. Therefore it’s a good idea to have them done as seldom as possible.
This, in combination with a controlled approached to COMMITTING loaded data can, as shown above, have quite an effect to the runtime of the data loading.

There you go, now you know.
Lars

I’m done (apparently) – see y’all

SCN is coming to an end as we all know and apparently I managed to get through early.

All missions completed and highest level reached.

Now, that leaves a feeling of achievement, even early in the week 😀

screen_shot_2016_10_04_at_08_36_25_1047739

Having this big achievement under my belt, I am a bit disappointed.

No final fight, no end boss, no grand finale! 🙁

Guys, if that’s “gamification” for you, then please, here’s some examples on how the cool stuff looked like:

Or, for the less arcade-inclined players:

Even TGS had it a bit more glamorous… (although not a lot more)

Anyhow, maybe the SCN team can still come up with an epic end rememberable end of service webpage.

I’d like that!

But I assume, it will just be a redirect to the new community or to here.

Cheerio!

 

HANA in a pocket, a skull and some dirty hands on Linux

DISCLAIMER: this blog describes unsupported and unlicensed use of the SAP HANA software. I describe my personal try-out of a hardware/software combination. It’s not in any way meant to be used productively or recommended to use the software in the way I describe.

If you loose data, waste life-time, money, your good mood or damage anything while trying to do what I describe: that’s all on you and there’s no guarantee, warranty or liability whatsoever from SAP or myself.

The frustrating reality of ‘big machine computing’

Just like everyone else in the SAP HANA community, I really like having a private instance available for me.

Trying things out, practicing, researching solutions, working with sample data, checking system behaviour etc. are just so much easier than e.g. on a HCP instance or some “playground” instance on the company’s SAP HANA hardware.

Yes, there are always ways to work around this but e.g. for me this means I have to live with average round trip times of ca. 320ms. That’s certainly not too bad and definitively works for most use cases. Unfortunately the server is not “mine”.

So, if I e.g. wanted to reboot the server for any reason, well, folks probably wouldn’t be too happy …

Also, this server really is only available when I can logon into our corporate network in one way or another. And this turns out to be a limitation that is easily hit.

Customer visit with no immediate WIFI logon?

Nope, no way to access the SAP HANA system then.

Cable provider outage in the home office?

Yep, that SAP HANA server is running fine, far away – without any chance to be used by me.

In short a somewhat local SAP HANA installation would be a really nice thing. Especially since I am not permanently working with tons of data, but rather look into the development/admin aspect of things.

Is there a way or what?

A few weeks ago, I was on a customer visit in San Francisco and the development team there came up with a valid and interesting question: how to provision developers with private SAP HANA instances?

With a private instance, each developer could mess around with his system without impacting anyone else. This is pretty much a standard approach for development nowadays, so the request is rather reasonable.

SAP internally there exist multiple approaches to this problem.

From “small” SAP HANA hardware under the developer’s desk (actually quite expensive machines, that we would have called “workstations” back in the day) to spawn-on-demand-systems on virtual servers: there’s a flavor for everyone.

Unfortunately all that sort of screams: high initial investment.

Not great for smaller teams or individuals who would want to get deeper into SAP HANA development/administration without access to “allowed” hardware.

Coincidentally I read Mr. Atwood’s blog post and was intrigued: this Intel Skull Canyon NUC sounded awesome.

Even though I am not a gamer and the last time I assembled a PC myself was somewhat around 2000 this looked really interesting to me.

The system runs on an Intel i7 quad-core processor and allows up to 32GB of RAM.

That’s certainly enough to run and operate a small SAP HANA instance.

Here’s a picture of this nice piece of technology:

2016_07_04_14_56_37_988523

Cutting the story short

I ordered that thing online together with two 16 GB DDR4 strips and an old/slow but relatively cheap IBM SSD with 480GB capacity.

All in all the bill for this was around 1400 Australian Dollars – not too bad for a system way more powerful than my MacBook Pro.

Of course: this doesn’t come with any display or input device (well, in fact the online shop I ordered from had a special and chipped in a Logitech K400 plus for free 😉 . You have to have a monitor to use this thing.

 

A few days later the whole pack arrived and I was really surprised by the weight of it: easily below one kilogram.

I’ll skip the whole un-boxing and assembly part, but will tell you: it’s easy.

The RAM modules just need to be clicked into place and the SSD installation requires to loosen and tighten a single screw. Definitively doable for a hardware n00b like myself.

The more complicated part – again: for me, others might just do this without blinking – was to put an operating system onto the new machine.

I didn’t buy a Windows license, since the whole purpose of this buy was to eventually run SAP HANA on it, so I went for OpenSuse Linux (Leap Distribution). Creating a boot-able USB thumb drive is relatively easy.

The only changes to this tutorial I did was

  • using ‘pv‘ to get an idea on the progress of the data transfer (installed via ‘brew‘) – yes, I used a Mac for this.
  • using a blocksize of 1M instead of the 4k mentioned in the tutorial.

 

The whole command set then looked like this:

sudo zsh

pv /path/to/downloaded.iso | dd of=/dev/rdisk2 bs=1M

 

Armed with this USB thumb drive stuck into one of the 4 USB ports, I simply started my “skullbox” and the OpenSuse installation screen appeared.

Again, I’m not the Linux expert on the installation side of things, so I basically when with most of the default settings and clicked through.

One reboot after that, I had OpenSuse running on the new machine.

Meanwhile in the kitchen

As TV chefs would say “… meanwhile I prepared something else …” which translates here into the download of the SAP HANA installation files from SAP Service Marketplace (Yes, you do need to have paid access to this).

 

The installation files come in a big multi-part RAR archive with Windows Auto-Extraction, which means, that the first part is an .EXE file.Unpacking on Linux however is not a problem; all you need is the UNRAR tool (if all fails ‘zypper install http://download.opensuse.org/repositories/openSUSE:/Leap:/42.1:/NonFree/standard/x86_64/unrar-5.3.3-1.3.x86_64.rpm should fetch this).

 

The rest of the installation is fairly well documented, and only small additional changes were required to make the installation work on this NOT SUPPORTED operating system:

 

a) PNG Library couldn’t be loaded, and the hdblcmgui woudln’t start.

Fixed by installing an older version via zypper install –oldpackage libpng12-0′

 

b) nameserver couldn’t be started due to missing SSL libs.The trace file listed

Can’t load ‘/playfield/hana_inst/51050838/DATA_UNITS/HDB_SERVER_LINUX_X86_64/instruntime/SSLeay.so’ for module Net::SSLeay: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory at DynaLoader.pm line 193.”

 

Fixed by ‘zypper install libssl*’ and

ln -s /lib64/libcrypto.so.1.0.0 /usr/libcrypto.so.0.9.8

ln -s /lib64/libssl.so.1.0.0 /usr/lib64/libssl.so.0.9.8

 

(as seen here)After that the installation went through without further troubles.

The looooong fork

I have enabled the option for remote administration in OpenSuse, which means that the system now can be accessed via VNC.

For my Windows 10 based work laptop I find that MobaXTerm makes it really simple to access the Linux box – it even has a plugin that contains a VNC client.

For the Mac, there is actually a VNC client build in (I’m using El Capitan): just start “Screen Sharing” or put vnc://hostname:port into the Safari address bar and off you go…

 

After all of this (and some more wrestling with the Linux network setup – I still barely have a clue on that, but it seems to work now…)

I can now access my “HANA skullbox” via:

  • WIFI connection at homeoffice
  • drect CAT5 cable connection between my work laptop and the NUC

which are exactly the scenarios I wanted to support.

Starting a HANA Studio via X-Server works too:

2016_07_04_16_29_01_988524

But that’s neither fast nor pretty.

For working with the system I prefer using the local HANA Studio or the Web UI.

Of course I used ‘memory allocation limit’, but to have some fun, I cranked it up to the total 32 GB of installed RAM; so far no functional problems with this 😉

All of this setup happened after hours over maybe three evenings. Someone with more Linux know-how would probably be done in way less time.

So far I’m quite happy with this setup and I’m eager to try out what things work and where the hard limit of usability for this micro-HANA lies.

OK, that’s it for now.

AGAIN: this is a completely unsupported setup. No support, warranty, maintenance or endorsement is provided by SAP or myself for this.

There you go – now you know!

Lars

 

SAP HANA community continues to grow and so does the moderator team

Hello HANA community!

As of last week, we are happy to have a new moderator in our team: Michael Healy

Michael lives and works in Dublin, Ireland and works in the SAP Digital Business Services group helping customers around the globe with their SAP HANA implementations.

When he’s not solving support incidents he is busy writing well liked documents on SAP HANA troubleshooting, such as

Troubleshooting SAP HANA Authorisation issues

Troubleshooting SAP HANA Delivery Units and HANA Live Packages issues. (HALM).

Troubleshooting Hanging Situations in HANA

Troubleshooting SAP HANA High CPU Utilisation

And if there’s still time left he’s also answering question in the forum.

What an engagement! 😎

His efforts had been recognized with the Topic Leader award 2015/2016 in the SAP HANA and In-Memory Computing space.

All this was not enough for him and he seemed quite happy when I asked him if he would consider joining the moderator’s team – lucky us!

Here’s the whole team at a glimpse:

Vancouver, CAN Dublin, IRE Passau, GER Melbourne, AUS
Lucas Oliveira Michael Healy Florian Pfeffer Lars Breddemann

For the more visually inclined readers, here’s a nice and colorful map view (click for the website):

2016_06_27_12_02_44_983326

All in all the moderator team now spans all time zones and covers most areas of SAP HANA technology. It also consists of Top Contributors, SAP Press Authors and SAP HANA Distinguished Engineers (no SAP Mentors yet, but who knows what the future holds… ), so I’d say the space is in a good position to enable informed and interesting discussions and knowledge exchange.

From my end of the world (pin “C” on the map 🙂 ) thanks to Lucas, Michael and Florian for their great work!

Cheers from Down Under,

Lars

 

On multiple mistakes with IN conditions

Based on SAP HANA SPS 11

Update 1.6.2017

Retesting the APPLY_FILTER() approach on rev. 112.07 showed that it is now possible to simply hand APPLY_FILTER() an IN condition with a list of integers.
This approach for the workaround of the general limitation on multiple input parameter values for scripted calculation views is also shown in SAP note  “2315085 – Query with Multi-Value Parameter on Scripted Calculation View Fails with Incorrect Syntax Error.

Therefore, if you are using a current version of SAP HANA and still develop/use scripted calculation views, it is not recommended to use the CE-function approach I explained in this blog post. The limitation that this approach worked around has been fixed in SAP HANA.


Dear readers

there is a long-standing modelling problem with SAP HANA calculation views:

Using multiple input parameters to filter data similar to the SQL IN predicate.

This discussion Handling multi value input parameters can be taken as a comprehensive example.

It seems so straight forward at first and so practical.

Once the input parameter is defined the data preview tool built into SAP HANA Studio or your reporting client of choice can read the meta data for it and present the user with a nice UI dialog to specify values.

Something as fancy as this:

input_par_967338

Now, the way that this works is rather counter intuitive.

For graphical calculation views, there are a couple of nicely written blog posts available, like Using Multiple Values in Input parameter for filtering in Graphical Calculation View but it seems that scripted calculation views did simply not want to be as flexible.

For those, rather clunky (and not very well performing) solutions had to be built to make it possible at all, (see SAP HANA: Handling Dynamic Select Column List and Multiple values in input parameter or How to process and use multi-value input parameter in a scripted view in HANA)

Either the solution involved dynamic SQL or some form of parameter string mangling with loops and pseudo-dynamic temporary result set constructs.

Other approaches proposed to avoid the problem altogether and use multiple parameters (instead of one multi-valued parameter).

Developer arrogance driving solution finding…

The last time I read one of those discussions (yesterday) I thought:

 

“This cannot be the right solution. There must be some easier way to do it!”

So arrogance got the better of me – HA! It cannot be that difficult. (It’s so cheesy that for once Comic Sans is a fitting choice).

I dare to guess that nearly every developer had that feeling every now and then (if not, I would have a hard time finding a good explanation for so many drastically underestimated development efforts…)

Attacking the problem

My first impulse was to use the APPLY_FILTER() function, but I soon learned what many others probably discovered before: it doesn’t solve the problem.

The reason for that is the way APPLY_FILTER() works.

It takes the table variable and your filter string and constructs a new SQL statement.

For example, if your table variable is called vfact and your input parameter selection was 1, 2 and 5 your scripted calculation view could look like this:

/********* Begin Procedure Script ************/
BEGIN
  vfact = select * from fact;
  declare vfiltD10 nvarchar(50); -- this is a temp variable to construct the filter condition
  vfiltD10 = ' "DIM10" IN ( ' || :IP_DIM10 || ' )';
  var_out = APPLY_FILTER (:vfact, :vfiltD10);
END /********* End Procedure Script ************/

This compiles fine and if you try to run it with some parameters you are greeted with a surprise:

SELECT
     "DIM10", "DIM100", "DIM1000", "DIM1000000",
     "KF1", "KF2"
FROM "_SYS_BIC"."devTest/MULTIIP"
        ('PLACEHOLDER' = ('$$IP_DIM10$$','1,3,6')) ;

Could not execute ‘SELECT “DIM10”, “DIM100”, “DIM1000”, “DIM1000000”, “KF1”, “KF2” FROM “_SYS_BIC”.”devTest/MULTIIP” …’ in 373 ms 962 µs .

SAP DBTech JDBC: [2048]: column store error: search table error:  [2620] “_SYS_BIC”.”devTest/MULTIIP/proc”: [130] (range 2) InternalFatal exception: not a valid number string ‘1,3,6’

Not only is this error annoying, but it’s FATAL… shudder!

After some investigation, I found out that the input parameter not only provides the digits and the separating commas but also the enclosing single-quotes.

Nothing easier than getting rid of those:

  vfiltD10 = ' "DIM10" IN ( ' || replace (:IP_DIM10 , char(39), '')  || ' )';

With this, the single-quotes get easily removed (39 is the ASCII value for the single quotes and the CHAR function returns the character for the provided ASCII code – this just makes it easier to handle the double-triple-whatever-quotation syntax required when the single-quote character should be put into a string).

Of course, seeing that we have not yet reached the end of this blog post, you already know: that wasn’t the solution.

The problem here was not only the quotation marks but also that  SAP HANA does not parse the string for the input parameter value. The result for the filter variable is that we do not get the condition

  actual condition          ===> syntax structure

  “DIM10” IN ( 1, 3, 6)     ===> X IN ( c1, c2, c3)

but

  “DIM10” IN ( >’1, 3, 6′<) ===> X IN ( c1 )

So even when we remove the quotation marks, we still end up with just one value (I enclosed this single value in >’ ‘< for easier distinction).

Interlude

The different syntax structures pointed out above are easily overlooked also in standard SQL. Often developers do not fully realise that an IN condition with 3 parameters is structurally different from an IN condition with 2 or 4 parameters.

Whenever the number fo parameters of the IN condition changes, the statement is effectively a new statement to the database, requiring new parsing and optimisation and also allocating its own space in the shared SQL cache.

This is another detail that ABAP developers do not need to worry about, since the

SAP NetWeaver database interface gracefully splits up IN-lists into equal chunks and recombines the result set automatically. See this ancient piece SAP Support case “FOR ALL ENTRIES disaster” for more details.

One approach to avoid this issue can be to use temporary tables instead of the IN condition. Especially when parsing/query optimisation is taking a long time for your application, this might be an approach worthwhile to implement.

Back to the main topic though!

So, the “obvious” approach of using APPLY_FILTER() does not help in this case.

Is it possible that it is just not possible to take multiple input parameter values into an IN list? But graphical calculation views can do it – and rather easy.

And in this observation laid the key for the solution. What is different between graphical and scripted calculation views?

Right, graphical calculation views do not produce SQL for the boxes we draw up.

Technically speaking it replaces them with Plan Operators – very much similar to the abandoned CE_-functions.

Do you see where this is heading?

Yes, indeed. The solution I found works with CE_-functions.

Oh, how very heretic!

May the performance gods get angry with me for making the SAP HANA execution switch engines…

But first, let’s look at the solution, shall we?

/********* Begin Procedure Script ************/
BEGIN
     vfact = select * from fact;
  var_out = CE_PROJECTION(:vfact,
                     [ "DIM10", "DIM100", "DIM1000", "DIM1000000"
                     , "KF1", "KF2" ],
                      'IN ("DIM10", $$IP_DIM10$$)');
END /********* End Procedure Script ************/

Easy to see, this approach mimics the filter approach for graphical calculation views.

To not over complicate things I only used the CE_PROJECTION function for the filter part – everything else is still in efficient, familiar SQL.

Important to note is that this works only, when the input parameter is referenced with the $$<name>$$ format.

Also important to recall is that the complete filter expression needs to be provided as one string enclosed in single quotation marks ( ‘ <filter expression goes here> ‘ ).

OK!“, you may say, “this works, but now you broke the holy rule of CE_-functions damnation. The performance of this surely is way worse due to the implicit engine change!

Well, let’s have a look into this!

First the explain plan for the SQL based statement:

SELECT
     "DIM10","DIM100","DIM1000", "DIM1000000",
     "KF1","KF2"
FROM FACT
where DIM10 IN (1,3,6) ;

OPERATOR_NAME   OPERATOR_DETAILS                                         EXEC_ENGINE SUBTREE_COST

COLUMN SEARCH   FACT.DIM10, FACT.DIM100, FACT.DIM1000, FACT.DIM1000000,  COLUMN       1.645529062

                FACT.KF1, FACT.KF2                                                  

                (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_TABLE)              

  COLUMN TABLE  FILTER CONDITION:                                                   

                (ITAB_IN (DIM10))                                                   

                FACT.DIM10 = 1 OR FACT.DIM10 = 3 OR FACT.DIM10 = 6       COLUMN    

Now the scripted calculation view version:

SELECT
     "DIM10","DIM100","DIM1000", "DIM1000000",
     "KF1","KF2"
FROM "_SYS_BIC"."devTest/MULTIIP"
        ('PLACEHOLDER' = ('$$IP_DIM10$$','1,3,6')) ;

OPERATOR_NAME   OPERATOR_DETAILS                                         EXEC_ENGINE SUBTREE_COST

COLUMN SEARCH   FACT.DIM10, FACT.DIM100, FACT.DIM1000, FACT.DIM1000000,  COLUMN       1.645529062

                FACT.KF1, FACT.KF2                                                  

                (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_TABLE)              

  COLUMN TABLE  FILTER CONDITION:                                                   

                (ITAB_IN (DIM10))                                                   

                FACT.DIM10 = 1 OR FACT.DIM10 = 3 OR FACT.DIM10 = 6       COLUMN     

See any difference?

No?

That’s right, there is none. And yes, further investigation with PlanViz confirmed this.

SAP HANA tries to transform graphical calculation views and CE_-functions internally to SQL equivalents so that the SQL optimizer can be leveraged. This does not always work since the CE_-function are not always easy to map to a SQL equivalent, but a simple projection with a filter works just fine.

Now there you have it.

Efficient and nearly elegant IN condition filtering based on multiple input parameters.

There you go, now you know.

Have a great weekend everyone!

Lars

 

SAP HANA, databases and computers