All posts by Lars Breddemann

SAP HANA fiddler, knows computers, database systems and data warehouses. Shares his knowledge. SAP Mentor, SAP Employee, with the SAP Health team

Finding answers on workdays

Cunningham’s Law is the idea that “the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.“.

More often than not I feel that some of the answers provided to questions on the SAP Community platform follow this approach, probably being unaware that there is a name for this strategy.

XKCD Duty calls

Question “SAP HANA Exclude weekends while calculating diff between two dates.” is such a question.

“I have requirement to find out the Number of hour between two dates, but we have to exclude the weekends (Saturday & Sunday).

Example : DATE1 is 19-July and DATE2 is July-26 – Actual diff is – 168 Hours.

Required Output is (168 – 48 Hours (Saturday & Sunday) – 120 Hours.

This I want to achieve using Graphical or Script Cal view, please help me to solve the issue”

The OP did not bother googling for solutions for this very common problem nor does (s)he seem to have access to classic SQL development literature, like “Jeo Celko’s Thinking in Sets“.

This is one of those “throw the requirement over the fence” question and really should have been moderated.
However, as so often, an eager answerer offered his help and provided the following code to solve the problem:

	declare v_i integer;
	declare v_count integer;
sel1 = select 
			"0CALDAY" as "CALDAY",
 	 		"Yesterday" as "YESTERSDAY",
	 		 sum("DATE_DIFF") AS "DATE_DIFF" ,	 		 
	 		 sum((select 0 from dummy)) as "HOUR_DIFF",
	 		 sum((select 0 from dummy)) as "WRK_HOUR_DIFF"
	 		 from  "_SYS_BIC"."ZTABLE"
--	 		 where "0CALDAY" >= '20180701'
			 GROUP BY "0CALDAY", "Yesterday";

select count(*) into v_count from :sel1;

for v_i in 1..v_count do
	WHILE to_date(:sel1.CALDAY[v_i]) < to_date(:sel1.YESTERSDAY[v_i]) do 
		if( weekday(to_date(:sel1.CALDAY[v_i])) ) < 5 
			sel1.WRK_HOUR_DIFF[v_i] = :sel1.WRK_HOUR_DIFF[v_i] 	+ 24 ;
			sel1.HOUR_DIFF[v_i] 	= :sel1.HOUR_DIFF[v_i] 		+ 24  ;			
			sel1.HOUR_DIFF[v_i] 	= :sel1.HOUR_DIFF[v_i] 		+ 24  ;			
		end if;
		sel1.CALDAY[v_i] = ADD_DAYS(to_date(:sel1.CALDAY[v_i]), 1) ;
	end while ;	
end for;

select * from :sel1;


Reading through this approach several problems stick out:

  • this really shouldn’t require SQLScript at all and definitively not a loop
  • using ‘SELECT 0 from DUMMY’ to generate a constant value of integer 0 is plain wrong
  • summing constant 0 over a grouping set is plain wrong as well – the result is a constant 0
  • removing the aggregation allows for removing the grouping altogether: a SELECT DISTINCT would do the trick here
  • the number of records in sel1 depends solely on the number of different values in ZTABLE.CALDAY a table that has not been defined
  • this number of different CALDAY values is used to loop and add 24 hours to the sel1 table variable HOUR_DIFF and WRK_HOUR_DIFF to keep a tally of hours between dates, which could have been done with HANA SQL functions directly

This looks very much like the kind of SQL code developers tend to write that are not “at home” with SQL and rather try pushing an imperative programming style into their SQL code.

This code is inefficient, slow and hard to maintain.
In short: don’t do it like this.

Alternative solutions

The first approach for this type of common problem really should be to not trying to develop the solution yourself, but instead to look for existing standard solutions.
The mentioned book uses so-called auxiliary tables for this (see chapter 9.5 Calendar Tables) and so does SAP NetWeaver and also the freestyle solution presented in “Want to Calculate WORKDAYS_BETWEEN without TFACS Table?” .

The core of this solution approach is to maintain a table of all dates that indicates whether or not a given date is a weekend or not. Such a table could even hold the information if a given date has been a working day or a public holiday.

Looking at the requirement of “excluding weekends” from counting hours being able to discern workdays from holidays seem like a close-by requirement.

Let’s stick to the case of simply excluding the weekends which the OP defined as Saturday and Sunday.
When working on such a requirement it is particularly important to consider the calendar context in which the solution should work.
Certain calendars (e.g. Isreali calendar) have Friday and Saturday as their weekends.
When the underlying requirement actually is “hours of working days” this might become more complicated with different calendars. Consider, for example, that daylight saving time shifts effectively add/subtract hours – commonly over weekends, but sometimes also on working days.
The OP hasn’t mentioned anything beyond the description of a mechanical calculation problem, so we cannot know what the underlying requirement really was.

Anyway, a quick solution for the simple counting of hours on days other than Saturday and Sunday would be to use the SAP HANA built-in table “_SYS_BI”.”M_TIME_DIMENSION”.
This table provides dates in different formats such as DATE_SQL as well a day of the week number DAY_OF_THE_WEEK_INT (Monday =0, Sunday=6).

With data generated in the table, the original question can be answered as “number of non-weekend days between the given days multiplied by 24 hours

    select count(*) as no_of_workdays,
           count(*) * 24 as no_of_hours
    from (
            from "_SYS_BI"."M_TIME_DIMENSION"
            where day_of_week not in (5, 6)
            and date_sql between to_date('19.07.2018', 'DD.MM.YYYY')
                             and to_date('26.07.2018', 'DD.MM.YYYY')

    6             	144        

Note how this differs from the OPs calculation by one day.
Checking the inner query gives us:

    19/07/2018	3              
    20/07/2018	4              
    23/07/2018	0              
    24/07/2018	1              
    25/07/2018	2              
    26/07/2018	3              

Which means that the OP did miscalculate the hours in the example, by not including the last day of the given range.
This points to another common trap when dealing with dates and intervals: one needs to be clear about whether interval end dates should be included or excluded.

The same approach can be taken with graphical calculation views:

Example calculation view with filter expression

    count(distinct "DATE_SQL" ) as no_of_workdays,
    count(distinct "DATE_SQL" )  * 24 as no_of_hours


The broader view

I mentioned above that this common problem can be broadened to “separate working days from non-working days”.
The SAP business applications have been dealing with such problems for many decades now and so it’s not surprising that there is indeed a standard solution available for this.
This standard solution comes in the form of “FACTORY CALENDARS”.
SAP Netweaver table TFACS holds one row for each year and each calendar.
Every row contains twelve column with a “bitfield” represented by ‘1’ and ‘0’ characters for each day of the month.
A ‘1’ indicates a working day and a ‘0’ denotes a non-working day.

There are some other columns available, but for our purpose, this is what we need to know.

AU 2018 0111100111110011111001111000111 1100111110011111001111100111 ... 0011111001111100111110010010110

Typically this table is re-used or replicated from the SAP NetWeaver environment so that all date calculations are using the same calendar definitions.
If you want to use the table without an SAP NetWeaver system you could use the following commands to create it yourself:

create row table TFACS
( IDENT NVARCHAR(2) not null -- factory calendar
, JAHR NVARCHAR(4) not null -- year stored
, MON01 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON02 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON03 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON04 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON05 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON06 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON07 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON08 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON09 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON10 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON11 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON12 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, BASIS NVARCHAR(5) not null -- start of factory day numbering
, FENUM NVARCHAR(3) not null -- number of days in year
, WENUM NVARCHAR(3) not null -- number of workdays in year
, LOAD NVARCHAR(1) not null -- load year in bufer flag ('X' = YES)
, primary key (IDENT, JAHR));

As an example I entered one row with the working days of Australia/Victoria in 2018:

insert into tfacs values (
'AU', '2018'
-- 1 2 3
-- 1234567890123456789012345678901
, '0111100111110011111001111000111' -- january
, '1100111110011111001111100111' -- february
, '1100111110011111001111100111100' -- march
, '001111100111110011111001011001' -- april
, '1111001111100111110011111001111' -- may
, '100111110001111001111100111110' -- june
, '0111110011111001111100111110011' -- july
, '1110011111001111100111110011111' -- august
, '001111100111110011111001111100' -- september
, '011110011111001111100111110011' -- october
, '11001111100111110011111001111' -- november
, '0011111001111100111110010010110' -- december
, ' '
, ' '
, ' '
, ' ');

With this table in place, SAP HANA provides SQL function WORKDAYS_BETWEEN (see also the SAP HANA Academy video), that returns the desired number of working days directly:

SELECT WORKDAYS_BETWEEN('AU', to_date('19.07.2018', 'DD.MM.YYYY')
, to_date('26.07.2018', 'DD.MM.YYYY')) "no of working days"
FROM dummy;

no of working days

Note that here, the “between” condition is understood as “workdays computed starting on the 19th and finishing on, but not including, the 26th” (see documentation on this) which seem to be the understanding of the OP.

The benefits of using calendar tables over pure function approaches are clear:

  • it’s easy to understand which days are considered working days and which days are not
  • multiple calendars can be maintained and used in parallel (different years, countries, union agreements, operating shifts, etc.)
  • the SQL/application code does not need to be changed when the calendars need updating

There you go, now you know.

Trouble with time?

There are a couple of topics that are an ongoing source of confusion even for experienced IT professionals and developers.
UNICODE handling is one example (see here for info), floating point calculations is another one (this is a good primer) and then there is everything to do with time.

Dealing with dates, time, time zones and the like is confusing and counter-intuitive in itself; adding computers to the mix doesn’t make this any better.

This post is going to shine some light on what goes into date/time handling in SAP HANA and tries to answer some questions that I came across recently.

The big picture

The SAP HANA DBMS obviously doesn’t come with its own timekeeper (though I think it would make a nice gift for HANA DBAs) but uses date/time information provided by the Linux operating system whenever it needs to know the current time.

Linux gets its time from different possible sources, starting with the hardware clock (aka BIOS clock) and, if configured, the ntpd(aemon).
The man-pages on this (hwclock, date, tz, adjtimex, /proc/timer_list) are an interesting read.

A super-short version of the story here is: When the computer is started, the hardware clock sets the current time kept by the battery and the BIOS. This time does not have a time zone assigned to it – it is open to interpretation.
At some point, Linux starts up, reads the hardware clock and sets the current system time and date considering the time zone that was declared for this computer when Linux was installed (yes, one can change this later on).
From here on the hardware clock is out of the picture. All system requests concerning current date/time are served from the Linux managed clock(s) – the System time – and likely continuously updated via the ntp-daemon.
It is possible to check the current hardware time (output in ISO 8061 format):

> sudo hwclock --utc
  2018-03-26 17:57:36.608111+1100

The +1100 is the time zone offset that the system time uses/assumes. The hardware clock does not ‘know’ what timezone it is in, so specifying either –utc or –localtime and checking which results in the correct local time is crucial here. Once again, the man-page for hwclock has the whole story.

Comparing this to the output of  date:

> date
  Mon Mar 26 11:05:37 AEDT 2018

we learn the system time is set to use the Australian Eastern time zone.
Another way to check which timezone has been set on system level is to check where the file-system link /etc/localtime points to:

> ls -l /etc/localtime
  lrwxrwxrwx 1 root root 39 Mar 12 18:48 /etc/localtime -&gt; /usr/share/zoneinfo/Australia/Melbourne

The /usr/share/zoneinfo folder is the system timezone database directory.
All time zones, including Daylight Saving Time (DST), that the Linux system knows about, can be found here. Since both time zone definitions, as well as DST rules, change regularly (see Time Zone News), these files are commonly updated during Linux updates.
We will come back to the topic of Daylight Saving Time later on – it is yet another can of worms.

The time of the day

Here we are, with Linux booted up and ready to go. We can log in now and ask for the current time and date (as seen above):

> date
  Mon Mar 26 18:17:07 AEDT 2018

But what if my user should use a different time zone?
To do that, the Linux shell provides a setting in form of an environment variable named TZ.
Setting the variable to one of the time zones in the timezone database directory allows programs to return the date in the local time of choice:

> export TZ=UTC+0
> echo $TZ
> date
  Mon Mar 26 07:18:00 UTC 2018

Any program started from a shell with this setting will now work with the time zone set to UTC.

Becoming a profiler

If this setting should be made permanent (just for the current user) the environment variable has to be set in the shell profile of the user.
The shell profile is an auto-start script that contains the commands the shell should execute every time it gets started.
For the common Bourne shell this script file is called .profile.
As the name starts with a dot, it is a so-called hidden file, but using ls -a in the user’s home directory lists it nevertheless:

> cd ~ # change into the user's home directory
> ls -la # list all files, including the 'hidden' files

 drwx------ 1 root      root     730 Mar 26 05:53 .
 drwxr-xr-x 1 root      root     200 Jan 31 00:51 ..
 drwx------ 1 root      root      16 Jun 13  2017 .hdb
 -rw------- 1 root      root     146 Mar 26 07:15 .lesshst
 drwx------ 1 root      root      10 Jun  6  2017 .local
 drwx------ 1 root      root      34 Sep 11  2017 .mozilla
 -rw-r--r-- 1 root      root      80 Jan  3  2017 .profile   &lt;&lt; this one

If the line export TZ=UTC+0 is added to the .profile-file then the new time zone will be set for the user every time the shell is used.

Now, for Linux based SAP systems things are a tad more … flexible. And with “flexible” of course, I mean complicated.
Putting all settings into just a single file would make managing system- or organisation-wide settings rather tedious.
Maybe some setting should be user-specific, while others shall apply to all users of a system and so on.
For that, there is the option to call other .profile scripts.

Cutting to the chase, for the SAP HANA operating system user <sid>adm we’re looking at the following profile scripts:

 .profile  - standard autostart-file for Bourne shell 
 .login    - C-shell version
 .bashrc   - init script for the BOURNE and KORN Shell
 .cshrc    - standard profile for the C-shell (same content, but slightly different syntax)  - looks for .sapenv scripts and executes them
 .sapsrc.csh - C-shell version    
 .sapenv_<hostname>.sh  - SAP system environment setting - specific to the local host
 .sapenv_<hostname>.csh - SAP system environment setting - specific to the local host - C-shell version  - SAP system environment setting
 .sapenv.csh - C-shell version  - User specific settings
 .customer.csh - C-shell version

Most of these files are created by the installation program for SAP HANA.
When we log in as <sid>adm the files get executed in the following order:

(1) .profile -> (2) .bashrc -> (3)  -> (4) sapenv_<hostname>.sh -> (5) (for HANA systems only)
                            |              OR  -> (4)            
                            +> (6)

Looking at this impressive entanglement you might not be too clear about where to put the time zone setting.

The answer, fortunately, is easy, as nearly all the script files come with a warning in a header comment, telling you to not edit it manually.

This leaves the as the target file for all customization including time zone settings and additional paths or environment settings for Smart Data Access (SDA) connections.
Fortunately, somebody put a comment into .profile to make clear that all custom settings will be found in That way, even the SAP-unaware Linux admin has a fair chance to figure out in which file settings are done.

SAP HANA zones in time

At this point, we can configure the system’s hardware clock, the Linux system clock and the timezone that SAP HANA should use for anything where the “local” timezone is used.

This setting sets the home-timezone for the SAP HANA installation. Important here is to pay attention to the term “installation”.
Since tenant databases become the standard for SAP HANA, it has also become possible to use “strong isolation” mode, separating SAP HANA processes of different tenant databases on Linux level. Each tenant database can run in its own Linux user security context.
It is easy now to assume that, since the databases run in their own Linux user accounts, they could each have their own environment with separate timezone settings.
This is, however, not the case, since all tenant databases of an SAP HANA installation share the same nameserver (hdbnameserver) process. And it is the environment of this nameserver process that is shared by all tenant databases.

Therefore, all databases of an SAP HANA installation share the same timezone (as well as the same OS setup for external libraries, like 3rd party DBMS clients, etc.).
If one needs SAP HANA databases with different time zones on the same system, a separate installation is necessary. (state as of SAP HANA 2 SPS 02)

What’s it all used for – SAP system timezone handling

All of the mechanisms described so far are in place to provide Linux programs access to a current time that makes sense to the user of the system.

For shared programs like SAP HANA, it is debatable what the current local time should be since users can access the system from around the globe out of completely different time zones.
Whatever time zone you decide upon, often it is simply the local timezone of where the data center is actually located, where most of the users are or the UTC timezone, you should probably pick a Standard timezone since the time changes for summer and winter time can lead to confusion with timestamp gaps and duplication, which can be especially confusing when trying to read log files.
Also, in an SAP system landscape, say with SAP NetWeaver systems based on SAP HANA, the NetWeaver application servers and SAP HANA have to have the same time, including the timezone.

Up to here, all we’ve looked at was telling the local current time. In SAP HANA one can use functions like now(), current_time(), current_date(), etc. to get the current SAP HANA system time. In many cases, there is also a UTC counterpart for these functions, e.g. current_utctime() if your application should rather work with the UTC timestamps.

SAP HANA also provides conversion functions, utctolocal()/localtoutc(), to convert between local time and UTC date/timestamp information.

Note that the documentation strongly discourages to store data in local time zone. SAP HANA date/timestamp data types don’t store the used timezone together with the data, so the application reading the data would need to keep track of that.
Instead, data should be stored as UTC and, if local date information is required, the conversion functions should be used.

Let’s stop at this point and take a look at what goes into this conversion.
I mentioned above that time zones and the rules for Daylight Saving Time (DST) are in constant change.
How can SAP HANA know what the currently correct mapping is?

In You got the time? I explained that SAP HANA can use built-in defaults or use a special set of tables that contain the current mapping information.

These tables and more information can be found in SAP note 1791342 – Time Zone Support in HANA.

To make sure the timezone data had been loaded without problems, the view M_TIMEZONE_ALERTS can be used.
If SAP HANA is started without the timezone tables loaded, this view returns:

skullbox    31203   TABLES NOT FOUND    ?               ?               ?              

The following timezone tables could not be found or are not readable: SYSTEM.TTZD, SYSTEM.TTZDF, SYSTEM.TTZDV, SYSTEM.TTZZ

skullbox    31204   TABLES NOT FOUND    ?               ?               ?

The following timezone tables could not be found or are not readable: SYSTEM.TTZD, SYSTEM.TTZDF, SYSTEM.TTZDV, SYSTEM.TTZZ

Per default, SAP HANA looks for these tables in the SYSTEM schema and expects the entries to be relevant for client 001.
If you want to use a different set of tables, you can adapt these configuration parameters

= Configuration
  Name                                  | Default    | System
  indexserver.ini                       |            |
    global                              |            |
      timezone_dataset                  | sap        |
      timezone_default_data_client_name | 001        |
      timezone_default_data_schema_name | SYSTEM     |

Important to note here is that there is only one set of conversions possible per SAP HANA database, even if you have multiple SAP clients in the database.
Likewise important: these tables need to be loaded into every tenant database and the SystemDB.
The tenant DBs cannot use the tables from the SystemDB or from another tenant DB.

Yet another caveat with the SAP HANA timezone conversion functions is that they don’t take a reference date, but always work based on the currently available conversion rules. If your application requires historical timezone conversions, then this needs to be implemented separately.

In order to check what time zones are currently known to SAP HANA the system view TIMEZONES  is available:

select * from timezones;

AUSNSW            	sap             
ALAW              	sap             
UTC               	sap             
AUSSA             	sap             
PST_NA            	sap             
BRZLWE            	sap             
AUSNT             	sap             
UTC-7             	sap             
Africa/Abidjan    	platform        
Africa/Accra      	platform        
Africa/Addis_Ababa	platform        
Africa/Asmera     	platform        

Time zone conversion and Daylight Saving Time (DST)

At a glimpse, this whole topic of timezone conversions can look very simple. After all, it is simply adding or subtracting some hours between two time-zones, isn’t it?

To make the life of time zone converters more interesting, Daylight Saving Times have been introduced in some countries for some time. When referring to “local time” this usually includes any DST rules, while the term “Standard Time” indicates that no DST rules are applied.
The major complication with DST is not so much that the number of hours to be added or substracted changes based on the (local) day of the year, but that the mapping between timestamps with and without DST is not contiguous.

Often we think of clock-timestamp values as continuously increasing values with no gaps. It’s always one second, and another and another second.

This leads to implicit assumptions like:

  • every possible timestamp between 00:00:00 and 23:59:59 will occur.
  • every timestamp will occur only once.

Both of those assumptions are false during the days when time changes from summer to winter time or back again.
When changing from winter to summer time, the clocks are set back from e.g. 03:00:00 to 02:00:00.
Every second between 02:00:00 and 02:59:00 will happen twice this night.

One the change back, summer to winter time, clocks will be set forward, jumping from 01:59:59 directly to 03:00:00, skipping all timestamps between 02:00:00 and 02:59:59.

Here we see, why it is so important to use standard time for storing timestamps (also in filesystems and trace files, etc.): DST makes timestamps ambiguous.

There are more things wrong with the simplified model of timestamps, e.g. the need for leap seconds to adjust for drifts of the defined day length (one full turn of our planet).
“Counting” based timekeeping systems like UNIX epoch time consider leap seconds, leap years, centuries etc. Covering these is beyond this post but reading up on it is highly recommended.

More problems on a stack

Knowing all this about timezones and conversions between time zones with and without DST, we should be able to use timestamp data correctly in our programs with SAP HANA.
And yet, questions like “Problem with “summer” time (DST)” come up and even I wasn’t able to immediately pinpoint the cause of the problem.

At face value, the OP found what looked like a bug in SAP HANA’s JDBC driver.
When querying the database via JDBC certain timestamps did not seem to get returned correctly, even though the SAP HANA system timezone was set to UTC:

java -jar ngdbc.jar -u ... 
     -c "select to_seconddate('2017-03-26 01:00:00') as \"2017-03-26 01:00:00\"
       , to_seconddate('2017-03-26 02:00:00') as \"2017-03-26 02:00:00\"
       , to_seconddate('2017-10-29 01:00:00') as \"2017-10-29 01:00:00\" 
         from dummy"


| 2017-03-26 01:00:00 | 2017-03-26 02:00:00 | 2017-10-29 01:00:00 |
| 2017-03-26 00:00:00 | 2017-03-26 01:00:00 | 2017-10-29 02:00:00 |
             ^^                    ^^                    ^^

When running the same query with hdbsql and thus avoiding JDBC  the result is very different:

hdbsql -U ...  -A 
  "select to_seconddate('2017-03-26 01:00:00') as \"2017-03-26 01:00:00\"
   , to_seconddate('2017-03-26 02:00:00') as \"2017-03-26 02:00:00\"
   , to_seconddate('2017-10-29 01:00:00') as \"2017-10-29 01:00:00\" 
     from dummy"

| 2017-03-26 01:00:0  | 2017-03-26 02:00:0  | 2017-10-29 01:00:0  |
| ------------------- | ------------------- | ------------------- |
| 2017-03-26 01:00:00 | 2017-03-26 02:00:00 | 2017-10-29 01:00:00 |

What is going on here?

The answer lies with JAVA, more precisely the way how JDBC handles timestamp data (see also here)
When fetching timestamp data from a database that does not store the timezone with the timestamp data, the JDBC method getTimestamp (int columnIndex, Calendar cal) gets used.

The method returns a java.sql.Timestamp object which in turn is based on java.util.Date. Java handles dates/timestamps as moments (instants) in time that can be expressed in calendar fields (year, month, day of month, etc.). This requires defining the calendar that should be used here.
In our example, without specifying any specific calendar, the JVM uses its default timezone.
If the JVM property user.timezone is not set, then the JVM will try to find the current operating system users timezone. If the JVM runs on Linux, that would be the value of TZ again, on MS Windows it would be the current timezone setting for the Windows user that runs the JVM.

What JDBC is doing basically means: if the JAVA program does not explicitly set the Calendar to be used, then the setting of the client OS is used. A prominent example for this is SAP HANA Studio.

To avoid this implicit, nearly hidden conversion, one can set the JVM parameter -Duser.timezone=UTC in the hdbstudio.ini file.

For the interested: JAVA comes with its own complete implementation of a timezone database and the conversion data can be – just as in SAP HANA – managed independently from the operating system. (see here, here, here, here, here, here)

Time to end

And here we are now. Two full stacks of timekeeping, time zone management and conversions with and without DST.

What do I take home from this?

To me, the JDBC behaviour was the most surprising effect, so I probably try to make sure that I set the timezone to UTC whenever I want to avoid automatic conversion (e.g. server-side data access like SDI adapters or development/DBA tools like SAP HANA Studio).
For the rest, I think a good rule of thumb is to keep all timezone and time settings the same, especially across SAP systems.

Finally for applications and analytic scenarios that rely heavily on timezone conversion, making sure to fully understand all involved conversion mechanisms is really important.

There you go, now you know!


How to give what you don’t have

The development tools and techniques for SAP HANA have been extended and enhanced a couple of times since SAP HANA had first been released in 2011.
For apps developed for SAP HANA XS, the new XSA environment means a considerable change in the way application code accesses the database.

In his post “Synonyms in HANA XS Advanced, Accessing Objects in an External SchemaChristoph Gilde explains in detailed steps how XSA applications can get access to the database schema of XS classic applications. This might not be the loose coupling aimed for with a microservices architecture, but it is an efficient way of building new functionality in XSA while retaining an existing XS classic code base.

Out with the old, in with the new? Not so fast!

One example, where this is being done is the SAP Health application Medical Research Insights 2. Starting with FP 5, several new features have been added as XSA apps, while keeping the base application and the underlying Connected Health Platform as XS classic applications.

Christoph’s post covers all necessary steps to enable such a cross-application data access but relies on a prerequisite that can be hard to fulfil.
The very first step in the tutorial is to create two roles in the XS classic schema that group all required privileges to the XS classic database objects. One of these roles has to include the privileges with the GRANT OPTION that allows granting those privileges further to other users or roles.
It is fairly obvious that this is an extremely wide-ranging privilege that can make controlling who has access to which data and who manages this access very difficult. For this reason, the XS classic repository roles don’t allow the use of the GRANT OPTION.

But Christoph post does not mention anything about repository roles, so what is the problem here, you might ask.

What one can do and what one cannot do

One point is, that having to create roles manually via SQL makes the handling of the application life-cycle more complex. Ideally one wants to have an XS classic application fully contained in one or more DUs that can be transported (installed, imported) together. The less manual installation activities required, the better.

The more important point, though, is that only the owner of an object or a user with the GRANT OPTION privilege can go on and grant other users privileges with the GRANT OPTION.

In case of an application that uses a .hdbschema repository object to define the application schema as recommended, the owner of that schema is the _SYS_REPO user.
This special SAP HANA system user cannot be used to login to the database, so even if one would settle for a manual installation step, there technically is no way to run a GRANT … WITH GRANT OPTION as the _SYS_REPO user.

A way out

So, how can we make _SYS_REPO grant privileges to a user with the GRANT OPTION?
Meet “SQLScript procedure with DEFINER RIGHTS“!
This setting allows executing the procedure as the user that created it.

For repository procedures, this “creator” is always _SYS_REPO.
By now you probably see where this is going.

To have _SYS_REPO grant with GRANT OPTION, we create following repository objects:

  1. external_access_g.hdbrole
    (see Christoph’s post for the two types of roles for external access)
    This one is just an empty role file.
    Don’t get confused by the ‘pred’ prefix in the package name, this is just the name I used for the demo package and without functional relevance

    role pred.roles::pred_external_access_g
  2. external_access.hdbrole
    This one contains all access privileges without the GRANT OPTION.
    For the sake of this example, I chose SELECT on the schema.

    role pred.roles::pred_external_access
    catalog schema "PRED": SELECT;
  3. grantSchemaAccess.hdbprocedure
    PROCEDURE "PRED"."pred.roles::grantSchemaAccess" ( )
    exec 'GRANT SELECT on schema PRED to "pred.roles::pred_external_access_g" WITH GRANT OPTION';

    Important to note here:

    • SQL SECURITY DEFINER – this really is core to this approach. Leave it on the default INVOKER and it won’t work.
    • no READS SQL DATA – we use Dynamic SQL in this procedure which counts as READ/WRITE access.
    • the GRANT statement is executed via ‘exec’.
      If you try to put the GRANT in the procedure body directly, you will get the following error

      message:Error while activating /pred/roles/grantSchemaAccess.hdbprocedure:[pred.roles:grantSchemaAccess.hdbprocedure] Syntax error in procedure object: feature not supported;
      ddl statement 'GRANT SELECT on schema PRED to "pred.roles::pred_external_access_g" WITH GRANT OPTION' is not supported in procedure/function: line 11 col 5 (at pos 288)

    (BTW: To whoever decided that it was a good thing to disable selecting text in the console area of the Web-Based Development Workbench Editor: it’s not!
    Being able to copy error message output is rather important and having to manually change the ‘user-select’ option for the CSS style is rather cumbersome )

  4. Another role that covers the privilege to run the procedure
    role pred.roles::installUser
    sql object pred.roles::grantSchemaAccess: //Objecttype: PROCEDURE

With these objects in place, the setup procedure for granting privileges looks like this:

  1. Import the DU with the objects created above into the target system.
  2. Grant the installUser role to the user that performs the installation and setup activities (e.g. INSTALLUSER).
  3. As the INSTALLUSER execute the procedure:
    call "PRED"."pred.roles::grantSchemaAccess"();
  4. Check that the assignment was successful:
    select *
    grantee like 'pred_riles::external_access%'
    pred.roles::pred_external_access    ROLE         _SYS_REPO SCHEMA       PRED         ...  SELECT    FALSE        TRUE
    pred.roles::pred_external_access_g  ROLE         _SYS_REPO SCHEMA       PRED         ...  SELECT    TRUE         TRUE
  5. Revoke the role to execute the procedure from the INSTALLUSER again.
    As this is a setup-time-only activity, the role and the privilege should not be granted to anyone once the setup is done.
    To see who the role has been granted to run the following SQL statements:

    select *
    where object_type = 'PROCEDURE'
    and object_name = 'pred.roles::grantSchemaAccess';

    –> only the procedure pred.roles::installUser should be returned here.

    select *
    from "GRANTED_ROLES"
    role_name = 'pred.roles::installUser';

    –> only _SYS_REPO should be returned here.

Once these steps are successfully finished, the remaining procedure is the same as described in Christoph’s blog post, continuing at step 2.

Due to the critical nature of granting WITH GRANT OPTION one might also consider to set up auditing the roles and the procedure so that any use of them gets logged.

There you go, now you know!


Maps and Paths

A brief twitter exchange triggered some thoughts about expectations and how those determine whether something is a good A, a passable B or a terrible C.


Turn by turn navigation in a mobile app
Turn by turn navigation in a mobile app

When you want to go somewhere and you do not know the way, you need directions. Somebody or something needs to show you the path to follow. That’s what turn by turn navigation on your mobile does. Following this path does not mean you got to know the area. You know one just one way to get from A to B know.


London Underground map from 1908
London Underground map from 1908

A map, on the other hand, provides an overview, context and alternatives but no immediate path.
Using a map means you have to actively engage with the information, see for yourself which roads cannot be taken (maybe you are in a huge truck and the small sideways with wood bridges are not going to work for you) and eventually plot your own path.

Both maps and paths, are related but are far from being the same.

A useful analogy

This analogy works not just for navigating transportation networks.

If your company wants to build an information system to do X then usually it will give the job to someone with a path to get there. It does not need to discuss all the alternative technologies or architectures. When X can be achieved reliably, in-time and on-budget, then that’s usually already a great outcome. Getting such a path (think project) is commodity shopping.

On the other hand, when you want to do new things or old things in a very different way than before, a path is not the answer to that. This is where you need to work with a map. You need different viewpoints and scenarios to walk through. This situation requires you to discover and create a path. You can buy help for this process but you cannot skip the work.

(You might notice, that product or service roadmaps are specifically not maps. They are usually paths with a decreasing level of certainty towards the end. They can provide a heads-up to what might be coming but they don’t show alternatives, contexts or overview.)

To avoid disappointment and frustration it is important to be clear about what you expect.

Are you asking questions in user forums to get a map? Do you want blog posts to give you a path from A to B? Do you want to discuss options and discover/create paths?

For the SAP Community Platform getting answers to such questions will be important to avoid further disappointments.

Is SAP HANA SQL/SQLScript Turing complete?

There was an interesting (aka nerdy) question in the SAP Community Q&A the other day:
Are SQLScript and HANA SQL Turing complete?”

I took a swing at it and the following is my result (a slightly edited version of my original answer).

Alright, let’s do this! (even though I don’t have an idea why it would matter at all…)

So, I’m not a computer scientist, and only a third through “The Annotated Turing” (which seems to be a great book, btw). This means what follows is the layman’s approach on this.

First off, I had to understand, what is meant by “Turing complete” and how to show that a language has that characteristic.

For that, there are a couple of Wikipedia entries and SO discussions available and I leave that for everyone to read (e.g. here and here).
One of those discussions linked to a presentation that claimed to prove that PostgreSQL-SQL with recursive common table expressions (CTS) is Turing complete. In order to prove this, the author of the presentation (here) said, that it’s enough to prove that a language can emulate another language that has already been shown to be Turing complete.
Fair call.
The author’s choice was a cyclic tag system with a specific rule set (rule 110) which apparently has been shown to be Turing complete.
Then the author goes on and implements this cyclic tag system with a recursive common table expression and thereby proves the claim.


So, what does that mean for SAP HANA SQL?
SAP HANA SQL/SQLScript does not support recursive common table expressions (much to the distaste of everyone who tries to handle hierarchies and does not know about SAP HANA’s special hierarchy views and functions (look there) and it also does not support recursive procedure calls.

Bummer, one might think.
Fortunately, every recursion can be expressed as an iteration (compare here), so I thought, let’s try this cyclic tag system in SQLScript.
This is the result (HANA 1, rev.122.15, on my NUC system). The SQL code is also available in my GitHub repo.

do begin
declare prods VARCHAR(4) ARRAY;
declare currProd, initWord, currWord VARC<span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start"></span>HAR(300); -- 300 is arbitrary and would be exceeded for more runs
declare currProdNo integer = 0;
declare runs, maxruns bigint = 0;

	initWord :='11001';                   -- the starting/initial 'word'
	maxruns := 100;                       -- a limit to the number of iterations 
                                              -- rule 110 is suspected to run indefinitively
    prods = ARRAY ('010', '000', '1111');     -- the three 'producer rules' stored in a string array

    currWord := :initWord;
    runs := 0;
    -- dummy table var to monitor output
    tmp = select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD 
    from dummy;

    while (:runs &lt; :maxruns) DO
        runs := :runs+1;
        currProdNo :=  mod(:runs,3)+1;    -- pick rule no. 1,2 or 3 but never 0
                                          -- as SQLScript arrays are 1 based
        currProd := :prods[:currProdNo];

        if (left (:currWord, 1)='1') then  -- add current producer to the 'word'
            currWord := :currWord || :currProd;
        end if;
        currWord := substring (:currWord, 2); -- remove leftmost character

        -- save current state into temp table var
        tmp = select RUNS, CURRPROD, CURRWORD from :tmp
              union all 
              select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD 
              from dummy;

    end while;

    select * from :tmp;                      -- output the table var

Running this gives the following output:

Statement 'do begin declare prods VARCHAR(4) ARRAY; declare currProd, initWord, currWord VARCHAR(300); declare ...' 
successfully executed in 7&lt;span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start"&gt;&lt;/span&gt;17 ms 39 µs  (server processing time: 715 ms 590 µs)
Fetched 101 row(s) in 2 ms 517 µs (server processing time: 0 ms 424 µs)

RUNS    CURRPROD    CURRWORD                                                                             
0       ?           11001                                                                                
1       000         1001000                                                                              
2       1111        0010001111                                                                           
3       010         010001111                                                                            
4       000         10001111                                                                             
5       1111        00011111111                                                                          
6       010         0011111111                                                                           
7       000         011111111                                                                            
8       1111        11111111                                                                             
9       010         1111111010                                                                           
10      000         111111010000                                                                         
11      1111        111110100001111                                                                      
12      010         11110100001111010                                                                    
13      000         1110100001111010000                                                                  
14      1111        1101000011110100001111                                                               
15      010         101000011110100001111010                                                             
16      000         01000011110100001111010000                                                           
17      1111        1000011110100001111010000                                                            
18      010         000011110100001111010000010                                                          
19      000         00011110100001111010000010                                                           
20      1111        0011110100001111010000010                                                            
21      010         011110100001111010000010                                                             
22      000         11110100001111010000010                                                              
23      1111        11101000011110100000101111                                                           
24      010         1101000011110100000101111010                                                         
25      000         101000011110100000101111010000                                                       

That looks suspiciously like the output from the Wikipedia link (above) and does not seem to stop (except for the super-clever maxruns variable ).
With that, I’d say, it’s possible to create a cyclic tag system rule 110 with SQLScript.
The tag system is Turing complete, therefore SQLScript must be Turing complete.

Still, I have no idea why this would matter at all and that’s really all I can say/write about this.


Not a New Year’s thing

Happy New Year, dear reader!

I hope you had an enjoyable and relaxing time between Christmas and New Years Eve with your thoughts away from work. I can say, I did.

During this time I took the opportunity to finish some thoughts on my engagement with the SAP Community Platform (SCP) where I have been a moderator for the SAP HANA topic/tag for several years.
I have also been a top contributor for most of the time since I joined the platform and its predecessors – my current profile picture there states that it has been 14 years and a bit by now.

During my tenure, the SAP HANA community grew, new volunteers joined the moderator team and many experts and users shared their experiences, insights and questions. Today, this community is the place to go to ask SAP HANA related questions and I take pride in knowing that I helped to kick-start it.
But now, no starting assistance is necessary any longer and I my professional focus has shifted to less HANA centric topics (I work in the SAP Health team) and so I decided to hang up the “moderator’s hat”.

I have worked on a second “project” that initially triggered setting up the Lars Breddemann Blog: all my blog posts covering topics from SAP on Oracle and MaxDB over to SAP Business Warehouse and SAP HANA can now be found right here.
Given their first publication dates, many of them are outdated, but some seem to still enjoy interest from readers.
As the SCP was re-platformed a couple of times, I think it is easier to have them at a steady “home” with this personal blog.

Since there is no automatic way to download all my blogs from SCP and upload them to WordPress, this migration project was a manual copy & paste activity during several long evenings and weekends. The comments and discussions to the blog posts could not be copied. To read those the interested reader still needs to go back to the SCP version of the blog post.

Having to re-read my own blogs from the last ten years was quite the experience. Look the horrid English, behold the dire need for a copy editor and witness in shock and awe the ever repeating phrases!
It was a tough reminder for me to try to be more careful with my texts. I will try to follow the guidance of real writers (like this) in future texts.

So long, dear reader, have yourself an interesting 2018!

What’s your number?

Have you been using the ISNUMERIC() function of MySQL, MS SQL Server or PostgreSQL and wondered what to use in SAP HANA instead?
If so, this is the blog for you.

Just recently I’ve seen a customer using my old workaround (first published here) and then a few days later the related stackoverflow question got updated.

This topic is not dead

The new answer to the SO discussion took a popular approach to the IS_NUMERIC problem:

If the string can successfully be converted into a numeric type, then the value is numeric (IS_NUMERIC returns 1 for TRUE), else it’s not (IS_NUMERIC returns 0 for FALSE).
This is a completely feasible approach (and used on other platforms as well, e.g. Oracle or PostgreSQL) and provides a result that allows making the decision of whether or not the tested value can be turned into a number data type in HANA.

IT DEPENDS, as usual

The problem, or rather the limitation of this approach is, that numeric can mean different things in different circumstances.
Sometimes, one might want to accept ‘123.456,–‘ as numeric. Sometimes “$500.00 –” shall be OK, and so on.
Checking the T-SQL definition of ISNUMERIC (here and here) shows that there are plenty other options.

This means nothing else that there are alternative approaches for this problem.
One of those alternatives could be to use regular expressions (regex), which have been supported by HANA for quite some time now (since SP10 if I’m not mistaken).
With tools like one can comfortably work out a pattern that matches or doesn’t match the kind of data one is interested in.

It might well be, that in your application the question is not about whether a string or sub-string can pass IS_NUMERIC() but rather something more specific.
The customer that I mentioned earlier, for example, wanted to see if a string looks like an ICD10 code (one character followed by two digits, ‘B13’, ‘A00’, etc.). Snipping out the first character and putting the rest into an IS_NUMERIC() function is not ideal here.

Instead, the regex ‘^\D\d{2}.’ can be used to capture the whole pattern.

Show me how it’s done

Let’s take a rather simple regex that matches any string that

  • is not a number between 0-9
  • is not a plus (+), minus (-), comma (,) or full stop (.)
  • is not an exponent ‘E’ (as in 1.23E02)

Such a regex is ‘[^0-9-.\,+e]’ (check here) with flags for global, case-insensitive matching.

In SAP HANA SQL we can use this regex in the LOCATE_REGEXPR() function. If it doesn’t find any matching string, then the result is 0, else it will return the location of the first matching character.
Since we are only interested in those cases, where no matches were found, we can easily MAP this output to make up a IS_NUMERIC value:
1 = no unwanted characters found, it’s numeric
0 = unwanted characters found, it’s not numeric

MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                    FLAG 'i' 
                    IN <string to be searched>)
     , 0, 1
     , 0)

So, what’s better now?

Using the “try-the-conversion“-approach can only be done via a procedure or a scalar function. This brings some overhead with it for the function invocation which can be avoided, when using the regex directly in the SQL statement.

As an example, I ran a quick test against 10.000.000 unique values (all tests were done against HANA1 1.00.122 and HANA2 2.00.020).

Conversion-approach function

       sum("IS_NUMERIC_CONV"(id_char) )
       "IS_NUMERIC_CONV"(id_char) = 1
   and "IS_NUMERIC_CONV"(id_char) IS NOT NULL;

This ran for roughly 19.5 seconds with 8 CPU cores at 100% usage.
And this is the runtime when all values actually can be converted correctly. With the same number of non-numeric values, the runtime goes up to 2:45 mins.

Regex-approach pure SQL

             MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                 FLAG 'i' 
                                 IN id_char)
                  , 0, 1
                  , 0) IS_NUM_REGEX
     IS_NUM_REGEX = 1 

This ran for roughly 8.6 seconds with 8 CPU cores at 100% usage. Same runtime for both convertible and non-convertible values, here.

Too good to be true

By now you maybe think: “Gee, that regex thing is a lot faster, but also really ugly code. Let’s put that into a function and have our cake and eat it too!“.

Off we go:

                 in stringToCheck nvarchar(5000)) 
         RETURNS isNumeric integer
     isNumeric := MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                      FLAG 'i' 
                                      IN :stringToCheck)
                       , 0, 1
                       , 0);

A statement using this function takes roughly 30 seconds with 8 CPU cores at 100% usage.

No cake for us!

As long as all our input data is clean, the conversion-function approach works nicely but really becomes expensive, when there are many cases, for which the exception handling needs to be executed. The lesson here is probably that it’s not a great (performance) idea to make exception handling part of your main processing path.

A truly high performing solution would need to be implemented as a native HANA function, so for it needs to be carefully checked what exactly the job of IS_NUMERIC()  shall be in your application and how many matches/non-matches are expected.

There you go, now you know!


HANA inverts the Thunderdome law

If you’ve watched “Mad Max Beyond Thunderdome”  then “… You know the law: Two men enter, one man leaves …“.

MV5BNDM4N2NkZmEtMjQ2Yi00ZTBkLWExODYtYWJiYzkwNWQ4NDQ4XkEyXkFqcGdeQXVyMTI3MDk3MzQ@._V1_Watch out, he’s coming for you! — pic from

HANA, of course, reverses, sorry, disrupts this law by making two (or more) values get out of functions, even when only one input value was provided.

Wait, what?

Alright, this Thunderdome reference is really a massive plug and only used to get your attention here. Now, that I seem to have it, let us get to the topic.

If you’ve been using SQLScript for some time you are aware of user defined functions (UDF). These UDFs can nowadays handle both scalar and table typed parameters, both for the input and for the output aka result of the function.

The main purpose of functions, just like in other programming languages, is code reuse and making the code that uses them easier to understand.
With some thought about the naming, one can even make up parts of a domain specific language for the problem at hand.
Think of

applyTaxTRate(sales_price) as "priceWithGST"

in your SELECT statements instead of

 CASE sales_price < 10 
 THEN sales_price 
 ELSE sales_price * 0.20 
 END as "priceWithGST"

as an example.

All good so far. Now, with HANA performance considerations are nearly always the main concern, so a question often asked is:
Isn’t using functions bad for performance?

Which I can only answer with
Isn’t using high-level languages bad for performance?
And indeed, it is easy to lose some microseconds and several MegaBytes of memory by using functions compared to “pure” SQL. But then again, the main goal for a developer is to implement a functionality (correctly) that eventually provides value for the users.

So, my recommendation is to use functions where it helps with making the programmers more productive and worry about the performance impact, when one can actually measure it.

Back to the topic of this blog.

I had looked at a question on stackoverflow a while ago which basically asked, why using UDFs with more than one result seems to lead to slower execution times, the more results are actually used.

So something like

      getUserDetails (143776).userName
    , getUserDetails (143776).expiryDate
    , getUserDetails (143776).currentAge
    , getUserDetails (143776).customerSegment
from dummy;

would take around four times as long as

    getUserDetails (143776).userName
from dummy;

even though we’re running the same function with the same input parameter.
The programmers’ intuition, well at least mine and that of the OP from stackoverflow, was that HANA would evaluate the function once for the provided input parameter and use the different results from this one execution.

Some testing of the query execution times showed that this could not be the case.
A more in-depth analysis was in order here.

The try setup

For my “learn about the system by poking around“-try I created a very simple scenario: a user account table and a query to find the date for when any user account would expire.
Not wasting time with inventing play-doh data I simply ran:

create column table myusers as (select * from users);

With that in place a UDF was easy to build:

drop function userdetails;
create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
         expiryDate DATE
   select user_name
        , to_date(add_days(valid_from, 180)) as expiryDate
    into userName
      , expiryDate
         user_id = :user_id;
select userDetails (143776).userName
     , userDetails (143776).expiryDate
from dummy;

_SYS_REPO                    21.01.2018

This should be equivalent to this pure SQL statement:

select user_name
     , to_date(add_days(valid_from, 180)) as expiryDate
     user_id = 143776;
_SYS_REPO 2018-01-21

Looks good, I’d say.

Checking on the execution times shows:

UDF (multi-result)

successfully executed in 39 ms 67 µs (server processing time: 37 ms 819 µs)
successfully executed in 42 ms 929 µs (server processing time: 41 ms 60 µs)
successfully executed in 44 ms 13 µs (server processing time: 42 ms 492 µs)

pure SQL multi-result)

successfully executed in 1 ms 191 µs (server processing time: 0 ms 385 µs)
successfully executed in 1 ms 933 µs (server processing time: 0 ms 825 µs)
successfully executed in 2 ms 507 µs (server processing time: 0 ms 827 µs)

While this is a big difference (30 times) it is actually the wrong one for the current question. What we need to look at is the difference between executions with just one or two results.

Running the statements with just the expiryDate output:

UDF (single parameter)

successfully executed in 29 ms 443 µs (server processing time: 28 ms 200 µs)
successfully executed in 30 ms 61 µs (server processing time: 28 ms 610 µs)
successfully executed in 29 ms 586 µs (server processing time: 28 ms 131 µs)

pure SQL (single parameter)

successfully executed in 1 ms 917 µs (server processing time: 0 ms 735 µs)
successfully executed in 1 ms 63 µs (server processing time: 0 ms 308 µs)
successfully executed in 2 ms 864 µs (server processing time: 0 ms 742 µs)

And here we do see an important difference: the pure SQL runtime kept stable while the UDF fetching just a single result did so quite a bit faster.

Phase One “Reproducing the issue” successfully done.

Moving on with the analysis, I played around with the statements and wanted to check what happens when you select a non-existing user:

Trouble with no data

     userDetails (-4711).expiryDate
from dummy;

Could not execute 'select userDetails (-4711).expiryDate from dummy' in 40 ms 99 µs . 
SAP DBTech JDBC: [478]: 
user defined function runtime error: "DEVDUDE"."USERDETAILS": 
line 9 col 5 (at pos 131): 
[1299] (range 3) no data found exception: 
no data found at user-defined function (at pos 8)
pure SQL
     to_date(add_days(valid_from, 180)) as expiryDate
     user_id = -4711;


When there is no matching record we get an error with our UDF. That is likely not what we want, so we need to cater for that in the code.
Here we can use the EXIT HANDLER to catch the ‘NO DATA FOUND’ SQL error.

create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
         expiryDate DATE
declare exit handler for SQL_ERROR_CODE 1299 
     userName := NULL;
     expiryDate := NULL;

select user_name
     , to_date(add_days(valid_from, 180)) as expiryDate
 into userName
    , expiryDate
      user_id = :user_id;

With this change, the error is caught and the UDF returns NULL values. That’s not the same as an empty set, but good enough for this case and a lot better than the error.

      userDetails (-4711).expiryDate
from dummy;


Trouble with tracing

The next step of the analysis should be to actually show that the UDF is executed once for each result.

I looked into the different usual suspects for statement analysis and was rather disappointed:
SQL Plan Cache, Expensive Statement Trace and PlanViz all only show the top-level statement + statistics.

PlanViz shows a “Row Search”-Pop as the final processing node and this contains the projection list but that’s about it.

Name: Row Search
ID: ID_EA341305E062BA47B440FD71F07CA318_0
Execution Time (Inclusive): 19.9 ms
Execution Time (Exclusive): 19.893 ms
Execution Start Time: 6,934.218 ms
Execution End Time: 6,954.174 ms
CPU Time (User): 19.883 ms
Projected Cols: 

So what would be a good way to find out whether the function had been called more than once during the statement execution?
I didn’t want to go some low-level tracing, so I came up with a modification of the function:

Pudding, proofs… is it Christmas yet?

create sequence pcnt;

drop function userdetails;
create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
       expiryDate NVARCHAR (30)
declare exit handler for SQL_ERROR_CODE 1299 
     userName := NULL;
     expiryDate := NULL;

    select user_name|| '_> '|| pcnt.nextval
         , to_nvarchar(add_days(valid_from, 180), 'DD.MM.YYYY') ||'_> '|| pcnt.nextval as expiryDate
     into userName
        , expiryDate
       user_id = :user_id;

I am going to use a sequence to add the current counter for execution to the output values.

In pure SQL the sequence is bumped up once for each output row:

select user_name || '_> '|| pcnt.nextval
 , to_nvarchar(add_days(valid_from, 180), 'DD.MM.YYYY') ||'_> '|| pcnt.nextval as expiryDate
 user_id = 143776;
Exection …
_SYS_REPO_> 30                 21.01.2018_> 30
Next exection…
_SYS_REPO_> 31                 21.01.2018_> 31

That means, if the UDF is really called once per output argument, then we should see increasing numbers for the same result row.
Let’s see what we get:

select userDetails (143776).userName
     , userDetails (143776).expiryDate
from dummy;

_SYS_REPO_> 32               21.01.2018_> 33

Here we got our proof, the function actually is executed twice.

Ok… so what?

Alright, up to here I spend your time and attention on explaining how to find out that HANA does not do the thing we liked it to do.
If that was the end of it, you rightfully would ask, what this was for.

Worry not! There is a good end in sight!

The answer lies with using the UDFs in SQLScript blocks.
But not like this:

do begin
declare _userName   NVARCHAR(256);
declare _expiryDate NVARCHAR(30);
    _userName  := userDetails (143776).userName;
    _expiryDate:= userDetails (143776).expiryDate;
    select _userName, _expiryDate from dummy;
:1              :2 
_SYS_REPO_> 34  21.01.2018_> 35

We have to step up our SQLScript game here.
While we’re already avoiding the stupid SELECT … INTO FROM dummy workaround for assigning function results to variables, it’s obviously calling the function twice now.

Instead, we need to use the function in a single assignment.
How can you assign values to multiple variables in a single assignment?“, you ask?

I did the same until one of our SQLScript developers pointed me to something that is, well, “mentioned” in the documentation. I put “mentioned” in quotation marks, as this non-obvious programming construct is not really explained, but shows up in just one example code (this one)

The solution and answer

do begin
declare _userName   NVARCHAR(256);
declare _expiryDate NVARCHAR(30);
    (_userName, _expiryDate) := userDetails (143776);
    select _userName, _expiryDate from dummy;
:1               :2 
_SYS_REPO_> 36   21.01.2018_> 36

With this multi-result assignment, we not only have just one function call but also much nicer to read code.

There you go, now you know.

Cheers from Mad Max-country,



HANA backtrace 36/2017

What to read? Whom to follow?

One of the important things to pick up in any community is finding out who are the people that usually have good, funny or interesting content. Many community platforms make it easy to find out who are the “leading contributors”, the “heroes”, “aces” etc.Currently, the platform SAP Community is run on struggles with this, big time.
It is actually hard to not miss new blog posts, sometimes even if one has subscribed to a specific author.

So, here is a short list of people that I noticed putting out really good content in the SAP HANA space.
This list is really just what my very biased eye has seen and my colander-like memory has not lost.

If you know more authors or think that your own material should be seen by more HANA-heads then please, do not hold back and add it in the comments section.

Laura Nevin

A SAP colleague up North close to the Niagra Falls working as a Development Architect in the HANA documentation team. Her blog “atrium” does not just have your vanilla product updates but provide the reader with a whole tour around what’s new and noteworthy with the documentation.

I find her blogs particularly well written and produced and silently wish more people (myself included) would publish such nice blogs.

Lucia Subatin

Yes, also a SAP colleague, but from the other end of the world: Argentina.
Lucia has a lot to say and write about the SAP HANA Express edition.
Actually, that’s her job as she is a SAP HANA Evangelist and apparently a very good one.

Jan Zwickel

Jan is a SAP HANA product manager who has written a number of in-depth explanations of the otherwise mysterious “flags” of the graphical calculation views.
Working from SAP’s center of gravity in Walldorf, Germany he is as close to the source as it gets.

Sharyu Sarode

There’s not much that I know about Sharyu, except for that she seems rather experienced in the administration corner of SAP on HANA and that she got a knack for writing about that.

She is not an SAP employee, which makes it even better that she shares her experiences with us. Judging from her profile page, she works in Australia, just like myself.
Her blogs are well prepared and definitively worth the read.


As mentioned, these pointers are just the content I’ve seen and liked here on SAP Community platform.

I am very happy to see that the reports of the death of the platform and the community seem to have been exaggerated and that there are interesting, read-worthy contributions from both SAP and non-SAP authors.

Thanks for that folks!

Other stuff I like to recommend

As there is so much other stuff going on outside the SAP Community let’s have a quick view at that, too.

Weapons of Math Destruction

If you work with HANA, then Big Data, Data Science, and Predictive are your daily bread. Make sure to get an understanding of the social impact this can have.

Both the book and the talk by the author are highly recommendable.

“Badass: Making Users Awesome”

from the famous Kathy-“Head first”-Sierra is a particularly enjoyable and easy read. Well designed book with good ideas on how to create succesful products and not just a bunch of features.

There you go, now you know!

HANA gets stripped

This is going to be one of those weird blog posts, that basically convey just a single piece of information, but since the new SAP Community platform doesn’t really care for WIKI/document style knowledge storage, I had to come up with a click-bait title to make it a blog post.

I’ve written before on installing HANA cockpit and yesterday I fired it up once again.
Snooping around a bit in the tracefiles (running two HANA 2 instances + XSA stacks on my one 32GB NUC always leaves room for questions like “how much actual processing memory do we have left now?“) I found this jewel of an “error curtain”:

[26144]{-1}[-1/-1] 2017-09-12 13:30:34.208039 w Basis ELF.cpp(00770) :
[26144]{-1}[-1/-1] 2017-09-12 13:30:34.208039 w Basis ELF.cpp(00771) :
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section
Using only dynamic symbol table for symbols of ELF file, no string section

[... Buzz-Lightyear continuation mode ... ]

Strip it!

To the average advanced computer user like myself, this looks like error messages.

Yes, these messages are marked as ‘w’ like wwwwwarning and the whole system still seems to be doing its thing, but personally, I don’t really like when I see tons of such messages and find precisely zero documents, SAP notes, KBAs or other information sources that can tell me what they mean.
I have the same issue with looking into the CONSOLE app of my MacBook by the way… #who_thinks_of_the_poor_end_user_nerd?

After some HANA source code reading (that really helps every now and then, especially when the trace file message neatly points to C++ file ELF.cpp, lines 770 and 771) I figured out two things:

  • this is really just an info-warning written out when shared libraries are loaded by a HANA process. That usually happens once, at process startup, e.g. when the nameserver starts up.
  • this specific code has been around for several years now. Yet, I have not seen these messages before, which means that something else (aka not this piece of code) must have changed.

The next best thing from here was to look into the trace files of my other HANA 2 installation.
Guess what! No such messages in any of the trace files.

Ok, I thought, so maybe there is something different with the HANA installation that comes with HANA cockpit.

Fighting with ELFes

(sadly, she has nothing to do with this…) (image

Let’s have a look into that anyway.

The warning message listed the shared objects (.so) files that it seems to have issues with, one of them was
Checking this in my HANA Cockpit installation with the Linux tool <em>file</em> shows:

file /usr/sap/H4C/HDB96/exe/
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV)
, dynamically linked
, BuildID[sha1]=cafedde1f0db3527e5a8f3f69fc75019287f4196
, stripped <<< !

Compare this with my other HANA 2 installation:

file /usr/sap/S20/HDB20/exe/
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV)
, dynamically linked
, BuildID[sha1]=1089da0d45a4fc59a631f0877b0f2786a5282937
, not stripped <<< !

Aha! So, the files actually have been “stripped”.

What that means is that human readable information, so called symbols, about the compiled program in the .so files have been removed.
This information is not required to run the program but is extremely useful for debugging and problem analysis.
Without the information, a crash dump would only contain memory addresses for functions and since dynamic loading is used, these addresses change every time the program is run.
On the other hand, with symbols in the files, a stack trace or a crash dump literally print out the names of the involved functions and methods.
This is what makes the problem analysis actually possible in most cases.

Huh? I mean why?

Peachy“, you think, “why on earth would then somebody strip this important info from the files“?
Great question!

The reason is the file size for the .so files.
Symbols can take quite some space in them. This alone might not be a problem since disk space is cheap, but the symbols also get loaded/mapped into memory when the shared object file is used.

And if there is a single precious resource on HANA systems it is the main memory.

How much can we save with this stripping?
First the unstripped version:

ls -lh /usr/sap/S20/HDB20/exe/
-r-xr-xr-x 1 s20adm sapsys 8.0M Jul 25 05:11 /usr/sap/S20/HDB20/exe/

Now the stripped version:

ls -lh /usr/sap/H4C/HDB96/exe/
-r-xr-xr-x 1 h4cadm sapsys 2.2M Jun 26 19:42 /usr/sap/H4C/HDB96/exe/


Down from 8MB to 2.2MB!
And that is just one of 418 shared objects
(counted on my HANA 2 system via “ls -l *.so | wc -l“).

In total that amounts to quite a difference between the binary sizes for the normal HANA installation and the HANA Cockpit one:

“Normal HANA 2”
ls -lh /usr/sap/S20/HDB20/exe/ | grep total
total 5.2G
“HANA Cockpit”
ls -lh /usr/sap/H4C/HDB96/exe/ | grep total
total 1.7G

Obviously, it has been a goal for the HANA cockpit team to make the additional overhead of having its own HANA instance as small as possible.
There is actually another case where memory usage of HANA needed to be minimized: the HANA Express edition.

After quickly firing up my HXE virtual machine (gee, I really love my NUC system for making this so easy) I checked whether it also uses symbol stripping and lo and behold, it does:

“HANA Express edition”
ls -lh /usr/sap/HXE/HDB90/exe/
-r-xr-xr-x 1 hxeadm sapsys 2.2M Jul 24 19:16 /usr/sap/HXE/HDB90/exe/

file /usr/sap/HXE/HDB90/exe/
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV)
, dynamically linked
, BuildID[sha1]=1089da0d45a4fc59a631f0877b0f2786a5282937
, stripped <<< !

For those interested to learn more about symbols and shared libraries here are some pointers:

Finally, to go really down into the deep end of this rabbit hole, can we actually see that HANA uses less memory now?

Yes, we can!

The first option, on OS level:

“HANA Express”
hxeadm@hxehost:/usr/sap/HXE/HDB90> HDB info
hxeadm 1707 1 0.0 21628 2792 sapstart pf=/usr/sap/HXE/SYS/profile/HXE_HDB90_hxehost
hxeadm 1716 1707 0.0 217104 60060 \_ /usr/sap/HXE/HDB90/hxehost/trace/hdb.sapHXE_HDB90 ...
hxeadm 1739 1716 9.5 6880612 5089404 \_ hdbnameserver

We take the process ID of the nameserver and use the pmap tool with it.
As the man page explains, pmap displays information about process memory mappings.

hxeadm@hxehost:/usr/sap/HXE/HDB90> pmap -x 1739

1739: hdbnameserver
000055b587d67000 1932K    1928K    1928K      12K   0K      0K r-xp .../hdbnameserver
000055b587f4b000  248K     248K     248K     248K   0K      0K r--p .../hdbnameserver
000055b587f89000    4K       4K       4K       4K   0K      0K rw-p .../hdbnameserver
00007f0be3758000  712K     520K     288K       8K   0K      0K r-xp .../
00007f0be380a000   16K      16K      16K      16K   0K      0K r--p .../
00007f0be380e000    4K       4K       4K       4K   0K      0K rw-p .../
00007f0c54af6000    4K       4K       4K       4K   0K      0K rw-p [anon]
00007ffccc580000  136K      68K      68K      68K   0K      0K rw-p [stack]
00007ffccc5bc000   12K       0K       0K       0K   0K      0K r--p [vvar]
00007ffccc5bf000    8K       8K       0K       0K   0K      0K r-xp [vdso]
ffffffffff600000    4K       0K       0K       0K   0K      0K r-xp [vsyscall]
Total:        6880616K 5092916K 4825683K 4633372K   0K      0K

5870672K writable-private, 992656K readonly-private, 17288K shared, and 4621300K referenced

(full path to objects /hana/shared/HXE/exe/linuxx86_64/HDB_2.

The output from pmap is usually very long and wide, so I cut out a lot here and shortened the file path.
The important bit is 17288K (~16 MB) shared in the last line.

s20adm@skullbox:/usr/sap/S20/SYS/exe/hdb> HDB info
s20adm 20398 1 0.0 21600 228 sapstart pf=/usr/sap/S20/SYS/profile/S20_HDB20_skullbox.lab.cat5
s20adm 20407 20398 0.0 322428 580 \_ /usr/sap/S20/HDB20/skullbox.lab.cat5/trace/hdb.sapS20_HDB20 ...
s20adm 20429 20407 11.8 9131476 2749264 \_ hdbnameserver

s20adm@skullbox:/usr/sap/S20/SYS/exe/hdb> pmap -x 20429
20429: hdbnameserver
000000679a5a6000 1932K 240K 240K    8K   4K      4K r-xp .../hdbnameserver
000000679a78a000 248K   24K  24K   24K 224K    224K r--p .../hdbnameserver
000000679a7c8000 4K      4K   4K    4K   0K      0K rw-p .../hdbnameserver
00007fd36a355000 76K     0K   0K    0K   0K      0K r--s .../
00007fd36a368000 32K     0K   0K    0K   0K      0K r--s .../
00007fd36a370000 648K    0K   0K    0K   0K      0K r--s .../
00007fd36a412000 1368K   0K   0K    0K   0K      0K r--s .../
00007fd36a568000 116K    0K   0K    0K   0K      0K r--s .../
00007fd36a585000 156K    0K   0K    0K   0K      0K r--s .../
00007fd36a5ac000 80K     0K   0K    0K   0K      0K r--s .../
00007fd46f0d7000 4K      4K   4K    4K   0K      0K rw-p [anon]
00007ffdd3c51000 136K   28K  28K   28K  40K     40K rw-p [stack]
00007ffdd3dd6000 12K     0K   0K    0K   0K      0K r--p [vvar]
00007ffdd3dd9000 8K      8K   0K    0K   0K      0K r-xp [vdso]
ffffffffff600000 4K      0K   0K    0K   0K      0K r-xp [vsyscall]
Total:          8925232K 2753308K 2717475K 2491696K 3128688K 3023716K

5720584K writable-private, 994368K readonly-private, 2210280K shared, and 2569084K referenced

(full path to objects /hana/shared/S20/exe/linuxx86_64/HDB_2.

Same story here, but here we have 2210280K (2.1 GB) shared data.

The HANA way

Another approach to compare the memory usage is of course to query the HANA monitoring views:

round (value/1024/1024, 2) as value_MB
                               HANA 2    HANA Express
NAME                          VALUE_MB       VALUE_MB
SYSTEM_MEMORY_SIZE           32063.550      23546.860
SYSTEM_MEMORY_FREE_SIZE        272.140        246.680
PROCESS_MEMORY_SIZE           8652.050       6717.480
PROCESS_RESIDENT_SIZE         2110.720       4960.260 <<< ~ Linux RSS
PROCESS_CODE_SIZE             2017.950          3.580 <<< ~ Linux shared
PROCESS_STACK_SIZE             146.50          99.690
PROCESS_ALLOCATION_LIMIT     28857.190      21192.170
GLOBAL_ALLOCATION_LIMIT      28857.190      21192.170

More about HANA memory can be found in SAP Note 1999997 – FAQ: SAP HANA Memory

If you see warning about the “using only dynamic symbol table” now, you can put this into perspective.

There you go, now you know.