How to know “all these things”

There is one predominant reason why databases and database technology get a spot in the conversations of “the business”: performance.
Understanding system performance and coming up with ideas for improving it typically requires an end-to-end understanding, at different levels of detail, of how a query from a client application is processed by SAP HANA. Before long, one looks into “explain plans”, “execution paths”, SQL, PlanViz diagrams, data distributions, table definitions, stack traces and even NUMA process management on Linux/hypervisor level. That is a lot of concepts and layered technology, and most of those concepts come with a development history of at least 30 years, if not more.
This means that prior knowledge is not only implied but required to get a proper understanding of what concepts are put to work in your specific setup. Intuition about how the system works is not your friend here.

But where do you learn about all these things? The product documentation? A university SQL 101 class? Blog posts from strangers on the Internet? Books? Or a MOOC with a certificate?

Personally, I’m always happy to find good collections of material, maybe even curated into topics. One of those collections is The Red Book (Readings in Database Systems), which covers all important topics of Database Management Systems. It even features a chapter “3. Techniques Everyone Should Know” in case you really only want to have a quick overview.

I read about this awesome (and free) online book in a blog that I can only recommend: The Morning Paper by Adrian Colyer. He linked to the Red Book in his post (Database) Techniques Everyone Should Know.
There are also many more recent posts about database related papers, like Learned Index Structures or the design and implementation of modern column-oriented database systems, which explain current developments in DBMS technology that will probably end up as features in one of the big DBMS systems. The great thing about Adrian’s blog is that it is a kind of Cliff’s Notes for computer science papers. One does not have to actually read the paper to get the core ideas; but Adrian always makes the paper available to read for oneself.
While I am mentioning this blog, I would like to say “Thank you!” to Mr. Colyer, whom I have never met but whose outstanding blog I have certainly benefited a lot from.

A long comment

Last Monday I read DJ Adams’ blog post about GraphQL and how this tool might be a useful addition to the toolbox of SAP UI5 developers.
There was a lively discussion going on about the blog’s topics, most of it around the GraphQL technology and its potential usage in UI5.
I added a comment to this and Mr Adams used it to hook me in for a blog post of my own.
So here we are.

As someone who read through the discussion somewhat uninvolved, since I am not doing any front-end development, what struck me was its narrow focus.
Why was there no question about how this added tool would likely change the development projects and teams? Reading a bit deeper into the linked blog post from Jeff Handley, a lot of the arguments are not about technology, features and implementation details but about how the tool helps the product development teams to do their jobs better.
To me, it seems like a rather typical SAP technology discussion, where “THE TOOL”, “THE PLATFORM” or “THE SUITE” is the bit that will make the difference between successful developments and the other ones.

Now, while I am a sucker for great tools like the next person, I am in the corner of folks that believe “it’s the photographer that makes the great picture – not the camera“. Technology can help with many problems, but if one does not have a clear understanding of what problem needs solving, simply picking up the latest-greatest-shiniest tool likely means the wrong tool is used and/or used in the wrong way.

One example I mentioned in my comment hinged on the “declarative” nature of GraphQL. I do not know how GraphQL works specifically but I have some experience with how declarative programming works in many SAP shops.
The main declarative language present in these organisations is SQL. ABAP programs use it, SAP HANA models have it (in some way) and the analytics teams are constantly building “queries”.

My point is that this typically does not work too well.
Invariably I come across data models and queries that show the developers tried to “tell the database what to do“. And that even though the very same developers commonly have neither a proper understanding of how the database works nor of how to find out what it actually does in the first place.
Still, “that index needs to be used!” and “this hint is crucial for the performance“!

I cannot claim to know for sure why this is, but I got some suspicions:

  • often ABAP/SAP developers moved into their roles horizontally, from either the “tech support” or the “business support” team. That means great familiarity with the organisation’s landscape and processes, but it does not guarantee a firm grasp of the concepts implemented in the technology.
  • education and learning of abstract concepts is not a common activity in SAP development teams as there is no immediate payoff.
  • the official SAP technology documentation and training is focused on functions and features and actively avoids concepts and broader ideas.
  • training content often focuses on simplistic examples, leaving out complex real-world scenarios

This is not to say that it is the development team’s fault – but a lot of the IT technology that SAP has embraced in recent years requires a comprehension of the underlying concepts. Most of those concepts tend to be more involved than computation “mechanics” and cannot be intuitively understood correctly.
“Learning by example”, which I would bet is the most prevalent learning style in SAP environments, only goes so far if one is trying to get a hold on new non-trivial ideas.

Coming back to the GraphQL discussion: to me this looks like another tool that will be “used the wrong way without error messages”, if at all. Will the majority of development teams be able to realize the same benefits as the Concur team? I doubt that.
Will the choice of tool change the outcome in a way that actually matters to the organisation? Probably not.

If one squints now, it is pretty obvious that this is the “full-stack developer is dead” argument from some years ago.

So why would I be putting this into the ring here? Because it holds true so well – especially with SAP tech environments.
We (SAP) have tried to hammer in the idea of “push down” and “processing where the data sits” for years now; still, most teams I encounter think “business logic” == “ABAP/JavaScript” and are subsequently surprised when the DB access does not deliver the expected speed; after all, it’s the new tool, HANA, right?
Understand that this is just one example of the issue here.
It could have also been ABAP OO, SOAP architecture, or Virtual Data Models. Common with all these tools is that they are going to be used by the very same people that used to do “things the proven way” for many good years.
Common also is that the creators of those tools usually have a very different view and probably assume too much conceptual overlap with their audience. If you ever listened to one of the gurus of “Graph DBs”, “Docker” or “Haskell” you probably know that feeling of conceptual dissonance from your day to day work.

This gives a lead into another aspect of the discussion around DJ Adams’ blog post: the argument that it would “help to get more developers to SAP platforms” if GraphQL (or any other currently popular tool) would be available.
When I read claims like that I wonder if there are any organisations that use IT as part of their established infrastructure and that then go and swap out major parts of it, because some developers like the new parts better.
In my personal experience that never happens. Sure, the new marketing hero gets the reporting frontend s/he needs to be successful – but that is closely tied to tangible results.
If developers cannot show how the end result will be (very) positively affected in concrete, preferably short terms, then existing tools tend to stay in place.

Change is hard, slow and expensive and the ticket for this journey really must be worth it. And commonly the new lands look a lot nicer from a good distance.

Finding answers on workdays

Cunningham’s Law is the idea that “the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.”

More often than not I feel that some of the answers provided to questions on the SAP Community platform follow this approach, probably being unaware that there is a name for this strategy.

[xkcd: “Duty Calls”]

The question “SAP HANA Exclude weekends while calculating diff between two dates” is such a question.

“I have requirement to find out the Number of hour between two dates, but we have to exclude the weekends (Saturday & Sunday).

Example : DATE1 is 19-July and DATE2 is July-26 – Actual diff is – 168 Hours.

Required Output is (168 – 48 Hours (Saturday & Sunday) – 120 Hours.

This I want to achieve using Graphical or Script Cal view, please help me to solve the issue”

The OP did not bother googling for solutions to this very common problem, nor does (s)he seem to have access to classic SQL development literature, like “Joe Celko’s Thinking in Sets“.

This is one of those “throw the requirement over the fence” questions and really should have been moderated.
However, as so often, an eager answerer offered his help and provided the following code to solve the problem:

DO
BEGIN
    declare v_i integer;
    declare v_count integer;

    sel1 = select
                "0CALDAY" as "CALDAY",
                "Yesterday" as "YESTERSDAY",
                sum("DATE_DIFF") AS "DATE_DIFF",
                sum((select 0 from dummy)) as "HOUR_DIFF",
                sum((select 0 from dummy)) as "WRK_HOUR_DIFF"
           from "_SYS_BIC"."ZTABLE"
--         where "0CALDAY" >= '20180701'
           GROUP BY "0CALDAY", "Yesterday";

    select count(*) into v_count from :sel1;

    for v_i in 1..v_count do
        WHILE to_date(:sel1.CALDAY[v_i]) < to_date(:sel1.YESTERSDAY[v_i]) do
            if( weekday(to_date(:sel1.CALDAY[v_i])) ) < 5
            then
                sel1.WRK_HOUR_DIFF[v_i] = :sel1.WRK_HOUR_DIFF[v_i] + 24;
                sel1.HOUR_DIFF[v_i]     = :sel1.HOUR_DIFF[v_i]     + 24;
            else
                sel1.HOUR_DIFF[v_i]     = :sel1.HOUR_DIFF[v_i]     + 24;
            end if;
            sel1.CALDAY[v_i] = ADD_DAYS(to_date(:sel1.CALDAY[v_i]), 1);

        end while;
    end for;

    select * from :sel1;

END;

Reading through this approach, several problems stick out:

  • this really shouldn’t require SQLScript at all and definitely not a loop
  • using ‘SELECT 0 from DUMMY’ to generate a constant value of integer 0 is plain wrong
  • summing constant 0 over a grouping set is plain wrong as well – the result is a constant 0
  • removing the aggregation allows for removing the grouping altogether: a SELECT DISTINCT would do the trick here
  • the number of records in sel1 depends solely on the number of different values in ZTABLE.CALDAY, a table that has not been defined in the question
  • this number of different CALDAY values is used to loop over and add 24 hours to the sel1 table variable columns HOUR_DIFF and WRK_HOUR_DIFF to keep a tally of hours between dates, which could have been done with SAP HANA SQL functions directly

This looks very much like the kind of SQL code written by developers who are not “at home” with SQL and instead push an imperative programming style into their SQL code.

This code is inefficient, slow and hard to maintain.
In short: don’t do it like this.

Alternative solutions

For this type of common problem, the first approach really should not be to develop the solution yourself, but instead to look for existing standard solutions.
The mentioned book uses so-called auxiliary tables for this (see chapter 9.5 Calendar Tables), and so do SAP NetWeaver and the freestyle solution presented in “Want to Calculate WORKDAYS_BETWEEN without TFACS Table?”.

The core of this solution approach is to maintain a table of all dates that indicates whether or not a given date falls on a weekend. Such a table could even hold the information whether a given date is a working day or a public holiday.
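A rough sketch of what such an auxiliary table could look like (table and column names are made up for illustration purposes):

-- a minimal sketch of an auxiliary calendar table
CREATE COLUMN TABLE calendar_days (
    calendar_date DATE PRIMARY KEY,
    is_weekend    TINYINT NOT NULL,  -- 1 = Saturday or Sunday
    is_holiday    TINYINT NOT NULL,  -- 1 = public holiday
    is_workday    TINYINT NOT NULL   -- 1 = working day
);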

Looking at the requirement of “excluding weekends” when counting hours, being able to discern workdays from holidays seems like a closely related requirement.

Let’s stick to the case of simply excluding the weekends which the OP defined as Saturday and Sunday.
When working on such a requirement it is particularly important to consider the calendar context in which the solution should work.
Certain calendars (e.g. the Israeli calendar) have Friday and Saturday as their weekend.
When the underlying requirement actually is “hours of working days” this might become more complicated with different calendars. Consider, for example, that daylight saving time shifts effectively add/subtract hours – commonly over weekends, but sometimes also on working days.
The OP hasn’t mentioned anything beyond the description of a mechanical calculation problem, so we cannot know what the underlying requirement really was.

Anyway, a quick solution for the simple counting of hours on days other than Saturday and Sunday would be to use the SAP HANA built-in table “_SYS_BI”.”M_TIME_DIMENSION”.
This table provides dates in different formats, such as DATE_SQL, as well as a day-of-week number DAY_OF_WEEK_INT (Monday = 0, Sunday = 6).

With data generated in the table, the original question can be answered as “number of non-weekend days between the given dates multiplied by 24 hours”:

    select count(*) as no_of_workdays,
           count(*) * 24 as no_of_hours
    from (
            select
                date_sql,
                day_of_week_int
            from "_SYS_BI"."M_TIME_DIMENSION"
            where day_of_week_int not in (5, 6)
            and date_sql between to_date('19.07.2018', 'DD.MM.YYYY')
                             and to_date('26.07.2018', 'DD.MM.YYYY')
        )
    ;

    NO_OF_WORKDAYS	NO_OF_HOURS
    6             	144

Note how this differs from the OP’s calculation by one day.
Checking the inner query gives us:

    DATE_SQL  	DAY_OF_WEEK_INT
    19/07/2018	3
    20/07/2018	4
    23/07/2018	0
    24/07/2018	1
    25/07/2018	2
    26/07/2018	3

This means that the OP miscalculated the hours in the example by not including the last day of the given range.
This points to another common trap when dealing with dates and intervals: one needs to be clear about whether interval end dates should be included or excluded.
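As a small illustration (assuming the same M_TIME_DIMENSION data as above), the end-exclusive variant that matches the OP’s expected 120 hours only needs a changed date filter:

    select count(*) as no_of_workdays,
           count(*) * 24 as no_of_hours
    from "_SYS_BI"."M_TIME_DIMENSION"
    where day_of_week_int not in (5, 6)
    and   date_sql >= to_date('19.07.2018', 'DD.MM.YYYY')
    and   date_sql <  to_date('26.07.2018', 'DD.MM.YYYY');
    -- expected: NO_OF_WORKDAYS = 5, NO_OF_HOURS = 120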

The same approach can be taken with graphical calculation views:

[Example calculation view with filter expression]

SELECT
    count(distinct "DATE_SQL") as no_of_workdays,
    count(distinct "DATE_SQL") * 24 as no_of_hours
FROM "_SYS_BIC"."sandpit/EXCLUDE_DATES";

/*
NO_OF_HOURS
144
*/

The broader view

I mentioned above that this common problem can be broadened to “separate working days from non-working days”.
The SAP business applications have been dealing with such problems for many decades now and so it’s not surprising that there is indeed a standard solution available for this.
This standard solution comes in the form of “FACTORY CALENDARS”.
SAP NetWeaver table TFACS holds one row for each year and each calendar.
Every row contains twelve columns, each holding a “bitfield” of ‘1’ and ‘0’ characters, one character for each day of the month.
A ‘1’ indicates a working day and a ‘0’ denotes a non-working day.

There are some other columns available, but for our purpose, this is what we need to know.

IDENT JAHR MON01                           MON02                        ... MON12
AU    2018 0111100111110011111001111000111 1100111110011111001111100111 ... 0011111001111100111110010010110
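Just to illustrate how the bitfield is read, here is a small sketch that checks a single date against this row (using the TFACS definition shown further down): MON07 holds July, and the character position within the string corresponds to the day of the month.

-- is 2018-07-19 a working day in calendar 'AU'?
SELECT CASE SUBSTRING("MON07", 19, 1)
            WHEN '1' THEN 'working day'
            ELSE 'non-working day'
       END AS day_type
FROM TFACS
WHERE IDENT = 'AU'
  AND JAHR  = '2018';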

Typically this table is re-used or replicated from the SAP NetWeaver environment so that all date calculations are using the same calendar definitions.
If you want to use the table without an SAP NetWeaver system you could use the following commands to create it yourself:

create row table TFACS
( IDENT NVARCHAR(2) not null  -- factory calendar
, JAHR  NVARCHAR(4) not null  -- year stored
, MON01 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON02 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON03 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON04 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON05 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON06 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON07 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON08 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON09 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON10 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON11 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON12 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, BASIS NVARCHAR(5) not null  -- start of factory day numbering
, FENUM NVARCHAR(3) not null  -- number of days in year
, WENUM NVARCHAR(3) not null  -- number of workdays in year
, LOAD  NVARCHAR(1) not null  -- load year in buffer flag ('X' = YES)
, primary key (IDENT, JAHR));

As an example I entered one row with the working days of Australia/Victoria in 2018:

insert into tfacs values (
  'AU', '2018'
--          1         2         3
-- 1234567890123456789012345678901
, '0111100111110011111001111000111' -- january
, '1100111110011111001111100111'    -- february
, '1100111110011111001111100111100' -- march
, '001111100111110011111001011001'  -- april
, '1111001111100111110011111001111' -- may
, '100111110001111001111100111110'  -- june
, '0111110011111001111100111110011' -- july
, '1110011111001111100111110011111' -- august
, '001111100111110011111001111100'  -- september
, '011110011111001111100111110011'  -- october
, '11001111100111110011111001111'   -- november
, '0011111001111100111110010010110' -- december
, ' '
, ' '
, ' '
, ' ');

With this table in place, SAP HANA provides the SQL function WORKDAYS_BETWEEN (see also the SAP HANA Academy video), which returns the desired number of working days directly:

SELECT WORKDAYS_BETWEEN('AU', to_date('19.07.2018', 'DD.MM.YYYY')
                            , to_date('26.07.2018', 'DD.MM.YYYY')) "no of working days"
FROM dummy;

/*
no of working days
5
*/

Note that here, the “between” condition is understood as “workdays computed starting on the 19th and finishing on, but not including, the 26th” (see the documentation on this), which seems to match the understanding of the OP.
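If the end date should be counted as well, as in the simple hour counting above, one option is to extend the interval by one day; a small sketch:

SELECT WORKDAYS_BETWEEN('AU', to_date('19.07.2018', 'DD.MM.YYYY')
                            , ADD_DAYS(to_date('26.07.2018', 'DD.MM.YYYY'), 1)) "incl. end date"
FROM dummy;
-- with the calendar row above this should return 6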

The benefits of using calendar tables over pure function approaches are clear:

  • it’s easy to understand which days are considered working days and which days are not
  • multiple calendars can be maintained and used in parallel (different years, countries, union agreements, operating shifts, etc.)
  • the SQL/application code does not need to be changed when the calendars need updating

There you go, now you know.

Trouble with time?

There are a couple of topics that are an ongoing source of confusion even for experienced IT professionals and developers.
Unicode handling is one example (see here for info), floating point calculations are another one (this is a good primer), and then there is everything to do with time.

Dealing with dates, time, time zones and the like is confusing and counter-intuitive in itself; adding computers to the mix doesn’t make this any better.

This post is going to shine some light on what goes into date/time handling in SAP HANA and tries to answer some questions that I came across recently.

The big picture

The SAP HANA DBMS obviously doesn’t come with its own timekeeper (though I think it would make a nice gift for HANA DBAs) but uses date/time information provided by the Linux operating system whenever it needs to know the current time.

Linux gets its time from different possible sources, starting with the hardware clock (aka BIOS clock) and, if configured, the ntpd(aemon).
The man-pages on this (hwclock, date, tz, adjtimex, /proc/timer_list) are an interesting read.

A super-short version of the story here is: when the computer is started, the current time is provided by the hardware clock, which is kept running by the battery on the mainboard. This time does not have a time zone assigned to it – it is open to interpretation.
At some point, Linux starts up, reads the hardware clock and sets the current system time and date considering the time zone that was declared for this computer when Linux was installed (yes, one can change this later on).
From here on the hardware clock is out of the picture. All system requests concerning current date/time are served from the Linux managed clock(s) – the System time – and likely continuously updated via the ntp-daemon.
It is possible to check the current hardware time (output in ISO 8601 format):

> sudo hwclock --utc
  2018-03-26 17:57:36.608111+1100

The +1100 is the time zone offset that the system time uses/assumes. The hardware clock does not ‘know’ what timezone it is in, so specifying either --utc or --localtime and checking which one results in the correct local time is crucial here. Once again, the man-page for hwclock has the whole story.

Comparing this to the output of date:

> date
  Mon Mar 26 11:05:37 AEDT 2018

we learn the system time is set to use the Australian Eastern time zone.
Another way to check which timezone has been set on system level is to check where the file-system link /etc/localtime points to:

> ls -l /etc/localtime
  lrwxrwxrwx 1 root root 39 Mar 12 18:48 /etc/localtime -> /usr/share/zoneinfo/Australia/Melbourne

The /usr/share/zoneinfo folder is the system timezone database directory.
All time zones, including Daylight Saving Time (DST), that the Linux system knows about, can be found here. Since both time zone definitions, as well as DST rules, change regularly (see Time Zone News), these files are commonly updated during Linux updates.
We will come back to the topic of Daylight Saving Time later on – it is yet another can of worms.

The time of the day

Here we are, with Linux booted up and ready to go. We can log in now and ask for the current time and date (as seen above):

> date
  Mon Mar 26 18:17:07 AEDT 2018

But what if my user should use a different time zone?
To do that, the Linux shell provides a setting in the form of an environment variable named TZ.
Setting the variable to one of the time zones in the timezone database directory allows programs to return the date in the local time of choice:

> export TZ=UTC+0
> echo $TZ
  UTC+0
> date
  Mon Mar 26 07:18:00 UTC 2018

Any program started from a shell with this setting will now work with the time zone set to UTC.

Becoming a profiler

If this setting should be made permanent (just for the current user) the environment variable has to be set in the shell profile of the user.
The shell profile is an auto-start script that contains the commands the shell should execute every time it gets started.
For the common Bourne shell this script file is called .profile.
As the name starts with a dot, it is a so-called hidden file, but using ls -a in the user’s home directory lists it nevertheless:

> cd ~    # change into the user's home directory
> ls -la  # list all files, including the 'hidden' files

 drwx------ 1 root      root     730 Mar 26 05:53 .
 drwxr-xr-x 1 root      root     200 Jan 31 00:51 ..
 [...]
 drwx------ 1 root      root      16 Jun 13  2017 .hdb
 -rw------- 1 root      root     146 Mar 26 07:15 .lesshst
 drwx------ 1 root      root      10 Jun  6  2017 .local
 drwx------ 1 root      root      34 Sep 11  2017 .mozilla
 -rw-r--r-- 1 root      root      80 Jan  3  2017 .profile   << this one
[...]

If the line export TZ=UTC+0 is added to the .profile-file then the new time zone will be set for the user every time the shell is used.

Now, for Linux based SAP systems things are a tad more … flexible. And with “flexible” of course, I mean complicated.
Putting all settings into just a single file would make managing system- or organisation-wide settings rather tedious.
Maybe some setting should be user-specific, while others shall apply to all users of a system and so on.
For that, there is the option to call other .profile scripts.

Cutting to the chase, for the SAP HANA operating system user <sid>adm we’re looking at the following profile scripts:

 .profile  - standard autostart-file for Bourne shell
 .login    - C-shell version

 .bashrc   - init script for the BOURNE and KORN Shell
 .cshrc    - standard profile for the C-shell (same content, but slightly different syntax)

 .sapsrc.sh  - looks for .sapenv scripts and executes them
 .sapsrc.csh - C-shell version

 .sapenv_<hostname>.sh  - SAP system environment setting - specific to the local host
 .sapenv_<hostname>.csh - SAP system environment setting - specific to the local host - C-shell version

 .sapenv.sh  - SAP system environment setting
 .sapenv.csh - C-shell version

 .customer.sh  - User specific settings
 .customer.csh - C-shell version

Most of these files are created by the installation program for SAP HANA.
When we log in as <sid>adm the files get executed in the following order:

(1) .profile -> (2) .bashrc -> (3) .sapsrc.sh  -> (4) sapenv_<hostname>.sh -> (5) HDBSettings.sh (for HANA systems only)
                            |              OR  -> (4) sapenv.sh
                            +> (6) .customer.sh

Looking at this impressive entanglement you might not be too clear about where to put the time zone setting.

The answer, fortunately, is easy, as nearly all the script files come with a warning in a header comment, telling you to not edit it manually.

This leaves the .customer.sh as the target file for all customization including time zone settings and additional paths or environment settings for Smart Data Access (SDA) connections.
Fortunately, somebody put a comment into .profile to make clear that all custom settings will be found in .customer.sh. That way, even the SAP-unaware Linux admin has a fair chance to figure out in which file settings are done.

SAP HANA zones in time

At this point, we can configure the system’s hardware clock, the Linux system clock and the timezone that SAP HANA should use for anything where the “local” timezone is used.

This setting sets the home-timezone for the SAP HANA installation. Important here is to pay attention to the term “installation”.
Since tenant databases became the standard for SAP HANA, it has also become possible to use “strong isolation” mode, separating the SAP HANA processes of different tenant databases on Linux level. Each tenant database can run in its own Linux user security context.
It is easy now to assume that, since the databases run in their own Linux user accounts, they could each have their own environment with separate timezone settings.
This is, however, not the case, since all tenant databases of an SAP HANA installation share the same nameserver (hdbnameserver) process. And it is the environment of this nameserver process that is shared by all tenant databases.

Therefore, all databases of an SAP HANA installation share the same timezone (as well as the same OS setup for external libraries, like 3rd party DBMS clients, etc.).
If one needs SAP HANA databases with different time zones on the same system, a separate installation is necessary. (state as of SAP HANA 2 SPS 02)

What’s it all used for – SAP system timezone handling

All of the mechanisms described so far are in place to provide Linux programs access to a current time that makes sense to the user of the system.

For shared programs like SAP HANA, it is debatable what the current local time should be since users can access the system from around the globe out of completely different time zones.
Whatever time zone you decide upon (often it is simply the local timezone of where the data center is located, the timezone where most of the users are, or UTC), you should probably pick a standard timezone without DST, since the time changes for summer and winter time lead to timestamp gaps and duplications, which can be especially confusing when trying to read log files.
Also, in an SAP system landscape, say with SAP NetWeaver systems based on SAP HANA, the NetWeaver application servers and SAP HANA have to have the same time, including the timezone.

Up to here, all we’ve looked at was telling the local current time. In SAP HANA one can use functions like now(), current_time(), current_date(), etc. to get the current SAP HANA system time. In many cases, there is also a UTC counterpart for these functions, e.g. current_utctime() if your application should rather work with the UTC timestamps.
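A quick way to see the difference between the local and the UTC variants is to select them side by side (a simple sketch; the offset obviously depends on the timezone of your installation):

SELECT CURRENT_TIMESTAMP    AS local_timestamp,
       CURRENT_UTCTIMESTAMP AS utc_timestamp,
       CURRENT_DATE         AS local_date,
       CURRENT_UTCDATE      AS utc_date
FROM dummy;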

SAP HANA also provides conversion functions, utctolocal()/localtoutc(), to convert between local time and UTC date/timestamp information.
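As a small sketch of these conversion functions (AUSNSW is one of the timezone names from the ‘sap’ dataset shown further below; any timezone known to the system can be used):

SELECT UTCTOLOCAL(TO_TIMESTAMP('2018-03-26 07:18:00'), 'AUSNSW') AS local_ts,
       LOCALTOUTC(TO_TIMESTAMP('2018-03-26 18:18:00'), 'AUSNSW') AS utc_ts
FROM dummy;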

Note that the documentation strongly discourages storing data in the local time zone. SAP HANA date/timestamp data types don’t store the used timezone together with the data, so the application reading the data would need to keep track of that.
Instead, data should be stored as UTC and, if local date information is required, the conversion functions should be used.

Let’s stop at this point and take a look at what goes into this conversion.
I mentioned above that time zones and the rules for Daylight Saving Time (DST) are in constant change.
How can SAP HANA know what the currently correct mapping is?

In You got the time? I explained that SAP HANA can use built-in defaults or use a special set of tables that contain the current mapping information.

These tables and more information can be found in SAP note 1791342 – Time Zone Support in HANA.

To make sure the timezone data has been loaded without problems, the view M_TIMEZONE_ALERTS can be used.
If SAP HANA is started without the timezone tables loaded, this view returns:

HOST        PORT    STATUS              TIMEZONE_NAME   MISMATCH_BEGIN  MISMATCH_END
skullbox    31203   TABLES NOT FOUND    ?               ?               ?

DETAILS
The following timezone tables could not be found or are not readable: SYSTEM.TTZD, SYSTEM.TTZDF, SYSTEM.TTZDV, SYSTEM.TTZZ

HOST        PORT    STATUS              TIMEZONE_NAME   MISMATCH_BEGIN  MISMATCH_END
skullbox    31204   TABLES NOT FOUND    ?               ?               ?

DETAILS
The following timezone tables could not be found or are not readable: SYSTEM.TTZD, SYSTEM.TTZDF, SYSTEM.TTZDV, SYSTEM.TTZZ

Per default, SAP HANA looks for these tables in the SYSTEM schema and expects the entries to be relevant for client 001.
If you want to use a different set of tables, you can adapt these configuration parameters:

= Configuration
  Name                                  | Default    | System
  indexserver.ini                       |            |
    global                              |            |
      timezone_dataset                  | sap        |
      timezone_default_data_client_name | 001        |
      timezone_default_data_schema_name | SYSTEM     |

Important to note here is that there is only one set of conversions possible per SAP HANA database, even if you have multiple SAP clients in the database.
Likewise important: these tables need to be loaded into every tenant database and the SystemDB.
The tenant DBs cannot use the tables from the SystemDB or from another tenant DB.

Yet another caveat with the SAP HANA timezone conversion functions is that they don’t take a reference date, but always work based on the currently available conversion rules. If your application requires historical timezone conversions, then this needs to be implemented separately.

In order to check which time zones are currently known to SAP HANA, the system view TIMEZONES is available:

select * from timezones;

TIMEZONE_NAME     	TIMEZONE_DATASET
[...]
AUSNSW            	sap
ALAW              	sap
UTC               	sap
AUSSA             	sap
PST_NA            	sap
BRZLWE            	sap
AUSNT             	sap
UTC-7             	sap
Africa/Abidjan    	platform
Africa/Accra      	platform
Africa/Addis_Ababa	platform
Africa/Asmera     	platform
[...]

Time zone conversion and Daylight Saving Time (DST)

At a glance, this whole topic of timezone conversions can look very simple. After all, it is simply adding or subtracting some hours between two time zones, isn’t it?

To make the life of time zone converters more interesting, Daylight Saving Times have been introduced in some countries for some time. When referring to “local time” this usually includes any DST rules, while the term “Standard Time” indicates that no DST rules are applied.
The major complication with DST is not so much that the number of hours to be added or subtracted changes based on the (local) day of the year, but that the mapping between timestamps with and without DST is not contiguous.

Often we think of clock-timestamp values as continuously increasing values with no gaps. It’s always one second, and another and another second.

This leads to implicit assumptions like:

  • every possible timestamp between 00:00:00 and 23:59:59 will occur.
  • every timestamp will occur only once.

Both of those assumptions are false on the days when the time changes from summer to winter time or back again.
When changing from summer to winter time, the clocks are set back, e.g. from 03:00:00 to 02:00:00.
Every second between 02:00:00 and 02:59:59 will happen twice that night.

On the change back, winter to summer time, clocks are set forward, jumping from 01:59:59 directly to 03:00:00 and skipping all timestamps between 02:00:00 and 02:59:59.

Here we see why it is so important to use standard time for storing timestamps (also in filesystems and trace files, etc.): DST makes timestamps ambiguous.

There are more things wrong with the simplified model of timestamps, e.g. the need for leap seconds to adjust for drifts of the defined day length (one full turn of our planet).
“Counting”-based timekeeping systems like UNIX epoch time have to deal with leap seconds, leap years, centuries etc. Covering these is beyond this post, but reading up on it is highly recommended.

More problems on a stack

Knowing all this about timezones and conversions between time zones with and without DST, we should be able to use timestamp data correctly in our programs with SAP HANA.
And yet, questions like “Problem with “summer” time (DST)” come up and even I wasn’t able to immediately pinpoint the cause of the problem.

At face value, the OP found what looked like a bug in SAP HANA’s JDBC driver.
When querying the database via JDBC certain timestamps did not seem to get returned correctly, even though the SAP HANA system timezone was set to UTC:

java -jar ngdbc.jar -u ...
     -c "select to_seconddate('2017-03-26 01:00:00') as \"2017-03-26 01:00:00\"
       , to_seconddate('2017-03-26 02:00:00') as \"2017-03-26 02:00:00\"
       , to_seconddate('2017-10-29 01:00:00') as \"2017-10-29 01:00:00\"
         from dummy"

Connected.

| 2017-03-26 01:00:00 | 2017-03-26 02:00:00 | 2017-10-29 01:00:00 |
-------------------------------------------------------------------
| 2017-03-26 00:00:00 | 2017-03-26 01:00:00 | 2017-10-29 02:00:00 |
             ^^                    ^^                    ^^

When running the same query with hdbsql, thus avoiding JDBC, the result is very different:

hdbsql -U ...  -A
  "select to_seconddate('2017-03-26 01:00:00') as \"2017-03-26 01:00:00\"
   , to_seconddate('2017-03-26 02:00:00') as \"2017-03-26 02:00:00\"
   , to_seconddate('2017-10-29 01:00:00') as \"2017-10-29 01:00:00\"
     from dummy"

| 2017-03-26 01:00:0  | 2017-03-26 02:00:0  | 2017-10-29 01:00:0  |
| ------------------- | ------------------- | ------------------- |
| 2017-03-26 01:00:00 | 2017-03-26 02:00:00 | 2017-10-29 01:00:00 |

What is going on here?

The answer lies with Java, more precisely with the way JDBC handles timestamp data (see also here).
When fetching timestamp data from a database that does not store the timezone with the timestamp data, the JDBC method getTimestamp (int columnIndex, Calendar cal) gets used.

The method returns a java.sql.Timestamp object which in turn is based on java.util.Date. Java handles dates/timestamps as moments (instants) in time that can be expressed in calendar fields (year, month, day of month, etc.). This requires defining the calendar that should be used here.
In our example, without specifying any specific calendar, the JVM uses its default timezone.
If the JVM property user.timezone is not set, then the JVM will try to find the current operating system user’s timezone. If the JVM runs on Linux, that would be the value of TZ again; on MS Windows it would be the current timezone setting for the Windows user that runs the JVM.

What JDBC is doing basically means: if the Java program does not explicitly set the Calendar to be used, then the setting of the client OS is used. A prominent example of this is SAP HANA Studio.

To avoid this implicit, nearly hidden conversion, one can set the JVM parameter -Duser.timezone=UTC in the hdbstudio.ini file.

For the interested: Java comes with its own complete implementation of a timezone database, and the conversion data can be – just as in SAP HANA – managed independently from the operating system. (see here, here, here, here, here, here)

Time to end

And here we are now. Two full stacks of timekeeping, time zone management and conversions with and without DST.

What do I take home from this?

To me, the JDBC behaviour was the most surprising effect, so I will probably try to make sure that I set the timezone to UTC whenever I want to avoid automatic conversion (e.g. server-side data access like SDI adapters, or development/DBA tools like SAP HANA Studio).
For the rest, I think a good rule of thumb is to keep all timezone and time settings the same, especially across SAP systems.

Finally, for applications and analytic scenarios that rely heavily on timezone conversion, making sure to fully understand all involved conversion mechanisms is really important.

There you go, now you know!

Cheers,
Lars

How to give what you don’t have

The development tools and techniques for SAP HANA have been extended and enhanced a couple of times since SAP HANA had first been released in 2011.
For apps developed for SAP HANA XS, the new XSA environment means a considerable change in the way application code accesses the database.

In his post “Synonyms in HANA XS Advanced, Accessing Objects in an External Schema”, Christoph Gilde explains in detailed steps how XSA applications can get access to the database schema of XS classic applications. This might not be the loose coupling aimed for with a microservices architecture, but it is an efficient way of building new functionality in XSA while retaining an existing XS classic code base.

Out with the old, in with the new? Not so fast!

One example, where this is being done is the SAP Health application Medical Research Insights 2. Starting with FP 5, several new features have been added as XSA apps, while keeping the base application and the underlying Connected Health Platform as XS classic applications.

Christoph’s post covers all necessary steps to enable such a cross-application data access but relies on a prerequisite that can be hard to fulfil.
The very first step in the tutorial is to create two roles in the XS classic schema that group all required privileges to the XS classic database objects. One of these roles has to include the privileges with the GRANT OPTION that allows granting those privileges further to other users or roles.
It is fairly obvious that this is an extremely wide-ranging privilege that can make controlling who has access to which data and who manages this access very difficult. For this reason, the XS classic repository roles don’t allow the use of the GRANT OPTION.

But Christoph’s post does not mention anything about repository roles, so what is the problem here, you might ask.

What one can do and what one cannot do

One point is that having to create roles manually via SQL makes the handling of the application life-cycle more complex. Ideally, one wants to have an XS classic application fully contained in one or more DUs that can be transported (installed, imported) together. The fewer manual installation activities required, the better.

The more important point, though, is that only the owner of an object or a user with the GRANT OPTION privilege can go on and grant other users privileges with the GRANT OPTION.

In case of an application that uses a .hdbschema repository object to define the application schema as recommended, the owner of that schema is the _SYS_REPO user.
This special SAP HANA system user cannot be used to log in to the database, so even if one were to settle for a manual installation step, there technically is no way to run a GRANT … WITH GRANT OPTION as the _SYS_REPO user.

A way out

So, how can we make _SYS_REPO grant privileges to a user with the GRANT OPTION?
Meet “SQLScript procedure with DEFINER RIGHTS“!
This setting allows executing the procedure as the user that created it.

For repository procedures, this “creator” is always _SYS_REPO.
By now you probably see where this is going.

To have _SYS_REPO grant privileges with the GRANT OPTION, we create the following repository objects:

  1. external_access_g.hdbrole
    (see Christoph’s post for the two types of roles for external access)
    This one is just an empty role file.
    Don’t get confused by the ‘pred’ prefix in the package name; this is just the name I used for the demo package and it has no functional relevance.

    role pred.roles::pred_external_access_g
    {}
    

  2. external_access.hdbrole
    This one contains all access privileges without the GRANT OPTION.
    For the sake of this example, I chose SELECT on the schema.

    role pred.roles::pred_external_access
    {
    catalog schema "PRED": SELECT;
    }
    

  3. grantSchemaAccess.hdbprocedure

    PROCEDURE "PRED"."pred.roles::grantSchemaAccess" ( )
    LANGUAGE SQLSCRIPT
    SQL SECURITY DEFINER
    DEFAULT SCHEMA PRED
    AS
    BEGIN
    exec 'GRANT SELECT on schema PRED to "pred.roles::pred_external_access_g" WITH GRANT OPTION';
    END
    

    Important to note here:

    • SQL SECURITY DEFINER – this really is core to this approach. Leave it on the default INVOKER and it won’t work.
    • no READS SQL DATA – we use Dynamic SQL in this procedure which counts as READ/WRITE access.
    • the GRANT statement is executed via ‘exec’.
      If you try to put the GRANT in the procedure body directly, you will get the following error

      message:Error while activating /pred/roles/grantSchemaAccess.hdbprocedure:[pred.roles:grantSchemaAccess.hdbprocedure] Syntax error in procedure object: feature not supported;
      ddl statement 'GRANT SELECT on schema PRED to "pred.roles::pred_external_access_g" WITH GRANT OPTION' is not supported in procedure/function: line 11 col 5 (at pos 288)
      

    (BTW: to whoever decided that it was a good thing to disable selecting text in the console area of the Web-Based Development Workbench editor: it’s not!
    Being able to copy error message output is rather important, and having to manually change the ‘user-select’ option for the CSS style is rather cumbersome.)

  4. Another role that covers the privilege to run the procedure

    role pred.roles::installUser
    {
    sql object pred.roles::grantSchemaAccess: //Objecttype: PROCEDURE
    EXECUTE;
    }
    

With these objects in place, the setup procedure for granting privileges looks like this:

  1. Import the DU with the objects created above into the target system.
  2. Grant the installUser role to the user that performs the installation and setup activities (e.g. INSTALLUSER).
  3. As the INSTALLUSER execute the procedure:

    call "PRED"."pred.roles::grantSchemaAccess"();
    

  4. Check that the assignment was successful:

    select *
    from "GRANTED_PRIVILEGES"
    where
    grantee like 'pred.roles::pred_external_access%';
    

    GRANTEE                             GRANTEE_TYPE GRANTOR   OBJECT_TYPE  SCHEMA_NAME  ...  PRIVILEGE IS_GRANTABLE IS_VALID
    pred.roles::pred_external_access    ROLE         _SYS_REPO SCHEMA       PRED         ...  SELECT    FALSE        TRUE
    pred.roles::pred_external_access_g  ROLE         _SYS_REPO SCHEMA       PRED         ...  SELECT    TRUE         TRUE
    
  5. Revoke the role to execute the procedure from the INSTALLUSER again.
    As this is a setup-time-only activity, the role and the privilege should not be granted to anyone once the setup is done.
    To see who the role has been granted to run the following SQL statements:

    select *
    from "GRANTED_PRIVILEGES"
    where object_type = 'PROCEDURE'
    and object_name = 'pred.roles::grantSchemaAccess';
    

    –> only the role pred.roles::installUser should be returned as grantee here.

    select *
    from "GRANTED_ROLES"
    where
    role_name = 'pred.roles::installUser';
    

    –> only _SYS_REPO should be returned here.

Once these steps are successfully finished, the remaining procedure is the same as described in Christoph’s blog post, continuing at step 2.

Due to the critical nature of granting WITH GRANT OPTION, one might also consider setting up auditing for the roles and the procedure, so that any use of them gets logged; a sketch of this follows below.
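A hedged sketch of what such audit policies could look like (the policy names are made up, and auditing has to be enabled in the database for the policies to take effect):

-- log every execution of the grant procedure ...
CREATE AUDIT POLICY "audit_grantSchemaAccess"
    AUDITING ALL EXECUTE ON "PRED"."pred.roles::grantSchemaAccess"
    LEVEL CRITICAL;
-- ... and every change to privileges and roles
CREATE AUDIT POLICY "audit_privilege_changes"
    AUDITING ALL GRANT PRIVILEGE, REVOKE PRIVILEGE, GRANT ROLE, REVOKE ROLE
    LEVEL CRITICAL;
ALTER AUDIT POLICY "audit_grantSchemaAccess" ENABLE;
ALTER AUDIT POLICY "audit_privilege_changes" ENABLE;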

There you go, now you know!

Cheers,
Lars

Maps and Paths

A brief twitter exchange triggered some thoughts about expectations and how those determine whether something is a good A, a passable B or a terrible C.

Paths

Turn by turn navigation in a mobile app

When you want to go somewhere and you do not know the way, you need directions. Somebody or something needs to show you the path to follow. That’s what turn by turn navigation on your mobile does. Following this path does not mean you got to know the area. You now know just one way to get from A to B.

Maps

London Underground map from 1908

A map, on the other hand, provides an overview, context and alternatives but no immediate path.
Using a map means you have to actively engage with the information, see for yourself which roads cannot be taken (maybe you are in a huge truck and the small side roads with wooden bridges are not going to work for you) and eventually plot your own path.

Maps and paths are related, but they are far from being the same.

A useful analogy

This analogy works not just for navigating transportation networks.

If your company wants to build an information system to do X, then usually it will give the job to someone with a path to get there. It does not need to discuss all the alternative technologies or architectures. When X can be achieved reliably, in time and on budget, then that is usually already a great outcome. Getting such a path (think project) is commodity shopping.

On the other hand, when you want to do new things or old things in a very different way than before, a path is not the answer to that. This is where you need to work with a map. You need different viewpoints and scenarios to walk through. This situation requires you to discover and create a path. You can buy help for this process but you cannot skip the work.

(You might notice, that product or service roadmaps are specifically not maps. They are usually paths with a decreasing level of certainty towards the end. They can provide a heads-up to what might be coming but they don’t show alternatives, contexts or overview.)

To avoid disappointment and frustration it is important to be clear about what you expect.

Are you asking questions in user forums to get a map? Do you want blog posts to give you a path from A to B? Do you want to discuss options and discover/create paths?

For the SAP Community Platform getting answers to such questions will be important to avoid further disappointments.

Is SAP HANA SQL/SQLScript Turing complete?

There was an interesting (aka nerdy) question in the SAP Community Q&A the other day:
“Are SQLScript and HANA SQL Turing complete?”

I took a swing at it and the following is my result (a slightly edited version of my original answer).


Alright, let’s do this! (even though I don’t have an idea why it would matter at all…)

So, I’m not a computer scientist, and only a third through “The Annotated Turing” (which seems to be a great book, btw). This means what follows is the layman’s approach on this.

First off, I had to understand, what is meant by “Turing complete” and how to show that a language has that characteristic.

For that, there are a couple of Wikipedia entries and SO discussions available and I leave that for everyone to read (e.g. here and here).
One of those discussions linked to a presentation that claimed to prove that PostgreSQL-SQL with recursive common table expressions (CTEs) is Turing complete. In order to prove this, the author of the presentation (here) said that it is enough to show that a language can emulate another language that has already been shown to be Turing complete.
Fair call.
The author’s choice was a cyclic tag system with a specific rule set (rule 110) which apparently has been shown to be Turing complete.
Then the author goes on and implements this cyclic tag system with a recursive common table expression and thereby proves the claim.

Yippie.

So, what does that mean for SAP HANA SQL?
SAP HANA SQL/SQLScript does not support recursive common table expressions (much to the distaste of everyone who tries to handle hierarchies and does not know about SAP HANA’s special hierarchy views and functions (look there)), and it also does not support recursive procedure calls.

Bummer, one might think.
Fortunately, every recursion can be expressed as an iteration (compare here), so I thought, let’s try this cyclic tag system in SQLScript.
This is the result (HANA 1, rev.122.15, on my NUC system). The SQL code is also available in my GitHub repo.

do begin
declare prods VARCHAR(4) ARRAY;
declare currProd, initWord, currWord VARCHAR(300); -- 300 is arbitrary and would be exceeded for more runs
declare currProdNo integer = 0;
declare runs, maxruns bigint = 0;

    initWord := '11001';                  -- the starting/initial 'word'
    maxruns := 100;                       -- a limit to the number of iterations
                                          -- rule 110 is suspected to run indefinitely
    prods = ARRAY ('010', '000', '1111'); -- the three 'producer rules' stored in a string array

    currWord := :initWord;
    runs := 0;
    -- dummy table var to monitor output
    tmp = select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD
          from dummy;

    while (:runs < :maxruns) DO
        runs := :runs + 1;

        currProdNo := mod(:runs, 3) + 1;  -- pick rule no. 1, 2 or 3 but never 0
                                          -- as SQLScript arrays are 1-based
        currProd := :prods[:currProdNo];

        if (left(:currWord, 1) = '1') then  -- add current producer to the 'word'
            currWord := :currWord || :currProd;
        end if;

        currWord := substring(:currWord, 2); -- remove leftmost character

        -- save current state into temp table var
        tmp = select RUNS, CURRPROD, CURRWORD from :tmp
              union all
              select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD
              from dummy;

    end while;

    select * from :tmp;                   -- output the table var
end;

Running this gives the following output:

/*
Statement 'do begin declare prods VARCHAR(4) ARRAY; declare currProd, initWord, currWord VARCHAR(300); declare ...'
successfully executed in 717 ms 39 µs  (server processing time: 715 ms 590 µs)
Fetched 101 row(s) in 2 ms 517 µs (server processing time: 0 ms 424 µs)

RUNS    CURRPROD    CURRWORD
0       ?           11001
1       000         1001000
2       1111        0010001111
3       010         010001111
4       000         10001111
5       1111        00011111111
6       010         0011111111
7       000         011111111
8       1111        11111111
9       010         1111111010
10      000         111111010000
11      1111        111110100001111
12      010         11110100001111010
13      000         1110100001111010000
14      1111        1101000011110100001111
15      010         101000011110100001111010
16      000         01000011110100001111010000
17      1111        1000011110100001111010000
18      010         000011110100001111010000010
19      000         00011110100001111010000010
20      1111        0011110100001111010000010
21      010         011110100001111010000010
22      000         11110100001111010000010
23      1111        11101000011110100000101111
24      010         1101000011110100000101111010
25      000         101000011110100000101111010000
[...]
*/

That looks suspiciously like the output from the Wikipedia link (above) and does not seem to stop (except for the super-clever maxruns variable).
With that, I’d say, it’s possible to create a cyclic tag system rule 110 with SQLScript.
The tag system is Turing complete, therefore SQLScript must be Turing complete.

Still, I have no idea why this would matter at all and that’s really all I can say/write about this.

Cheers,
Lars

Not a New Year’s thing

Happy New Year, dear reader!

I hope you had an enjoyable and relaxing time between Christmas and New Year’s Eve with your thoughts away from work. I can say, I did.

During this time I took the opportunity to finish some thoughts on my engagement with the SAP Community Platform (SCP) where I have been a moderator for the SAP HANA topic/tag for several years.
I have also been a top contributor for most of the time since I joined the platform and its predecessors – my current profile picture there states that it has been 14 years and a bit by now.

During my tenure, the SAP HANA community grew, new volunteers joined the moderator team and many experts and users shared their experiences, insights and questions. Today, this community is the place to go to ask SAP HANA related questions and I take pride in knowing that I helped to kick-start it.
But now, no starting assistance is necessary any longer; my professional focus has shifted to less HANA-centric topics (I work in the SAP Health team), so I decided to hang up the “moderator’s hat”.

I have worked on a second “project” that initially triggered setting up the Lars Breddemann Blog: all my blog posts covering topics from SAP on Oracle and MaxDB through SAP Business Warehouse to SAP HANA can now be found right here.
Given their first publication dates, many of them are outdated, but some seem to still enjoy interest from readers.
As the SCP was re-platformed a couple of times, I think it is easier to have them at a steady “home” with this personal blog.

Since there is no automatic way to download all my blogs from SCP and upload them to WordPress, this migration project was a manual copy & paste activity during several long evenings and weekends. The comments and discussions on the blog posts could not be copied; to read those, the interested reader still needs to go back to the SCP version of the blog post.

Having to re-read my own blogs from the last ten years was quite the experience. Look at the horrid English, behold the dire need for a copy editor and witness, in shock and awe, the ever-repeating phrases!
It was a tough reminder for me to try to be more careful with my texts. I will try to follow the guidance of real writers (like this) in future texts.

So long, dear reader, have yourself an interesting 2018!

What’s your number?

Have you been using the ISNUMERIC() function of MySQL, MS SQL Server or PostgreSQL and wondered what to use in SAP HANA instead?
If so, this is the blog for you.

Just recently I’ve seen a customer using my old workaround (first published here) and then a few days later the related stackoverflow question got updated.

This topic is not dead

The new answer to the SO discussion took a popular approach to the IS_NUMERIC problem:

If the string can successfully be converted into a numeric type, then the value is numeric (IS_NUMERIC returns 1 for TRUE), else it’s not (IS_NUMERIC returns 0 for FALSE).
This is a completely feasible approach (and it is used on other platforms as well, e.g. Oracle or PostgreSQL) that allows deciding whether or not the tested value can be turned into a number data type in HANA.
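The conversion function itself is not shown here; only its runtimes appear further down as IS_NUMERIC_CONV. As a rough idea of what such a check could look like, here is a minimal SQLScript sketch. The function name, the TO_DECIMAL target type and the catch-all exception handler are my assumptions, not the code from the SO answer:

-- minimal sketch of a conversion-based numeric check (assumed implementation)
create or replace function "DEVDUDE"."IS_NUMERIC_CONV" (
                 in stringToCheck nvarchar(5000))
         returns isNumeric integer
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
    declare tmp decimal(38, 10);
    -- any conversion error simply means: not numeric
    declare exit handler for SQLEXCEPTION
      begin
          isNumeric := 0;
      end;

    tmp := to_decimal(:stringToCheck);
    isNumeric := 1;
END;

With a function like this in place, the check can be used directly in the SELECT list or WHERE clause, which is exactly what the runtime comparison below does.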

IT DEPENDS, as usual

The problem, or rather the limitation, of this approach is that “numeric” can mean different things in different circumstances.
Sometimes, one might want to accept ‘123.456,–‘ as numeric. Sometimes “$500.00 –” should be OK, and so on.
Checking the T-SQL definition of ISNUMERIC (here and here) shows that there are plenty of other options.

This simply means that there are alternative approaches to this problem.
One of those alternatives could be to use regular expressions (regex), which have been supported by HANA for quite some time now (since SP10 if I’m not mistaken).
With tools like https://regex101.com/ one can comfortably work out a pattern that matches or doesn’t match the kind of data one is interested in.

It might well be that in your application the question is not about whether a string or sub-string can pass IS_NUMERIC() but rather something more specific.
The customer that I mentioned earlier, for example, wanted to see if a string looks like an ICD10 code (one character followed by two digits, ‘B13’, ‘A00’, etc.). Snipping out the first character and putting the rest into an IS_NUMERIC() function is not ideal here.

Instead, the regex ‘^\D\d{2}.’ can be used to capture the whole pattern.
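For illustration, here is how that pattern could be applied with LOCATE_REGEXPR() against a couple of literal values (made up for this sketch); note that the trailing ‘.’ in the pattern expects one more character after the two digits:

-- sketch: check a couple of sample values against the ICD10-like pattern
select MAP (locate_regexpr(START '^\D\d{2}.' IN 'B13.1'), 0, 0, 1) as CODE_1
     , MAP (locate_regexpr(START '^\D\d{2}.' IN '12345'), 0, 0, 1) as CODE_2
from dummy;
-- expected: CODE_1 = 1 (looks like an ICD10 code), CODE_2 = 0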

Show me how it’s done

Let’s take a rather simple regex that matches any character that

  • is not a digit between 0 and 9
  • is not a plus (+), minus (-), comma (,) or full stop (.)
  • is not an exponent ‘E’ (as in 1.23E02)

Such a regex is ‘[^0-9-.\,+e]’ (check here) with flags for global, case-insensitive matching.

In SAP HANA SQL we can use this regex in the LOCATE_REGEXPR() function. If it doesn’t find any matching string, then the result is 0, else it will return the location of the first matching character.
Since we are only interested in those cases where no matches were found, we can easily MAP this output to make up an IS_NUMERIC value:
1 = no unwanted characters found, it’s numeric
0 = unwanted characters found, it’s not numeric

MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                    FLAG 'i' 
                    IN <string to be searched>)
     , 0, 1
     , 0)
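A quick sanity check of this expression against a few made-up literals might look like this:

select MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' FLAG 'i' IN '123.456,78'), 0, 1, 0) as NUM_1
     , MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' FLAG 'i' IN '1.23E02'),    0, 1, 0) as NUM_2
     , MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' FLAG 'i' IN 'B13'),        0, 1, 0) as NUM_3
from dummy;
-- expected: NUM_1 = 1, NUM_2 = 1, NUM_3 = 0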

So, what’s better now?

Using the “try-the-conversion“-approach can only be done via a procedure or a scalar function. This brings some overhead for the function invocation, which can be avoided when using the regex directly in the SQL statement.

As an example, I ran a quick test against 10,000,000 unique values (all tests were done against HANA1 1.00.122 and HANA2 2.00.020).

Conversion-approach function

select 
       sum("IS_NUMERIC_CONV"(id_char) )
from 
       "DEVDUDE"."DIM10000000"
where 
       "IS_NUMERIC_CONV"(id_char) = 1
   and "IS_NUMERIC_CONV"(id_char) IS NOT NULL;

This ran for roughly 19.5 seconds with 8 CPU cores at 100% usage.
And this is the runtime when all values actually can be converted correctly. With the same number of non-numeric values, the runtime goes up to 2:45 mins.

Regex-approach pure SQL

select 
       SUM (IS_NUM_REGEX)
from 
     (select 
             MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                 FLAG 'i' 
                                 IN id_char)
                  , 0, 1
                  , 0) IS_NUM_REGEX
       from 
            "DEVDUDE"."DIM10000000")
where 
     IS_NUM_REGEX = 1 
 and IS_NUM_REGEX IS NOT NULL;

This ran for roughly 8.6 seconds with 8 CPU cores at 100% usage. Same runtime for both convertible and non-convertible values, here.

Too good to be true

By now you maybe think: “Gee, that regex thing is a lot faster, but also really ugly code. Let’s put that into a function and have our cake and eat it too!“.

Off we go:

FUNCTION "DEVDUDE"."sandpit::IS_NUMERIC_REGEX" (
                 in stringToCheck nvarchar(5000)) 
         RETURNS isNumeric integer
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
     isNumeric := MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                      FLAG 'i' 
                                      IN :stringToCheck)
                       , 0, 1
                       , 0);
END;

A statement using this function takes roughly 30 seconds with 8 CPU cores at 100% usage.

No cake for us!

As long as all our input data is clean, the conversion-function approach works nicely, but it becomes really expensive when there are many cases for which the exception handling needs to be executed. The lesson here is probably that it’s not a great (performance) idea to make exception handling part of your main processing path.

A truly high-performing solution would need to be implemented as a native HANA function, so it needs to be carefully checked what exactly the job of IS_NUMERIC() shall be in your application and how many matches/non-matches are expected.

There you go, now you know!

Cheers,
Lars

HANA inverts the Thunderdome law

If you’ve watched “Mad Max Beyond Thunderdome”  then “… You know the law: Two men enter, one man leaves …“.

Watch out, he’s coming for you! — pic from www.imdb.com

HANA, of course, reverses, sorry, disrupts this law by making two (or more) values get out of functions, even when only one input value was provided.

Wait, what?

Alright, this Thunderdome reference is really just a ploy to get your attention. Now that I seem to have it, let us get to the topic.

If you’ve been using SQLScript for some time, you are aware of user-defined functions (UDFs). These UDFs can nowadays handle both scalar and table-typed parameters, for the input as well as for the output aka result of the function.

The main purpose of functions, just like in other programming languages, is code reuse and making the code that uses them easier to understand.
With some thought about the naming, one can even make up parts of a domain specific language for the problem at hand.
Think of

applyTaxRate(sales_price) as "priceWithGST"

in your SELECT statements instead of

 CASE 
     WHEN sales_price < 10 THEN sales_price 
     ELSE sales_price * 1.20 
 END as "priceWithGST"

as an example.
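A scalar UDF behind such a name is quickly put together. The following is only a sketch with made-up data types, reusing the 20% rate and the below-10 exemption from the CASE expression above:

-- sketch: a scalar UDF that gives the SELECT list a readable, domain-specific name
create or replace function applyTaxRate(sales_price DECIMAL(15,2))
    returns priceWithGST DECIMAL(15,2)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
    if :sales_price < 10 then
        priceWithGST := :sales_price;         -- below the (made-up) threshold: no GST
    else
        priceWithGST := :sales_price * 1.20;  -- otherwise add 20% GST
    end if;
END;

-- usage (sales_items is a hypothetical table):
-- select applyTaxRate(sales_price) as "priceWithGST" from sales_items;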

All good so far. Now, with HANA, performance considerations are nearly always the main concern, so a question often asked is:
Isn’t using functions bad for performance?

Which I can only answer with
Isn’t using high-level languages bad for performance?
And indeed, it is easy to lose some microseconds and several megabytes of memory by using functions compared to “pure” SQL. But then again, the main goal for a developer is to implement functionality (correctly) that eventually provides value for the users.

So, my recommendation is to use functions where they help to make programmers more productive, and to worry about the performance impact when one can actually measure it.

Back to the topic of this blog.

I had looked at a question on stackoverflow a while ago which basically asked why using UDFs with more than one result seems to lead to slower execution times the more results are actually used.

So something like

select 
      getUserDetails (143776).userName
    , getUserDetails (143776).expiryDate
    , getUserDetails (143776).currentAge
    , getUserDetails (143776).customerSegment
from dummy;

would take around four times as long as

select 
    getUserDetails (143776).userName
from dummy;

even though we’re running the same function with the same input parameter.
The programmers’ intuition, well at least mine and that of the OP from stackoverflow, was that HANA would evaluate the function once for the provided input parameter and use the different results from this one execution.

Some testing of the query execution times showed that this could not be the case.
A more in-depth analysis was in order here.

The try setup

For my “learn about the system by poking around“-try I created a very simple scenario: a user account table and a query to find the date for when any user account would expire.
Not wasting time with inventing play-doh data I simply ran:

create column table myusers as (select * from users);

With that in place a UDF was easy to build:

drop function userdetails;
create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
         expiryDate DATE
as
begin
   select user_name
        , to_date(add_days(valid_from, 180)) as expiryDate
    into userName
      , expiryDate
   from 
         myusers
   where 
         user_id = :user_id;
end;
select userDetails (143776).userName
     , userDetails (143776).expiryDate
from dummy;

USERDETAILS(143776).USERNAME USERDETAILS(143776).EXPIRYDATE
_SYS_REPO                    21.01.2018

This should be equivalent to this pure SQL statement:

select user_name
     , to_date(add_days(valid_from, 180)) as expiryDate
from 
     myusers
where 
     user_id = 143776;
 
USER_NAME EXPIRYDATE
_SYS_REPO 2018-01-21

Looks good, I’d say.

Checking on the execution times shows:

UDF (multi-result)

successfully executed in 39 ms 67 µs (server processing time: 37 ms 819 µs)
successfully executed in 42 ms 929 µs (server processing time: 41 ms 60 µs)
successfully executed in 44 ms 13 µs (server processing time: 42 ms 492 µs)

pure SQL (multi-result)

successfully executed in 1 ms 191 µs (server processing time: 0 ms 385 µs)
successfully executed in 1 ms 933 µs (server processing time: 0 ms 825 µs)
successfully executed in 2 ms 507 µs (server processing time: 0 ms 827 µs)

While this is a big difference (roughly 30x), it is actually the wrong one for the current question. What we need to look at is the difference between executions with just one or two results.

Running the statements with just the expiryDate output:
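For reference, the single-result variants behind the following timings are simply the same statements reduced to the expiryDate output:

-- UDF, only one result used
select userDetails (143776).expiryDate
from dummy;

-- pure SQL, only one column selected
select to_date(add_days(valid_from, 180)) as expiryDate
from 
     myusers
where 
     user_id = 143776;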

UDF (single result)

successfully executed in 29 ms 443 µs (server processing time: 28 ms 200 µs)
successfully executed in 30 ms 61 µs (server processing time: 28 ms 610 µs)
successfully executed in 29 ms 586 µs (server processing time: 28 ms 131 µs)

pure SQL (single result)

successfully executed in 1 ms 917 µs (server processing time: 0 ms 735 µs)
successfully executed in 1 ms 63 µs (server processing time: 0 ms 308 µs)
successfully executed in 2 ms 864 µs (server processing time: 0 ms 742 µs)

And here we do see an important difference: the pure SQL runtime stayed stable, while the UDF ran quite a bit faster when fetching just a single result.

Phase One “Reproducing the issue” successfully done.

Moving on with the analysis, I played around with the statements and wanted to check what happens when you select a non-existing user:

Trouble with no data

UDF
select 
     userDetails (-4711).expiryDate
from dummy;

Could not execute 'select userDetails (-4711).expiryDate from dummy' in 40 ms 99 µs . 
SAP DBTech JDBC: [478]: 
user defined function runtime error: "DEVDUDE"."USERDETAILS": 
line 9 col 5 (at pos 131): 
[1299] (range 3) no data found exception: 
no data found at user-defined function (at pos 8)
pure SQL
select
     to_date(add_days(valid_from, 180)) as expiryDate
from 
     myusers
where 
     user_id = -4711;

Shoot!

When there is no matching record we get an error with our UDF. That is likely not what we want, so we need to cater for that in the code.
Here we can use the EXIT HANDLER to catch the ‘NO DATA FOUND’ SQL error.

create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
         expiryDate DATE
as
begin
declare exit handler for SQL_ERROR_CODE 1299 
 begin 
     userName := NULL;
     expiryDate := NULL;
 end;

select user_name
     , to_date(add_days(valid_from, 180)) as expiryDate
 into userName
    , expiryDate
 from 
      myusers
 where 
      user_id = :user_id;
end;

With this change, the error is caught and the UDF returns NULL values. That’s not the same as an empty set, but good enough for this case and a lot better than the error.

select 
      userDetails (-4711).expiryDate
from dummy;

USERDETAILS(-4711).EXPIRYDATE
?

Trouble with tracing

The next step of the analysis should be to actually show that the UDF is executed once for each result.

I looked into the different usual suspects for statement analysis and was rather disappointed:
SQL Plan Cache, Expensive Statement Trace and PlanViz all only show the top-level statement + statistics.

PlanViz shows a “Row Search”-POP (plan operator) as the final processing node and this contains the projection list, but that’s about it.

Name: Row Search
ID: ID_EA341305E062BA47B440FD71F07CA318_0
Execution Time (Inclusive): 19.9 ms
Execution Time (Exclusive): 19.893 ms
Execution Start Time: 6,934.218 ms
Execution End Time: 6,954.174 ms
CPU Time (User): 19.883 ms
Projected Cols: 
 "DEVDUDE"."USERDETAILS"(143776).USERNAME, 
 "DEVDUDE"."USERDETAILS"(143776).EXPIRYDATE

So what would be a good way to find out whether the function had been called more than once during the statement execution?
I didn’t want to go for some low-level tracing, so I came up with a modification of the function:

Pudding, proofs… is it Christmas yet?

create sequence pcnt;

drop function userdetails;
create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
       expiryDate NVARCHAR (30)
as
begin
 
declare exit handler for SQL_ERROR_CODE 1299 
   begin 
     userName := NULL;
     expiryDate := NULL;
   end;

    select user_name|| '_> '|| pcnt.nextval
         , to_nvarchar(add_days(valid_from, 180), 'DD.MM.YYYY') ||'_> '|| pcnt.nextval as expiryDate
     into userName
        , expiryDate
    from 
         myusers
    where 
       user_id = :user_id;
end;

I am going to use a sequence to append the current execution count to the output values.

In pure SQL the sequence is bumped up once for each output row:

select user_name || '_> '|| pcnt.nextval
     , to_nvarchar(add_days(valid_from, 180), 'DD.MM.YYYY') ||'_> '|| pcnt.nextval as expiryDate
from 
     myusers
where 
     user_id = 143776;

Execution ...
USER_NAME||'_> '||PCNT.NEXTVAL EXPIRYDATE 
_SYS_REPO_> 30                 21.01.2018_> 30

Next execution ...
USER_NAME||'_> '||PCNT.NEXTVAL EXPIRYDATE 
_SYS_REPO_> 31                 21.01.2018_> 31

That means, if the UDF is really called once per output argument, then we should see increasing numbers for the same result row.
Let’s see what we get:

select userDetails (143776).userName
     , userDetails (143776).expiryDate
from dummy;

USERDETAILS(143776).USERNAME USERDETAILS(143776).EXPIRYDATE
_SYS_REPO_> 32               21.01.2018_> 33

Here we got our proof, the function actually is executed twice.

Ok… so what?

Alright, up to here I have spent your time and attention on explaining how to find out that HANA does not do the thing we would like it to do.
If that was the end of it, you would rightfully ask what all this was for.

Worry not! There is a good end in sight!

The answer lies with using the UDFs in SQLScript blocks.
But not like this:

do begin
declare _userName   NVARCHAR(256);
declare _expiryDate NVARCHAR(30);
 
    _userName  := userDetails (143776).userName;
    _expiryDate:= userDetails (143776).expiryDate;
 
    select _userName, _expiryDate from dummy;
end;
:1              :2 
_SYS_REPO_> 34  21.01.2018_> 35

We have to step up our SQLScript game here.
While we’re already avoiding the stupid SELECT … INTO FROM dummy workaround for assigning function results to variables, it’s obviously calling the function twice now.

Instead, we need to use the function in a single assignment.
“How can you assign values to multiple variables in a single assignment?”, you ask?

I did the same until one of our SQLScript developers pointed me to something that is, well, “mentioned” in the documentation. I put “mentioned” in quotation marks, as this non-obvious programming construct is not really explained, but shows up in just one code example (this one).

The solution and answer

do begin
declare _userName   NVARCHAR(256);
declare _expiryDate NVARCHAR(30);
 
    (_userName, _expiryDate) := userDetails (143776);
 
    select _userName, _expiryDate from dummy;
end;
:1               :2 
_SYS_REPO_> 36   21.01.2018_> 36

With this multi-result assignment, we not only have just one function call but also much nicer-to-read code.

There you go, now you know.

Cheers from Mad Max-country,

Lars

 

SAP HANA, databases and computers