SQL refactoring example

About a year ago I answered a question on the SAP Community Q&A site, and turning that answer into a blog post has been on my posts-to-write list ever since.

I feel that I learn a lot from seeing applied examples of concepts and rules – for example from this great presentation by Kevlin Henney that includes a “talk-through” of a refactoring example. So, this is my example for others to review.

The premise of the question is rather common: a piece of code that was used on Oracle should be made to run on SAP HANA.
The OP had already managed to translate several parts of the Oracle code to work on SAP HANA but got stuck with an SQL Script error message.
The SAP HANA version used by the OP did not support BETWEEN as a check condition in an IF-clause.
Current SAP HANA versions do support this, but the code suffered from much more than just a lack of syntactic sugar.
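On a current release, a check like this compiles without complaint (a minimal sketch, not yet related to the actual function):

DO
BEGIN
    DECLARE check_date DATE := CURRENT_DATE;
    -- BETWEEN as an IF condition - rejected by older SQLScript versions
    IF :check_date BETWEEN ADD_DAYS(CURRENT_DATE, -30) AND CURRENT_DATE
    THEN
        SELECT 'in range' AS result FROM dummy;
    END IF;
END;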
Here’s the original user-defined scalar function:

create FUNCTION "KABIL_PRACTICE"."F_KEEP_DATE"( TDATE DATE, TTODAY DATE) 
RETURNS RETVALUE CHAR
LANGUAGE SQLSCRIPT   
SQL SECURITY INVOKER AS
/******************************************************************************
NAME: F_KEEP_DATE
PURPOSE: RETURNS T/F WHETHER OR NOT THE DATE MEETS THE CRITERIA TO MAINTAIN DATA.
REVISIONS:
VER DATE AUTHOR DESCRIPTION
--------- ---------- --------------- ------------------------------------
1.0 6/18/2008 1. CREATED THIS FUNCTION.
******************************************************************************/
BEGIN
RETVALUE := 'F';
IF :TDATE BETWEEN ADD_DAYS (TO_DATE (:TTODAY, 'YYYY-MM-DD'), -30) AND :TTODAY 
OR  TO_CHAR( EXTRACT (MONTH FROM ADD_DAYS (TO_DATE (:TDATE, 'YYYY-MM-DD'), 1)))<> 
TO_CHAR( EXTRACT (MONTH FROM TO_DATE (TDATE, 'YYYY-MM-DD')))
THEN 
RETVALUE := 'T';
END IF;
--RETURN :RETVALUE;
END ;

Step 1: remove the fluff

First, remove the useless fluff of comments. The change history of the code is captured in the code repository, so keeping a log of changes is pointless here.
This implies that there is a code repository, of course. SAP HANA comes with the repository for XS (classic) and git support for XSA out of the box and both options allow for distributed development and change tracking of development resources.
Just removing the comments and adding line-breaks and some indentation gets us to this:

create FUNCTION "KABIL_PRACTICE"."F_KEEP_DATE"
                    ( TDATE DATE, TTODAY DATE) 
RETURNS RETVALUE CHAR
LANGUAGE SQLSCRIPT   
SQL SECURITY INVOKER AS
BEGIN
    RETVALUE := 'F';
    
    IF :TDATE BETWEEN ADD_DAYS (TO_DATE (:TTODAY, 'YYYY-MM-DD')
                               ,-30) 
              AND :TTODAY 
       OR TO_CHAR(EXTRACT(MONTH FROM ADD_DAYS 
                          (TO_DATE (:TDATE, 'YYYY-MM-DD'), 1)))
           <> TO_CHAR(EXTRACT(MONTH FROM 
                           TO_DATE (:TDATE, 'YYYY-MM-DD')))
    THEN 
        RETVALUE := 'T';
    END IF;
END;

Step 2: Find out what the code should do

It’s a single IF statement, so the job of this function is to encapsulate the condition check.
Let’s focus on that and find out what conditions are checked here:

:TDATE BETWEEN ADD_DAYS (TO_DATE (:TTODAY, 'YYYY-MM-DD'),-30) 
           AND :TTODAY 
OR 
   TO_CHAR(EXTRACT(MONTH FROM ADD_DAYS 
                          (TO_DATE (:TDATE, 'YYYY-MM-DD'), 1)))
<> TO_CHAR(EXTRACT(MONTH FROM 
                           TO_DATE (:TDATE, 'YYYY-MM-DD')))

We see that there are in fact two conditions, of which either one needs to be true to make the whole expression true.
Still, with all the data type conversions in place, it is hard to see what is being checked here.
So, let’s get rid of them where we can.

Step 3: remove type conversions where possible

Something I see quite a lot in converted code is that date values get converted into text values and back again. This is probably due to practices on Oracle, but on SAP HANA, date values can be used directly without the need for conversion or data format specification. A date does not have a specific format; the format only comes into play when the date should be printed to the screen. More on this here.

:TDATE BETWEEN ADD_DAYS (:TTODAY, -30) AND :TTODAY 
OR 
   EXTRACT (MONTH FROM ADD_DAYS (:TDATE, 1)) 
<> EXTRACT (MONTH FROM :TDATE) 

Now we can see that the first condition checks whether :TDATE is within the past 30 days.

The second condition checks whether the day after :TDATE falls into a different month than :TDATE itself.
That is only the case when :TDATE is the last day of the month.

Summarizing, the function returns 'T' (TRUE) when the provided date is either within the last 30 days or when the provided date is the last day of the month.

Checking the Datetime functions in SAP HANA we find a couple of useful functions that we can use to make this code clearer. 

Step 4: make use of built-in functions

The LAST_DAY function returns the last day of the month for the date it is used on, saving us the manual steps of extracting the month and checking if the next day would be a different month.

   :TDATE BETWEEN ADD_DAYS (:TTODAY, -30) AND :TTODAY 
OR :TDATE = LAST_DAY(:TDATE) 
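
A quick sanity check for the LAST_DAY-based condition (a minimal sketch to run directly):

-- LAST_DAY maps every day of July 2018 to 2018-07-31, so only for the
-- actual month-end date does :TDATE = LAST_DAY(:TDATE) hold true
SELECT LAST_DAY(TO_DATE('2018-07-30')) AS last_day_july,  -- 2018-07-31
       LAST_DAY(TO_DATE('2018-07-31')) AS same_result     -- 2018-07-31
FROM dummy;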

Step 5: Pick names that help understanding the code

Removing the TTODAY parameter is a bit tricky here, but we should do it in order to make the function easier to understand.

This variable could, in fact, be used as a flexible reference or anchor date. In this case, it should be renamed to something like reference_date.

But I have the suspicion that it really is only ever used with today’s date. If that is the case, we can remove the parameter altogether.

We also should pick a better name for both the parameter and the function itself and change the return value to the more commonly used integer type as a substitute for a BOOLEAN.

The original comment indicated that the purpose of the function is to check whether data was still in a time window during which it can be modified.

I am not really happy with dataInEditPeriod for the function name but it sure is more obvious than F_KEEP_DATE.
Indicating the object type (‘F‘ for function? ‘SP‘ for stored procedure?) in the object name does not make it easier to understand the code.
If I can access the database via SQL I can easily look up what type any object has.
The remaining “KEEP_DATE” part is problematic as it refers to the intended usage scenario of the function (determining whether data should be kept). This says something about what the function shall be used for instead of what it does. 
My proposed alternative dataInEditPeriod tries to convey what the function does on the semantic level of the data model. Here, we are dealing with data that has a time frame in which it can be changed: “EditPeriod” and the function checks if the data is in this period.

As for the parameter, how about checked_day?

Finally, swap the clumsy RETVALUE for RESULT and we get this:

create FUNCTION "dataInEditPeriod"(checked_day DATE) 
RETURNS result INTEGER
LANGUAGE SQLSCRIPT   
SQL SECURITY INVOKER AS
BEGIN
    result := 0;
    IF     (:checked_day BETWEEN ADD_DAYS (current_date, -30) 
                                 AND current_date) 
       OR  (:checked_day = LAST_DAY(:checked_day))
    THEN 
        result := 1;
    END IF;
END;

Step 6: review how the function gets used

The calling code would then read like this:

begin
    if "dataInEditPeriod" (:CC_DATE) = 1
    then
        ...sql statements...
    else 
        ...sql statements...;
    end if;
end;

That is not perfect, but I’d say it is heaps easier to understand than the starting position.

IF EXISTS would exist

A recurring question of folks using SAP HANA is how to only run a DROP command for objects that actually exist in the database. One could argue that the effect of executing DROP does not depend on whether the object to be dropped exists, because after running the command it will not be there either way, but there is a bit more to it.

See, DROP commands, like say DROP TABLE, report back an error message when the object cannot be found (e.g. because it does not exist).

drop table bla;

Could not execute 'drop table bla'
SAP DBTech JDBC: [259]: invalid table name: BLA: line 1 col 12 (at pos 11)

This can be annoying when running scripts to set up a database structure which is a common part of product installation and update procedures.

On MS SQL Server there exists (ha ha ha – extra point for the flattest pun) an extension to the DROP command (and to other DDL commands): DROP IF EXISTS

SAP HANA up to now (HANA 2 SPS 03) does not provide such a functionality, but it is easy to come up with a workaround.

The following utility function can be used:

drop function obj_exists ;

create function obj_exists (in schema_name NVARCHAR(256)
                          , in object_name NVARCHAR(256))
returns obj_exists int
as 
begin
declare _exists int := 0;    
    
      select case 
                when count(*) >0 then 1
                else 0
             end into _exists
       from objects
       where 
           schema_name = IFNULL (:schema_name, current_schema)
       and object_name = IFNULL (:object_name, '');

    obj_exists := :_exists;

end;

The function returns 1 if the object exists and 0 if it does not. If schema_name is NULL, the current schema is used; if object_name is NULL, it is replaced by an empty string, which will not match any object, so the function returns 0.

select 
    obj_exists (schema_name => NULL 
              , object_name => 'OBJ_EXISTS') as OBJ_EXISTS
from dummy;

OBJ_EXISTS
1         

The above worked because the OBJ_EXISTS function was created in the current schema. With that one can easily write a SQL Script that “looks before it fires“.
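
Such a script could look like this (a minimal sketch, assuming a table BLA in the current schema):

DO
BEGIN
    DECLARE _exists INT;
    -- check first, then drop - avoids error 259 (invalid table name)
    SELECT obj_exists(schema_name => NULL, object_name => 'BLA')
           INTO _exists
    FROM dummy;

    IF :_exists = 1 THEN
        EXEC 'drop table BLA';
    END IF;
END;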

How to know “all these things”

There is one predominant reason for why databases and database technology get a spot in the conversation of “the business”: performance.
Understanding system performance and coming up with ideas for improving it typically requires a form of end-to-end understanding, with different levels of insight into how a query from a client application is processed by SAP HANA. Before long, one looks into “explain plans”, “execution paths”, SQL, PlanViz diagrams, data distributions, table definitions, stack traces and even NUMA process management on Linux/hypervisor level. That’s a lot of concepts and layered technology, and most of those concepts come with a development history of at least 30 years, if not more.
This means that prior knowledge is not only implied but required to get a proper understanding of what concepts are put to work in your specific setup. Intuition about how the system works is not your friend here.

But where do you learn about all these things? The product documentation? A university SQL 101 class? Blog posts from strangers on the Internet? Books? Or a MOOC with a certificate?

Personally, I’m always happy to find good collections of material, maybe even curated into topics. One of those collections is The Red Book (Readings in Database Systems), which covers all important topics of Database Management Systems. It even features a chapter “3. Techniques Everyone Should Know” in case you really only want to have a quick overview.

I read about this awesome (and free) online book in a blog that I can only recommend: The morning paper by Adrian Colyer. He linked to the red book in his post (Database) Techniques Everyone Should Know.
There are also many more recent posts about database-related papers, like Learned Index Structures or the design and implementation of modern column-oriented database systems, which explain current developments in DBMS technology that will probably end up as features in one of the big DBMS systems. The great thing about Adrian’s blog is that it is a kind of Cliff’s Notes for computer science papers. One does not have to actually read a paper to get its core ideas; but Adrian always makes the paper available for reading it oneself.
While I mention this blog, I would like to say “Thank you!” to Mr. Colyer, whom I never met but from whose outstanding blog I certainly benefitted a lot.

A long comment

Last Monday I read DJ Adams’ blog post about GraphQL and how this tool might be a useful addition to the toolbox of SAP UI5 developers.
There is a lively discussion going on about the blog’s topics, most of it around the GraphQL technology and its potential usage in UI5.
I added a comment to this, and Mr Adams used it to hook me in for a blog post.
So here we are.

As someone who read through the discussion somewhat uninvolved – I am not doing any front-end development – what struck me was its narrow focus.
Why was there no question about how this added tool would likely change the development projects and teams? Reading a bit deeper into the linked blog post from Jeff Handley, a lot of the arguments are not about technology, features and implementation details but about how the tool helps the product development teams do their jobs better.
To me, it seems like a rather typical SAP technology discussion, where “THE TOOL”, “THE PLATFORM” or “THE SUITE” is the bit that will make the difference between successful developments and the other ones.

Now, while I am a sucker for great tools like the next person, I am in the corner of folks who believe “it’s the photographer that makes the great picture – not the camera”. Technology can help with many problems, but if one does not have a clear understanding of what problem needs solving, simply picking up the latest-greatest-shiniest tool likely means the wrong tool is used and/or used in the wrong way.

One example I mentioned in my comment hinged on the “declarative” nature of GraphQL. I do not know how GraphQL works specifically but I have some experience with how declarative programming works in many SAP shops.
The main declarative language present in these organisations is SQL. ABAP programs use it, SAP HANA models have it (in some way) and the analytics teams are constantly building “queries”.

My point is that this typically does not work too well.
Invariably I come across data models and queries which show that the developers tried to “tell the database what to do” – even though the very same developers commonly have neither a proper understanding of how the database works nor of how to find out what it actually does in the first place.
Still, “that index needs to be used!” and “this hint is crucial for the performance“!

I cannot claim to know for sure why this is, but I have some suspicions:

  • often ABAP/SAP developers moved into their roles horizontally from either the “tech support” or from the “business support” team. That means great familiarity with the organisation’s landscape and processes but does not guarantee a firm grasp of the concepts implemented in the technology.
  • education and learning of abstract concepts is not a common activity in SAP development teams as there is no immediate payoff.
  • the official SAP technology documentation and training is focussed on functions and features and actively avoids concepts and broader ideas.
  • training content often focusses on simplistic examples leaving out real-world complex scenarios

This is not to say that it is the development team’s fault – but a lot of the IT technology that SAP has embraced in recent years requires a comprehension of the underlying concepts. Most of those concepts tend to be more involved than computation “mechanics” and cannot be intuitively understood correctly.
“Learning by example”, which I would bet is the most prevalent learning style in SAP environments, only goes so far when one is trying to get hold of new, non-trivial ideas.

Coming back to the GraphQL discussion: to me, this looks like another tool that will be “used the wrong way without error messages”, if at all. Will the majority of development teams be able to realize the same benefits as the Concur team? I doubt that.
Will the choice of tool make a difference to the outcome for the organisation in a way that makes a difference to the organisation? Probably not.

If one squints now, it is pretty obvious that this is the “full-stack developer is dead” argument from some years ago.

So why would I be putting this into the ring here? Because it holds true so well – especially with SAP tech environments.
We (SAP) have tried to hammer in the idea of “push down” and “processing where the data sits” for years now; still, most teams I encounter think “business logic” == “ABAP/JavaScript” and are subsequently surprised when the DB access does not deliver the expected speed – after all, it’s the new tool, HANA, right?
Understand that this is just one example of the issue here.
It could have also been ABAP OO, SOAP architecture, or Virtual Data Models. Common to all these tools is that they are going to be used by the very same people that used to do things “the proven way” for many good years.
Common also is that the creators of those tools usually have a very different view and probably assume too much conceptual overlap with their audience. If you ever listened to one of the gurus of “Graph DBs”, “Docker” or “Haskell” you probably know that feeling of conceptual dissonance from your day to day work.

This gives a lead into another aspect of the discussion around DJ Adams’ blog post: the argument that it would “help to get more developers to SAP platforms” if GraphQL (or any other currently popular tool) would be available.
When I read claims like that I wonder if there are any organisations that use IT as part of their established infrastructure and that then go and swap out major parts of it, because some developers like the new parts better.
In my personal experience that never happens. Sure, the new marketing hero gets the reporting frontend s/he needs to be successful – but that is closely tied to tangible results.
If developers cannot show how the end result will be (very) positively affected in concrete, preferably short terms, then existing tools tend to stay in place.

Change is hard, slow and expensive and the ticket for this journey really must be worth it. And commonly the new lands look a lot nicer from a good distance.

Finding answers on workdays

Cunningham’s Law is the idea that “the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.“.

More often than not I feel that some of the answers provided to questions on the SAP Community platform follow this approach, probably being unaware that there is a name for this strategy.

xkcd: “Duty Calls”

Question “SAP HANA Exclude weekends while calculating diff between two dates.” is such a question.

“I have requirement to find out the Number of hour between two dates, but we have to exclude the weekends (Saturday & Sunday).

Example : DATE1 is 19-July and DATE2 is July-26 – Actual diff is – 168 Hours.

Required Output is (168 – 48 Hours (Saturday & Sunday) – 120 Hours.

This I want to achieve using Graphical or Script Cal view, please help me to solve the issue”

The OP did not bother googling for solutions to this very common problem, nor does (s)he seem to have access to classic SQL development literature, like “Joe Celko’s Thinking in Sets”.

This is one of those “throw the requirement over the fence” questions and really should have been moderated.
However, as so often, an eager answerer offered his help and provided the following code to solve the problem:

DO
BEGIN
    declare v_i integer;
    declare v_count integer;

    sel1 = select
               "0CALDAY" as "CALDAY",
               "Yesterday" as "YESTERSDAY",
               sum("DATE_DIFF") as "DATE_DIFF",
               sum((select 0 from dummy)) as "HOUR_DIFF",
               sum((select 0 from dummy)) as "WRK_HOUR_DIFF"
           from "_SYS_BIC"."ZTABLE"
--         where "0CALDAY" >= '20180701'
           GROUP BY "0CALDAY", "Yesterday";

    select count(*) into v_count from :sel1;

    for v_i in 1..v_count do
        WHILE to_date(:sel1.CALDAY[v_i]) < to_date(:sel1.YESTERSDAY[v_i]) do
            if( weekday(to_date(:sel1.CALDAY[v_i])) ) < 5
            then
                sel1.WRK_HOUR_DIFF[v_i] = :sel1.WRK_HOUR_DIFF[v_i] + 24;
                sel1.HOUR_DIFF[v_i]     = :sel1.HOUR_DIFF[v_i]     + 24;
            else
                sel1.HOUR_DIFF[v_i]     = :sel1.HOUR_DIFF[v_i]     + 24;
            end if;
            sel1.CALDAY[v_i] = ADD_DAYS(to_date(:sel1.CALDAY[v_i]), 1);
        end while;
    end for;

    select * from :sel1;
END;

Reading through this approach several problems stick out:

  • this really shouldn’t require SQLScript at all and definitively not a loop
  • using ‘SELECT 0 from DUMMY’ to generate a constant value of integer 0 is plain wrong
  • summing constant 0 over a grouping set is plain wrong as well – the result is a constant 0
  • removing the aggregation allows for removing the grouping altogether: a SELECT DISTINCT would do the trick here
  • the number of records in sel1 depends solely on the number of distinct values in ZTABLE.CALDAY – a table that has not been defined in the question
  • this number of distinct CALDAY values is used to loop over the sel1 table variable, adding 24 hours to HOUR_DIFF and WRK_HOUR_DIFF to keep a tally of hours between the dates – something that could have been done with SAP HANA SQL functions directly

This looks very much like the kind of SQL code written by developers who are not “at home” with SQL and instead try to push an imperative programming style into their SQL code.

This code is inefficient, slow and hard to maintain.
In short: don’t do it like this.

Alternative solutions

The first approach to this type of common problem really should be not to try to develop a solution yourself, but to look for existing standard solutions.
The mentioned book uses so-called auxiliary tables for this (see chapter 9.5, Calendar Tables) and so do SAP NetWeaver and the freestyle solution presented in “Want to Calculate WORKDAYS_BETWEEN without TFACS Table?”.

The core of this solution approach is to maintain a table of all dates that indicates whether a given date is a weekend day or not. Such a table could even hold the information on whether a given date is a working day or a public holiday.
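
A minimal sketch of such an auxiliary table (all names here are made up for illustration):

-- one row per calendar day; the flags separate weekends,
-- public holidays and regular working days
CREATE COLUMN TABLE calendar_days
    ( day_date   DATE    NOT NULL PRIMARY KEY
    , is_weekend TINYINT NOT NULL
    , is_holiday TINYINT NOT NULL
    , is_workday TINYINT NOT NULL );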

Looking at the requirement of “excluding weekends” from counting hours, being able to discern workdays from holidays seems like a closely related requirement.

Let’s stick to the case of simply excluding the weekends which the OP defined as Saturday and Sunday.
When working on such a requirement it is particularly important to consider the calendar context in which the solution should work.
Certain calendars (e.g. the Israeli calendar) have Friday and Saturday as their weekend.
When the underlying requirement actually is “hours of working days” this might become more complicated with different calendars. Consider, for example, that daylight saving time shifts effectively add/subtract hours – commonly over weekends, but sometimes also on working days.
The OP hasn’t mentioned anything beyond the description of a mechanical calculation problem, so we cannot know what the underlying requirement really was.

Anyway, a quick solution for simply counting the hours of days other than Saturday and Sunday would be to use the SAP HANA built-in table “_SYS_BI”.”M_TIME_DIMENSION”.
This table provides dates in different formats, such as DATE_SQL, as well as a day-of-week number, DAY_OF_WEEK_INT (Monday = 0, Sunday = 6).

With data generated into the table, the original question can be answered as “number of non-weekend days between the given days, multiplied by 24 hours”:

    select count(*) as no_of_workdays,
           count(*) * 24 as no_of_hours
    from (
            select
                date_sql,
                day_of_week_int
            from "_SYS_BI"."M_TIME_DIMENSION"
            where day_of_week_int not in (5, 6)
            and date_sql between to_date('19.07.2018', 'DD.MM.YYYY')
                             and to_date('26.07.2018', 'DD.MM.YYYY')
        )
    ;

    NO_OF_WORKDAYS	NO_OF_HOURS
    6             	144

Note how this differs from the OP’s calculation by one day.
Checking the inner query gives us:

    DATE_SQL  	DAY_OF_WEEK_INT
    19/07/2018	3
    20/07/2018	4
    23/07/2018	0
    24/07/2018	1
    25/07/2018	2
    26/07/2018	3

Which means that the OP did miscalculate the hours in the example, by not including the last day of the given range.
This points to another common trap when dealing with dates and intervals: one needs to be clear about whether interval end dates should be included or excluded.

The same approach can be taken with graphical calculation views:

Example calculation view with filter expression
SELECT
    count(distinct "DATE_SQL") as no_of_workdays,
    count(distinct "DATE_SQL") * 24 as no_of_hours
FROM "_SYS_BIC"."sandpit/EXCLUDE_DATES";

/*
NO_OF_HOURS
144
*/

The broader view

I mentioned above that this common problem can be broadened to “separate working days from non-working days”.
The SAP business applications have been dealing with such problems for many decades now and so it’s not surprising that there is indeed a standard solution available for this.
This standard solution comes in the form of “FACTORY CALENDARS”.
SAP NetWeaver table TFACS holds one row per year and calendar.
Every row contains twelve columns with a “bitfield”, represented by ‘1’ and ‘0’ characters, one for each day of the month.
A ‘1’ indicates a working day and a ‘0’ denotes a non-working day.

There are some other columns available, but for our purpose, this is what we need to know.

IDENT JAHR MON01 MON02 ... MON12
AU 2018 0111100111110011111001111000111 1100111110011111001111100111 ... 0011111001111100111110010010110
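
Reading the flag for a single day out of such a bitfield works with plain string functions – the character position corresponds to the day of the month (a sketch, using the table and the example row shown here):

-- working-day flag for 19.07.2018 in the 'AU' calendar:
-- month column MON07, character position = day of month
SELECT SUBSTRING(mon07, 19, 1) AS is_working_day
FROM tfacs
WHERE ident = 'AU'
  AND jahr  = '2018';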

Typically this table is re-used or replicated from the SAP NetWeaver environment so that all date calculations are using the same calendar definitions.
If you want to use the table without an SAP NetWeaver system you could use the following commands to create it yourself:

create row table TFACS
( IDENT NVARCHAR(2) not null -- factory calendar
, JAHR NVARCHAR(4) not null -- year stored
, MON01 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON02 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON03 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON04 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON05 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON06 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON07 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON08 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON09 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON10 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON11 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, MON12 NVARCHAR(31) not null -- Calendar: days in a month, 1 for working day, 0 for non-working day
, BASIS NVARCHAR(5) not null -- start of factory day numbering
, FENUM NVARCHAR(3) not null -- number of days in year
, WENUM NVARCHAR(3) not null -- number of workdays in year
, LOAD NVARCHAR(1) not null -- load year in buffer flag ('X' = YES)
, primary key (IDENT, JAHR));

As an example I entered one row with the working days of Australia/Victoria in 2018:

insert into tfacs values (
'AU', '2018'
-- 1 2 3
-- 1234567890123456789012345678901
, '0111100111110011111001111000111' -- january
, '1100111110011111001111100111' -- february
, '1100111110011111001111100111100' -- march
, '001111100111110011111001011001' -- april
, '1111001111100111110011111001111' -- may
, '100111110001111001111100111110' -- june
, '0111110011111001111100111110011' -- july
, '1110011111001111100111110011111' -- august
, '001111100111110011111001111100' -- september
, '011110011111001111100111110011' -- october
, '11001111100111110011111001111' -- november
, '0011111001111100111110010010110' -- december
, ' '
, ' '
, ' '
, ' ');

With this table in place, SAP HANA provides SQL function WORKDAYS_BETWEEN (see also the SAP HANA Academy video), that returns the desired number of working days directly:

SELECT WORKDAYS_BETWEEN('AU', to_date('19.07.2018', 'DD.MM.YYYY')
, to_date('26.07.2018', 'DD.MM.YYYY')) "no of working days"
FROM dummy;

/*
no of working days
5
*/

Note that here the “between” condition is understood as “workdays computed starting on the 19th and finishing on, but not including, the 26th” (see the documentation on this), which seems to match the understanding of the OP.
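
If the end date should be counted as well, a simple way (a sketch) is to shift the interval end by one day:

-- count the 26th as a full workday by extending the interval end
SELECT WORKDAYS_BETWEEN('AU'
                       , to_date('19.07.2018', 'DD.MM.YYYY')
                       , ADD_DAYS(to_date('26.07.2018', 'DD.MM.YYYY'), 1)) AS "workdays incl. end"
FROM dummy;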

The benefits of using calendar tables over pure function approaches are clear:

  • it’s easy to understand which days are considered working days and which days are not
  • multiple calendars can be maintained and used in parallel (different years, countries, union agreements, operating shifts, etc.)
  • the SQL/application code does not need to be changed when the calendars need updating

There you go, now you know.

Trouble with time?

There are a couple of topics that are an ongoing source of confusion even for experienced IT professionals and developers.
UNICODE handling is one example (see here for info), floating-point calculations are another (this is a good primer) and then there is everything to do with time.

Dealing with dates, time, time zones and the like is confusing and counter-intuitive in itself; adding computers to the mix doesn’t make this any better.

This post is going to shine some light on what goes into date/time handling in SAP HANA and tries to answer some questions that I came across recently.

The big picture

The SAP HANA DBMS obviously doesn’t come with its own timekeeper (though I think it would make a nice gift for HANA DBAs) but uses date/time information provided by the Linux operating system whenever it needs to know the current time.

Linux gets its time from different possible sources, starting with the hardware clock (aka BIOS clock) and, if configured, the ntpd(aemon).
The man-pages on this (hwclock, date, tz, adjtimex, /proc/timer_list) are an interesting read.

A super-short version of the story here is: when the computer is started, the hardware clock provides the current time; this clock is kept running by a battery and managed by the BIOS. This time does not have a time zone assigned to it – it is open to interpretation.
At some point, Linux starts up, reads the hardware clock and sets the current system time and date, taking into account the time zone that was declared for this computer when Linux was installed (yes, one can change this later on).
From here on the hardware clock is out of the picture. All system requests concerning current date/time are served from the Linux managed clock(s) – the System time – and likely continuously updated via the ntp-daemon.
It is possible to check the current hardware time (output in ISO 8601 format):

> sudo hwclock --utc
  2018-03-26 17:57:36.608111+1100

The +1100 is the time zone offset that the system time uses/assumes. The hardware clock does not ‘know’ which timezone it is in, so specifying either --utc or --localtime and checking which one results in the correct local time is crucial here. Once again, the man-page for hwclock has the whole story.

Comparing this to the output of date:

> date
  Mon Mar 26 11:05:37 AEDT 2018

we learn the system time is set to use the Australian Eastern time zone.
Another way to check which timezone has been set on system level is to check where the file-system link /etc/localtime points to:

> ls -l /etc/localtime
  lrwxrwxrwx 1 root root 39 Mar 12 18:48 /etc/localtime -> /usr/share/zoneinfo/Australia/Melbourne

The /usr/share/zoneinfo folder is the system timezone database directory.
All time zones, including Daylight Saving Time (DST), that the Linux system knows about, can be found here. Since both time zone definitions, as well as DST rules, change regularly (see Time Zone News), these files are commonly updated during Linux updates.
We will come back to the topic of Daylight Saving Time later on – it is yet another can of worms.

The time of the day

Here we are, with Linux booted up and ready to go. We can log in now and ask for the current time and date (as seen above):

> date
  Mon Mar 26 18:17:07 AEDT 2018

But what if my user should use a different time zone?
To do that, the Linux shell provides a setting in the form of an environment variable named TZ.
Setting the variable to one of the time zones in the timezone database directory allows programs to return the date in the local time of choice:

> export TZ=UTC+0
> echo $TZ
  UTC+0
> date
  Mon Mar 26 07:18:00 UTC 2018

Any program started from a shell with this setting will now work with the time zone set to UTC.

Becoming a profiler

If this setting should be made permanent (just for the current user) the environment variable has to be set in the shell profile of the user.
The shell profile is an auto-start script that contains the commands the shell should execute every time it gets started.
For the common Bourne shell this script file is called .profile.
As the name starts with a dot, it is a so-called hidden file, but using ls -a in the user’s home directory lists it nevertheless:

> cd ~ # change into the user's home directory
> ls -la # list all files, including the 'hidden' files

 drwx------ 1 root      root     730 Mar 26 05:53 .
 drwxr-xr-x 1 root      root     200 Jan 31 00:51 ..
 [...]
 drwx------ 1 root      root      16 Jun 13  2017 .hdb
 -rw------- 1 root      root     146 Mar 26 07:15 .lesshst
 drwx------ 1 root      root      10 Jun  6  2017 .local
 drwx------ 1 root      root      34 Sep 11  2017 .mozilla
 -rw-r--r-- 1 root      root      80 Jan  3  2017 .profile   << this one
[...]

If the line export TZ=UTC+0 is added to the .profile-file then the new time zone will be set for the user every time the shell is used.

Now, for Linux based SAP systems things are a tad more … flexible. And with “flexible” of course, I mean complicated.
Putting all settings into just a single file would make managing system- or organisation-wide settings rather tedious.
Maybe some setting should be user-specific, while others shall apply to all users of a system and so on.
For that, there is the option to call other .profile scripts.

Cutting to the chase, for the SAP HANA operating system user <sid>adm we’re looking at the following profile scripts:

 .profile  - standard autostart-file for Bourne shell 
 .login    - C-shell version
 
 .bashrc   - init script for the BOURNE and KORN Shell
 .cshrc    - standard profile for the C-shell (same content, but slightly different syntax)

 .sapsrc.sh  - looks for .sapenv scripts and executes them
 .sapsrc.csh - C-shell version    
 
 .sapenv_<hostname>.sh  - SAP system environment setting - specific to the local host
 .sapenv_<hostname>.csh - SAP system environment setting - specific to the local host - C-shell version

 .sapenv.sh  - SAP system environment setting
 .sapenv.csh - C-shell version
 
 .customer.sh  - User specific settings
 .customer.csh - C-shell version

Most of these files are created by the installation program for SAP HANA.
When we log in as <sid>adm the files get executed in the following order:

(1) .profile -> (2) .bashrc -> (3) .sapsrc.sh  -> (4) sapenv_<hostname>.sh -> (5) HDBSettings.sh (for HANA systems only)
                            |              OR  -> (4) sapenv.sh            
                            +> (6) .customer.sh

Looking at this impressive entanglement you might not be too clear about where to put the time zone setting.

The answer, fortunately, is easy, as nearly all the script files come with a warning in a header comment, telling you not to edit them manually.

This leaves the .customer.sh as the target file for all customization including time zone settings and additional paths or environment settings for Smart Data Access (SDA) connections.
Fortunately, somebody put a comment into .profile to make clear that all custom settings will be found in .customer.sh. That way, even the SAP-unaware Linux admin has a fair chance to figure out in which file settings are done.

SAP HANA zones in time

At this point, we can configure the system’s hardware clock, the Linux system clock and the timezone that SAP HANA should use for anything where the “local” timezone is used.

This setting sets the home-timezone for the SAP HANA installation. Important here is to pay attention to the term “installation”.
Since tenant databases have become the standard for SAP HANA, it has also become possible to use “strong isolation” mode, separating the SAP HANA processes of different tenant databases on Linux level. Each tenant database can run in its own Linux user security context.
It is easy now to assume that, since the databases run in their own Linux user accounts, they could each have their own environment with separate timezone settings.
This is, however, not the case, since all tenant databases of an SAP HANA installation share the same nameserver (hdbnameserver) process. And it is the environment of this nameserver process that is shared by all tenant databases.

Therefore, all databases of an SAP HANA installation share the same timezone (as well as the same OS setup for external libraries, like 3rd party DBMS clients, etc.).
If one needs SAP HANA databases with different time zones on the same system, a separate installation is necessary. (state as of SAP HANA 2 SPS 02)

What’s it all used for – SAP system timezone handling

All of the mechanisms described so far are in place to provide Linux programs access to a current time that makes sense to the user of the system.

For shared programs like SAP HANA, it is debatable what the current local time should be since users can access the system from around the globe out of completely different time zones.
Whatever time zone you decide upon – often it is simply the local time zone of where the data center is located, where most of the users are, or UTC – you should probably pick a standard time zone without DST, since the clock changes for summer and winter time lead to timestamp gaps and duplications, which can be especially confusing when trying to read log files.
Also, in an SAP system landscape, say with SAP NetWeaver systems based on SAP HANA, the NetWeaver application servers and SAP HANA have to have the same time, including the timezone.

Up to here, all we’ve looked at was telling the local current time. In SAP HANA one can use functions like now(), current_time(), current_date(), etc. to get the current SAP HANA system time. In many cases, there is also a UTC counterpart for these functions, e.g. current_utctime() if your application should rather work with the UTC timestamps.
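
For example (a minimal sketch):

-- local system time vs. the UTC counterpart
SELECT current_timestamp    AS local_now,
       current_utctimestamp AS utc_now
FROM dummy;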

SAP HANA also provides conversion functions, utctolocal()/localtoutc(), to convert between local time and UTC date/timestamp information.

Note that the documentation strongly discourages storing data in the local time zone. SAP HANA date/timestamp data types don’t store the used timezone together with the data, so the application reading the data would need to keep track of that.
Instead, data should be stored as UTC and, if local date information is required, the conversion functions should be used.
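
A quick example of the conversion functions (a minimal sketch; AUSNSW is one of the timezone names from the sap dataset shown further down):

-- UTC to Sydney local time and back
SELECT utctolocal(TO_TIMESTAMP('2018-03-26 07:18:00'), 'AUSNSW') AS local_ts,
       localtoutc(TO_TIMESTAMP('2018-03-26 18:18:00'), 'AUSNSW') AS utc_ts
FROM dummy;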

Let’s stop at this point and take a look at what goes into this conversion.
I mentioned above that time zones and the rules for Daylight Saving Time (DST) are in constant change.
How can SAP HANA know what the currently correct mapping is?

In You got the time? I explained that SAP HANA can use built-in defaults or use a special set of tables that contain the current mapping information.

These tables and more information can be found in SAP note 1791342 – Time Zone Support in HANA.

To make sure the timezone data has been loaded without problems, the view M_TIMEZONE_ALERTS can be used.
If SAP HANA is started without the timezone tables loaded, this view returns:

HOST        PORT    STATUS              TIMEZONE_NAME   MISMATCH_BEGIN  MISMATCH_END   
skullbox    31203   TABLES NOT FOUND    ?               ?               ?              

DETAILS                                                                                                                   
The following timezone tables could not be found or are not readable: SYSTEM.TTZD, SYSTEM.TTZDF, SYSTEM.TTZDV, SYSTEM.TTZZ

HOST        PORT    STATUS              TIMEZONE_NAME   MISMATCH_BEGIN  MISMATCH_END   
skullbox    31204   TABLES NOT FOUND    ?               ?               ?

DETAILS                                                                                                                   
The following timezone tables could not be found or are not readable: SYSTEM.TTZD, SYSTEM.TTZDF, SYSTEM.TTZDV, SYSTEM.TTZZ

Per default, SAP HANA looks for these tables in the SYSTEM schema and expects the entries to be relevant for client 001.
If you want to use a different set of tables, you can adapt these configuration parameters:

= Configuration
  Name                                  | Default    | System
  indexserver.ini                       |            |
    global                              |            |
      timezone_dataset                  | sap        |
      timezone_default_data_client_name | 001        |
      timezone_default_data_schema_name | SYSTEM     |
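
Changing one of these parameters works via ALTER SYSTEM (a sketch; the schema name TZDATA is made up and the layer should match your setup):

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('global', 'timezone_default_data_schema_name') = 'TZDATA'
    WITH RECONFIGURE;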

Important to note here is that there is only one set of conversions possible per SAP HANA database, even if you have multiple SAP clients in the database.
Likewise important: these tables need to be loaded into every tenant database and the SystemDB.
The tenant DBs cannot use the tables from the SystemDB or from another tenant DB.

Yet another caveat with the SAP HANA timezone conversion functions is that they don’t take a reference date, but always work based on the currently available conversion rules. If your application requires historical timezone conversions, then this needs to be implemented separately.

To check which time zones are currently known to SAP HANA, the system view TIMEZONES is available:

select * from timezones;

TIMEZONE_NAME     	TIMEZONE_DATASET
[...]
AUSNSW            	sap             
ALAW              	sap             
UTC               	sap             
AUSSA             	sap             
PST_NA            	sap             
BRZLWE            	sap             
AUSNT             	sap             
UTC-7             	sap             
Africa/Abidjan    	platform        
Africa/Accra      	platform        
Africa/Addis_Ababa	platform        
Africa/Asmera     	platform        
[...]

Time zone conversion and Daylight Saving Time (DST)

At a glance, this whole topic of timezone conversions can look very simple. After all, it is simply adding or subtracting some hours between two time zones, isn’t it?

To make the life of time zone converters more interesting, Daylight Saving Time has been introduced in some countries. When referring to “local time” this usually includes any DST rules, while the term “Standard Time” indicates that no DST rules are applied.
The major complication with DST is not so much that the number of hours to be added or subtracted changes based on the (local) day of the year, but that the mapping between timestamps with and without DST is not contiguous.

Often we think of clock-timestamp values as continuously increasing values with no gaps. It’s always one second, and another and another second.

This leads to implicit assumptions like:

  • every possible timestamp between 00:00:00 and 23:59:59 will occur.
  • every timestamp will occur only once.

Both of those assumptions are false during the days when time changes from summer to winter time or back again.
When changing from summer back to winter time, the clocks are set back, e.g. from 03:00:00 to 02:00:00.
Every second between 02:00:00 and 02:59:59 will happen twice that night.

On the change in the other direction, winter to summer time, clocks will be set forward, jumping from 01:59:59 directly to 03:00:00, skipping all timestamps between 02:00:00 and 02:59:59.

Here we see why it is so important to use standard time for storing timestamps (also in filesystems, trace files, etc.): DST makes timestamps ambiguous.
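
The duplication can be made visible with the conversion functions (a sketch using timezone CET, where clocks went back on 29.10.2017; the exact result depends on the loaded timezone data):

-- two distinct UTC instants map to the same local wall-clock time
SELECT utctolocal(TO_TIMESTAMP('2017-10-29 00:30:00'), 'CET') AS first_02_30,
       utctolocal(TO_TIMESTAMP('2017-10-29 01:30:00'), 'CET') AS second_02_30
FROM dummy;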

There are more things wrong with the simplified model of timestamps, e.g. the need for leap seconds to adjust for drifts of the defined day length (one full turn of our planet).
“Counting”-based timekeeping systems like UNIX epoch time have to deal with leap seconds, leap years, centuries etc. in their own way. Covering these is beyond this post, but reading up on them is highly recommended.

More problems on a stack

Knowing all this about timezones and conversions between time zones with and without DST, we should be able to use timestamp data correctly in our programs with SAP HANA.
And yet, questions like “Problem with “summer” time (DST)” come up and even I wasn’t able to immediately pinpoint the cause of the problem.

At face value, the OP found what looked like a bug in SAP HANA’s JDBC driver.
When querying the database via JDBC certain timestamps did not seem to get returned correctly, even though the SAP HANA system timezone was set to UTC:

java -jar ngdbc.jar -u ... 
     -c "select to_seconddate('2017-03-26 01:00:00') as \"2017-03-26 01:00:00\"
       , to_seconddate('2017-03-26 02:00:00') as \"2017-03-26 02:00:00\"
       , to_seconddate('2017-10-29 01:00:00') as \"2017-10-29 01:00:00\" 
         from dummy"

Connected.

| 2017-03-26 01:00:00 | 2017-03-26 02:00:00 | 2017-10-29 01:00:00 |
-------------------------------------------------------------------
| 2017-03-26 00:00:00 | 2017-03-26 01:00:00 | 2017-10-29 02:00:00 |
             ^^                    ^^                    ^^

When running the same query with hdbsql, thus avoiding JDBC, the result is very different:

hdbsql -U ...  -A 
  "select to_seconddate('2017-03-26 01:00:00') as \"2017-03-26 01:00:00\"
   , to_seconddate('2017-03-26 02:00:00') as \"2017-03-26 02:00:00\"
   , to_seconddate('2017-10-29 01:00:00') as \"2017-10-29 01:00:00\" 
     from dummy"

| 2017-03-26 01:00:0  | 2017-03-26 02:00:0  | 2017-10-29 01:00:0  |
| ------------------- | ------------------- | ------------------- |
| 2017-03-26 01:00:00 | 2017-03-26 02:00:00 | 2017-10-29 01:00:00 |

What is going on here?

The answer lies with Java, more precisely with the way JDBC handles timestamp data (see also here).
When fetching timestamp data from a database that does not store the timezone with the timestamp data, the JDBC method getTimestamp (int columnIndex, Calendar cal) gets used.

The method returns a java.sql.Timestamp object which in turn is based on java.util.Date. Java handles dates/timestamps as moments (instants) in time that can be expressed in calendar fields (year, month, day of month, etc.). This requires defining the calendar that should be used here.
In our example, without specifying any specific calendar, the JVM uses its default timezone.
If the JVM property user.timezone is not set, then the JVM will try to find the current operating system user’s timezone. If the JVM runs on Linux, that would be the value of TZ again; on MS Windows it would be the current timezone setting of the Windows user that runs the JVM.

In essence, this means: if the Java program does not explicitly set the Calendar to be used, then the setting of the client OS is used. A prominent example of this is SAP HANA Studio.

To avoid this implicit, nearly hidden conversion, one can set the JVM parameter -Duser.timezone=UTC in the hdbstudio.ini file.
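
In hdbstudio.ini that could look like this (a sketch – JVM arguments belong below the -vmargs line; keep the existing entries in place):

-vmargs
-Duser.timezone=UTC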

For the interested: Java comes with its own complete implementation of a timezone database, and the conversion data can be – just as in SAP HANA – managed independently from the operating system (see here, here, here, here, here, here).

Time to end

And here we are now. Two full stacks of timekeeping, time zone management and conversions with and without DST.

What do I take home from this?

To me, the JDBC behaviour was the most surprising effect, so I will probably make sure to set the timezone to UTC whenever I want to avoid automatic conversion (e.g. for server-side data access like SDI adapters or development/DBA tools like SAP HANA Studio).
For the rest, I think a good rule of thumb is to keep all timezone and time settings the same, especially across SAP systems.

Finally, for applications and analytic scenarios that rely heavily on timezone conversion, it is really important to fully understand all the conversion mechanisms involved.

There you go, now you know!

Cheers,
Lars

How to give what you don’t have

The development tools and techniques for SAP HANA have been extended and enhanced a couple of times since SAP HANA had first been released in 2011.
For apps developed for SAP HANA XS, the new XSA environment means a considerable change in the way application code accesses the database.

In his post “Synonyms in HANA XS Advanced, Accessing Objects in an External Schema”, Christoph Gilde explains in detailed steps how XSA applications can get access to the database schema of XS classic applications. This might not be the loose coupling aimed for with a microservices architecture, but it is an efficient way of building new functionality in XSA while retaining an existing XS classic code base.

Out with the old, in with the new? Not so fast!

One example, where this is being done is the SAP Health application Medical Research Insights 2. Starting with FP 5, several new features have been added as XSA apps, while keeping the base application and the underlying Connected Health Platform as XS classic applications.

Christoph’s post covers all necessary steps to enable such a cross-application data access but relies on a prerequisite that can be hard to fulfil.
The very first step in the tutorial is to create two roles in the XS classic schema that group all required privileges to the XS classic database objects. One of these roles has to include the privileges with the GRANT OPTION, which allows granting those privileges on to other users or roles.
It is fairly obvious that this is an extremely wide-ranging privilege that can make controlling who has access to which data and who manages this access very difficult. For this reason, the XS classic repository roles don’t allow the use of the GRANT OPTION.

But Christoph’s post does not mention anything about repository roles, so what is the problem here, you might ask.

What one can do and what one cannot do

One point is that having to create roles manually via SQL makes the handling of the application life-cycle more complex. Ideally, one wants to have an XS classic application fully contained in one or more delivery units (DUs) that can be transported (installed, imported) together. The fewer manual installation activities required, the better.

The more important point, though, is that only the owner of an object or a user with the GRANT OPTION privilege can go on and grant other users privileges with the GRANT OPTION.

In case of an application that uses a .hdbschema repository object to define the application schema as recommended, the owner of that schema is the _SYS_REPO user.
This special SAP HANA system user cannot be used to log in to the database, so even if one were to settle for a manual installation step, there technically is no way to run a GRANT … WITH GRANT OPTION as the _SYS_REPO user.

A way out

So, how can we make _SYS_REPO grant privileges to a user with the GRANT OPTION?
Meet “SQLScript procedure with DEFINER RIGHTS“!
This setting allows executing the procedure as the user that created it.

For repository procedures, this “creator” is always _SYS_REPO.
By now you probably see where this is going.

To have _SYS_REPO grant with GRANT OPTION, we create following repository objects:

  1. external_access_g.hdbrole
    (see Christoph’s post for the two types of roles for external access)
    This one is just an empty role file.
    Don’t get confused by the ‘pred’ prefix in the package name; this is just the name I used for the demo package and has no functional relevance.

     

    role pred.roles::pred_external_access_g
    {}
    
  2. external_access.hdbrole
    This one contains all access privileges without the GRANT OPTION.
    For the sake of this example, I chose SELECT on the schema.

     

    role pred.roles::pred_external_access
    {
    catalog schema "PRED": SELECT;
    }
    
  3. grantSchemaAccess.hdbprocedure
    PROCEDURE "PRED"."pred.roles::grantSchemaAccess" ( )
    LANGUAGE SQLSCRIPT
    SQL SECURITY DEFINER
    DEFAULT SCHEMA PRED
    AS
    BEGIN
    exec 'GRANT SELECT on schema PRED to "pred.roles::pred_external_access_g" WITH GRANT OPTION';
    END
    

    Important to note here:

    • SQL SECURITY DEFINER – this really is core to this approach. Leave it on the default INVOKER and it won’t work.
    • no READS SQL DATA – we use Dynamic SQL in this procedure which counts as READ/WRITE access.
    • the GRANT statement is executed via ‘exec’.
      If you try to put the GRANT in the procedure body directly, you will get the following error message:
      Error while activating /pred/roles/grantSchemaAccess.hdbprocedure:[pred.roles:grantSchemaAccess.hdbprocedure] Syntax error in procedure object: feature not supported;
      ddl statement 'GRANT SELECT on schema PRED to "pred.roles::pred_external_access_g" WITH GRANT OPTION' is not supported in procedure/function: line 11 col 5 (at pos 288)
      

    (BTW: To whoever decided that it was a good thing to disable selecting text in the console area of the Web-Based Development Workbench Editor: it’s not!
    Being able to copy error message output is rather important and having to manually change the ‘user-select’ option for the CSS style is rather cumbersome )

  4. Another role that covers the privilege to run the procedure
    role pred.roles::installUser
    {
    sql object pred.roles::grantSchemaAccess: //Objecttype: PROCEDURE
    EXECUTE;
    }
    

With these objects in place, the setup procedure for granting privileges looks like this:

  1. Import the DU with the objects created above into the target system.
  2. Grant the installUser role to the user that performs the installation and setup activities (e.g. INSTALLUSER).
  3. As the INSTALLUSER execute the procedure:
    call "PRED"."pred.roles::grantSchemaAccess"();
    
  4. Check that the assignment was successful:
    select *
    from "GRANTED_PRIVILEGES"
    where
    grantee like 'pred.roles::pred_external_access%'
    
    GRANTEE                             GRANTEE_TYPE GRANTOR   OBJECT_TYPE  SCHEMA_NAME  ...  PRIVILEGE IS_GRANTABLE IS_VALID
    pred.roles::pred_external_access    ROLE         _SYS_REPO SCHEMA       PRED         ...  SELECT    FALSE        TRUE
    pred.roles::pred_external_access_g  ROLE         _SYS_REPO SCHEMA       PRED         ...  SELECT    TRUE         TRUE
    
  5. Revoke the role to execute the procedure from the INSTALLUSER again.
    As this is a setup-time-only activity, the role and the privilege should not be granted to anyone once the setup is done.
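
    The revoke itself could look like this (a sketch, assuming the user is called INSTALLUSER; repository roles are granted and revoked via the _SYS_REPO procedures rather than via plain GRANT/REVOKE):

    call "_SYS_REPO"."REVOKE_ACTIVATED_ROLE"('pred.roles::installUser', 'INSTALLUSER');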
    To see who the role has been granted to, run the following SQL statements:

     

    select *
    from "GRANTED_PRIVILEGES"
    where object_type = 'PROCEDURE'
    and object_name = 'pred.roles::grantSchemaAccess';
    

    –> only the role pred.roles::installUser should be returned here.

    select *
    from "GRANTED_ROLES"
    where
    role_name = 'pred.roles::installUser';
    

    –> only _SYS_REPO should be returned here.

Once these steps are successfully finished, the remaining procedure is the same as described in Christoph’s blog post, continuing at step 2.

Due to the critical nature of granting WITH GRANT OPTION, one might also consider setting up auditing for the roles and the procedure, so that any use of them gets logged.

There you go, now you know!

Cheers,
Lars

Maps and Paths

A brief twitter exchange triggered some thoughts about expectations and how those determine whether something is a good A, a passable B or a terrible C.

Paths

Turn-by-turn navigation in a mobile app

When you want to go somewhere and you do not know the way, you need directions. Somebody or something needs to show you the path to follow. That’s what turn-by-turn navigation on your mobile does. Following this path does not mean you got to know the area. You now know just one way to get from A to B.

Maps

London Underground map from 1908

A map, on the other hand, provides an overview, context and alternatives but no immediate path.
Using a map means you have to actively engage with the information, see for yourself which roads cannot be taken (maybe you are in a huge truck and the small side roads with wooden bridges are not going to work for you) and eventually plot your own path.

Maps and paths are related, but they are far from being the same.

A useful analogy

This analogy works not just for navigating transportation networks.

If your company wants to build an information system to do X, it will usually give the job to someone with a path to get there. It does not need to discuss all the alternative technologies or architectures. When X can be achieved reliably, in time and on budget, then that is usually already a great outcome. Getting such a path (think: project) is commodity shopping.

On the other hand, when you want to do new things, or old things in a very different way than before, a path is not the answer. This is where you need to work with a map. You need different viewpoints and scenarios to walk through. This situation requires you to discover and create a path. You can buy help for this process, but you cannot skip the work.

(You might notice that product or service roadmaps are specifically not maps. They are usually paths with a decreasing level of certainty towards the end. They can provide a heads-up on what might be coming, but they do not show alternatives, context or an overview.)

To avoid disappointment and frustration, it is important to be clear about what you expect.

Are you asking questions in user forums to get a map? Do you want blog posts to give you a path from A to B? Do you want to discuss options and discover/create paths?

For the SAP Community Platform, getting answers to such questions will be important to avoid further disappointments.

Is SAP HANA SQL/SQLScript Turing complete?

There was an interesting (aka nerdy) question in the SAP Community Q&A the other day:
“Are SQLScript and HANA SQL Turing complete?”

I took a swing at it and the following is my result (a slightly edited version of my original answer).


Alright, let’s do this! (even though I have no idea why it would matter at all…)

So, I’m not a computer scientist and I am only a third of the way through “The Annotated Turing” (which seems to be a great book, btw). This means what follows is a layman’s approach to the question.

First off, I had to understand what is meant by “Turing complete” and how one can show that a language has that characteristic.

For that, there are a couple of Wikipedia entries and Stack Overflow discussions available, and I leave those for everyone to read (e.g. here and here).
One of those discussions linked to a presentation that claimed to prove that PostgreSQL SQL with recursive common table expressions (CTEs) is Turing complete. To prove this, the author of the presentation (here) argued that it is enough to show that a language can emulate another language that has already been shown to be Turing complete.
Fair call.
The author’s choice was a cyclic tag system with a specific rule set (rule 110), which apparently has been shown to be Turing complete.
The author then goes on and implements this cyclic tag system with a recursive common table expression, thereby proving the claim.

Yippie.

So, what does that mean for SAP HANA SQL?
SAP HANA SQL/SQLScript does not support recursive common table expressions (much to the distaste of everyone who tries to handle hierarchies and does not know about SAP HANA’s special hierarchy views and functions – look there), and it also does not support recursive procedure calls.

Bummer, one might think.
Fortunately, every recursion can be expressed as an iteration (compare here), so I thought: let’s try this cyclic tag system in SQLScript.
This is the result (HANA 1, rev. 122.15, on my NUC system). The SQL code is also available in my GitHub repo.

do begin
    declare prods VARCHAR(4) ARRAY;
    declare currProd, initWord, currWord VARCHAR(300); -- 300 is arbitrary and would be exceeded for more runs
    declare currProdNo integer = 0;
    declare runs, maxruns bigint = 0;

    initWord := '11001';                  -- the starting/initial 'word'
    maxruns := 100;                       -- a limit to the number of iterations
                                          -- rule 110 is suspected to run indefinitely
    prods = ARRAY ('010', '000', '1111'); -- the three 'producer rules' stored in a string array

    currWord := :initWord;
    runs := 0;
    -- dummy table var to monitor output
    tmp = select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD
          from dummy;

    while (:runs < :maxruns) do
        runs := :runs + 1;

        currProdNo := mod(:runs, 3) + 1;  -- pick rule no. 1, 2 or 3 but never 0
                                          -- as SQLScript arrays are 1-based
        currProd := :prods[:currProdNo];

        if (left(:currWord, 1) = '1') then -- add current producer to the 'word'
            currWord := :currWord || :currProd;
        end if;

        currWord := substring(:currWord, 2); -- remove the leftmost character

        -- save current state into temp table var
        tmp = select RUNS, CURRPROD, CURRWORD from :tmp
              union all
              select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD
              from dummy;

    end while;

    select * from :tmp;                   -- output the table var
end;

Running this gives the following output:

/*
Statement 'do begin declare prods VARCHAR(4) ARRAY; declare currProd, initWord, currWord VARCHAR(300); declare ...' 
successfully executed in 717 ms 39 µs  (server processing time: 715 ms 590 µs)
Fetched 101 row(s) in 2 ms 517 µs (server processing time: 0 ms 424 µs)

RUNS    CURRPROD    CURRWORD                                                                             
0       ?           11001                                                                                
1       000         1001000                                                                              
2       1111        0010001111                                                                           
3       010         010001111                                                                            
4       000         10001111                                                                             
5       1111        00011111111                                                                          
6       010         0011111111                                                                           
7       000         011111111                                                                            
8       1111        11111111                                                                             
9       010         1111111010                                                                           
10      000         111111010000                                                                         
11      1111        111110100001111                                                                      
12      010         11110100001111010                                                                    
13      000         1110100001111010000                                                                  
14      1111        1101000011110100001111                                                               
15      010         101000011110100001111010                                                             
16      000         01000011110100001111010000                                                           
17      1111        1000011110100001111010000                                                            
18      010         000011110100001111010000010                                                          
19      000         00011110100001111010000010                                                           
20      1111        0011110100001111010000010                                                            
21      010         011110100001111010000010                                                             
22      000         11110100001111010000010                                                              
23      1111        11101000011110100000101111                                                           
24      010         1101000011110100000101111010                                                         
25      000         101000011110100000101111010000                                                       
[...]
*/

That looks suspiciously like the output from the Wikipedia link (above) and it does not seem to stop (except for the super-clever maxruns variable).
With that, I’d say it is possible to implement a rule 110 cyclic tag system in SQLScript.
Such a tag system is Turing complete, therefore SQLScript must be Turing complete.

Still, I have no idea why this would matter at all and that’s really all I can say/write about this.

Cheers,
Lars

Not a New Year’s thing

Happy New Year, dear reader!

I hope you had an enjoyable and relaxing time between Christmas and New Year’s Eve, with your thoughts away from work. I can say I did.

During this time I took the opportunity to finish some thoughts on my engagement with the SAP Community Platform (SCP) where I have been a moderator for the SAP HANA topic/tag for several years.
I have also been a top contributor for most of the time since I joined the platform and its predecessors – my current profile picture there states that it has been 14 years and a bit by now.

During my tenure, the SAP HANA community grew, new volunteers joined the moderator team and many experts and users shared their experiences, insights and questions. Today, this community is the place to go to ask SAP HANA related questions and I take pride in knowing that I helped to kick-start it.
But now, no starting assistance is necessary any longer, my professional focus has shifted to less HANA-centric topics (I work in the SAP Health team), and so I decided to hang up the “moderator’s hat”.

I have also worked on a second “project”, the one that initially triggered setting up the Lars Breddemann Blog: all my blog posts, covering topics from SAP on Oracle and MaxDB over to SAP Business Warehouse and SAP HANA, can now be found right here.
Given their first publication dates, many of them are outdated, but some seem to still enjoy interest from readers.
As the SCP was re-platformed a couple of times, I think it is easier to have them at a steady “home” with this personal blog.

Since there is no automatic way to download all my blogs from SCP and upload them to WordPress, this migration project was a manual copy & paste activity over several long evenings and weekends. The comments and discussions on the blog posts could not be copied; to read those, the interested reader still needs to go back to the SCP version of the blog post.

Having to re-read my own blogs from the last ten years was quite the experience. Look at the horrid English, behold the dire need for a copy editor and witness in shock and awe the ever-repeating phrases!
It was a tough reminder to be more careful with my texts. I will try to follow the guidance of real writers (like this) in the future.

So long, dear reader, have yourself an interesting 2018!

SAP HANA, databases and computers