Tag Archives: sql

New year, same issues and some tips.

Most folks working with SQL DBs for a while will likely come to see how beginners in this area struggle with “declaritive code” and “tell the DB what you want not how to do it”

In my experience, this issue stems partly from the way SQL is tought – examples are often way too simplistic to cover “advanced” topics – partly this is because it is very different thinking required when one comes from Python, JAVA or the whole notion of “telling the computer what to do”.

This often leads to horrendous SQL code and data models, that neither give correct results nor deliver them fast.

Despite all of this, I see many questions around tiny optimization options, command variations and “optimizer flags” from beginners all the time.

Astonishingly, this approach of looking for workarounds for problems with the data model understanding occurs regardless of the DBMS platform used.

– Lars Breddemann

What’s often missing in those questions is the broader context, WHY they want the DB to do something and WHAT eventually should be achived.

Basically, the core question “WHAT should be the outcome, the result?” remains unanswered both on top- and bottom-level.

Maybe this is because it is hard to express this desired outcome in a good way?

I recently read a really good presentation slide deck from Markus Winand that, besides explaining an interesting new SQL standard feature (row pattern matching), makes use of a nice way to illustrate what data should be selected and returned.

Screenshot of page 19 from the presentation "Row Pattern Matching" by Markus Winand.
 https://www.slideshare.net/MarkusWinand/row-pattern-matching-in-sql2016

The example used is non-trivial and certainly beyond what most SQL 101 courses cover and that’s another positive!
The visualization he uses gives a really good grasp on the kind of data and what the result should be.

Maybe this could be a pointer for how to improve your SQL writing in 2019:
Draw a picture of the data you have and what aspects of it that you want and then work build your SQL statement step by step based on this.
This is what I do in many cases when I try to understand what a SQL statement does, so this technique works both for constructing a new statement as well as for deciphering existing ones.

To finish this slightly ranting post of here are some links to presentations and twitter accounts that think are worthwhile to read or follow (not exhaustive at all).
If you do SQL programming as part of your daily struggle for bread work, take some time and check these resources.

There you go, now you know!

With that, have yourself a Happy New Year 2019.

SQL refactoring example

About a year ago I answered a question on the SAP Community Q&A site and have it on my posts-to-write list ever since to turn it into a blog post.

I feel that I learn a lot from seeing applied examples of concepts and rules – for example from this great presentation from Kevlin Henney that includes a “talk-through” a refactoring example. So, this is my example for others to review.

The premise of the question is rather common: a piece of code that was used on Oracle should be made to run on SAP HANA.
The OP had already managed to translate several parts of the Oracle code to work on SAP HANA but got stuck with an SQL Script error message.
The SAP HANA version used by the OP did not support BETWEEN as a check condition in an IF-clause. 
Current SAP HANA versions meanwhile have support for this but the code suffered from much more than just lack of syntactic sugar.
Here’s the original user-defined scalar function:

create FUNCTION "KABIL_PRACTICE"."F_KEEP_DATE"( TDATE DATE, TTODAY DATE) 
RETURNS RETVALUE CHAR
LANGUAGE SQLSCRIPT   
SQL SECURITY INVOKER AS
/******************************************************************************
NAME: F_KEEP_DATE
PURPOSE: RETURNS T/F WHETHER OR NOT THE DATE MEETS THE CRITERIA TO MAINTAIN DATA.
REVISIONS:
VER DATE AUTHOR DESCRIPTION
--------- ---------- --------------- ------------------------------------
1.0 6/18/2008 1. CREATED THIS FUNCTION.
******************************************************************************/
BEGIN
RETVALUE := 'F';
IF :TDATE BETWEEN ADD_DAYS (TO_DATE (:TTODAY, 'YYYY-MM-DD'), -30) AND :TTODAY 
OR  TO_CHAR( EXTRACT (MONTH FROM ADD_DAYS (TO_DATE (:TDATE, 'YYYY-MM-DD'), 1)))<> 
TO_CHAR( EXTRACT (MONTH FROM TO_DATE (TDATE, 'YYYY-MM-DD')))
THEN 
RETVALUE := 'T';
END IF;
--RETURN :RETVALUE;
END ;

Step 1: remove the fluff

First, remove the useless fluff of comments. The change history of the code is captured in the code repository, so keeping a log of changes is pointless here.
This implies that there is a code repository, of course. SAP HANA comes with the repository for XS (classic) and git support for XSA out of the box and both options allow for distributed development and change tracking of development resources.
Just removing the comments and adding line-breaks and some indentation gets us to this:

create FUNCTION "KABIL_PRACTICE"."F_KEEP_DATE"
                    ( TDATE DATE, TTODAY DATE) 
RETURNS RETVALUE CHAR
LANGUAGE SQLSCRIPT   
SQL SECURITY INVOKER AS
BEGIN
    RETVALUE := 'F';
    
    IF :TDATE BETWEEN ADD_DAYS (TO_DATE (:TTODAY, 'YYYY-MM-DD')
                               ,-30) 
              AND :TTODAY 
       OR TO_CHAR(EXTRACT(MONTH FROM ADD_DAYS 
                          (TO_DATE (:TDATE, 'YYYY-MM-DD'), 1)))
           <> TO_CHAR(EXTRACT(MONTH FROM 
                           TO_DATE (:TDATE, 'YYYY-MM-DD')))
    THEN 
        RETVALUE := 'T';
    END IF;
END;

Step 2: Find out what the code should do

 It’s a single IF statement, so the job of this function is to encapsulate the condition check.
Let’s focus on that and find out what are the conditions checked here:

:TDATE BETWEEN ADD_DAYS (TO_DATE (:TTODAY, 'YYYY-MM-DD'),-30) 
           AND :TTODAY 
OR 
   TO_CHAR(EXTRACT(MONTH FROM ADD_DAYS 
                          (TO_DATE (:TDATE, 'YYYY-MM-DD'), 1)))
<> TO_CHAR(EXTRACT(MONTH FROM 
                           TO_DATE (:TDATE, 'YYYY-MM-DD')))

We see that there are in fact two conditions of which either one needs to be true to make the whole function true.
Still, with all the data type conversion in place, it is hard to see, what is being checked here. 
So, let’s get rid of them, where we can.

Step 3: remove type conversions where possible

Something I see quite a lot in converted code is that date values get converted into text values and back again. This is probably due to practices on Oracle, but on SAP HANA, date values can be used directly without the need for conversion or data format specification. A date does not have a specific format, this only comes into play when the display should be printed to the screen. More on this here.

:TDATE BETWEEN ADD_DAYS (:TTODAY, -30) AND :TTODAY 
OR 
   EXTRACT (MONTH FROM ADD_DAYS (:TDATE, 1)) 
<> EXTRACT (MONTH FROM :TDATE) 

Now, we can see, that the first condition checks whether :TDATE is within the past 30 days.

The second condition checks if the day after :TDATE is still in the same month as :TDATE.
That means, this condition checks if :TDATE is the last day of the month.

Summarizing, the function returns 'T' (TRUE) when the provided date is either within the last 30 days or when the provided date is the last day of the month.

Checking the Datetime functions in SAP HANA we find a couple of useful functions that we can use to make this code clearer. 

Step 4: make use of built-in functions

The LAST_DAY function returns the last day of the month for the date it is used on, saving us the manual steps of extracting the month and checking if the next day would be a different month.

   :TDATE BETWEEN ADD_DAYS (:TTODAY, -30) AND :TTODAY 
OR :TDATE = LAST_DAY(:TDATE) 

Step 5: Pick names that help understanding the code

Removing the TTODAY variable is a bit tricky here; and we should remove it in order to make the function easier to understand. 

This variable could, in fact, be used as a flexible reference or anchor date. In this case, it should be renamed to something like reference_date.

But I have the suspicion that it really is always only used with today’s date. This allows us to remove the parameter altogether. 

We also should pick a better name for both the parameter and the function itself and change the return value to the more commonly used integer type as a substitute for a BOOLEAN.

The original comment indicated that the purpose of the function is to check whether data was still in a time window during which it can be modified.

I am not really happy with dataInEditPeriod for the function name but it sure is more obvious than F_KEEP_DATE.
Indicating the object type (‘F‘ for function? ‘SP‘ for stored procedure?) in the object name does not make it easier to understand the code.
If I can access the database via SQL I can easily look up what type any object has.
The remaining “KEEP_DATE” part is problematic as it refers to the intended usage scenario of the function (determining whether data should be kept). This says something about what the function shall be used for instead of what it does. 
My proposed alternative dataInEditPeriod tries to convey what the function does on the semantic level of the data model. Here, we are dealing with data that has a time frame in which it can be changed: “EditPeriod” and the function checks if the data is in this period.

As for the parameter, how about checked_day?

Finally, swap the clumsy RETVALUE for RESULT and we get this:

create FUNCTION "dataInEditPeriod"(checked_day DATE) 
RETURNS result INTEGER
LANGUAGE SQLSCRIPT   
SQL SECURITY INVOKER AS
BEGIN
    result := 0;
    IF     (:checked_day BETWEEN ADD_DAYS (current_date, -30) 
                                 AND current_date) 
       OR  (:checked_day = LAST_DAY(:checked_day))
    THEN 
        result := 1;
    END IF;
END;

Step 6: review how the function gets used

The calling code would then read like this:

begin
    if "dataInEditPeriod" (:CC_DATE) = 1
    then
        ...sql statements...
    else 
        ...sql statements...;
    end if;
end;

That is not perfect, but I’d say it is heaps better to understand than your starting position.

IF EXISTS would exist

A recurring question of folks using SAP HANA is how to only run a DROP command for objects that actually exist in the database. One could argue that the effect of executing DROP does not depend on whether the object to be dropped exists, because after running the command it will not be there either way, but there is a bit more to it.

See, DROP commands, like say DROP TABLE, report back an error message when the object cannot be found (e.g. because it does not exist).

drop table bla;

Could not execute 'drop table bla'
SAP DBTech JDBC: [259]: invalid table name: BLA: line 1 col 12 (at pos 11)

This can be annoying when running scripts to set up a database structure which is a common part of product installation and update procedures.

On MS SQL Server there exists (ha ha ha – extra point for the flattest pun) an extension to the DROP command (and to other DDL commands): DROP IF EXISTS

SAP HANA up to now (HANA 2 SPS 03) does not provide such a functionality, but it is easy to come up with a workaround.

The following utility function can be used:

drop function obj_exists ;

create function obj_exists (in schema_name NVARCHAR(256)
                          , in object_name NVARCHAR(256))
returns obj_exists int
as 
begin
declare _exists int := 0;    
    
      select case 
                when count(*) >0 then 1
                else 0
             end into _exists
       from objects
       where 
           schema_name = IFNULL (:schema_name, current_schema)
       and object_name = IFNULL (:object_name, '');

    obj_exists = :_exists;

end;

The function returns 1 if the object exists and 0 if it does not. If the input parameters contain NULL, then the function still works using the CURRENT schema but will likely return 0 as output.

select 
    obj_exists (schema_name => NULL 
              , object_name => 'OBJ_EXISTS') as OBJ_EXISTS
from dummy;

OBJ_EXISTS
1         

The above worked because the OBJ_EXISTS function was created in the current schema. With that one can easily write a SQL Script that “looks before it fires“.

Is SAP HANA SQL/SQLScript Turing complete?

There was an interesting (aka nerdy) question in the SAP Community Q&A the other day:
Are SQLScript and HANA SQL Turing complete?”

I took a swing at it and the following is my result (a slightly edited version of my original answer).


Alright, let’s do this! (even though I don’t have an idea why it would matter at all…)

So, I’m not a computer scientist, and only a third through “The Annotated Turing” (which seems to be a great book, btw). This means what follows is the layman’s approach on this.

First off, I had to understand, what is meant by “Turing complete” and how to show that a language has that characteristic.

For that, there are a couple of Wikipedia entries and SO discussions available and I leave that for everyone to read (e.g. here and here).
One of those discussions linked to a presentation that claimed to prove that PostgreSQL-SQL with recursive common table expressions (CTS) is Turing complete. In order to prove this, the author of the presentation (here) said, that it’s enough to prove that a language can emulate another language that has already been shown to be Turing complete.
Fair call.
The author’s choice was a cyclic tag system with a specific rule set (rule 110) which apparently has been shown to be Turing complete.
Then the author goes on and implements this cyclic tag system with a recursive common table expression and thereby proves the claim.

Yippie.

So, what does that mean for SAP HANA SQL?
SAP HANA SQL/SQLScript does not support recursive common table expressions (much to the distaste of everyone who tries to handle hierarchies and does not know about SAP HANA’s special hierarchy views and functions (look there) and it also does not support recursive procedure calls.

Bummer, one might think.
Fortunately, every recursion can be expressed as an iteration (compare here), so I thought, let’s try this cyclic tag system in SQLScript.
This is the result (HANA 1, rev.122.15, on my NUC system). The SQL code is also available in my GitHub repo.

do begin
declare prods VARCHAR(4) ARRAY;
declare currProd, initWord, currWord VARC<span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start"></span>HAR(300); -- 300 is arbitrary and would be exceeded for more runs
declare currProdNo integer = 0;
declare runs, maxruns bigint = 0;

	initWord :='11001';                   -- the starting/initial 'word'
	maxruns := 100;                       -- a limit to the number of iterations 
                                              -- rule 110 is suspected to run indefinitively
    prods = ARRAY ('010', '000', '1111');     -- the three 'producer rules' stored in a string array

    currWord := :initWord;
    runs := 0;
    -- dummy table var to monitor output
    tmp = select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD 
    from dummy;

    while (:runs &lt; :maxruns) DO
        runs := :runs+1;
        
        currProdNo :=  mod(:runs,3)+1;    -- pick rule no. 1,2 or 3 but never 0
                                          -- as SQLScript arrays are 1 based
        currProd := :prods[:currProdNo];

        if (left (:currWord, 1)='1') then  -- add current producer to the 'word'
            currWord := :currWord || :currProd;
        end if;
        
        currWord := substring (:currWord, 2); -- remove leftmost character

        -- save current state into temp table var
        tmp = select RUNS, CURRPROD, CURRWORD from :tmp
              union all 
              select :runs as RUNS, :currProd as CURRPROD, :currWord as CURRWORD 
              from dummy;

    end while;

    select * from :tmp;                      -- output the table var
end;

Running this gives the following output:

/*
Statement 'do begin declare prods VARCHAR(4) ARRAY; declare currProd, initWord, currWord VARCHAR(300); declare ...' 
successfully executed in 7&lt;span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start"&gt;&lt;/span&gt;17 ms 39 µs  (server processing time: 715 ms 590 µs)
Fetched 101 row(s) in 2 ms 517 µs (server processing time: 0 ms 424 µs)

RUNS    CURRPROD    CURRWORD                                                                             
0       ?           11001                                                                                
1       000         1001000                                                                              
2       1111        0010001111                                                                           
3       010         010001111                                                                            
4       000         10001111                                                                             
5       1111        00011111111                                                                          
6       010         0011111111                                                                           
7       000         011111111                                                                            
8       1111        11111111                                                                             
9       010         1111111010                                                                           
10      000         111111010000                                                                         
11      1111        111110100001111                                                                      
12      010         11110100001111010                                                                    
13      000         1110100001111010000                                                                  
14      1111        1101000011110100001111                                                               
15      010         101000011110100001111010                                                             
16      000         01000011110100001111010000                                                           
17      1111        1000011110100001111010000                                                            
18      010         000011110100001111010000010                                                          
19      000         00011110100001111010000010                                                           
20      1111        0011110100001111010000010                                                            
21      010         011110100001111010000010                                                             
22      000         11110100001111010000010                                                              
23      1111        11101000011110100000101111                                                           
24      010         1101000011110100000101111010                                                         
25      000         101000011110100000101111010000                                                       
[...]
*/

That looks suspiciously like the output from the Wikipedia link (above) and does not seem to stop (except for the super-clever maxruns variable ).
With that, I’d say, it’s possible to create a cyclic tag system rule 110 with SQLScript.
The tag system is Turing complete, therefore SQLScript must be Turing complete.

Still, I have no idea why this would matter at all and that’s really all I can say/write about this.

Cheers,
Lars

What’s your number?

Have you been using the ISNUMERIC() function of MySQL, MS SQL Server or PostgreSQL and wondered what to use in SAP HANA instead?
If so, this is the blog for you.

Just recently I’ve seen a customer using my old workaround (first published here) and then a few days later the related stackoverflow question got updated.

This topic is not dead

The new answer to the SO discussion took a popular approach to the IS_NUMERIC problem:

If the string can successfully be converted into a numeric type, then the value is numeric (IS_NUMERIC returns 1 for TRUE), else it’s not (IS_NUMERIC returns 0 for FALSE).
This is a completely feasible approach (and used on other platforms as well, e.g. Oracle or PostgreSQL) and provides a result that allows making the decision of whether or not the tested value can be turned into a number data type in HANA.

IT DEPENDS, as usual

The problem, or rather the limitation of this approach is, that numeric can mean different things in different circumstances.
Sometimes, one might want to accept ‘123.456,–‘ as numeric. Sometimes “$500.00 –” shall be OK, and so on.
Checking the T-SQL definition of ISNUMERIC (here and here) shows that there are plenty other options.

This means nothing else that there are alternative approaches for this problem.
One of those alternatives could be to use regular expressions (regex), which have been supported by HANA for quite some time now (since SP10 if I’m not mistaken).
With tools like https://regex101.com/ one can comfortably work out a pattern that matches or doesn’t match the kind of data one is interested in.

It might well be, that in your application the question is not about whether a string or sub-string can pass IS_NUMERIC() but rather something more specific.
The customer that I mentioned earlier, for example, wanted to see if a string looks like an ICD10 code (one character followed by two digits, ‘B13’, ‘A00’, etc.). Snipping out the first character and putting the rest into an IS_NUMERIC() function is not ideal here.

Instead, the regex ‘^\D\d{2}.’ can be used to capture the whole pattern.

Show me how it’s done

Let’s take a rather simple regex that matches any string that

  • is not a number between 0-9
  • is not a plus (+), minus (-), comma (,) or full stop (.)
  • is not an exponent ‘E’ (as in 1.23E02)

Such a regex is ‘[^0-9-.\,+e]’ (check here) with flags for global, case-insensitive matching.

In SAP HANA SQL we can use this regex in the LOCATE_REGEXPR() function. If it doesn’t find any matching string, then the result is 0, else it will return the location of the first matching character.
Since we are only interested in those cases, where no matches were found, we can easily MAP this output to make up a IS_NUMERIC value:
1 = no unwanted characters found, it’s numeric
0 = unwanted characters found, it’s not numeric

MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                    FLAG 'i' 
                    IN <string to be searched>)
     , 0, 1
     , 0)

So, what’s better now?

Using the “try-the-conversion“-approach can only be done via a procedure or a scalar function. This brings some overhead with it for the function invocation which can be avoided, when using the regex directly in the SQL statement.

As an example, I ran a quick test against 10.000.000 unique values (all tests were done against HANA1 1.00.122 and HANA2 2.00.020).

Conversion-approach function

select 
       sum("IS_NUMERIC_CONV"(id_char) )
from 
       "DEVDUDE"."DIM10000000"
where 
       "IS_NUMERIC_CONV"(id_char) = 1
   and "IS_NUMERIC_CONV"(id_char) IS NOT NULL;

This ran for roughly 19.5 seconds with 8 CPU cores at 100% usage.
And this is the runtime when all values actually can be converted correctly. With the same number of non-numeric values, the runtime goes up to 2:45 mins.

Regex-approach pure SQL

select 
       SUM (IS_NUM_REGEX)
from 
     (select 
             MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                 FLAG 'i' 
                                 IN id_char)
                  , 0, 1
                  , 0) IS_NUM_REGEX
       from 
            "DEVDUDE"."DIM10000000")
where 
     IS_NUM_REGEX = 1 
 and IS_NUM_REGEX IS NOT NULL;

This ran for roughly 8.6 seconds with 8 CPU cores at 100% usage. Same runtime for both convertible and non-convertible values, here.

Too good to be true

By now you maybe think: “Gee, that regex thing is a lot faster, but also really ugly code. Let’s put that into a function and have our cake and eat it too!“.

Off we go:

FUNCTION "DEVDUDE"."sandpit::IS_NUMERIC_REGEX" (
                 in stringToCheck nvarchar(5000)) 
         RETURNS isNumeric integer
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
     isNumeric := MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                      FLAG 'i' 
                                      IN :stringToCheck)
                       , 0, 1
                       , 0);
END;

A statement using this function takes roughly 30 seconds with 8 CPU cores at 100% usage.

No cake for us!

As long as all our input data is clean, the conversion-function approach works nicely but really becomes expensive, when there are many cases, for which the exception handling needs to be executed. The lesson here is probably that it’s not a great (performance) idea to make exception handling part of your main processing path.

A truly high performing solution would need to be implemented as a native HANA function, so for it needs to be carefully checked what exactly the job of IS_NUMERIC()  shall be in your application and how many matches/non-matches are expected.

There you go, now you know!

Cheers,
Lars

Eco-friendly slide recycling with HANA

Hey there!

I recently presented a short session on my experiences with developing and debugging non-classic-SAP HANA solutions at the Leading Insights conference in Melbourne (27./28.3.2017).

And since the slides are prepared in a way that they could be used without my talking as well, I thought, I might just post them. So, there they are on slideshare:

if you feel this is a shameless content plug, you’re probably right, but I got the feeling, that this might still be interesting to some readers of the “SAP HANA” tag.

Content-wise, this is what you can expect (just in case you don’t want to swap to page 2 which shows the table of contents):

  • an exciting example for a pure HANA development without any NetWeaver/ERP background
  • some practical tips for finding performance issues with SQL statements
  • and tidbits of performance gone bad and how to fix it.

There you go; now you know.

Cheers,
Lars

On multiple mistakes with IN conditions

Based on SAP HANA SPS 11

Update 1.6.2017

Retesting the APPLY_FILTER() approach on rev. 112.07 showed that it is now possible to simply hand APPLY_FILTER() an IN condition with a list of integers.
This approach for the workaround of the general limitation on multiple input parameter values for scripted calculation views is also shown in SAP note  “2315085 – Query with Multi-Value Parameter on Scripted Calculation View Fails with Incorrect Syntax Error.

Therefore, if you are using a current version of SAP HANA and still develop/use scripted calculation views, it is not recommended to use the CE-function approach I explained in this blog post. The limitation that this approach worked around has been fixed in SAP HANA.


Dear readers

there is a long-standing modelling problem with SAP HANA calculation views:

Using multiple input parameters to filter data similar to the SQL IN predicate.

This discussion Handling multi value input parameters can be taken as a comprehensive example.

It seems so straight forward at first and so practical.

Once the input parameter is defined the data preview tool built into SAP HANA Studio or your reporting client of choice can read the meta data for it and present the user with a nice UI dialog to specify values.

Something as fancy as this:

input_par_967338

Now, the way that this works is rather counter intuitive.

For graphical calculation views, there are a couple of nicely written blog posts available, like Using Multiple Values in Input parameter for filtering in Graphical Calculation View but it seems that scripted calculation views did simply not want to be as flexible.

For those, rather clunky (and not very well performing) solutions had to be built to make it possible at all, (see SAP HANA: Handling Dynamic Select Column List and Multiple values in input parameter or How to process and use multi-value input parameter in a scripted view in HANA)

Either the solution involved dynamic SQL or some form of parameter string mangling with loops and pseudo-dynamic temporary result set constructs.

Other approaches proposed to avoid the problem altogether and use multiple parameters (instead of one multi-valued parameter).

Developer arrogance driving solution finding…

The last time I read one of those discussions (yesterday) I thought:

 

“This cannot be the right solution. There must be some easier way to do it!”

So arrogance got the better of me – HA! It cannot be that difficult. (It’s so cheesy that for once Comic Sans is a fitting choice).

I dare to guess that nearly every developer had that feeling every now and then (if not, I would have a hard time finding a good explanation for so many drastically underestimated development efforts…)

Attacking the problem

My first impulse was to use the APPLY_FILTER() function, but I soon learned what many others probably discovered before: it doesn’t solve the problem.

The reason for that is the way APPLY_FILTER() works.

It takes the table variable and your filter string and constructs a new SQL statement.

For example, if your table variable is called vfact and your input parameter selection was 1, 2 and 5 your scripted calculation view could look like this:


/********* Begin Procedure Script ************/
BEGIN
  vfact = select * from fact;
  declare vfiltD10 nvarchar(50); -- this is a temp variable to construct the filter condition
  vfiltD10 = ' "DIM10" IN ( ' || :IP_DIM10 || ' )';
  var_out = APPLY_FILTER (:vfact, :vfiltD10);
END /********* End Procedure Script ************/

This compiles fine and if you try to run it with some parameters you are greeted with a surprise:


SELECT
     "DIM10", "DIM100", "DIM1000", "DIM1000000",
     "KF1", "KF2"
FROM "_SYS_BIC"."devTest/MULTIIP"
        ('PLACEHOLDER' = ('$$IP_DIM10$$','1,3,6')) ;

Could not execute ‘SELECT “DIM10”, “DIM100”, “DIM1000”, “DIM1000000”, “KF1”, “KF2” FROM “_SYS_BIC”.”devTest/MULTIIP” …’ in 373 ms 962 µs .

SAP DBTech JDBC: [2048]: column store error: search table error:  [2620] “_SYS_BIC”.”devTest/MULTIIP/proc”: [130] (range 2) InternalFatal exception: not a valid number string ‘1,3,6’

Not only is this error annoying, but it’s FATAL… shudder!

After some investigation, I found out that the input parameter not only provides the digits and the separating commas but also the enclosing single-quotes.

Nothing easier than getting rid of those:


  vfiltD10 = ' "DIM10" IN ( ' || replace (:IP_DIM10 , char(39), '')  || ' )';

With this, the single-quotes get easily removed (39 is the ASCII value for the single quotes and the CHAR function returns the character for the provided ASCII code – this just makes it easier to handle the double-triple-whatever-quotation syntax required when the single-quote character should be put into a string).

Of course, seeing that we have not yet reached the end of this blog post, you already know: that wasn’t the solution.

The problem here was not only the quotation marks but also that  SAP HANA does not parse the string for the input parameter value. The result for the filter variable is that we do not get the condition

  actual condition          ===> syntax structure

  “DIM10” IN ( 1, 3, 6)     ===> X IN ( c1, c2, c3)

but

  “DIM10” IN ( >’1, 3, 6′<) ===> X IN ( c1 )

So even when we remove the quotation marks, we still end up with just one value (I enclosed this single value in >’ ‘< for easier distinction).

Interlude

The different syntax structures pointed out above are easily overlooked also in standard SQL. Often developers do not fully realise that an IN condition with 3 parameters is structurally different from an IN condition with 2 or 4 parameters.

Whenever the number fo parameters of the IN condition changes, the statement is effectively a new statement to the database, requiring new parsing and optimisation and also allocating its own space in the shared SQL cache.

This is another detail that ABAP developers do not need to worry about, since the

SAP NetWeaver database interface gracefully splits up IN-lists into equal chunks and recombines the result set automatically. See this ancient piece SAP Support case “FOR ALL ENTRIES disaster” for more details.

One approach to avoid this issue can be to use temporary tables instead of the IN condition. Especially when parsing/query optimisation is taking a long time for your application, this might be an approach worthwhile to implement.

Back to the main topic though!

So, the “obvious” approach of using APPLY_FILTER() does not help in this case.

Is it possible that it is just not possible to take multiple input parameter values into an IN list? But graphical calculation views can do it – and rather easy.

And in this observation laid the key for the solution. What is different between graphical and scripted calculation views?

Right, graphical calculation views do not produce SQL for the boxes we draw up.

Technically speaking it replaces them with Plan Operators – very much similar to the abandoned CE_-functions.

Do you see where this is heading?

Yes, indeed. The solution I found works with CE_-functions.

Oh, how very heretic!

May the performance gods get angry with me for making the SAP HANA execution switch engines…

But first, let’s look at the solution, shall we?


/********* Begin Procedure Script ************/
BEGIN
     vfact = select * from fact;
  var_out = CE_PROJECTION(:vfact,
                     [ "DIM10", "DIM100", "DIM1000", "DIM1000000"
                     , "KF1", "KF2" ],
                      'IN ("DIM10", $$IP_DIM10$$)');
END /********* End Procedure Script ************/

Easy to see, this approach mimics the filter approach for graphical calculation views.

To not over complicate things I only used the CE_PROJECTION function for the filter part – everything else is still in efficient, familiar SQL.

Important to note is that this works only, when the input parameter is referenced with the $$<name>$$ format.

Also important to recall is that the complete filter expression needs to be provided as one string enclosed in single quotation marks ( ‘ <filter expression goes here> ‘ ).

OK!“, you may say, “this works, but now you broke the holy rule of CE_-functions damnation. The performance of this surely is way worse due to the implicit engine change!

Well, let’s have a look into this!

First the explain plan for the SQL based statement:


SELECT
     "DIM10","DIM100","DIM1000", "DIM1000000",
     "KF1","KF2"
FROM FACT
where DIM10 IN (1,3,6) ;

OPERATOR_NAME   OPERATOR_DETAILS                                         EXEC_ENGINE SUBTREE_COST

COLUMN SEARCH   FACT.DIM10, FACT.DIM100, FACT.DIM1000, FACT.DIM1000000,  COLUMN       1.645529062

                FACT.KF1, FACT.KF2                                                  

                (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_TABLE)              

  COLUMN TABLE  FILTER CONDITION:                                                   

                (ITAB_IN (DIM10))                                                   

                FACT.DIM10 = 1 OR FACT.DIM10 = 3 OR FACT.DIM10 = 6       COLUMN    

Now the scripted calculation view version:


SELECT
     "DIM10","DIM100","DIM1000", "DIM1000000",
     "KF1","KF2"
FROM "_SYS_BIC"."devTest/MULTIIP"
        ('PLACEHOLDER' = ('$$IP_DIM10$$','1,3,6')) ;

OPERATOR_NAME   OPERATOR_DETAILS                                         EXEC_ENGINE SUBTREE_COST

COLUMN SEARCH   FACT.DIM10, FACT.DIM100, FACT.DIM1000, FACT.DIM1000000,  COLUMN       1.645529062

                FACT.KF1, FACT.KF2                                                  

                (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_TABLE)              

  COLUMN TABLE  FILTER CONDITION:                                                   

                (ITAB_IN (DIM10))                                                   

                FACT.DIM10 = 1 OR FACT.DIM10 = 3 OR FACT.DIM10 = 6       COLUMN     

See any difference?

No?

That’s right, there is none. And yes, further investigation with PlanViz confirmed this.

SAP HANA tries to transform graphical calculation views and CE_-functions internally to SQL equivalents so that the SQL optimizer can be leveraged. This does not always work since the CE_-function are not always easy to map to a SQL equivalent, but a simple projection with a filter works just fine.

Now there you have it.

Efficient and nearly elegant IN condition filtering based on multiple input parameters.

There you go, now you know.

Have a great weekend everyone!

Lars

 

Fiddle riddle – Amazing: stuff gets faster and takes less space!

In Quick notes during a fiddle-session I played around with a table that contained all possible values for a date column.

To fill it I used a FOR LOOP with an INSERT and a commit every 10000 records.

That took 9:37 minutes for 3.652.061 records.

That’s terrible performance and nothing that needs to be accepted with SAP HANA!

I found a way to do the same much quicker:

Statement ‘insert into rmanydates (select * from all_days)’

successfully executed in 6.120 seconds  (server processing time: 4.630 seconds) – Rows Affected: 3652060 ;

Question 1:

The dates inserted above have not been pre-computed in this example.

How do I do this?

——–

Answer 1

 

The 6,120 seconds to create every possible date once and insert it into the row table is a lot faster than the one-by-one looping approach I took in the procedure (manly for clarity purposes).

But, as Fernando Da Ros demonstrated, there are other options to do that.

The one I chose was part of a rather new feature in SAP HANA: SAP HANA Series


SELECT to_date(GENERATED_PERIOD_START)
    FROM SERIES_GENERATE_TIMESTAMP('INTERVAL 1 DAY', '01.01.0001', '31.12.9999')

——–

I also looked at the memory consumption for the stored date tuples.

In the blog post I just used a row store table, because I thought: well, there’s not going to be any column store compression anyhow.

(you know, because column store compression mainly builds on compressing duplicate column values. But with every possible date once, there are no duplicates, so no compression – or is there?)

However, I managed to get the data loaded into a column store table and use a lot less memory.

Question 2:

How to get from

 

BEFORE

— General —

Total Memory Consumption (KB): 25.719

Number of Entries: 3.652.061

Size on Disk (KB): 23.152

Memory Consumption in Main Storage (KB): 25.700

Memory Consumption in Delta Storage (KB): 19

Estimated Maximum Memory Consumption (KB): 25.719

to

AFTER

— General —

Total Memory Consumption (KB): 1.645

Number of Entries: 3.652.061

Size on Disk (KB): 12.912

Memory Consumption in Main Storage (KB): 1.626

Memory Consumption in Delta Storage (KB): 19

Estimated Maximum Memory Consumption (KB): 1.645

Both tables were fully loaded when the space consumption was analyzed.

if you know how that works, put your answers into the comments section! 😀

——–

Answer 2

This one is again a feature from SAP HANA Series: the Series Table,

 


create column table smanydates (dday date)
SERIES (  EQUIDISTANT INCREMENT BY INTERVAL 1 DAY
     MINVALUE '01.01.0001'
     MAXVALUE '31.12.9999'
     PERIOD FOR SERIES (dDAY));

The above CREATE TABLE statements, specifies that the timestamp information in column DDAY are not explicitly stored.

Instead the internal storage is merely a calculation formula that “knows” how every date between MIN/MAXVALUE with the granularity of 1 DAY can be computed.

Instead of storing a full date now, only a number uniquely identifying a day needs to be stored.

Given the starting date ‘01.01.0001’ the current day would then be stored just as the integer 735.832 (20 bits).

Of course there are also limitations to this type of data storage but for ever increasing regular date or timestamp information, this is really an option to save a lot of memory.

For more information around SAP HANA series please check

——–

– Lars

Quick notes during a fiddle-session

For a while now I am working as a Solution Architect for the SAP Custom Development organisation.

This is the team you’d call when you want custom-built solutions that perfectly fit into your standard solution landscape.

My job there is not coding or implementing systems, but rather designing Solutions and turning those designs into commercial offers.

Obviously, it seems like a good idea to every now and then go and check that what I remember of doing “real work” is actually still true. and so I try to keep up with our tools and platforms as well as anyone.

While trying out different stuff for something that might become a blog post, later on, I came across some things I considered noteworthy, “interesting” or otherwise out of the expected.

All this is based on SAP HANA rev. 101 and SAP HANA Studio 2.1.6.

Change of default statement preparation in SAP HANA Studio

In Issue: MDX Query not executing in Hana Studio 2.0.8 Version and Re: Regarding “Generate Time Data” Option in HANA folks complained about the fact that MDX statements suddenly started to return error messages even though the worked in older SAP HANA Studio versions.

The error message was:

SAP DBTech JDBC: [2]: general error: MDX query doesn’t support prepared statement

I first proposed to circumvent this by creating a little wrapper procedure but this is inconvenient.

Shady Shen provided the correct hint: un-check the checkbox “Prepare statements before execution”.

This seems to have been set to active by default in the more recent versions of SAP HANA Studio. Once again a change that doesn’t occur in any documentation or whats-new note… 🙁

At least it’s not necessary to close/re-open the SQL editor to get this changed setting active.

No DEFAULT value specification in INSERT/UPDATE/UPSERT possible

Let’s say you were diligent with your table column specification and not only provided a name and data type but also a NOT NULL constraint and a DEFAULT value clause.

Something as extravagant as this:


create table test (id int not null default 0);

Now you want to make use of this and by following SQL standard you use the DEFAULT keyword to specify that you want to have the pre-defined default value:


insert into test (id) values (default);

SAP DBTech JDBC: [260]: invalid column name: DEFAULT: line 1 col 31 (at pos 30)

This is not to say SAP HANA doesn’t support default values.

If you want to use default values, you mustn’t include them in the DML statement at all.

So the following works without problems:


alter table test add (id2 int not null default 1);
insert into test (id) values (1);
select * from test;
ID ID2
1 1

Would be lovely to find such nuances in the documentation.

Limits of date operations

When working with date data types it’s real easy to forget about the limitations of this data type.

One example could be that non-existing dates like the February 30 or dates between 05.10.1582 and 14.10.1582 are rejected or mapped to the correct dates.

Another limitation I actually ran into while generating random garbage data for a date column is the smallest/largest date.

This time the documentation is clear about this: The range of the date value is between 0001-01-01 and 9999-12-31.

Usually I wouldn’t think much about this, but when using expressions like the following to create random dates, I hit an error:


select add_days (date'01.01.1980', rand()*10000000) from dummy;

[303]: invalid DATE, TIME or TIMESTAMP value: internal error in AddDays() at function add_days()

In my example, the 10000000 was, in fact, a sequence number in a rather large data set.

So, the quick fix would be to set an upper limit for the value that is fed into add_days().


select add_days (date'01.01.1980', least(rand()*10000000, 3560) from dummy;

The least() function comes in real handy here at it makes sure that the largest possible value returned will be 3560 (roughly 10 years worth of days).

Together with its sister the greatest() function, it’s easy to create upper and lower limits for generated values.

But, what if we don’t want to arbitrarily set the limit below what would be possible with this data type?

Easy too:


select days_between (date'31.12.9999', current_date) from dummy;

DAYS_BETWEEN(31.12.9999,CURRENT_DATE)

-2916233

Gives you the number of days between today and the end of times – huh… gotta hurry!

Admittedly this example is super artificial and has no connection to real-life dates.

However, it shows that there are limitations to the date data type in SQL that it’s not too difficult to step into them.

There are of course other examples where a richer semantic for this data type would be useful.

Being able to have a MIN_DATE or MAX_DATE entry that would explicitly show up like this would be

Wandering off…

The date topic side-tracked me at this point and I looked into some other points, too:

I’ve shown the “generator query”-trick (cross join of objects system table to generate millions of rows) before.

This is of course not the only way to create a lot of rows.

Another option is to write SQL Script and this was the first time I used the new SPS 10 capability to run SQL script directly in the SQL console without the need for a wrapper procedure:


create table manydates (dday date);
do
begin
    declare i bigint;
    for i in 0 .. 3652060 do
        insert into manydates (dday) values (add_days(date'01.01.0001', :i) );
        if (mod(i, 10000) = 0 )
        then
            commit;
        end if;
   
    end for;
end

This little gem runs for a while and fills a table with all possible date values.

Statement ‘do begin declare i bigint; for i in 0 .. 3652060 do insert into manydates (dday) values …’

successfully executed in 9:37.694 minutes  (server processing time: 9:37.681 minutes) – Rows Affected: 0

Now, about we claimed that no “no-dates” entries would be created.

And checking for duplicates via


select count(distinct dday) from manydates;

and


select dday, count(*) from manydates
group by dday having count(*) &gt;1;

show that there aren’t any.

But how’s this?

We are looking at how many years worth of days here? 9999 exactly.

So, leaving out leap years we should end up with 365 * 9999 = 3.649.635 separate days.

That number is 2.426 days short of what we actually find in our table.

Alright, so then just taking 366 days per year will help us, maybe?

Nearly: 3.659.634 days result here, which is 7.573 days too many.

Maybe trying to account just for the leap years separately can help here.

Leap years occur roughly every four years, so for 9999/4 = 2.499 years, we should add one day.

That gets us to 3.652.134, which is just 73 days off from the actual number of days in this dates table.

We can even check this in SQL again:


select count(distinct year(dday))  from manydates
where mod(to_integer(year(dday)), 4) = 0;

COUNT(DISTINCT YEAR(DDAY) )

2499

But what leap years didn’t we cover with our simple arithmetic?


(select year(dday), count(dday)
    from manydates
    where  mod(to_integer(year(dday)), 4) = 0
    group by year(dday))
     
EXCEPT
(select year(dday),  count(dday)
    from manydates
    group by year(dday) having count(dday) &gt; 365
)
order by year(dday);

Fetched 63 row(s) in 21 ms 617 µs

YEAR(DDAY) COUNT(DDAY)
1700 365
1800 365
1900 365
2100 365
2200 365
2300 365
2500 365
2600 365
2700 365
2900 365
3000 365
3100 365
3300 365
3400 365
3500 365
3700 365
3800 365
3900 365
4100 365
4200 365
4300 365
4500 365
4600 365
4700 365
4900 365
5000 365
5100 365
5300 365
5400 365
5500 365
5700 365
5800 365
5900 365
6100 365
6200 365
6300 365
6500 365
6600 365
6700 365
6900 365
7000 365
7100 365
7300 365
7400 365
7500 365
7700 365
7800 365
7900 365
8100 365
8200 365
8300 365
8500 365
8600 365
8700 365
8900 365
9000 365
9100 365
9300 365
9400 365
9500 365
9700 365
9800 365
9900 365

If you followed up to here, you’ll notice, that we were looking for 73 “missing days”.

Comparing the set of years where the number of the year is divisible by 4 without remainder and the set of years where the number of days is larger than 365 revealed 63 days.

So, where are the 10 last days here?

Remember that odd reference on the Gregorian Calendar reformation?

That took ten days away in 1582!


select count(distinct dday) from manydates
where year(dday) = 1582;

COUNT(DISTINCT DDAY)

355

So far, so good.

We found all the “missing records”, but why aren’t these counted as leap years anyhow?

That’s because I used a too simple rule here.

Leap Year nearly every 4 years explains that a leap year is only counted as such if

  • The year is evenly divisible by 4;
  • If the year can be evenly divided by 100, it is NOT a leap year, unless;
  • The year is also evenly divisible by 400. Then it is a leap year.

Let’s run that against our dates table:


select year(dday) yr, count(dday) day_cnt
from manydates
where
       mod(to_integer(year(dday)), 4) = 0
       and
       not (mod(to_integer(year(dday)), 100) = 0)
      or
        mod(to_integer(year(dday)), 4) = 0
        and
        mod(to_integer(year(dday)), 400) = 0
group by year(dday)
;

Fetched 2424 row(s) in 123 ms 147 µs (server processing time: 1 ms 665 µs)

Alright – the rule gives us 12 years less than what is in our dates table.

Which ones?


select mod_y.yr mod_yr, long_y.yr long_yr
from
  (select year(dday) yr, count(dday) day_cnt
  from manydates
  where
        mod(to_integer(year(dday)), 4) = 0
        and
        not (mod(to_integer(year(dday)), 100) = 0)
       or
         mod(to_integer(year(dday)), 4) = 0
         and
         mod(to_integer(year(dday)), 400) = 0
        group by year(dday)
  ) mod_y
     
FULL OUTER JOIN
  (select year(dday) yr,  count(dday) day_cnt
     from manydates
     group by year(dday) having count(dday) &gt; 365
  ) long_y
  on mod_y.yr = long_y.yr
WHERE
   mod_y.yr is null or long_y.yr is null;
MOD_YR LONG_YR
? 100
? 500
? 900
? 1300
? 200
? 600
? 1000
? 1400
? 300
? 700
? 1100
? 1500

Can you spot the reason?

All of those records violate the rule of not being divisible by 100 except when they are divisible by 400.

Why had those years been generated with one day too many by our add_days() function?

The Gregorian Calendar Reformation clearly did change history but it didn’t change the past 🙂

Before this regulation got active, the leap years didn’t follow those rules (which called for the Reformation in the first place).

And as we see there are no false leap years after the year 1500.

That’s it for today.

Funny things come up when one spends some hours on a rainy Sunday afternoon fiddling with SAP HANA SQL.

There you go, now you know!

– Lars

The 7th Niceness of new toys’n’tools

If you’re not living behind some red moon and you’re reading this blog, you will likely be aware that SAP HANA SPS 10 had just been released.

With it also the SAP HANA Studio had been improved and extended.

Since I personally like eye-candy and enjoy exploring new features when they come with a button to click, I have installed the new version on my laptop and played with it; against a SPS 9 SAP HANA database!

That’s right – everything I describe here works just well on a Rev. 96, which means you can use these features even without upgrading your server.

(This is one of the benefits of the de-coupling of SAP HANA Studio releases and SAP HANA Server releases – indicated by the different version numbers.)

So here we go with the nice things I found in SAP HANA Studio 2.1.4.

Drop and drop of design time objects

Since basically ever I wished I could just drag and drop information models into the SQL editor so that I could execute SELECT statements against them.

No idea what took the dev-colleagues so long, but now that’s actually possible.

Simply dragging the design time object (you know, the object as it appears under the Content folder) over into the SQL editor and the corresponding runtime object name (yes, the _SYS_BIC.”<package_name>… <object_name>”) is placed right at your cursor.

Alternatively, there is a context menu entry “Copy Column View Name” that puts the name into the clipboard.

Nice!

SELECT statement generation incl. input parameters

Similar to the above, you might have stumbled over models with input parameters.

The usual workaround to use the SQL generated by the data preview is alright, but a bit clumsy.

With SAP HANA Studio 2.1.4 you can now simply right-click and select “Generate Select SQL“.

After a few moments of checking the metadata, you get a new SQL editor with the basic SELECT SQL:

 

As you can tell, someone automatically pressed the “Format” button as well.

Nice!

Compare two plans in PlanViz

More than once I got the question on how to perform comparisons between two PlanViz traces and the usual answer had to be: you put the two PlanViz editors next to each other and compare the Plan OPerator boxes one by one. This is of course tedious.

Now, there is the (once again – anyone remember Show me the timelines, baby! ? 😉 ) well hidden feature to compare two plans.

I found it in the context menu of an executed PlanViz by clicking on “Compare With Another Plan…” – not sure if there is another way to this feature.

Next you’re greeted by a little selection dialogue window, that I found rather self-explanatory:

Seems to be straight forward – so I selected one of the other open PlanViz displays by clicking “Other visualized plan“.

DISCLA(I)MER: for the next step, you should have a wide monitor. Or two monitors. Wide ones. Really! (It’s better for you anyhow, if you use it right.)

See? I told you so!

As I haven’t really worked with the feature there’s little I can report beyond that it’s there.

Go play around yourself!

In my books, this feature really is:

Nice!  🙂

Immediate execution for PlanViz

Alright, you have your SQL statement. You want the PlanViz for it.

Up to now you were always confronted with some “Prepare” step and a graphical version of the Explain Plan.

How useful…not!” you might have thought (so did I most of the times 😉 ).

No more of that!

Modern days SAP HANA Studio comes with this shortcut in the context menu:

Nice!

Identify plan operations in PlanViz by name

Ok, the following is not actually a new feature, but I have just found this and decided to put it in this anyhow.

A major difficulty in analysing PlanViz outputs often is to map the model design to the actual execution.

This is inherent to the calculation view instantiation process but you can make it a bit easier for you by naming the nodes in your model with your own names.

To understand what I mean, just look at those two models, that do pretty much the same thing:

Default node names Custom node names

At this point there is already a benefit as the intended function for the nodes is much clearer.

Now, when the model gets executed and a PlanViz is created, we can find those node names in several places:

1. In the POP node details:

2. In the “Tables Used” list:

3. In the “Operator List”:

You will still find a lot of things that you cannot easily match back to the model, but with the custom node names, you should be able to find some anchors in the PlanViz.

Not sure about you, but I think that’s Nice!.

Comments in information models

One of the things you wish you had when working with complex models, that you or someone else created a while ago typically is: a documentation.

  • Why, just why is this condition there?
  • What exactly is the idea behind this calculated column?
  • Where should this data go?

Wish no more!

With the SPS 10 SAP HANA Studio you can now add comments/notes to virtually anything in the model.

Click the little “Maintain Comment” button in the model, when hovering over a node…

… then enter some amazingly useful comments!

Once you close the comments editor you’ll see a little indicator at the node-icon to show that there’s a note for you:

Let’s say it together:

Nice!

Data Lineage in Modeler

When working with the modeler it can get quite difficult to understand where the data of a certain output column actually comes from.

To help with that, there is now the Data Lineage feature available.

The red borders around the model nodes indicate that the data is “touched in there” and the little annotation boxes show you the column name per node.

Nice!

Alright, given that I only had a few hours to look into it, I’m personally quite happy about the result here.

Seven “NICE” for the current major release of SAP HANA Studio (and there are more, like the on-demand loading of the Contents sub-folders, but I couldn’t really capture this with a screen shot 😀 ).

 

There you go, now you know!

  • Lars