Category Archives: SCN Blog

HANA in a pocket, a skull and some dirty hands on Linux

DISCLAIMER: this blog describes unsupported and unlicensed use of the SAP HANA software. I describe my personal try-out of a hardware/software combination. It’s not in any way meant to be used productively or recommended to use the software in the way I describe.

If you lose data, waste life-time, money, your good mood or damage anything while trying to do what I describe: that's all on you and there's no guarantee, warranty or liability whatsoever from SAP or myself.

The frustrating reality of ‘big machine computing’

Just like everyone else in the SAP HANA community, I really like having a private instance available for me.

Trying things out, practicing, researching solutions, working with sample data, checking system behaviour etc. are just so much easier than e.g. on a HCP instance or some “playground” instance on the company’s SAP HANA hardware.

Yes, there are always ways to work around this but e.g. for me this means I have to live with average round trip times of ca. 320ms. That's certainly not too bad and definitely works for most use cases. Unfortunately the server is not "mine".

So, if I e.g. wanted to reboot the server for any reason, well, folks probably wouldn’t be too happy …

Also, this server really is only available when I can log on to our corporate network in one way or another. And this turns out to be a limitation that is easily hit.

Customer visit with no immediate WIFI logon?

Nope, no way to access the SAP HANA system then.

Cable provider outage in the home office?

Yep, that SAP HANA server is running fine, far away – without any chance to be used by me.

In short, a somewhat local SAP HANA installation would be a really nice thing – especially since I am not permanently working with tons of data, but rather look into the development/admin aspects of things.

Is there a way or what?

A few weeks ago, I was on a customer visit in San Francisco and the development team there came up with a valid and interesting question: how to provision developers with private SAP HANA instances?

With a private instance, each developer could mess around with his system without impacting anyone else. This is pretty much a standard approach for development nowadays, so the request is rather reasonable.

Internally at SAP there exist multiple approaches to this problem.

From “small” SAP HANA hardware under the developer’s desk (actually quite expensive machines, that we would have called “workstations” back in the day) to spawn-on-demand-systems on virtual servers: there’s a flavor for everyone.

Unfortunately all that sort of screams: high initial investment.

Not great for smaller teams or individuals who would want to get deeper into SAP HANA development/administration without access to “allowed” hardware.

Coincidentally I read Mr. Atwood’s blog post and was intrigued: this Intel Skull Canyon NUC sounded awesome.

Even though I am not a gamer and the last time I assembled a PC myself was sometime around 2000, this looked really interesting to me.

The system runs on an Intel i7 quad-core processor and allows up to 32GB of RAM.

That’s certainly enough to run and operate a small SAP HANA instance.

Here’s a picture of this nice piece of technology:

[Photo: the Intel Skull Canyon NUC kit]

Cutting the story short

I ordered that thing online together with two 16 GB DDR4 strips and an old/slow but relatively cheap IBM SSD with 480GB capacity.

All in all the bill for this was around 1400 Australian Dollars – not too bad for a system way more powerful than my MacBook Pro.

Of course: this doesn't come with any display or input device (well, in fact the online shop I ordered from had a special and chipped in a Logitech K400 Plus for free 😉 ). You have to have a monitor to use this thing.

 

A few days later the whole pack arrived and I was really surprised by the weight of it: easily below one kilogram.

I’ll skip the whole un-boxing and assembly part, but will tell you: it’s easy.

The RAM modules just need to be clicked into place and the SSD installation requires loosening and tightening a single screw. Definitely doable for a hardware n00b like myself.

The more complicated part – again: for me, others might just do this without blinking – was to put an operating system onto the new machine.

I didn't buy a Windows license, since the whole purpose of this purchase was to eventually run SAP HANA on it, so I went for OpenSuse Linux (Leap distribution). Creating a bootable USB thumb drive is relatively easy.

The only changes I made to this tutorial were

  • using 'pv' to get an idea of the progress of the data transfer (installed via 'brew') – yes, I used a Mac for this.
  • using a blocksize of 1M instead of the 4k mentioned in the tutorial.

 

The whole command set then looked like this:

sudo zsh

pv /path/to/downloaded.iso | dd of=/dev/rdisk2 bs=1M

 

Armed with this USB thumb drive stuck into one of the 4 USB ports, I simply started my “skullbox” and the OpenSuse installation screen appeared.

Again, I'm not the Linux expert on the installation side of things, so I basically went with most of the default settings and clicked through.

One reboot after that, I had OpenSuse running on the new machine.

Meanwhile in the kitchen

As TV chefs would say “… meanwhile I prepared something else …” which translates here into the download of the SAP HANA installation files from SAP Service Marketplace (Yes, you do need to have paid access to this).

 

The installation files come in a big multi-part RAR archive with Windows auto-extraction, which means that the first part is an .EXE file. Unpacking on Linux, however, is not a problem; all you need is the UNRAR tool (if all else fails, 'zypper install http://download.opensuse.org/repositories/openSUSE:/Leap:/42.1:/NonFree/standard/x86_64/unrar-5.3.3-1.3.x86_64.rpm' should fetch it).

 

The rest of the installation is fairly well documented, and only small additional changes were required to make the installation work on this NOT SUPPORTED operating system:

 

a) The PNG library couldn't be loaded, and the hdblcmgui wouldn't start.

Fixed by installing an older version via 'zypper install --oldpackage libpng12-0'.

 

b) The nameserver couldn't be started due to missing SSL libs. The trace file listed

Can’t load ‘/playfield/hana_inst/51050838/DATA_UNITS/HDB_SERVER_LINUX_X86_64/instruntime/SSLeay.so’ for module Net::SSLeay: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory at DynaLoader.pm line 193.”

 

Fixed by ‘zypper install libssl*’ and

ln -s /lib64/libcrypto.so.1.0.0 /usr/lib64/libcrypto.so.0.9.8

ln -s /lib64/libssl.so.1.0.0 /usr/lib64/libssl.so.0.9.8

 

(as seen here). After that, the installation went through without further troubles.

The looooong fork

I have enabled the option for remote administration in OpenSuse, which means that the system now can be accessed via VNC.

For my Windows 10 based work laptop I find that MobaXTerm makes it really simple to access the Linux box – it even has a plugin that contains a VNC client.

For the Mac, there is actually a VNC client built in (I'm using El Capitan): just start "Screen Sharing" or put vnc://hostname:port into the Safari address bar and off you go…

 

After all of this (and some more wrestling with the Linux network setup – I still barely have a clue about that, but it seems to work now…) I can now access my "HANA skullbox" via:

  • WIFI connection at the home office
  • direct CAT5 cable connection between my work laptop and the NUC

which are exactly the scenarios I wanted to support.

Starting a HANA Studio via X-Server works too:


But that’s neither fast nor pretty.

For working with the system I prefer using the local HANA Studio or the Web UI.

Of course I set the 'memory allocation limit', but to have some fun, I cranked it up to the full 32 GB of installed RAM; so far no functional problems with this 😉
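In case you want to do the same: the allocation limit is just an ini-file parameter that can be changed with a single SQL statement. A minimal sketch of that (the section name 'memorymanager' and the MB-based value are from memory, so double-check against the documentation of your revision; the 30720 MB here is a hypothetical value of roughly 30 GB):

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
    SET ('memorymanager', 'global_allocation_limit') = '30720'  -- value in MB (assumption: ~30 GB)
    WITH RECONFIGURE;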

All of this setup happened after hours over maybe three evenings. Someone with more Linux know-how would probably be done in way less time.

So far I’m quite happy with this setup and I’m eager to try out what things work and where the hard limit of usability for this micro-HANA lies.

OK, that’s it for now.

AGAIN: this is a completely unsupported setup. No support, warranty, maintenance or endorsement is provided by SAP or myself for this.

There you go – now you know!

Lars

 

On multiple mistakes with IN conditions

Based on SAP HANA SPS 11

Update 1.6.2017

Retesting the APPLY_FILTER() approach on rev. 112.07 showed that it is now possible to simply hand APPLY_FILTER() an IN condition with a list of integers.
This approach for the workaround of the general limitation on multiple input parameter values for scripted calculation views is also shown in SAP Note "2315085 – Query with Multi-Value Parameter on Scripted Calculation View Fails with Incorrect Syntax Error".

Therefore, if you are using a current version of SAP HANA and still develop/use scripted calculation views, it is not recommended to use the CE-function approach I explained in this blog post. The limitation that this approach worked around has been fixed in SAP HANA.
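For illustration only, a minimal sketch of what the now-working variant might look like, assuming the same table variable and input parameter as in the original example further down (treat this as a sketch, not a tested reference implementation):

BEGIN
  declare vfiltD10 nvarchar(50);
  vfact = select * from fact;
  -- on rev. 112.07 and later the IN-list filter string reportedly works as-is
  vfiltD10 = ' "DIM10" IN ( ' || :IP_DIM10 || ' )';
  var_out = APPLY_FILTER (:vfact, :vfiltD10);
END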


Dear readers

there is a long-standing modelling problem with SAP HANA calculation views:

Using multiple input parameters to filter data similar to the SQL IN predicate.

This discussion Handling multi value input parameters can be taken as a comprehensive example.

It seems so straight forward at first and so practical.

Once the input parameter is defined the data preview tool built into SAP HANA Studio or your reporting client of choice can read the meta data for it and present the user with a nice UI dialog to specify values.

Something as fancy as this:

input_par_967338

Now, the way that this works is rather counter-intuitive.

For graphical calculation views, there are a couple of nicely written blog posts available, like Using Multiple Values in Input parameter for filtering in Graphical Calculation View but it seems that scripted calculation views simply did not want to be as flexible.

For those, rather clunky (and not very well performing) solutions had to be built to make it possible at all (see SAP HANA: Handling Dynamic Select Column List and Multiple values in input parameter or How to process and use multi-value input parameter in a scripted view in HANA).

Either the solution involved dynamic SQL or some form of parameter string mangling with loops and pseudo-dynamic temporary result set constructs.

Other approaches proposed to avoid the problem altogether and use multiple parameters (instead of one multi-valued parameter).

Developer arrogance driving solution finding…

The last time I read one of those discussions (yesterday) I thought:

 

“This cannot be the right solution. There must be some easier way to do it!”

So arrogance got the better of me – HA! It cannot be that difficult. (It’s so cheesy that for once Comic Sans is a fitting choice).

I dare to guess that nearly every developer had that feeling every now and then (if not, I would have a hard time finding a good explanation for so many drastically underestimated development efforts…)

Attacking the problem

My first impulse was to use the APPLY_FILTER() function, but I soon learned what many others probably discovered before: it doesn’t solve the problem.

The reason for that is the way APPLY_FILTER() works.

It takes the table variable and your filter string and constructs a new SQL statement.

For example, if your table variable is called vfact and your input parameter selection was 1, 2 and 5 your scripted calculation view could look like this:

/********* Begin Procedure Script ************/
BEGIN
  declare vfiltD10 nvarchar(50); -- this is a temp variable to construct the filter condition
  vfact = select * from fact;
  vfiltD10 = ' "DIM10" IN ( ' || :IP_DIM10 || ' )';
  var_out = APPLY_FILTER (:vfact, :vfiltD10);
END /********* End Procedure Script ************/

This compiles fine and if you try to run it with some parameters you are greeted with a surprise:

SELECT
     "DIM10", "DIM100", "DIM1000", "DIM1000000",
     "KF1", "KF2"
FROM "_SYS_BIC"."devTest/MULTIIP"
        ('PLACEHOLDER' = ('$$IP_DIM10$$','1,3,6')) ;

Could not execute ‘SELECT “DIM10”, “DIM100”, “DIM1000”, “DIM1000000”, “KF1”, “KF2” FROM “_SYS_BIC”.”devTest/MULTIIP” …’ in 373 ms 962 µs .

SAP DBTech JDBC: [2048]: column store error: search table error:  [2620] “_SYS_BIC”.”devTest/MULTIIP/proc”: [130] (range 2) InternalFatal exception: not a valid number string ‘1,3,6’

Not only is this error annoying, but it’s FATAL… shudder!

After some investigation, I found out that the input parameter not only provides the digits and the separating commas but also the enclosing single-quotes.

Nothing easier than getting rid of those:

  vfiltD10 = ' "DIM10" IN ( ' || replace (:IP_DIM10 , char(39), '')  || ' )';

With this, the single-quotes get easily removed (39 is the ASCII value for the single quotes and the CHAR function returns the character for the provided ASCII code – this just makes it easier to handle the double-triple-whatever-quotation syntax required when the single-quote character should be put into a string).
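If you want to convince yourself that both notations produce the same character, a quick throwaway check in the SQL console does the trick (not part of the original example):

select char(39) as char_39, '''' as quoted_quote from dummy;
-- both columns return the single-quote character: '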

Of course, seeing that we have not yet reached the end of this blog post, you already know: that wasn’t the solution.

The problem here was not only the quotation marks but also that  SAP HANA does not parse the string for the input parameter value. The result for the filter variable is that we do not get the condition

  actual condition          ===> syntax structure

  “DIM10” IN ( 1, 3, 6)     ===> X IN ( c1, c2, c3)

but

  “DIM10” IN ( >’1, 3, 6′<) ===> X IN ( c1 )

So even when we remove the quotation marks, we still end up with just one value (I enclosed this single value in >’ ‘< for easier distinction).

Interlude

The different syntax structures pointed out above are easily overlooked also in standard SQL. Often developers do not fully realise that an IN condition with 3 parameters is structurally different from an IN condition with 2 or 4 parameters.

Whenever the number of parameters of the IN condition changes, the statement is effectively a new statement to the database, requiring new parsing and optimisation and also allocating its own space in the shared SQL cache.

This is another detail that ABAP developers do not need to worry about, since the SAP NetWeaver database interface gracefully splits up IN-lists into equal chunks and recombines the result set automatically. See this ancient piece SAP Support case "FOR ALL ENTRIES disaster" for more details.

One approach to avoid this issue can be to use temporary tables instead of the IN condition. Especially when parsing/query optimisation is taking a long time for your application, this might be an approach worth implementing; see the sketch below.
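A minimal sketch of that alternative, using the FACT table from the examples below and a made-up temporary table name (purely for illustration): the statement text stays identical no matter how many filter values end up in the temporary table, so it only needs to be parsed once.

create local temporary table #filter_vals (val integer);
insert into #filter_vals values (1);
insert into #filter_vals values (3);
insert into #filter_vals values (6);

select f."DIM10", f."DIM100", f."DIM1000", f."DIM1000000", f."KF1", f."KF2"
  from fact f
  inner join #filter_vals fv
    on f."DIM10" = fv.val;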

Back to the main topic though!

So, the “obvious” approach of using APPLY_FILTER() does not help in this case.

Is it possible that it is just not possible to take multiple input parameter values into an IN list? But graphical calculation views can do it – and rather easily.

And in this observation lay the key for the solution. What is different between graphical and scripted calculation views?

Right, graphical calculation views do not produce SQL for the boxes we draw up.

Technically speaking, SAP HANA replaces them with Plan Operators – very much similar to the abandoned CE_-functions.

Do you see where this is heading?

Yes, indeed. The solution I found works with CE_-functions.

Oh, how very heretic!

May the performance gods get angry with me for making the SAP HANA execution switch engines…

But first, let’s look at the solution, shall we?

/********* Begin Procedure Script ************/
BEGIN
     vfact = select * from fact;
  var_out = CE_PROJECTION(:vfact,
                     [ "DIM10", "DIM100", "DIM1000", "DIM1000000"
                     , "KF1", "KF2" ],
                      'IN ("DIM10", $$IP_DIM10$$)');
END /********* End Procedure Script ************/

Easy to see, this approach mimics the filter approach for graphical calculation views.

To not overcomplicate things I only used the CE_PROJECTION function for the filter part – everything else is still in efficient, familiar SQL.

Important to note is that this only works when the input parameter is referenced in the $$<name>$$ format.

Also important to recall is that the complete filter expression needs to be provided as one string enclosed in single quotation marks ( ‘ <filter expression goes here> ‘ ).

"OK!", you may say, "this works, but now you broke the holy rule of CE_-function damnation. The performance of this surely is way worse due to the implicit engine change!"

Well, let’s have a look into this!

First the explain plan for the SQL based statement:

SELECT
     "DIM10","DIM100","DIM1000", "DIM1000000",
     "KF1","KF2"
FROM FACT
where DIM10 IN (1,3,6) ;

OPERATOR_NAME   OPERATOR_DETAILS                                          EXEC_ENGINE  SUBTREE_COST
COLUMN SEARCH   FACT.DIM10, FACT.DIM100, FACT.DIM1000, FACT.DIM1000000,   COLUMN       1.645529062
                FACT.KF1, FACT.KF2
                (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_TABLE)
  COLUMN TABLE  FILTER CONDITION:
                (ITAB_IN (DIM10))
                FACT.DIM10 = 1 OR FACT.DIM10 = 3 OR FACT.DIM10 = 6        COLUMN

Now the scripted calculation view version:

SELECT
     "DIM10","DIM100","DIM1000", "DIM1000000",
     "KF1","KF2"
FROM "_SYS_BIC"."devTest/MULTIIP"
        ('PLACEHOLDER' = ('$$IP_DIM10$$','1,3,6')) ;

OPERATOR_NAME   OPERATOR_DETAILS                                          EXEC_ENGINE  SUBTREE_COST
COLUMN SEARCH   FACT.DIM10, FACT.DIM100, FACT.DIM1000, FACT.DIM1000000,   COLUMN       1.645529062
                FACT.KF1, FACT.KF2
                (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_TABLE)
  COLUMN TABLE  FILTER CONDITION:
                (ITAB_IN (DIM10))
                FACT.DIM10 = 1 OR FACT.DIM10 = 3 OR FACT.DIM10 = 6        COLUMN

See any difference?

No?

That’s right, there is none. And yes, further investigation with PlanViz confirmed this.

SAP HANA tries to transform graphical calculation views and CE_-functions internally to SQL equivalents so that the SQL optimizer can be leveraged. This does not always work since the CE_-functions are not always easy to map to a SQL equivalent, but a simple projection with a filter works just fine.

Now there you have it.

Efficient and nearly elegant IN condition filtering based on multiple input parameters.

There you go, now you know.

Have a great weekend everyone!

Lars

 

Modelling Learning Double Action or two things I just learned about modelling in SAP HANA SPS 11

Far fetched…

 

A colleague asked me over a year ago (2015 and SPS 9 … sounds ancient now, I know) whether it is possible to leverage information models in a different SAP HANA instance via SDA (Smart Data Access – look it up in the documentation if you didn’t know this yet).

The scenario in mind here was a SAP BW on HANA system reading data from a Suite on HANA system and using the SAP HANA live content (http://scn.sap.com/docs/DOC-59928, http://help.sap.com/hba) installed there.

The Open ODS feature of SAP BW on HANA was to be used here as it allows reading from tables and views exposed via SDA in the local SAP HANA instance.

Now this idea sounds splendid.

Instead of having to manually build an extractor or a data export database view (both of which can be extensive development efforts), why not simply reuse the ready-made content of SAP HANA Live for this?

As usual the proof of the pudding is in the eating and as soon as it was tried out a severe shortcoming was identified:


select * from "LARS"."IMACCESS_LBPB/SCV_USERS"
    ('PLACEHOLDER' = ('$$userNameFilter$$', 'USER_NAME= LARS'))
Could not execute 'select * from "LARS"."IMACCESS_LBPB/SCV_USERS"('PLACEHOLDER' = ('$$userNameFilter$$', 'USER_NAME= ...'
SAP DBTech JDBC: [7]: feature not supported:
Cannot use parameters on row table: IMACCESS_LBPB/SCV_USERS: line 1 col 22 (at pos 21)

BOOM!

I just created an Information Model similar to the ones provided with the SAP HANA Live content including the heavily used Input Parameters to enable the model to be flexible and reusable (and also to allow filter push-down) but SAP HANA tells me:

 

“Nope, I’m not doing this, because the PLACEHOLDER syntax only works for information views and not for ‘row tables’.”

This 'row table' part of the error message stems from the fact that SAP HANA SPS 9 showed SDA tables as row store tables. This also means that all data read from the SDA source gets temporarily stored in SAP HANA row store tables before being further processed in the query.

One reason for doing that probably was that the mapping from ODBC row format to column store format (especially the data type mapping from other vendors' DBMS) was easier to manage with the SAP HANA row store.

Having said that, when accessing another SAP HANA system, such format mapping surely should be no problem, right?

Right.

And in fact there is an option to change this: the parameter "virtual_table_format" in the "smart_data_access" section of the indexserver.ini:

= Configuration
Name                        | Default
  indexserver.ini           |
    smart_data_access       |
      virtual_table_format  | auto

This parameter can be set to ROW, COLUMN or AUTO (the SPS 11 default value, automatically using the right format depending on the SDA adapter capabilities).
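Changing it is again just an ini-file parameter update; a sketch of how that could be done via SQL, using the section and parameter names from the table above (setting it explicitly to 'auto' here is only an example):

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('smart_data_access', 'virtual_table_format') = 'auto'
    WITH RECONFIGURE;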

For more on how “capabilities” influence the SDA adapter behavior, check the documentation.

Back last year I wasn’t aware of this parameter and so I couldn’t try and see if, after changing the parameter, the query would’ve worked.

Anyhow, like all good problems the question just popped up again and I had an opportunity to look into this topic once more.

“Smarter” at last…

And lo and behold, with SAP HANA SPS 11 the PLACEHOLDER syntax works like a charm even for virtual tables.


SELECT -- local execution ---
     "D10_VAL",
     "D100_VAL",
     sum("KF1") AS "KF1",
     sum("KF2") AS "KF2",
     sum("CC_KF1_FACTORED") AS "CC_KF1_FACTORED"
FROM "_SYS_BIC"."devTest/stupidFactView"
    ('PLACEHOLDER' = ('$$IP_FACTOR$$','34'))
WHERE "D10_VAL" = 'DimValue9'
and "D100_VAL" = 'DimValue55'
GROUP BY
     "D10_VAL",
     "D100_VAL";


/*

D10_VAL     D100_VAL    KF1          KF2         CC_KF1_FACTORED
DimValue9   DimValue55  -1320141.70  525307979   -44884817

successfully executed in 352 ms 417 µs  (server processing time: 7 ms 385 µs)
successfully executed in 356 ms 581 µs  (server processing time: 8 ms 437 µs)
successfully executed in 350 ms 832 µs  (server processing time: 8 ms 88 µs)

OPERATOR_NAME       OPERATOR_DETAILS                                        EXECUTION_ENGINE
COLUMN SEARCH       'DimValue9',
                    DIM1000.D100_VAL,
                    SUM(FACT.KF1),
                    SUM(FACT.KF2),
                    TO_BIGINT(TO_DECIMAL(SUM(FACT.KF1), 21, 2) * '34')
                    (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: CS_JOIN)   COLUMN
  AGGREGATION       GROUPING:
                        DIM1000.VAL,
                    AGGREGATION:
                        SUM(FACT.KF1),
                        SUM(FACT.KF2)                                       COLUMN
    JOIN            JOIN CONDITION:
                    (INNER) FACT.DIM100 = DIM1000.ID,
                    (INNER) FACT.DIM10 = DIM10.ID                           COLUMN
      COLUMN TABLE                                                          COLUMN
      COLUMN TABLE  FILTER CONDITION: DIM1000.VAL = n'DimValue55'           COLUMN
      COLUMN TABLE  FILTER CONDITION: DIM10.VAL = n'DimValue9'              COLUMN

*/

See how the SPS 11 SQL optimisation is visible in the EXPLAIN PLAN: since the tables involved are rather small and only two dimensions are actually referenced, the OLAP engine (usually responsible for STAR SCHEMA queries) didn’t kick in, but the execution was completely done in the Join Engine.

Also notable: the calculated key figure was reformulated internally into a SQL expression AFTER the parameter value (34) was supplied.

This is a nice example for how SAP HANA does a lot of the query optimisation upon query execution.

If I had used a placeholder (question mark – ?) for the value instead, this whole statement would still work, but it would not have been optimised by the SQL optimizer and instead the calculation view would’ve been executed “as-is”.
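Just for comparison, the bind-variable version of the same call would look like this (a sketch; the point being that the optimiser then cannot fold the parameter value into the plan at parse time):

SELECT
     "D10_VAL", "D100_VAL",
     sum("KF1") AS "KF1",
     sum("KF2") AS "KF2",
     sum("CC_KF1_FACTORED") AS "CC_KF1_FACTORED"
FROM "_SYS_BIC"."devTest/stupidFactView"
    ('PLACEHOLDER' = ('$$IP_FACTOR$$', ?))
WHERE "D10_VAL" = 'DimValue9'
  and "D100_VAL" = 'DimValue55'
GROUP BY
     "D10_VAL",
     "D100_VAL";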

Now the same statement accessing the “remote” view:

 


SELECT -- SDA access ---
     "D10_VAL",
     "D100_VAL",
     sum("KF1") AS "KF1",
     sum("KF2") AS "KF2",
     sum("CC_KF1_FACTORED") AS "CC_KF1_FACTORED"
FROM "DEVDUDE"."self_stupidFactView"
    ('PLACEHOLDER' = ('$$IP_FACTOR$$','34'))
WHERE "D10_VAL" = 'DimValue9'
and "D100_VAL" = 'DimValue55'
GROUP BY
     "D10_VAL",
     "D100_VAL";


/*

D10_VAL     D100_VAL    KF1          KF2         CC_KF1_FACTORED
DimValue9   DimValue55  -1320141.70  525307979   -44884817

successfully executed in 351 ms 430 µs  (server processing time: 12 ms 417 µs)
successfully executed in 360 ms 272 µs  (server processing time: 11 ms 15 µs)
successfully executed in 359 ms 371 µs  (server processing time: 11 ms 914 µs)

OPERATOR_NAME           OPERATOR_DETAILS                                                  EXECUTION_ENGINE
COLUMN SEARCH           'DimValue9', self_stupidFactView.D100_VAL,
                        SUM(self_stupidFactView.KF1),
                        SUM(self_stupidFactView.KF2),
                        SUM(self_stupidFactView.CC_KF1_FACTORED)
                        (LATE MATERIALIZATION, OLTP SEARCH, ENUM_BY: REMOTE_COLUMN_SCAN)  COLUMN
  COLUMN SEARCH         SUM(self_stupidFactView.KF1),
                        SUM(self_stupidFactView.KF2),
                        SUM(self_stupidFactView.CC_KF1_FACTORED),
                        self_stupidFactView.D100_VAL
                        (ENUM_BY: REMOTE_COLUMN_SCAN)                                     ROW
    REMOTE COLUMN SCAN  SELECT SUM("self_stupidFactView"."KF1"),
                        SUM("self_stupidFactView"."KF2"),
                        SUM("self_stupidFactView"."CC_KF1_FACTORED"),
                        "self_stupidFactView"."D100_VAL"
                        FROM "_SYS_BIC"."devTest/stupidFactView"
                            ( PLACEHOLDER."$$IP_FACTOR$$" => '34' )  "self_stupidFactView"
                        WHERE "self_stupidFactView"."D10_VAL" = 'DimValue9'
                        AND "self_stupidFactView"."D100_VAL" = 'DimValue55'
                        GROUP BY "self_stupidFactView"."D100_VAL"                         EXTERNAL

*/  

Because of the mentioned parameter setting, SAP HANA can now create a statement that can be sent to the "remote" database to produce the wanted output.

Note how the statement in the REMOTE COLUMN SCAN is not exactly the statement we used: the aggregated columns now come first in the statement and the parameter syntax used is the new "arrow"-style syntax (PLACEHOLDER."$$<name>$$" => '<value>'). This nicely reveals how SDA actually rewrites the statement in order to get the best outcome depending on the source system's capabilities.
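That arrow-style syntax can also be used directly in your own statements on a revision that supports it; a quick sketch based on the query above:

SELECT "D10_VAL", sum("KF1") AS "KF1"
FROM "_SYS_BIC"."devTest/stupidFactView"
    ( PLACEHOLDER."$$IP_FACTOR$$" => '34' )
GROUP BY "D10_VAL";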

For a better overview on what happens in both scenarios please look at this piece of ASCII art in awe 🙂

|[ ]| = system boundaries

local statement execution

|[ SQL statement ->   Information view -> Tables +]|
                                                  |
|[       RESULT <---------------------------------+]|

SDA statement execution

|[ SQL Statement -> Virtual Table -> SDA connection ->]| -- ODBC transport --> |[ Information view -> Tables +]|
                                                                                                              |
|[       RESULT <--------------------------------------]| <-- ODBC transport -- |[--< RESULT <----------------+]|

For more on SDA, BW on HANA and how both work together have a look here:

And while there, don’t miss out on the other “new in SPS 11”- stuff (if not already familiar with it anyhow)

The Web, Stars and the importance of trying things out

For the question discussed above I of course needed to have a test setup ready.

Creating the SDA remote source was the easiest part here, as I just created a “self” source system (BW veterans will remember this approach) that simply pointed to the very same SAP HANA instance.
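In case you want to rebuild this: such a "self" source is just a plain SDA remote source that points back to the same instance, plus a virtual table for the information view. A sketch with made-up host, port and credentials (the hanaodbc adapter and the <NULL> database part are the usual ones for HANA-to-HANA SDA, but treat the exact configuration string as an assumption to verify):

CREATE REMOTE SOURCE "SELF" ADAPTER "hanaodbc"
    CONFIGURATION 'Driver=libodbcHDB.so;ServerNode=localhost:30015'  -- hypothetical host/port
    WITH CREDENTIAL TYPE 'PASSWORD'
    USING 'user=DEVDUDE;password=<secret>';

CREATE VIRTUAL TABLE "DEVDUDE"."self_stupidFactView"
    AT "SELF"."<NULL>"."_SYS_BIC"."devTest/stupidFactView";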

In order to emulate a proper SAP HANA live view I needed to create an Information model with Input Parameters, so I thought: easy, let’s just quickly build one in the Web based development workbench.

So far I’ve done most of the modelling in SAP HANA studio, so I took this opportunity to get a bit more familiar with the new generation of tools.

I wanted to build a classic Star-Schema-Query model, so that I could use the Star Join function.

From SAP HANA Studio I knew that this required calculation views of FACT and DIMENSION type to work.

Not a problem at all to create those.


A CUBE type view for the fact table


One of the Dimension type views

I then went on and created a new calculation view of data type CUBE and checked the WITH STAR JOIN check box.


Next I tried to add all my FACT and DIMENSION views to the join, but boy was I wrong…


Clicking on the ➕ button should allow adding the views.


But there was no option to add the fact view into the STAR JOIN node – while adding dimensions just worked fine.


Now I had all my dimensions in place but no way to join them with the fact table.


After some trial and error (and no, I didn't read the documentation and I should have – but on the other hand, a little more guidance in the UI wouldn't hurt either) I figured out that one has to manually add a projection or aggregation node that feeds into the Star Join.


Once this is done, the columns that should be visible in the Star Join need to be mapped.

And NOW we can drag and drop the join lines between the different boxes in the Star Join editor.


Be careful not to overlook that the fact table that just got added might not be within the current window portion. In that case either zoom out with the [-] button or move the view around via mouse dragging or the arrow icons.


After the joins are all defined (classic star schema, left outer join n:1, remember?) the mapping of the output columns needs to be done again.


Here, map only the key figures, since the dimension columns are already available in the view output anyhow as "shared columns".


For my test I further went on and added a calculated key figure that takes an Input Parameter to multiply one of the original key figures. So, nothing crazy about that, which is why I spare you the screenshot battle for this bit 😉 .

And that’s it again for today.

Two bits of new knowledge in one blog post, tons of screenshots and even ASCII art – not too bad for a Monday I’d say.

There you go, now you know!

Lars

SAP HANA community alert 260416: new moderator aboard

Dear SAP HANA aficionados

 

If you have not been living under a stone lately and have visited the SAP HANA corners of SCN (SAP HANA and In-Memory Computing, SAP HANA Developer Center) recently, you surely will have noticed that there is currently one particular person putting in effort, patience and willingness to help others in extraordinary measure.

Of course I am writing about Florian Pfeffer here.

 

Florian, as you may know by now, is not only an HDE but was also SCN Member of the Month just this March.

He also managed to earn the ‘Super Answer Hero’ badge which showcases his commitment to the community.

So, it’s fair to say, that this star is flying high right now.

 

As my interest with SCN is in community development, I took the chance and asked if he would like to become a moderator, which Florian agreed to.

From now on, the SAP HANA and In-Memory Computing space has three permanent moderators assigned:

 

Once again, I would like to thank both Lucas and Florian for their engagement and also encourage others to step up and become more involved with SAP HANA and the community around it. There is always room for more high profile contributors!

 

Cheers,

Lars

 

*click* – *click* – *doubleclick* and nothing happens

Today's tidbit is one of those little dumb things that happen every now and then and make me think: "Great, now this doesn't work… WTF…?"

Usually that’s a bit frustrating for me as I like to think that I know how stuff works around here (here, meaning my work area, tools, etc.).

 

So here we go. Since the SAP HANA Studio is currently not "an area of strategic investment" and the Web based tools are on the rise, I try to use those more often.

I even have the easy to remember user-friendly URL (http://<LongAndCrypticNodeName.SomeDomainname.Somethingelse>:<FourDigitPortNumber>/sap/hana/ide/catalog/) saved as a browser bookmark – ain’t I organized! 🙂

 

And this thing worked before.

I have used it.

So I click on the link, log on to the instance and get this fancy "picture" (as my Dad would explain it to me – everything that happens on the screen is a "picture", which is really helpful during phone-based intra-family help-desking…):

 


Pic 1 – The starting ‘picture’, looking calm and peaceful… for now

 

Ok, the blocky colors are due to the GIF file format's limitation to 256 colors, but you should be able to see the important bits and pieces.

 

There is some hard-to-read error message, which I choose to ignore, and I click on the little blue SQL button and then … nothing happens.

I click again and again as if I cannot comprehend that the computer understood me the first time, but no amount of clicking opens the SQL editor.

What is going on?

Next step:

 

Do the PRO-thing…

… open Google Developer Tools

… delete session cookies and all the saved information.

… Logon again.

 

Lo and behold, besides the much longer loading time for the page, nothing changed.

 

Great. So what else is wrong? Did the last SAP HANA upgrade mess with the Web tools?

Pic 2 – wild clicking on the button and visually enhanced error message indicating some bad thing

 

Luckily, that wasn’t it.

Somewhere in the back of my head I remembered that I had a couple of browser extensions installed.

 

Now I know what you’re thinking: Of course it’s the browser extensions. That moron! Totally obvious.

What can I say? It wasn’t to me.


Pic 3 – there’s the culprit, the root cause and trigger for hours of frustration

 

It just didn't occur to me that e.g. the Wikiwand browser extension that I use to have the Wikipedia articles in a nicer layout would install a browser-wide hook on the CTRL+CLICK event and that this would sometimes prevent the Web tools from opening.

After disabling this (there’s a settings page for this extension) the Web tools resumed proper function.

Good job!

 

So is the Wikiwand extension a bad thing? No, not at all. There are tons of other extensions that do the same.

 

While I would really like to demand back the precious hours of my life this little mishap took from me, I assume that this request would be a bit pointless.

This experience, at least, leaves me with the insight that I clearly thought too simplistically about the frontend technology we use today. Web browsers are incredibly far from being a standard environment and controlling what the end user finally sees is not easy (if really possible at all).

 

Ok, that’s my learning of the day.

 

Cheers,

Lars

 

p.s.

the error message “Could not restore tab since editor was not restorable” not only seems to be a tautology, but also had absolutely nothing to do with the problem in this case.

 

Things keep changing around here!

Fellow Community members, readers and otherwise interested!

Today I share a change in the organisation of this community.

Last week I returned the moderator and space editor privileges and the duties that come with those.

Lucas, who has done an awesome job as a moderator since he came aboard last year, will take it from here.

If this was a live audience I would ask for a round of applause for him – here instead I can only wish him the best.

As to why I took this decision (in case you're wondering – this will take a moment, so grab your coffee):

When we started the SAP HANA journey in 2010 internally and then 2011 with the global availability of the product, there was no community around it.

In fact, there was very little at all around it. Barely any documentation, nearly no SAP notes and only a few people that had a clue what this new thing did and how it did it.

Finding the right information, people who knew something and correct answers actually was hard.

I had some experience with SDN and sites like AskTom.com, so I knew that answering questions, explaining concepts and discussing specific questions can go a long way when it comes to “enablement” and knowledge sharing.

So, with the hope to spark active community participation I volunteered to do just that in 2011.

Fast forward five years on, there are now around 50 blog posts on SAP HANA, several topic leader badges, other communities (SAP internal JAM, stackoverflow) and a SAP Press book 'under my belt'. I managed to talk a couple of SAP colleagues into spending their time here and provided answers as well as opinions to thousands of discussion threads. And let's not forget the countless emails I wrote to colleagues, partners and people I don't even know who asked me questions about HANA – only to be redirected to this very forum 🙂 .

And the community is thriving! There’s a good set of regulars (both SAP employees and externals) who share their expertise on a daily basis – just look at the current 12-month leadership board. Every week several blog posts and documents are created and shared.

Of course there’s more:

SAP notes on SAP HANA have been massively improved (thanks to Martin for his fabulous FAQ series), the SAP HANA docu. development team churned out guide after guide (small surprises, well documented – a tip of the hat to Ralph Schroeder and the whole team) and the SAP HANA Academy brought video learning into the SAP HANA area – awesome!

In short, the community has grown and developed into a full fledged developer and admin community.

While I like to think that I had a share in that, I also think that my definition of “done” has been reached.

That bridge got built – let’s move on“, if you see what I mean.

In the wake of the upcoming conversion to 1DX the technical platform for this community will change and that is something that others will do better than I would. I assume that the SAP HANA product management and development teams might continue to provide more and more official material via this channel. Also, there's tons of high quality user generated content that will be converted and available in the new forums.

All in all really nice times for anyone starting off with SAP HANA nowadays.

https://i0.wp.com/s2.quickmeme.com/img/4f/4f15ca326d661fd71177bcd909ec6023cc4b1688aee152ebb815e15a8fc380f8.jpg?w=474

And that’s all folks!

See you around.

Lars

 

HANA dude analyses Python error and this happened next… (plus: something with Andy Warhol)

Fresh out of the Christmas/New Year vacation, I received this question yesterday from one of my colleagues:

Hi Lars

I am trying to set up a demo to show MDC at FKOM. I am trying to configure MDC via the Python script method – although I have an issue where I am missing a file – the error from the convertMDC.Py script says “no module named ConfigMgr.Py”

I remember you demoed this method at the Architecture Summit in Sydney – did you come across this issue?”

(He was talking about this nice event and the Multi Database Container feature of SAP HANA.)

Now as I was on my way to see the very nice Andy Warhol – Ai Weiwei exhibition currently on display in Melbourne, I couldn't log on to my test system to look into the issue.

Due to my “root-cause-analysis reflex” from my stint in SAP support, I followed up today and was able to verify my initial hypothesis: there must be a search path setting not quite right.

These are the steps I took:

1. check the convertMDC.py file:

The first two lines in this Python file are


import os, time, sys, subprocess, getopt
import ConfigMgrPy

which is where the error message comes from.

Python is trying to load the listed modules but fails with the ConfigMgrPy module.

Checking the Python documentation I figured out that there are a couple of places Python looks for the modules.

One option to specify these places is the environment variable PYTHONPATH. Therefore I checked this variable on my system:

2. check PYTHONPATH environment variable:


>echo $PYTHONPATH
  /usr/sap/T07/SYS/global/hdb/custom/python_support:/usr/sap/T07/HDB07/exe/python_support:/usr/sap/T07/HDB07/dewdftzldc05:/usr/sap/T07/HDB07/exe:/usr/sap/T07/HDB07/exe/testscripts

Reformatting this string gave me this list:


/usr/sap/T07/SYS/global/hdb/custom/python_support
/usr/sap/T07/HDB07/exe/python_support
/usr/sap/T07/HDB07/dewdftzldc05
/usr/sap/T07/HDB07/exe
/usr/sap/T07/HDB07/exe/testscripts

Nice, but didn’t really gain a lot of insight here…

3. Checking the path from within Python:

In hindsight this should have been the first thing to do, but hey, I am thinking and learning on my feet here…


> python
Python 2.7.10 (sap:1, Jul  6 2015, 10:21:20)
[GCC 4.7.2 20130108 [gcc-4_7-branch revision 195014]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/etc/pythonstart", line 7, in <module>
import readline
ImportError: No module named readline

I decided to ignore this error message. Not sure why this happens on my system!

From the Python documentation I knew that I need the sys module for further investigation, so I load this and print the current search path via sys.path:


>>> import sys
>>> sys.path
['', '/usr/sap/T07/SYS/global/hdb/custom/python_support', '/usr/sap/T07/HDB07/exe/python_support', '/usr/sap/T07/HDB07/dewdftzldc05', '/usr/sap/T07/HDB07/exe', '/usr/sap/T07/HDB07/exe/testscripts', '/usr/sap/T07/HDB07/exe/Python/lib/python27.zip', '/usr/sap/T07/HDB07/exe/Python/lib/python2.7', '/usr/sap/T07/HDB07/exe/Python/lib/python2.7/plat-linux2', '/usr/sap/T07/HDB07/exe/Python/lib/python2.7/lib-tk', '/usr/sap/T07/HDB07/exe/Python/lib/python2.7/lib-old', '/usr/sap/T07/HDB07/exe/Python/lib/python2.7/lib-dynload', '/usr/sap/T07/HDB07/exe/Python/lib/python2.7/site-packages']
>>>

Ok, so there we find all our PYTHONPATH folders as well as some other folders that seem to belong to the Python installation.

4. Load the missing module

Now I thought: "Hey, let's try and load this module manually!"


>>> import ConfigMgrPy

Ok, no error message so far.

Then the following should work!


>>> dir (ConfigMgrPy)
['CUSTOMER', 'Configuration', 'ConfigurationException', 'HOST', 'LayeredConfiguration', 'READONLY', 'RemoteConfig', '__doc__', '__file__', '__name__', '__package__', 'createUUID', 'expandProfileVars', 'getDaemonInfo', 'getLinkInfo', 'getRootPath', 'sapgparam', 'stringToBool']

Geeze, I am such a great Python hacker… *cough* 😐

But where does the module come from?

Easy as pie:


>>> ConfigMgrPy
<module 'ConfigMgrPy' from '/usr/sap/T07/HDB07/exe/ConfigMgrPy.so'>

5. ALL WRONG – this only works on a system where you DON’T have the problem!

Clearly, my fancy-schmancy root cause analysis would not have helped my colleague, because on his system Python didn't find the module in the first place.

So what to do in this case?

I’d say, looking for the ConfigMgrPy file or folder or whatever Python loads for modules might be a good idea.

Since there is no hint in the Python documentation that ConfigMgr belongs to the standard Python stuff, I have to assume that it belongs to SAP HANA.

Which limits the search area for the object to the SAP HANA installation folders.

On my system that would be the /usr/sap/<SID>/ folder.

Searching for files on Linux always makes me google for examples and this is what I came up with this time:


find -L /usr/sap/T07 -name 'ConfigMgr*'

So far so good, but what’s the -L for?” you might ask.

When trying this thing out on my system, I didn’t use the -L immediately and found… nothing at all.

Only after trying a couple of name patterns that definitely should have yielded some hits did it dawn on me that something else might be preventing the find program from finding the files.

And in fact, find does not follow symbolic links in file systems by default.

Looking into the installation folder on my system I can see that symbolic links are what we use for SAP HANA:


ls -la /usr/sap/T07
total 16
drwxr-xr-x  4 t07adm sapsys 4096 Apr 15  2015 .
drwxr-xr-x 17 root   sapsys 4096 Dec 10 18:30 ..
lrwxrwxrwx  1 t07adm sapsys   22 Apr 15  2015 HDB07 -> /hana/shared/T07/HDB07
[...]

Means: without the -L option, find doesn’t go down this path.

Having figured this out, find of course gave me what I was looking for:


find -L /usr/sap/T07 -name 'ConfigMgr*'
/usr/sap/T07/SYS/exe/hdb/ConfigMgrPy.so
/usr/sap/T07/HDB07/exe/ConfigMgrPy.so

And with this information my colleague could have modified the PYTHONPATH variable on his system to fix the problem (which he wrote me is what he has done meanwhile anyhow…).

There you go – now you know.

Happy New Year everyone and have a great 2016!

 

Changes, News and Announcements

Dear all

Since there have been quite big changes happening to me (which will affect my engagement here on SCN) I thought I'd pen down a few lines about that.

Relocation (again)

Looks like I am not the person to stay put in one place for a long time.

But after my move from Austria to Germany in 2013 it was time for me to pack up everything again and board the plane with a one-way ticket to stunning Melbourne, Australia.

Besides the obvious change in my awake-and-working-time zone there’s this view when having a walk during lunch break:

😎 I love that!

My new (corporate) home: Custom Development

This relocation happened with the support and endorsement of SAP, the company I’ve been working for the last 12 years.

My current job here is actually not in SAP HANA development or support or anything like that, but on the other side of the solution building process.

SAP Custom Development is a software development organization within SAP. Sort of a sub-marine team 🙂

Where the large SAP standard software development builds highly standardized, reusable and to-be-configured solution platforms, we’re dealing with the customer requirements that need a different kind of care.

We do the customer specific innovation and deliver software just for that single customer.

Just like a tailor-made suit or a bespoke pair of shoes.

This is very different from the standard business but extremely exciting as we get to look into all the stuff our customers are interested in that is just not possible with the standard solution. We get to use the latest and greatest platforms and processes from SAP – everything from design thinking over Fiori to Big Data IoT technology. And we go back to the colleagues from standard development to align with them on features and road maps.

The outcome is high quality software that does exactly what you need and fits seamlessly into your SAP/IT landscape, including support and maintenance.

I guess it’s obvious why I find this interesting.

Come and see me!

While I likely will have less time now to answer a lot of questions in the SAP HANA spaces (SAP HANA and In-Memory Computing, SAP HANA Developer Center) I don’t give up on knowledge sharing.

As TechEds are happening a bit far off from the Australian continent, there's going to be a smaller, more local event that still covers the important topics and speakers.

I’ll have the pleasure to step in line with some awesome presenters (Bjoern Goerke, Thorsten Franz or Rich Heilman to just drop a few names) and will deliver some lectures.

If you happen to be in Sydney around 24/25 November 2015 and you're interested in meeting and talking to all those famous folks 🙂 follow the link and register!

 

Ok, that's it for today.

Enjoy your weekend folks!

Lars

Fiddle riddle – Amazing: stuff gets faster and takes less space!

In Quick notes during a fiddle-session I played around with a table that contained all possible values for a date column.

To fill it I used a FOR LOOP with an INSERT and a commit every 10000 records.

That took 9:37 minutes for 3.652.061 records.

That’s terrible performance and nothing that needs to be accepted with SAP HANA!

I found a way to do the same much quicker:

Statement ‘insert into rmanydates (select * from all_days)’

successfully executed in 6.120 seconds  (server processing time: 4.630 seconds) – Rows Affected: 3652060 ;

Question 1:

The dates inserted above have not been pre-computed in this example.

How do I do this?

——–

Answer 1

 

The 6.120 seconds to create every possible date once and insert it into the row table is a lot faster than the one-by-one looping approach I took in the procedure (mainly for clarity purposes).

But, as Fernando Da Ros demonstrated, there are other options to do that.

The one I chose was part of a rather new feature in SAP HANA: SAP HANA Series

SELECT to_date(GENERATED_PERIOD_START)
    FROM SERIES_GENERATE_TIMESTAMP('INTERVAL 1 DAY', '01.01.0001', '31.12.9999')
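The insert statement quoted above selected from ALL_DAYS; presumably that was just this SELECT wrapped into a view. A sketch of how such a view could look (only the name is taken from the statement above, the definition itself is an assumption):

create view all_days as
select to_date(GENERATED_PERIOD_START) as dday
  from SERIES_GENERATE_TIMESTAMP('INTERVAL 1 DAY', '01.01.0001', '31.12.9999');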

——–

I also looked at the memory consumption for the stored date tuples.

In the blog post I just used a row store table, because I thought: well, there’s not going to be any column store compression anyhow.

(you know, because column store compression mainly builds on compressing duplicate column values. But with every possible date once, there are no duplicates, so no compression – or is there?)

However, I managed to get the data loaded into a column store table and use a lot less memory.

Question 2:

How to get from

 

BEFORE

— General —
Total Memory Consumption (KB): 25.719
Number of Entries: 3.652.061
Size on Disk (KB): 23.152
Memory Consumption in Main Storage (KB): 25.700
Memory Consumption in Delta Storage (KB): 19
Estimated Maximum Memory Consumption (KB): 25.719

to

AFTER

— General —
Total Memory Consumption (KB): 1.645
Number of Entries: 3.652.061
Size on Disk (KB): 12.912
Memory Consumption in Main Storage (KB): 1.626
Memory Consumption in Delta Storage (KB): 19
Estimated Maximum Memory Consumption (KB): 1.645

Both tables were fully loaded when the space consumption was analyzed.

If you know how that works, put your answers into the comments section! 😀

——–

Answer 2

This one is again a feature from SAP HANA Series: the Series Table.

 

create column table smanydates (dday date)
SERIES (  EQUIDISTANT INCREMENT BY INTERVAL 1 DAY
     MINVALUE '01.01.0001'
     MAXVALUE '31.12.9999'
     PERIOD FOR SERIES (dDAY));

The above CREATE TABLE statement specifies that the timestamp information in column DDAY is not explicitly stored.

Instead the internal storage is merely a calculation formula that “knows” how every date between MIN/MAXVALUE with the granularity of 1 DAY can be computed.

Instead of storing a full date now, only a number uniquely identifying a day needs to be stored.

Given the starting date ‘01.01.0001’ the current day would then be stored just as the integer 735.832 (20 bits).
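To get a feel for where a number in that region comes from, the day offset since the minimum date can be computed with plain SQL (a quick sanity check rather than a statement about the exact internal representation):

select days_between(date'01.01.0001', current_date) from dummy;
-- returns a value in the same region as the 735.832 quoted above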

Of course there are also limitations to this type of data storage but for ever increasing regular date or timestamp information, this is really an option to save a lot of memory.

For more information around SAP HANA series please check

——–

– Lars

Quick notes during a fiddle-session

For a while now I have been working as a Solution Architect for the SAP Custom Development organisation.

This is the team you’d call when you want custom-built solutions that perfectly fit into your standard solution landscape.

My job there is not coding or implementing systems, but rather designing Solutions and turning those designs into commercial offers.

Obviously, it seems like a good idea to every now and then go and check that what I remember of doing "real work" is actually still true, and so I try to keep up with our tools and platforms as well as anyone.

While trying out different stuff for something that might become a blog post later on, I came across some things I considered noteworthy, "interesting" or otherwise out of the expected.

All this is based on SAP HANA rev. 101 and SAP HANA Studio 2.1.6.

Change of default statement preparation in SAP HANA Studio

In Issue: MDX Query not executing in Hana Studio 2.0.8 Version and Re: Regarding "Generate Time Data" Option in HANA folks complained about the fact that MDX statements suddenly started to return error messages even though they worked in older SAP HANA Studio versions.

The error message was:

SAP DBTech JDBC: [2]: general error: MDX query doesn’t support prepared statement

I first proposed to circumvent this by creating a little wrapper procedure but this is inconvenient.

Shady Shen provided the correct hint: un-check the checkbox “Prepare statements before execution”.

This seems to have been set to active by default in the more recent versions of SAP HANA Studio. Once again a change that doesn’t occur in any documentation or whats-new note… 🙁

At least it’s not necessary to close/re-open the SQL editor to get this changed setting active.

No DEFAULT value specification in INSERT/UPDATE/UPSERT possible

Let’s say you were diligent with your table column specification and not only provided a name and data type but also a NOT NULL constraint and a DEFAULT value clause.

Something as extravagant as this:


create table test (id int not null default 0);


Now you want to make use of this and by following SQL standard you use the DEFAULT keyword to specify that you want to have the pre-defined default value:


insert into test (id) values (default);


SAP DBTech JDBC: [260]: invalid column name: DEFAULT: line 1 col 31 (at pos 30)

This is not to say SAP HANA doesn’t support default values.

If you want to use the default value, you simply must not include the column in the DML statement at all.

So the following works without problems:


alter table test add (id2 int not null default 1);
insert into test (id) values (1);
select * from test;


ID ID2
1 1

Would be lovely to find such nuances in the documentation.

Limits of date operations

When working with date data types it’s real easy to forget about the limitations of this data type.

One example could be that non-existing dates like February 30 or dates between 05.10.1582 and 14.10.1582 are rejected or mapped to the correct dates.

Another limitation I actually ran into while generating random garbage data for a date column is the smallest/largest date.

This time the documentation is clear about this: The range of the date value is between 0001-01-01 and 9999-12-31.

Usually I wouldn’t think much about this, but when using expressions like the following to create random dates, I hit an error:


select add_days (date'01.01.1980', rand()*10000000) from dummy;


[303]: invalid DATE, TIME or TIMESTAMP value: internal error in AddDays() at function add_days()

In my example, the 10000000 was, in fact, a sequence number in a rather large data set.

So, the quick fix would be to set an upper limit for the value that is fed into add_days().


select add_days (date'01.01.1980', least(rand()*10000000, 3560)) from dummy;


The least() function comes in really handy here as it makes sure that the largest possible value returned will be 3560 (roughly 10 years' worth of days).

Together with its sister the greatest() function, it’s easy to create upper and lower limits for generated values.
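For example, to clamp the generated offset into the valid range on both ends (a throwaway sketch using the 3.652.060 days that lie between the minimum and the maximum date):

select add_days(date'01.01.0001',
                greatest(0, least(rand() * 10000000, 3652060)))
  from dummy;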

But, what if we don’t want to arbitrarily set the limit below what would be possible with this data type?

Easy too:


select days_between (date'31.12.9999', current_date) from dummy;


DAYS_BETWEEN(31.12.9999,CURRENT_DATE)

-2916233

Gives you the number of days between today and the end of times – huh… gotta hurry!

Admittedly this example is super artificial and has no connection to real-life dates.

However, it shows that there are limitations to the date data type in SQL and that it's not too difficult to run into them.

There are of course other examples where a richer semantic for this data type would be useful.

Being able to have a MIN_DATE or MAX_DATE entry that would explicitly show up like this would be one of them.

Wandering off…

The date topic side-tracked me at this point and I looked into some other points, too:

I’ve shown the “generator query”-trick (cross join of objects system table to generate millions of rows) before.

This is of course not the only way to create a lot of rows.

Another option is to write SQL Script and this was the first time I used the new SPS 10 capability to run SQL script directly in the SQL console without the need for a wrapper procedure:


create table manydates (dday date);
do
begin
    declare i bigint;
    for i in 0 .. 3652060 do
        insert into manydates (dday) values (add_days(date'01.01.0001', :i) );
        if (mod(i, 10000) = 0 )
        then
            commit;
        end if;
   
    end for;
end


This little gem runs for a while and fills a table with all possible date values.

Statement ‘do begin declare i bigint; for i in 0 .. 3652060 do insert into manydates (dday) values …’

successfully executed in 9:37.694 minutes  (server processing time: 9:37.681 minutes) – Rows Affected: 0

Now, above we claimed that no "non-date" entries would be created.

And checking for duplicates via


select count(distinct dday) from manydates;


and


select dday, count(*) from manydates
group by dday having count(*) >1;

show that there aren’t any.

But how’s this?

How many years' worth of days are we looking at here? 9999 exactly.

So, leaving out leap years we should end up with 365 * 9999 = 3.649.635 separate days.

That number is 2.426 days short of what we actually find in our table.

Alright, so then just taking 366 days per year will help us, maybe?

Nearly: 3.659.634 days result here, which is 7.573 days too many.

Maybe trying to account just for the leap years separately can help here.

Leap years occur roughly every four years, so for 9999/4 = 2.499 years, we should add one day.

That gets us to 3.652.134, which is just 73 days off from the actual number of days in this dates table.

We can even check this in SQL again:


select count(distinct year(dday))  from manydates
where mod(to_integer(year(dday)), 4) = 0;

COUNT(DISTINCT YEAR(DDAY) )

2499

But what leap years didn’t we cover with our simple arithmetic?


(select year(dday), count(dday)
    from manydates
    where  mod(to_integer(year(dday)), 4) = 0
    group by year(dday))
     
EXCEPT
(select year(dday),  count(dday)
    from manydates
    group by year(dday) having count(dday) > 365
)
order by year(dday);

Fetched 63 row(s) in 21 ms 617 µs

YEAR(DDAY) COUNT(DDAY)
1700 365
1800 365
1900 365
2100 365
2200 365
2300 365
2500 365
2600 365
2700 365
2900 365
3000 365
3100 365
3300 365
3400 365
3500 365
3700 365
3800 365
3900 365
4100 365
4200 365
4300 365
4500 365
4600 365
4700 365
4900 365
5000 365
5100 365
5300 365
5400 365
5500 365
5700 365
5800 365
5900 365
6100 365
6200 365
6300 365
6500 365
6600 365
6700 365
6900 365
7000 365
7100 365
7300 365
7400 365
7500 365
7700 365
7800 365
7900 365
8100 365
8200 365
8300 365
8500 365
8600 365
8700 365
8900 365
9000 365
9100 365
9300 365
9400 365
9500 365
9700 365
9800 365
9900 365

If you followed up to here, you’ll notice, that we were looking for 73 “missing days”.

Comparing the set of years where the year number is divisible by 4 without remainder with the set of years that actually have more than 365 days revealed 63 such years – accounting for 63 of the missing days.

So, where are the remaining 10 days then?

Remember that odd reference to the Gregorian Calendar reformation?

That took ten days away in 1582!


select count(distinct dday) from manydates
where year(dday) = 1582;

COUNT(DISTINCT DDAY)

355

So far, so good.

We found all the “missing records”, but why aren’t these counted as leap years anyhow?

That's because I used too simple a rule here.

Leap Year nearly every 4 years explains that a leap year is only counted as such if

  • The year is evenly divisible by 4;
  • If the year can be evenly divided by 100, it is NOT a leap year, unless;
  • The year is also evenly divisible by 400. Then it is a leap year.

Let’s run that against our dates table:


select year(dday) yr, count(dday) day_cnt
from manydates
where
       mod(to_integer(year(dday)), 4) = 0
       and
       not (mod(to_integer(year(dday)), 100) = 0)
      or
        mod(to_integer(year(dday)), 4) = 0
        and
        mod(to_integer(year(dday)), 400) = 0
group by year(dday)
;

Fetched 2424 row(s) in 123 ms 147 µs (server processing time: 1 ms 665 µs)

Alright – the rule gives us 12 years less than what is in our dates table.

Which ones?


select mod_y.yr mod_yr, long_y.yr long_yr
from
  (select year(dday) yr, count(dday) day_cnt
  from manydates
  where
        mod(to_integer(year(dday)), 4) = 0
        and
        not (mod(to_integer(year(dday)), 100) = 0)
       or
         mod(to_integer(year(dday)), 4) = 0
         and
         mod(to_integer(year(dday)), 400) = 0
        group by year(dday)
  ) mod_y
     
FULL OUTER JOIN
  (select year(dday) yr,  count(dday) day_cnt
     from manydates
     group by year(dday) having count(dday) > 365
  ) long_y
  on mod_y.yr = long_y.yr
WHERE
   mod_y.yr is null or long_y.yr is null;

MOD_YR LONG_YR
? 100
? 500
? 900
? 1300
? 200
? 600
? 1000
? 1400
? 300
? 700
? 1100
? 1500

Can you spot the reason?

All of those records violate the rule of not being divisible by 100 except when they are divisible by 400.

Why had those years been generated with one day too many by our add_days() function?

The Gregorian Calendar Reformation clearly did change history but it didn’t change the past 🙂

Before this regulation got active, the leap years didn’t follow those rules (which called for the Reformation in the first place).

And as we see there are no false leap years after the year 1500.

That’s it for today.

Funny things come up when one spends some hours on a rainy Sunday afternoon fiddling with SAP HANA SQL.

There you go, now you know!

– Lars