Fiddle riddle – Amazing: stuff gets faster and takes less space!

In Quick notes during a fiddle-session I played around with a table that contained all possible values for a date column.

To fill it I used a FOR LOOP with an INSERT and a commit every 10000 records.

That took 9:37 minutes for 3.652.061 records.

That’s terrible performance and nothing that needs to be accepted with SAP HANA!

I found a way to do the same much quicker:

Statement ‘insert into rmanydates (select * from all_days)’

successfully executed in 6.120 seconds  (server processing time: 4.630 seconds) – Rows Affected: 3652060 ;

Question 1:

The dates inserted above have not been pre-computed in this example.

How do I do this?

——–

Answer 1

 

The 6,120 seconds to create every possible date once and insert it into the row table is a lot faster than the one-by-one looping approach I took in the procedure (manly for clarity purposes).

But, as Fernando Da Ros demonstrated, there are other options to do that.

The one I chose was part of a rather new feature in SAP HANA: SAP HANA Series

SELECT to_date(GENERATED_PERIOD_START)
    FROM SERIES_GENERATE_TIMESTAMP('INTERVAL 1 DAY', '01.01.0001', '31.12.9999')

——–

I also looked at the memory consumption for the stored date tuples.

In the blog post I just used a row store table, because I thought: well, there’s not going to be any column store compression anyhow.

(you know, because column store compression mainly builds on compressing duplicate column values. But with every possible date once, there are no duplicates, so no compression – or is there?)

However, I managed to get the data loaded into a column store table and use a lot less memory.

Question 2:

How to get from

 

BEFORE

— General —

Total Memory Consumption (KB): 25.719

Number of Entries: 3.652.061

Size on Disk (KB): 23.152

Memory Consumption in Main Storage (KB): 25.700

Memory Consumption in Delta Storage (KB): 19

Estimated Maximum Memory Consumption (KB): 25.719

to

AFTER

— General —

Total Memory Consumption (KB): 1.645

Number of Entries: 3.652.061

Size on Disk (KB): 12.912

Memory Consumption in Main Storage (KB): 1.626

Memory Consumption in Delta Storage (KB): 19

Estimated Maximum Memory Consumption (KB): 1.645

Both tables were fully loaded when the space consumption was analyzed.

if you know how that works, put your answers into the comments section! 😀

——–

Answer 2

This one is again a feature from SAP HANA Series: the Series Table,

 

create column table smanydates (dday date)
SERIES (  EQUIDISTANT INCREMENT BY INTERVAL 1 DAY
     MINVALUE '01.01.0001'
     MAXVALUE '31.12.9999'
     PERIOD FOR SERIES (dDAY));

The above CREATE TABLE statements, specifies that the timestamp information in column DDAY are not explicitly stored.

Instead the internal storage is merely a calculation formula that “knows” how every date between MIN/MAXVALUE with the granularity of 1 DAY can be computed.

Instead of storing a full date now, only a number uniquely identifying a day needs to be stored.

Given the starting date ‘01.01.0001’ the current day would then be stored just as the integer 735.832 (20 bits).

Of course there are also limitations to this type of data storage but for ever increasing regular date or timestamp information, this is really an option to save a lot of memory.

For more information around SAP HANA series please check

——–

– Lars

Leave a Reply