Conditionally unique

A common requirement in SQL DB modeling is something that could be called a conditional unique constraint.
The idea is that certain combinations of column values should occur at most once, if at all, while other combinations are allowed to occur any number of times.

That certainly sounds a bit abstract, so let’s look at an example.
Say we have a table STUFF in which we track the status of, well, stuff.
Each individual item of stuff can be identified by its ID value.
For every entry we store a STATUS value and a transaction number TX_SEQ.
Now, the rule is that any item can have any status value multiple times but status 100 should only ever occur once per item. Let’s pretend status 100 represents the initial creation of the item or something similarly unique.

A plain unique constraint cannot solve this, because it would require every combination of ID and STATUS to be unique, including those that are allowed to repeat.
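To see why, here is what the straightforward attempt would look like on the STUFF table used in this post (the constraint name is just illustrative):

```sql
-- A plain unique constraint rejects ANY repeated (ID, STATUS) pair,
-- not just repeated status 100 entries:
alter table stuff add constraint stuff_id_status_uq unique (id, status);

-- With this constraint in place, even a perfectly legitimate insert
-- like a second (2, 10) record would be rejected.
```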

Several RDBMS offer check constraints for this kind of rule, but SAP HANA currently (HANA 2 SPS03) only supports simple search conditions for check constraints; sub-queries are not allowed.
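For comparison: a DBMS with partial (filtered) indexes, for example PostgreSQL, can declare this rule directly. This is a sketch of that approach, not HANA syntax:

```sql
-- PostgreSQL-style partial unique index: uniqueness is only
-- enforced for rows where status = 100; all other status values
-- may repeat freely per ID.
create unique index stuff_status100_uq
    on stuff (id)
    where status = 100;
```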

Trigger action

If you’ve been reading any of my comments and posts involving triggers, you’ve probably noticed that I try to avoid using them whenever possible.
Tom Kyte of Oracle made really good arguments against triggers (https://blogs.oracle.com/oraclemagazine/the-trouble-with-triggers) and these arguments hold true for other DBMSs as well.

For the conditional unique constraint on SAP HANA, however, using triggers is one possible solution.

Let’s have a table

create column table stuff 
       (id integer, status integer, tx_seq integer);

Insert some data

create sequence tx_seq; 

insert into stuff values (1, 10, tx_seq.nextval);
insert into stuff values (1, 100, tx_seq.nextval);  -- <- final state = 100

insert into stuff values (2, 10, tx_seq.nextval);
insert into stuff values (2, 10, tx_seq.nextval);  -- <- duplicate state = 10

insert into stuff values (1, 80, tx_seq.nextval); 

Create the trigger

The trigger has to ensure that the data in the table cannot be changed in a way that breaks the “conditional unique” rule. That means it has to run before anything is written to the table, and it needs to run for each and every record, for both UPDATE and INSERT statements.
When the trigger finds that the table already contains an entry with status 100 for the current ID, it should raise a proper error message.
The trigger could look like the following:

create trigger stuff_unique_status
    before update or insert  
    on stuff
    referencing new row as new
    for each row
begin
declare cnt integer;
declare non_unique_final_status condition for SQL_ERROR_CODE 10001;
declare exit handler for non_unique_final_status resignal;
   
    -- only check if status is 100
    if :new.status = 100 then
    
        select count(*) into cnt 
        from stuff
        where status = 100
        and id = :new.id;
    
        if :cnt > 0 then
            SIGNAL non_unique_final_status 
                   set message_text ='Final status 100 must be unique per ID. Affected ID: '||:new.id;
        end if;
    
    end if;
end;

Trying out whether the trigger works

Let’s insert data that would violate the condition:

insert into stuff values (1, 100, tx_seq.nextval);

/*
Could not execute 'insert into stuff values (1, 100, tx_seq.nextval)' in 53 ms 755 µs . 

SAP DBTech JDBC: [10001]: user-defined error: "DEVDUDE"."STUFF_UNIQUE_STATUS": line 5 col 50 (at pos 274): 
[10001] (range 3) user-defined error exception: Final status 100 must be unique per ID. Affected ID:1 
*/

The same happens if we try to update data so that a duplicate would occur:

update stuff 
    set status = 100 
    where id = 1
    and tx_seq = 1;
/*
Could not execute 'update stuff set status = 100 where id =1 and tx_seq=1' in 37 ms 66 µs . 

SAP DBTech JDBC: [10001]: user-defined error: "DEVDUDE"."STUFF_UNIQUE_STATUS": line 5 col 50 (at pos 274): 
[10001] (range 3) user-defined error exception: Final status 100 must be unique per ID. Affected ID:1 
*/

We see that it paid off to declare a custom condition and to include the ID value for which the exception was raised. This makes it easy to go back and find what caused the faulty insert.

What about performance?

When we have the DBMS run code for every record that gets modified in the table, that of course leads to longer processing times.

In order to get a feeling for the difference, let’s insert some more data into our table and see how long it takes. Then we repeat the insert into a table without the trigger.

 -- some data  
 insert into stuff 
    (select top 100000
             row_number() over()
           , to_integer(floor (rand()*100) )
           , tx_seq.nextval
      from objects cross join objects );

/*
Statement 'insert into stuff (select top 100000 row_number() over() , to_integer(floor (rand()*100) ) , ...' 
successfully executed in 29.261 seconds  (server processing time: 29.259 seconds) - Rows Affected: 100000
*/

compared to

create column table stuff2 
    (id integer, status integer, tx_seq integer);

 insert into stuff2
    (select top 100000
             row_number() over()
           , to_integer(floor (rand()*100) )
           , tx_seq.nextval
      from objects cross join objects );
/*
Statement 'insert into stuff2 (select top 100000 row_number() over() , to_integer(floor (rand()*100) ) , ...' 
successfully executed in 6.071 seconds  (server processing time: 6.070 seconds) - Rows Affected: 100000 
*/

That’s 29 seconds with the trigger vs. 6 seconds without it; quite a difference. If you’re doing mass data loading or high-speed transaction capturing, it might not be viable to keep the trigger active while loading the data.
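If the incoming data is known to be clean, one option is to drop the trigger before the load and re-create it afterwards using the same CREATE TRIGGER statement shown above. This is a sketch of that workflow:

```sql
-- remove the trigger for the duration of the mass load
drop trigger stuff_unique_status;

-- ... run the mass insert here ...

-- afterwards, re-create the trigger with the
-- CREATE TRIGGER statement from above
```

Note that between DROP and CREATE the rule is not enforced at all, so this only makes sense when the load itself guarantees the constraint.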

What about adding an index?

Having a trigger active on a table adds runtime in itself and there does not seem to be a way around this. Even if I only load records with a status different from 100, the trigger is executed (to check whether or not the status is 100).

We can, however, allow for a potential speedup of the COUNT query by adding an index.
Since the query always targets one specific combination of ID and STATUS with an EQUAL comparison, I use the inverted hash index type here to save memory.

create inverted hash index unique_final_status  
        on stuff (id, status);

Running the aggregate query in PlanViz confirms that the index is used for this query:

Name: Basic Predicate
ID: cs_plan2_hxehost_39003_qoTableSearch1_qoTableSearchNode1
Summary: (ID,STATUS) = (2500,100)
Schema: DEVDUDE
Execution Time (Inclusive): 5.925 ms
Execution Time (Exclusive): 5.925 ms
Execution Start Time: 7,127.854 ms
Execution End Time: 7,133.791 ms
Estimated Result Size: 1
Estimation Time: 0.011 ms
Evaluation Time: 5.914 ms
Index Usage:: using INDEX ON (ID,STATUS)
Inverted Index: used

That’s it.

There you go, now you know!