Verification Speed…

I recently had to handle a case where a big and important system was rendered unusable by database corruptions. Due to some bad luck and faulty procedures, there was no good backup available. Even worse: the corruption had been detected via a transaction abort in the SAP system.

This case reminded me of the importance of performing regular consistency checks on the database – AT LEAST ONCE PER BACKUP CYCLE!

Why is this so important? Well, as soon as there is no backup available anymore, a detected corruption may be hard or even impossible to resolve.

Anyhow: many customers don’t do consistency checks, and the main argument is that it puts too much additional load on the system.

Therefore I decided to go for a quick test: which option for block consistency checks does the least harm to the system?

I tested two methods: DBVERIFY (dbv), the old standard approach, and the RMAN validate check. The test machine is rather untypical: my laptop (Pentium M @ 1.6 GHz, Windows XP). To get realistic results for your environment, just go ahead and perform the tests yourself.

With the recent versions of the BRTOOLS (see note 1016173) both approaches are supported, so it may be interesting to know which one to choose.

The test case is set up by having one session do some database work that is both I/O and CPU intensive (see the script below). In a second command prompt, I started the database verification and compared the timings. These are the results:

                        no check    dbv    rman
script duration             8:48   9:31   10:34
verification duration          -   1:42    5:32

In terms of runtime, not checking anything is of course the fastest option.
Second place goes to the verification with dbv, both for the script duration and for the check itself.
Last is the verification with RMAN.

So you should go for dbv, shouldn’t you? It’s unfortunately not that easy.

DBV always has to be executed as an additional operation, and it also reports errors on blocks that are currently not used by the database and thus do no harm.
The reporting facility of DBV is rather weak – if several corrupt blocks are found, it can get very uncomfortable to create an overview of the affected segments.

RMAN, on the other hand, provides easy-to-query database views like V$COPY_CORRUPTION, V$BACKUP_CORRUPTION and V$DATABASE_BLOCK_CORRUPTION, which make getting an overview of the corruptions a “walk in the park”.
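
For example, after a verification run the findings can be pulled with a plain SQL query – a minimal sketch:

select file#, block#, blocks, corruption_type
from v$database_block_corruption;
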
The downside is – besides the longer runtime – the fact that RMAN needs the DB_BLOCK_CHECKSUM parameter to be set to (at least) TRUE and that it only checks blocks that already have a checksum. If the database was created on Oracle 10g this is always the case, but for databases with lower start releases there can still be blocks without checksums.
The major advantage of RMAN is that it can perform the checks on the fly while backups are taken. That does not make restore/recovery tests with subsequent consistency checks superfluous, but it gives a higher level of assurance about the blocks read for the backup.
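
By the way: if you want to run such a check standalone on the RMAN command line (instead of via the BRTOOLS), something like this should do – no backup pieces are written, the blocks are only read and checked:

rman target /
backup validate check logical database;
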
Here is a script similar to the one I used:

set timing on

-- build a 100,000 row work table via a cartesian join on dba_objects
create table test as
select a.object_type||a.owner||a.object_name as DATA
from dba_objects a, dba_objects b where rownum <=100000;

-- widen the table and touch every row to generate I/O
alter table test add (data2 char(200));
update test set data2=data;

-- index builds and statistics gathering add CPU load
create index t1 on test (data);
create bitmap index t2 on test (data2);
exec dbms_stats.gather_table_stats(user, 'TEST');

drop table test;
For the verification runs I used
brbackup -c force -w only_dbv

and

brbackup -c force -w only_rmv

Have fun trying it out on your systems!

Best regards,

Lars

Get me my data back! QUICK!

Ok, so your data is gone. You (or the report you ran) executed a DELETE statement and performed a COMMIT. What to do now?

It is the production system, so this is not the place to discuss that it may have been a great idea to test the report first. What’s needed is a solution.

The overall solution: point-in-time (p.i.t.) recovery of the database. But this has MANY disadvantages. Think of downtimes, think of work to be redone, think of all the other connected systems that won’t be set back in time. A whole bunch of inter-system consistency problems will arise.

If you want to “undo” just the changes to the single table, you may have a chance to get the system back to a consistent state again. The idea here is: there are several thousand tables in the SAP database. Only one of them got messed up, so the problem is rather local. Perhaps the solution can be local as well.

An often-heard solution for this “one-table recovery” problem is: do a p.i.t. recovery of the database to a second instance up to the time before the data got deleted, export the table from there and import the data back into the production database. This procedure works (often) but is rather complicated and complex – a task of several hours.

We will try to reduce this time to a few minutes now.

We will try to create a new table with the same structure as our messed-up table and copy all the data we can get from old data block versions into this new table. To do so we will utilize the Oracle FLASHBACK QUERY functionality.

STEP ONE

Keep calm. Keep calm. Keep calm! One cannot overstate the importance of keeping your mind focused in a situation like this.

STEP TWO

Stop changing data in the table – right away! Best would be: stop all changes in the database. To get the data back we will need the OLD block versions of the table, that is: the UNDO blocks of this table. If there is further work on the database – even if not on our table – it may overwrite the necessary UNDO blocks.

STEP THREE

a) Log on to the database as the SAP schema owner, e.g. SAPR3, SAPERP, SAPPRD.

b) Create a copy of our table structure. I will call the table “AAA” for the sake of simplicity here:

create table "AAA_COPY" as SELECT * FROM "AAA" where rownum<1;

c) Copy the data from the old block versions into the copy table. For this example let’s assume the data was deleted today at 13:45.

Get current date:

select to_char(trunc(sysdate), 'dd.mm.yyyy')  as time from dual;

TIME
----------
08.05.2007

Insert old data into copy table:

insert into "AAA_COPY" select * from "AAA" as of timestamp to_timestamp('08.05.2007 13:45:00', 'dd.mm.yyyy hh24:mi:ss');

If no error like ORA-1555 came up, then: cheer up!

d) VERY important: don’t rename the copy table now to replace the original one. It currently just “looks” like the original one, but it is not equivalent to it: we defined no constraints, no default values, no indexes. We have to copy the data back into the original table.

Pay attention to the fact that this may lead to duplicate entries, depending on how keys for the specific table are generated.
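
A sketch of such a duplicate-safe copy-back – the key columns "MANDT" and "BELNR" are just made-up examples, use the real key columns of your table:

insert into "AAA"
select * from "AAA_COPY" c
where not exists (select 1 from "AAA" a
                  where a."MANDT" = c."MANDT"
                    and a."BELNR" = c."BELNR");
commit;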

If the original data was not deleted but updated in a wrong way, then you will have to remove the wrong rows first or write an UPDATE statement to change the data back.

So as you can see, it is fairly easy to get accidentally deleted or changed data back.

Nevertheless this is not a silver bullet for all “user errors” concerning data in tables. Plus, it has several limitations: it is not usable on tables with LONG or LOB columns, it depends on UNDO information that might already have been overwritten, and there is no automatic consistency check from the application side.

Nevertheless: it’s a quick and easy solution that seems to be used rather rarely, since it’s not so well known yet. The best thing to do to get used to it: try it out!

Use your test system: go and create a dummy table and fill it with data. Delete this data and get it back. Once you are used to it, it’s really no big deal when the next emergency call about deleted data comes in.
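
Such a drill could look like this – a minimal sketch with made-up names, following the same steps as above:

create table flashback_drill as select * from dba_objects;
-- note the time before the "accident":
select to_char(sysdate, 'dd.mm.yyyy hh24:mi:ss') from dual;
delete from flashback_drill;
commit;
-- now get the data back from before the delete (fill in your noted time):
create table flashback_drill_copy as
select * from flashback_drill
as of timestamp to_timestamp('08.05.2007 13:45:00', 'dd.mm.yyyy hh24:mi:ss');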

Best regards,

Lars

p.s.: Here is a link for further reading:

Oracle® Database Application Developer’s Guide – Developing Flashback Applications

Things that happen in support messages that I don’t like… Part II

++ 07.03.2011 ++ UPDATE ++ UPDATE ++ UPDATE ++

The mentioned error message had been improved with Oracle 10.2.0.5 (“Things change for the better” or “How I enhanced the Oracle DBMS software”). Thus, the content of this blog post is only valid for older versions of the Oracle database software.

++ UPDATE ++ UPDATE ++ UPDATE ++

There are dozens of support messages on the BC-DB-ORA component each year that have been opened because of only one error message:

ORA-01113: file string needs media recovery

Usually customers see this error when trying to start up a database instance that was shut down “the hard way” (that is: shutdown abort, a power outage or something similar).

The next thing that is usually done is typing in

RECOVER DATABASE...

That could be the worst thing to do! Why?

Let’s have a look how Oracle describes this error:

Cause: An attempt was made to online or open a database with a file that is in need of media recovery.
Action: First apply media recovery to the file.

Huh? What is wrong here? Haven’t I just written that a recovery is not the right thing to do?
Yes, I did.

The problem with this error message is: it tells us only half of the truth.
It’s true – if there is a file that needs recovery then this error will appear.
But in most of the cases when this error occurs there is no need to recover anything.

All that has to be done is to update the file header.
And this is done by

ALTER DATABASE DATAFILE '<file name>' END BACKUP;

Unfortunately a datafile in ONLINE BACKUP mode looks very much the same to the Oracle startup procedure as a restored datafile does.
Both have lower system change numbers (SCN) in the datafile header than the current SCN.
To catch up, redo log information has to be applied – but only for the restored file.

The file in ONLINE BACKUP mode just needs to get its datafile header ‘unfrozen’ so that it will be updated again. That’s an action of some minutes, if not seconds.
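
Whether this is actually the situation at hand can be checked easily. A minimal sketch – V$BACKUP shows STATUS = 'ACTIVE' for every file that is still in online backup mode:

select d.name, b.status
from v$backup b, v$datafile d
where b.file# = d.file#
and b.status = 'ACTIVE';
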
But due to the misleading message text, people try the recovery over and over again. If something goes wrong with it, a restore is tried, and so time flies…

In fact, I once (and only once!) fell for this misleading error and supported a customer in trying the recovery.
The backup/recovery setup on the customer side was not well tested… in the end the customer had to work the whole weekend to get his database online again.
(In my defence: I was quite new to database support back then!)

All that just because the Oracle developers decided to word the error message as they did.
Why couldn’t they change it to something like this:

ORA-01113: file string needs media recovery or is in online backup mode

?
They would save customers so much time with that…

KR Lars
p.s.
Of course SAP has a note for this error:
Go ahead and check this one: #4162 – Missing “end backup”

Things that happen in support messages that I don’t like…

After some years of working in support and solving messages, everybody will have a list of things that occur over and over again and never seem to get better.
Today I will start with the most annoying one (at least to me):

No remote connection possible!

Imagine you experience a heart attack. What you usually need is somebody trained to help you in that situation and to take at least the first steps to save your life.
At best this person is a paramedic, so you can be rather sure that you’re in good hands.
Now imagine you have this person at your front door when the heart attack occurs, but when he wants to enter the room where you’re lying on the floor, you just tell him:
“You cannot enter! Our company’s security policy denies access to people not working for us… Please tell me how I can save myself!”

Although this comparison might be exaggerated, it is basically what I (and many of my colleagues) face every day.
Customers facing a complete system-down situation are not able to provide TELNET access to their database servers. They are just not prepared to do so.
Of course TELNET is not state of the art concerning security on connections over the internet, no question about that.
But it’s possible to set up something like logging on to a remote connection server via Windows Terminal Services (for example) and from there via SSL to the target machine. So there are options to make this access secure.

And even if customers make the (correct!) decision that, in this special situation, the potential risk of being hacked while the TELNET connection is open is – by far – less costly than having a whole shift of workers leave early due to the system-down situation, even then it’s often just not possible to use the connection.
The company’s firewalls block the TELNET port, the SAP service connection is not maintained, or (this is the real classic) people ignore the description of how to set up such a service connection and forget to allow port 23 (TELNET) on their SAPROUTER. So the infamous “route permission denied” error is just right on its way.

My recommendation: make plans for how support should access your systems when they are down!
Set the connections up.
Have them tested – just open a support message and ask for somebody to log on!
This may save you precious HOURS the next time the database won’t come up.

As there are plenty of SAP notes describing the various remote connections, let me list the most important ones:

#35010 – Service connections: Composite note (overview)

Specific connection types:
Telnet Connection: Note 37001
R/3 Support: Note 812732
SAP-DB Connection: Note 202344
WTS Connection: Note 605795

Official statement concerning SSH-connections:
#789026 – Remote Connection: SSH
For semi-automatic line-opening see:
#797124 – LOP – Line Opener Program

By the way: the connection types listed in the overview note (35010) are exclusive. Whatever remote access tool or technology you usually use in your company – if it’s not on the list, support cannot use it. So better don’t waste time asking for such things to be tried.

Best regards
Lars

We all hate the BR*TOOLS, don’t we?


You won’t believe how often we in support get asked whether SAPDBA is available for use with Oracle 10g. Even more customers tell us how they dislike the BRTOOLS and want SAPDBA back.

The plain answer they all get is: SAPDBA has been out of maintenance for ages now – use the current BRTOOLS.

But what is behind all this? Is SAP really ignorant of customer wishes? What are the main reasons for customers to reject the BRTOOLS?

I believe the biggest issue for customers is the changed user interface. While SAPDBA used a rather classic approach with a central starting menu from which the DBA could jump between functions as he liked, the BRTOOLS are designed in a task-oriented way.

What does that mean? The idea is that the DBA does not use the BRTOOLS for “just looking” or “trying things out”. When the DBA starts the BRTOOLS he already knows exactly what he wants to do, which task he wants to accomplish. To support this approach, most menus in the BRTOOLS appear in a wizard-like sequence, usually starting with rather general settings and ending with the options for choosing the details of the desired function.

Let’s check this with an example: let’s create an UNDO tablespace.

   > brtools
   BR0651I BRTOOLS 7.00 (26)
    
   BR0280I BRTOOLS time stamp: 2007-06-18 12.05.08
   BR0656I Choice menu 1 - please make a selection
   ---------------------------------------------------------------
   BR*Tools main menu
    
    1 = Instance management
    2 - Space management
    3 - Segment management
    4 - Backup and database copy
    5 - Restore and recovery
    6 - Check and verification
    7 - Database statistics
    8 - Additional functions
    9 - Exit program
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   ---------------------------------------------------------------
   BR0662I Enter your choice:
   2

As you may have noticed, I can either select a function from the menu or choose ‘c’ for continue. Whenever there is a ‘=’ between the menu number and the description, that’s the default choice of the current menu and you can just use ‘c’ to go ahead.

In this case ‘Instance management’ is the default choice – but I’d like to add an UNDO tablespace, so I go for ‘2 – Space management’.

   BR0280I BRTOOLS time stamp: 2007-06-18 12.05.59
   BR0663I Your choice: '2'
    
   BR0280I BRTOOLS time stamp: 2007-06-18 12.05.59
   BR0657I Input menu 82 - please check/enter input values
   -----------------------------------------------------------------------------
   BRSPACE options for create tablespace
    
    1 - BRSPACE profile (profile) ...... [initTDB.sap]
    2 - Database user/password (user) .. [/]
    3 ~ Tablespace name (tablespace) ... []
    4 - Confirmation mode (confirm) .... [yes]
    5 - Scrolling line count (scroll) .. [20]
    6 - Message language (language) .... [E]
    7 - BRSPACE command line (command) . [-p initTDB.sap -s 20 -l E -f tscreate]
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   -----------------------------------------------------------------------------
   BR0662I Enter your choice:

In that menu I can enter additional information like the name of the tablespace I’m about to create, but I don’t have to. That’s indicated by the ~ (tilde) sign at the menu entry.

Due to the modular concept of the BRTOOLS, the tool BRSPACE will now be called. You can see the command line parameters used for this at menu entry 7. If you started BRSPACE directly with these options, it would work the exact same way.
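
In other words, the direct call matching menu entry 7 above would be:

brspace -p initTDB.sap -s 20 -l E -f tscreate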

Anyhow, I continue with ‘c’.

   #####################################################################
    
   BR1001I BRSPACE 7.00 (26)
   BR1002I Start of BRSPACE processing: sdvmycvk.tsc 2007-06-18 12.47.48
   BR0484I BRSPACE log file: C:\ORACLE\TDB\sapreorg\sdvmycvk.tsc
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.47.52
   BR1009I Name of database instance: TDB
   BR1010I BRSPACE action ID: sdvmycvk
   BR1011I BRSPACE function ID: tsc
   BR1012I BRSPACE function: tscreate
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.47.53
   BR0656I Choice menu 304 - please make a selection
   ---------------------------------------------------------------------
   Tablespace creation main menu
    
    1 = Create tablespace
    2 - Show tablespaces
    3 - Show data files
    4 - Show disk volumes
    5 * Exit program
    6 - Reset program status
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   ---------------------------------------------------------------------
   BR0662I Enter your choice:

We see a line of hashes (‘#######…’) indicating that an external tool – BRSPACE in this case – has been called and that we now see the output of this tool. Additionally we see the BRSPACE header, so we can be pretty sure that we’re now using BRSPACE.

The default choice is now automatically set to ‘create tablespace’, so if we already know what we want, we can just move on with ‘c’. Otherwise we can check which tablespaces are already there, how much space is left on the physical disks and so on with the other menu entries.

Let’s move on with ‘c’.

   BR0280I BRSPACE time stamp: 2007-06-18 12.51.05
   BR0657I Input menu 305 - please check/enter input values
   ---------------------------------------------------------------
   Main options for creation of tablespace in database TDB
    
    1 ? Tablespace name (tablespace) ......... []
    2 - Tablespace contents (contents) ....... [data]
    3 - Segment space management (space) ..... [auto]
    4 # Database owner of tablespace (owner) . []
    5 ~ Table data class / tabart (class) .... []
    6 - Data type in tablespace (data) ....... [both]
    7 # Joined index/table tablespace (join) . []
    8 ~ Uniform size in MB (uniform) ......... []
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   ---------------------------------------------------------------
   BR0662I Enter your choice:

Now I have to enter the name for the new tablespace – the ‘c’ won’t work until all menu entries marked with ‘?’ have been filled. For the contents and the segment space management entries there are default choices that should fit the majority of use cases. Anyhow, for my test I’ll change the contents type to UNDO and go ahead with ‘c’.

   BR0280I BRSPACE time stamp: 2007-06-18 12.53.19
   BR0657I Input menu 306 - please check/enter input values
   -------------------------------------------------------------------------------
   Space options for creation of tablespace PSAPUNDOX (1. file)
    
    1 - Tablespace file name (file) .......... [C:\undox.data1]
    2 # Raw disk / link target (rawlink) ..... []
    3 - File size in MB (size) ............... [10]
    4 # File autoextend mode (autoextend) .... [no]
    5 # Maximum file size in MB (maxsize) .... []
    6 # File increment size in MB (incrsize) . []
    7 - SQL command (command) ................ [create undo tablespace PSAPUNDOX datafile 'C:\undox.data1' size 10M autoextend off]
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   -------------------------------------------------------------------------------
   BR0662I Enter your choice:

The last screen before the actual command is sent to the database gives the option to set up the sizes and locations of the data files for the tablespace. Again default values are given (especially the file name) and can be used by just entering ‘c’.

One nice feature is that the SQL command is visible AND editable at menu entry 7. So if you want to edit it manually, you can do this easily (ever tried that with SAPDBA?).

The next menu is a hurdle for many BRTOOLS users:

   BR0280I BRSPACE time stamp: 2007-06-18 12.56.35
   BR1091I Next data file can be specified now
   BR0675I Do you want to perform this action?
   BR0676I Enter 'y[es]' to perform the action, 'n[o]/c[ont]' to skip it, 's[top]' to abort:

The question is: do you want to specify another data file right now or not? If so, enter ‘y’es. If not, you can simply use ‘c’ again to go on. Sometimes this is overlooked and DBAs end up with far more data files than they wanted – because they always entered ‘y’, ‘y’, ‘y’… So better stick to ‘c’!

   BR0280I BRSPACE time stamp: 2007-06-18 12.58.39
   BR0370I Directory C:\ORACLE\TDB\sapreorg\sdvmycvk created
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.58.41
   BR0319I Control file copy created: C:\ORACLE\TDB\sapreorg\sdvmycvk\CNTRLTDB.old 22036480
   BR0252W Function CreateFile() failed for 'C:\hiberfil.sys' at location BrFileStatGet-9
   BR0253W errno 2: No such file or directory
   BR0252W Function CreateFile() failed for 'C:\pagefile.sys' at location BrFileStatGet-9
   BR0253W errno 2: No such file or directory
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.58.41
   BR1089I Creating tablespace PSAPUNDOX...
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.58.43
   BR1016I SQL statement 'create undo tablespace PSAPUNDOX datafile 'C:\undox.data1' size 10M autoextend off' executed successfully
   BR1060I Tablespace PSAPUNDOX created successfully with file: C:\undox.data1 10M
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.58.44
   BR0340I Switching to next online redo log file for database instance TDB ...
   BR0321I Switch to next online redo log file for database instance TDB successful
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.58.47
   BR0319I Control file copy created: C:\ORACLE\TDB\sapreorg\sdvmycvk\CNTRLTDB.new 22036480
    
   BR0280I BRSPACE time stamp: 2007-06-18 12.58.47
   BR0256I Enter 'c[ont]' to continue, 's[top]' to cancel BRSPACE:

 

Ok, we see some warnings here – they are the result of the non-standard, non-sensible and non-recommended location of my data file. I just put it directly onto C: – so at least I get a warning when I do such nonsense.

To finish the action – you guessed it – press ‘c’.

   BR0280I BRSPACE time stamp: 2007-06-18 13.00.41
   BR0656I Choice menu 304 - please make a selection
   ---------------------------------------------------------------
   Tablespace creation main menu
    
    1 + Create tablespace
    2 - Show tablespaces
    3 - Show data files
    4 - Show disk volumes
    5 = Exit program
    6 - Reset program status
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   ---------------------------------------------------------------
   BR0662I Enter your choice:

We’re back at the BRSPACE menu, and since we’re finished with our action, the exit is already the default choice, so ‘c’ is again the correct key.

   BR0292I Execution of BRSPACE finished with return code 1
    
   BR0668I Warnings or errors occurred - you can continue to ignore them or go back to repeat the last action
   BR0280I BRTOOLS time stamp: 2007-06-18 13.01.45
   BR0670I Enter 'c[ont]' to continue, 'b[ack]' to go back, 's[top]' to abort:

Now, what’s that? Errors? We’ve just successfully created a new UNDO tablespace and now there should have been errors? But wait a minute! It’s just to inform you that there have been WARNINGS or ERRORS. If you’re already aware of them – ignore them by using ‘c’. If you don’t know what errors could have occurred: check the scroll-back buffer of your terminal or – better – the log file of your current action. With BRGUI (the recommended way to use the BRTOOLS) it’s easy to display the log file – just click on the log file and you even get coloured output, so finding warnings and errors is really easy.

With command line usage you have to check the log file yourself – just scroll up to where BRTOOLS called BRSPACE. In that very step you’ll find the line

   BR0484I BRSPACE log file: C:\ORACLE\TDB\sapreorg\sdvmycvk.tsc

That’s the file you should have a look into.

But let us go ahead for now.

   ###############################################################################
    
   BR0292I Execution of BRSPACE finished with return code 1
    
   BR0668I Warnings or errors occurred - you can continue to ignore them or go back to repeat the last action
   BR0280I BRTOOLS time stamp: 2007-06-18 13.01.45
   BR0670I Enter 'c[ont]' to continue, 'b[ack]' to go back, 's[top]' to abort:
   c
   BR0280I BRTOOLS time stamp: 2007-06-18 13.09.34
   BR0257I Your reply: 'c'
   BR0259I Program execution will be continued...
    
   BR0280I BRTOOLS time stamp: 2007-06-18 13.09.34
   BR0656I Choice menu 5 - please make a selection
   -------------------------------------------------------------------------------
   Database space management
    
    1 = Extend tablespace
    2 + Create tablespace
    3 - Drop tablespace
    4 - Alter tablespace
    5 - Alter data file
    6 - Move data file
    7 - Additional space functions
    8 - Reset program status
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   -------------------------------------------------------------------------------
   BR0662I Enter your choice:

Ok, again we see the ‘#’ line indicating that the current program has changed. We also see again that there was a minor problem or warning within the BRSPACE function we used.

Anyhow, we’re now back in the BRTOOLS menu and can either continue with other functions or just leave the program. For that – for the first time in the whole procedure – we don’t use ‘c’ but ‘b’ instead. Remember, we’re still in the ‘space management’ menu, so we have to go back to the main menu.

 

    
   BR0280I BRTOOLS time stamp: 2007-06-18 13.25.10
   BR0663I Your choice: 'b'
   BR0673I Going back to the previous menu...
    
   BR0280I BRTOOLS time stamp: 2007-06-18 13.25.10
   BR0656I Choice menu 1 - please make a selection
   ----------------------------------------------------------------
   BR*Tools main menu
    
    1 = Instance management
    2 + Space management
    3 - Segment management
    4 - Backup and database copy
    5 - Restore and recovery
    6 - Check and verification
    7 - Database statistics
    8 - Additional functions
    9 - Exit program
    
   Standard keys: c - cont, b - back, s - stop, r - refr, h - help
   ----------------------------------------------------------------
   BR0662I Enter your choice:
   9
   BR0280I BRTOOLS time stamp: 2007-06-18 13.25.23
   BR0663I Your choice: '9'
   BR0280I BRTOOLS time stamp: 2007-06-18 13.25.23
   BR0680I Do you really want to exit BRTOOLS? Enter y[es]/n[o]:
   y
   BR0280I BRTOOLS time stamp: 2007-06-18 13.25.25
   BR0257I Your reply: 'y'
    
   BR0280I BRTOOLS time stamp: 2007-06-18 13.25.25
   BR0653I BRTOOLS completed successfully with warnings

Obviously, ‘exit’ is not the default choice in the main menu – so choosing it with ‘9’ lets us out. Again we see the information concerning the warnings that appeared, so we would have to be deaf and blind to miss them.

What have we learned here? As opposed to SAPDBA, the BRTOOLS are far more modular. For nearly every step we take, a log is written to make it possible to analyse errors. There are far more safety questions – an unwanted ‘drop tablespace’ should not happen this way. Most of the action paths you usually have to take as a DBA are predefined and linked via the ‘c’-continue key.

So, yes, the BRTOOLS do work differently than SAPDBA.

But they are more secure, and even new features of the database are easy to work with, since the modular concept allows a quick integration into the tools – it just follows the same pattern of menus.

From the trainings I’ve given, my advice concerning the BRTOOLS is: try them. Use a test system and try your most-used actions. Get used to them – these are the tools that will be there for the next years.

And of course the next generation is already in pole position… (see BRTOOLS STUDIO in the SDN SAP on Oracle area).

KR Lars

Related SAP notes:

#647697 – BRSPACE – New tool for Oracle database administration

#012741 – Current versions of SAPDBA and BR*Tools

A useful hint on notes

From time to time I get customers asking: “When will the patch mentioned in note XYZ be available?”

Most often the disappointing answer has to be: “The release date won’t be announced until the software is released.” The reasoning behind this information policy can be disputed – anyhow, it is how SAP and its partners currently handle it. We don’t promise dates, since very good reasons might come up to spend some more time on producing the bugfix.

So how can the customer know when the requested software will be available? Actually, whenever a patch is released, the corresponding SAP note is usually updated. So knowing about the software release means knowing about the changes to the note. Ok, but really nobody wants to check all the relevant notes every day just to find out one day: the note was modified…

To save time and nerves, SAP provides a handy feature in the Service Marketplace: the note subscription.

When you display a note in the Service Marketplace you get this little link in the upper right corner named “SUBSCRIBE”. Click it and you’ll be asked if you really want to subscribe. A click on “YES” and the next time the note is changed you’ll get an e-mail notification about it.

To unsubscribe, just display the note in the Service Marketplace again, and at the very same location where you found “SUBSCRIBE” an “UNSUBSCRIBE” can now be found.

Voilà! Notes tracking made easy.

For those of you who are further interested in the notes search functionality of the SAP Service Marketplace, there is a Learning Map available:

SAP Service Marketplace -> notes.

Just click “Learning Maps” on the left-hand menu and browse through the PowerPoint presentation.

Suck that data out! Connect from Oracle to MaxDB

In one of my former jobs, I was in charge of a rather small project-tracking tool for the engineering department. As buying equipment and ordering services were two of the duties of these engineers, these tasks had also been modelled into the tool. Of course, the orders had to be transported into the central financial application. Therefore a meeting with the consultants for that ERP system (no, not ours… they had got the wrong one there… ;-)) was set up.

After I had explained what data our tool could deliver and offered to create some database views for that purpose, I asked what information the ERP system would need to be able to process the data correctly. The consultant smiled and told me: “Don’t bother – just give us direct access to the database and we will suck the data out!”

Needless to say, this interface never went fully productive…

Anyhow: connecting systems is one of the major problems bigger IT landscapes face. SAP NetWeaver is a toolset specifically designed for this task. With it, it’s possible to define system interfaces that allow each of the attached systems to “grow” and develop further while still being able to use the interface (XI would be the technology in charge).

But there is the downside of complexity. Sometimes you just want to get some data from the database of one system into the database of another system without having to define a messaging infrastructure first.

With Oracle’s db-links this is quite easy. Unfortunately these links only seem to work between Oracle databases… but only at first sight!

Let’s assume you’ve got a MaxDB and an Oracle database and want some data from that MaxDB in the Oracle database. That can be done either by exporting the data from MaxDB into flat files and then importing them into Oracle (for which you would need some 3rd-party tool) or by accessing the data in the MaxDB directly from Oracle.

Let’s see how this is done.

Oracle comes with a functionality called “Generic Connectivity” (GC). GC enables Oracle to access, via db-links, data that can be reached through an OLE DB or ODBC connection.

In this example, I will set up an ODBC connection to my MaxDB database and perform the necessary steps to access the data from within Oracle. I assume that both databases are already correctly installed and that the ODBC driver for MaxDB is installed on the Oracle server.

1. Create an ODBC connection

START -> Programs -> Administrative Tools -> Data Sources (ODBC)
or
START -> Settings -> Control Panel -> Administrative Tools -> Data Sources (ODBC)

Change to “System DSN” and click “Add…”

In the “Create New Data Source” dialogue choose the “MaxDB” ODBC driver and click on “Finish”.

In the upcoming dialogue I enter the following:

Data Source Name:  SDB
Server:            <hostname of MaxDB Server>
Database:          SDB

Then click on “Settings …” and select the ORACLE SQL mode. Close the dialogue windows by clicking “OK”.

Now a new ODBC data source has been created.

2. Modify the Oracle Net setup

Create a file named init<SID>.ora in the %ORACLE_HOME%\hs\admin folder. In my case the file is named initSDB.ora. If you look into that folder you’ll also find templates for this file, but the one I use just contains these two lines:

HS_FDS_CONNECT_INFO = SDB
HS_FDS_TRACE_LEVEL = OFF

Basically, this will later tell Oracle which ODBC data source should be used. So make sure that the <SID> is correct here.

The next thing is to add an entry into the tnsnames.ora file like this one:

SDB.WORLD =
      (DESCRIPTION=
        (ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1512))
        (CONNECT_DATA=(SID=SDB))
        (HS=OK)
      )

The (HS=OK) part is the key point here! Make sure to check whether the listener port is correct for your system.

As the last configuration step, the listener has to be made aware that it should serve the new entry. So add this to the SID_LIST_LISTENER section in the listener.ora file:

(SID_DESC=
      (SID_NAME=SDB)
      (ORACLE_HOME=c:\oracle\tdb\102)
      (PROGRAM=hsodbc)
    )

With that, the Oracle listener will start a kind of adapter program when a connection to this SID is made. This adapter program will then translate the SQL*Net requests into ODBC calls.

Now restart the Oracle listener:

lsnrctl reload
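
To verify that the new entry resolves and that the listener picked it up, a quick check with the standard Oracle client tools could look like this:

tnsping SDB.WORLD
lsnrctl status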

3. Creating a DB-Link inside the Oracle database

The finishing step for accessing data in the MaxDB database from Oracle is to create a db-link that points to the new tnsnames.ora entry:

sqlplus /
create database link sdb.world connect to lars identified by lars using 'SDB.WORLD';

Of course, you will have to put in your own username and password here, but I guess you get how this should work.

4. The first query:

Now we’re ready to get data out of the MaxDB:

select * from user_catalog@sdb.world;

TABLE_NAME      TABLE_TYPE
--------------- -----------
TEST            TABLE

If you now get an error message like this:

ERROR at line 1:
ORA-02085: database link SDB.WORLD connects to HO.WORLD

Then this is just because the Oracle parameter GLOBAL_NAMES is currently set to TRUE. After a

alter system set global_names=FALSE scope=both;

this error will be gone.
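
From here on the db-link can be used like any other. For instance, the demo table TEST seen in the catalog query above could be copied over in one statement (the target name TEST_COPY is just made up):

create table test_copy as select * from test@sdb.world;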

So we’ve got an easy way to get data from a MaxDB into an Oracle database. Actually, this will work with any database that you can reach via ODBC.

For further reading have a look at: Oracle® Database Heterogeneous Connectivity Administrator’s Guide 10g Release 2 (10.2) 7 Generic Connectivity (http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14232/gencon.htm)

A Warning

I just presented a rather quick and dirty way to make up an interface between two systems, by just accessing the data in the databases of the systems directly. Please don’t consider this the proper way to implement an interface! This is nothing but pure data shovelling and should not even be considered the right layer on which to design a system interface. What happens when the data definition in one of the databases changes? Do you want to change the interface, too? Every time? Surely not! How do you keep track of problems in the interface? Write a logging facility yourself?

Therefore: use what you’ve paid for! Use NetWeaver! It’s already there and it’s designed for exactly this kind of task.

Hope you have some fun with this technical option anyhow.

Best regards,
Lars

I can repair your corrupt database!

Years in database support, endless experience and access to a whole bunch of internal notes and guidelines have enabled me (as well as all DB supporters at SAP) to repair corrupt databases. This obscure technique is of course only available to the initiated and should be used with care. But when applied it works just like a magic wand: poof – the database is OK again.

Now I’m going to share this secret with you. The method is called: Restore and Recovery.

Yes, I know what you might think right now: “Restore and Recovery? That’s no repair at all, and the data that was wrong once will be wrong again!”

To reply to this, we have to think about what a corruption really is. It may well be that the database handles its own data incorrectly and produces corruptions itself. Yes, that’s possible, but really rare. In those cases there will usually be a bug fix or a repair program. And in some cases the DB developers will have to find a custom solution.

But these are really, really rare cases.

Corruptions WILL appear

The majority of corruptions reported by the database are caused OUTSIDE the database. Now, what do I mean by that?

The database usually writes out data to the disk/files by calling some OS function. The OS then hands the request over to the device driver, which in turn accesses the controller. The controller manages the physical disks, on which the read/write heads are moved to the correct positions. Basically it’s a rather long chain of systems calling other subsystems’ functionality. Once the DB got an “OK” back from the OS for its I/O call, the DB is out of the game. From there on, the DB software has no influence on what happens to the data.

As any of these systems may (and at some time will) fail, unwanted changes to the data appear – that is: corruptions will be detected by the DB. As you can easily imagine, the more data blocks there are and the more I/O is done, the higher the probability that data is changed by accident.

This is nothing we can prevent from happening by any means.

Since we don’t want to lose data, the actions taken usually include making copies of the data. In RAID systems, for example, redundancy is used to handle lost data if a disk fails. Unfortunately the error detection of storage systems is rather coarse compared to the checks done by the page-I/O interface of the DB. That’s the reason why in most cases “all lights are green” on the storage system while the DB already reports corruptions.

What way out?

What do we know up to now? Corruptions do happen and cannot be avoided. They most often “happen” outside the database and are usually only discovered by the database.

So how is Restore and Recovery related to this?

Once a corruption has been detected by the database software there are usually two different situations:

  • The corruption appeared on some data that is necessary for each and every transaction in the system. Due to the corruption there is now a production-down situation. We need a quick solution.
  • The corruption appeared on some barely used data – no business impact here. Resolving the issue is important, but not so time-critical, and may be analyzed “in-depth”.

What may sound funny is: the action plan for both scenarios is the same.

While we can be pretty sure that a Restore and Recovery will remove the corruption, it may not be the fastest option to get the database back online. Therefore corruption handling always includes identifying which database objects are broken. If these contain only secondary information (like indexes, or status tables that can be regenerated by the application) then it may be quicker to avoid the recovery.

But again: in most cases Restore and Recovery is the only option to get the correct data back.

Luckily, exactly this is what backups are taken for.

Sadly, taking backups seems to be trained heavily – Restore and Recovery much less.

From a support point of view, the real problems start here: Restore and Recovery have NOT been tested and trained for the production system. Nobody at the customer side knows how to handle the external backup system. At worst, it has never been used for a restore at all.

And even if the procedure to Restore and Recover the database is clear and works as it should, there are far too many customers who realize in that very moment: all my backups already include the corruption. It was just no problem until now, since the corrupt blocks haven’t been touched for a while.

This is the moment where finger-pointing starts and the manual repair is asked for. But finding out whom to blame – if possible at all, and that’s rarely the case – won’t help a bit here. No vendor of any of the systems that have touched the now-corrupt data can make it right again – they don’t know what the correct block looked like; how should they?

So keeping a corruption free backup and being able to recover it is the key to solve corruption issues.

It’s the DBA’s responsibility

How can one be sure that the backup is not corrupt? Since it’s unlikely that a corruption simply disappears again, making a backup and running a database structure validation right afterwards already is a valid proof. Even more secure would be to actually perform the Restore and Recovery on a second system and do all the database consistency checks on the recovered instance.
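
For Oracle, both steps can even be combined in one run – a sketch, assuming the -w use_dbv verification option of BRBACKUP (see the “Verification Speed” post above), which verifies the data files as part of the backup:

brbackup -w use_dbv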

Basically, that’s the core job of the DBA: being able to Restore and Recover the database. When woken up in the middle of the night, it’s nice to know how to tune statements, but it’s CRUCIAL to be able to Restore/Recover the database blindfolded.

The mantra for each and every DBA therefore should be: I do make backups, I do restore them, I do check them, I do make backups …

Practical Advice

Of the more than 350 corruption-related messages that have been handled so far this year in SAP DB support, these are the notes most often handed to the customer. As they are very instructive and provide sound knowledge of how to handle corruptions, the clear advice is to read them BEFORE any problem appears:

Oracle:

#540463 – FAQ: Consistency Checks + Block Corruptions
#365481 – Block corruptions
#23345 – Consistency check of ORACLE database

MaxDB:

#940420 – FAQ: Database structure check (VERIFY)
#26837 – MaxDB: Data corruption
#846890 – FAQ: MaxDB Administration
#727099 – Using database structure checks (Verify)

Of course there are many other notes targeting special types of corruptions, but knowing these for your database system will spare you a lot of stress.

A classic in disguise…

In my last blog entry “Starting off with a classic…” I presented a classic issue that may appear when the database software is upgraded.

The problem was that sometimes result sets are delivered to the user in a sorted manner even if no ORDER BY was specified. If the user/developer relies on this order, it may be a cause for application errors, as the database doesn’t guarantee the order if ORDER BY is omitted.

Now I came across a customer message that described a problem after the upgrade from Oracle 9i to Oracle 10g which was quite similar – but only at second glance.

THE TABLE

To understand this problem it’s handy to have a demo table.
Mine looks like this:

create table tt (cdate varchar2(8), info varchar2(10));

For the test, we also need some data:

insert into tt values('20070101', 'TEST-JAN');
insert into tt values('20070104', 'TEST-JAN');
insert into tt values('20070104', 'TEST-JAN');
insert into tt values('20070104', 'TEST-JAN');
insert into tt values('20070104', 'TEST-JAN');
insert into tt values('20070124', 'TEST-JAN');
insert into tt values('20070124', 'TEST-JAN');
insert into tt values('20070124', 'TEST-JAN');
insert into tt values('20070201', 'TEST-FEB');
insert into tt values('20070201', 'TEST-FEB');
insert into tt values('20070203', 'TEST-FEB');
insert into tt values('20070212', 'TEST-FEB');
analyze table tt compute statistics;

Now a quick check of what’s in the table:

break on cdate skip 1
select * from tt order by cdate asc;
CDATE    INFO
-------- ----------
20070101 TEST-JAN

20070104 TEST-JAN
         TEST-JAN
         TEST-JAN
         TEST-JAN

20070124 TEST-JAN
         TEST-JAN
         TEST-JAN

20070201 TEST-FEB
         TEST-FEB

20070203 TEST-FEB

20070212 TEST-FEB
12 rows selected.

As we see, there are several different dates with a different number of entries per date. That’s pretty much how it was on the customer system.

The Query

The customer now wanted to know the newest cdate in the table – that is of course the highest value.
He complained that after the upgrade to Oracle 10g the database did NOT return the highest cdate value anymore. So, what happened?
What he coded was:

select max(cdate) from tt group by cdate;

The result sets looked like this:

Oracle 9i:
MAX(CDAT
--------
20070101
20070104  >
20070124  >
20070201  >
20070203  >
20070212  >
6 rows selected.
Oracle 10g:
MAX(CDAT
--------
20070203
20070212   >
20070201     <
20070101     <
20070104     >
20070124     >
6 rows selected.

For easier comparison I marked each line with < or >, denoting that the current line’s cdate is smaller (<) or bigger (>) than the preceding one.
So the customer made a correct observation: the result of his query was sorted before the upgrade and is not sorted afterwards. But as we already know, the order of the result set was never guaranteed, since no ORDER BY clause was included.

Anyhow we want to know why this is happening…

The Analysis

With Oracle 10g the query is executed like this:

set autotrace traceonly exp
select max(cdate) from tt group by cdate;
-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |     6 |
|   1 |  HASH GROUP BY     |      |     6 |
|   2 |   TABLE ACCESS FULL| TT   |    12 |
-------------------------------------------

Switching back to Oracle 9i we see that something has changed here:

alter session set optimizer_features_enable='9.2.0';
select max(cdate) from tt group by cdate;
-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |     6 |
|   1 |  SORT GROUP BY     |      |     6 |
|   2 |   TABLE ACCESS FULL| TT   |    12 |
-------------------------------------------

While Oracle 9i used a SORT GROUP BY operation to produce the result set, Oracle 10g employs a HASH GROUP BY.
As the name already tells, the former sorts the data for this operation, thus producing a sorted result set. So up to now we know that the result set order changed and why it changed, but still the question is: why does it produce a wrong result for MAX()?
The answer is simple: it does not produce a wrong result at all.

The Solution

What is wrong in this example is the statement itself. The customer wanted the newest cdate value in the table. But he asked for the highest value of each distinct value, since he grouped his data by cdate. As he always got back a result set that included all distinct values of cdate, the developer must have scrolled through the result set to get the highest value…

The correct implementation of this question in SQL would of course be this one:

select max(cdate) from tt;
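
Run against the demo data from above, it delivers just the newest date:

MAX(CDAT
--------
20070212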

This one delivers only one single value (SELECT SINGLE!): the highest value of cdate.

No sort order in the result set, no scrolling through it. Just the query and the result.

As you can see, this “problem” might come up in very different corners of your SQL code.

So better double-check it, before assuming a database bug.

Best regards,
Lars

Starting off with a classic…

Every now and then a new version of the database software comes along.
Upgrades of Oracle or MaxDB often include major improvements concerning stability and performance. These improvements can and should be read as: things run differently now.
A common example of how differently things work – and a “classic” issue in support – is the following: “We upgraded our database software and now the data is not returned in the same order as before.”

Of course, at first sight, the new and presumably better version of the database software “forgot” how to sort things correctly. Therefore a support message is opened.
But look (and think) again:
If the statement that is now giving back data in a different order does not include an ORDER BY clause, then the database has done nothing wrong.

In each and every database manual and also in the SQL standard you’ll find a hint to this:
Unless ORDER BY is supplied, the database does NOT guarantee the order of returned rows.
If your application logic relies on the order, then use ORDER BY.
And really that’s it! Period.
So why can this happen at all?
The answer to this is (most often): a change of data access paths.
Let’s make an example (this time with Oracle 10g):

 

1. We create two tables SMALL and BIG:

 

create table small as (select object_name from dba_objects);
create table big as (select object_name, object_type from dba_objects);

repeat this a few times:
insert into big (select * from big);
Result: there’s now a 1:m relationship between the object_name in SMALL and the object_name in BIG.

2. Create an index on BIG and collect CBO statistics:
create index i_big on big (object_name);
analyze table small compute statistics;
analyze table big compute statistics for table for all indexes;

 

 

3. Let’s see which order the data has right now:
select b.* from small s, big b where s.object_name = b.object_name;

 

OBJECT_NAME          OBJECT_TYPE
-------------------- -----------
I_COBJ#              INDEX
C_FILE#_BLOCK#       CLUSTER
SEG$                 TABLE
I_UNDO1              INDEX
I_COL1               INDEX
I_IND1               INDEX
I_COL3               INDEX
I_CDEF2              INDEX
I_USER#              INDEX
[...]

So, this is not ordered in any way…

 

4. How was this when the rule-based optimizer was used? The data was sorted back then!
select /*+ rule */ b.* from small s, big b where s.object_name = b.object_name;

OBJECT_NAME          OBJECT_TYPE
-------------------- --------------------
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
I_COBJ#              INDEX
C_FILE#_BLOCK#       CLUSTER
C_FILE#_BLOCK#       CLUSTER
C_FILE#_BLOCK#       CLUSTER
C_FILE#_BLOCK#       CLUSTER
C_FILE#_BLOCK#       CLUSTER
C_FILE#_BLOCK#       CLUSTER
[...]

 

 

5. So why is that? It’s all about access paths!
explain plan for
select /*+ rule */ b.* 
from small s, big b
where s.object_name = b.object_name;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------
Plan hash value: 1302638698
----------------------------------------------
| Id  | Operation                    | Name  |
----------------------------------------------
|  0  | SELECT STATEMENT             |       |
|  1  |  TABLE ACCESS BY INDEX ROWID | BIG   |
|  2  |   NESTED LOOPS               |       |
|  3  |    TABLE ACCESS FULL         | SMALL |
|* 4  |    INDEX RANGE SCAN          | I_BIG |
----------------------------------------------

With the rule-based optimizer the available index was used for the join. As a coincidence, the data was delivered the way it was picked up: in index sort order.

When RBO usage vanished with Oracle 10g (at the latest), other join strategies were considered:

 

explain plan for select  b.* from small s, big b where s.object_name = b.object_name;

select * from table(dbms_xplan.display);
------------------------------------
| Id  | Operation          | Name  |
------------------------------------
|   0 | SELECT STATEMENT   |       |
|*  1 |  HASH JOIN         |       |
|   2 |   TABLE ACCESS FULL| SMALL |
|   3 |   TABLE ACCESS FULL| BIG   |
------------------------------------

*) I omitted some information here to make the comparison easier.

As you can see, the index is not used at all now. The join is done via a HASH JOIN. This hash join does not care about the way the data is sorted, but is pretty quick for the join itself.
The result is a quicker join, but the “order” of the rows has completely vanished.

 

6. Now, what do you guess will happen to the access path if we ask for a sorted result?

Index access? Guess again!

 

explain plan for select  b.* from small s, big b where s.object_name = b.object_name order by b.object_name;
-------------------------------------
| Id  | Operation           | Name  |
-------------------------------------
|   0 | SELECT STATEMENT    |       |
|   1 |  MERGE JOIN         |       |
|   2 |   SORT JOIN         |       |
|   3 |    TABLE ACCESS FULL| SMALL |
|*  4 |   SORT JOIN         |       |
|   5 |    TABLE ACCESS FULL| BIG   |
-------------------------------------

A SORT-MERGE join is done. This join method does not only care about the sort order – it relies on it.

While these examples are for Oracle, the same effects can be demonstrated with different versions/releases of MaxDB (formerly SAP DB).

For the SQL user (usually the application or report developer), knowing this effect should lead to the conclusion:

IF THE DATA SHOULD BE SORTED, ASK FOR IT. USE ORDER BY.
NO ORDER BY – NO GUARANTEED ORDER.

Best regards,  Lars

SAP HANA, databases and computers