Today I spoke with Jörg Hoffmeister (Product and Development Manager of MaxDB and liveCache).
He told me that finally the “MaxDB Internals” Documentation has been released to the public.
Many thanks for that Jörg!
On the MaxDB-site there are also some presentations about MaxDB/liveCache Administration, so make sure you check if there are some helpful information for you.
As these are just the presentation slides I still would recommend to attend the classroom courses. The explanations of the trainer usually lighten up some rather complex theoretical background concepts and take them a bit more into the real life.
In any case, you shouldn’t miss the other documentation/information that are already available for some time:
What’s also worth noting is that recently the new licence for the MaxDB Community Edition has been release (available here). This should clarify many open questions about the MaxDB/OpenSource approach.
Ok, today I’ll show you how to install, setup and use the Windows Debugger to get information about hanging MaxDB kernel tasks.
The first thing we need is obviously a hanging MaxDB session, so let’s produce one!
Setting up the test case
My MaxDB instance is a 7.6.03 Build 9 instance with only one 50 MB data volume. The goal of the test setup is to create a DB FULL situation, since this is a rather common MaxDB hanging situation when DATA AREA AUTOEXTEND is not active.
Therefore I create a simple table in SQL Studio:
create table test (col1 int primary key)
and fill it with LOTS of data (notice the Cartesian product of the three-way join with the files table):
insert into test (select rownum from files a, files b, files c)
After a short amount of time my DB will be filled up. This is something the DBMGUI really tells you:
For our analysis of this situation we need some more information, especially we need the Windows thread ID of our hanging session. This information is available via the x_cons tool (you may as well use the dbmcli-command ‘db_cons’ or the DBMGUI function CHECK -> Database Server -> ACTIVE:
x_cons NDB show active:
ID UKT Win TASK APPL Current Timeout Region Wait
tid type pid state priority cnt try item
T17 7 0x4EC User 3004 DB FULL (197) 0 0 791(s)
T33 4 0xD1C Savepnt Vsleep 0 0 262(s)
As we see only two tasks are currently ACTIVE. One (T33) is the SAVEPOINT that is triggered automatically whenever a LOG FULL/DB FULL situation occurs.
The second one (T17) is our hanging session and we see it’s waiting on the DB FULL (197) suspend reason. Note the ‘Win tid’ column; it’s what we’re looking for right now. Our user task has the Windows thread ID 0x4EC in Hex-Format or 1260 in decimal format.
Installing the Windows Debugger
Now we’ve two options to get a call stack. The first one is to use the MS Windows Debugger. This tool is available on the Microsoft Website for free download.
Depending on your Platform either choose “Install Debugging Tools for Windows 32-bit Version” or “Install Debugging Tools for Windows 64-bit Versions” and install the tools on the machine where our database runs.
I installed mine into the standard path ‘C:\\Program Files\\Debugging Tools for Windows’.
The next things we need are the symbol files.
These file contain the information that make it possible that the debugger can tell us, what function is at what memory address of any windows program.
These symbols can either be downloaded completely via the ‘Download Windows Symbol Packages’ or we can tell the debugger to load them on-demand while we debug. As my machine is connected to the Internet.
I go for this option and show you how to configure in the next few paragraphs.
Debugging our user task
Now, we just start the WinDbg.exe (after the installation there is a new entry in the START-menu).
In order to analyze anything, we’ve to tell the Debugger where to find all the symbols. Via the Menu ‘File’-> ‘Symbol File Path’ we get a dialogue like this:
We’ve to enter a PATH Variable here, so we can put in several paths limited by the semicolon sign ‘;’. Since we want to have the Windows symbols automatically downloaded from the MS website the first path we enter is this one:
The bold printed part denotes the local folder where the debugger should store the downloaded symbol files. Next, we need to supply the paths to the MaxDB symbols. Since these are delivered by default in each installation I just have to put in these paths on my machine:
As you see there is a ‘symbols’-Folder in each DEPENDENT folder and one in the INDEPENDENT-folder.
Now the debugging can start.
To do so we need to attach the debugger to the process we want to debug. This can be done via ‘File’ -> ‘Attach to process [F6]‘.
We get a list of running processes and choose our ‘KERNEL.EXE’:
Make sure you check the box ‘Noninvasive’ since we want our process to run further after our debugging and all we want to do now is ‘just looking’!
If you’re unsure which one of the possibly several KERNEL.EXE-processes on your machine would be the right one, just look up the process ID in the knldiag-File in the startup part:
INFO Starting SERVERDB: 'NDB'
INFO SERVERNODE: 'xxxxxxxxxnotvisibletopublicxxxxxx'
INFO Process ID: 3028
INFO Date: 2007-11-14
INFO Owner: 'sdbUser'
INFO Machine: 'Intel IA32 level 6 revision d06'
INFO Processors: 1
INFO Fiber: 'YES'
INFO Max virtual memory: 2047 MB
INFO Total physical memory: 1022 MB
INFO Available physical memory: 419 MB
Ok, once the debugger is attached the rest is really easy.
We click on ‘Display Threads’  first and choose a thread in the list. Now we need to remember our Windows thread ID from above: 0x4EC.
After we select our thread we click on ‘show Callstack’  et voilà:
That’s the call stack of our hanging session.
To read it more easily or to hand it over to support for analysis we want a text-version of it. Nothing easier than that: right-click on the Windows-Title bar and click ‘Copy stack to clipboard’ in the context menu. Now paste it into notepad and we’re done.
To stop our debugging session we just select ‘Debug’ -> ‘DETACH DEBUGEE’ from the menu.
Process Explorer Approach
What we’ve right now can also be easily archived with the Sysinternals tool Process Explorer. All I can say about the Sysinternals tool suite for Windows is: get it onto your servers!
Whenever there is anything weird to analyze these tools have been reliable sources of in-depth system information.
So double recommendation: install them! Today!
The tools are also available for free at the MS website:
To be able to actually use symbols and produce readable call stacks we need to setup the process explorer a little bit.
It also needs to have the Windows Debugger tools installed, since it makes use of the special Dbghelp.dll that comes with them.
All we need to do now is to tell process explorer where this DLL is stored and enter our symbols path (see above):
Now we just browse through the process overview and double-click on our KERNEL.EXE.
The process ID from the knldiag-file can again help us here to find ‘our’ process:
Now we browse through the threads until we find thread ID 1260 (= Hex 0x4EC) and click on [Stack]:
Again we can use ‘copy’ to save this information to a text file or a support message.
Interpreting the call stack
The call stack we just created looks like this on my test machine:
To ‘read’ the call stack we start from the button and move upwards (that’s how a stack works). I will just comment on a few of these lines now.
We see at line 2d that this is a user task ‘kernel!SQLTask+0x7b’ that currently handles a statement that changes the DB content ’27 kernel!a501exec_with_change_rec+0x299′.
Since the statement is not a single-table statement but involves a sub-select we see ’26 kernel!a502complex_execution+0x407′ before that.
A bit further up we see that the table-content will be changed ‘1f kernel!k61table_ins_del_upd+0x34′ and also we see that the select of the subquery is handled ‘1e kernel!k720_single_select+0x24’ and later it’s inserted into our target table ’17 kernel!kb61insert_rec+0xa7′.
An insert need, of course, some work on the b*tree in which the table is stored so we see ’15 kernel!bd30AddToTreeWithBeforeImage+0x153‘ and know by now that this transaction will be rollback-able.
After all Undo- and Log-Information has been written (yes, we follow the WAL Write-Ahead-Log approach) a new page needs to be allocated, to hold our data ‘0d kernel!Data_PageAccessManager::NewPage+0x1e1′.
As this page has to be provided by the Converter we find‘0a kernel!Converter_Converter::RequestNewPageNo+0x18a’.
The Converter than realizes that there is no space available anymore, so it suspends our task and lets it wait for the notification that the DB FULL situation is gone ’05 kernel!__os74WaitForWakeup+0xa4‘.
As we see it’s not too difficult to understand what tasks are waiting for once we have the necessary information (the call stack).
This was just a demo situation but in other hanging situation this can
help to find the cause. So better install the Windows Debugger and the Sysinternals tools (plus the Symbols if no Internet is available) on your Windows server today!
But one last thing: how to solve our Data Full Situation?
In MaxDB you HAVE (sorry no more ways to stress this via formatting…) to add a data volume now. There’s no way to cancel or kill the user task (that is the user session) that causes the big data load even if it’s not wanted beforehand.
So what to do when we don’t really want the data or cannot provide much storage space right now?
Well, we kill the user task anyhow!
<Local>:NDB> db_cons kill T17
Now, we’ll see the following with ‘show active’
<Local>:NDB> db_cons show active
ID UKT Win TASK APPL Current Timeout Region Wait
tid type pid state priority cnt try item
T17 7 0x4EC User 3004 DB FULL (197) ! 0 0 793(s)
T33 4 0xD1C Savepnt Vsleep 0 0 2358(s)
The ‘!’ sign denotes that this user task has been marked to be killed.
Due to the cooperative multitasking model on user task level, the user task is not killed by some authority but has to commit ‘suicide’.
It first has to be able to read the KILL flag and react on this.
But to be able to do so, it must get out of the DB FULL situation.
So we really need to provide a new data volume here to get the task actually killed.
Therefore I add just a very small volume. Since the task will be killed the transaction that loads the data will be rolled back. In effect, the database will have much more free space again afterwards.
After the rollback is done I can simply drop the data volume again – this is the point where we are happy that this database is already on 7.6 😉
MaxDB Versions <7.5 are not able to drop volumes!