Archive for May, 2008

May 26, 2008: 5:29 pm: Dan.Net, IBM Universe (U2), Programming

Having made some progress on the .Net memory usage issue, my UniOLEDB application’s biggest performance problem is now the time required to retrieve large amounts of data from the Universe database. After working on this with Universe’s tech support team for a couple of weeks I was hoping I’d be able to post a list of performance optimization tips, but I’ll have to settle for a list of tips which might help in your case, but didn’t do a whole lot for us.

First, let me define “large amounts of data”. There is a grid in the application which lists media buys in a campaign — one row per “airing”. A campaign often has hundreds of thousands of airings, and millions of airings isn’t unheard of. In order to populate the grid the application may need to read tens of thousands of Universe records. Due to the multi-valued nature of a Universe record, each of these contains sets of up to 100 subvalues, roughly equivalent to a 1-to-many relationship to another table. The table from which this data is being retrieved contains billions of records – we ran our benchmarks with a table that was 11G in size and contained 11.6 billion records. Because this table only has one index, the primary key, the fastest way to retrieve records is by listing each and every key in the WHERE clause. The resulting SELECT statement is massive. The one which I used when benchmarking contained over 29 thousand keys in the WHERE clause and was 740K in size. Not the data, but the SELECT statement — 740K of text!

Given all this, I wasn’t terribly surprised that, prior to optimization, UniOLEDB performance was pretty poor. The first time I tried it, the 29,000-key SELECT statement required about 12 minutes to run.

From the .Net application’s point of view, this 12 minutes was spent waiting for 1 instruction to complete:

dataAdapter.Fill(dataSet);

Aside from the time spent transferring the data from the server to the desktop, which appears to be about 20 seconds based on the server and desktop CPU activity, all of this time was spent by Universe server process. This, then, appeared to be a Universe performance issue rather than an issue with the .Net code or with the various API layers that sit between .Net and Universe (ADO.NET, the ADO.NET Data Provider for OLEDB, and UniOLEDB).

If 12 minutes is clearly unacceptable, what response time should we reasonably expect for this type of query? Our Universe programmer tried running the same query (or so we thought) in Universe Basic code and it took 45 seconds to complete. So, we had plenty of room for improvement and a reasonable expectation that we could cut the response time by 90%.

Here’s a list of what I’ve tried, and the result:

1. Increase the Universe MAXSELBUF parameter. This is a server-side configuration parameter which defines the size of the buffer used to hold the results of SELECT statements. The default is only 4K, and Universe tech support recommended increasing it to 8K. This seemed a little like using a tablespoon instead of teaspoon to empty out the ocean, but like most Universe configuration parameters it was easy to try — just update the uvconfig text file on the server and restart the Universe service. Result: response time dropped from 12 minutes to 8. Whoa! If an 8k buffer made things that much better, how about a 16K buffer? Oddly, this had no effect on the response time. Maybe if we get out the big guns and and bump it up to a “whopping” 128K? No effect. I began to get suspicious and changed the setting back to its original 4K — response time remained at 8 minutes. I ran the full application rather than just my benchmarking code and retrieved the same set of data — response time was still 8 minutes. Result: no effect, despite initial appearances.

2. Restarting Universe. As you probably figured out, this gets credit for dropping response time from 12 minutes to 8. Of course, this isn’t a practical optimization technique in a production environment — I can’t have my code restart the database before each large query. Result: 33% improvement, but not a practical technique.

3. Change the UVTSORT parameter. This is another server-side configuration parameter — changing it from 0 (the default) to 1 turns on multithreading support for sorting data. This seemed promising, since my query included an “ORDER BY” clause and I had noticed that only 1 of our server’s 4 CPUs was being used for most of the time the query was being processed. But turning on this support had no apparent effect on either response time or CPU usage for this query. Result: no effect.

4. Remove the ORDER BY clause. The UVTSORT parameter did get me thinking about the ORDER BY clause in the SELECT statement. It was originally put in there to improve performance — when processing the results of the query, the code could reduce the number of lookups it had to do if the data was returned in a certain order. But in the course of processing a query only a dozen or so such lookups were required, even for a 29,000+ row resultset. Their cumulative impact on response time would be a couple of seconds at most, and how much processing time was that ORDER BY clause costing me? The answer, it turns out, is “a lot”. Result: 33% improvement.

5. Run the SELECT statement as a Universe paragraph. This idea came from Universe support, and was intended to see if the bottleneck was in one of the API layers between the .Net code and the database. By packaging the SELECT statement as a Universe package, the 740K of SQL text was replaced by the following: “CALL TEST1″, where TEST1 was the name of the paragraph. From the .Net application’s point of view, TEST1 is an ADO Command object that returns a resultset, kind of like a stored procedure:

OleDbCommand cmdTest = new OleDbCommand(“CALL TEST1″, conn);
cmdTest.CommandType = CommandType.Text;
dataAdapter = new OleDbDataAdapter(cmdTest);
dataAdapter .Fill(dataSet, “Test1″);

The 740K of SQL text is stored in the Universe master dictionary, or VOC. Getting it into the dictionary is actually easier said than done, since the usual method of editing a file is a Telnet session and Telnet wasn’t intended to handle 740K strings. Fortunately, to Universe the VOC is just another table, so I wrote a bit of UniObjects code to insert a new record in the VOC with a 740K column. Surprisingly, Universe has absolutely no problem with a 740K field — to Universe, every column is just a string, and 740K is well within its comfort zone for string lengths.

Unfortunately, while the operation was a success the patient died a bloody, inglorious death. Not only did the SELECT statement take over 10 minutes to complete, but it returned the data as strings formatted for a Telnet display, one field per row, 1.3 million rows, totaling 38M of string data. Result: don’t go there.

6. Use the Universe Basic SQLExecDirect API. This idea is taken from an IBM Technote. The .Net code is similar to that in the previous example, but now the CALL statement is invoking a Universe Basic subroutine which passes the 740K SQL statement to the SQLExecDirect API. The .Fill method returns the same rows and columns as you get if you submit the SQL statement to UniOLEDB, so it was a seamless change for the .Net application. Unfortunately, for us it was only marginally faster than UniOLEDB, reducing response time from 5:25 to 5:10. Result: 5% improvement.

So, the search for the holy grail continues. IBM’s new ADO.Net driver for Universe beckons on the horizon.

However, my faith in this crusade is wavering. I’m starting to wonder whether this 45-second goal is a reasonable expectation. It is based on a “query” done using native Universe Basic code that bears faint resemblance to a SQL query.

The apples-to-oranges comparison of Telnet to .Net is something that is never easy to reconcile: users love the extra freedom of the interface, but begrudge the extra sluggishness. It’s DOS vs Windows all over again. Maybe there’s a primal love for the command-line deep inside every user? Or maybe I need to more closely embrace the strange ways of Universe Basic, tightening the link between .Net and Universe.

To be continued.

May 4, 2008: 1:49 pm: DanProgramming, Software Tools

The other night I had one of those “senior developer’s moments”. After checking some files into SourceSafe from Visual Studio I noticed a red checkmark next to the project file. I hadn’t touched the project file for days, but thinking that I had inadvertently checked it out I right-clicked on it and selected “Undo checkout”, then clicked OK on the dialog warning me that I would lose my changes.I then scrolled down to the file that I had been working on all day, and was startled to notice that it wasn’t checked out. My heart sunk, then raced, as I scrolled madly through the code looking for some assurance that I hadn’t done what I was afraid that I’d one. So such luck. Thanks to a misplaced checkmark, Visual Studio’s hierarchical source file management and my own inattention, I had just thrown away a day’s worth of changes to this file.

I don’t suppose SourceSafe puts the file it overwrites in the Recycle Bin? Nope. I rather hopelessly ran UnDelete Plus — this is a fast, simple and, for the present, free file recovery tool, but it wouldn’t be able to retrieve the file if SourceSafe had just overwritten it. Naturally, it had. I frantically searched Google for “Sourcesafe undo delete”, but those who had previously made the same dumb move weren’t admitting it to the world.

That’s when it hit me: I did have a backup, of sorts. One of my favourite open source software gadgets is an expanded clipboard named Ditto. This loyal sidekick silently keeps track of everything that you’ve put in the Windows clipboard for days – by default, the last 500 items.

Almost any coder will accumulate a lot of stuff this way. Any time you cut a piece of code to move it to another class, or reposition it within the class, or even just to get rid of it, Ditto saves a copy for you.

Of course, this sort of snapshot tool is really handy for things other than coding too. In fact, once you get used to the fact that Ditto is there, you routinely make a copy of text before changing it even if you don’t need it in the clipboard, just in case. Ditto allows you to find data either chronologically or by searching for a string, so it’s usually quite easy to find what I want later.

For things like screenshots I use a commercial tool called SnagIt, which can be configured to automatically store screenshots to its Catalog. SnagIt also has a cool feature that allows you to extract the text from a screenshot. When I come across a bug, invariably while working on something more urgent, I just take some snapshots using Ditto and SnagIt then carry on with the job at hand. This ensure that when I later get around to reporting the bug, I have more to offer than a vague memory.

So, with the help of Ditto I was able to piece together most of the code that I had deleted, an exercise which set me back by only an hour rather than a full day. For me, Ditto was a tool that I didn’t think I needed until I gave it a try — now, I’d hate to have to get by without it. If the phrase “clipboard extension” makes you think of the clumsy and intrusive feature that Microsoft added to Office then you have the wrong idea. Ditto is more like Google Desktop for all the stuff that you didn’t save to a file, but now wish you had.