Archive for December, 2007

December 17, 2007: 7:52 pm: Dan | .Net, IBM Universe (U2)

Our .Net app has been running for over a week now with its new “persistent database connection” architecture, intended to reduce the number of IBM Universe licenses it uses (see “Failing to Let Go”).

The new design is having the desired effect, but it resulted in one new problem which looks like a bug in the UniOLEDB / .Net Provider for OLEDB stack. A tester found that “booking” (one of our application’s functions) a certain type of document without first saving it consistently resulted in a “Not all parameter markers have been resolved” error. This exception was thrown by the Fill method which ran a SELECT statement on this document’s record. The same action definitely worked fine with the previous design, where the .Net Data Provider would automatically open the database connection before running the Fill method, then close it again afterwards.

This bug was somewhat puzzling, since there had been no code changes specific to this SELECT statement, and the SELECT statement was pretty mundane: just one parameter, which was the key for the record. Running the statement in the debugger confirmed that the parameter was, indeed, being set.

When UniOLEDB errors occur for reasons that aren’t obvious from the coder’s viewpoint, it can be helpful to look at the problem from UniOLEDB’s viewpoint. This can be done by turning on the UniOLEDB trace facility, by setting the UniOLEDBTRACELEVEL and UniOLEDBTRACECATEGORY environment variables: I use settings of 4 and 4095, respectively. A DRITrace.txt file will then be written to the folder that the .Net application is running in, chock full of mostly cryptic information about UniOLEDB’s activities:
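As a rough illustration of the setup step, here is a minimal Python sketch that builds an environment with the two trace variables set to the values mentioned above. The variable names and values come from this post; the helper name and the idea of launching the application from a script are assumptions made only for the example.

```python
import os

def unioledb_trace_env(base_env=None, level=4, category=4095):
    """Return a copy of the environment with UniOLEDB tracing switched on.

    The helper itself is hypothetical; only the two variable names and
    the values 4 and 4095 are taken from the post.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["UniOLEDBTRACELEVEL"] = str(level)
    env["UniOLEDBTRACECATEGORY"] = str(category)
    return env

# Possible use (the executable name is made up):
#   subprocess.run(["OurApp.exe"], env=unioledb_trace_env())
```

Setting the variables for a single launch, rather than machine-wide, keeps the trace file from growing during normal use.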

00022990 CArICommandTextImpl(73761420)::SetCommandText
00022991 CDRImCommandUCI::SetCommandText(SELECT …)
00022992 CArCommand(73761420)::Execute()
00022993 CDRImAdmin::ClearError
00022994 CArCommand(73761420)::InternalSetParameters()
00022995 CArCommand(73761420)::ParamData2DRIBuffer()
00022996 CDRImCommandUCI::Execute(SELECT …)

When I looked at the .Execute that failed, I noticed that the trace entries which indicate a parameter (such as InternalSetParameters in the above example) were not present. Or, as some would say, the “parameter marker” hadn’t been “resolved”.
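Checking a large trace file by eye is tedious, so the same check could be scripted. The following is a minimal Python sketch, assuming the trace lines look like the excerpt above (a sequence number, then Class(id)::Method). It reports every CArCommand Execute that reaches the driver-level Execute without an InternalSetParameters entry in between. The function name is made up, and a SELECT that legitimately has no parameters would be reported too.

```python
import re

# Matches lines such as "00022992 CArCommand(73761420)::Execute()".
_LINE = re.compile(r"^\s*(\d+)\s+(\w+)(?:\((\w+)\))?::(\w+)")

def executes_without_parameters(trace_lines):
    """Return the sequence numbers of CArCommand Execute calls that are not
    followed by InternalSetParameters before CDRImCommandUCI::Execute."""
    suspects = []
    pending = None  # [sequence number, saw InternalSetParameters]
    for line in trace_lines:
        match = _LINE.match(line)
        if not match:
            continue
        seq, cls, _obj, method = match.groups()
        if cls == "CArCommand" and method == "Execute":
            pending = [seq, False]
        elif pending and cls == "CArCommand" and method == "InternalSetParameters":
            pending[1] = True
        elif pending and cls == "CDRImCommandUCI" and method == "Execute":
            if not pending[1]:
                suspects.append(pending[0])
            pending = None
    return suspects
```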

I went back and looked at a previous parameterized SELECT statement that succeeded, to confirm that setting the parameter did result in a trace file entry. It did, but I noticed something else that I hadn’t expected: the previous SELECT statement was exactly the same as the one that had later failed. Same command text, same parameter. A check of my own log confirmed that it was the same parameter value too.

This turned out to be the root cause of the problem. The only thing that we had (intentionally) changed between versions was the point at which the database connection was opened and closed: it used to be closed after each SELECT statement, and re-opened for the next one. Now, it was left open between SELECTs. So, some piece of code in the stack (maybe the .Net Provider for OLEDB, maybe UniOLEDB) wasn’t impressed that we were asking for the same thing twice, and ignored part of the 2nd request. Closing the database connection between these two requests solved the problem, and that’s the solution we went with for now.

In case you’re wondering, running the same SELECT statement twice wasn’t a bug on our part. Booking an unsaved document required that the .Net application 1) save the document 2) invoke a “Universe Basic API” (basically, a stored procedure) to fill in some missing fields in the document record 3) read the document record (the first SELECT) 4) invoke another Universe Basic API to “book” the document 5) read the newly booked document record (the second SELECT). Not the optimal way of handling this action, perhaps, but not such an unlikely series of events either.

I have a feeling that there are other parts of our .Net application which can run the same SELECT statement twice — if there aren’t, then there almost certainly will be some in the future. So, a more generalized solution is going to be required. I’m inclined to just do away with parameterized SELECT statements, embedding the parameter values into the SELECT statement instead. While this will, in theory, adversely affect both the performance and security of the application, conventional coding wisdom doesn’t necessarily apply when sailing the mostly uncharted waters of UniOLEDB development.
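If the parameter values do get embedded, the security cost can be limited by refusing any key that doesn’t look like a key. Here is a minimal Python sketch of that idea. The table and column names, the function name, and the set of characters allowed in a record key are all assumptions for the example, not details of our application.

```python
import re

# Assumption: record keys only ever contain these characters. Anything
# else (quotes, spaces, semicolons) is rejected rather than escaped.
_SAFE_KEY = re.compile(r"[A-Za-z0-9_.*-]+")

def select_by_key(table, columns, key_column, key):
    """Build a SELECT with the key embedded as a literal instead of a
    parameter marker. Table and column names must come from code, never
    from user input."""
    if not _SAFE_KEY.fullmatch(key):
        raise ValueError("unsafe record key: %r" % (key,))
    return "SELECT %s FROM %s WHERE %s = '%s'" % (
        ", ".join(columns), table, key_column, key)
```

Rejecting suspicious keys outright avoids depending on how the database escapes quotes inside a literal, which I haven’t verified for Universe.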

December 8, 2007: 4:31 pm: Dan | Gadgets, Software Tools

I finally found a fix to a longstanding problem I’ve had with OpenWRT. While the solution was a simple one that seems obvious in retrospect, I’ll record it here in case it helps someone Googling for a fix to the same problem.

If you have read this far, I’ll assume that you know what OpenWRT is. Having found it to be blazingly fast and rock steady on my main router, a WRT54GL, when I came across a good deal on a second WRT54GL I decided to use it to extend the wireless range in my home. I’d set it up in the living room, have it wirelessly connect to my main router in the office, and give me a strong Internet connection from even the remote reaches of my home.

It seemed like an obvious usage for a second router, so I didn’t expect to have any difficulty finding instructions on how to do this on OpenWRT’s excellent but sprawling wiki or one of the many other sites that cover the software. Unfortunately, what I found was too much information: there are umpteen different ways to connect two routers, and OpenWRT supported pretty much all of them. I found myself switching from site to site, poring through descriptions of repeaters, wireless bridges, bridged clients, split bridges on different subnets. But, but, all I wanted to do was be able to read my e-mail on the toilet!

I opted for what seemed like the most straightforward solution, a Wireless Distribution System (WDS) configuration. It’s not the fastest or most secure way of extending a wireless LAN, but the instructions for configuring it in OpenWRT were short and simple. One major downside to WDS is its relatively limited encryption support, as mentioned on the OpenWRT Wiki and somewhat less cryptically (ha!) in the Wikipedia entry on WDS. The vanilla standard for WDS doesn’t support rotating keys, and is therefore limited to WEP encryption rather than the far more robust WPA. However, the Wiki said that the OpenWRT implementation supported WPA1 with pre-shared keys (PSK), a fact that was confirmed on various blogs. Since I was already using WPA-PSK on my main router, this seemed to be the way to go.

And it did go, most of the time. But the connection between the routers would occasionally drop, leaving the client with “limited connectivity” as Windows diplomatically put it. The problem could be easily fixed by just rebooting the 2nd router, so it was more of a nuisance than a stumbling block. However, since I already felt a little guilty about taking the easy way out with WDS, I took a crack at figuring out what was wrong. Googling “WDS” and “OpenWRT” merely confirmed that this combination worked well for most people. One page mentioned that you need to be sure that all the wireless settings exactly match on the two routers (SSID, channel, wireless mode, key, etc.). OpenWRT actually has a lot of wireless LAN settings, some of which are little known and little used, but I checked all of the things that seemed to be important. I found a few minor discrepancies (like the time zone) and one major one (the wireless mode was mixed on one router and G on the other), but fixing these settings didn’t fix the problem.

After putting up with this arrangement for a few months, and after the connection had a particularly shaky Saturday, I decided to try switching back to WEP instead of WPA for encryption. WEP has long been considered crackable so I hadn’t used it in years, but it was one obvious and important setting that I hadn’t tried changing. Sure enough, it is now 2 weeks later and my WDS configuration has been steady as a rock.

So, having publicly confessed to using WEP, I have to decide whether to let roving bands of wardrivers snoop on my online washroom activities, switch back to WPA and reacquaint myself with the router’s power cord, or figure out what the heck a split bridge is and whether I want one.

While I’m writing about OpenWRT and the WRT54G, I should mention an excellent but apparently overlooked resource, “Linksys WRT54G Ultimate Hacking”, written by Paul Asadoorian and Larry Pesce and published by Syngress. While there are hundreds of web sites that cover pretty much every facet of the WRT54G hardware and OpenWRT software, it can be hard to know where to start. WRT54G Ultimate Hacking does a great job of documenting the hardware design of the router, various step-by-step procedures for installing OpenWRT (and its cousin DD-WRT) and doing the initial configuration. This is a book written by geeks for geeks, so it describes in loving detail the process of configuring OpenWRT from the command line as a DHCP server, a DNS server, even a Samba server. Then it (grudgingly, I think) mentions that there’s also a web interface for doing these things before launching into a chapter of “Fun Projects” like wardriving and running Asterisk on your WRT54G for VOIP. Later chapters get you into even more fun (and perhaps deep trouble) by giving step-by-step instructions on how you can install software on OpenWRT for password sniffing (feel free to use my wimpy WEP-protected network for practice), or open up your WRT54G and solder in an SD card reader.

My only complaint about the book is that it doesn’t cover some of the more common usages of the router, such as configuring your router’s Internet connection, setting up a firewall, or (ahem) using a second router to extend your home wireless LAN. Perhaps the authors figured that, having shown us the web interface, we geeks could figure the hum-drum stuff out for ourselves.

December 6, 2007: 7:50 pm: Dan | IBM Universe (U2)

Predictably, the first client roll-out of our new .Net / U2 (aka IBM Universe) application hit a snag and, predictably, it wasn’t one that I saw coming.

And, boy, what a snag. The first time we tried to log in to our application on-site, U2’s UniOLEDB driver threw a “Native Error 930065” Exception. The very first time, an error that I couldn’t remember ever seeing before. Talk about a worst case scenario!

Like many things in the U2 world, this particular error code isn’t documented, nor Googleable. (Go on, try Googling “UniOleDB” and “930065”. Feel Lucky? I bet you end up back on this page). However, Google Desktop came through when my real memory failed by dredging up an old e-mail from IBM support stating (inaccurately) that a 930065 error that we hit when evaluating their U2 10.2 version meant that we were out of licenses.

This time, the “out of licenses” scenario seemed more plausible, since this client only had a few spare licenses. But the very first user? While logging in? Our application is a strictly two-tiered architecture, and given that our clients have to pay for database licenses, we are reluctant to use U2’s add-on server-based connection pooling feature. So our database connections are old school — the user desktop connects, does its stuff, and immediately disconnects to free up the license.

UniOLEDB actually co-operates fairly well with the .Net Data Provider for OLE DB. You create your OleDbConnection object, set the connection string, log in to the database, then immediately close the connection. Thereafter, the connection is automatically reopened when needed, such as when you call a .Fill method to run a SELECT statement, and (so I thought) automatically closed afterwards. This arrangement looked great during our many months of internal testing. Connection management was never a concern: it just worked.

It turns out that the arrangement wasn’t working as well as I thought. We have plenty of database licenses on our test server and we never come close to reaching the limit. As a result, I never noticed a bug/feature concerning the way U2 handles the database connections. In the course of handling a particular action (e.g. logging in), we might read from a dozen tables, resulting in the database connection being opened and closed a dozen times. There is only one OleDbConnection object on the client end (a public static variable), which means there is only 1 database connection per client application. But closing that connection does not immediately cause the database license to be freed up. Depending upon the speed of the desktop PC and the load on the server, those dozen connections might result in 8-10 database licenses being briefly devoted to a single client.
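To make the arithmetic concrete, here is a small Python model of that behaviour. It is a sketch of my conjecture, not of anything documented by IBM: it assumes each open takes a license immediately and that the license is released some fixed lag after the close. All of the timings in the usage note are invented for illustration.

```python
def peak_licenses(open_times, hold, release_lag):
    """Peak number of licenses held if each connection opens at a time in
    open_times, stays open for `hold`, and its license is only released
    `release_lag` after the close. All times share one unit (e.g. ms)."""
    events = []
    for t in open_times:
        events.append((t, 1))                        # license taken on open
        events.append((t + hold + release_lag, -1))  # license freed late
    # At the same instant, process releases (-1) before acquisitions (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    held = peak = 0
    for _, delta in events:
        held += delta
        peak = max(peak, held)
    return peak
```

With a dozen connections opened 10 ms apart and each held for 5 ms, `peak_licenses(range(0, 120, 10), 5, 0)` gives 1, while an assumed release lag of 80 ms, `peak_licenses(range(0, 120, 10), 5, 80)`, gives 9.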

Actually, much of this is conjecture on my part. Since I couldn’t find any confirmation of this behaviour in IBM’s “Using UniOLEDB” manual, I e-mailed IBM support asking “when I close a database connection is it possible that the Universe license is not freed up instantaneously?” I received a rather philosophical response that “In reality, there is no such thing as instantaneous. Everything takes a period of time, no matter how tiny a period.”

The IBM tech support person went on to say that he tried closing his test application, then entering the U2 license command into a Telnet session, and he was never able to catch the license still being used. He suggested that, when the code encounters a 930065 error, it should pause to let Universe “reclaim” the license, then try to connect again.
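For reference, the pause-and-retry approach he described would look something like the Python sketch below. The exception class is a hypothetical stand-in for however the 930065 error surfaces in the real application, and the number of attempts and the pause length are guesses, since IBM didn’t suggest any values.

```python
import time

class LicenseUnavailable(Exception):
    """Hypothetical stand-in for the exception carrying native error 930065."""

def connect_with_retry(connect, attempts=3, pause_seconds=0.5, sleep=time.sleep):
    """Call connect(); on LicenseUnavailable, pause and try again.

    The final failure is re-raised so the caller can report it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except LicenseUnavailable:
            if attempt == attempts:
                raise
            sleep(pause_seconds)  # give Universe time to reclaim the license
```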

I wasn’t keen on the concept of adding speed bumps to the application, but his use of the word “reclaim” was enough of a hint for me that closing the connection doesn’t free up the license synchronously. (Like that word better, Oh Zen Master of IBM?) There was a period of time that would elapse before the license was freed up again, somewhere between 0 and the amount of time required for an IBM tech support guy to type “uvlictool report_lic” into a Telnet session.

While a dozen connections might chew up 12 licenses during that time, 1 connection couldn’t possibly chew up more than 1 license. So, rather than relying on the (admittedly undocumented) ability of UniOLEDB to automatically connect and disconnect as needed, the safest solution seemed to be to take control of the connections in our code. This meant a long evening of furiously adding calls to our connect and disconnect methods at the start and end of each user “action”, and a long day of testing that code, but it fixed the problem. No more 930065 errors!
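The shape of that change, one connect and one disconnect wrapped around each user action, can be sketched in Python as follows. This is only a model of the pattern: the real code is .Net, and the class and function names here are invented for the example.

```python
from contextlib import contextmanager

class CountingConnection:
    """Stand-in for the real connection object; it only counts calls."""
    def __init__(self):
        self.opens = 0
        self.closes = 0
    def open(self):
        self.opens += 1
    def close(self):
        self.closes += 1

@contextmanager
def user_action(connection):
    """Hold one connection open for the whole user action, then close it,
    even if the action fails part-way through."""
    connection.open()
    try:
        yield connection
    finally:
        connection.close()

def log_in(connection):
    """Hypothetical action that reads a dozen tables on one connection."""
    with user_action(connection) as conn:
        for _ in range(12):
            pass  # each read would run its SELECT on `conn` here
```

Twelve reads now cost one open and one close, so at most one license is tied up per client at any moment.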

What I am still seeing, though, is that the license occasionally isn’t freed up at all. The UniOLEDB object gets closed, disposed and set to null and the license is still being used until the next database connection is made from that client. This isn’t a garbage collection timing issue, but a failure to let go. With so many layers of code sitting between the user and the server, it’s hard to know which layer is failing to pass along the request to drop the license.