Archive for April, 2008

April 28, 2008: 5:52 pm: Dan: .Net, Programming

Back when I was starting my career as a developer at IBM, the department’s guru (bearded, of course) gave me some advice which has stuck in my mind: “first you make it work, then you make it fast”. I found the advice dubious at the time, but it was a commonly held belief at IBM and I eventually adopted it.

Clearly, this advice dates from a much quainter time. IBM’s metric back then was 100 lines of code per day (anything more and you clearly weren’t testing properly), and we were actually given time to make it fast after we made it work.

Today, the practice is more likely to be “first you make it work, then you move on to the next project”. Performance optimization is something that needs to be done the first (and only) time you write the code. This is, of course, easier said than done, and despite our best intentions version 1.0 of any sizable application is likely to leave room for improvement when it comes to performance.

A couple of weeks ago, a user of Version 1.0 of our .Net application called the support line to report that one of our screens took half an hour to load. I knew our performance wasn’t that bad, so I tried loading the same data on my PC and, while it took a little over 3 minutes, I noticed that memory usage shot up to over 1 gigabyte. This user had only 1 gig of RAM on her PC, so the mystery was solved. Personally, I was kind of proud that the application diligently toiled away until it was able to fit a gig of data into a 1 gig PC, but the account executive thought it might be nice if we could shrink that gig of data into something more manageable. After spending a couple of weeks giving my app a liposuction, I thought I’d share a few tips.

1. Before firing, aim. That is, before trying to solve the problem, be sure you know what the cause is. The culprit may surprise you — in fact, assuming that you’ve done a fairly good job of writing memory efficient code, the culprit will almost certainly surprise you. I found the approach described in this article to be a quick way of getting to the root of the problem, without the need to buy and learn a commercial profiling tool. In a nutshell, you can use the Windows Performance Monitor to confirm that it is .Net code that is gobbling up the memory, as opposed to native code such as a 3rd party tool or a legacy DLL. If the problem is in managed code, you can use Microsoft’s WinDbg tool, and in particular its dumpheap command, to see what types of data are using the most RAM.

In the case of my application, as shown by the Performance Monitor snapshot at the right, the native code (in blue) used memory sparingly, but managed code (in yellow) shot to the moon. The dumpheap command uncovered the following list of suspects (where the 3rd column is memory usage, in bytes):

MT            Count  TotalSize  Class Name
04eaec84     429440   29201920  TV8_Demo.clsRating
05f9092c     429440   30919680  TV8_Demo.BLstBaseRating
07f8dcfc     336132   44369424  Infragistics.Win.UltraWinGrid.UltraGridCell
790fd8c4    1310318   52389724  System.String
07e1fcb4     339952   89747328  Infragistics.Win.Appearance
7912d8f8     690781   96179680  System.Object[]
093d522c     102219  136564584  TV8_Demo.clsBookingDetail
010c340c     858880  147727360  System.Nullable`1[[System.Single, mscorlib]][]

2. Everything counts in small amounts. This isn’t news to any experienced programmer, but your choice of data types matters when your code base grows to tens of thousands of variables. It is very easy to get in the habit of choosing variable data types for comfort rather than speed. If you make all your numeric variables “int” or “decimal”, and all your text variables “string”, your coding will be a lot easier and your app will be a lot fatter. It is far better to get in the habit of using byte, sbyte, float and char variables and doing the extra casting and conversion code that they require — it’s a pain, but that’s why they pay us the big bucks!
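To put rough numbers on that trade-off, here is a minimal C# sketch. The `FieldCost` class and the figure of ten numeric fields per object are hypothetical, chosen only for illustration; the instance count is borrowed from the clsRating row in the dumpheap list above. The estimate ignores alignment padding and object headers, so it is a ballpark and not a heap measurement.

```csharp
using System;

public static class FieldCost
{
    // Bytes of field data saved across many instances when each of
    // fieldsPerObject fields shrinks from wideBytes to narrowBytes.
    // Ignores alignment padding and object headers (an estimate only).
    public static long BytesSaved(long instances, int fieldsPerObject,
                                  int wideBytes, int narrowBytes)
    {
        return instances * fieldsPerObject * (long)(wideBytes - narrowBytes);
    }

    public static void Report()
    {
        // sizeof on built-in value types is allowed in safe C# code.
        Console.WriteLine("decimal: {0} bytes, float: {1} bytes",
                          sizeof(decimal), sizeof(float));
        Console.WriteLine("int: {0} bytes, byte: {1} byte",
                          sizeof(int), sizeof(byte));

        // Hypothetical case: 429,440 objects, ten decimal fields each,
        // all switched to float.
        long saved = BytesSaved(429440, 10, sizeof(decimal), sizeof(float));
        Console.WriteLine("Estimated saving: {0:N0} bytes", saved);
    }
}
```

Narrowing a type is only safe when the data fits: a float carries far less precision than a decimal, so this kind of change suits display values better than money.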

Sometimes a more imaginative solution is called for. One of my classes, clsBookingDetail (the 2nd worst offender identified by dumpheap), is used as the basis for a weekly grid and therefore contained 52 float variables. Since only a small percentage of weeks contained values in each record, I was able to save a lot of memory by replacing the 52 variables with get/set properties whose get and set accessors read and write the value in a hashtable.
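The idea of backing sparse weekly values with a lookup table could be sketched roughly as follows. This is not the original clsBookingDetail code: the class name `SparseWeeklyValues`, the `Week01` property, and the choice of a generic `Dictionary<byte, float>` in place of a Hashtable (to avoid boxing each float) are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a class that used to hold 52 float fields.
public class SparseWeeklyValues
{
    // Allocated lazily, so a record with no weekly values
    // pays only for a null reference.
    private Dictionary<byte, float> _weeks;

    // Indexer covering weeks 1..52; weeks never set read as 0.
    public float this[int week]
    {
        get
        {
            CheckWeek(week);
            float value;
            if (_weeks != null && _weeks.TryGetValue((byte)week, out value))
                return value;
            return 0f;
        }
        set
        {
            CheckWeek(week);
            if (value == 0f)
            {
                // Storing the default would waste an entry, so drop it.
                if (_weeks != null) _weeks.Remove((byte)week);
                return;
            }
            if (_weeks == null) _weeks = new Dictionary<byte, float>();
            _weeks[(byte)week] = value;
        }
    }

    // A named property in the get/set style, backed by the same store.
    public float Week01
    {
        get { return this[1]; }
        set { this[1] = value; }
    }

    // Number of weeks that actually occupy memory in the table.
    public int StoredCount
    {
        get { return _weeks == null ? 0 : _weeks.Count; }
    }

    private static void CheckWeek(int week)
    {
        if (week < 1 || week > 52)
            throw new ArgumentOutOfRangeException("week");
    }
}
```

A lookup table carries its own per-entry overhead, so this only pays off when most weeks are empty; for densely populated records a plain array of 52 floats is likely to be smaller.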

Another heavily used class contained an array of float? variables. This is the “System.Nullable” object that was chewing up 147M of heap space, and my first thought was that using a nullable type must be much more expensive than the underlying value type. It was, but the savings weren’t dramatic. The real problem here was the size of the arrays: they, too, were sparsely populated, and a large proportion of these arrays contained nothing but nulls. A Hashtable would have been a better choice here, as well, but I decided to do a bit of extra work and rewrite the code so that the arrays would be dynamically sized and contain only the non-null values. These changes reduced that 147M down to 19M.
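One way to keep only the non-null values is to store them in a tightly sized array alongside their original positions. The sketch below is an assumption about how such a rewrite might look, not the code actually used; `CompactFloats` and its members are hypothetical names. It is read-only for brevity and assumes the source array has at most 65,536 slots so that positions fit in a ushort.

```csharp
using System;

// Hypothetical container: keeps only the non-null entries of a float?[].
public sealed class CompactFloats
{
    private readonly ushort[] _slots;  // original positions, ascending
    private readonly float[] _values;  // value stored at each position
    private readonly int _length;      // logical length of the source array

    public CompactFloats(float?[] source)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (source.Length > ushort.MaxValue + 1)
            throw new ArgumentException("source is too long for ushort positions");
        _length = source.Length;

        // First pass: count the populated slots so the arrays fit exactly.
        int count = 0;
        foreach (float? item in source)
            if (item.HasValue) count++;

        _slots = new ushort[count];
        _values = new float[count];

        // Second pass: copy each value together with its position.
        int next = 0;
        for (int i = 0; i < source.Length; i++)
        {
            if (!source[i].HasValue) continue;
            _slots[next] = (ushort)i;
            _values[next] = source[i].Value;
            next++;
        }
    }

    public int Length { get { return _length; } }
    public int StoredCount { get { return _values.Length; } }

    // Looks up a logical index; slots that were null still read as null.
    public float? this[int index]
    {
        get
        {
            if (index < 0 || index >= _length)
                throw new IndexOutOfRangeException();
            int pos = Array.BinarySearch(_slots, (ushort)index);
            return pos >= 0 ? _values[pos] : (float?)null;
        }
    }
}
```

Each stored entry costs a 2-byte position plus a 4-byte float, while a float? element typically occupies 8 bytes whether or not it holds a value, so the saving grows with the share of nulls.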

3. Don’t ignore 3rd party tools when optimizing your application. Here’s a nightmare that haunts any developer: a 3rd party tool that covertly causes performance problems. It’s never fun to have to respond to user problems by pointing your finger at someone else, especially if you can prove that you’re right. If the 3rd party tool consists of managed code, then dumpheap makes this scenario much less scary. When you can tell the developer of the 3rd party tool exactly what the problem is, they are likely to have heard about it from other customers and will be happy to provide you with a solution. In my case, 2 of the 6 largest users of the heap were objects that are part of the Infragistics WinAdvantage suite: the grid’s UltraGridCell object, and the Appearance object that is used to set a cell’s display attributes. A visit to the Infragistics support forum uncovered a few suggestions for more efficiently using these objects, reducing the number of Cell objects on the heap by 1/3, and the number of Appearance objects to almost zero.

4. Sweat the small stuff. That is, be very, very careful about the size of small objects and structures that are heavily used in your application. Early in the development of our application I had created a simple structure that was used to associate variables with their respective table, record and column of the Dataset. I had used a few “int” and “enum” variables, then largely forgot about this structure since it served its purpose well. This structure was so insignificant that it didn’t come immediately to mind when I saw the results of the dumpheap command: structures and value types aren’t identified separately in the heap, but are lumped into the objects which own them, such as clsBookingDetail, BLstBaseRating and clsRating in the above list. When I finally turned my attention to this piddly little structure, replacing the “int” variables with “sbyte” and “Int16” variables, and assigning a “byte” data type to its enums, I was startled to find that these minor changes reduced the size of these 3 classes on the heap by about 25%.
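The effect of narrowing a small, heavily used structure can be checked with a sketch like the one below. The field names, enum members, and both struct definitions are hypothetical, since the original structure is not shown. `Marshal.SizeOf<T>()` reports the marshaled size, which for simple structs of integers and enums like these should normally match the in-memory layout, but treat the numbers as indicative.

```csharp
using System;
using System.Runtime.InteropServices;

public enum WideKind { Text, Number, Date }           // backed by int by default
public enum NarrowKind : byte { Text, Number, Date }  // backed by a single byte

// Hypothetical "before" layout: every field is 32 bits wide.
[StructLayout(LayoutKind.Sequential)]
public struct WideColumnRef
{
    public int Table;
    public int Record;
    public int Column;
    public WideKind Kind;
}

// Hypothetical "after" layout: each field is only as wide as its range needs.
[StructLayout(LayoutKind.Sequential)]
public struct NarrowColumnRef
{
    public short Record;    // widest field first, so the small ones pack behind it
    public sbyte Table;
    public sbyte Column;
    public NarrowKind Kind;
}

public static class StructSizes
{
    public static int SizeOfWide()
    {
        return Marshal.SizeOf<WideColumnRef>();
    }

    public static int SizeOfNarrow()
    {
        return Marshal.SizeOf<NarrowColumnRef>();
    }

    public static void Report()
    {
        Console.WriteLine("wide: {0} bytes, narrow: {1} bytes",
                          SizeOfWide(), SizeOfNarrow());
    }
}
```

The narrower types also narrow the valid range: an sbyte holds -128 to 127 and an Int16 holds -32,768 to 32,767, so this only works when table, record and column numbers are known to stay within those limits.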

There is a lot more to be said on this topic, but in the case of my application these “low hanging fruit” accounted for an improvement of about 50% in memory usage. Dumpheap and the other parts of the WinDbg toolset are particularly worth learning, and they deserve a much higher profile than they get in .Net development books. For a top notch tutorial on how to use WinDbg to diagnose and fix application performance problems, see the .Net Debugging Tutorials in the “If broken it is, fix it you should” blog.

April 7, 2008: 5:36 pm: Dan: Programming, Software Tools

It has been many a year since I’ve darkened the door of a public library. It’s not just that I object to my tax dollars being used to organize the siphoning of profits from publishers and authors by, to steal a line from Stephen Colbert’s I Am America (And So Can You!), card-carrying library card carriers. (Note that I said “steal”, not “borrow”.) Libraries are objectionable on so many other levels: long waiting lists for anything popular, computer books that date from a previous millennium, librarians who wear rubber gloves (whether they are afraid of germs or paper cuts, I object!), and the fact that all their books come in just one form: paper. It’s all so analog.

I was therefore surprised and highly skeptical when I read an article on TeleRead (great blog, terrible name) about a library service named Overdrive. Overdrive is launching a publicity campaign at libraries across the US to show people how they could borrow (their word, not mine) ebooks and audiobooks over the Internet. My skepticism turned to envy when a Google search showed that Overdrive offers more than the usual “Project Gutenberg” public domain books by long dead authors and government agencies. My envy turned to glee when I found out that Overdrive is also available in Canada, including my home base of Toronto.

Now, I’m a little unusual for a developer because I a) read things other than software manuals and Slashdot, and b) don’t have a problem with DRM copy protection. Other developers might be put off by the fact that Overdrive doesn’t offer programming or IT books, and its ebooks and audiobooks are available only in secure formats that require special readers or players. (I was pleasantly surprised that most of their ebooks and all of their audiobooks are in formats that are compatible with Windows Mobile Pocket PCs. Linux zealots, on the other hand, are out of luck.)

The real prize for developers, though, is what I noticed when I visited the Toronto Public Library’s Overdrive page. Jump up one level from Overdrive and you’ll find a “Download Books, Music, and Video” page. On there are links to 2 other sources for ebooks: NetLibrary and (gasp!) Safari Books Online.

I’ve written rapturously of Safari before. Toronto Public Library offers their academic version, which compared to the commercial version offers a much smaller selection (about 340 books, vs. 5300) but a much bigger bookshelf (unlimited, vs. the 10 titles per month for an entry-level paid subscription). You also have to do without a few small amenities: downloadable PDF chapters, online notetaking, and bookmarks. (Actually, you can create bookmarks, but you share them with everyone else using the service — weird!). Otherwise, the layout and features are pretty much the same as the paid version: titles are easy to find, either by subject matter or using its search facility, and books are displayed in a browser using standard HTML unencumbered by any DRM restrictions.

NetLibrary is a mixed bag. The version offered on the Toronto Public Library’s site includes 7000 ebooks, many of them technical non-fiction titles. It’s hard to say how many of these cover software development since the site sorely lacks a list of books by subject matter, but a search for “programming” as a subject yielded about 80 books, “linux” 16 books, “XML” 10. Unfortunately, many of these are as obsolete as the tomes weighing down the shelves in your local branch. There are a few pearls in there, though, and your access to them is unlimited: as with the academic version of Safari, you can read any book in a browser anytime you like, without any time or usage restrictions.
And the best thing about it? It’s all ours! Mention these services to any non-geek, and you’ll get a quizzical look followed by mutterings about having to print out and carry around hundreds of 8-1/2 x 11 sheets. They don’t get it! Geeks ride for free, everybody else has to stand in line and wait their turn.

Sorry, Stephen, but I’m now a shameless card-carrying library card carrier. I love your book, but the irony of reading “for the record, we’re not offering this book to libraries” in an ebook downloaded from Overdrive is just too sweet.