Monday, August 31, 2009

Rich support for multiple (versions of) OSes: autoconf?

After many years of working on multi-platform applications (and now also multi-platform platforms), I find it incredibly difficult to be rich on all of them without either drastically increasing the cost of production (with regard to testing resources) or reducing overall quality (screwed-up boundary cases, where the “boundary” is much, much closer to customer uses than one would like). One of the longer-term issues that especially affects the Mac platform for Silverlight is our mechanism for building the CoreCLR.

What we’re building today is highly related to Rotor v1 and Rotor v2. The Rotor project was the shared source release of the Common Language Infrastructure, aka SSCLI. Rotor v1 was fairly multi-platform because Microsoft really wanted to show that their new CLI was a realizable technology on alternative platforms. It was released under the shared source license so that academics could peer under the covers and see that, despite a facile JIT-compiler and garbage collection mechanism, the system would work, and also how it worked. Figuring they’d proved their point1, Microsoft hamstrung the Rotor v2 update further: it no longer needed to work on all platforms. The nice framework they had built and maintained to allow building Rotor v1 on multiple platforms had suffered bitrot and would build on nothing but Windows.

Fast forward several years and past an internal project that never saw the light of day, and you get to the inception of the Silverlight project. They had a mechanism that would (mostly) build something for FreeBSD, and with some minor tweaks, would build for Mac OS X2. Furthermore, another (defunct) team had gone through the effort of porting the commercial JIT-compiler for x86 and the commercial garbage collector to the GNU toolchain, so they had most of what they needed to start working3. Several teammates and I worked on this for a couple of years before it was picked up by the Silverlight team for the 2.0 release. Over the course of that time, some effort went into not completely destroying the ability to build on other OSes. That said, the only shipping product using the Rotor project at all was the Mac version of the CoreCLR4. I am positive not only that you could not build our CoreCLR on other platforms using our source, but that I, myself, included code and/or improper #ifdefs that would make it not work. Not by design, but simply by not having a product for that platform.

Mono/Moonlight is both a blessing and a curse in this regard. As much as I might have wanted a business proposition that would have put a Microsoft-written CoreCLR on more platforms than just MacOS, the environment was/is not ripe for such an idea. The great deal we have with the Mono project means we’re likely to get the platform on many, many more OSes than Microsoft proper would have been willing to fund. On the other hand, the “curse” side is that there really is no other platform than MacOS for the autoconf-based/multi-platform-aware/multiplatform-capable build system to build. No reason at all to have all this extra gook.

In fact, this gook gums up the works somewhat. We broke a whole bunch of the original assumptions when we finally released the Mac OS CoreCLR. We abandoned the autoconf premise that you’re building on the machine that the built code is meant to run on. Instead, we wanted to run on systems 10.4 and up, independently of whether we were building on 10.4, 10.5, or even pre-releases of 10.65. Furthermore, we wanted to be warned of potential future issues on 10.5 and later. Add to this the idea, before it was deprecated, that we might build x86 on ppc and vice versa, that there are build tools that we need to create to actually build the product, and then there’s the product itself. The former need to run on the current operating system (even if through some kind of emulation — e.g., if on x86_64, ppc (if Rosetta is installed) and x86 are valid build tool architectures) and the latter need to have the cross-OS-version behavior we want (i.e., not using any deprecated APIs for one of the later OSes, and selectively using new-OS APIs via dlsym or CFBundleFunctionPointerForName or weak-linking). If we had gotten it right (bwahaha), we’d’ve cached config.guess files for other architectures and made sure the built products would Actually Run™ on the platforms for which they were built6. As it stands now, we have this overly complicated system that, yes, allows us to actually use 10.5 compilers to build the stack canary support into the applications we use on 10.4 (and not when we’re building using the 10.4 compilers, which is purely an internal testing mechanism), but it also means we pass all sorts of extra grik around:
  • Mac OS SDK
  • Min ver (currently 10.4)
  • TBD: Max ver (as max as we can get it)
  • -arch flag for gcc (since autoconf cannot guess this with any utility uniformly across 10.4/10.5/10.6)
Plus, we use this external mechanism to enforce these same things on our partners’ Xcode projects (not everyone made the decision to use the same build system for both Windows & Mac, much less the NTBuild system that we inherited) — we invoke xcodebuild with the specific SDK and various other #defines we want.
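Concretely, threading those knobs through by hand ends up looking something like the following sketch. The SDK path, versions, architecture, and invocations are illustrative stand-ins, not our actual build scripts:

```shell
# Illustrative values only — not our real scripts.
SDK=/Developer/SDKs/MacOSX10.5.sdk
MINVER=10.4
ARCH=i386

# autoconf side: tell configure exactly what it cannot guess for itself
./configure CFLAGS="-arch $ARCH -isysroot $SDK -mmacosx-version-min=$MINVER"

# partners' Xcode projects: impose the same settings from the outside
xcodebuild -sdk "$SDK" ARCHS="$ARCH" MACOSX_DEPLOYMENT_TARGET="$MINVER"
```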

However, even when we get this right, and we use the old APIs for older OSes and the new APIs selectively, i.e., only on the OSes where they’re supported, there’s no default mechanism for demonstrating that we’re doing it right. Nothing to call out that we have these references to specific APIs that are deprecated, but only in OSes where we expect them to be usable. No handy mechanism to segregate out these uses so that they can be retired when we change our own internal requirements to support a newer OS. It’s all internal manual review. I suppose things move slowly in the OS world, but I’d prefer that we be able to qualify all of these call sites with the right metadata — use this on downlevel OSes and this on modern OSes, and effectively remove it when it’s no longer necessary, with some kind of “deprecated code not included” warning to let us know, so we can remove it at our leisure.

Notwithstanding these cross-OS-version issues, there’s still the issue that the autoconf-centric mechanism gets stale by default. We regularly create or view new Xcode projects in newer versions of the Xcode toolsets just to see what flags get sent down to gcc/ld so we can emulate the new behavior in the configure.in scripts. There’s a fairly rational argument to be made that we’d require less intervention if we just used Xcode projects for everything. The counter-argument is that if we did, we probably would not have to be as introspective as we are about how projects are built to get them to build, or, more importantly, to Work™ — we’d blithely build and sort out the bodies later.

There’s still no real solution to having multiple config.guess files for the multiple OS versions your app supports, with some summarizer that converts them into static code changes for the things that are determinable (like all versions of the OS supporting the dlfcn.h functions), and into runtime checks plus disparate behavior for the things that came into existence with a particular OS version. At this point, is autoconf too unwieldy to keep? Should we move to a simpler mechanism and live with the stuck-to-this-OS-ness it implies? Hard to tell. Any change will require some work. The only question is: is it more work than we would otherwise need to do over the long haul?
--
1Actually, I don’t know what they were thinking; this is supposition on my part. It predates my joining the CLR team by a couple of years.
2In fact, one of the reasons I got the job is that I complained to them that their Rotor v2 release had broken the Mac OS X build, and that I had some patches that would fix some of the damage. This put me in contact with one of the cross-platform devs (from whom I ultimately inherited most of the Mac responsibility) and the dev lead of the project, who was willing to give me contribute access, even though I was from another Dev organization. When they realized they needed a Mac guy, they talked to me. As it turns out, they might have benefited more from a Darwin/FreeBSD guy, since my knowledge was more old-skool Toolbox than born-again *nix (I had done *nix work in college when I wasn’t working on 68k code). At least at the time. No longer. Plus, now they know more about CoreFoundation than they ever wanted to know.
3Sadly, they didn't have a commercial-quality JIT-compiler for the PowerPC, so they just recirculated the good ole FJIT. (The “F” stands for “fast”, and by “fast”, I mean JIT-compiler throughput, not JIT-compiled-code execution speed.) The results were pretty shoddy — it worked but the word “performant” wouldn’t be seen in the same room as it.
4The build system for the Windows Desktop CLR maintained the rotor project for quite some time beyond its last release, in the event it would be shipped again. However, in the spirit of “testing just enough” (i.e., testing sparse points of the matrix), we stopped building the Windows rotor project some time ago, presuming we were doing better testing by having a real Mac rotor project that we had to actually ship.
5Don’t even get me started on the fact that you cannot write code on Mac OS X that is OS SDK-independent without Major Hackery™. Regular changes to function prototypes break our warnings-as-errors compilations.
6In the days before PowerPC was removed from our list of shipping targets, we had the capacity to build the opposite architecture on the same machine to make sure you hadn’t hosed the other side with your changes. Well, it was all fine and good, except that the result would not run at all. It did a fine job of catching compiler warnings, but at the top of the world, the endian #defines were all wonky because the values for the current build machine had bled into the target architecture. In the event that PowerPC were ever made little endian (not just little-endian mode, like the G4 and earlier), perhaps this might have worked.

Tuesday, August 04, 2009

Failure to clean

While considering what to do for MQ1, I ran across a presentation (video) that Peter Provost gave for NDC 2009. He has an awesome analogy between coding and cooking at 31m into the presentation:
I was very fortunate to work for a good chef, and one of the things that he taught me was to be constantly cleaning up as I was cooking… [If you are forced to have the big clean up at the end,] as you’re constantly piling on the scraps of food all over the counter, and the dirty knives and plates, it totally gets in your way. At the end you stop and say, “We can’t cook any more food for an hour; we got to clean up the kitchen.”
If you can imagine that a version six product (where a product cycle is about two years long) is like a kitchen that people have been constantly working in for twelve years, where at no time has anyone actually cleaned it up all the way, and where finding clean space to work in and clean tools to do the work with is getting progressively harder and harder, then you have a very good idea of what writing software2 is like.
--
1MQ is Microsoft parlance (perhaps others’ too) for a “quality” milestone. Milestones are one way to divide up a large set of tasks that you want to work on during the course of a single product cycle; the ones which add features are generally M1, M2, etc. MQ (or M0) generally starts a product cycle (or sits in the interstice between cycles), and generally focuses on infrastructure and code-quality improvements—things that have only a secondary impact on whether customers will want our software, i.e., they make us efficient in making those other changes.
2Well, writing software in a non-agile or limited-agility way.

Tuesday, July 21, 2009

Entourage is taking a work break

I honestly don’t remember when I first started using Entourage. I suspect it was back before Project Athena was named Entourage, when it was simply the Mac version of Outlook Express. I’d hooked up my work Exchange server account via IMAP, and later via Entourage’s built-in Exchange support. As of yesterday, for the first time in many moons, I can’t use Entourage to access my work e-mail.

The reason: We’re moving on up. Many Microsofties’ accounts are getting migrated to Exchange 2010 so we all can dogfood, dogfood, dogfood. And WebDAV in Exchange 2010 has gone the way of the dodo1.

Never fear, intrepid Mac Office users: in lieu of switching to the aging, legacy, Windows-centric MAPI, the Mac Office folks are designing forward to the newfangled Exchange Web Services (EWS), the faster replacement for WebDAV. EWS works on Exchange 2007, and will have expanded capabilities in 2010.

Nonetheless, we dogfooders are stuck choosing between hot dogfood-on-dogfood action (i.e., the latest builds of Entourage EWS against the pre-release Exchange 2010) or waiting until Mac Office makes their release. Since I’m no longer on the team (i.e., not building and debugging it day in and day out), and it’s both about my home and my work data, I’m inclined to sit this one out, and wait on pins and needles until they come out with the new coolness.
--
1A better list of changing APIs for Exchange 2010 can be found here.

Saturday, July 11, 2009

Home networking and its woes

A couple weeks ago, I updated all of the Ethernet hubs in the house to support gigabit ethernet. It’s not that I have many devices that would take advantage of this, but I suspect more and more as time goes on will. Nonetheless, this most recent upgrade still didn’t quite work out perfectly: the Netgear ProSafe GS108 in the office connecting to the Netgear ProSafe GS105 up inside the Leviton Structured Media™ box won’t train to gigabit speeds, always falling back to 100 megabit. Grr. Not sure exactly how to diagnose the problem. (The same problem doesn’t occur between the 105 and an identical 108 unit in a different room.)

At the same time I updated the hubs, I decided it’d be nice to actually have more than one working telephone jack available. (I mean, it’s downright tragic to have this awesome patch system in the Leviton box and yet only have one real phone line—connected to the modem itself.) I could put little DSL filters in front of them, but that posed a problem for the only other (prospective) phone in the house — a wall-plate-mounted kitchen phone. We didn’t have a DSL filter that would fit that and have it stay on the wall. So, I did some research and found that of course Leviton makes a DSL filter board (47616-DSF) to stick into the box. When I got it, I realized that I didn’t have (nor really know how to use) a punchdown tool. Fortunately, one of my friends who is a hardware geek did, and gave me the explanation of how to use it. I rewired the phone to go through the board first, but got zero love -- no signal seemed to come out of the “to modem” port (or at least it never reached the modem), even though the phone lines still seemed to work correctly. Double grr. Now it’s re-patched to the original configuration, and there’s still no phone in the kitchen.

After all this, we started noticing that our internet service seemed seriously degraded. We have Qwest “Platinum Package” using Drizzle as our ISP. Today, I looked at the modem’s web interface, and it said that our 7 Mbps connection was connected at 3360 Kbps. That seemed rather unreasonable, so I did some research and found some ugliness — the current description of “Platinum Package” advertises “up to 7 Mbps” (and that makes sense to some extent, seeing as they could have the fastest modem connection ever and the ISP may still be the slow link), but they only guarantee at least 3 Mbps! I don’t remember that being part of the deal when I signed up; perhaps they changed the policy? (It’s not like they publish the historical policy changes so that you can see when the terms of the service changed out from under you.) I thought it was possible that my forays into rewiring the punchdowns had caused the problem, but after connecting the modem directly to the telephone test jack where the phone line comes into the house, it trained even worse. Putting it back where it was retrained it to 4400 Kbps, but then I updated the firmware on it (which required a reset and retrain), and now it’s back to 3700 Kbps. Let’s just say that our instant-watch Netflix movies on the Xbox 360 went from 4 bars, sometimes dropping to 3, to starting at 2 bars and bailing out entirely within a couple minutes. Really rather frustrating. Qwest’s web page talks about a “Quantum Package” (aka Fastest) that goes “up to 12 Mbps” (with a guarantee of what?), but their availability query suggests it’s not available for my phone/address.

I am tempted to jump the Qwest DSL ship (or at least stop letting them run the show — using Speakeasy or something) and venture into cable internet. I just really want something akin to a guaranteed 1 MB/s (8192 Kbps) down that doesn’t preclude the ability to connect from the Internet into a machine on the home network.

With all of these sequential failures, I’m even less inclined to continue to plan an updated wireless network, complete with a guest VLAN. I had hoped that perhaps the ReadyNAS NV+ would support some RADIUS service so I could just make the denizens use 802.1x, and route guests only to the internet. Maybe someday.

Friday, July 10, 2009

Silverlight 3

Our partners over in the Silverlight Runtime (SLR, formerly Jolt) have done a bang-up job working on Silverlight 3, which is released to the public today. There are scads of new features. Go check it out!

We did some very targeted features in the CoreCLR for Silverlight 3, but it is largely the same engine as before. One of the important parts for Mac users is that it includes some changes for compatibility with Snow Leopard. (Silverlight 2’s CoreCLR mostly works, but there are some edge cases that might surface issues, depending on the Silverlight application.)

Over here, we have our heads down for the most part, putting the finishing touches on the Visual Studio 2010 (aka Dev10) release. We’ve already released a Beta 1 of the new .NET Framework v4 (including our CLR bits). Finally, the desktop CLR will see some of the stuff that we’ve been showcasing in the CoreCLR! Furthermore, Visual Studio will have several improvements to support the design and debugging of Silverlight content.

It’s always nice to see one’s work finally make it to the public.

Thursday, July 09, 2009

Wednesday, May 20, 2009

Back in action

Just received a replacement hard drive, the 500GB Seagate Momentus 7200.4, and after spending 40m yesterday taking out the old 320GB 5400RPM drive that would periodically stop responding, I’ve been spending time getting my MacBook Pro back into action.

Transferring my Mac OS X partition with Disk Utility and my WinXP bootcamp partition with WinClone went very smoothly, and now I’m finally at the point where I'm going to put a code enlistment back on it. DevDiv for this release is using Team Foundation Server for our source control, and for the Mac side, we’re using the Teamprise client to access the server. It’s churning along in the background considering what files it’ll need to pull down to replace my deleted enlistment.

On the cool side, due to our data-at-rest policies for laptops, I’ve recreated an encrypted sparsebundle disk image for the purpose of storing my source. Historically, I would have just used the -encryption flag to hdiutil and relied on the security of my local machine’s keychain to keep the secret. However, since Gemalto has released Mac OS X tokend plugins for its .NET v2+ cards, I can use the certificate off of my Microsoft badge as the security, now requiring thieves to either crack the stock encryption or to also steal my badge and my PIN. (Good luck with that.)
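For the curious, the historical approach is roughly a one-liner; something like the following, where the size, volume name, and file name are made up for illustration:

```shell
# Illustrative: create an AES-encrypted sparse bundle disk image.
# hdiutil prompts for (or stores in the keychain) the passphrase.
hdiutil create -type SPARSEBUNDLE -size 32g -fs "Journaled HFS+" \
    -encryption AES-256 -volname Enlistment ~/enlistment.sparsebundle
```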

Building services are going to be replacing the carpet and putting a fresh coat of paint on the walls this evening, so I’m putting all my books and peripherals into boxes. I only hope that my sync is done by the time my office is packed so I can go work in the new commons.