Thursday, August 19, 2010

Unexpected Mach port refcounts

I recently found two cases where it was not obvious to me that I had taken ownership of a reference to a Mach port, and each resulted in a Mach port “leak”:
  1. mach_thread_self() adds a MACH_PORT_RIGHT_SEND reference on the thread port. It must be followed by a matching mach_port_deallocate call when the reference is no longer needed.
  2. Mach messages received from the kernel after registering with thread_swap_exception_ports may contain Mach ports, depending on the flavor of message. Any such ports must be deallocated by the receiver. If you pass the message along, you can get away with transferring ownership along with it, presuming that the next Mach port in the chain wants a sufficiently similar flavor of exception message to the one you asked for.
In our case, the extra un-deallocated references to the thread port would eventually turn into dead names after the thread died, but would never go away. As a result, if you went through enough threads over enough time, you’d starve your application of Mach ports and it would go unresponsive, crash, or otherwise terminate. In addition, we would incur (via the exception messages) an ever-increasing refcount on the task port, with somewhat unclear implications.1
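For case 1, here is a minimal sketch of what a balanced mach_thread_self() call might look like. The helper names send_refs and balanced_thread_self are made up for illustration, and the #ifdef guard is only there so the file still compiles off Darwin. It uses mach_port_get_refs to print the send-right count around the call. I would expect the middle number to be one higher than the other two, but treat the printed values as something to observe rather than a guarantee.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __APPLE__
#include <mach/mach.h>
#include <pthread.h>

/* Hypothetical helper: user-reference count this task holds on a send right. */
static mach_port_urefs_t send_refs(mach_port_name_t name) {
    mach_port_urefs_t refs = 0;
    (void)mach_port_get_refs(mach_task_self(), name, MACH_PORT_RIGHT_SEND, &refs);
    return refs;
}
#endif

/* Hypothetical demo: returns 0 if the reference taken by mach_thread_self()
 * was handed back successfully, -1 otherwise. */
int balanced_thread_self(void) {
#ifdef __APPLE__
    /* pthread_mach_thread_np() returns the port name without adding a reference. */
    mach_port_t borrowed = pthread_mach_thread_np(pthread_self());
    mach_port_urefs_t before = send_refs(borrowed);

    /* mach_thread_self() adds a send-right reference that we now own. */
    mach_port_t owned = mach_thread_self();
    assert(owned != MACH_PORT_NULL);
    mach_port_urefs_t during = send_refs(owned);

    /* The matching release; without it the name outlives the thread. */
    kern_return_t kr = mach_port_deallocate(mach_task_self(), owned);
    mach_port_urefs_t after = send_refs(borrowed);

    printf("send refs: before=%u during=%u after=%u\n", before, during, after);
    return kr == KERN_SUCCESS ? 0 : -1;
#else
    /* Assumption: off Darwin there are no Mach ports, so nothing to balance. */
    puts("Mach ports are not available on this platform");
    return 0;
#endif
}
```

Calling balanced_thread_self() from any thread should return 0. If you only need to name the current thread’s port and not hold a reference, pthread_mach_thread_np(pthread_self()) avoids the bookkeeping entirely.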

I was able to use the earlier dtrace example for some early diagnostics, but it was not sufficient, because mach_port_allocate was not the only generator of Mach ports in the process (e.g., the kernel also creates them when sending us exception messages). Instead, I ended up using Apple’s own sample, MachPortDump, in combination with some strategically placed breakpoints in gdb to try to narrow down when the ports were coming into existence, and what the ports actually were.2

It really is too bad that Apple deems these APIs to be unworthy of documenting. Admittedly, only a few developers really need to get down into this nitty-gritty, but if the only choices are reading the 17-year-old Programming Under Mach (whose API references are out-of-date and whose code samples don’t deallocate these ports either), reading the Darwin source (not an option for me), pestering ADC, or stumbling around in the dark, I can wager what will happen, and what the quality of the resulting products will be on the OS. Perhaps the consumers will lay the blame at the application developer’s feet, or at Apple’s, or both. In any case, it makes it hard to do things the right way, which I’m pretty sure we all want. (Or at least want slightly more than doing it the wrong way — not doing it at all isn’t an option.)
----
1 What does happen when you wrap the refcount number on a Mach port? Hm…
2 MachPortDump just lists what ports are in the process and what kind and rights each one has, not what each port is related to. You have to be a little creative to figure out what the port being incremented belongs to, e.g., (mach_port_t)pthread_mach_thread_np((pthread_t)pthread_self()) to figure out what the current thread’s port is. AFAICT, there’s no Mac OS X analog to netmsgserver to annotate ports.

Wednesday, August 11, 2010

Recording the thread ID in Mac OS X Instruments

Hidden in the depths of the man page for dtrace(1), there is a reference to a new built-in variable, tid:
tid A uint64_t thread ID of the currently executing thread…
What this means is that it’s trivial to log the thread ID associated with a call, which is incredibly useful for multi-threaded applications. (Can you say, “GCD?” I knew you could.) In a custom probe, you can record it as follows:

Monday, August 09, 2010

Thread-based thread-level Mach exception handler and debugging

Apple implemented an awesome feature in the version of gdb that ships with Xcode 3.2 (Snow Leopard is the minimum OS requirement) for those (I hope few) of us1 who (1) have implemented a thread-based Mach exception handler, (2) actually use it with a mask for EXC_BREAKPOINT to conditionally self-handle breakpoints, and (3) still want to be able to debug the result: thread dont-suspend-while-stepping.

The problem is that when gdb is stepping (either by user-specified command or in an attempt to get to a safe-to-read-objc state), it freezes all the other threads in your task. Normally, this is exactly what you want: isolation from the other aspects of your program while you inspect and complete something under the watchful eye of the debugger.2 However, if one of those frozen threads is the one set up to handle thread-level exceptions by looping on receiving messages from a Mach port, its Mach port gets sent the message first (gdb’s handler is a task-level exception handler, which only gets the second shot at such exceptions), but the thread is never woken up to handle it, so gdb deadlocks.

Enter thread dont-suspend-while-stepping. You tell gdb not to bother suspending your exception thread, so it is awake to receive the exception message, deal with it, and respond back to the kernel. Then, unless it intercepts a temporary breakpoint or a single-step exception that wasn’t set by it, the Mach exception message will be sent on to gdb’s handler, and gdb will resume control nicely.

If you want to do this automagically, you can set a breakpoint on the thread entry function of your exception-handling thread, have it prevent its own suspension, and then continue on. The general command to run would be:
thread dont-suspend-while-stepping on -port ((mach_port_t)pthread_mach_thread_np((pthread_t)pthread_self()))
pthread_self() will return the current pthread, and then pthread_mach_thread_np will return the Mach port given a pthread.
----
1 At the moment, I suspect that the people who are actually doing this are
  • us, i.e., Mac CoreCLR
  • Java
  • Flash
It’d be interesting to know if there were yet others, especially if there’s any likelihood you’ll find yourself in a browser process-space. There are some unfortunate interactions that occur when these apps either stomp on each other’s exception handling registration, or try to forward messages along the “chain” of handlers. To wit, if you’re calling thread_set_exception_ports, you’re definitely doing it wrong, and if you’re calling thread_swap_exception_ports, it’s only very likely that you’re doing it wrong. Less so if you’re doing it on entry/exit of your special code. Much more so otherwise.
2 It’s not a panacea, in that if you have complicated timing issues between two threads, you’ll need to be a little more inventive.

Thursday, August 05, 2010

Dipping my toes into Cocoa again

The last time I did any significant Cocoa programming was prototyping an Installer-plug-in-style modular pane system for implementing one’s own Setup Assistant, in our case, a Mac Office Setup Assistant. It was a toss-up whether I liked the HyperCard-like architecture of the system or the product ID field editor more. But I left the team to head to the CLR before it was complete, and it’s unclear how much of that work was used, cherry-picked, or thrown out for Mac Office 2008.

Since then, the clients of my code have almost all been C++, and my memories of .nibs and actions and outlets have faded. Last week, though, one of the partners using Cocoa reported an issue. I sat down to try to write a simple Cocoa version of our C++ client (with some actual UI) and knew enough to know I didn’t remember enough to sit down and do it. Perhaps if I had spent more than a few months developing Cocoa UI a few years back, it would have stuck better. I popped open my Cocoa® Programming for Mac® OS X (2nd ed.), and bogged down nigh immediately: quite a bit had transpired in the Interface Builder world since 2004. I don’t remember any @property or @synthesize. So, I shelved the project, ordered the 3rd edition (published in 2008), and worked on other projects.

Now that it’s arrived, I’m looking forward to sitting down with it, Xcode, and Interface Builder and making a simple CoreCLR loading application. It will have use beyond this simple reproduction case, I wager.

Wednesday, August 04, 2010

No such thing as a Microsoft permalink, part 2

While I’m at it, let me remind us all of some interesting HTTP result codes1:
  • 301: Moved Permanently
  • 307: Temporary Redirect, or alternatively 302: Found (elsewhere; continue to use this link in the future though, as it may get moved)
  • 410: Gone
Microsoft has enough storage capability and page-identity capability (GUIDs anyone?) that we could always respond with one of the above HTTP results for any link that we have ever put up on our public websites. The first would be in the event of a reorganization of content. The second would be for whatever forward links we’d created, even if “Temporary” would be a bit of a misnomer (i.e., it’s rather “perhaps temporary” or “not guaranteed to be permanent”). The last would be for content that has ended its lifecycle, once it and its forward links have finally become deprecated.

In fact, we could go one further and provide warnings in some fashion for “near permanent” and forward links that are about to become deprecated. We could take advantage of the §14.46 Warning header, e.g., Warning code 299: Miscellaneous persistent warning, where the warning text warns (in the character set of the sender) that this particular URL will “die” on such-and-such a date, and gives another constructed URL to use to get more information. (And that URL could give Microsoft’s content deprecation policy, if any, along with perhaps a form where you can give it a URL and it can return its deprecation date.) Alternatively, we could do something simpler, and have any such to-be-deprecated links redirect (via 307) to a URL whose namespace specifically indicates its imminent deletion, e.g., http://www.microsoft.com/to-be-deleted-20010804/someidentifier.aspx. Either of these two mechanisms could be used by external parties who wanted to ensure the persistence of Microsoft content upon which they depend.
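To make the idea a little more concrete, here is a minimal sketch in C of how a server might pick among the three status codes and build the 299 Warning header. The enum, the function names, and the wording of the warning text are all hypothetical. Only the status codes and the general shape of the header (a warn-code, an agent, and a quoted warn-text) come from RFC 2616.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical lifecycle states for a published link. */
enum link_state { LINK_REORGANIZED, LINK_FORWARD, LINK_RETIRED };

/* Maps a link's state to the HTTP status code proposed in the post. */
int status_for(enum link_state state) {
    switch (state) {
    case LINK_REORGANIZED: return 301; /* Moved Permanently */
    case LINK_FORWARD:     return 307; /* Temporary Redirect */
    case LINK_RETIRED:     return 410; /* Gone */
    }
    return 500; /* unknown state: report a server error */
}

/* Writes a "Warning: 299 ..." header line (without the trailing CRLF) into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 * The warning text format is an assumption made for this sketch. */
int format_deprecation_warning(char *buf, size_t size, const char *agent,
                               const char *retire_date, const char *info_url) {
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size,
                     "Warning: 299 %s \"This URL will be retired on %s; see %s\"",
                     agent, retire_date, info_url);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With the made-up inputs "www.example.com", "2011-08-04", and "http://www.example.com/policy", the second function produces the header Warning: 299 www.example.com "This URL will be retired on 2011-08-04; see http://www.example.com/policy". A publisher’s link checker could then look for warn-code 299 to flag links that are on their way out.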

Heck, we could go even one further, and spin off a 3rd party Windows Live/Azure service that did the persistence automatically for a fee, so that Microsoft is no longer responsible for ancient, deprecated content, except in that it supports our own cloud services: Microsoft Web Vault.

In any case, 3rd party publishers could use these mechanisms if2 they wanted to ensure their publications don’t become obsolete too early.
----
1 See also RFC 2616 §10.
2 Not always a good assumption. Publishers like people to buy successive editions of their software documentation at least as much as software publishers like people to buy successive editions of their software.

Tuesday, August 03, 2010

No such thing as a Microsoft permalink

As far as I can tell, anyway. While reading 3rd party documentation1 about our own technology, I often run into links to msdn.microsoft.com that are 404s. I know that internally, we use redirection links when we expect content to move over the course of the lifespan of a product that might need to use it (e.g., for help support), but it seems silly to force publishers to do the same. Should it also be their responsibility, in the case where MSDN does a content reorg and deprecates old content, to keep a local cache on their own website to support their own books? It would be seriously nice if there were (1) a way to request a permalink2, and (2) a way to tell instantly whether a URL was a permalink or an ephemeral link, to know whether or not it should go into a publication / blog post / etc.
----
1 e.g., Professional JavaScript, which I ended up borrowing from one of the authors. Thanks, Stuart!
2 Or at least one that had a specifically determinable lifetime that the publisher could then caveat in the introduction, or take steps to do the caching I mentioned before.

Monday, August 02, 2010

Monday is the day to take on the hard task

Having recently read The Top Idea in Your Mind (thanks, McMichael!) and read about getting things done, I am positing that the best day to work on the hard task is Monday. (And yes, I realize that I’m procrastinating right now—baby steps.)

I’ve gotten to the point in my work/life-balance where I can actually put my work down at the end of the day on Friday (eventually) and largely not consider it until Monday. This is an improvement over having my wife wondering why I’ve sequestered myself in the office to connect to work over VPN and twiddle some bits. On the other hand, since I don’t do a prep-time for work Sunday night (and I may yet adopt this behavior), on Monday morning, I’m generally thinking only of personal Deep Thoughts™ rather than the work break-through I might be needing.

This brings me to what had been, until now, my general Monday workday makeup: 75% reading through new e-mail, re-reading last week’s e-mail, and piece-wise following up on the low-priority but small-cost items that I could accomplish, and 25% contemplation of the actual task for which I’m either planning or, worse, planning to plan. My estimation is that I don’t actually “page in” enough context by the end of the day for good analysis, and so it’s only on Tuesday that I actually start useful work on the hard problems.

Thus, I have the intuition that pushing the small bits until Tuesday (or later) is the right way. Even if I don’t make real progress on Monday, spending enough time to fully consider the problem allows for a Monday night dream-solve and Tuesday morning shower-revelation, and probably sets the stage for even better progress on Tuesday.

The little bits are always something that can be dealt with in the interstices that inevitably occur during planning and development. (There’s a reason to have multiple machines and/or multiple source control enlistments simultaneously—parallel processing for the win!) And since they don’t need the same context switch, or have the same priority, best to postpone them until you’re already on a roll.

Now, to buckle down and implement this plan…