Thursday, August 19, 2010

Unexpected Mach port refcounts

I recently found two cases where it was not obvious to me that I had taken ownership of a reference to a Mach port, which resulted in a Mach port “leak”:
  1. mach_thread_self() adds a MACH_PORT_RIGHT_SEND reference on the thread port. It must be followed by a matching mach_port_deallocate call when the reference is no longer needed.
  2. Mach messages received from the kernel after registering with thread_swap_exception_ports may contain Mach ports, depending on the flavor of message. Any such ports must be deallocated by the receiver. If you pass the message along, you can get away with just transferring ownership along as well, presuming that the next handler in the chain wants a sufficiently similar flavor of exception message as you do.
In our case, the extra un-deallocated references to the thread port would eventually turn into dead names after the thread died, but would never go away. As a result, if you went through enough threads over enough time, you’d starve your application of Mach ports and it would go unresponsive, crash, or otherwise terminate. In addition, we would incur (via the exception messages) an ever-increasing refcount on the task port, with somewhat unclear implications.1

Although I was able to use the dtrace example from earlier to do some initial diagnostics, it was not sufficient, since mach_port_allocate was not the only generator of Mach ports in the process (e.g., the kernel when sending us exception messages). Instead, I ended up using Apple’s own sample, MachPortDump, in combination with some strategically placed breakpoints in gdb, to narrow down when the ports were coming into existence and what the ports actually were.2

It really is too bad that Apple deems these APIs to be unworthy of documenting. Admittedly, only a few developers really need to get down into this nitty-gritty, but if the only choices are reading the 17-year-old Programming Under Mach whose API references are out-of-date and whose code samples don’t deallocate these ports either, reading the darwin source (not an option for me), pestering ADC, or stumbling around in the dark, I can wager what will happen, and what the quality of the resulting products will be on the OS. Perhaps the consumers will lay the blame at the application developer’s feet, or at Apple’s, or both. In any case, it makes it hard to do things the right way, which I’m pretty sure we all want. (Or at least want slightly more than doing it the wrong way — not doing it at all isn’t an option.)
1 What does happen when you wrap the refcount number on a Mach port? Hm…
2 MachPortDump just lists what ports are in the process and what kind/rights it has, not what the port is related to. You have to be a little creative to figure out what the port being incremented belongs to, e.g., (mach_port_t)pthread_mach_thread_np((pthread_t)pthread_self()) to figure out what the current thread’s port is. AFAICT, there’s no Mac OS X analog to netmsgserver to annotate ports.

Wednesday, August 11, 2010

Recording the thread ID in Mac OS X Instruments

Hidden in the depths of the man page for dtrace(1), there is a reference to a new built-in variable, tid:
tid A uint64_t thread ID of the currently executing thread…
What this means is that it’s trivial to log the thread ID associated with a call, which is incredibly useful for multi-threaded applications. (Can you say, “GCD?” I knew you could.) In a custom probe, you can record it as follows:
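A minimal sketch of such a probe (the target function name here is illustrative, not from any particular app):

```
/* Log the thread ID on every entry to a hypothetical function. */
pid$target::my_interesting_function:entry
{
    printf("tid = %d\n", tid);
}
```

In Instruments, the same expression can go into a custom instrument’s probe script, so each recorded sample carries the thread that triggered it.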

Monday, August 09, 2010

Thread-based thread-level Mach exception handler and debugging

Apple implemented an awesome feature in their version of gdb in Xcode 3.2 (Snow Leopard is the minimum OS requirement) for those (I hope few) of us1 who (1) have implemented a thread-based Mach exception handler, (2) actually use it with a mask for EXC_BREAKPOINT to conditionally self-handle breakpoints, and (3) still want to be able to debug the result: thread dont-suspend-while-stepping.

The problem is that when gdb is stepping (either by user-specified command or in an attempt to get to a safe-to-read-objc state), it freezes all the other threads in your task. Normally, this is exactly what you want—isolation from the other aspects of your program while you inspect and complete something under the watchful eye of the debugger.2 However, if one of those threads is set up to handle a thread-level exception by looping receiving messages off of a Mach port, its Mach port gets sent the message (before gdb’s task-level exception handler, which gets a second shot at such exceptions), but the thread is never woken up to handle it, so gdb deadlocks.

Enter thread dont-suspend-while-stepping. You tell gdb not to bother suspending your exception thread, and now it is awake to handle the exception message, deal with it, and respond back to the kernel. Then, unless it intercepts a temporary breakpoint or a single-step exception that wasn’t set by it, the Mach exception message will be sent to gdb’s handler, and gdb will resume control nicely. If you want to do this automagically, you can set a breakpoint on the entry function of your exception-handling thread, have it prevent its own suspension, and then continue on. The general code to run would be:
thread dont-suspend-while-stepping on -port ((mach_port_t)pthread_mach_thread_np((pthread_t)pthread_self()))
pthread_self() will return the current pthread, and then pthread_mach_thread_np will return the Mach port given a pthread.
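The automagic version might be wired up like this in gdb (the thread entry symbol name is an invented example):

```
# Stop once when the exception-handling thread starts up.
break MyExceptionThreadEntry
commands
  silent
  # Exempt this thread from suspension while gdb steps other threads.
  thread dont-suspend-while-stepping on -port ((mach_port_t)pthread_mach_thread_np((pthread_t)pthread_self()))
  continue
end
```

Because the breakpoint fires on the exception thread itself, pthread_self() evaluates in that thread’s context and yields the right port.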
1 At the moment, I suspect that the people who are actually doing this are
  • us, i.e., Mac CoreCLR
  • Java
  • Flash
It’d be interesting to know if there were yet others, especially if there’s any likelihood you’ll find yourself in a browser’s process-space. There are some unfortunate interactions that occur when each of these apps either stomps on the others’ exception handling registration, or tries to forward messages along the “chain” of handlers. To wit: if you’re calling thread_set_exception_ports, you’re definitely doing it wrong, and if you’re calling thread_swap_exception_ports, you’re still very likely doing it wrong. Less so if you’re doing it on entry/exit of your special code; much more so otherwise.
2 It’s not a panacea, in that if you have complicated timing issues between two threads, you’ll need to be a little more inventive.

Thursday, August 05, 2010

Dipping my toes into Cocoa again

The last time I did any significant Cocoa programming was prototyping an Installer-plug-in-style modular pane system for implementing one’s own Setup Assistant, in our case, a Mac Office Setup Assistant. I think it was a toss-up whether I liked the HyperCard-like architecture of the system or the product ID field editor more. But I left the team to head to the CLR before it was complete, and it’s unclear how much of that work was used, cherry-picked, or thrown out for Mac Office 2008.

Since then, the clients of my code have almost all been C++, and my memories of .nibs and actions and outlets have faded. Last week, though, one of the partners using Cocoa reported an issue. I sat down to try to write a simple Cocoa version of our C++ client (with some actual UI) and knew enough to know I didn’t remember enough to just sit down and do it. Perhaps if I had spent more than a few months developing Cocoa UI a few years back, it would have stuck better. I popped open my Cocoa® Programming for Mac® OS X (2nd ed.) and bogged down nigh immediately: quite a bit had transpired in the Interface Builder world since 2004. I don’t remember any @property or @synthesize. So, I shelved the project, ordered the 3rd edition (published in 2008), and worked on other projects.

Now that it’s arrived, I’m looking forward to sitting down with it, Xcode, and Interface Builder and making a simple CoreCLR loading application. It will have use beyond this simple reproduction case, I wager.

Wednesday, August 04, 2010

No such thing as a Microsoft permalink, part 2

While I’m at it, let me remind us all of some interesting HTTP result codes1:
  • 301: Moved Permanently
  • 307: Temporary Redirect, or alternatively 302: Found (elsewhere; continue to use this link in the future though, as it may get moved)
  • 410: Gone
Microsoft has enough storage capacity and page-identity capability (GUIDs, anyone?) that we could always respond with one of the above HTTP results for any link that we have ever put up on our public websites. The first would be for a reorganization of content. The second would be for whatever forward links we’d created, even if “Temporary” would be a bit of a misnomer (i.e., it’s rather “perhaps temporary” or “not guaranteed to be permanent”). The last would be for content that has ended its lifecycle, where it and its forward links have finally become deprecated.

In fact, we could go one further and provide warnings in some fashion on “near permanent” and forward links that are about to become deprecated. We could take advantage of the §14.46 Warning header, e.g., Warning code 299: Miscellaneous persistent warning, where the warning text warns (in the character set of the sender) that this particular URL will “die” on such-and-such a date, and gives another constructed URL to use to get more information. (And that URL could give Microsoft’s content deprecation policy, if any, along with perhaps a form where you can give it a URL and it returns its deprecation date.) Alternatively, we could do something simpler, and have any such to-be-deprecated links redirect (via 307) to a URL whose namespace specifically indicates its imminent deletion. Either of these two mechanisms could be used by external parties who want to ensure the persistence of Microsoft content upon which they depend.
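As a sketch of what the Warning-header variant might look like on the wire (the host, date, and URL are invented examples):

```
HTTP/1.1 200 OK
Warning: 299 msdn.example.com "This URL will be retired on 2012-06-30; see http://msdn.example.com/content-policy"
Content-Type: text/html
```

A crawler that honors the 299 warning could then re-fetch and re-home its links before the retirement date.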

Heck, we could go even one further, and spin off a 3rd party Windows Live/Azure service that did the persistence automatically for a fee, so that Microsoft is no longer responsible for ancient, deprecated content, except in that it supports our own cloud services: Microsoft Web Vault.

In any case, 3rd party publishers could use these mechanisms if2 they wanted to ensure their publications don’t become obsolete too early.
1 See also RFC 2616 §10.
2 Not always a good assumption. Publishers like people to buy successive editions of their software documentation at least as much as software publishers like people to buy successive editions of their software.

Tuesday, August 03, 2010

No such thing as a Microsoft permalink

As far as I can tell, anyway. While reading 3rd party documentation1 about our own technology, I often run into links that are 404s. I know that internally, we use redirection links when we expect content to move over the course of the lifespan of a product that might need to use it (e.g., for help support), but it seems silly to force publishers to do the same. Should it also be their responsibility, in the case where MSDN does a content reorg and deprecates old content, to keep a local cache on their own website to support their own books? It would be seriously nice if there were (1) a way to request a permalink2, and (2) a way to instantly tell whether a URL was a permalink or an ephemeral link, so as to know whether or not it should go into a publication / blog post / etc.
1 e.g., Professional JavaScript, which I ended up borrowing from one of the authors. Thanks, Stuart!
2 Or at least one that had a specifically determinable lifetime that the publisher could then caveat in the introduction, or take steps to do the caching I mentioned before.

Monday, August 02, 2010

Monday is the day to take on the hard task

Having read recently about The Top Idea in Your Mind (thanks McMichael!) and reading about getting things done, I am positing that the best day to work on the hard task is Monday. (And yes, I realize that I’m procrastinating right now—baby steps.)

I’ve gotten to the point in my work/life-balance where I can actually put my work down at the end of the day on Friday (eventually) and largely not consider it until Monday. This is an improvement over having my wife wondering why I’ve sequestered myself in the office to connect to work over VPN and twiddle some bits. On the other hand, since I don’t do a prep-time for work Sunday night (and I may yet adopt this behavior), on Monday morning, I’m generally thinking only of personal Deep Thoughts™ rather than the work break-through I might be needing.

This brings me to what, until now, had been my general Monday workday makeup: 75% reading through new e-mail, re-reading last week's e-mail, and piecewise following up on the low-priority but small-cost items that I could accomplish; 25% contemplation of the actual task for which I’m either planning, or worse, planning to plan. My estimation is that I don’t actually “page in” enough context by the end of the day for good analysis, and so it’s only on Tuesday that I actually start useful work on the hard problems.

Thus, I have the intuition that pushing the small bits until Tuesday (or later) is the right way. Even if I don’t make real progress on Monday, spending enough time to fully consider the problem allows for a Monday night dream-solve and Tuesday morning shower-revelation, and probably sets the stage for even better progress on Tuesday.

The little bits are always something that can be dealt with in the interstices that inevitably occur during planning and development. (There’s a reason to have multiple machines and/or multiple source control enlistments simultaneously—parallel processing for the win!) And since they don’t need the same context switch, or have the same priority, best to postpone them until you’re already on a roll.

Now, to buckle down and implement this plan…

Friday, January 29, 2010

How little Johnny application doesn’t play well with others

Spotted this post on the RSS feed about New World vs. Old World computing, making the argument that Old World desktops are all-purpose machines, vs. New World computers (currently, mainly the iPhone/iPod) are more narrow-purpose machines, and even when they have more than one purpose, the purposes tend to be siloed and sandboxed.

Notwithstanding the arguments about the computational and energy savings of only running one app (and limited system services) at a time, there’s a good reason for the siloed/sandboxed nature of these applications: by default, applications do not work well together on an Old World desktop. Despite the best attempts by platform designers to engineer a product where you generally don’t step on other apps’ toes, and best practices published to bridge the gap between the platform’s strictures and programs’ capabilities, apps step on each other all the time. They incorrectly rely on shared state1, or worse, destructively edit shared state in ways that happen to work fine (for them). They are not uniform in the way they share data between applications (copy/pasteboards and the formats provided, the file system and the formats of the files, sockets and the protocols that run over them). The data you want in one app might be data (perhaps slightly tweaked) that you want in another app, but unless you’re an Excel or Perl2 wizard, getting from A to B is a whole lot of work. Even systems designed by the same company to work together sometimes fail in these regards, especially as their feature sets and complexity grow3. Lastly, there is no nice model for assigning execution control to applications; customers rarely know when some hidden plug-in is monopolizing their system and causing scrolling or typing to slow to a crawl. How many times have you just wanted to read one e-mail, but some untold number of system agents conspired to time-share your processor enough to steal time from what you really need to do?

And that’s when you actually trust the application to run on your box. For most software manufacturers, you can generally trust that they didn’t mean to do any harm (and yet they still do). Then there’s the class of app-makers who write apps that are just-useful-enough that you install them and all the malware / exploits / junk that came along for the ride. Since the average Joe doesn’t know the difference necessarily, he’s either afraid to click on the utility that might just solve his problem, or he’s opening up his system to possible attack vectors by clicking on the utility, or he’s incurring the not-insubstantial overhead4 of running anti-virus/anti-spyware programs.

The siloed approach capitalizes on the generally limited way these programs communicate their data by making that the choke point: data flows can be to internet site(s) or to the user through UI, and that’s it. No tromping on other apps’ data. No mismatched communication with another app on the same box through the traditional pathways (and even web services don’t work, since you won’t have both apps up at the same time). There still may be issues with spyware, and it’s unclear whether an app validation service such as Apple’s could reliably tell whether a user’s privacy would (possibly) be violated. But for the most part, running an app means pretty much that you can only take down your own app, and spend as much battery as you want doing so. Siloing pushes the problem of communicating data between local services out to the cloud (e.g., Facebook and Twitter integration), where there’s now a whole new playfield in which cloud apps won’t play well with others. But it does mean that your local box will do as well as its purpose-driven cloud support will do, and without interference. And that means a lot.
1 Can you say, “registry”? (Or machine state, but that’s not a problem solved by New World machines either.)
2 Insert any reasonably advanced scripting language that parses (regexes) strings and byte data and permutes the same.
3 Not that Microsoft has any examples of these. ;)
4 Both in money and human/computer time.

Running ClickOnce apps at startup

We use an internal tool to keep a watchful eye on the various bug databases we might have bugs in, to get notifications when new ones come in and to provide a fairly reasonable summary UI. This was internally released as a ClickOnce (Wikipedia) application using .NET. I thought it best to put it in the Startup group to run on login, but found that, after working a number of times, it stopped launching at some point. Turns out, this is because you can’t put the application reference in the Startup items, since it might move as later versions get installed. Instead, follow this article by Keith Elder to create a regular old internet shortcut and get the right behavior.
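For reference, the shortcut is just a plain .url file whose target is the ClickOnce deployment manifest (the server path below is an invented example):

```
[InternetShortcut]
URL=http://toolserver.example.com/BugWatcher/BugWatcher.application
```

Because the URL points at the deployment manifest rather than the installed application reference, it keeps working as new versions are installed.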

Friday, January 08, 2010

Ellen's Black Beans and Rice recipe

Title: Black Beans and Rice
Serves: 8-10

1 lb black beans
1 lb smoked link sausage
0 - ham hocks (optional)
2 - onions (diced)
1 can RoTel tomatoes & green chilis
1 tsp Tony Chachere's Creole seasoning
1 can chopped tomatoes (large can)

Prepare black beans according to instructions on bag for quick cooking, drain, then cook in pressure cooker. Fry/steam the smoked link sausage in a frying pan, drain, and cut into small pieces. To the beans add the sausage, ham hocks, onions, tomato & green chilis, creole seasoning, and chopped tomatoes. Cook long and slow, simmering until sauce thickens. If you use a crock pot, leave beans uncovered so sauce can cook down. Serve over rice with salsa and sour cream.