Monday, June 26, 2006

My love affair with Type Libraries

It's just the registration that's annoying.

OLE Automation is one of those pieces of Mac Office that probably isn't understood by most people, even people here in the company. Strictly speaking, it's not really part of Office per se: the Windows version of Office doesn't include it, as it is a part of the OS [1]. But what the heck does it do?

Mainly, what it allows a programmer to do is define an application's object model and make that model available to be coded against or scripted. The central mechanism for this is what is called the Type Library. A type library contains a list of classes and their properties and methods, along with some annotations [2]. It is roughly analogous to an AppleScript dictionary, though there are some marked differences [3]. Most often, a type library is stored in a file, but it can be implemented in code as well, which allows for some extra dynamism [4]. A scripting client of OLE Automation can read type information out of the type library to determine what can be called. If you go into Visual Basic for Applications and choose View/Object Browser, the contents of that browser are just a visual representation of the type library (excluding items that are marked as hidden or restricted). That is how VBA knows that if you're working with an object that is the Excel Application object, only certain properties and methods are allowed, and it can autocomplete your code as you're typing. When you run your code, VBA will call into OLE Automation, and OLE Automation will point it at the code in Excel that needs to run.
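To make the Object Browser and autocomplete behavior a little more concrete, here is a minimal sketch in Python. It is only a toy model of a type library, not the real on-disk format or the real OLE Automation API; the `MemberInfo` and `ClassInfo` types and the drastically shortened member list are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemberInfo:
    """One property or method entry, with two of its annotations."""
    name: str
    kind: str                 # "property" or "method"
    hidden: bool = False      # annotation: not shown in a browser
    restricted: bool = False  # annotation: not callable from script


@dataclass
class ClassInfo:
    """A class description as a toy type library might list it."""
    name: str
    members: list = field(default_factory=list)

    def browsable_members(self):
        # What an object browser would show: everything that is
        # neither hidden nor restricted.
        return [m.name for m in self.members
                if not (m.hidden or m.restricted)]

    def complete(self, prefix):
        # Autocomplete: browsable members that start with the typed prefix.
        p = prefix.lower()
        return [n for n in self.browsable_members()
                if n.lower().startswith(p)]


# Hypothetical, heavily abbreviated description of an application class.
application = ClassInfo("Application", [
    MemberInfo("ActiveWorkbook", "property"),
    MemberInfo("ActiveSheet", "property"),
    MemberInfo("Quit", "method"),
    MemberInfo("_Default", "property", hidden=True),
    MemberInfo("QueryInterface", "method", restricted=True),
])
```

Calling `application.complete("act")` on this toy description yields the two `Active...` properties, which is roughly what an editor would offer after you type those three letters.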

The type library provides several ways to be a client of the code which the type library describes. At the base level, you can either call into a method directly through an interface pointer's vtable, or you can call into a method by referring to it by name. This latter functionality is defined by the IDispatch interface, specifically the Invoke method. A class can be set up to allow one or the other or both, and in the last case, it's called a dual interface. If you want to use an interface from C/C++ and it supports vtable-calling, then the mktyplib utility, which generates the type libraries, can also emit .h files for including into client projects. Otherwise, you can call IDispatch::Invoke and have it figure out how to call that function. If you don't know what methods you will need when you're compiling your code or script, it's still possible to perform late binding, and ask the type library what's available at runtime. This is what VBA does, and why it can handle scripting arbitrary OLE Automation objects, so long as they can be acquired from one of the Office applications' object models, or the object model of VBA itself.
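The difference between the two calling styles can be sketched in a few lines of Python. This is an analogy rather than real COM: the `Worksheet` class, its methods, and the numeric ids are all hypothetical, and the case-insensitive name lookup is an assumption of the sketch. Direct method calls stand in for vtable calls, while `get_id_of_name` plus `invoke` stand in for the IDispatch route of resolving a name to an id and then invoking by id.

```python
class Worksheet:
    """Hypothetical scriptable object that supports both calling styles,
    loosely analogous to a dual interface. Not a real Office class."""

    # Name -> (dispatch id, method name). The ids are made up.
    _DISPATCH_TABLE = {
        "setcell": (1, "set_cell"),
        "getcell": (2, "get_cell"),
    }

    def __init__(self):
        self._cells = {}

    # Early-bound ("vtable") style: the caller knows these methods
    # when its code is written and calls them directly.
    def set_cell(self, address, value):
        self._cells[address] = value

    def get_cell(self, address):
        return self._cells.get(address)

    # Late-bound style: resolve a name to an id at runtime...
    def get_id_of_name(self, name):
        try:
            return self._DISPATCH_TABLE[name.lower()][0]
        except KeyError:
            raise LookupError(f"unknown member: {name}") from None

    # ...then invoke by id, letting the object route the call.
    def invoke(self, dispid, *args):
        for member_id, method_name in self._DISPATCH_TABLE.values():
            if member_id == dispid:
                return getattr(self, method_name)(*args)
        raise LookupError(f"unknown dispatch id: {dispid}")


sheet = Worksheet()
sheet.set_cell("A1", 42)                     # early bound
dispid = sheet.get_id_of_name("GetCell")     # late bound: look up by name
value = sheet.invoke(dispid, "A1")           # late bound: call by id
```

A scripting host that only learns member names at runtime would take the second route for every call, which is why it can drive objects it has never seen before.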

For years, the eldest Office applications, Word, Excel, and PowerPoint, all had two different object models. One was the OLE Automation / VBA model, which was shared with our Windows counterparts. One was the AppleScript model. They may have had overlap, but they were created largely in parallel, and they did not have any kind of feature parity. As a result, whereas sometimes the OLE Automation model would get new methods, the AppleScript model might not get the same methods, and the disparity would increase. Eventually, we had the idea that we could try and unify these things.

I made our first attempt at it, and it didn't succeed. The OLE Automation to AppleScript mapping proved too irregular for a simple translation layer. There were enough oddities in the way the applications' type libraries were created (and enough scripts subsequently written on the assumption that those oddities would remain) that the AppleScript experience would have been awful.

For Office 2004, Jim Murphy took a slightly different approach: start with the mapping, and then write some custom code to handle the rough edges. As of this writing, there are still some rough edges left, but ultimately, both Visual Basic and AppleScript now code to a (largely) unified object model, and more importantly to a (largely) unified implementation. Less code and more functionality. What could be better?

Under the covers, OLE Automation looks at the type library and performs calculations to figure out how to form a call to the actual function. Implementing IDispatch::Invoke involves knowing how, for this architecture, ABI, and calling convention, to put arguments on the stack or in registers, call the method, and then retrieve the results of the operation. It's one of the few places in the Office products where hand-written assembly is actually necessary [5].

But back to my initial caveat. While type libraries allow you to advertise arbitrary object models, one of the main issues has always been, "How do you create an instance of one of these objects?" On both Mac and Windows, the answer has always lived in the registry. When you register a type library, you place registry entries that list what version the type library is, where to find it on disk, and where to find the application or framework (er, DLL) code that implements the objects and methods. Then later, when asked to create a new object of a particular kind (e.g., when you Insert/Object "Microsoft Excel Worksheet" from within Microsoft Word), OLE can look it up in the registry and launch it (if it's an app) or load the shared library, and call the appropriate object creation code.

Now you get into two sets of annoying problems. (1) The files move. On Windows, if you move files from where they were installed, you'd better make sure to re-register the type libraries, or things will (possibly silently) break. On the Mac, it's a little more resilient (we have aliases after all), but you can still make it break. (2) You have more than one version or more than one copy of the same version. If you register a type library, you're registering just it and just the application or shared library you pick. There can be only one. If you have two or more, you'll get rerouted to the one that was registered last. This is another form of DLL-hell, and whereas DLLs can be distributed with their applications (removing the sharing aspect of shared libraries) to avoid DLL-hell, type libraries can't be fixed the same way.
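Both failure modes fall out of the registry being a simple one-entry-per-class lookup table. The following Python sketch shows the idea under that assumption; the `Registry` class, the class name, and the file paths are hypothetical and do not reflect the actual registry layout on either platform.

```python
class Registry:
    """Toy registry: exactly one entry per class name."""

    def __init__(self):
        self._entries = {}

    def register(self, class_name, version, server_path):
        # Registering again silently replaces the previous entry,
        # so whichever copy registered last wins.
        self._entries[class_name] = {"version": version,
                                     "server": server_path}

    def create_instance(self, class_name, files_on_disk):
        entry = self._entries.get(class_name)
        if entry is None:
            raise LookupError(f"{class_name} is not registered")
        if entry["server"] not in files_on_disk:
            # The entry still points at the old location.
            raise FileNotFoundError(entry["server"])
        return f"{class_name} served by {entry['server']}"


registry = Registry()
registry.register("Excel.Sheet", "10.0", "/Apps/Office X/Excel")
registry.register("Excel.Sheet", "11.0", "/Apps/Office 2004/Excel")

# Problem (2): two copies installed, but only the last registration is used.
on_disk = {"/Apps/Office X/Excel", "/Apps/Office 2004/Excel"}
winner = registry.create_instance("Excel.Sheet", on_disk)

# Problem (1): the registered file moved and nobody re-registered it.
moved = {"/Apps/Office X/Excel", "/Apps/Moved/Excel"}
try:
    registry.create_instance("Excel.Sheet", moved)
    broke = False
except FileNotFoundError:
    broke = True
```

Note that in the second case the older copy is still sitting on disk and would work fine, but the table has no way to fall back to it.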

Ultimately, the problem is no different than the one that LaunchServices (or previously, the Desktop DB) solves. It keeps track of applications it's seen, and when asked for "", it can launch it, and it has disambiguating rules it uses to pick the most appropriate "" if there's more than one. As an aside, there's currently no way to leverage LaunchServices by having it cache additional data about files, so it could also track what info is currently in the registry.

My acquaintance with OLE Automation began when MacBU acquired the privilege of maintaining the Mac version of the OLE code base. I've seen its transition to CodeWarrior, through Carbonization, and now we're working on Xcode/Universalization. My best, and frankly irreplaceable, reference for all of this, beyond having access to the code itself, has been Inside OLE, Second Edition, by Kraig Brockschmidt, ISBN 1-55615-843-2, sadly out of print. It goes into much, much more technical detail about all of this and the rest of OLE, and explains it all very clearly.
[1] Windows users will know it by a different name, oleaut32.dll.
[2] These annotations include such things as alignment and indications of where to find help on this type or method. There's also a mechanism to add custom annotations.
[3] Most OLE Automation types have analogues in the AppleScript world, but one of the hard ones is how to deal with methods that return an interface. There's no idea of returning a pointer in the AppleScript world; you have to return an object reference instead. Unfortunately, in some cases, a given interface won't necessarily have a way to turn itself into an object reference. If you arbitrarily created a "temporary" object store just so an object reference would get returned, the script writer would have to clean up after themselves, making sure to delete that object reference when done. While this is certainly possible, it goes against the general AppleScript paradigm.
[4] A type library could be generated on-the-fly at runtime. Usually, this wouldn't be necessary. But it would allow the type library system to be used to interact with a completely different scripting system whose contents wouldn't be known until runtime (or at least might change between the compile time and runtime). For example, it would be theoretically possible to write code that returned an ITypeLib interface which represented the contents of an AppleScript dictionary, and have that code be able to translate between OLE Automation types and AppleScript types so that you could use VBA to script a "normal" Macintosh scriptable application.
[5] Another such arcane place involves the code that allows an interface to be hosted in a process other than where the implementation code exists. In these cases, we have to intercept the calls, send them via interprocess communication to OLE Automation running in the other process, and it will perform the call on our behalf. (The reverse process occurs when the result is returned.)


Paul Berkowitz said...

Hi, Nathan

Thanks for the informative post. I knew some of this, but far from all of it.

Could you please go into some detail - maybe an example - of OLE Automation (and VBA) "returning an interface" which cannot be converted to an AppleScript object reference? What's an "interface"?

In Word, the various AppleScript commands that alter text ranges (e.g. collapse range, expand, etc.) all return a text range which is a "new text range" according to the dictionary. You have to set (or reset) a variable to this result to make any further use of the range, unlike VBA, where the equivalent Methods modify the Range in place, and you can go on using any variable you have previously set to the Range. Is the AppleScript version perhaps a workaround for this issue? I.e. you don't really have a "dynamic" range to work with, unlike VBA.

(At least the AS commands do return results, unlike their VBA equivalents. I seem to recall some early versions where there was no way at all to get ahold of the range you had just modified.)

Nathan Herring said...

An interface, at least in the OLE/COM world, is a pointer to a data structure, at the top of which is a pointer to a table of virtual function pointers (virtual in the C++ sense). The table is called the virtual table or vtable or vtbl. The idea is that the data structure (and vtable) is laid out so that it could effectively act as a C++ object (even if written in C, and in Excel's and Word's case, it is written in C). The problem with being handed a generic interface pointer is that you don't necessarily know what object is supporting it. In the COM world, since all interfaces inherit from IUnknown, you can always call QueryInterface on your interface, asking for a different type of interface for the same context (or object). Unfortunately, it's a lot like playing 20 questions:

OK, so I have this very generic Object pointer. Now what? Are you an Excel Application? No. Are you an Excel Workbook? No. Are you an Excel Worksheet? No. (etc. etc.)

Theoretically, QueryInterface only ever returns you an interface that the same object supports, because it "is" also that other thing. (This effectively provides multiple inheritance without all of the undefined properties of how various C++ compilers would handle multiple inheritance.) However, in some cases, an object really owns one or two helper classes as objects, and returns an interface to them, and unless the developer was smart and allowed for overriding of QI in the helpers, you might be able to QI one way and not backwards.
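Both the twenty-questions game and the one-way QI problem can be sketched in Python. This is only a model of the idea, not real COM: the classes and the interface names (`IWorkbook`, `IPrintable`, and so on) are hypothetical, and returning `None` stands in for a failed QueryInterface.

```python
class ComObject:
    """Toy stand-in for an object that answers QueryInterface."""
    interfaces = ()

    def query_interface(self, name):
        # Every object answers for IUnknown and for what it implements.
        if name == "IUnknown" or name in self.interfaces:
            return self
        return None  # stands in for a failed QueryInterface


class PrintHelper(ComObject):
    """Helper that does not forward QI back to its owner."""
    interfaces = ("IPrintable",)


class Workbook(ComObject):
    interfaces = ("IWorkbook",)

    def __init__(self):
        self.helper = PrintHelper()  # owned helper object

    def query_interface(self, name):
        # The workbook hands out its helper for one interface.
        if name == "IPrintable":
            return self.helper
        return super().query_interface(name)


def identify(obj, candidates):
    """Twenty questions: ask for each interface until one answers."""
    for name in candidates:
        if obj.query_interface(name) is not None:
            return name
    return None


workbook = Workbook()
kind = identify(workbook, ["IApplication", "IWorkbook", "IWorksheet"])

printable = workbook.query_interface("IPrintable")  # forward works
back = printable.query_interface("IWorkbook")       # backward fails
```

Here `kind` is found only after a wrong guess, and `back` comes out as `None` because the helper never learned who owns it. Fixing that in this sketch would mean giving `PrintHelper` a reference to its owner and delegating unknown requests to it.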

Another difference between a generic C++ object and an AppleScript object is who owns it. AppleScript defines a specific containment hierarchy: The application has a bunch of elements and properties, the classes of which also have elements and properties and so on and so forth. So too, does an application own all of the C++ objects in it, and yet any Application object it has to represent the AppleScript application class may not contain all of them.

For example, the OLE Automation idea of AppleScript "elements" is a collection bound to a class. However, OLE Automation collections needn't be bound to a class; they can be free-floating. You might create an arbitrary collection and add specific elements to it, and then perform an operation over the entire collection, and toss the collection, since all it ever really did was keep track of other objects that still exist. During that time, that collection had no real "address" in the AppleScript world. The closest thing that could be done is to create a sort of "temporaries" element of the application class, and "make new collection at end of temporaries". That way, there'd actually be a way to refer to these ephemeral objects. Again, though, temporaries would be a little unwieldy; they go away whenever the app quits, as they have no storage. If the AppleScript user made use of such objects, they'd have to be careful about cleanup over the lifetime of an application run, without which they would just use up more and more memory (like a memory leak, only you'd still be able to refer to the data, e.g. "get all temporaries"). Also, a lot of methods return these temporaries. Turning them all into "make new" statements is hard, but having them be new verbs that have a byproduct of making new objects is not very AppleScript-y.
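A rough Python sketch of what such a "temporaries" store might look like follows. It is purely hypothetical (nothing like it is described as shipping), and the `Application` class and its method names are invented for illustration. The point is that every result needs an explicit delete, or the store only grows for as long as the application runs.

```python
class Application:
    """Toy application object with a hypothetical 'temporaries' element."""

    def __init__(self):
        self._temporaries = {}
        self._next_id = 1

    def make_temporary(self, obj):
        # Give an otherwise unaddressable object an address
        # by parking it in the store.
        temp_id = self._next_id
        self._next_id += 1
        self._temporaries[temp_id] = obj
        return ("temporary", temp_id)  # stands in for an object reference

    def resolve(self, reference):
        _, temp_id = reference
        return self._temporaries[temp_id]

    def delete(self, reference):
        # The script writer has to remember to do this.
        _, temp_id = reference
        del self._temporaries[temp_id]

    def count_temporaries(self):
        return len(self._temporaries)


app = Application()
refs = [app.make_temporary(["cell A1", "cell B2"]) for _ in range(3)]
before = app.count_temporaries()   # three collections parked in the store
app.delete(refs[0])
after = app.count_temporaries()    # the other two stay until deleted
```

Unlike a true leak, the leftover entries remain reachable through their references, which matches the "get all temporaries" observation above.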

Also, theoretically, we could create some kind of new interface, e.g., "addressable", that all objects that you can get to from the hierarchy off of the application object would implement. That way, we at least wouldn't have to play twenty questions, but it would mean adding code to almost every object.

As for your text range question, I'd have to look into how it was implemented. If the text range is an ephemeral object (i.e., not addressable from application, and thus unable to return an object reference when necessary) then that might be the issue.

Clarence Glasse said...

Mac OLE and Mac OLE Automaton - ahh, good times. :-)

Good luck with your CLR job.

till busch said...

hi nathan,

i am developing a cross-platform application that uses ms-office for reports.

now i'm looking for a way to automate ms office for mac. unfortunately i did not find any sdk from microsoft.

doing some research i found ms office for mac itself includes MicrosoftOLEAutomation.framework. is there any straight-forward way to access ole functions on mac?

i'd need functions like:

is there an sdk available in some hidden place? i'd really like to avoid writing a new interface using applescript from c++. i could probably put together a header file with all the types and functions needed and then just link against MicrosoftOLEAutomation.framework.

thank you for any hints,


Nathan Herring said...


Sadly, the Mac OS X version of Microsoft OLE Automation is not public anymore, which means we do not produce an SDK and we don't support making calls to its (private) APIs.

This is sad for multiple reasons, the primary one being that there's no standard for OLE-style code on OS X. Thus, Mac OS X's own plug-in architecture, which uses IUnknown-style COM-like interfaces, isn't defined identically to Microsoft OLE's. Furthermore, even Microsoft OLE isn't quite internally consistent with its Windows-self on i386, using cdecl instead of stdcall, which has its own interop challenges with other Microsoft cross-platform software.

On the other hand, not having to perform exhaustive platform-wide test-suites on the code to ensure its perfect operation saved resources and let us get those products out the door to customers faster -- we only had to test the parts of OLE / OLE Automation that Microsoft Office for the Mac used.

That said, running "nm" on MicrosoftOLE.framework/MicrosoftOLE reveals an internal symbol _CoCreateInstance, and on MicrosoftOLEAutomation.framework/MicrosoftOLEAutomation reveals an internal symbol _DispGetIDsOfNames. So, they're there, but you're basically going to have to reverse engineer how to get these frameworks properly initialized and hack up your own alternate version of the Windows SDK headers to build on the Mac in order to stand any chance of solving your problem the direct way, via C++. And let me reiterate that this is wholly unsupported.

If you want access to the SDK, you'll have to post a request for it at the Mac Office forums. There doesn't appear to be a (public) Connect entry for Mac Office.