Bridge
September 9th, 2006

A Brief History

Mac OS 9 and NEXTSTEP were like two icy comets wandering aimlessly in space. Neither was really going anywhere in particular. And then, BANG! They collide, stick wetly together, spinning wildly! Thus was born Mac OS X - or so the legend goes.

How do you take these two comets, err, operating systems, and make a unified OS out of them? On the one hand, you have the procedural classic Macintosh Toolbox, and on the other you have object oriented OPENSTEP, as different as can be - and you're tasked with integrating them, or at least getting some level of interoperability. What a headache!

You might start by finding common ground - but there isn't much common ground, so you have to invent some, and you call it (well, part of it) CoreFoundation. Uh, let's abbreviate CoreFoundation "CF" from now on. CoreFoundation will "sit below" both of these APIs, and provide functions for strings and dates and other fundamental stuff, and the shared use of CF will serve as a sort of least common denominator, not only for these two APIs but also for future APIs. These two APIs will be able to talk to each other, and to future APIs, with CF types.

Ok, so the plan is to make these two APIs, the Mac Toolbox and OPENSTEP, use CF. Adding CF support to the Mac Toolbox is not that big a deal, because the Mac Toolbox APIs have to change anyways, to become Carbon. But the OPENSTEP APIs don't have the same sort of problems, and shouldn't have to change much to become Cocoa.

Like, for example, the Toolbox uses Pascal strings, and those have unfortunate length limitations and ignorance of Unicode, so we want to get rid of them - so we might as well use the interoperable replacement as the native string type in Carbon. But OPENSTEP's NSString is already pretty nice. It would be a shame to have to make CFString replacements for all those Cocoa APIs that take and return NSStrings, just for interoperability with Carbon.

Rough Draft

So the solution is obvious, right? Just make NSString methods to convert to and from CFStrings.

@interface NSString (CFStringMethods)
- (CFStringRef)getCFString;
+ (NSString *)stringFromCFString:(CFStringRef)stringRef;
@end

So whenever you want to talk to Carbon, you get a CFStringRef from your NSString, and whenever you get a CFStringRef back from Carbon, you make an NSString out of it. Simple! But this isn't what Apple did.

Second Revision

"Hey," you say. Some of you say. "I know what Apple did. I'm not so easily fooled! Check out this code:"
	#import <Foundation/Foundation.h>
	int main(void) {
		/* Pass the class name as an argument, never as the format string itself */
		NSLog(@"%@", NSStringFromClass([@"Some String" class]));
		return 0;
	}
"What does that output? NSCFString. NSCFString. See? NSStrings must be really CFStrings under the hood! And you can do that because NSString is a class cluster - it's an abstract interface. So that's how you achieve interoperability: you implement NSStrings with CFStrings (but preserve the NSString API) and then all NSStrings really *are* CFStrings. There's no conversion necessary because they're the same thing.

"That's how toll free bridging works!"

But hang on a minute. You just said yourself that NSString is an abstract interface - that means that some crazy developer can make his or her own subclass of NSString, and implement its methods in whatever wacky way, and it's supposed to just work. But then it wouldn't be using CFStrings! It would be using some other crazy stuff. So when a Cocoa API gets a string and wants to do something CF-ish with it, the API would have no way of knowing if the string was toll-free bridged - that is, if it was really a CFString or a, y'know, FishsWackyString, without checking its class, and then it would have to convert it...blech!

Final Draft

So that's a problem: Apple wants to toll free bridge - to be able to use NSStrings as CFStrings without conversion. But to do that, Apple also needs to support wacky NSString subclasses (that don't use CFStrings at all) in the CFString API. That means making a C API that knows about Objective-C objects.

A C API that handles Objective-C objects? That's some deep deep voodoo, man. But we have it and it works, right? We can just cast CFStringRefs to NSStrings, and vice versa, and for once in our lives we get to feel smug and superior, instead of stupid, when the compiler warns about mismatched pointer types. "Look, gcc, I know it says CFStringRef, but just try it with that NSString. Trust me." It's great! Right?

But how does it work? We could check, if only CoreFoundation were open source!

...

Oh, right. So let's look at the CFStringGetLength() function and see what happens if you give it a weird string.

	CFIndex CFStringGetLength(CFStringRef str) {
	    CF_OBJC_FUNCDISPATCH0(__kCFStringTypeID, CFIndex, str, "length");
	    __CFAssertIsString(str);
	    return __CFStrLength(str);
	}
Any ideas where the Objective-C voodoo is happening here? ANYONE? You in the back? CF_OBJC_FUNCDISPATCH0 you say? I guess it's worth a try.

CF_OBJC_FUNCDISPATCH0

So CF_OBJC_FUNCDISPATCH0 is the magic that supports Objective-C objects. Where's CF_OBJC_FUNCDISPATCH0 defined? Here:

	// Invoke an ObjC method, return the result
	#define CF_OBJC_FUNCDISPATCH0(typeID, rettype, obj, sel) \
		if (__builtin_expect(CF_IS_OBJC(typeID, obj), 0)) \
		{rettype (*func)(const void *, SEL) = (void *)__CFSendObjCMsg; \
		static SEL s = NULL; if (!s) s = sel_registerName(sel); \
		return func((const void *)obj, s);}

Yikes! Let's piece that apart:

	if (__builtin_expect(CF_IS_OBJC(typeID, obj), 0))
If we're really an Objective-C object...

	rettype (*func)(const void *, SEL) = (void *)__CFSendObjCMsg;
...treat the function __CFSendObjCMsg as if it takes the same arguments as a parameterless Objective-C method (that is, just self and _cmd)...
	static SEL s = NULL; if (!s) s = sel_registerName(sel);
...look up the selector by name (and stash it in a static variable so we only have to do it once per selector)...
	return func((const void *)obj, s);
...and then call that __CFSendObjCMsg() function. What does __CFSendObjCMsg() do?
	#define __CFSendObjCMsg 0xfffeff00
0xfffeff00? What the heck? Oh, wait, that's just the commpage address of objc_msgSend_rtp(). So __CFSendObjCMsg() is just good ol' objc_msgSend().
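
If the macro is still a bit much, here's a rough plain C sketch of the same two tricks: give an untyped entry point the signature you need with a cast, and cache the name lookup in a static variable. Everything named mock_ is made up for illustration. mock_register_name() stands in for sel_registerName(), and mock_send stands in for __CFSendObjCMsg. It is not how the real runtime dispatches messages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type; C allows a round-trip cast through it. */
typedef void (*GenericFn)(void);

/* Hypothetical counter so the caching is observable. */
int mock_lookup_count = 0;

/* Stand-in for sel_registerName(): pretend the lookup by name is expensive. */
const char *mock_register_name(const char *name) {
    mock_lookup_count++;
    return name;
}

/* The "method": takes only self and the selector, like a parameterless method. */
long mock_length_impl(const void *self, const char *sel) {
    (void)sel;
    return (long)strlen((const char *)self);
}

/* Stand-in for __CFSendObjCMsg: an entry point with no useful type. */
GenericFn mock_send = (GenericFn)mock_length_impl;

long mock_get_length(const void *obj) {
    assert(obj != NULL);
    /* Give the untyped entry point the signature this particular call needs. */
    long (*func)(const void *, const char *) =
        (long (*)(const void *, const char *))mock_send;
    /* Look the selector up once and stash it, as the macro does. */
    static const char *s = NULL;
    if (!s) s = mock_register_name("length");
    return func(obj, s);
}
```

Call mock_get_length() as often as you like on C strings: the lookup only runs the first time through, which is the whole point of the static.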

CF_IS_OBJC

That leaves us with __builtin_expect(CF_IS_OBJC(typeID, obj), 0), the function that tries to figure out if we're an Objective-C object or not. What does that do?

__builtin_expect() is just some gcc magic for branch prediction - here it means that we should expect CF_IS_OBJC to be false. That is, CF believes that most of its calls will be on CF objects instead of Objective-C objects. Ok, fair enough. But what does CF_IS_OBJC actually do? Take a look.

	CF_INLINE int CF_IS_OBJC(CFTypeID typeID, const void *obj) {
	    return (((CFRuntimeBase *)obj)->_isa != __CFISAForTypeID(typeID) && ((CFRuntimeBase *)obj)->_isa > (void *)0xFFF);
	}
(Keen observers might notice that this code is #ifdefed out in favor of:
	#define CF_IS_OBJC(typeID, obj) (false)
I believe this is for the benefit of people who want to use CF on Linux or other OSes, who aren't interested in toll-free bridging and therefore don't want to pay any performance penalty for it.)

Ok! There's two parts to seeing if we're an Objective-C object. First we check (with a quick table lookup) whether our isa (class) pointer indicates that we "really are" a certain CF type. If it doesn't, we check to see if our class pointer is greater than 0xFFF. If it is, we're an Objective-C object, and we call through to the Objective-C dispatch mechanism - in this case, we send the length message.
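
Here's a toy version of that whole flow in plain C, in case it helps to see the check and the branch side by side. It's a sketch under assumptions, not CF's code: MockClass, MockString, and every mock_ or wacky_ name are invented for illustration, a single function pointer plays the part of Objective-C dispatch, and the threshold is the 0xFFF from the quoted source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef __GNUC__
#define __builtin_expect(expr, expected) (expr)  /* fallback for other compilers */
#endif

/* Hypothetical stand-in for an Objective-C class: just one "method". */
typedef struct {
    long (*length)(const void *self);
} MockClass;

/* Hypothetical stand-in for CFRuntimeBase plus a string payload. */
typedef struct {
    const MockClass *isa;
    long length;
} MockString;

/* The class that means "really a native string" (the __CFISAForTypeID role). */
const MockClass mock_native_string_class = { NULL };

/* Same two-part test as CF_IS_OBJC: not the native class, and a plausible pointer. */
int mock_is_foreign(const void *obj) {
    const MockString *s = obj;
    return s->isa != &mock_native_string_class
        && (uintptr_t)s->isa > (uintptr_t)0xFFF;
}

long mock_string_get_length(const void *obj) {
    assert(obj != NULL);
    /* Expect the native case; foreign objects take the dispatch path. */
    if (__builtin_expect(mock_is_foreign(obj), 0)) {
        const MockString *s = obj;
        return s->isa->length(obj);
    }
    /* Native object: read the field directly, no "method" is called. */
    return ((const MockString *)obj)->length;
}

/* A "wacky subclass" that computes its length in its own way. */
long wacky_length(const void *self) { (void)self; return 42; }
const MockClass wacky_class = { wacky_length };
```

A MockString whose isa is &mock_native_string_class gets its length field read directly, while one whose isa is &wacky_class goes through wacky_length(), just as a wacky NSString subclass would have its length method invoked.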

Summary

What are the consequences of all that? Well!

  • CF objects, just like Objective-C objects, all have an isa pointer (except it's called _isa in CF). It's right there in struct __CFRuntimeBase.
  • There are two toll-free bridging mechanisms! Some Objective-C objects "really are" CF objects - the memory layout between the Objective-C object and the corresponding CF object is identical (enabled in part by the presence of the _isa pointer above), and in that case the Objective-C methods are not invoked by the CF functions. For example, in this code:
    	CFStringGetLength([NSString stringWithCString:"Hello World!"]);
    
    There, +[NSString stringWithCString:] is returning an NSCFString (which you can verify by asking it for the name of its class), but -[NSCFString length] is never invoked - NO length method is invoked. You can verify that with gdb. Objects that "really are" their CF equivalents skip what's usually thought of as the bridge, and "fall through" to the CF functions even when the CF functions are directly called on them. Obviously, this is an implementation detail, and you should not depend on this.
  • That mechanism is also how bridging works the other way - how CF strings you get from, say, Carbon, can be passed around like Objective-C objects, because they really are Objective-C objects. The bridges are implemented entirely in CF and in the bridged classes - the Objective-C runtime is blissfully unaware.
  • But! Plain ol' Objective-C objects are sussed out by CF by checking to see if their class pointer is larger than 0xFFF, and if so, ordinary Objective-C message dispatch is used from the CF functions. That's the second toll-free bridging mechanism, and it must be present in every public CF function for a bridged object, except for features not supported Cocoa-side.
  • Nowhere do we depend on the abstract class NSString at all - the bridge doesn't check for it and Objective-C doesn't care about it. That means that, in theory, CFStringGetLength() should "work" (invoke the length method) on any object, not just NSStrings. Does it? You can check it yourself. (Answer: yes!) Obviously, this is just an artifact of the implementation, and you should definitely not depend on this - only subclasses of NSString are supported by toll free bridging to CFString.
  • Curiously, other "true" CF objects are not considered to be CF objects by this macro. For example,
    	CFStringGetLength([NSArray array]);
    
    will raise an exception because NSCFArray does not implement length. That is, CF_IS_OBJC is not asking "Are you a CF type?" but rather "Are you this specific CF type?" That should make you happy, because it raises a "selector not recognized" exception instead of crashing, which makes our code more debuggable. Thanks, CF!
  • Why 0xFFF? I'm glad you (I mean I) asked, since the answer (at least, what I think it is) has interesting connections to NULL. But that will have to wait until a future post.

Other approaches

My boss pointed out that there are other ways to achieve toll-free bridging, beyond what CF does. The simplest is to write your API with Objective-C and then wrap it with C:
	@implementation Array
	- (int)length {
	    return self->length;
	}
	@end
	int getLength(ArrayRef array) {
		return [(id)array length];
	}
You can even retrofit toll-free bridging onto an existing C API by wrapping it twice - first in Objective-C, then in C, and the "outer" C layer becomes the public C API. To wit:
	/* private length function that we want to wrap */
	static int privateGetLength(ArrayRef someArray) {
	   return someArray->length;
	}
	/* public ObjC API */
	@implementation Array
	- (int)length {
	   return privateGetLength(self->arrayRef);
	}
	@end
	/* public C API */
	int getLength(ArrayRef array) {
	   return [(id)array length];
	}
The point of that double feint, of course, is for the public C API to respect overrides of the length method by subclasses.
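
For the C-only crowd, here's roughly the same double wrap with a hand-rolled method table swapped in for Objective-C dispatch. To be clear, this is a different technique from the Objective-C version above, and ArrayClass, base_length, doubled_length and friends are all hypothetical names; it's only meant to show why the public C function has to go through the overridable layer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Array Array;

/* Hypothetical method table standing in for an Objective-C class. */
typedef struct {
    int (*length)(const Array *self);
} ArrayClass;

struct Array {
    const ArrayClass *isa;
    int length;
};

/* Private length function that we want to wrap. */
int private_get_length(const Array *array) { return array->length; }

/* "Method" layer: the base class just forwards to the private function. */
int base_length(const Array *self) { return private_get_length(self); }

/* A "subclass" override that reports twice the stored length. */
int doubled_length(const Array *self) { return 2 * private_get_length(self); }

const ArrayClass BaseArrayClass = { base_length };
const ArrayClass DoubledArrayClass = { doubled_length };

/* Public C API: always dispatches through the method table. */
int get_length(const Array *array) {
    assert(array != NULL && array->isa != NULL);
    return array->isa->length(array);
}
```

If get_length() called private_get_length() directly, an Array built with DoubledArrayClass would report the same length as a plain one, and the override would be silently ignored.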

"Wrapping" up

So toll-free bridging is one way that Cocoa integrates with Carbon and even newer OS X APIs. It's possible in large part because of Objective-C, but in this case, Apple gets as much mileage from the simple runtime implementation and C API as from its dynamic nature. You already knew that, I'll bet - but hopefully you have a better idea of how it all works.

Now hands off! A coworker of mine makes the point that good developers distinguish between what they pretend to know and what they really know. The, uh, known knowns, and the known unknowns, as it were. The mechanism of toll-free bridging is not secret (it is open source, after all), but it is private, which means that you are encouraged to know about it but to refrain from depending on it. Use it for, say, debugging, but don't ship apps that depend on it - because that prevents Apple from making OS X better. And nobody wants that! I mean the prevention part.