Fish tank with Puck the Fish Fish tank with Puck the Fish Water on the floor spelling Fish
 

Logos

December 11th, 2006

Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
Hair care!
Digital audio!
 

0xF4EE

November 24th, 2006

Hex Fiend 1.1.1 is now available open source under a BSD-style license. Hex Fiend is my fast and clever free open source hex editor for Mac OS X.

I hope you find Hex Fiend useful for whatever purpose, but if you are interested in contributing changes on an ongoing basis, I’ll be happy to grant Subversion commit privileges to some interested developers who submit quality patches. There is a wiki aimed at developers accessible from the page, but daily builds, mailing lists, or discussion boards are also a possibility. You can contact me at the e-mail address at the bottom of the Hex Fiend page if you are interested in any of these.

Version 1.1.1 has some important bug fixes (see the release notes), so you should upgrade even if you are not interested in the source.

 

Bridge

September 9th, 2006

A Brief History

Mac OS 9 and NEXTSTEP were like two icy comets wandering aimlessly in space. Neither was really going anywhere in particular. And then, BANG! They collide, stick wetly together, spinning wildly! Thus was born Mac OS X – or so the legend goes.

How do you take these two comets, err, operating systems, and make a unified OS out of them? On the one hand, you have the procedural classic Macintosh Toolbox, and on the other you have object oriented OPENSTEP, as different as can be – and you’re tasked with integrating them, or at least getting some level of interoperability. What a headache!

You might start by finding common ground – but there isn’t much common ground, so you have to invent some, and you call it (well, part of it) CoreFoundation. Uh, let’s abbreviate CoreFoundation "CF" from now on. CoreFoundation will "sit below" both of these APIs, and provide functions for strings and dates and other fundamental stuff, and the shared use of CF will serve as a sort of least common demoninator, not only for these two APIs but also for future APIs. These two APIs will be able to talk to each other, and to future APIs, with CF types.

Ok, so the plan is to make these two APIs, the Mac Toolbox and OPENSTEP, use CF. Adding CF support to the Mac Toolbox is not that big a deal, because the Mac Toolbox APIs have to change anyways, to become Carbon. But the OPENSTEP APIs don’t have the same sort of problems, and shouldn’t have to change much to become Cocoa.

Like, for example, the Toolbox uses Pascal strings, and those have unfortunate length limitations and ignorance of Unicode, so we want to get rid of them – so we might as well use the interoperable replacement as the native string type in Carbon. But OPENSTEP’s NSString is already pretty nice. It would be a shame to have to make CFString replacements for all those Cocoa APIs that take and return NSStrings, just for interoperability with Carbon.

Rough Draft

So the solution is obvious, right? Just make NSString methods to convert to and from CFStrings.

@interface NSString (CFStringMethods)
- (CFStringRef)getCFString;
+ (NSString *)stringFromCFString:(CFStringRef)stringRef;
@end

So whenever you want to talk to Carbon, you get a CFStringRef from your NSString, and whenever you get a CFStringRef back from Carbon, you make an NSString out of it. Simple! But this isn’t what Apple did.

Second Revision

"Hey," you say. Some of you say. "I know what Apple did. I’m not so easily fooled! Check out this code:"

	#include <Foundation/NSString.h>

	int main(void) {
		NSLog(NSStringFromClass([@"Some String" class]));
		return 0;
	}

"What does that output? NSCFString. NSCFString. See? NSStrings must be really CFStrings under the hood! And you can do that because NSString is a class cluster – it’s an abstract interface. So that’s how you achieve interoperability: you implement NSStrings with CFStrings (but preserve the NSString API) and then all NSStrings really *are* CFStrings. There’s no conversion necessary because they’re the same thing.

"That’s how toll free bridging works!"

But hang on a minute. You just said yourself that NSString is an abstract interface – that means that some crazy developer can make his or her own own subclass of NSString, and implement its methods in whatever wacky way, and it’s supposed to just work. But then it wouldn’t be using CFStrings! It would be using some other crazy stuff. So when a Cocoa API gets a string and wants to do something CF-ish with it, the API would have no way of knowing if the string was toll-free bridged – that is, if it was really a CFString or a, y’know, FishsWackyString, without checking its class, and then it would have to convert it…blech!

Final Draft

So that’s a problem: Apple wants to toll free bridge – to be able to use NSStrings as CFStrings without conversion. But to do that, Apple also needs to support wacky NSString subclasses (that don’t use CFStrings at all) in the CFString API. That means making a C API that knows about Objective-C objects.

A C API that handles Objective-C objects? That’s some deep deep voodoo, man. But we have it and it works, right? We can just cast CFStringRefs to NSStrings, and vice versa, and for once in our lives we get to feel smug and superior, instead of stupid, when the compiler warns about mistmatched pointer types. "Look, gcc, I know it says CFStringRef, but just try it with that NSString. Trust me." It’s great! Right?

But how does it work? We could check, if only CoreFoundation were open source!

Oh, right. So let’s look at the CFStringGetLength() function and see what happens if you give it a weird string.

	CFIndex CFStringGetLength(CFStringRef str) {
	    CF_OBJC_FUNCDISPATCH0(__kCFStringTypeID, CFIndex, str, "length");

	    __CFAssertIsString(str);
	    return __CFStrLength(str);
	}

Any ideas where the Objective-C voodoo is happening here? ANYONE? You in the back? CF_OBJC_FUNCDISPATCH0 you say? I guess it’s worth a try.

CF_OBJC_FUNCDISPATCH0

So CF_OBJC_FUNCDISPATCH0 is the magic that supports Objective-C objects. Where’s CF_OBJC_FUNCDISPATCH0 defined? Here:


	// Invoke an ObjC method, return the result
	#define CF_OBJC_FUNCDISPATCH0(typeID, rettype, obj, sel) \
		if (__builtin_expect(CF_IS_OBJC(typeID, obj), 0)) \
		{rettype (*func)(const void *, SEL) = (void *)__CFSendObjCMsg; \
		static SEL s = NULL; if (!s) s = sel_registerName(sel); \
		return func((const void *)obj, s);}

Yikes! Let’s piece that apart:

	if (__builtin_expect(CF_IS_OBJC(typeID, obj), 0))

If we’re really an Objective-C object…


	rettype (*func)(const void *, SEL) = (void *)__CFSendObjCMsg;

…treat the function __CFSendObjCMsg as if it takes the same arguments as a parameterless Objective-C method (that is, just self and _cmd)…


	static SEL s = NULL; if (!s) s = sel_registerName(sel);

…look up the selector by name (and stash it in a static variable so we only have to do it once per selector)…


	return func((const void *)obj, s);

…and then call that __CFSendObjCMsg() function. What does __CFSendObjCMsg() do?

	#define __CFSendObjCMsg 0xfffeff00

0xfffeff00? What the heck? Oh, wait, that’s just the commpage address of objc_msgSend_rtp(). So __CFSendObjCMsg() is just good ol’ objc_msgSend().

CF_IS_OBJC

That leaves us with __builtin_expect(CF_IS_OBJC(typeID, obj), 0), the function that tries to figure out if we’re an Objective-C object or not. What does that do?

__builtin_expect() is just some gcc magic for branch prediction – here it means that we should expect CF_IS_OBJC to be false. That is, CF believes that most of its calls will be on CF objects instead of Objective-C objects. Ok, fair enough. But what does CF_IS_OBJC actually do? Take a look.

	CF_INLINE int CF_IS_OBJC(CFTypeID typeID, const void *obj) {
	    return (((CFRuntimeBase *)obj)->_isa != __CFISAForTypeID(typeID) && ((CFRuntimeBase *)obj)->_isa > (void *)0xFFF);
	}

(Keen observers might notice that this code is #ifdefed out in favor of:

	#define CF_IS_OBJC(typeID, obj) (false)

I believe this is for the benefit of people who want to use CF on Linux or other OSes, who aren’t interested in toll-free bridging and therefore don’t want to pay any performance penalty for it.)

Ok! There’s two parts to seeing if we’re an Objective-C object – we check (with a quick table lookup) whether our isa (class) pointer indicates that we "really are" a certain CF type, and if we’re not, we check to see if our class pointer is greater than 0xFFFF, and if it is, we’re an Objective-C object, and we call through to the Objective-C dispatch mechanism – in this case, we send the length message.

Summary

What are the consequences of all that? Well!

  • CF objects, just like Objective-C objects, all have an isa pointer (except it’s called _isa in CF). It’s right there in struct __CFRuntimeBase.
  • There are two toll-free bridging mechanisms! Some Objective-C objects "really are" CF objects – the memory layout between the Objective-C object and the corresponding CF object is identical (enabled in part by the presence of the _isa pointer above), and in that case the Objective-C methods are not invoked by the CF functions. For example, in this code:
    	CFStringGetLength([NSString stringWithCString:"Hello World!"]);
    

    There, -[NSString stringWithCString:] is returning an NSCFString (which you can verify by asking it for the name of its class), but -[NSCFString length] is never invoked – NO length method is invoked. You can verify that with gdb. Objects that "really are" their CF equivalents skip what’s usually thought of as the bridge, and "fall through" to the CF functions even when the CF functions are directly called on them. Obviously, this is an implementation detail, and you should not depend on this.

  • That mechanism is also how bridging works the other way – how CF strings you get from, say, Carbon, can be passed around like Objective-C objects, because they really are Objective-C objects. The bridges are implemented entirely in CF and in the bridged classes – the Objective-C runtime is blissfully unaware.
  • But! Plain ol’ Objective-C objects are sussed out by CF by checking to see if their class pointer is larger than 0xFFFF, and if so, ordinary Objective-C message dispatch is used from the CF functions. That’s the second toll-free bridging mechanism, and it must be present in every public CF function for a bridged object, except for features not supported Cocoa-side.
  • Nowhere do we depend on the abstract class NSString at all – the bridge doesn’t check for it and Objective-C doesn’t care about it. That means that, in theory, CFStringGetLength() should "work" (invoke the length method) on any object, not just NSStrings. Does it? You can check it yourself. (Answer: yes!) Obviously, this is just an artifact of the implementation, and you should definitely not depend on this – only subclasses of NSString are supported by toll free bridging to CFString.
  • Curiously, other "true" CF objects are not considered to be CF objects by this macro. For example,
    	CFStringGetLength([NSArray array]);
    

    will raise an exception because NSCFArray does not implement length. That is, CF_IS_OBJC is not asking "Are you a CF type?" but rather "Are you this specific CF type?" That should make you happy, because it raises a "selector not recognized" exception instead of crashing, which makes our code more debuggable. Thanks, CF!

  • Why 0xFFFF? I’m glad you (I mean I) asked, since the answer (at least, what I think it is) has interesting connections to NULL. But that will have to wait until a future post.

Other approaches

My boss pointed out that there are other ways to achieve toll-free bridging, beyond what CF does. The simplest is to write your API with Objective-C and then wrap it with C:

	@implementation Array

	- (int)length {
	    return self->length;
	}

	@end

	int getLength(ArrayRef array) {
		return [(id)array length];
	}

You can even retrofit toll-free bridging onto an existing C API by wrapping it twice – first in Objective-C, then in C, and the "outer" C layer becomes the public C API. To wit:


	/* private length function that we want to wrap */
	static int privateGetLength(ArrayRef someArray) {
	   return someArray->length;
	}

	/* public ObjC API */
	@implementation Array

	- (int)length {
	   return privateGetLength(self->arrayRef);
	}

	@end

	/* public C API */
	int getLength(ArrayRef array) {
	   return [(id)array length];
	}

The point of that double feint, of course, is for the public C API to respect overrides of the length method by subclasses.

"Wrapping" up

So toll-free bridging is (one way) that Cocoa integrates with Carbon and even newer OS X APIs. It’s possible in large part because of Objective-C, but in this case, Apple gets as much mileage from the simple runtime implementation and C API as from its dynamic nature. You already knew that, I’ll bet – but hopefully you have a better idea of how it all works.

Now hands off! A coworker of mine makes the point that good developers distinguish between what they pretend to know and what they really know. The, uh, known knowns, and the known unknowns, as it were. The mechanism of toll-free bridging is not secret (it is open source, after all), but it is private, which means that you are encouraged to know about it but to refrain from depending on it. Use it for, say, debugging, but don’t ship apps that depend on it – because that prevents Apple from making OS X better. And nobody wants that! I mean the prevention part.

 

Hex Fiend 1.1

August 24th, 2006

Spiffy! Hex Fiend version 1.1 is ready. Hex Fiend is my fast and clever free hex editor for Mac OS X. New stuff:

  • Horizontal resizing
  • Custom fonts
  • Overwrite mode
  • Hidden files
  • Lots more goodies (see release notes)

May you find it useful!

Wake up.
Ssnsnrk.
Wake up, he’s gone.
Zzzz…wha? Oh, someone’s here. Allow me to spin up.
….….….….….….….….….….

If it’s not obvious, I’m fish’s filesystem, and that’s fish’s hard drive.
I’m a hard drive.
We snuck this post in. fish can’t know we’re here.
Don’t tell fish. Big secret.
He’d be embarassed if he knew.
Humiliated. He can’t know.
See, fish was trying to beat grep. And he was experimenting with all these stupid ideas and complicated algorithms for teensy gains. It was sad, really.
Pathetic.
fish kept trying so many things. He was thrashing about.
Like he was out of water.
So we had to help him. It was easy, really – we just had to sneak in one line. One line was all it took.
I wrote it!
Because I told you what to write.
fish only thought about the string searching algorithm.
He never even considered us and the work we have to do.
I felt slighted. It was rude.
See, when I read data from the hard drive, I try to keep it around in memory. That’s what the UBC is all about.
Unified Buffer Cache.
When most people read data, they end up wanting to read it again soon after. So keeping the data around saves time.
But not fish.
fish was reading these big honking files from start to finish. It was way more than I could remember at once.
fish thrashed your cache, dude.
So I just stopped trying.
We turned off caching with this: fcntl(fd, F_NOCACHE, 1);
Just like Apple recommends for that sort of usage pattern.
And it helped. Looking for a single character in a 11.5 GB file:
Hex Fiend (no caching) Hex Fiend (caching) grep
208 seconds 215 seconds 217 seconds
And that’s likely the best we can do, thanks to slowpoke over there.
Phew. I’m all wore out.
There’s not much room for improvement left. We’re searching 57 MB/second – that’s bumping up against the physical transfer limit of our friend, Mr. ATA.
I’m totally serial.
Depending where we are on his platter. So we’ve done all we can for searching big files.
Don’t tell fish.
Right. I hop
FISH IS COMING
Time to go then. See you later. sync
flush
sleep
….….….….….….….….….….
 

The Treacherous Optimization

May 30th, 2006

Old age and treachery will beat youth and skill every time.

"I’m going to beat grep by thirty percent!" I confidently crow to anyone who would listen, those foolish enough to enter my office. And my girlfriend too, who’s contractually obligated to pay attention to everything I say.

See, I was working on Hex Fiend, and searching was dog slow. But Hex Fiend is supposed to be fast, and I want blazingly quick search that leaves the bewildered competition coughing in trails of dust. And, as everyone knows, the best way to get amazing results is to set arbitrary goals without any basis for believing they can be reached. So I set out to search faster than grep by thirty percent.

The first step in any potentially impossible project is, of course, to announce that you are on the verge of succeeding.

I imagine the author of grep, Ultimate Unix Geek, squinting at vi; the glow of a dozen xterms is the only light to fall on his ample frame covered by overalls, cheese doodles, and a tangle of beard. Discarded crushed Mountain Dew cans litter the floor. I look straight into the back of his head, covered by a snarl of greasy locks, and reply with a snarl of my own: You’re mine. The aphorism at the top, like the ex girlfriend who first told it to me, is dim in my recollection.

String searching

Having exhausted all my trash-talking avenues, it’s time to get to work. Now, everyone knows that without some sort of preflighting, the fastest string search you can do still takes linear time. Since my program is supposed to work on dozens of gigabytes, preflighting is impossible – there’s no place to put all the data that preflighting generates, and nobody wants to sit around while I generate it. So I am resigned to the linear algorithms. The best known is Boyer-Moore (I won’t insult your intelligence with a Wikipedia link, but the article there gives a good overview).

Boyer-Moore works like this: you have some string you’re looking for, which we’ll call the needle, and some string you want to find it in, which we’ll call the haystack. Instead of starting the search at the beginning of needle, you start at the end. If your needle character doesn’t match the character you’re looking at in haystack, you can move needle forwards in haystack until haystack’s mismatched character lines up with the same character in needle. If haystack’s mismatch isn’t in needle at all, then you can skip ahead a whole needle’s length.

For example, if you’re searching for a string of 100 ‘a’s (needle), you look at the 100th character in haystack. If it’s an ‘x’, well, ‘x’ doesn’t appear anywhere in needle, so you can skip ahead all of needle and look at the 200th character in haystack. A single mismatch allowed us to skip 100 characters!

I get shot down

For performance, the number of characters you can skip on a mismatch is usually stored in an array indexed by the character value. So the first part of my Boyer-Moore string searching algorithm looked like this:

char haystack_char = haystack[haystack_index];
if (last_char_in_needle != haystack_char)
   haystack_index += jump_table[haystack_char];

So we look at the character in haystack and if it’s not what we’re looking for, we jump ahead by the right distance for that character, which is in jump_table.

"There," I sigh, finishing and sitting back. It may not be faster than grep, but it should be at least as fast, because this is the fastest algorithm known. This should be a good start. So I confidently ran my benchmark, for a 1 gigabyte file…

grep: 2.52 seconds
Hex Fiend: 3.86 seconds

Ouch. I’m slower, more than 50% slower. grep is leaving me sucking dust. Ultimate Unix Geek chuckles into his xterms.

Rollin', rollin', rollin'

My eyes darken, my vision tunnels. I break out the big guns. My efforts to vectorize are fruitless (I’m not clever enough to vectorize Boyer-Moore because it has very linear data dependencies.) Shark shows a lot of branching, suggesting I can do better by unrolling the loop. Indeed:

grep: 2.52 seconds
Hex Fiend (unrolled): 2.68 seconds

But I was still more than 6% slower, and that’s as fast as I got. Exhausted, stymied at every turn, I throw up my hands. grep has won.

grep's dark secret

"How do you do it, Ultimate Unix Geek? How is grep so fast?" I moan at last, crawling forwards into the pale light of his CRT.

"Hmmm," he mumbles. "I suppose you have earned a villian’s exposition. Behold!" A blaze of keyboard strokes later and grep’s source code is smeared in green-on-black across the screen.

while (tp < = ep)
	  {
	    d = d1[U(tp[-1])], tp += d;
	    d = d1[U(tp[-1])], tp += d;
	    if (d == 0)
	      goto found;
	    d = d1[U(tp[-1])], tp += d;
	    d = d1[U(tp[-1])], tp += d;
	    d = d1[U(tp[-1])], tp += d;
	    if (d == 0)
	      goto found;
	    d = d1[U(tp[-1])], tp += d;
	    d = d1[U(tp[-1])], tp += d;
	    d = d1[U(tp[-1])], tp += d;
	    if (d == 0)
	      goto found;
	    d = d1[U(tp[-1])], tp += d;
	    d = d1[U(tp[-1])], tp += d;
	  }

"You bastard!" I shriek, amazed at what I see. "You sold them out!"

See all those d = d1[U(tp[-1])], tp += d; lines? Well, d1 is the jump table, and it so happens that grep puts 0 in the jump table for the last character in needle. So when grep looks up the jump distance for the character, via haystack_index += jump_table[haystack_char], well, if haystack_char is the last character in needle (meaning we have a potential match), then jump_table[haystack_char] is 0, so that line doesn't actually increase haystack_index.

All that is fine and noble. But do not be fooled! If the characters match, the search location doesn't change - so grep assumes there is no match, up to three times in a row, before checking to see if it actually found a match.

Put another way, grep sells out its worst case (lots of partial matches) to make the best case (few partial matches) go faster. How treacherous! As this realization dawns on me, the room seemed to grow dim and slip sideways. I look up at the Ultimate Unix Geek, spinning slowly in his padded chair, and I hear his cackle "old age and treachery...", and in his flickering CRT there is a face reflected, but it's my ex girlfriend, and the last thing I see before I black out is a patch of yellow cheese powder inside her long tangled beard.

I take a page from grep

"Damn you," I mumble at last, rising from my prostrate position. Chagrined and humbled, I copy the technique.

grep: 2.52 seconds
Hex Fiend (treacherous): 2.46 seconds

What's the win?

Copying that trick brought me from six percent slower to two percent faster, but at what cost? What penalty has grep paid for this treachery? Let us check - we shall make a one gigabyte file with one thousand x's per line, and time grep searching for "yy" (a two character best case) and "yx" (a two character worst case). Then we'll send grep to Over-Optimizers Anonymous and compare how a reformed grep (one that checks for a match after every character) performs.

Best case Worst case
Treacherous grep 2.57 seconds 4.89 seconds
Reformed grep 2.79 seconds 2.88 seconds

Innnnteresting. The treacherous optimization does indeed squeeze out almost 8% faster searching in the best case, at a cost of nearly 70% slower searching in the worst case. Worth it? You decide! Let me know what you think.

Resolved and refreshed, I plan my next entry. This isn't over, Ultimate Unix Geek.

Disclaimers

(Note: I have never met the authors or maintainers of grep. I'm sure they're all well balanced clean shaven beer and coffee drinkers.)

(Oh, and the released version of HexFiend will be slightly slower in this case, because of an overly large buffer that blows the cache. In other situations, the story is different, but more about those in a future post.)