<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/1.5" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Barrier</title>
	<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/</link>
	<description>serious code</description>
	<pubDate>Wed, 20 Aug 2008 12:36:03 +0000</pubDate>
	<generator>http://wordpress.org/?v=1.5</generator>

	<item>
		<title>by: scott lewis</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1929</link>
		<pubDate>Sun, 18 Feb 2007 00:13:03 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1929</guid>
					<description>x86-64's strong ordering was a practical rather than a philosophical consideration.

Basically, AMD found themselves between a rock and a hard place. They had based their company around producing very good IA32-compliant processors. And now Intel and IBM were moving ahead into 64-bit computing with entirely new architectures. Heavily patented new architectures. AMD has licenses to produce Intel's 80486, 80386, etc. CPUs. They don't have licenses for the Pentium line. The Athlon line was AMD moving out of the role of secondary manufacturer and into the role of designer by producing their own superscalar implementations of the IA32 ISA.

However, since they have no licenses for them, and are unlikely to get licenses, a competitive AMD-produced IA64- or PowerPC-compatible processor would require designing completely original processors that are both fully compatible and better performing than the offerings from Intel or IBM. Which would be both very difficult and very expensive to do.

Hence, AMD opted to extend the life of the x86 architecture by extending it to include 64-bit addressing. They attempted to do that with the fewest possible changes, the basic premise being that fewer changes in the ISA would mean fewer changes required in compilers and operating systems to support the new ISA, and that fewer changes would give them the largest amount of backwards compatibility with x86-32 'for free'.

Which is why x86-64 is strongly-ordered, has a small number of registers, and all the other x86 evils programmers taught MIPS and PowerPC assembly in school complain about...</description>
		<content:encoded><![CDATA[	<p>x86-64&#8217;s strong ordering was a practical rather than a philosophical consideration.</p>
	<p>Basically, AMD found themselves between a rock and a hard place. They had based their company around producing very good IA32-compliant processors. And now Intel and IBM were moving ahead into 64-bit computing with entirely new architectures. Heavily patented new architectures. AMD has licenses to produce Intel&#8217;s 80486, 80386, etc. CPUs. They don&#8217;t have licenses for the Pentium line. The Athlon line was AMD moving out of the role of secondary manufacturer and into the role of designer by producing their own superscalar implementations of the IA32 ISA.</p>
	<p>However, since they have no licenses for them, and are unlikely to get licenses, a competitive AMD-produced IA64- or PowerPC-compatible processor would require designing completely original processors that are both fully compatible and better performing than the offerings from Intel or IBM. Which would be both very difficult and very expensive to do.</p>
	<p>Hence, AMD opted to extend the life of the x86 architecture by extending it to include 64-bit addressing. They attempted to do that with the fewest possible changes, the basic premise being that fewer changes in the ISA would mean fewer changes required in compilers and operating systems to support the new ISA, and that fewer changes would give them the largest amount of backwards compatibility with x86-32 &#8216;for free&#8217;.</p>
	<p>Which is why x86-64 is strongly-ordered, has a small number of registers, and all the other x86 evils programmers taught MIPS and PowerPC assembly in school complain about...
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Vincent Bernardi</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1930</link>
		<pubDate>Sun, 18 Feb 2007 04:46:52 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1930</guid>
					<description>Thanks for a very interesting post! An article as well written as this one deserves a lot of publicity. You should make a pdf version of it; it would make a very good teaching resource.

Thanks again,
V.</description>
		<content:encoded><![CDATA[	<p>Thanks for a very interesting post! An article as well written as this one deserves a lot of publicity. You should make a pdf version of it; it would make a very good teaching resource.</p>
	<p>Thanks again,<br />
V.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Soeren</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1932</link>
		<pubDate>Sun, 18 Feb 2007 06:06:06 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1932</guid>
					<description>Great post (as always). Interesting and entertaining.
Thanks!</description>
		<content:encoded><![CDATA[	<p>Great post (as always). Interesting and entertaining.<br />
Thanks!
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Jens Ayton</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1933</link>
		<pubDate>Sun, 18 Feb 2007 07:19:37 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1933</guid>
					<description>(Wow, this comment text is pretty small. The usual web designer’s false assumption that I lied when I told the browser what size text I want, taken to extremes. Bah, humbug.)

Nitpicks:
shouldn’t “Processor 0 is writing variable2 after variable1, so we’ll insert a memory barrier to prevent that:” say “…to ensure that…”?

The link “usual solution” [http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html] is broken.

The problem you ran into under “Our Enemy the Compiler” is an important one for test cases and benchmarks in general. I remember fondly the time someone who shall remain anonymous (because I can’t be bothered to dig through IRC logs) “proved” that long doubles are as fast as doubles on PowerPCs. His project produced similar assembly.</description>
		<content:encoded><![CDATA[	<p>(Wow, this comment text is pretty small. The usual web designer’s false assumption that I lied when I told the browser what size text I want, taken to extremes. Bah, humbug.)</p>
	<p>Nitpicks:<br />
shouldn’t “Processor 0 is writing variable2 after variable1, so we’ll insert a memory barrier to prevent that:” say “…to ensure that…”?</p>
	<p>The link “usual solution” [http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html] is broken.</p>
	<p>The problem you ran into under “Our Enemy the Compiler” is an important one for test cases and benchmarks in general. I remember fondly the time someone who shall remain anonymous (because I can’t be bothered to dig through IRC logs) “proved” that long doubles are as fast as doubles on PowerPCs. His project produced similar assembly.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Remco</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1934</link>
		<pubDate>Sun, 18 Feb 2007 08:26:02 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1934</guid>
					<description>Thanks for writing this piece. I enjoyed it and learned a lot.</description>
		<content:encoded><![CDATA[	<p>Thanks for writing this piece. I enjoyed it and learned a lot.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Joshua Haberman</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1935</link>
		<pubDate>Sun, 18 Feb 2007 08:38:58 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1935</guid>
					<description>You missed one important point. When you introduced your memory barrier, the construct you used (__asm__ volatile) also instructed gcc not to optimize loads or stores across that barrier. In the absence of that kind of barrier (which is sometimes called an &quot;optimization barrier&quot;) the compiler is free to reorder loads and stores, as long as the apparent behavior is the same, which can thwart your attempts to write lock-free threads.

http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Extended-Asm.html</description>
		<content:encoded><![CDATA[	<p>You missed one important point. When you introduced your memory barrier, the construct you used (__asm__ volatile) also instructed gcc not to optimize loads or stores across that barrier. In the absence of that kind of barrier (which is sometimes called an &#8220;optimization barrier&#8221;) the compiler is free to reorder loads and stores, as long as the apparent behavior is the same, which can thwart your attempts to write lock-free threads.</p>
	<p><a href='http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Extended-Asm.html' rel='nofollow'>http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Extended-Asm.html</a>
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Remco</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1936</link>
		<pubDate>Sun, 18 Feb 2007 08:57:15 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1936</guid>
					<description>While I'm at it, I have a question: are there high-level computer languages around that deal with multiple cores implicitly and can generate optimized, multi-threaded code?</description>
		<content:encoded><![CDATA[	<p>While I&#8217;m at it, I have a question: are there high-level computer languages around that deal with multiple cores implicitly and can generate optimized, multi-threaded code?
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Colin Barrett</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1937</link>
		<pubDate>Sun, 18 Feb 2007 09:12:17 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1937</guid>
					<description>Ugh, threads make my head spin (excuse the pun).

I'm not entirely sure, but I believe something like Io, with its actor model, avoids a number of these strange memory problems.</description>
		<content:encoded><![CDATA[	<p>Ugh, threads make my head spin (excuse the pun).</p>
	<p>I&#8217;m not entirely sure, but I believe something like Io, with its actor model, avoids a number of these strange memory problems.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Wincent Colaiuta</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1938</link>
		<pubDate>Sun, 18 Feb 2007 12:49:37 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1938</guid>
					<description>Great article and a very interesting read. I think that how memory barriers work and why we need them can be very difficult for one to wrap one's head around; at some point we all end up writing an article about them to help us sort it out. &lt;a href=&quot;http://wincent.com/a/about/wincent/weblog/archives/2006/08/doublechecked_l.php&quot; rel=&quot;nofollow&quot;&gt;This one is mine&lt;/a&gt;.</description>
		<content:encoded><![CDATA[	<p>Great article and a very interesting read. I think that how memory barriers work and why we need them can be very difficult for one to wrap one&#8217;s head around; at some point we all end up writing an article about them to help us sort it out. <a href="http://wincent.com/a/about/wincent/weblog/archives/2006/08/doublechecked_l.php" rel="nofollow">This one is mine</a>.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: E</title>
		<link>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1939</link>
		<pubDate>Sun, 18 Feb 2007 12:56:42 +0000</pubDate>
		<guid>http://ridiculousfish.com/blog/archives/2007/02/17/barrier/#comment-1939</guid>
					<description>In the last example, could you just use an atomic compare &amp;amp; swap function rather than the memory barrier?</description>
		<content:encoded><![CDATA[	<p>In the last example, could you just use an atomic compare &amp; swap function rather than the memory barrier?
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
