• My Least Favorite Rust Type September 20th, 2020

    My least favorite Rust type is std::ops::Range.

    Range is a foundational type, with magic syntax but a simple definition:

    struct Range<Idx> {
        /// The lower bound of the range (inclusive).
        pub start: Idx,
        /// The upper bound of the range (exclusive).
        pub end: Idx,
    }

    Idx is typically an integral type, but you can make a Range of anything, which will become important later. Here’s a range of Unit:

    ()..()

    (In case you were wondering, the Unit range does not contain Unit.)

    What’s wrong with Range? This will be a critique on Rust’s own terms.

    Range needs random-feeling borrows

    Any Rust tutorial will cover borrowing, lifetimes, ownership, etc. You think you understand, then this:

    (3..5).contains(&4) // why the &

    You need the borrow because you can’t like, OWN a number, man!

    Heh, no, it’s because, what if you wanted to make a Range<Vec> instead:

    let r = vec![1, 2, 3]..=vec![10, 11, 12];
    let v = r.contains(&vec![9, 9, 9, 9]);

    (try to guess what value v has above)

    Range<Vec> requires the borrow, so the vastly more common Range<usize> etc. forces it as well.

    Range needs random-feeling clones

    We’re not done picking on the borrow checker. Range makes you clone even when there’s no ownership transfer:

    let v = vec![1, 2, 3, 4];
    let r = 1..2;
    &v[r.clone()]; // required to avoid "use of moved value" in next line
    &v[r]

    Range could be Copy, but some Ranges are also iterators and you might copy one by mistake:

    let mut iter = 0..n;
    for i in iter { if i > 2 { break; } } // silent copy of iter
    iter.collect()

    so Ranges which are not used as iterators (many cannot be, like Range<f64> or Range<char>) pay the price, and so does any type which embeds a Range.

    This is abusing the borrow checker as a bad linter. Range undermines the story of lifetimes and ownership, making the borrow checker feel arbitrary.

    Range is unsure when it is valid

    Rust will happily construct a backwards Range, which ends before it starts:

    let x = 5..0;
    x.contains(&4); // false

    Is this backwards range valid? Its len() is 0, it contains() nothing, it yields nothing as an iterator. But if you try to use it to index into a slice, you get a panic! So it looks valid but is primed to explode!

    This is because - again - you can make a Range of anything. If you try to enforce that start <= end, you lose the ability to make a Range of non-comparable things. A Range of dyn Error can’t be invalid, so a Range of int never gets to be.

    A practical problem is writing correct bounds checks. For example, consider the get_unchecked function on slice - it says “an out-of-bounds index is undefined behavior” but never defines what out of bounds means. So how does one even call this function safely?

    Restated: a Range with start 0 and length 0 may be out of bounds. That’s wild.

    Range hides a footgun

    A regex engine (here’s one) often deals in ranges of char, for example /[a-z]/. Should it use Range<char>?

    No! The footgun is /[\u10FFFF]/, which is the largest char. Range<char> cannot represent this value, even though it has the bits to do so.

    Is RangeInclusive a better choice? This uses an additional bool field to mean…stuff, and it needs to be separately checked in several places. This is a silly expensive representation, pushing RangeInclusive<char> to 12 bytes even though it would fit in 8, with bits to spare. Not a good choice for a perf-sensitive regex algorithm.

    A Recipe for Rearranging Range

    The problem is Range is overloaded: it’s too flexible, it wants to be everything.

    • It’s sometimes a set
    • It’s sometimes an iterator
    • It’s sometimes a slice index
    • You can make silly Ranges out of anything

    These goals are in tension, and it meets none of them well.

    Perhaps it’s too late and we must live with this wart, but a recipe for a different approach:

    1. Limit Range to Copy + PartialOrd types. There may be occasional uses for Range<String> or Range<BigNum>, but it’s not worth forcing borrowing for the common case.

    2. Now that Range always knows how to compare its ends, enforce that start <= end at construction time. For Ranges constructed from constants, this will be free. This avoids the “looks valid, actually explodes” problem and will unlock further optimizations:

      • Range::is_empty() could be written naturally instead of a a bizarre way just for NaNs
      • [T]::get() would need to branch once, not twice
      • Range::len() could just be a sub and not require a branch
    3. Give Range an iter() method, like other collections have. Now Range is not an Iterator, just like Vec is not an Iterator, and Range can be easily made Copy. This does mean writing for n in (1..10).iter(), but Rust already requires that for collections, so it’s more consistent.

    4. Now that Range is not an Iterator, RangeInclusive can drop its extra bool. It would simply be a pair of indexes, and could not be empty (that’s what Swift does).

    Thanks for reading!

  • JavaScript's Tricky Rounding April 24th, 2018

    JavaScript rounds in a tricky way. It tricked all the engines, and even itself.

    Math.round() behaves the same as C’s familiar round with one key difference: it rounds halfways (“is biased”) towards positive infinity. Here is its spec in ES 5.1. It suggests an implementation too:

    The value of Math.round(x) is the same as the value of Math.floor(x+0.5)...

    And so every engine did that.

    Unfortunately, the implementation in the spec does not correctly implement the spec.

    One bug is that adding 0.5 can result in precision loss, so that a value less than .5 may round up to 1. Mac user? Try it yourself (Safari 11.0.3):

    > /System/Library/Frameworks/JavaScriptCore.framework/Versions/A/Resources/jsc
    >>> 0.499999999999999944 < .5
    true
    >>> Math.round(0.499999999999999944)
    1 
    One engine attempted to patch it by just checking for .5:
    double JSRound(double x) {
      if (0.5 > x && x <= -0.5) return copysign(0, x);
      return floor(x + 0.5);
    }
    However this fails on the other end: when x is large enough that fractional values can no longer be represented, x + 0.5 rounds up to x + 1, so JSRounding a large integer like Math.pow(2, 52) would actually increment it.

    What's a correct implementation? SpiderMonkey checks on the high end, and exploits the loss of precision on the low end:

    double math_round_impl(double x) {
       if (ExponentComponent(x) >= 52) return x;
       double delta = (x >= 0 ? GetBiggestNumberLessThan(0.5) : 0.5);
       return copysign(floor(x + delta));
    }
    fish's attempt just checks high and low:
    static const double kIntegerThreshold = 1LLU << 52;
    double jsround(double x) {
      double absx = fabs(x);
      if (absx > kIntegerThreshold) {
        // x is already integral
        return x;
      } else if (absx < 0.5) {
        // x may suffer precision loss when adding 0.5
        // round to +/- 0
        return copysign(0, x);
      } else {
        // normal rounding.
        // ensure negative values stay negative.
        return copysign(floor(x + 0.5), x);
      }
    }
    which produces surprisingly pleasant assembly, due to the compiler's fabs() and copysign() intrinsics.

    Update: Steve Canon finds a floor-based approach, which surely must be optimal or close to it:

    double roundjs(double x) {
      double provisional = floor(x);
      double fraction = x - provisional;
      return copysign(provisional + (fraction >= 0.5), x);
    }

    The ES6 spec sheepishly no longer suggests an implementation, it just disavows one:

    Math.round(x) may also differ from the value of Math.floor(x+0.5) because of internal rounding when computing x+0.5...

    JavaScript presumably rounds this odd way to match Java, and so the only engine to get it right out of the gate is Rhino, which simply calls back to Java's Math.round. Amusingly Oracle fell into the same trap with Rhino's successor Nashorn. Round and round we go!

  • Schrödinger? I hardly know her! September 8th, 2016

    At very small scales, particles are described by wavefunctions that obey the Schrödinger Equation. What do wavefunctions look like?

    The Wavefiz is a nifty visualizer that draws them! It's real physics: we're solving the Schrödinger Equation in real time with arbitrary potentials. But it's also just plain fun to play with!

    There's some non-mathy exercises to do too. Have you heard of the ground state energy or quantum tunnelling? Those pop right out - you can see them visualized.

    The visualizer was built using three.js and TypeScript. You can pitch in here on GitHub. And if you like quantum physics, and are near Silicon Valley, come meetup to learn quantum mechanics with us!

    Surf over to the Wavefiz to see it in action!

  • The One Second Dash August 15th, 2016

    The Amazon Dash is a $5 WiFi button that summons a truck to deliver you water or other stuff. Want your Dash to do something else? The popular approach is to sniff its ARP requests. This requires that Dash connect to your network, putting you perilously close to having some DUDE delivered with your IoT mood lighting.

    A more immediate problem is immediacy, or lack thereof: the Dash button only connects to your network after being pressed, so there's a ~5 second delay before anything can happen! This makes the ARP Dash hack unsuitable for interactive uses, like doorbells.

    Can we make it faster? Here's one way:

    1. "Setup" the Dash with a unique network SSID for a network that doesn't exist
    2. Use a WiFi adapter in monitor mode to observe probe requests on that network SSID

    This responds in < 1 second, which is fast enough for real time uses. And you don't even have to give the thing your password.

    A Raspberry Pi works when equipped with a WiFi adapter capable of monitoring mode. The RT5370 chipset is so capable - here's the one fish bought. Steer clear of the ubiquitous Realtek RTL8188CUS based devices.

    Head on over to the One Second Dash repo to get started!

  • fish shell 2.0 May 17th, 2013

    fish 2.0 is now released! fish is a fully-equipped command line shell (like bash or zsh) that is smart and user-friendly. fish supports powerful features like syntax highlighting, autosuggestions, and tab completions that just work, with nothing to learn or configure.

    Go get it:

       http://fishshell.com

    This marks the first release of fish in over four years, and includes many new features, fixes, and optimizations. See the release notes for a partial list of what's new.

    A big debt of gratitude to everyone who contributed to this release, including:

    Thank you for sharing your time, code, and ideas!

    P.S. Discuss fish in the #fish IRC channel in irc.oftc.net, or use the web chat (enter fish as the channel).

  • More Posts