Recent Changes - Search:


Users


Developers

edit SideBar

FileSaving

Because Hex Fiend is designed to work with large files, saving a file can be expensive. The goals for file saving are therefore twofold:

  • Avoid overwriting parts of the file that have not changed, and avoid copying data unnecessarily.
  • Provide excellent feedback to the user - accurate estimates, early failure detection, progress indication, and non-modal, cancellable saves.

Essentially, to achieve efficient in-place saves, we topologically sort our ByteArrayPieces according to whether their destination (where they are going to be written in the file) overlaps the source (the file backing of the associated ByteSlice) of the compared piece. If piece A's destination overlaps piece B's source, we write piece B first, so that we no longer need the bytes that piece B references and we can overwrite them with piece A's data. This technique allows us to avoid constructing the entire file elsewhere first, which may not be possible if the device has insufficient space. It also saves time, especially if most of the changes are in-place or occur near the end of the file.

Sometimes the dependency graph may have cycles. Imagine opening a 10 GB file, cutting the first half, and then pasting it at the end - that is, swapping the first and second halves of the file. Each half would then depend on their other half; if we begin copying either source half to its destination, we would be overwriting bytes that still needed to be written to their own destination. To support saving files with dependencies, we can "break" a cycle, which means copying out the source data of one of the ByteSlices so that the destination can write to it. We can break a cycle either by copying data into memory or copying it into a private file on disk; we copy to memory unless the size of the ByteSlice is larger than MAXIMUM_IN_MEMORY_CYCLE_BREAKING_STRATEGY_SIZE.

The details of this algorithm are discussed extensively in comments in ByteArrayFileWriting.m.

There is room for improvement here. We can support better cycle breaking strategies - writing backwards (for data that overlaps itself), or reading/writing in chunks instead of copying entire ByteSlices to private files.

Of course, if the file is not being overwritten (e.g. Save As...), then there are no dependencies or cycles, and the algorithm is correct but degenerate.

The topological sort produces a sequence of steps for writing out the file: break this cycle, copy this data from disk to this location, copy this data from memory to that location, etc. We can exploit this sequence of steps to provide excellent single-bar progress indication. Reading or writing a byte from disk is considered to have a cost of 1; we add up the total cost of breaking each cycle and writing each ByteSlice to determine the total cost, and then track the number of bytes read or written from disk to provide progress feedback.

If the file is to be made larger, we immediately ftruncate() the file to the new size, to detect out-of-space errors immediately (and before the file has been overwritten at all). Cycle breaking occurs as the next step, so that we can detect out of disk-space conditions early and before the bytes of the file are further touched. The file saving occurs in a separate thread with a document-modal sheet, allowing the user to continue working on other documents. The thread makes periodic checks to a "cancelled" flag to end the saving cleanly.

The actual saving is done in DocumentSaveSession.m

Edit - History - Print - Recent Changes - Search
Page last modified on May 03, 2007, at 07:09 PM