Friday, September 22, 2006

Writing an embedded file system

Once upon a time, Julian M. Bucknall wrote an interesting article for The Delphi Magazine. Well, he wrote more than one and they are all interesting and (that's the best part) he's still writing them, but I particularly liked that one.

The article is titled Manage The Damage (it had to be Julian's Jimmy Somerville phase) and was published in Issue 69 (May 2001). Yes, that's quite old stuff, at least in computing terms. In it, Julian described an implementation of an embedded file system, i.e. a file system that lives inside another storage medium (a file in an ordinary file system, for example). That's such a useful concept that even Microsoft recognizes it (think OLE structured storage).

In short, an embedded file system (or compound file, or structured storage) allows you to store many unconnected or loosely connected pieces of information inside one physical file in a structured way. For example, it can store different parts of the configuration settings for a large application, it can be used in data acquisition to store many data packets inside one file, or it can serve as a simple database for text fragments or code snippets.

Although I liked the idea behind Manage The Damage, I couldn't live with the implementation. It was too limited for my taste (the embedded file system was limited to 32 MB), and the implementation was a mess of pointers, which made the code almost impossible to port to .NET.

And then, as all fairy tales start, I decided to write my own implementation.

Why not use OLE structured storage, you may ask? Well, I like to know how my data is stored (have you ever tried to look inside an OLE structured storage file with a hex editor? I did. Not a pretty sight.), and I want the implementation to be fast and simple to use from Win32 and potentially .NET. Besides that, it sounded like an interesting challenge.

So how did I fare? Pretty well, if you ask me. There were no pointers killed while writing the code, the total size of the file system is limited only to 4 TB (each file stored inside the compound file is limited to 4 GB), and the file internals are easy to understand (well, at least to me ;) ).
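Where might these two limits come from? The post doesn't say. The arithmetic below therefore rests on an assumption that is not stated here: that the storage is divided into 1 KB blocks addressed by 32-bit block numbers, and that each file's size is kept in a 32-bit unsigned field. Under that assumption, with TB and GB in binary units, the numbers work out as follows:

```latex
\begin{aligned}
\text{max. storage size} &= 2^{32}\ \text{blocks} \times 2^{10}\ \tfrac{\text{bytes}}{\text{block}} = 2^{42}\ \text{bytes} = 4 \times 2^{40}\ \text{bytes} = 4\ \text{TB} \\
\text{max. file size} &= 2^{32}\ \text{bytes} = 4 \times 2^{30}\ \text{bytes} = 4\ \text{GB}
\end{aligned}
```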

The code has been used in some commercial projects. GExperts also uses it for snippet storage (the CodeLibrarian.fs file). It seems to be stable and mostly bug-free, so I'm releasing it to the public, with the usual strings attached.

For the brave, here's the code and test suite: GpStructuredStorage.

If you're still interested, read on. I'll show a few examples of how GpStructuredStorage can be used.

A small how-to

The following fragment creates a compound file and then reopens it for reading. Note that the compound file is implemented as an interface and, as such, doesn't need explicit destructor calls.

Instead of a file name, you can also pass a TStream (or a descendant) to the Initialize method.

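program StorageDemo; // hypothetical demo program
uses Classes, SysUtils, GpStructuredStorage;
const CStorageFile = 'storage.gpfs'; // hypothetical file name
var
  storage: IGpStructuredStorage;
begin
  storage := CreateStructuredStorage;
  storage.Initialize(CStorageFile, fmCreate);
  // write and read here
  // assigning a new instance releases the previous one (and closes its file)
  storage := CreateStructuredStorage;
  storage.Initialize(CStorageFile, fmOpenRead);
  // from now on, only reading is allowed
end.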

Now that we have the storage interface, we can create a file and then read it.

var
  strFile: TStream;
begin
  // OpenFile returns an ordinary TStream; the caller is responsible for freeing it
  strFile := storage.OpenFile('/folder/file.dat', fmCreate);
  try
    // write to strFile
  finally FreeAndNil(strFile); end;
  strFile := storage.OpenFile('/folder/file.dat', fmOpenRead);
  try
    // read from strFile
  finally FreeAndNil(strFile); end;
end;

There is no need to create /folder manually: every file access automagically creates all required folders.


Still, you are free to do it the old way.

storage.CreateFolder('/folder/subfolder');
if not storage.FolderExists('/folder/subfolder') then
  raise Exception.Create('Could not create /folder/subfolder'); // panic

Of course, there is also a FileExists function.


File enumeration is simplified to the max.

var
  files: TStringList;
begin
  files := TStringList.Create;
  try
    storage.FileNames('/folder', files);
    // files now holds the names of the files stored in /folder
  finally FreeAndNil(files); end;
end;

(To enumerate folders, one would use FolderNames instead of FileNames.)


Additional information about a file or folder can be accessed via the FileInfo property:

storage.FileInfo['/full/path/to/file.dat']


Currently, FileInfo exposes only the file's size (FileInfo[].Size) and its attributes (FileInfo[].Attribute).


Attributes offer a way to store additional string information for each file and folder. An unlimited number of attributes can be stored; the only limitation is that both the attribute name and its value must be strings.


storage.FileInfo['/folder/file.dat'].Attribute['author'] := 'Gp';

Finally, I should mention that you can also Move and Delete files and folders, and Compact (defragment) the file system.


If I have convinced you, go and use the stuff. If not, wait for the next episode.


Next in the embedded file system series




  • Internals of a GpStructuredStorage file
  • Total Commander plugin to look into GpStructuredStorage files.


Coming soon to a feed near you.

10 comments:

  1. Anonymous 16:18

    Super.

    Looks very interesting. I needed something like that a lot. Now will think on where can I use that.

    .Net functionality is a huge bonus.

    Thank you

  2. Anonymous 23:34

    Looks really great. How about the possibility to pack/zip/zlib the whole file? Or would that be too slow?

  3. Compression/encryption on a storage file level doesn't really fit into the concept at the moment.

    I would recommend simply turning on compression if the file is stored on an NTFS volume.

    1. Maybe you have plans to add encryption for the whole storage?

  4. Anonymous 04:47

    Very good! And you can make it better by adding compression/encryption options. Good job!

    I'd like to suggest another solution: use ZipMaster. You can first write any data to a TStream or similar, and then add it to a .zip archive by calling ZipMaster.AddStreamToFile. In the end you get a standard .zip archive with any files in it, and, most importantly, it's free and open source too. However, you will need an extra DLL.


  5. Anonymous 05:34

    This is great! Just what I'm looking for. Question. Maybe I need sleep but I don't see a way to insert a file into the fs from the hard disk and then extract files from the filesystem and save them to the hard drive outside the fs.

    What am I missing?

  6. Open the external file with TFileStream (or use TFileStream to create one), then use stream.CopyFrom(otherStream, 0) to copy data in either direction (a count of 0 copies the whole of otherStream).

  7. zeuspelink 13:36

    Hi,
    I'm very interested in how the component works, but the code is a bit too complex for me to understand.
    Can you please explain in a few steps how it works (I'm also interested in how it deletes data from the file) and why it can only go up to 4 TB? (Don't get me wrong, I'm just curious; 4 TB is more than enough at this time...)
    Thank you very much.
    I would appreciate it if you could mail me this info at zeuspelink*at*yahoo*dot*com
    Great job man, keep up the good work!

  8. Dmitry 22:30

    Hi,
    Just got a real problem with the storage. I had a data store smaller than 2 GB without any issues. As it approached 2 GB, there were constant errors, even the storage header being overwritten and all data being lost.
    Right now I have such an error with a file length of 2 148 086 223 bytes. Such critical errors happen, of course, because I ignore previous errors when saving data and try to save again.
    The main question is: is it a bug or a limitation? Maybe you still have an Integer somewhere instead of an Int64?

  9. That's a limitation of the current implementation.
