New out-of-band memory-mapped file copying tool is looking fantastic!

Aug 9, 2013 at 7:49 PM
Edited Aug 9, 2013 at 7:51 PM
A heads up on a new Isis2 feature that I think will really be useful. I have it working now and am doing regression testing to make sure I didn't break anything in the process. I should have an alpha release up in a week or so.

The feature that has me so excited is a new memory-mapped file copying and "shuffling" tool. I've been working on it all summer and it finally seems to be solid, stable, and fast as heck. The basic idea is that you have a cluster of machines and are running, say, Hadoop. As part of your computation you generate intermediate results and store them as memory-mapped files (this is popular lately). With the new tool, you can declare these to Isis2 and then ask the system to move them around: maybe you had a copy at A and want copies added on B..F. Or you want to move some other file outright: it was at D and you want it now at H and J. The number of copies can be large and the files can be huge.

Isis2 will carry out these file movement actions concurrently and at very high speed, using a form of multicast that is reliable and (if desired) secure. My early tests suggest that the code is able to peg even a large cluster network at near 100% of capacity, with very low overheads. Moreover, it handles very high levels of concurrency: lots of files, lots of requests, lots of bytes to move. Compared with BitTorrent, which is how many cluster systems solve this problem now, I think my new stuff will be vastly faster.
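
To make the "one copy at A, more copies at B..F" idea concrete, here is a minimal single-machine sketch in Python. It is not Isis2 code and does not use the Isis2 API: the function name `replicate`, the `CHUNK_SIZE` value, and the idea of standing in for remote nodes with local destination files are all assumptions made for illustration. It only shows the general shape of the job: map the source, read each chunk once, and deliver that chunk to every destination mapping.

```python
import contextlib
import mmap
import os

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for illustration


def replicate(src_path, dst_paths, chunk_size=CHUNK_SIZE):
    """Copy one memory-mapped file to several memory-mapped destinations.

    Each chunk is read from the source once and written to every
    destination, loosely mimicking a one-to-many (multicast-style) send.
    Returns the number of chunks moved.
    """
    size = os.path.getsize(src_path)
    if size == 0:
        raise ValueError("cannot memory-map an empty file")

    with contextlib.ExitStack() as stack:
        src_file = stack.enter_context(open(src_path, "rb"))
        src = stack.enter_context(
            mmap.mmap(src_file.fileno(), size, access=mmap.ACCESS_READ))

        dsts = []
        for path in dst_paths:
            # A mapping needs a file that already has its final size.
            with open(path, "wb") as f:
                f.truncate(size)
            dst_file = stack.enter_context(open(path, "r+b"))
            dsts.append(stack.enter_context(mmap.mmap(dst_file.fileno(), size)))

        chunks = 0
        for off in range(0, size, chunk_size):
            block = src[off:off + chunk_size]  # the last block may be shorter
            for dst in dsts:
                dst[off:off + len(block)] = block
            chunks += 1

        for dst in dsts:
            dst.flush()  # push the mapped pages out to the backing files
    return chunks
```

In a real cluster the inner loop over destinations would be a network send rather than a local write, and the reliability, security, and concurrency described above are exactly the parts this sketch leaves out.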

Because many people aren't wild about using the Isis2 library, I'm also adding a "daemon" that will use the library to support these memory-mapped file moving tasks, but be accessible via RPC or command-line. Thus you'll have the option of running the daemon on your nodes and then talking to it from any language you like (C, MPI, whatever), or even from scripts.

I'm not currently replicating persistent files but it wouldn't be a hard thing to add. I just decided to focus first on the fastest possible scenario and then work my way from that to other file management cases.
Aug 20, 2013 at 1:43 PM
The delay in moving this from beta status to a general release comes from two issues. First, I've had surprising problems getting it to work on Linux/Mono, stemming from incompatibilities in the way Mono implements the associated .NET functionality (the MemoryMappedFile class and asynchronous I/O). I'll get this working, but along the way I may need to fix some Mono problems.

The other issue is that Heesung convinced me that something is running slower than it should in this version. It looks like the problem is with Query calls that wait for ALL replies, which seem to be about 25% slower than they should be. I've done work to speed that layer up, so it sort of surprises me that it would be running slower, but obviously bugs are like that. I'm sure I'll figure this out, probably fairly quickly. My hope is that the slowdown shows up on Windows platforms too (Heesung sees it on Linux).
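
For readers unfamiliar with the pattern, the following Python sketch shows what "a query that waits for all replies" means in general, and why its latency is set by the slowest replier. It is not the Isis2 Query API: `make_member`, `query`, and the `want_all` flag are hypothetical names, and group members are simulated with threads that sleep before replying.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_member(name, delay):
    """Return a simulated group member that replies after `delay` seconds."""
    def handle(request):
        time.sleep(delay)
        return (name, request)
    return handle


def query(members, request, want_all=True):
    """Send `request` to every member and collect replies.

    want_all=True  : wait for every member (latency = slowest member).
    want_all=False : return as soon as the first reply arrives.
    """
    pool = ThreadPoolExecutor(max_workers=len(members))
    futures = [pool.submit(member, request) for member in members]
    try:
        if want_all:
            return [f.result() for f in futures]
        return [next(as_completed(futures)).result()]
    finally:
        # Do not block on stragglers; their threads finish in the background.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    members = [make_member("a", 0.01), make_member("b", 0.3)]
    start = time.perf_counter()
    replies = query(members, "ping", want_all=True)
    print(len(replies), "replies in", round(time.perf_counter() - start, 2), "s")
```

Timing a wait-for-all query against a wait-for-first query on the same group is one simple way to see whether a slowdown lives in the reply-collection path or elsewhere.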

I'm not aware of other issues and the beta version is there to play with. If you run into any problems, do make sure to let me know. Of course Nick has posted various coding suggestions on the issues tracker too... we'll tackle those but with slightly less urgency.