Isis2 for Big Data: A very exciting new capability

Coordinator
Jan 2, 2013 at 8:56 PM
Edited Jan 2, 2013 at 8:59 PM

I've been doing this stuff for a long time and don't easily get excited about technology, but over the winter vacation I cooked up something genuinely cool for Isis2. This posting is aimed mainly at people familiar with LINQ, the .NET language feature for doing database-style operations on lists and key-value stores.

What I've done is to rework the Isis2 DHT so that it integrates with LINQ in a really slick way. There is also now a way to put large numbers of (key,value) pairs into the DHT in a single operation, and if you like you can request a totally ordered, virtually synchronous insert. You can then run parallel distributed queries that employ LINQ and subdivide the work among the group members, and all of this integrates with the Isis2 aggregation layer to offer ultra-scalable ways to collect the results. I've improved the performance of the Isis2 aggregation trees quite dramatically, so for really large groups you have a fast and highly parallel way to collect results. For smaller groups, you can of course just send the results straight back to the query initiator.
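For readers who want to see the shape of this pattern, here is a toy, single-process Python sketch of it: a bulk insert that spreads (key,value) pairs over members by hashing the key, a query that each member runs against its own shard, and a pairwise (tree-style) combine of the partial results. This is not the Isis2 API. `ToyDHT`, `put_many` and `query` are hypothetical names invented for illustration, the "members" are plain in-memory dictionaries, and nothing here provides ordering, replication or fault tolerance.

```python
import hashlib


class ToyDHT:
    """Illustrative stand-in for a sharded key-value store (not Isis2)."""

    def __init__(self, n_members):
        # One dictionary per simulated group member.
        self.shards = [dict() for _ in range(n_members)]

    def _owner(self, key):
        # Map a key to a member by hashing it (assumed placement rule).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put_many(self, pairs):
        # Bulk insert: every pair lands on the shard that owns its key.
        for key, value in pairs:
            self.shards[self._owner(key)][key] = value

    def query(self, local_query, combine):
        # Each member evaluates the query on its own shard only.
        partials = [local_query(shard) for shard in self.shards]
        # Combine partial results pairwise, level by level, like a tree.
        while len(partials) > 1:
            next_level = []
            for i in range(0, len(partials), 2):
                if i + 1 < len(partials):
                    next_level.append(combine(partials[i], partials[i + 1]))
                else:
                    next_level.append(partials[i])
            partials = next_level
        return partials[0]


dht = ToyDHT(n_members=4)
dht.put_many((f"user{i}", i) for i in range(100))

# Sum the even values: filter locally on each shard, then add the partials.
total = dht.query(
    lambda shard: sum(v for v in shard.values() if v % 2 == 0),
    lambda a, b: a + b,
)
```

The point of the tree-style combine is that no single member has to receive every partial result, which is why an aggregation layer matters for large groups; with only a handful of members, sending the partials directly to the initiator would do just as well.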

You can also now put entire files into the DHT: there is a simple way to associate a reader and a writer method with the DHT, so that if a large object shows up, you get to write the object to a temporary file and keep just a URL or a file name in the DHT itself.
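The reader/writer idea can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions, not the Isis2 interface: `SpillingStore`, the callback names and the size threshold are all hypothetical, and the store is a local dictionary rather than a distributed table.

```python
import os
import tempfile


def write_to_temp_file(data):
    # Writer hook: persist the large object and return its file name.
    fd, path = tempfile.mkstemp(prefix="dht-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def read_from_file(path):
    # Reader hook: load the object back from the stored file name.
    with open(path, "rb") as f:
        return f.read()


class SpillingStore:
    """Keeps small values inline and large ones on disk (illustrative)."""

    def __init__(self, writer, reader, threshold=1024):
        self.writer = writer
        self.reader = reader
        self.threshold = threshold  # hypothetical cutoff, in bytes
        self.table = {}

    def put(self, key, value):
        if len(value) >= self.threshold:
            # Large object: store only the file name in the table.
            self.table[key] = ("file", self.writer(value))
        else:
            self.table[key] = ("inline", value)

    def get(self, key):
        kind, payload = self.table[key]
        return self.reader(payload) if kind == "file" else payload


store = SpillingStore(write_to_temp_file, read_from_file)
store.put("small", b"hello")
store.put("big", b"x" * 5000)
```

Callers of `get` see the same bytes either way; only the table entry differs, holding a file name for `"big"` and the value itself for `"small"`.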

All of this is fault-tolerant, and in fact I've come up with a cute trick to minimize the costs associated with nodes failing or joining (churn).

Sounds like MapReduce?  Indeed it does, but this is a version of MapReduce for people who want to build an on-line service with data being updated dynamically, and need security and fault-tolerance and strong consistency.  With Isis2 you get all that and the code is even easy to read!  I think this is really cool and would love to help anyone who wants to be an early user of the technology.  Interestingly, it isn't a massive change to the system itself, so I think the solution is actually very stable and will be useful even at large scale from day one.

I plan to upload a new version of the V2.0.xxxx pre-release, and new documentation, later this week once I've finished my regression testing. With that, I would hope, Isis2 will be a very serious option for people building large-scale cloud services of a kind that up to now could only be tackled with fairly ugly iterative or interactive versions of Hadoop! Have at it, folks...