V2.2.1810 of the system (which is still called Isis2)

Jul 9, 2014 at 12:19 AM
Edited Jul 12, 2014 at 5:27 PM
V2.2.xxxx is a beta, but it is working really well. Most of the work has been on bug fixes related to the OOB code, especially on InfiniBand. OOB is faster (and we'll do even better by the end of summer) and increasingly flexible. Weijia Song is integrating this into a file system and also extending Kaveri's command-line tool, so those options will be "real" in V2.2.xxxx soon too.
Jul 10, 2014 at 11:02 PM
Edited Jul 12, 2014 at 3:25 PM
Uploaded V2.2.1801. This fixes a problem Nick Lowe reported a while back, but that I didn't get a chance to look at until this week. In summary, he found that if he repeatedly crashed and restarted ORACLE members, eventually the membership service would hang. I was able to reproduce his bug and tracked it down to a race condition, which I think is now fixed. I doubt that anyone else has ever seen this issue.

At the same time I added yet another new parameter. A while ago I added ISIS_IGNOREPARTITIONS=false. If true, the ORACLE keeps running even when a partition failure causes it to split into two; if false, it shuts down in that situation. But I was finding it annoying that in an ORACLE with 2 members, killing one caused the other to shut down. So I have now added ISIS_IGNORESMALLPARTITIONS=true. In this specific case (had 2, 1 failed) Isis will keep running even though, technically, this could be a partitioning event. Set ISIS_IGNORESMALLPARTITIONS=false for the old behavior.
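As a rough illustration of how the two parameters interact, here is a small Python sketch of the decision described above. It is a model, not the Isis2 code: the function name and arguments are hypothetical, and the test for a "possible partition" (the survivors are not a strict majority of the previous membership) is my assumption, since the rule Isis2 actually applies is not spelled out here.

```python
def oracle_keeps_running(previous_members, surviving_members,
                         ignore_partitions=False,
                         ignore_small_partitions=True):
    """Hypothetical model of the ORACLE's choice after members fail.

    The keyword defaults mirror ISIS_IGNOREPARTITIONS=false and
    ISIS_IGNORESMALLPARTITIONS=true as described in the post.
    """
    # Assumption: a failure counts as a possible partition when the
    # survivors are not a strict majority of the previous membership.
    possible_partition = 2 * surviving_members <= previous_members
    if not possible_partition:
        return True
    # ISIS_IGNOREPARTITIONS=true: keep running regardless.
    if ignore_partitions:
        return True
    # ISIS_IGNORESMALLPARTITIONS=true covers only the "had 2, 1 failed" case.
    if ignore_small_partitions and previous_members == 2 and surviving_members == 1:
        return True
    # Otherwise the ORACLE shuts down.
    return False
```

With the defaults, oracle_keeps_running(2, 1) is True, and passing ignore_small_partitions=False restores the old behavior in which the remaining member shuts down.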

You should read the documentation to learn more about these, and also to learn how to set the various OOB transfer configuration parameters properly (ISIS_UNICAST_ONLY, ISIS_INFINIBAND, ISIS_FASTETHER, ISIS_USERDMA). In the RDMA cases you will often also need to set ISIS_HOSTS and ISIS_NETWORK_INTERFACES to tell the system which network interface to use.

ib.dll is only needed by people who plan to work with ISIS_USERDMA=true together with ISIS_INFINIBAND or ISIS_FASTETHER. Everyone else can just ignore the file: you won't need to download it, and Isis won't complain that it can't find it.
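The ib.dll rule can be written as a one-line check. This is a sketch under two assumptions: that the parameters are available as "true"/"false" strings in a dictionary (for example a copy of the process environment), and that the rule reads as ISIS_USERDMA combined with either of the two transports. The helper name needs_ib_dll is hypothetical and is not part of Isis2.

```python
def needs_ib_dll(settings):
    """Return True if this configuration would require ib.dll.

    `settings` maps parameter names to "true"/"false" strings;
    a missing parameter is treated as "false" (an assumption).
    """
    def flag(name):
        return settings.get(name, "false").strip().lower() == "true"

    # Assumed reading: USERDMA and (INFINIBAND or FASTETHER).
    return flag("ISIS_USERDMA") and (flag("ISIS_INFINIBAND") or flag("ISIS_FASTETHER"))
```

For example, needs_ib_dll({"ISIS_USERDMA": "true", "ISIS_INFINIBAND": "true"}) returns True, while a configuration that sets only ISIS_INFINIBAND returns False.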
Jul 12, 2014 at 3:23 PM
Updated the documentation and uploaded the revised user manual and the revised .chm file. For some reason SandCastle didn't rebuild the .chw file; I need to look into that. Let me know if this causes an issue for anyone.
Jul 12, 2014 at 5:27 PM
Updated to V2.2.1806. This fixes a small bug in the INFINIBAND version of the RDMA OOB transfer that arose (only) if a member failed midway through the transfer. This is not something many people would encounter.
Jul 12, 2014 at 8:41 PM
V2.2.1807. This removes a debug print statement I neglected to comment out, and also fixes (I think) a small bug Weijia found in which the completion upcall wasn't occurring in OOBReReplicate if one or more of the target members were actually on the same node as the initiator.
Jul 13, 2014 at 7:03 PM
Edited Jul 13, 2014 at 7:54 PM
V2.2.1809. This fixes another very minor thing: Weijia found a situation in which the OOB code was unable to create a memory-mapped file "view accessor" and fixed it. It only arose on Mono, and only if you ran multiple processes on the same Linux host and then did an OOBReReplicate that included those processes as replica targets.
Aug 9, 2014 at 8:45 PM
V2.2.1810. Theo was working with the system on Mono and noticed 3 lines that wouldn't compile without a minor edit, as well as one reversion that "unfixed" an old bug fix impacting the group MultiJoin API. These 4 lines have now been fixed.