Starting work on a new native C++ version of Isis2

Dec 2, 2014 at 1:41 PM
I'm starting to port the C# library to native C++ and thought I might see if there are any suggestions that users of the current version of the system would like to make.

As you know, the current C# library can be used from C++/CLI by linking against .NET or Mono, but of course you end up with managed code that does garbage collection whenever it is in the mood to do so, etc. A native C++ implementation of the library would try and preserve the current interfaces to the degree possible while eliminating any dependence on .NET and Mono, would compile down to executable code rather than .NET byte code, and would manage its own memory, so garbage collection would vanish as an overhead.

The main issue I'm seeing is that so much of the Isis2 API is polymorphic and reflection-driven. But I'm exploring some of the tricks that others who have gone down this route for C++ have tried to kind of hack around the very limited C++ type system.

For example, and this is something you folks could chime in on and help me with:

1) Consider an Isis2 call like g.Send(x, y, z). Isis2 reflects on the objects x, y and z to learn their types and uses that type information in various ways -- when it creates messages, to recognize the end of the argument list and the start of the reply lists in a Query, etc. In C++ that type information is not (in general) available within a fully polymorphic, variable-args method like Send (internally, Send just expects a series of things of type Object and uses reflection to deduce the actual types you passed in). So what's the best option for addressing this?

2) When Isis2 calls one of your event handlers, such as a registered message handler, we pass pointers from the library to your code. In C# this works because of managed memory. But in C++ this means that the library will need to allocate objects and then pass them to you. Who should then be responsible for freeing these objects? If you hold onto one of those pointers and then I free the associated object, your pointer could be pointing to anything. If I assume you'll free these objects, then memory leaks are a near certainty because people will forget to do so. The original Isis library had a kind of reference counting scheme it used, and one option is to go down that route again. Another is to provide a method for you to "take ownership" of an object.

3) I do some incredibly elaborate reflection-based upcall magic in the aggregation layer of the system and some moderately fancy reflection-based stuff in the DHT. Unclear whether or not that can be made to work in C++.
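
On point 2, the reference-counting route can be sketched with the C++11 std::shared_ptr. This is only an illustration under assumptions: Message, Dispatcher, RegisterHandler and Deliver are hypothetical names invented here, not part of the Isis2 API. The library hands each handler a counted pointer; a handler that copies it has in effect "taken ownership", and the object is freed when the last copy goes away.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type; the real library's message classes are not shown here.
struct Message {
    std::string payload;
};

// Handlers receive a reference-counted pointer to an immutable message.
using Handler = std::function<void(std::shared_ptr<const Message>)>;

// Hypothetical stand-in for the part of the library that performs upcalls.
class Dispatcher {
public:
    void RegisterHandler(Handler h) { handlers_.push_back(std::move(h)); }

    // The library allocates the message, passes it to every handler, then
    // drops its own reference when this function returns.
    void Deliver(std::string payload) {
        std::shared_ptr<const Message> msg =
            std::make_shared<Message>(Message{std::move(payload)});
        for (auto& h : handlers_) h(msg);
    }

private:
    std::vector<Handler> handlers_;
};

// Usage: the handler keeps a copy, so the message outlives Deliver().
void demo_ownership() {
    Dispatcher d;
    std::shared_ptr<const Message> kept;
    d.RegisterHandler([&kept](std::shared_ptr<const Message> m) { kept = m; });
    d.Deliver("view change");
    assert(kept && kept->payload == "view change");
    assert(kept.use_count() == 1);  // only the application's copy remains
}
```

A handler that does not copy the pointer leaves nothing to free: the message is destroyed as soon as Deliver returns. The cost is an atomic counter update per copy, which may matter on hot paths.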

So what are your thoughts? I'm genuinely interested in advice and feedback, if anyone is listening!

Dec 16, 2014 at 5:29 PM
1/ What about Protocol Buffers from Google? Not as simple as plain reflection; some magic would still be needed! I am offering this proposal for advice.
I did implement a simple, crude data broadcaster using Protocol Buffers and the Spread toolkit.

2/ what about Boost Smart Pointers ?

Dec 16, 2014 at 5:49 PM
Edited Dec 16, 2014 at 5:52 PM
Hi Michel!

I’ve looked at protocol buffers and in fact Isis2 supports them. You need to look at the source but Nick Lowe has done work with applications that use them from C#.

So we would be able to support C++ code that uses protocol buffers too, if we do the full port.

I’ve been looking at Boost and I agree that they have nice solutions for a number of the issues that arise. Clearly we’re not the only people to be faced with this issue of passing variable argument lists with multiple types including user-defined types. In fact I did some experiments with variadic templates and it looks as if there is a kind of ugly way to solve the problem even at that level (I bet this is how the Boost solution actually works!). The caller calls g.Send(x, y, z) and you basically end up invoking Group::Send(args, types), where args is a vector of pointers and types is a vector of the corresponding types from the C++ typeid operator, which actually provides pretty good type information these days. The 2011 C++ revisions are awesome in this respect and really made C++ way better for reflection.
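
A minimal sketch of that idea, assuming C++11. The Group class below is a hypothetical stand-in, and SendRaw, last_arg_count and last_types are names made up for illustration; the real library would build a message at that point rather than just record what it saw.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical stand-in for the library's Group class; not the real Isis2 API.
class Group {
public:
    // Non-template core: receives type-erased argument pointers plus their types.
    // Here it only records them so the effect can be inspected.
    void SendRaw(const std::vector<const void*>& args,
                 const std::vector<std::type_index>& types) {
        last_arg_count = args.size();
        last_types = types;
    }

    // Variadic front end: the compiler expands the parameter pack at each call
    // site, producing one pointer and one type_index per argument.
    template <typename... Args>
    void Send(const Args&... args) {
        std::vector<const void*> ptrs{static_cast<const void*>(&args)...};
        std::vector<std::type_index> types{std::type_index(typeid(Args))...};
        SendRaw(ptrs, types);
    }

    std::size_t last_arg_count = 0;
    std::vector<std::type_index> last_types;
};

// Usage: three arguments of different types, including a library type.
void demo_send() {
    Group g;
    int x = 7;
    double y = 2.5;
    std::string z = "hello";
    g.Send(x, y, z);
    assert(g.last_arg_count == 3);
    assert(g.last_types[0] == std::type_index(typeid(int)));
    assert(g.last_types[1] == std::type_index(typeid(double)));
    assert(g.last_types[2] == std::type_index(typeid(std::string)));
}
```

Note that typeid only identifies a type; it does not expose fields the way C# reflection does, so user-defined types would presumably still need some registered marshalling code keyed on the type_index.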

So I think the main remaining issues will turn out to be kind of horrible recoding challenges. First, since C# is garbage collected I need to either modify Isis2 in C# to manage its own memory (this has some appeal for performance reasons, in fact) or go through the code figuring out where to free things. Very painful. I'm actually leaning towards modifying the C# code, in fact. What I was thinking is that if Isis2 ever finds that it needed say 17 FooBar objects, it will probably need that many again (this is that kind of system). So I could just have a freelist scheme in which "new FooBar()" in C# becomes "newFooBar()" (no space) and allocates from the free list if there is a FooBar available and otherwise calls new or perhaps even calls new 10 times -- sometimes you gain by allocating things in groups and stocking them for later. Then when the C# code currently discards the object I could go through and put in a call to foo.reclaim() that would put the object on a free list.
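
For what it's worth, here is roughly how that freelist might look on the C++ side. This is a sketch under assumptions: FooBarPool, Acquire and Reclaim are hypothetical names playing the roles of "newFooBar()" and "foo.reclaim()", and the batch size of 10 is just the figure mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object type standing in for a frequently allocated library object.
struct FooBar {
    int value = 0;
};

class FooBarPool {
public:
    // batch: how many objects to allocate at once when the free list is empty.
    explicit FooBarPool(std::size_t batch = 10) : batch_(batch < 1 ? 1 : batch) {}

    // Plays the role of "newFooBar()": reuse a free object if one exists,
    // otherwise allocate a whole batch and hand out one of them.
    std::unique_ptr<FooBar> Acquire() {
        if (free_.empty()) {
            for (std::size_t i = 0; i < batch_; ++i)
                free_.push_back(std::unique_ptr<FooBar>(new FooBar()));
            allocated_ += batch_;
        }
        std::unique_ptr<FooBar> obj = std::move(free_.back());
        free_.pop_back();
        return obj;
    }

    // Plays the role of "foo.reclaim()": reset the object and keep it for reuse.
    void Reclaim(std::unique_ptr<FooBar> obj) {
        if (!obj) return;
        *obj = FooBar();
        free_.push_back(std::move(obj));
    }

    std::size_t allocated() const { return allocated_; }
    std::size_t available() const { return free_.size(); }

private:
    std::size_t batch_;
    std::size_t allocated_ = 0;
    std::vector<std::unique_ptr<FooBar>> free_;
};

// Usage: the second Acquire() is served from the free list, not the heap.
void demo_pool() {
    FooBarPool pool(10);
    std::unique_ptr<FooBar> a = pool.Acquire();
    assert(pool.allocated() == 10 && pool.available() == 9);
    pool.Reclaim(std::move(a));
    std::unique_ptr<FooBar> b = pool.Acquire();
    assert(pool.allocated() == 10);  // no new heap allocation was needed
}
```

This version is not thread-safe; a multithreaded library would need a lock or per-thread free lists around Acquire and Reclaim.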

This way I could debug the needed functionality in a working version of Isis2 (still in C#) and then once I have it stable, my port to C++ becomes closer and closer to a line by line translation. Plus Isis2 in C# would gain performance benefits because in reality, garbage collection isn't doing anything useful in the library right now. We end up needing 17 FooBars on average in any case, so all that hassle to allocate and free them is wasted.

Next, since C++ lacks finally clauses, I need to modify every try/catch/finally and every "using" statement in the C# code to match the C++ idiom, which centers on destructors. Similarly, I can use C++ lambda expressions to implement the C# anonymous methods.
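
To make the destructor idiom concrete, here is a small sketch. ScopeGuard and run_with_cleanup are hypothetical names used only for this example. The guard's destructor runs whenever the scope exits, whether normally or through an exception, which is the job a C# finally block or "using" statement does, and the cleanup action itself is passed as a lambda.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard: runs a callback when the enclosing scope exits.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> on_exit)
        : on_exit_(std::move(on_exit)) {}
    ~ScopeGuard() {
        if (on_exit_) on_exit_();
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> on_exit_;
};

// C# shape being translated: try { work } catch { ... } finally { cleanup }.
// Returns the order in which the steps actually ran.
std::vector<std::string> run_with_cleanup(bool fail) {
    std::vector<std::string> log;
    try {
        // The lambda plays the role of the finally block.
        ScopeGuard cleanup([&log] { log.push_back("cleanup"); });
        log.push_back("work");
        if (fail) throw std::runtime_error("boom");
        log.push_back("done");
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}

// Usage: cleanup runs on both the normal and the exceptional path.
void demo_cleanup() {
    assert((run_with_cleanup(false) ==
            std::vector<std::string>{"work", "done", "cleanup"}));
    assert((run_with_cleanup(true) ==
            std::vector<std::string>{"work", "cleanup", "caught"}));
}
```

One difference from C# is visible in the second case: the cleanup runs during stack unwinding, before the catch handler, whereas a C# finally block runs after the catch block.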

So basically it seems as if I can get C# and C++ to be closely enough aligned by modifying the C# code a bit so that in the end I can get them to match up, basically, and it then is just a matter of creating Isis.h with all my public methods and constants, and more or less mechanically translating the C# to C++ line by line.

Oh, and there are crypto packages I would need. I use AES256 and also a type of digest/signature scheme that I would need to find in C++. Presumably not hard.

But I see this as a pretty big task right now. I'm definitely going to do it. But it may take a while unless someone wants to help!

I really do want the two versions to be as close to line by line equivalent as feasible. People say you should reimplement everything when you switch languages but this isn't going to be that kind of project. C++ and C# aren't so extremely different in style, especially if I get rid of the dependence on managed memory. And the big advantage of preserving a line by line, comment by comment equivalence is that bug fixes in one can be applied to the other.

This probably reduces the value of Boost (if anything, I would want to think about offering Isis2 in C++ to the Boost people as a candidate to add to Boost). But the fact is that anything Boost is doing, I can probably do almost in the identical way. The main value of Boost, really, would be the variable arguments handling -- but the C++ feature is variadic templates and let's face it: if one can do the same thing that way, then instead of telling people "to use Isis2, install Boost" I would be stand-alone, which is definitely preferable...

Dec 22, 2014 at 2:50 PM
Thank you for your answer and sorry for the late reply.

Good explanation on your part, and a useful reminder of why "finally" is not required in C++.

For the crypto package, I once used Mozilla's NSS library, but it is quite a big package!

Dec 22, 2014 at 6:26 PM
Cool! Mozilla NSS might work well for me.

Turns out that Tangible Software's translated code uses smart pointers automatically. So now I think I'll start with that version after all, and then can improve as we gain experience...

Feb 23, 2015 at 2:55 PM
Just a quick update. I'm working with the folks at Tangible to make this task a bit more automated and easier. But it does look like the API may change in small ways.

One change is that because variadic template support in C++ is really a fancy form of macro expansion, right now all of the Group class ends up in Isis.h because of the DHT and Aggregation APIs. I'm going to break these out: I'll give you a constructor for creating a new aggregation handle, or DHT handle, and then you'll use those handles in your code to access the DHT and Aggregation methods, which will let me eliminate all use of generics in the Group class. This should make a big difference and also simplify Isis.h itself.

Another possible change is that I may actually shift the C# code to use smart pointers, which would also let me maintain freelists of common Isis2 objects like MsgDesc objects, which get allocated and deallocated a whole lot. This would largely eliminate garbage collection within the library, and I think that it could speed it up quite a bit. Of course it does mean a fair amount of code revision in the C# code, but on the positive side, the C# and C++ would then be more similar.

Sorry this has gone slowly -- I was asked to run PhD admissions for CS here at Cornell, and that's a big job. Then my course turned out to be kind of larger than expected, so that's another big job. And PC roles, and papers to revise and submit. So Isis2 (which often is my late spring / summer project) has been on a back burner, but I won't leave it there indefinitely.

Feb 28, 2015 at 3:06 PM
So... seems that I was wrong. There are a few "hacks" in the way that C++ actually implements templates and the upshot is that I don't need to change the Isis2 API after all. It looks as if I can preserve it pretty much unchanged.

Tangible has it down to 254 "to do" items (some large) and about 1000 warnings to review. So I'm suddenly much further along than I expected to be...

May 30, 2015 at 3:25 PM
Edited May 30, 2015 at 3:25 PM
Just in case anyone is curious, I've resumed working on this after a pause to teach my spring class -- the distractions associated with teaching and writing grant proposals were just keeping me too busy to work steadily on software during most of February, March and April.

I don't intend for the new version to replace the existing Isis2. Instead it will just be another system, although sharing lots of code.

Right now I've stripped out a lot of the Isis2 code that I don't really need, at least initially: DHTs, OOB, the Aggregation logic, some of the fancy UDP tunneling, etc. This gives me a much smaller code base to work from. It seems to be working now, so next I'll run it through the Tangible Software C# to C++ translator, probably next week in fact, and then I need to patch in the missing .NET methods by creating C++ classes with the same functionality. For example, the native C# threads will become Pthreads, and I'll pull in a crypto library coded in C++ with the equivalent functionality to what I use now. C++11 has most of the reflection features I need, and smart pointers will replace the current managed memory system.

I'm undecided about where DMC should go beyond that. Obviously the first focus is going to be on tight integration with RDMA hardware: I want to see how fast I can make the virtual synchrony ordered multicast and the Paxos-equivalent protocol, SafeSend. Seems to me that a factor of 10,000x or more may be within reach, since Isis2 isn't really an exceptionally fast system right now. Beyond that, DMC's API may evolve away from the one in Isis2 simply because at the speeds RDMA permits, having data "in band" isn't necessarily what you would want. So this strikes me as a very interesting research topic, and I want to pursue it.

But Isis2 will also evolve -- I definitely want to bring some of those high speed features back into the Isis2 system itself, as well. So I'm anticipating a code fork, with Isis2 living on and continuing to be fully maintained and to evolve, and then DMC will be the other fork, coded in C++ and evolving too but perhaps in different directions. I don't see it as a likely replacement for Isis2 anytime soon (meaning within the next five years).