g.DHTEnable(2,2,1);

Jul 17, 2015 at 7:08 PM
I don't seem to be able to set the three parameters to 1. If I use anything else, 2,2,1 for example, the system always returns null on a get.
        //
        // enable the Distributed Hash Table (DHT)
        //
        dataLoggerGroup.DHTEnable(1, 1, 1);
Jul 17, 2015 at 8:40 PM
Edited Jul 17, 2015 at 8:42 PM
Interesting. I'll see if I can replicate this. How many members do you have in the group when the get operations are issued? Also, I think there is a typo in your question ("If I use anything else... the system always returns null on a get").

Are you saying that the system seems solid with other values for DHTEnable, but seems to return null on all get requests if you use DHTEnable(1,1,1)?

Or does (1,1,1) work, but when you do DHTEnable with other values, get always returns null?
Jul 17, 2015 at 9:09 PM

One instance of the application on each of two different servers, for a total of 2.

I can use 1,1,1 and it seems to work, but if I try 2,2,1 I never get any data from the DHT.

J.D.

Jul 18, 2015 at 2:00 PM
Edited Jul 18, 2015 at 2:42 PM
Ok, I'll see what the cause is and then I can patch the release. In fact I did fix something in the DHT code, and it worked for my

But even so, let's talk about these parameter values, because they are kind of strange.

When you say DHTEnable(2,2,1), you are telling Isis2 that:
  • You would like a group containing sharded data with 2 replicas per shard (the "Replication factor")
  • You expect to run the group with 2 members in the group as a whole ("the Expected group size")
  • Run your application even if the group only has 1 member in it.
In effect, you seem to be telling Isis2 that you want it to create a single shard, but wish to use the put/get API within that single shard.
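To make the arithmetic concrete, here is a minimal sketch in C#, the language Isis2 is written in. `DhtParams`, `ShardCount` and `DhtUsable` are hypothetical names, not Isis2 APIs; the sketch simply assumes that the shard count is the expected group size divided by the replication factor, as described above, and that put/get becomes available once the group reaches the minimum size.

```csharp
using System;

// Hypothetical helper, NOT part of the Isis2 API. It only restates the arithmetic
// described above for DHTEnable(replicationFactor, expectedGroupSize, minimumGroupSize).
public static class DhtParams
{
    // Shards implied by splitting the expected group into replica sets of size replicationFactor.
    public static int ShardCount(int replicationFactor, int expectedGroupSize)
    {
        if (replicationFactor < 1 || expectedGroupSize < 1)
            throw new ArgumentException("Parameters must be positive.");
        // Assumption: never fewer than one shard.
        return Math.Max(1, expectedGroupSize / replicationFactor);
    }

    // Assumption: put/get is allowed once the group has at least minimumGroupSize members.
    public static bool DhtUsable(int currentMembers, int minimumGroupSize) =>
        currentMembers >= minimumGroupSize;

    public static void Main()
    {
        Console.WriteLine(ShardCount(2, 2)); // 1: the (2,2,1) call yields a single shard
        Console.WriteLine(DhtUsable(1, 1));  // True: one member is enough to start put/get
    }
}
```

With (2,2,1) the division gives a single shard of 2 replicas, so the put/get API ends up operating entirely within that one shard.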

Normally, I test with, for example, 10 members in the group as a whole, so that you would have 5 shards with a replication factor of 2. Then I might set the minimum size to 5 or even more, like 8, telling Isis2 to block use of the DHT if the group gets smaller than that, since it might then have no processes assigned to some of the shards (obviously, if a shard has no replicas at all, any data you try to put into that shard would be lost).
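The risk of a shrinking group can be sketched the same way. This assumes a hypothetical round-robin placement (member i goes to shard i % shardCount); the thread does not say how Isis2 actually maps members to shards, so `ShardCoverage` and its methods are illustrative only.

```csharp
using System;

// Illustration only: Isis2's real member-to-shard mapping is not described here.
// Assumption: member i is placed in shard (i % shardCount).
public static class ShardCoverage
{
    // How many replicas each shard receives when `members` processes are
    // spread round-robin over `shardCount` shards.
    public static int[] ReplicasPerShard(int members, int shardCount)
    {
        var replicas = new int[shardCount];
        for (int i = 0; i < members; i++)
            replicas[i % shardCount]++;
        return replicas;
    }

    // Shards with no replicas at all; data put into them would be lost.
    public static int EmptyShards(int members, int shardCount)
    {
        int empty = 0;
        foreach (int r in ReplicasPerShard(members, shardCount))
            if (r == 0) empty++;
        return empty;
    }

    public static void Main()
    {
        const int shardCount = 10 / 2; // 10 expected members, replication factor 2
        foreach (int members in new[] { 3, 5, 8, 10 })
            Console.WriteLine($"{members} members: {EmptyShards(members, shardCount)} empty shard(s), " +
                              $"replicas per shard [{string.Join(", ", ReplicasPerShard(members, shardCount))}]");
    }
}
```

Under this assumed mapping, with 5 shards, 3 members leave 2 shards with no replicas, while 5 or more members give every shard at least one replica, which matches the reasoning behind a minimum size of 5 or 8.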

So was this (2,2,1) experiment just to test the behavior of the DHT? Because if you really intended to run this way, it suggests a misunderstanding about what a DHT is for.

To give a realistic example, companies like Google and Facebook use DHTs for their photo caching (not based on Isis2, I should quickly add). Those DHTs might have 1024 or 2048 members each, with shards of size 2 or 4. So a DHT is intended for fairly large scenarios, where you spread some kind of data across all those members and want parallelism.
Jul 18, 2015 at 5:33 PM
Ok, so I ran my little DHT test for the (2,2,1) case with 2 members in the group. As far as I can tell, it worked, so there must be something about your experiment that causes the issue.

Let's start with this. In each of the two copies, after doing the DHT Put operations, call g.DHT() and just print out the tuples that each copy holds. Does this make it obvious what has gone wrong?

If not, I need you to give me a step-by-step recipe for reproducing the issue. You are telling Isis2 to let you do DHT Put operations as soon as the group has 1 member. Do you do these Put requests first, then add the second member, then do the Get operation that fails? Maybe the Get was done by the second member? (If so, that might be a state transfer issue.)

If you can post an example of how to trigger the bug, and very detailed instructions on exactly how to run the example, including these kinds of sequencing things ("run one copy.... wait until it prints FOO... now run the second copy...") and also how to tell if it worked, versus if it failed, I can certainly debug the issue.
Jul 18, 2015 at 10:23 PM
Edited Jul 18, 2015 at 10:24 PM
Hold on, I found a way to get the problem you describe to occur with my copy. Looking into the cause now... As an aside, though, you really aren't doing a good job of explaining EXACTLY how to trigger issues you've encountered. For me to work efficiently, I need you to provide enough detail so that a normal person could easily reproduce the problem you've seen... or at least see exactly what code you used to trigger it...
Jul 18, 2015 at 10:49 PM
Edited Jul 18, 2015 at 10:50 PM
OK, I have a work-around for you. The DHTEnable(2,2,1) accidentally triggered code intended for really huge groups, which seems to have a bug that I didn't know about.

The workaround: on line 13546 of Isis.cs, you will see this code:
    if (this.myDHTBinSize < 2 * log2(theView.members.Length) || this.myDHTInDebugMode)
change the < to <=, like this:
    if (this.myDHTBinSize <= 2 * log2(theView.members.Length) || this.myDHTInDebugMode)
Meanwhile, I'll have to figure out why the massive-group code is malfunctioning. Then I'll post a patch to V2.2.2003. But the above comments about doing a better job when reporting issues still apply!
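Why the one-character change matters for a 2-member group can be checked by hand. The sketch below re-creates only the quoted condition, assuming that `myDHTBinSize` equals the replication factor (2 here), that `log2` is an integer floor log2, and that debug mode is off; `DhtPathCheck` and its methods are hypothetical names, and the real code in Isis.cs may differ.

```csharp
using System;

// Hypothetical re-creation of the branch condition quoted from Isis.cs, for illustration only.
// Assumptions: binSize equals the replication factor, log2 is an integer floor-log2,
// and debugMode is false. The real Isis2 code may differ.
public static class DhtPathCheck
{
    // Integer floor of log2(n) for n >= 1.
    static int Log2(int n)
    {
        int result = 0;
        while (n > 1) { n >>= 1; result++; }
        return result;
    }

    // Original test: strict less-than.
    public static bool ConditionOriginal(int binSize, int members, bool debugMode) =>
        binSize < 2 * Log2(members) || debugMode;

    // Workaround: less-than-or-equal.
    public static bool ConditionPatched(int binSize, int members, bool debugMode) =>
        binSize <= 2 * Log2(members) || debugMode;

    public static void Main()
    {
        // DHTEnable(2,2,1) with 2 members: 2 * log2(2) = 2.
        Console.WriteLine(ConditionOriginal(2, 2, false)); // False
        Console.WriteLine(ConditionPatched(2, 2, false));  // True
    }
}
```

Since 2 * log2(2) = 2, the strict test 2 < 2 is false, so (2,2,1) takes the other branch, the one described above as intended for really huge groups; with <= the test is true.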
Jul 20, 2015 at 1:31 PM

Will do - I'll explain better in the future.

J.D.

Jul 20, 2015 at 1:53 PM

The workaround took care of the issue.

Thanks for your help.

J.D.