The Mangy Yak

Tuesday, June 27, 2006

Fun With java.util.concurrent

Not much time to post today, so this one is going to be a quickie. ;-)
Today I had quite a lot of fun with FutureTask and ConcurrentHashMap. I attempted to implement a lockless protocol for controlling access to the distributed process servers (which seems to be working, BTW). I had to ensure that a process server would be launched only once per address/port pair. If more than one thread were competing to get a reference to a server, I had to ensure that all but one thread would remain blocked. The thread that didn't block would then be responsible for launching the server, and ultimately for making the reference available to itself and to all the other blocked threads. FutureTask made that synchronization a whole lot easier. Communicating failure to the blocked threads was also quite easy.
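The idea is essentially the once-per-key pattern from that thread: race to install a FutureTask, and let the single winner run it while everyone else blocks on get(). Here's a rough sketch of it - not my actual code; ServerRef, launchServer and the registry name are all made up for illustration:

```java
import java.util.concurrent.*;

// Sketch: launch a process server at most once per address/port pair.
// ServerRef and launchServer are placeholder names.
public class ServerRegistry {
    // Hypothetical handle to a launched process server.
    public record ServerRef(String address, int port) {}

    private final ConcurrentHashMap<String, Future<ServerRef>> servers =
            new ConcurrentHashMap<>();

    public ServerRef get(String address, int port)
            throws InterruptedException, ExecutionException {
        String key = address + ":" + port;
        Future<ServerRef> f = servers.get(key);
        if (f == null) {
            FutureTask<ServerRef> task =
                    new FutureTask<>(() -> launchServer(address, port));
            f = servers.putIfAbsent(key, task); // atomic: only one task is installed
            if (f == null) {
                f = task;
                task.run(); // the winning thread launches the server
            }
        }
        try {
            return f.get(); // all other threads block here until the launch completes
        } catch (ExecutionException e) {
            servers.remove(key, f); // failure is visible to everyone; allow a retry
            throw e;
        }
    }

    // Placeholder for the real launch logic.
    private ServerRef launchServer(String address, int port) {
        return new ServerRef(address, port);
    }
}
```

Note how failure propagation comes for free: every blocked thread gets the same ExecutionException from get(), and removing the failed task lets a later caller try again.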

There were also some other issues like error recovery that I had to deal with, but I have to say - the more I use j.u.c, the more I like it. Some of the code I wrote is based on a few tips that Tim Peierls posted on concurrency-interest a while ago in response to a question posed by Greg Luck of ehcache. I wasn't able to fully grasp the elegance of that code until today.

One of the most displeasing things about asynchronous systems, however, is that distinguishing real failure from asynchronous drift is so difficult. I say that because my code manipulates lots of proxies - JDI "mirrors", CORBA proxies and RMI proxies - and these proxies might fail at any instant, but sometimes that failure is legal and sometimes it is not. Sometimes I cannot know whether a failure looks illegal because some other event I was expecting hasn't arrived yet, or because that event has never been emitted (in which case failure has indeed happened).

Also, on many occasions you have incoming events that have to trigger updates on other components that have threads running inside them. Once again, ensuring that these threads behave correctly (and don't go accessing dead proxies and sockets, causing spurious exceptions) involves adding tons of state-checking code inside components.

To make this a bit clearer, consider the distributed thread manager. The distributed thread manager responds to DDWP (Distributed Debugging Wire Protocol) events and attempts to access node proxies based on them. When a node dies, it emits an event. Once a node death event arrives at the distributed thread manager, the manager will no longer attempt to access that node's proxy. However, a DDWP event may well arrive before a death notification - the distributed thread manager will then try to access a node proxy and, of course, will fail. Maybe communication has been severed (failure). Maybe the death event has merely been delayed (not failure). Who knows - you can keep things pending and fail on timeout, you can sit and wait (maybe forever), or you can just accept the situation as legal, even though it might not be.
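The "keep things pending and fail on timeout" option can be sketched with a latch per node: when a proxy access blows up, wait a bounded time for the death notification that would make the failure legal. Everything here - the class name, the nodeId keys, the timeout - is illustrative, not my actual code:

```java
import java.util.concurrent.*;

// Sketch of the "keep it pending, fail on timeout" policy. A proxy
// failure is parked until either a death event arrives (legal failure)
// or the timeout expires (treat it as a real error).
public class PendingFailurePolicy {
    private final ConcurrentHashMap<String, CountDownLatch> deathLatches =
            new ConcurrentHashMap<>();

    // Called by the event dispatcher when a node death event arrives.
    public void onNodeDeath(String nodeId) {
        deathLatches.computeIfAbsent(nodeId, k -> new CountDownLatch(1))
                    .countDown();
    }

    // Called right after a proxy access has failed: was the failure legal?
    public boolean failureWasLegal(String nodeId, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch latch =
                deathLatches.computeIfAbsent(nodeId, k -> new CountDownLatch(1));
        // True if a death event shows up within the window (asynchronous
        // drift); false if nothing arrives in time (real failure).
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

Of course, this just moves the hard question into picking a timeout - too short and you flag drift as failure, too long and real failures go unnoticed.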

I'm not saying it is impossible to deal with these situations; I'm just saying that it requires careful reasoning about the order in which things might happen, and about the correct way to react when the events you need to get a clear picture of what's really going on are missing. Oh, I love it. :-)

Phew. I guess it wasn't a quickie after all.

Tuesday, June 20, 2006

Bitten by licensing

Darn it. How could I have ever managed to miss that the open source version of ICE was GPL? I really don't know what I'm gonna do now - I don't have time to rewrite this code. I also don't want to rewrite, as my other alternatives - CORBA, RMI and sockets - are way inferior.

Well, I think I understand why they chose GPL as their open source license. Using GPL is the perfect way of separating proprietary and free worlds - if you want to be proprietary, you have to pay the fee. If you can be completely open source, however, you can use ICE freely as well.

The problem is that the EPL is somewhere in between: it is an open source license, but it allows you to create proprietary derivative works. This means that future work may not be completely open source - and the Eclipse Foundation wants that; it's what makes Eclipse an attractive platform for many developers. But it's clearly not what the ICE guys want.

Oh well. I'm stuck until I solve this.

After struggling for a while with my poor process management infrastructure, I decided to implement a server from scratch.

Well, it's finished and is just peachy. :-)
I think the work did pay off. I used ICE to get it working and, I have to say, I liked it. Nice support for asynchronous invocations, and a simple, well-documented API. Well, it's from the experts.

If you'd like to take a look at the code you can access it at my CVS repository.

Oh yeah, things are working out.

Thursday, June 08, 2006

Oh I hate making these decisions!

A very important component of my distributed debugger is its process management infrastructure. Its main role is managing (launching/killing) remote processes and communicating with their interfaces - which means communicating with remote graphical interfaces and redirecting stdin, stdout and stderr to the debug server.

However, the way this infrastructure is currently built (on top of SSH) doesn't allow it to do everything it's supposed to. Killing "disconnected" processes is particularly painful, since I communicate with the SSH client via string parsing and an Expect Tcl/Tk script. Yuck. String parsing is extremely unreliable. I hadn't been worrying much about this issue (since JDI allows me to kill remote VMs), but now that I'm writing distributed tests for the debugger I must correct it.

I need to use SSH because I must be able to import remote graphical interfaces (I could do it with pure X, but SSH and firewalls are almost standard on Linux machines these days). Anyway, after taking a look at the DSDP-TM project, I realized they're building a framework to do just what I need. Still, I can't quite decide whether to adopt it.

Here are the forces:
  1. DSDP-TM seems kind of immature.
  2. I don't know if I can use it on top of SSH in the way I need.
  3. I have already integrated my configuration modules with my launch infrastructure through the Eclipse Launch Framework, which means I'd have to do it all over again (I have no idea whether this would be easy or difficult).
  4. I think I can code up and debug a small wrapper process for redirecting stdin/stdout/stderr, collecting lifecycle information, and responding to keep-alive events in a day or two.
  5. It would be great to be an early adopter of an Eclipse subproject. I could contribute and maybe even help shape its future.
  6. I don't know if I have enough time to spend learning about DSDP-TM. I must complete my milestones on schedule.
  7. DSDP-TM is tied to Eclipse 3.2, which was a bit too buggy for adoption last time I checked.

Oh, man. What should I do?
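For what it's worth, the core of the wrapper process from item 4 really is small. Something like the sketch below - ProcessWrapper and runAndRelay are made-up names, and the real thing would also relay stdin and answer keep-alive pings:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Rough sketch of the wrapper process: launch a command, pump its
// output back to the parent, and report the exit code as lifecycle
// information. Illustrative only.
public class ProcessWrapper {
    public static int runAndRelay(String... command)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout for simplicity
                .start();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                // In the real wrapper this would go back to the debug server.
                System.out.println(line);
            }
        }
        return child.waitFor(); // the exit status is the lifecycle event we care about
    }
}
```

Which is exactly why option 4 is tempting: no string parsing, no Expect script, just a process I control on both ends.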

Wednesday, June 07, 2006

Oh yeah, I have a blog!

Well, I'm not really a mangy yak. I just couldn't think of anything better when I had to enter a title.

I really can't predict what the fate of this blog will be. It will start as a means of publishing my progress on my Summer of Code project (http://code.google.com/soc/eclipse/about.html). If I still have the energy to post when the project is over (by September, perhaps?) I might as well start talking about something else. Or not. :-)

And that's all for today.