Saturday, 30 August 2014

Pipes in Java

Some time back I read a very well-written article on PipedInputStream and PipedOutputStream at techtavern.wordpress.com/2008/07/16/whats-this-ioexception-write-end-dead/.

The article explains comprehensively what not to do when using pipes. If you are getting the dreaded "write end dead" exception, have a look at it. If you still get the error, come back here.

Some key takeaways from that article were:

“Write end dead” exceptions arise when:
  1. A PipedInputStream is connected to a PipedOutputStream, and
  2. The two ends of the pipe are read and written by two different threads, and
  3. The threads finish without closing their side of the pipe.
  • A thread that writes to a stream should always close the OutputStream before terminating.
  • The PipedInputStream should always be read by the same thread. The PipedOutputStream should always be written by the same thread.
  • Of course, the thread that reads the PipedInputStream should not be the one that writes to the PipedOutputStream, or a deadlock may occur.
  • In general, an InputStream or OutputStream should never be shared among different threads.
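
The rules above can be sketched as follows. This is only a minimal illustration, and the class and method names (PipeRoundTrip, roundTrip) are made up for the example: one dedicated thread writes and closes its end before it terminates, and exactly one other thread reads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not taken from the referenced article.
class PipeRoundTrip {

    // Sends `message` through a pipe: a dedicated writer thread produces the
    // bytes and the calling thread consumes them.
    static String roundTrip(String message) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            // try-with-resources closes the write end before the thread ends,
            // so the reader sees a normal end-of-stream.
            try (PipedOutputStream o = out) {
                o.write(message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "pipe-writer");
        writer.start();

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        // Only the calling thread ever reads from the pipe.
        try (PipedInputStream i = in) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = i.read(buffer)) != -1) {
                received.write(buffer, 0, n);
            }
        }
        writer.join();
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello through the pipe"));
    }
}
```

The reader and the writer each own exactly one end of the pipe, and the writer's end is closed on every exit path.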

For most pipe-based use cases the above is sufficient. But there are more complex use cases, like the one I ran into and had to work out for myself. I did all of the above and still got “Write end dead”!

PipedInputStream and PipedOutputStream are meant to be used from two separate threads. As far as I know, current implementations across vendors assume that the threads accessing the streams live inside the JVM. What if they are external to this JVM? In my case the streams were being read by a native OS process. For someone else they might be read by another JVM. So the problem boils down to the questions below:
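
To see why the threads matter, here is a small sketch (DeadWriterDemo and readAfterWriterDied are hypothetical names) of what happens when the writing thread is gone. It assumes the OpenJDK behaviour, where the pipe remembers the last thread that wrote to it and checks whether that thread is still alive. On the JDKs I have looked at, the unclosed case typically reports "Write end dead", but the exact message is an implementation detail.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical example class for illustration only.
class DeadWriterDemo {

    // Returns the message of the IOException hit by the reader, or null if
    // the stream ended normally with end-of-stream.
    static String readAfterWriterDied(boolean closeBeforeExit)
            throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try {
                out.write(42);
                if (closeBeforeExit) {
                    out.close();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "short-lived-writer");
        writer.start();
        writer.join(); // from here on the writer thread is no longer alive

        try (PipedInputStream i = in) {
            i.read();            // the buffered byte is still delivered
            int next = i.read(); // buffer is empty now
            return next == -1 ? null : "unexpected byte " + next;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("closed before exit: " + readAfterWriterDied(true));
        System.out.println("left open:          " + readAfterWriterDied(false));
    }
}
```

The data itself arrives in both cases. The difference is only in how the end of the stream is reported, which is why the error tends to show up at the very end of an otherwise successful transfer.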

  • What if you don't have control over the thread that reads from the PipedInputStream?
  • Or what if you don't have control over the thread that writes into the PipedOutputStream, and that thread closes the stream before you get control back to close it explicitly?

So when do such use cases come into the picture?

Suppose a browser is uploading a large file to your web application server, and for some reason you have to process this file by spawning an OS-level process or some third-party tool outside the JVM. The third-party tool consumes your InputStream as its standard input and writes the processed data to its standard output; you read that output into an OutputStream and persist it somehow.
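
A rough sketch of the process plumbing in this scenario might look like the following. It shows only the external-process side, not the piped streams that caused the trouble, and it is not the code I ended up with. ExternalFilter, copy and runThrough are hypothetical names, and the command to run is whatever tool you pass in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class for illustration only.
class ExternalFilter {

    // Copies everything from `from` to `to` and returns the byte count.
    static long copy(InputStream from, OutputStream to) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = from.read(buffer)) != -1) {
            to.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Feeds `source` to the tool's standard input and copies the tool's
    // standard output into `sink`. Returns the tool's exit code.
    static int runThrough(List<String> command, InputStream source, OutputStream sink)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        // Feed stdin on its own thread so a full stdout buffer cannot stall us.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                copy(source, stdin);
            } catch (IOException e) {
                // The tool may exit before reading all input; the exit code
                // returned below tells the caller whether it succeeded.
            }
        }, "stdin-feeder");
        feeder.start();

        try (InputStream stdout = process.getInputStream()) {
            copy(stdout, sink);
        }
        feeder.join();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExternalFilter.java <command> [args...]");
            return;
        }
        int exitCode = runThrough(Arrays.asList(args), System.in, System.out);
        System.out.flush();
        System.exit(exitCode);
    }
}
```

Note that the bytes are consumed and produced by the native process, not by a Java thread you control.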

The current Java pipe implementation will fail in such scenarios. The pipe classes do not let you set the threads that will be calling their read and write methods. In the above scenario, that is exactly what you need to get pipes working properly.

So how did I get around this problem?

I created my own implementations of PipedInputStream and PipedOutputStream by subclassing them and allowing the threads that read and write the streams to be provided externally.