I recently got started with GPGPU, and OpenCL seemed the natural choice, since it's supported by both ATI and nVidia. Besides being cross-platform, OpenCL also provides a layer of abstraction that lets you target both CPUs and GPUs (not only GPUs, as with nVidia's CUDA and AMD's Stream technologies).
I started investigating a few Java wrappers (there's only a handful around) and ended up playing with jogamp.jocl.
This is just a snippet showing how to retrieve a list of OpenCL-enabled devices on your machine:
// create context for all devices detected using default platform
CLContext context = CLContext.create();
// an array with available devices
CLDevice[] devices = context.getDevices();
// print each device with its index so you can pick one later
for (int i = 0; i < devices.length; i++)
{
    System.out.println("device-" + i + ": " + devices[i]);
}
It goes without saying that if you don't see your GPU in the output, it's time for some painful driver sweeping. I found that Snow Leopard works straight away with both ATI and nVidia (Mac OS X 10.6.x ships with OpenCL support), while Windows can be a bit trickier to set up (as we all know, Catalyst software kinda sucks).
Just to give you a sneak peek at what comes next: once you've had a look at the output, you can go ahead and select a device and create the queue(s) you'll use for sending data up to it:
// have a look at the output and select a device
CLDevice device = devices[0];
// create command queue on selected device.
CLCommandQueue queue = device.createCommandQueue();
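Hardcoding devices[0] is fine for a first run, but you may want to pick the device by some rule instead of by eye. Here is a minimal sketch of one possible rule: prefer a GPU over a CPU, and among devices of the same kind take the one with more compute units. Note that DevicePicker, Device and Kind are hypothetical stand-ins I made up so the example runs with plain Java and no OpenCL driver; they are not part of jogamp.jocl. With the real library you would read the equivalent properties from CLDevice instead.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of jogamp.jocl.
final class DevicePicker {

    // Hypothetical device kinds; a real OpenCL device reports its own type.
    enum Kind { CPU, GPU }

    // Hypothetical stand-in holding only the properties the rule needs.
    static final class Device {
        final String name;
        final Kind kind;
        final int computeUnits;

        Device(String name, Kind kind, int computeUnits) {
            this.name = name;
            this.kind = kind;
            this.computeUnits = computeUnits;
        }
    }

    // Prefer GPUs over CPUs; within the same kind, prefer more compute units.
    // Returns an empty Optional when the list has no devices at all.
    static Optional<Device> pick(List<Device> devices) {
        return devices.stream().max(
                Comparator.comparing((Device d) -> d.kind == Kind.GPU)
                          .thenComparingInt(d -> d.computeUnits));
    }
}
```

Returning an Optional rather than indexing into the array means an empty device list (the "driver sweeping" case) is something you handle explicitly instead of hitting an ArrayIndexOutOfBoundsException.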
You can see the entire code for the official jogamp.jocl 'Hello World' example here if you're curious.