Introducing Workers in RingoJS

Posted on 16 November 2011

A lot of things have changed since my last blog post, in which I tried to give an assessment of multithreading issues in RingoJS. Ringo’s master git branch has recently switched from the old multi-threaded shared data model to one of shared-nothing worker threads, making that last post pretty much obsolete.

The reason for making such a fundamental change wasn’t so much any particular technical problem we had. The main motivation was the general difficulty of understanding and predicting multithreaded code - things like developing a mental model of what’s going on in your code, or debugging race conditions or deadlocks.

In this post I’ll present Ringo’s new way of dealing with threads and concurrency.

Worker Basics

The new Ringo still supports threads, but they now come in the form of workers. A worker is a thread that operates on its own private set of modules and is therefore isolated from other workers. What it boils down to is that threads in Ringo no longer share modules among each other.

The primary means of workers communicating with each other is by asynchronous message passing. This is done using an API loosely modeled after the W3c Web Workers specification, but I took some liberties to deviate from the spec where it made sense. For example, workers may share data through the global object or singletons, but they have to do so explicitly (see “Sharing data between workers” section below).

Like Web Workers, each Ringo worker has its own event loop that is guaranteed to run in a single thread, meaning that scheduled functions and external events will only be processed as soon as no other code is running. This is a big step from the previous multi-threaded setup in Ringo. The message is: if you need multiple threads, use workers.

While Ringo workers each have their own set of modules to work on, they do share the global object with its standard constructors and prototypes. This is safe because the global object in Ringo plays de facto a read-only role. This setup also allows us to pass objects between workers without going through JSON serialization because workers share the same native prototypes and constructors (something that is not true with Web Workers).

Example code

Let’s see what the worker API looks like in practice. Here’s a simple script that acts both as command line script and as worker module. The main function called when running as command line script creates a new Worker, defines a callback function and passes a greeting message to the worker. The worker will then send back a reply to its parent script.

// Module "ringo/worker" exports the Worker constructor
var {Worker} = require("ringo/worker")

function main() {
    // Create a new workers from this same module. Note that
    // this will create a new instance of this module as workers 
    // are isolated.
    var w = new Worker(module.id);

    // Define callback for messages from the worker
    w.onmessage = function(e) {
        print("Message from worker: " + e.data);
    };

    // Post a message to the worker
    w.postMessage("Hi there!");
}

function onmessage(e) {
    print("Message from main script: " + e.data);
    // Post a message back to the event source
    e.source.postMessage("What's up?");
}

// If running as command line script call main()
if (require.main === module) {
    main();
}

Note that the onmessage function which handles messages to the worker is not exported by this module. This is on purpose because it is not meant to be called directly by other modules. Instead it should be called by the worker’s own thread.

The examples directory in the RingoJS source tree contains more worker samples, covering topics such as error handling and coordination between workers using synchronous callbacks and semaphores as an alternative to asynchronous callbacks.

Workers in web apps

Workers are not only for app developers, they’re also used internally in many places in Ringo (remember, every thread is now a worker). Web applications previously used threads in Ringo, so it’s natural they’re now using Workers. (Ringo may at some point support other models such as pure single-threaded web apps.)

As long as your app only responds to a single HTTP request at a time, you won’t notice much difference between old and new thread model. Ringo workers are reused between requests, so your app should basically run in a single worker.

However, as soon as you start serving requests in parallel, things start to get more interesting. Requests will be processed in separate workers and you’ll no longer be able to share state by simply declaring top-level module variables like you previously could. The next section explains how you can work around that if you need to.

Sharing data between Workers

There are two ways to allow unrelated workers to communicate with each other: good old global variables and singletons.

Because in Ringo the top-level scope is provided by the module script scope you have to be explicit in order to define a global variable (I can’t repeat how good that is often enough). The way you do this is by assigning to the global variable, a predefined variable pointing to the shared global scope.

Here’s an example of defining a global variable for a read-only configuration object:

// Define a global variable using the global property
global.config = {
    title: "RingoJS"
};

function getMessage() {
    // Note that we can directly access config because 
    // current module scope inherits from global
    return "Welcome to " + config.title;
}

Because two workers can execute the above code simultaneously, there is no guarantee that the global config object won’t be created more than once. Often this is not a problem. However, for objects that must only be created once Ringo provides the module.singleton() method.

// Define a singleton providing a name and factory function.
// If the singleton has been previously created it will be returned, 
// otherwise the factory will be invoked exactly once to create it.
var config = module.singleton("config", configFactory);

function getMessage() {
    return "Welcome to " + config.title;
}

Working with external events

As I said above, workers encapsulate an event loop running a single thread. This poses some restrictions on how external events (coming from Java code) can be dispatched to JavaScript.

For this purpose, Ringo now comes with a new JavaEventEmitter class/mix-in in the ringo/events module. JavaEventEmitter is similar to EventEmitter but takes a Java class or interface as argument. It then builds a Java class on the fly that subclasses or implements the given class or interface, allowing calls to Java methods to be dispatched to JavaScript callbacks in either a single or multiple workers.

The way this is done is very simple and intuitive. If you register a JavaScript function as event listener with a JavaEventEmitter, the function will always be called in its own worker, on the single event loop thread of the worker. This is necessary because functions are bound to their parent scope and therefore to their worker. Calling a function originating from one worker in another worker basically breaks the single-threaded-ness of the scope in which the function was defined, and that’s something we want to avoid.

For an example of how this is used in Ringo check out the code the adds WebSocket support in ringo/httpserver.

To dispatch Java events to multiple workers, use a JavaScript object instead of a function as event listener. The object must contain two properties called module and name holding the module id and function name that should be invoked to handle the event. Each time an event occurs, the event emitter will get a dedicated worker, invoke the function with the given name on the given module, and release the worker when it’s done.

Differences to Web Workers

The biggest difference between Ringo workers and W3C Web Workers is probably the fact that Ringo workers share one set of global objects. This brings about a few other differences between the two.

First, it makes workers cheaper to create and run, because we don’t have to instantiate a full JavaScript context for each worker. In a simple test I did that involved launching thousands of workers, Ringo performed almost 10 times better than Firefox 7.

Another difference that has to do with shared globals is that in Ringo we don’t have to use JSON serialization of messages sent between workers. With Web Workers this is required because passing objects unserialized would leak prototypes and scopes from one worker to the other.

Of course, JSON serialization also serves the purpose of making message parameters read-only. After a message has been serialized and deserialized, it is decoupled from its original state.

Finally, shared globals allow the easy sharing of data between workers through the global variable as described above.

Ringo’s worker setup is clearly more permissive than Web Workers. One scenario where tightening the rules could make sense could be the introduction of remote workers, for then it would make sense to use the same semantics for local and remote workers (mandatory serialization, no shared data). For now I think the relaxed rules should work fine.

Event loop vs. long running (choose one)

Ringo now gives you the best of both worlds: You get worker threads that can run as short or as long as they please (that’s the idea behind workers, right?), and that are isolated from each other unless you deliberately decide to share state between them.

And you get a event loop with each worker, allowing you to do callback based programming with the same single-threaded semantics you know from the browser.

Of course the catch with this combination is that long-running code and single threaded event loops do not go together very well. If a function runs for an extended time the event loop will become non-responsive as callbacks pile up waiting to run. If you rely on your event loop to be responsive and fast, you need to make sure your code executes and returns quickly.

Does this invalidate our concept of providing both paradigms in the same runtime? I’m pretty sure it doesn’t. All it means is that you have to understand both styles of coding and be consistent about using one of them with each worker you write. But that’s only true within each worker - it’s totally ok to mix event loop workers with the long-running kind in any way you like, in fact that’s what workers are here for.

So you could have an app consisting of a tight event loop spinning out long running tasks to one or multiple workers. Or you could do it the other way round: a multi-worker app that uses a single event loop worker for some shared state processing. Both models will work nicely as long as you’re consistent within each worker.

If you made it till here and are still interested you definitely should check out the Ringo master branch and give workers a try. And if you do take a minute to let us know what you think!

Comments