Papaline 0.3 introduced a new model "fork-join" for task execution. It allows you to split a task into smaller units, and execute them in parallel.
Before that, a task is processed as a single unit from the first stage to the second, the third and the last. Within a stage, all computing is done in a single thread.

This model has limitation that you are required to execute any of your stage in serial. If your task has a few split-able units, it's always better to run them in parallel. Here we have (fork) command for the situation.
For example, you are using the fanout-on-write model to build an activity stream. Once a user posted a new status, you need to find all followers(stage 1) of that user and append the status to their timeline(stage 2).
In previous version of papaline, these two stages are:
(defn find-followers [id msg]
  (let [followers (query-db-for-followers id)]
    [followers msg]))
(defn fanout-to-user-timeline [user-ids msg]
  (doseq [user-id user-ids]
    (write-redis-list user-id msg)))
In the second task, the msg is appended to user's timeline one by one.
Using (fork), the fanout-to-user-timeline can be executed in parallel.
(defn find-followers [id msg]
  (let [followers (query-db-for-followers id)]
    (fork (map #(vector % msg) followers))))
(defn fanout-to-user-timeline [user-ids msg]
  (write-redis-list user-id msg))
After the find-followers function, the result will be splitted into (count followers) parts and sent into input channel of stage 2. So the tasks execution will be like:

To collect the results of all forked sub-tasks, you can use (join). If the return value is wrapped with join, it won't trigger next stage immediately but to wait all forked tasks to finish.

So with (fork) and (join), it's very flexible to change execution model in Papaline.  Internally, I use clojure's metadata to add flags for the return value, without ruining the non-invasive design of Papaline.