The producer-consumer model above has a standard human analogy: an assembly line. Humans have specialized skills, however, whereas threads running the same program do not need to be specialized: every thread has access to the same code and data. So any thread could do the work of any other thread.
In the bag of tasks approach, we have a collection of mostly independent "worker threads"; we no longer need to have different types of threads (one for the parser, one for the workers, etc). A bag of tasks system is very simple: every time a worker doesn't have anything to do, it goes to the bag of tasks and gets something to work on; in our web server example, this bag of tasks could contain new connections to be processed.
The advantage of the bag of tasks approach is that we can imagine all workers as being homogeneously implemented, each one running a specific piece of code depending on the type of the task it has to process at that time (imagine each worker as being implemented with a big switch statement).
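As a rough sketch of this idea, assuming Java threads and a shared blocking queue playing the role of the bag (the task types PARSE, FETCH, and SEND and the handler names below are invented for illustration; they are not part of the web server example), a homogeneous worker built around a big switch might look like:

    import java.util.concurrent.BlockingQueue;

    enum TaskType { PARSE, FETCH, SEND }              // hypothetical task types

    record Task(TaskType type, Object payload) { }    // a type tag plus an opaque payload

    class SwitchWorker implements Runnable {
        private final BlockingQueue<Task> bag;

        SwitchWorker(BlockingQueue<Task> bag) { this.bag = bag; }

        public void run() {
            try {
                while (true) {
                    Task task = bag.take();           // block until the bag has work
                    switch (task.type()) {            // the "big switch" over task types
                        case PARSE -> parse(task.payload());
                        case FETCH -> fetch(task.payload());
                        case SEND  -> send(task.payload());
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // shut down cleanly if interrupted
            }
        }

        // Placeholder handlers; the real work would go here.
        private void parse(Object p) { }
        private void fetch(Object p) { }
        private void send(Object p) { }
    }

Every worker runs the same code; which branch of the switch it executes depends only on the task it happens to pull from the bag.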
Let us now present a way to convert the pseudo-code for the web server, given in the first section, to a bag of tasks style. Assume we have an addWork thread and a bunch of worker threads.
addWork thread():                     // only one thread of this type
    while(true)
        wait for connection
        bag.put(url)                  // url read from the new connection

worker thread():                      // there will be a bunch of these threads
    while(true)
        url = bag.poll()
        look up url contents in cache
        if (!in cache)
            fetch from disk
            put in cache
        send data to client
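As a sketch of what this could look like in Java (the choice of LinkedBlockingQueue for the bag and ConcurrentHashMap for the cache, as well as the stubbed-out connection, disk, and network routines, are assumptions made for illustration, not part of the original pseudo-code):

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    public class BagOfTasksServer {
        private final BlockingQueue<String> bag = new LinkedBlockingQueue<>();
        private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

        // addWork thread: accepts connections and drops the requested url into the bag.
        void addWork() throws InterruptedException {
            while (true) {
                String url = waitForConnection();     // stub: block until a request arrives
                bag.put(url);                         // hand the work off to the workers
            }
        }

        // worker thread: repeatedly pulls a url from the bag and serves it.
        void worker() throws InterruptedException {
            while (true) {
                String url = bag.take();              // block until the bag has work
                byte[] data = cache.get(url);         // look up url contents in cache
                if (data == null) {                   // if (!in cache)
                    data = fetchFromDisk(url);        //   fetch from disk
                    cache.put(url, data);             //   put in cache
                }
                sendToClient(url, data);              // send data to client
            }
        }

        // Stubs standing in for the real I/O; not part of the original pseudo-code.
        private String waitForConnection()              { return "/index.html"; }
        private byte[] fetchFromDisk(String url)        { return new byte[0]; }
        private void   sendToClient(String u, byte[] d) { }

        public static void main(String[] args) {
            BagOfTasksServer server = new BagOfTasksServer();
            Runnable producer = () -> { try { server.addWork(); } catch (InterruptedException e) { } };
            Runnable consumer = () -> { try { server.worker();  } catch (InterruptedException e) { } };
            new Thread(producer).start();             // one addWork thread
            for (int i = 0; i < 4; i++)
                new Thread(consumer).start();         // a bunch of worker threads
        }
    }

Notice that the workers are interchangeable: it does not matter which thread picks up which url, and adding capacity is just a matter of starting more worker threads in main.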
The bag of tasks approach seems to exploit the available parallelism well, and it is easier to program and to extend. It has one major performance drawback, however: there is only one lock protecting the bag, which implies contention. Imagine, for instance, that some of the workers are quite fast; they would acquire the lock, finish processing, release the lock, re-acquire it, and so on. The problem now is that the bag, or rather the single lock protecting it, becomes a new bottleneck.
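To make the contention point concrete, here is a minimal sketch of such a bag guarded by one lock (this class is an illustration of the single-lock design being criticized, not code from the example above):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A bag guarded by a single lock: correct and simple, but every operation serializes here.
    class LockedBag<T> {
        private final Deque<T> tasks = new ArrayDeque<>();

        public synchronized void put(T task) {        // every producer contends on this one monitor
            tasks.addLast(task);
            notify();                                 // wake one waiting worker
        }

        public synchronized T poll() throws InterruptedException {
            while (tasks.isEmpty())
                wait();                               // idle workers block here...
            return tasks.removeFirst();               // ...and re-contend on every iteration
        }
    }

With this structure, a few fast workers that keep coming back to poll spend much of their time handing the monitor back and forth, so throughput is bounded by the lock rather than by how many workers we add.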