I recently wrote a small tasking system, yaTS, which is, I think, reasonably powerful and simple. For scheduling, it mixes work stealing with classical FIFO scheduling. It supports priorities, affinities, and wait-for-completion. It also includes a very fast distributed memory allocator. You can find more details in the source and in the headers.
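To give a rough idea of what mixing work stealing with FIFO can look like, here is a minimal sketch. It is not the yaTS code or its API: the names `Task`, `WorkQueue`, `pop_local`, `steal` and `run_one` are made up for illustration, and it uses a plain mutex where a real scheduler would likely use something faster. The assumption is the usual arrangement in which a worker takes its own newest task, while an idle worker steals the oldest task from someone else.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical names throughout; this is an illustration, not the yaTS API.
using Task = std::function<void()>;

class WorkQueue {
public:
    // The owning worker enqueues new work at the back.
    void push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(std::move(t));
    }

    // The owner takes the most recently pushed task (LIFO end).
    std::optional<Task> pop_local() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        Task t = std::move(q_.back());
        q_.pop_back();
        return t;
    }

    // A thief takes the oldest task (FIFO end) from the opposite side.
    std::optional<Task> steal() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        Task t = std::move(q_.front());
        q_.pop_front();
        return t;
    }

private:
    std::mutex m_;
    std::deque<Task> q_;
};

// Run one task: prefer our own queue, otherwise steal from a victim.
// Returns false when neither queue had any work.
inline bool run_one(WorkQueue& own, WorkQueue& victim) {
    std::optional<Task> t = own.pop_local();
    if (!t) t = victim.steal();
    if (!t) return false;
    (*t)();
    return true;
}
```

In a real scheduler each worker thread would loop on something like `run_one`, picking a victim among the other workers. Priorities and affinities are left out of this sketch entirely.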
yaTS is released under the Apache License (so do whatever you want with it) and is available here:
I will also write a short series of posts about the code here: