I think your best option is looking up the basic theory behind threaded processing in OS design. You are, after all, talking about implementing part of an OS.
I think Intel released a threading library (Threading Building Blocks, if I remember right) around the time they started shipping Core Duo processors. That might be of interest.