It doesn't matter how sweet your server setup is, or how clever your service's optimizations are, if clients don't know which server to use. Once you scale beyond a single box, routing requests to servers becomes non-trivial. Using the wrong algorithm to schedule jobs can be incredibly costly: an inefficient algorithm forces you to invest far more in infrastructure than is actually required to meet your quality-of-service constraints.
But how do you assign jobs to servers? There are common approaches: pick a random server, cycle through servers in order (round-robin), or send the next job to the most lightly loaded server. However, it's not clear how to evaluate these algorithms, especially against one another. My work focuses on evaluating and comparing the approaches above. In particular, I use higher-order statistical moments to analyze the performance of these algorithms when applied to representative traces.
Using these results, I identify situations in which the algorithms differentiate themselves from each other, and identify important characteristics of each. Finally, I make some recommendations about when to use which algorithm.
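To make the three policies concrete, here is a minimal sketch of each. The class and method names are illustrative, not from any real system, and "load" is modeled simply as the count of outstanding jobs per server:

```python
import random

class Dispatcher:
    """Illustrative sketch of three common job-assignment policies."""

    def __init__(self, num_servers):
        self.loads = [0] * num_servers  # outstanding jobs per server
        self.next_rr = 0                # cursor for round-robin

    def pick_random(self):
        """Send the job to a uniformly random server."""
        return random.randrange(len(self.loads))

    def pick_round_robin(self):
        """Cycle through the servers in order."""
        server = self.next_rr
        self.next_rr = (self.next_rr + 1) % len(self.loads)
        return server

    def pick_least_loaded(self):
        """Send the job to the server with the fewest outstanding jobs."""
        return min(range(len(self.loads)), key=self.loads.__getitem__)

    def assign(self, policy):
        """Choose a server with the given policy and record the new job."""
        server = policy()
        self.loads[server] += 1
        return server
```

Note the trade-off this sketch exposes: random and round-robin need no load information at all, while least-loaded requires up-to-date load counts, which is exactly why their behavior diverges on real traces.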