What are the benefits of a shared server, bearing in mind that you have to be somewhat careful about the transaction types you let use it? A shared server does three things: it reduces the number of operating system processes/threads, it artificially limits the degree of concurrency, and it reduces the memory needed on the system. Let’s discuss these points in more detail.
Reduces the Number of Operating System Processes/Threads
On a system with thousands of users, the operating system may quickly become overwhelmed in trying to manage thousands of processes. In a typical system, only a fraction of the thousands of users are concurrently active at any point in time.
For example, I’ve worked on systems with 5,000 connected users, of which at most 50 were active at any one point in time. Such a system would work effectively with 50 shared server processes, reducing the number of processes the operating system has to manage by two orders of magnitude (100 times). The operating system can now, to a large degree, avoid context switching.
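The arithmetic behind that claim can be checked with a short sketch. The function name `process_reduction` is hypothetical, and the calculation deliberately ignores dispatchers and background processes, which a real instance would also run.

```python
import math


def process_reduction(connected_users: int, shared_servers: int) -> float:
    """Return how many times fewer server processes a shared pool needs
    compared with one dedicated server process per connected user.

    Simplifying assumption: only server processes are counted.
    """
    if connected_users <= 0 or shared_servers <= 0:
        raise ValueError("both counts must be positive")
    return connected_users / shared_servers


# Figures taken from the example in the text: 5,000 users, 50 active.
ratio = process_reduction(5000, 50)
orders_of_magnitude = math.log10(ratio)

print(f"{ratio:.0f} times fewer processes")
print(f"about {orders_of_magnitude:.0f} orders of magnitude")
```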
Artificially Limits the Degree of Concurrency
Speaking as a person who has been involved in many benchmarks, the benefits of this seem obvious. When running benchmarks, people frequently ask to run as many users as possible until the system breaks. One of the outputs of these benchmarks is always a chart that shows the number of concurrent users vs. the number of transactions per second (see Figure 5-3).
Figure 5-3. Concurrent users vs. transactions per second
Initially, as you add concurrent users, the number of transactions per second increases. At some point, however, adding more users does not increase the number of transactions you can perform per second; the graph tends to flatten off.
The throughput has peaked, and now response time starts to increase. In other words, you are doing the same number of transactions per second, but the end users are observing slower response times.
As you continue adding users, you will find that the throughput will actually start to decline. The concurrent user count before this drop-off is the maximum degree of concurrency you want to allow on the system.
Beyond this point, the system becomes flooded and queues begin forming to perform work. Much like a backup at a tollbooth, the system can no longer keep up. Not only does response time rise dramatically at this point, but throughput from the system may fall, too, because context switching and sharing resources among too many consumers itself consumes resources.
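Given benchmark measurements of this kind, the point before the drop-off can be read off programmatically. The following is a minimal sketch, not a tuning method: the function name `max_useful_concurrency` is hypothetical, and the sample numbers are invented purely to mimic the shape of the curve described above, not taken from any real benchmark.

```python
def max_useful_concurrency(samples):
    """Return (users, tps) for the measurement with the highest throughput.

    samples: iterable of (concurrent_users, transactions_per_second).
    If several user counts reach the same peak throughput, the smallest
    user count wins, since extra users only add response time.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("at least one measurement is required")
    best_users, best_tps = ordered[0]
    for users, tps in ordered[1:]:
        if tps > best_tps:  # strict comparison keeps the earliest peak
            best_users, best_tps = users, tps
    return best_users, best_tps


# Invented illustration data: rises, flattens, then declines.
measurements = [
    (10, 100), (20, 195), (40, 370), (80, 520),
    (160, 525), (320, 430), (640, 250),
]

users, tps = max_useful_concurrency(measurements)
print(f"peak throughput {tps} tps at {users} concurrent users")
```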
If we limit the maximum concurrency to the point right before this drop, we can sustain maximum throughput and minimize the increase in response time for most users. A shared server allows us to limit the maximum degree of concurrency on our system to this number.
An analogy for this process could be a simple door. The width of the door and the width of the people limit the maximum throughput in people per minute. At low load, there is no problem; however, as more people approach, some forced waiting occurs (CPU time slice). If a lot of people want to get through the door, we get the fallback effect: there are so many people saying “after you” and so many false starts that the throughput falls. Everybody gets delayed getting through. Using a queue means the throughput increases; some people get through the door almost as fast as if there were no queue, while others (the ones put at the end of the queue) experience the greatest delay and might fret that “this was a bad idea.” But when you measure how fast everybody (including the last person) gets through the door, the queued model (shared server) performs better than a free-for-all approach, even with polite people. To picture the alternative, conjure up the image of the doors opening when a store has a large sale, with everybody pushing very hard to get through.
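The queued model can be illustrated outside the database with a fixed-size worker pool. This is only an analogy in code, assuming Python's standard `concurrent.futures.ThreadPoolExecutor`; it is not how a shared server is implemented. The names `run_with_limit` and `handle_request` are hypothetical, and the sleep stands in for real work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(num_requests: int, max_workers: int,
                   work_seconds: float = 0.01) -> int:
    """Push num_requests through a pool of max_workers threads and
    return the highest number of requests that ran at the same time."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle_request(_request_id: int) -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(work_seconds)  # placeholder for the actual work
        with lock:
            active -= 1

    # Requests beyond max_workers wait in the executor's internal queue,
    # much like people lining up at the door.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(handle_request, range(num_requests)))
    return peak


peak = run_with_limit(num_requests=200, max_workers=5)
print(f"200 requests served, never more than {peak} at once")
```

However many requests arrive, the observed concurrency cannot exceed the pool size; the excess simply waits its turn.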