
CS 354 Operating Systems - Purdue University

Lab: Context-switching, Kernel Instrumentation, and Dynamic Scheduling

Objectives
The objectives of this lab are to handle corner cases in context-switching when instrumenting XINU to monitor system performance, and to implement dynamic scheduling.

Context-switching in XINU: new and terminating processes
Corner cases are special situations that arise during context-switching in an operating system and must be handled carefully for the system to function correctly. The two cases we will consider are context-switching in a new process and terminating (i.e., context-switching out and deleting) a running process.

1 Configuring the stack of a new process to work with ctxsw()
The discussion in the lectures of XINU's context-switching implementation centered on the assumption that the process to be context-switched in (i.e., the "new" process in ctxsw()) had executed in the past and then been context-switched out. Restoring the CPU state of the new process from the information in its stack relied on the process's context having been saved when it was context-switched out. This assumption obviously does not hold for a newly created process that is being context-switched in for the very first time. As discussed in the lectures, for ctxsw() to work correctly for a newly created process, XINU's create() system call sets up the process's run-time stack to make it appear as if it had been context-switched out in the past. That is, XINU plants a fake saved context in a newly created process's stack for the purpose of allowing ctxsw() to work correctly for a new process that is about to "take its first breath."
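For orientation only, the sketch below conveys the general idea of seeding a stack with a fake saved context; the register order, the EFLAGS image, and the helper name fake_saved_context are illustrative assumptions -- the authoritative layout is the one in create.c that this exercise asks you to examine.

/* Illustrative sketch only (not XINU's create.c): seed a new process's
 * stack so that ctxsw(), after popping a "saved" context, executes ret
 * and lands at the process's first instruction.  The register count,
 * their order, and the EFLAGS image are assumptions to verify. */
typedef unsigned int uint32;

uint32 *fake_saved_context(uint32 *saddr, uint32 funcaddr)
{
    int r;

    *--saddr = funcaddr;        /* "return address" popped by ctxsw's ret */
    *--saddr = (uint32)saddr;   /* placeholder saved EBP of the fake caller */
    *--saddr = 0x00000000;      /* EFLAGS image restored by ctxsw; see
                                   create.c for the actual value and why */
    for (r = 0; r < 8; r++) {
        *--saddr = 0;           /* zeroed general-purpose registers */
    }
    return saddr;               /* recorded as the new process's prstkptr */
}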

Look into XINU's create() system call to determine how a newly created process's stack is configured so that ctxsw() "restores" (more precisely, sets up) the CPU's state to the context of the newly created process when it is selected by resched() to run for the first time. Depict concisely the content of the process's stack near the top of its stack that is relevant for ctxsw() to work correctly. Specify the stack's contents using labels and hexadecimal notation, along with their sizes (in bytes or bits). Explain why the value for EFLAGS is the way it is, and how this affects the code of ctxsw(). Put your answer in lab3answers.pdf under lab3/.

2 What transpires when a running process terminates
The second corner case we will consider is when a process runs and executes its last instruction -- by default, the ret instruction that gcc generates for the C return statement. For example, if main is specified as the first argument of create() and the code of main() completes, gcc makes sure to insert the ret instruction even if the return statement is omitted in main(). The question is where main() returns to, and what happens after it returns (i.e., jumps to some code). Look into create() to determine how a newly created process's stack is set up so that when the process's starting function -- i.e., the function pointer provided as the first argument of create() -- returns, clean-up operations can be performed to delete the process from the system and context-switch in a different process.

As in 3.1, concisely depict in lab3answers.pdf the content of the stack that is relevant for process termination. For concreteness, assume the starting function is int addtwoinC(int a, int b) from lab1; that is, the first argument of create() is addtwoinC. Discuss what code is executed when addtwoinC() returns and how context-switching in a different process is achieved. Since context-switching in is performed by ctxsw(), which first context-switches out the "old" process whose context is saved in its stack -- in our case the process to be terminated -- does that mean a terminated process is not completely erased from the system but leaves a trace of itself, for example, its saved context in the stack?

A third corner case, a variation of the second, is when a process calls exit() to terminate itself. Does the third case require separate preparation of the process's run-time stack by create()? Explain your answers in lab3answers.pdf.
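For reference, stock XINU handles this with a small clean-up hook: create() pushes INITRET (which names userret()) as the return address of the process's starting function. A minimal sketch of such a hook:

#include <xinu.h>

/* Sketch in the spirit of XINU's userret() (system/userret.c): when the
 * process's starting function returns, control transfers here via the
 * return address planted by create(), and the process deletes itself. */
void userret(void)
{
    kill(getpid());     /* remove the calling process; never returns */
}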

Kernel instrumentation: monitoring process CPU usage

1 Five millisecond resolution system uptime
XINU measures system uptime -- the time elapsed since a backend was boot loaded -- in units of seconds. In lab2, a system call, uint32 uptimesec(void), was added to return this value. In this problem, introduce a new global variable, uint32 clktime5milli, that maintains system uptime in units of 5 milliseconds. For example, a clktime5milli value of 10 means 50 msec have elapsed. Declare clktime5milli in clkinit.c and the header file clock.h. Modify clkhandler() so that it maintains clktime5milli. We will use clktime5milli as the unit by which we monitor how much CPU time a process has used. Common units used by operating systems, including Linux and Windows, are 1 msec, 4 msec, 10 msec, and 15.6 msec. We will discuss clock related issues in depth under device management.
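As a rough sketch of the bookkeeping (assuming clkhandler() is invoked once per 1 msec clock interrupt; the helper name clktick5milli and the count5 counter are assumptions, not part of stock XINU):

#include <xinu.h>

uint32  clktime5milli = 0;      /* system uptime in 5 msec units (clkinit.c)   */
static uint32 count5 = 0;       /* msec elapsed since the last 5 msec boundary */

/* Hypothetical helper called from clkhandler() on every 1 msec tick. */
void clktick5milli(void)
{
    if (++count5 >= 5) {
        count5 = 0;
        clktime5milli++;
    }
}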

2 Process CPU usage
For a number of reasons, including scheduling and accounting, it is important for a kernel to monitor how much CPU time a process has consumed since it was created. Add a new process table field, uint32 prcputime, initialized to 0 when a process is created, that tracks its CPU usage. By CPU usage, we mean the time (in the 5 millisecond units of clktime5milli from 4.1) that a process spends as the current process. In XINU, interrupt handling always borrows the context of the current process, irrespective of whether the task performed by the interrupt handler is related to the current process or not. More refined measures may distinguish time spent in user mode from time spent in kernel mode; of course, in XINU processes always run in kernel mode. For our purpose of approximately gauging how much CPU time a process has received, so it can be compared against the CPU time received by other processes, prcputime will suffice.
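The process table change amounts to one field; a sketch (only the new field is shown, and the surrounding fields of struct procent in include/process.h are elided):

struct procent {                /* process table entry (excerpt) */
    /* ... existing fields: prstate, prprio, prstkptr, prname, ... */
    uint32  prcputime;          /* CPU time consumed, in 5 msec units;
                                   set to 0 in create()              */
};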

Part A. We will consider the kernel modifications needed to measure CPU usage in two parts. The first part pertains to a process that has run before, was then context-switched out, and is now being context-switched back in. Part B will handle a newly created process that is being context-switched in for the first time. A process becomes current when XINU's resched() selects it and calls ctxsw() to context-switch it in. It ceases to be current when the scheduler selects another process to run and context-switches out the current process.

Insert code in resched() after it calls ctxsw() -- i.e., when ctxsw() returns -- to start a 5 millisecond counter, uint32 currentcputime, initialized to clktime5milli, that tracks the CPU time consumed by the newly context-switched in process until it is context-switched out. Make currentcputime global, following the same declaration as clktime5milli in clkinit.c and clock.h. Insert code just before calling ctxsw() to update the current process's prcputime field in the process table by adding clktime5milli - currentcputime. If clktime5milli equals currentcputime, i.e., the current process has consumed less than 5 milliseconds of CPU time before being context-switched out, round up to 5 msec and add 1 to prcputime. Thus prcputime will overestimate a process's CPU usage. Accuracy can easily be enhanced by increasing the time resolution to 1 millisecond. We will consider performance issues, including trade-offs, when discussing clock management later in the course.

Add a new system call, uint32 proccpuuse(pid32 pid), in system/proccpuuse.c that returns a process's CPU usage stored in the process table field prcputime. If the current process calls proccpuuse(getpid()), update prcputime to reflect the CPU time used since being context-switched in. That is, add clktime5milli - currentcputime to prcputime and reset currentcputime to clktime5milli.
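A sketch of the Part A bookkeeping is given below. The helper name chargecurrent(), the error handling, and the exact placement inside resched() are assumptions; currentcputime, clktime5milli, prcputime, and proccpuuse() are the names specified above.

#include <xinu.h>

uint32  currentcputime = 0;     /* clktime5milli when the current process was switched in */

/* Hypothetical helper invoked in resched() just before ctxsw(): charge the
 * outgoing current process for the CPU time it consumed.                   */
void chargecurrent(void)
{
    uint32 used = clktime5milli - currentcputime;

    proctab[currpid].prcputime += (used == 0) ? 1 : used;  /* round up to 1 unit */
}

/* After ctxsw() returns in resched():  currentcputime = clktime5milli;     */

/* system/proccpuuse.c: return a process's CPU usage in 5 msec units.       */
uint32  proccpuuse(pid32 pid)
{
    intmask mask;
    uint32  usage;

    mask = disable();
    if (isbadpid(pid)) {
        restore(mask);
        return (uint32)SYSERR;
    }
    if (pid == currpid) {       /* include time since the last switch-in */
        proctab[pid].prcputime += clktime5milli - currentcputime;
        currentcputime = clktime5milli;
    }
    usage = proctab[pid].prcputime;
    restore(mask);
    return usage;
}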

Part B. Part A is insufficient for measuring CPU usage since a newly created process that is being context-switched in for the first time will, as discussed in the lectures and followed up in 3.1, not return to resched() when ctxsw() executes ret. That is, the monitoring code added after the call to ctxsw() in resched() will not be executed, which results in incorrect CPU usage estimation. This problem can be remedied in several ways. In Part B we will use a method based on return address manipulation in the run-time stack, practiced in lab2 and also referred to as ROP (return-oriented programming). ROP is a powerful technique used in various software systems including operating systems. It is also a favorite tool employed by hackers to compromise victim software systems.

To set currentcputime to clktime5milli when a process is first context-switched in, modify create(), which sets up the stack of a newly created process, so that ctxsw() upon executing ret does not jump to the app function (i.e., the first argument of create()). Instead, it jumps to an internal kernel function, void startcpumeasure(void), in startcpumeasure.c. All that startcpumeasure() does is set currentcputime to clktime5milli and then return. gcc translates startcpumeasure() into machine code whose last instruction is ret; set up the run-time stack of the process in create() so that this ret causes a jump to the app function (i.e., the first argument of create()). Hence, by manipulating the stack of a newly created process, you make ctxsw() not jump directly to the app function but take a detour through startcpumeasure(). startcpumeasure(), upon executing ret, must not return to ctxsw() but jump directly to the app function of create(). At run-time, you are able to modify the behavior of software by inserting the execution of code that the existing software is oblivious to. In particular, you do not have to modify the code of ctxsw() nor the app function to initialize CPU usage measurement in a newly created process.
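The detour function itself is tiny; a sketch (system/startcpumeasure.c) follows. The companion change in create() -- arranging the stack so that ctxsw()'s ret pops the address of startcpumeasure() and startcpumeasure()'s own ret pops the app function's address -- is left to you.

#include <xinu.h>

/* Sketch: executed once, the very first time a newly created process is
 * context-switched in, by virtue of the return address planted in its
 * stack by create().  Its ret then transfers control to the app function. */
void startcpumeasure(void)
{
    currentcputime = clktime5milli;     /* start measuring the new process's CPU usage */
}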

Testing. Evaluate your code using two coarse tests. First, create and resume an app process of highest priority (pick a suitable value). Check where the constant QUANTUM is defined and change it to 100 so that a process has a 100 msec time budget to consume before the scheduler is called by clkhandler(). In the app process, initialize a local variable, uint32 starttime, to clktime5milli. Then, in a loop, check if clktime5milli - starttime equals 18. The process should run undisturbed for 90 milliseconds (except for lower half interrupt handling occurring every 1 msec) since it has the highest priority. When clktime5milli - starttime equals 18, call proccpuuse() to check the CPU usage monitored by the kernel. The two values should be approximately equal.
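A sketch of such a test process follows (the function name testcpuuse is hypothetical; clktime5milli should be declared volatile so the busy-wait loop rereads it from memory, per the note on gcc optimizations later in the handout):

#include <xinu.h>

/* Sketch of the first coarse test: a highest-priority process spins for
 * 18 * 5 = 90 msec and compares its own count against the kernel's. */
void testcpuuse(void)
{
    uint32 starttime = clktime5milli;

    while (clktime5milli - starttime != 18) {
        ;                               /* busy-wait, consuming CPU time */
    }
    kprintf("measured by process: 18, measured by kernel: %d\n",
            proccpuuse(getpid()));
}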

Second, create and resume 4 app processes running the same function as in the first test case with a slight modification. Make the parent of the 4 app processes, say running main(), have higher priority than the app processes so that all child processes are created and made ready (by calling resume()) before the parent process terminates. Each app process performs the same loop but changes the termination condition from clktime5milli - starttime equaling 18 to 800. Since the four processes have equal priority, round-robin scheduling in XINU should result in each process getting approximately 1/4 of the total CPU time (clktime5milli equaling 800 means around 4 seconds of actual wall clock time). Check that they get approximately the same share of CPU time, and that each app process's CPU usage is around 200 (i.e., 1 second).

Third, create your own test case aimed at complementing the first two cases. Explain in lab3answers.pdf the rationale behind your test case. Implement and check how the results compare to the first two test cases.

Dynamic priority scheduling

1 Objective of operating system process scheduling
The default XINU scheduler is a static (or fixed) priority scheduler that uses the priorities specified in the create() system call and never changes them. XINU's process scheduler, resched(), when invoked, picks a highest priority ready process to run next. If there is a tie, round-robin scheduling is performed among the group of highest priority processes. In computing systems where multiple processes share CPU resources, "fair" allocation of CPU cycles is a widely accepted goal. The notion of fairness can be assigned different technical meanings. For example, under equal share fairness, if there are N tasks needing to be scheduled on a shared resource such as a CPU, every task is expected to receive 1/N, i.e., an equal share, of CPU time. This notion of fairness is used in several resource allocation settings, including the scheduling of messages, called packets, on routers, which are devices that packets pass through on their journey from source to destination. Traditionally, fair scheduling in the equal share sense has not been used in operating system process scheduling due to the overhead and difficulties associated with its implementation. The exception is today's Linux scheduler, which was revamped several years back to follow the spirit of fair scheduling. Whereas the UNIX and Windows schedulers sport constant (i.e., O(1)) scheduling overhead, the Linux scheduler, called CFS (completely fair scheduler), incurs logarithmic overhead as a function of the number of processes in the system. The preceding Linux scheduler, similar to UNIX and Windows, was a constant overhead scheduler.

The main issue with applying fair scheduling to operating system scheduling is that processes exhibit diverse traits, such as many being short-lived whereas some are long-lived. Enforcing a meaningful equal share of CPU time when processes are not created at the same time and have different CPU demands adds complications that detract from the simple idea of providing equal shares of CPU cycles. Chief among the diverse traits is that some app processes tend to hog the CPU, meaning that they do little I/O. These processes are called CPU-bound processes. Other processes perform a lot of I/O, which often leads to blocking system calls (e.g., read()) that context-switch out the process until it becomes unblocked at a future time (e.g., a message that read() is blocking on has arrived). These processes, called I/O-bound processes, by their intrinsic nature tend to require less CPU time than CPU-bound processes. Strictly enforcing equal share (e.g., a blocked process is not context-switched out but keeps occupying the CPU, wasting cycles) leads to inefficient resource utilization. Thus, additional mechanisms are required to adapt fair scheduling to process scheduling in operating systems. In doing so, much of the simplicity and elegance of fair scheduling is lost while incurring logarithmic, instead of constant, overhead.

2 Differentiated treatment of CPU- vs. I/O-bound processes
Process scheduling in operating systems has been dominated by the need to provide differentiated treatment of CPU- and I/O-bound processes, recognizing that the latter require significantly less CPU time. Scheduling aims to "compensate" I/O-bound processes for being less CPU hungry by assigning them higher priority than CPU-bound processes. Overall, this tends to enhance their responsiveness, which for interactive processes such as web browsers and games that perform a lot of I/O is a significant plus. The specific scheduling algorithms employed by UNIX, Windows, and previous Linux operating systems that reward I/O-bound processes with reduced response times for consuming fewer CPU cycles are varied. In this problem, we will implement a method that incorporates aspects of UNIX, Windows, and constant-time Linux scheduling so that XINU's dynamic priority scheduler follows the design of modern operating systems.

3 XINU differentiated scheduler
When implementing dynamic priority scheduling in XINU, a number of files containing code will need to be modified. Specify in a text file README.3 which files you have modified, with a brief synopsis (1-2 lines) of the purpose of each modification. Place README.3 under lab3/. Although the amount of coding is kept low, what coding changes are made and where is important to track for clarity and for checking correctness.

Scheduling classes. We will call XINU's version of process scheduling that differentiates between CPU- and I/O-bound processes XDS (Xinu differentiated scheduler). XDS supports three priority levels: 0, 10, and 20. Priority 0 is reserved solely for the null process so that it runs only if there exists no other ready process in the system. Priority 10 is for CPU-bound processes, and priority 20 is for I/O-bound processes. While it is easy to refer to CPU- and I/O-bound processes as conceptual classes, it is much harder to determine whether an app process is one or the other. Processes do not have a label attached to their forehead. A process may fall in neither category but be a hybrid, and some processes may change their behavior over the course of their lifetime. We will discuss how to characterize and classify processes in 5.4.

All processes spawned using create() will start off with priority 10. That is, by default, a process will be regarded as CPU-bound until such time its behavior is deemed I/O-bound and its priority is changed to 20. A process that is classified as I/O-bound but then changes its behavior to be CPU-bound is demoted, i.e., its priority is decreased from 20 to 10. Since the priority of a process is not fixed as in legacy XINU but may change over time, XDS is a dynamic priority scheduler.

Scheduling overhead. XDS is a constant overhead scheduler (its time complexity is O(1)) since supporting two priority levels (priority 0 has only one element) can be done with two FIFO (first-in first-out) queues. For example, readylist1 is a FIFO queue with two pointers in the header: one pointing to the first element of the list, another pointing to the last element of the list. Inserting a process is a constant overhead operation (i.e., does not depend on how many processes are in the list already) since using the tail pointer the process can be added at the end. Extracting a process is also constant overhead since the head pointer identifies the process at the front of the list to be removed. If readylist1 holds all the ready CPU-bound processes of priority 10, a second FIFO queue readylist2 can be used to hold all I/O-bound ready processes of priority 20. By the scheduling invariant that a highest priority process always runs next and ties are resolved by round-robin scheduling, processes in readylist1 only get to run if readylist2 is empty. That is, all I/O-bound processes are blocked and not ready. All processes in readylist2 are scheduled round-robin, and the same goes for processes in readylist1.
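To make the constant-overhead claim concrete, here is an illustrative FIFO ready-queue sketch (not XINU's queuetab-based readylist; the type and field names are assumptions):

#include <xinu.h>

struct fifonode {
    pid32   pid;                /* process waiting in this ready queue  */
    struct fifonode *next;
};

struct fifoq {
    struct fifonode *head;      /* process to run next (front of list)  */
    struct fifonode *tail;      /* most recently inserted (back of list)*/
};

void fifo_insert(struct fifoq *q, struct fifonode *n)   /* O(1) insert  */
{
    n->next = NULL;
    if (q->tail == NULL) {
        q->head = n;            /* queue was empty                      */
    } else {
        q->tail->next = n;      /* append after current last element    */
    }
    q->tail = n;
}

struct fifonode *fifo_extract(struct fifoq *q)          /* O(1) extract */
{
    struct fifonode *n = q->head;

    if (n != NULL) {
        q->head = n->next;
        if (q->head == NULL) {
            q->tail = NULL;     /* queue became empty                   */
        }
    }
    return n;
}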

In the implementation of XDS, to focus on scheduling architecture and kernel run-time behavior we will simplify the two FIFO queue design into a single queue design where we will reuse XINU's legacy readylist, a priority queue, for our purpose. That is, all I/O-bound ready processes with priority 20 will be toward the front of the queue, and all CPU-bound ready processes with priority 10 will be toward the back of the queue. The null process will always be at the tail end of the queue with unchanging priority 0. Hence in this implementation of XDS scheduling overhead is linear. Since we will not be evaluating XINU with hundreds of processes to gauge performance impact of scheduling overhead, we are trading scheduling performance for reduced coding so that we can focus on the scheduling behavior of XDS.

Time slice. CPU-bound processes are given a time slice of QUANTUM1 msec, and I/O-bound processes are assigned a time slice of QUANTUM2 msec. Following UNIX and Windows (but not the original constant overhead Linux scheduler), we will set QUANTUM1 > QUANTUM2. By default, define QUANTUM1 as 100 and QUANTUM2 as 30 in kernel.h where the legacy parameter QUANTUM is defined. We assign a smaller time slice to I/O-bound processes in case XDS misclassifies a process as I/O-bound. A large time slice would allow a CPU-bound process, disguised as an I/O-bound process, to monopolize the CPU for a prolonged period. As noted in the lectures, one of the problems of the constant overhead Linux scheduler was that it assigned large time slices to I/O-bound processes. We will assign time slice QUANTUM1 to XINU's null process.
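In include/kernel.h, next to the legacy QUANTUM definition, this amounts to something like the following sketch:

#define QUANTUM1    100     /* msec time slice for CPU-bound (priority 10) processes */
#define QUANTUM2    30      /* msec time slice for I/O-bound (priority 20) processes */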

4 CPU- vs. I/O-bound process classification
We will use a constant overhead, simple method used in modern kernels to classify a process as CPU- or I/O-bound. A process that is current (i.e., occupies the CPU) is deemed CPU-bound if XINU's clock interrupt handler, clkhandler(), finds that it has depleted its time slice, that is, the countdown of preempt reaches 0 and resched() is called. If the current process's priority is 20 (i.e., it was previously classified as I/O-bound), its priority is decreased to 10 before calling resched(). Using up an entire time slice without blocking is taken as an indicator that the process is CPU-bound. This is the path to demotion. The demoted process is given a fresh time slice of QUANTUM1.
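A sketch of this demotion path is given below; the helper name xdsdemote() is an assumption, and it is assumed that the dispatch code loads preempt with the quantum matching the priority of whichever process runs next.

#include <xinu.h>

/* Sketch: called from clkhandler() when the countdown of preempt reaches 0,
 * i.e., the current process used up its entire time slice without blocking. */
void xdsdemote(void)
{
    struct procent *prptr = &proctab[currpid];

    if (prptr->prprio == 20) {      /* previously classified as I/O-bound */
        prptr->prprio = 10;         /* demote to the CPU-bound class      */
    }
    /* the process receives a fresh QUANTUM1 slice when next dispatched */
    resched();
}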

A process that is current and was classified as CPU-bound is promoted to I/O-bound, with its priority increased to 20, if it makes a blocking system call before its time slice is depleted. We will use sleepms() as a representative for all other blocking system calls: first, because we have not yet studied the other blocking system calls (some of which we will encounter under IPC (inter-process communication) and synchronization), and second, to reduce coding effort. If you know how to make it work for sleepms(), you know how to make it work for other blocking system calls. We assign the process a fresh time slice of QUANTUM2 msec.
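Correspondingly, a sketch of the promotion path (the helper name xdspromote() is an assumption; it would be invoked from sleepms() before the caller blocks):

#include <xinu.h>

/* Sketch: the current process is making a blocking call (sleepms()) before
 * depleting its time slice, so reclassify it as I/O-bound.  It is assumed
 * to receive a QUANTUM2 msec slice the next time it is dispatched.        */
void xdspromote(void)
{
    struct procent *prptr = &proctab[currpid];

    if (prptr->prprio == 10) {      /* previously classified as CPU-bound */
        prptr->prprio = 20;         /* promote to the I/O-bound class     */
    }
}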

Hence depending on the behavior of a process, it may toggle between the two priorities over the course of its lifetime. A process that is primarily CPU-bound is expected to spend most of its time at priority 10. A process that is mostly I/O-bound is likely to spend most of its time at priority 20.

5 CPU-bound process preemption
A CPU-bound process that is current may be preempted -- i.e., context-switched out -- when an I/O-bound process that was sleeping wakes up and becomes ready. The preempted CPU-bound process is added to XINU's readylist. If unlucky, a CPU-bound process that has just become current may be context-switched out by an I/O-bound process and, in XINU's readylist, be placed behind all other CPU-bound processes. With two ready lists, we would instead insert the preempted process at the front of readylist1, with its partially used time slice not refreshed, so that it resumes execution as soon as there are no more ready I/O-bound processes.

6 Evaluation of XDS
Test applications. Code a test application, void appcpu(void), that serves as a CPU-bound app and a program, void appio(void), that acts as an I/O-bound app. The code of appcpu() (place in system/appcpu.c) works as follows:

#include <xinu.h>

void appcpu(void) {
    int x, y;

    x = 0;
    y = clktime;                /* uptime in seconds, updated by the clock interrupt handler */
    while (clktime - y < 8)     /* until wall clock time of 8 sec is reached */
        x++;                    /* pure ALU work */
    kprintf("appcpu: %d %d %d\n", currpid, x, proctab[currpid].prcputime);
}

Since clktime (in units of seconds) is updated by XINU's clock interrupt handler, appcpu() keeps performing ALU operations until 8 seconds have elapsed, at which time it prints its PID, counter x, and its CPU usage (in units of 5 msec), and terminates.

The I/O-bound application, appio() (in system/appio.c), works as follows:

#include <xinu.h>

void appio(void) {
    int x, y, i;

    x = 0;
    y = clktime;
    while (clktime - y < 8) {           /* until wall clock time of 8 sec is reached */
        x++;
        for (i = 0; i < 100000; i++)    /* consume some CPU cycles before blocking */
            ;
        sleepms(200);                   /* block for 200 msec, yielding the CPU */
    }
    kprintf("appio: %d %d %d\n", currpid, x, proctab[currpid].prcputime);
}

We expect the value of x and the CPU usage of appio() to be significantly smaller than those of appcpu(). You may vary the loop bound 100000 and the sleepms() parameter 200 to affect the degree to which the I/O-bound process blocks, i.e., how I/O-bound the app is.

Note: From a C programming perspective, y is assigned clktime, a global variable that is asynchronously updated by the clock interrupt handler every 1000 msec (unless interrupts are disabled). gcc, by default, performs a number of optimizations in an attempt to make the generated machine code more efficient. Sometimes gcc can try too hard, which can lead to problematic code; in some cases, gcc's optimizations introduce run-time bugs.

One example is a C variable that is asynchronously updated by an interrupt handler. In the CPU- and I/O-bound apps, in any statement where clktime occurs, gcc may be tempted to treat clktime as if it were a constant, since the clock interrupt handling function clkhandler(), where clktime is updated, is called only by clkdisp, and clkdisp itself is not called by any other function. That is, gcc's narrow and simplistic analysis may cause it to conclude that clktime never changes and prompt it to keep the initial value of clktime in a register to reduce main memory accesses. Of course, clkdisp runs when clock interrupts are generated and clkhandler() updates clktime's value in main memory. The copy kept in a register for the CPU- or I/O-bound app, however, may not be updated, leading to a run-time bug.

The type qualifier volatile can be used to inform gcc not to engage in register optimizations that can result in incorrect execution. When coding software that deals with asynchronous events, make sure to instruct gcc to refrain from well-intentioned optimizations that may do harm. gcc is but a tool, and it is up to the programmer to utilize it effectively.
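For example, the declarations in clock.h might read as follows (a sketch; stock XINU declares clktime without volatile):

extern volatile uint32 clktime;         /* system uptime in seconds      */
extern volatile uint32 clktime5milli;   /* system uptime in 5 msec units */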

Test scenario A. Create 3 CPU-bound processes from main() back-to-back. Assign the process running main() high priority so that it creates and readies all processes before terminating. If your scheduler is implemented correctly, we would expect to see the 3 processes printing similar CPU usage and x values. Discuss your findings in lab3answers.pdf.

Test scenario B. Create 3 I/O-bound processes from main() and perform the same benchmark tests as in scenario A. The three processes must run the same code with the same parameters (i.e., loop bound and sleep time if you modified them). Discuss your findings in lab3answers.pdf.

Test scenario C. Create 3 CPU-bound processes and 3 I/O-bound processes. We would expect the 3 CPU-bound processes to output similar x values and CPU usage with respect to each other, and the same goes for the 3 I/O-bound processes within the group of I/O-bound processes. Across the two groups, we would expect CPU-bound processes to receive significantly more CPU time than I/O-bound processes. Discuss your findings in lab3answers.pdf.

Bonus problem
For test scenario C, consider whether you can code a "super predator" app, appsuperpred() in appsuperpred.c, that through its behavior (e.g., sleep duration and loop bound) can consume significantly more CPU time than the other 6 processes. Explain the rationale behind your super predator in lab3answers.pdf. Test it by introducing it as the 7th process in scenario C, and evaluate its ability to hoard CPU time.
