In critical computer applications, the correctness of the delivered output and the continuity of the required service are, alongside CPU speed, the most important measures of computer performance. Invalid output or an interruption of service in such applications may lead to loss of life or money, or to loss of synchronization in a real-time control system. A CPU completes two phases to deliver its output. The first phase is instruction fetch: the next instruction of the executing program is fetched from main memory into the instruction register and decoded to determine what operation should be performed. The second phase is instruction execution: any required data is fetched and the intended operation is performed on that data to produce the output. This fetch-execute sequence is repeated for every instruction until the program ends.
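As a rough illustration of the two phases, the sketch below models a toy accumulator machine. The opcodes (LOAD, ADD, HALT), the one-byte operand layout, and the `run` function are assumptions made for this example only; they are not the instruction set considered in the thesis.

```python
# Hypothetical opcodes for a toy accumulator machine (illustration only).
LOAD, ADD, HALT = 0x01, 0x02, 0xFF


def run(memory, start=0):
    """Repeat the fetch/decode and execute phases until HALT; return the accumulator."""
    pc = start   # program counter
    acc = 0      # accumulator
    while True:
        # Phase 1: fetch the instruction byte into the instruction register and decode it.
        ir = memory[pc]
        pc += 1
        if ir == HALT:
            return acc
        # Phase 2: fetch the required data (one operand byte) and execute the operation.
        operand = memory[pc]
        pc += 1
        if ir == LOAD:
            acc = operand
        elif ir == ADD:
            acc = (acc + operand) & 0xFF
        else:
            raise ValueError(f"byte {ir:#04x} is not a valid opcode")


program = [LOAD, 5, ADD, 7, HALT]
print(run(program))
```

Starting the same program one byte late (`start=1`) makes the machine treat the operand 5 as an opcode. Here the decoder happens to reject it, but an operand whose value coincides with a valid opcode would be executed silently.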
The output of this sequence is acceptable if it is correct and delivered at the required time. This holds if and only if the proper instruction is fetched and the proper data is processed. The ordering of the two phases is enforced by the timing sequence of the instruction fetching and decoding unit. This is not always the case, however. Sometimes the CPU violates the sequence because of internal or external errors, and the violation may lead to fetching an incorrect instruction or incorrect data. The result is either incorrect output or incorrect program execution. In both cases the performance of the CPU is degraded, however high its speed.
In this thesis a solution is introduced for detecting instruction fetching and execution errors and stopping execution whenever an invalid instruction or invalid data is fetched, in order to prevent the CPU from producing incorrect results and to prevent program crashes. The solution is based on adding a bit to the instruction format in the CPU architecture. This bit distinguishes an instruction byte from an operand byte.
An instruction fetching and decoding circuit has to be developed to test the added bit. If the bit indicates an instruction code and the current fetching phase is an instruction fetch cycle, the instruction is decoded; otherwise an error is assumed and the instruction is stopped. In this way the error is detected early, which prevents the CPU from executing an invalid instruction or processing invalid data.
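The check on the added bit can be sketched in software as follows. This is a minimal model under stated assumptions, not the proposed circuit: memory words are assumed to be 9 bits wide with the added bit in bit 8, a value of 1 is assumed to mark an instruction byte, and the same comparison is assumed to apply during operand fetch cycles. The names `TAG_BIT`, `Cycle`, `make_word`, and `checked_fetch` are hypothetical.

```python
from enum import Enum

TAG_BIT = 0x100      # assumed position of the added bit (bit 8 of a 9-bit word)
PAYLOAD_MASK = 0xFF  # low 8 bits carry the opcode or the operand


class Cycle(Enum):
    INSTRUCTION_FETCH = "instruction fetch"
    OPERAND_FETCH = "operand fetch"


class FetchError(Exception):
    """Raised when the added bit disagrees with the current fetch cycle."""


def make_word(payload, is_instruction):
    """Build a 9-bit memory word: the added bit plus an 8-bit payload."""
    return (TAG_BIT if is_instruction else 0) | (payload & PAYLOAD_MASK)


def checked_fetch(word, cycle):
    """Return the 8-bit payload if the added bit matches the cycle, else stop with an error."""
    tagged_as_instruction = bool(word & TAG_BIT)
    instruction_expected = cycle is Cycle.INSTRUCTION_FETCH
    if tagged_as_instruction != instruction_expected:
        raise FetchError(f"word {word:#05x} does not belong in an {cycle.value} cycle")
    return word & PAYLOAD_MASK


opcode = checked_fetch(make_word(0x3E, is_instruction=True), Cycle.INSTRUCTION_FETCH)
print(hex(opcode))
```

Passing an operand word during an instruction fetch cycle raises `FetchError` before any decoding takes place, which mirrors the early detection described above.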
Most of the errors that affect CPU performance are transient, so such an error disappears if it is given some time. By refetching the same instruction after that time period, the CPU can therefore continue executing the program without any loss of data or of program control flow.
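The refetch idea can be sketched as a bounded retry loop. The retry limit, the wait duration, and the behaviour when the error persists are assumptions of this sketch, since the text does not specify them; `fetch_instruction_with_retry`, `GlitchyMemory`, and `PersistentFetchError` are hypothetical names, and the bit position is assumed as bit 8 of a 9-bit word.

```python
import time

TAG_BIT = 0x100      # assumed: set for instruction bytes (bit 8 of a 9-bit word)
PAYLOAD_MASK = 0xFF


class PersistentFetchError(Exception):
    """Raised when the error is still present after all refetch attempts."""


def fetch_instruction_with_retry(read_word, address, max_retries=3, wait_seconds=0.0):
    """Fetch an instruction byte, refetching after a wait whenever the bit check fails.

    Returns a pair (opcode, number_of_refetches).
    """
    for refetches in range(max_retries + 1):
        word = read_word(address)
        if word & TAG_BIT:
            return word & PAYLOAD_MASK, refetches
        time.sleep(wait_seconds)  # give a transient error time to disappear
    raise PersistentFetchError(
        f"fetch at address {address} still failing after {max_retries} refetches"
    )


class GlitchyMemory:
    """Hypothetical memory whose first `glitches` reads lose the added bit."""

    def __init__(self, words, glitches):
        self.words = list(words)
        self.glitches = glitches

    def read(self, address):
        word = self.words[address]
        if self.glitches > 0:
            self.glitches -= 1
            return word & ~TAG_BIT  # transient fault: added bit read as 0
        return word


memory = GlitchyMemory([TAG_BIT | 0x3E], glitches=1)
print(fetch_instruction_with_retry(memory.read, 0))
```

The retry limit is a design choice of the sketch: without it, a permanent fault would keep the loop refetching forever.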
The solution will be implemented by providing an instruction format for a CPU architecture that uses this technique. The required circuit will be fully described by its architecture and organization, timing diagram, and operational flowchart. To evaluate the capabilities of the solution, the circuit will then be simulated, and its behaviour monitored and recorded. The ratio of detected errors to injected errors gives a measure of error detection coverage, and the time between the injection of an error and its detection gives a measure of error detection latency. The results will then be collected, tabulated, and discussed to show the capabilities of the proposed solution and its drawbacks.
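The two evaluation measures could be computed from the simulation log roughly as follows. The record layout (an injection time paired with a detection time, or `None` when the error went undetected), the use of the mean as the latency summary, and the function name `detection_metrics` are assumptions of this sketch; the sample values are made up for illustration and are not results.

```python
from statistics import mean


def detection_metrics(records):
    """Compute (coverage, mean_latency) from (inject_time, detect_time_or_None) pairs.

    coverage     = detected errors / injected errors
    mean_latency = average of (detect_time - inject_time) over detected errors,
                   or None if no error was detected
    """
    if not records:
        raise ValueError("at least one injected error is required")
    latencies = [detect - inject for inject, detect in records if detect is not None]
    coverage = len(latencies) / len(records)
    mean_latency = mean(latencies) if latencies else None
    return coverage, mean_latency


# Illustrative log in clock cycles: four injected errors, one never detected.
log = [(10, 12), (20, 21), (30, None), (40, 43)]
coverage, latency = detection_metrics(log)
print(coverage, latency)
```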
Thesis Objectives: The work on this thesis aims to accomplish the following objectives:
1. Determining the error models that affect program execution.
2. Introducing an error detection technique for the instruction fetching and decoding stages.
3. Introducing an architecture for an instruction format that prevents program crashes.
4. Achieving correctness and continuity of the output by means of the introduced technique.
Activities and Time Scale: The following activities have to be carried out in order to meet the objectives of this thesis and to complete this work. At each stage a report is required that forms part of the final thesis document:
Activity 1: Literature Survey and Previous Work:
- CPU architecture and instruction execution phases.
- Error models affecting internal CPU behaviour.
- Error detection and correction used in CPUs.
- Commonly used mechanisms for error detection in CPUs.
Activity 2: Introducing the proposed circuit:
- Developing the structure and format for the proposed solution.
- Introducing a conceptual CPU architecture that uses the proposed solution.
- Developing the operational flowchart describing the functions and sequences.
Activity 3: Evaluation and Results:
- Developing a simulator for the proposed solution (using any available package).
- Exercising the simulator with test programs.
- Injecting errors into the different paths followed by an instruction and recording the behaviour of the system.
- Tabulating and reporting the results.