Abstract Title of Dissertation: Compiler-Assisted Scheduling for Real-Time Applications: A Static Alternative to Low-Level Tuning Seongsoo Hong, Doctor of Philosophy, 1994 Dissertation directed by: Assistant Professor Richard Gerber Department of Computer Science Developing a real-time system requires nding a balance between the timing constraints and the functional requirements. Achieving this balance often requires last-minute, low-level intervention in the code modules { via intensive hardware-based instrumentation and manual program opti- mizations. In this dissertation we present an automated, static alternative to this kind of human- intensive work. Our approach is motivated by recent advances in compiler technologies, which we extend to two speci c issues on real-time programming, that is, feasibility and schedulability. A task is infeasible if its execution time stretches over its deadline. To eliminate such faults, we have developed a synthesis method that (1) inspects all infeasible paths, and then (2) moves instructions out of those paths to shorten the execution time. On the other hand, schedulability of a task set denotes an ability to guarantee the deadlines of all tasks in the application. This property is a ected by interactions between the tasks, as well as their individual execution times and deadlines. To address the schedulability problem, we have developed a task transformation method based on program slicing. The method decomposes a task into two subthreads: the IO-handler component that must meet the original deadline, and the state-update component that can be postponed past the deadline. This delayed-deadline approach contributes to the schedulability of the overall application. We also present a new xed-priority preemptive scheduling strategy, which yields both a feasible priority ordering and a feasible task- slicing metric. Compiler-Assisted Scheduling for Real-Time Applications: A Static Alternative to Low-Level Tuning by Seongsoo Hong Dissertation submitted to the Faculty of the Graduate School of The University of Maryland in partial ful llment of the requirements for the degree of Doctor of Philosophy 1994 Advisory Committee: Assistant Professor Richard Gerber, Chair/Advisor Professor Ashok K. Agrawala Professor John Gannon Professor Virgil Gligor Associate Professor William Pugh Associate Professor Udaya A. Shankar c Copyright by Seongsoo Hong 1994 Table of Contents List of Tables vii List of Figures viii 1 Introduction 1 1.1 Motivation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 1.2 Compiler Transformations : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 6 1.2.1 Feasible Code Synthesis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 6 1.2.2 Real-Time Task Slicing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 1.3 Summary of Contributions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 8 1.4 Outline of the Dissertation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 9 2 Related Work 10 2.1 Formal Methods : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10 2.2 Real-Time Programming Languages : : : : : : : : : : : : : : : : : : : : : : : : : : : 11 2.3 Real-Time Compiler Tools : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 12 3 The Language of Time-Constrained Events 15 3.1 Design Goals : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15 3.2 Timing Constructs and Their Semantics : : : : : : : : : : : : : : : : : : : : : : : : : 16 3.3 Example TCEL Code : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 18 iv 4 Basic Notations 21 4.1 Flow Graphs : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 21 4.2 Basic Compiler De nitions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 23 5 Transformation 1: Feasible Code Synthesis 27 5.1 The Problem and Our Solution : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 28 5.1.1 The Problem of Feasible Code Synthesis : : : : : : : : : : : : : : : : : : : : : 29 5.1.2 Solution Strategy : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 32 5.2 Section Decomposition : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 33 5.2.1 Determining Section Boundaries : : : : : : : : : : : : : : : : : : : : : : : : : 33 5.2.2 Deriving Code-Based Timing Constraints : : : : : : : : : : : : : : : : : : : : 35 5.3 Code Scheduling : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 38 5.3.1 The Top-Level Algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 39 5.3.2 Subroutine Schedule Section(S,D,DUR(S),V bar ) : : : : : : : : : : : : : : : : 40 5.4 Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 49 6 Transformation 2: Real-Time Task Slicing 50 6.1 Background : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 51 6.1.1 Characterization of Discrete Control Software : : : : : : : : : : : : : : : : : : 52 6.1.2 Fixed-Priority Preemptive Scheduling : : : : : : : : : : : : : : : : : : : : : : 54 6.1.3 Scheduling with Compiler Transformations : : : : : : : : : : : : : : : : : : : 56 6.2 Automatic Task Decomposition by Program Slicing : : : : : : : : : : : : : : : : : : : 59 6.2.1 The Program Slicing Algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : 60 6.2.2 Assigning Times to Subtasks : : : : : : : : : : : : : : : : : : : : : : : : : : : 62 6.3 Scheduling Alternatives and Their Analyses : : : : : : : : : : : : : : : : : : : : : : : 64 6.4 Priority Ordering with Task Slicing : : : : : : : : : : : : : : : : : : : : : : : : : : : : 67 6.4.1 Feasibility Test : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 68 6.4.2 The Algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 68 6.4.3 A Larger Example : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 69 v 6.5 Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 70 7 Practical Considerations and Prototype Implementation 73 7.1 Practical Considerations. : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 73 7.1.1 Limits of Data-Flow Analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : 74 7.1.2 Limits of Timing Analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 74 7.1.3 User Interaction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 75 7.2 TimeWare/SLICE: the Prototype Implementation : : : : : : : : : : : : : : : : : : : : 75 7.2.1 TimeWare/SLICE Tool Screens : : : : : : : : : : : : : : : : : : : : : : : : : : 76 7.2.2 Commands : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 76 7.2.3 Implementation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 79 7.3 Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 80 8 Conclusions 81 8.1 Future Directions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 83 vi List of Tables 5.1 Timing Constraints of S3 and S4 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 38 6.1 Example Task Set : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 71 6.2 Priority Assignment with Program Slicing : : : : : : : : : : : : : : : : : : : : : : : : 72 vii List of Figures 1.1 Software Development Process: Two Alternatives : : : : : : : : : : : : : : : : : : : : 2 1.2 Event-Based Speci cation of Sensor-Controller System : : : : : : : : : : : : : : : : : 4 1.3 Two TCEL Programs with the Same Semantics : : : : : : : : : : : : : : : : : : : : : 5 1.4 Run-Time Behavior of a TCEL Periodic Task : : : : : : : : : : : : : : : : : : : : : : 8 3.1 Flow Graph of do Construct : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 17 3.2 Behavior of Periodic Timing Construct : : : : : : : : : : : : : : : : : : : : : : : : : : 18 3.3 (A) TCEL Source Code for the Flight Controller and (B) Corresponding Flow Graph 20 5.1 Software Development Process: Revisited for Feasible Code Synthesis : : : : : : : : 28 5.2 Flow Graph of do Construct and its Section Division : : : : : : : : : : : : : : : : : : 34 5.3 Flight Controller Program: After Section Decomposition : : : : : : : : : : : : : : : : 36 5.4 Top-Level Algorithm for Code Synthesis : : : : : : : : : : : : : : : : : : : : : : : : : 39 5.5 (A) Source Code in TCEL, (B) Corresponding Intermediate Code in GSA Form, (C) Intermediate Code after Bookkeeping-Free Transformations, and (D) Intermediate Code after Transformations with Bookkeeping : : : : : : : : : : : : : : : : : : : : : : 41 5.6 Speculative Code Motion: (A) Original Code, (B) Corresponding GSA Code and (C) Speculatively Transformed Code : : : : : : : : : : : : : : : : : : : : : : : : : : : 43 5.7 The Section Scheduling Algorithm (Schedule Section) : : : : : : : : : : : : : : : : : : 46 5.8 The Section Scheduling Algorithm (Sched) : : : : : : : : : : : : : : : : : : : : : : : : 47 5.9 Flight Controller Program: After Code Scheduling : : : : : : : : : : : : : : : : : : : 48 6.1 Software Development Process: Revisited for Real-Time Task Slicing : : : : : : : : : 51 viii 6.2 Generalized Slicing Method for Schedulability Tuning : : : : : : : : : : : : : : : : : 52 6.3 Task Instance at the k th Period : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 53 6.4 Dynamic Behavior of Periodically Implemented Control-Loop : : : : : : : : : : : : : 53 6.5 TCEL Program for Task  2 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 56 6.6 Simulated Time Line for the Example Task Set : : : : : : : : : : : : : : : : : : : : : 57 6.7 Decomposed Task at the k th Period : : : : : : : : : : : : : : : : : : : : : : : : : : : 58 6.8 Two Decomposed Subtasks of Task  2 : : : : : : : : : : : : : : : : : : : : : : : : : : 58 6.9 Program Dependence Graph : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 6.10 Slice with respect to Criterion hL9;cmdi : : : : : : : : : : : : : : : : : : : : : : : : : 62 6.11 Slicing a Conditional : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 63 6.12 Algorithm for Priority Ordering with Slicing Decision : : : : : : : : : : : : : : : : : 70 7.1 Tool Screens of TimeWare/SLICE : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 77 7.2 Output of TimeWare/SLICE : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 78 8.1 Software Development Process: Two Alternatives : : : : : : : : : : : : : : : : : : : : 82 ix Chapter 1 Introduction A real-time application is characterized by the existence of two competing factors: its functional speci cation and its temporal requirements. Functional speci cations de ne valid translations from inputs into outputs. As such they are realized by a set of programs, which consume CPU time. Temporal requirements, on the other hand, place upper and lower bounds between occurrences of events [9, 24]. An example is the robot arm must receive a next-position update every 10ms. Such a constraint arises from the system's requirements, or from a detailed analysis of the application environment. Temporal requirements implicitly limit the time that can be provided by the system's resources. Figure 1.1 illustrates the real-time software development process from a high-level requirements speci cation to a low-level implementation. First, high-level requirements are translated into real- time programs. Then the programs are compiled into real-time tasks, and timing constraints are imposed on these tasks. After the compilation, the programmers must assure that each of the tasks in the application will meet its timing constraints on the underlying hardware platform. This process is called a feasibility test. The programmers must also assure that the whole application will adhere to the timing con- straints, even in the presence of direct and indirect interactions among the tasks, such as task blocking and preemption. This process is called a schedulability test. If the programmers cannot guarantee either of the tests, the result is careful re nement of the implementation. As shown in Figure 1.1, we can think of two alternatives as a means for the low-level system re nement: system-tuning and compiler-based task optimization. Low-level tuning of the application can be a time-consuming process. It often involves using expensive hardware monitors (e.g., in-circuit emulators, logic analyzers, etc.), manually counting instruction-cycle times, hand-optimizing the code, and experimenting with various orderings of 1 High-Level Real-Time Programs Timing Constraints on Tasks compilation synthesis Feasibility Tests (Intra-Task Consistency) Requirements Speci cation Decomposed Tasks, Schedulability Tests (Inter-Task Consistency) Compiler-based Transformation Methods Feasible Code Synthesis (Chapter 5) failure failure failure success re nement re nement Real-Time Task Slicing (Chapter 6) System-Tuning, Re-implementation failure Figure 1.1: Software Development Process: Two Alternatives operations. As a last resort, entire subsystems may have to be re-designed altogether. The goal of this dissertation research is to provide programmers with a powerful alternative to low-level tuning. It is based on compiler-based transformation tools. Our approach consists of two inter-related components: a real-time programming language and static compiler transformation methods. The programming language not only provides a means for expressing timing constraints within a source program, but it also lays a foundation for sound compiler transformation methods. As shown in Figure 1.1, our compiler transformation methods include feasible code synthesis and real-time task slicing. 2 1.1 Motivation Our approach is motivated by the success of static compiler transformation technologies in code parallelization and program debugging. It is a natural step to adopt these technologies to real-time domains, where developers lack powerful software engineering tools to apply at the tuning stage. This stage can consume a disproportionate amount of project's budget. To carry out this approach, it is imperative to nd a real-time programming language to pro- vide high-level timing constructs with unambiguous semantics. This allows us to de ne a class of semantics-preserving transformations for them. Unfortunately, most existing real-time program- ming languages do not ful ll the requirement. We conceive this as a problem of \code-based speci cations." Consider experimental real-time programming languages which have been proposed in the lit- erature [23, 27, 30, 33, 40]. They provide high-level real-time constructs such as \within 10ms do B," where the block of code \B" must be executed within a 10 millisecond time frame. These languages, while providing a convenient frameworkfor expressing time in programs, have done little to ease the process of translating a real-time speci cation into schedulable code. The reason is straightforward: Language constructs such as \within 10ms do B" establish constraints on blocks of code. However, \true" real-time properties establish constraints between the occurrences of events [9, 24]. While language-based constraints are very sensitive to a program's execution time, speci cation-based constraintsmustbe maintained regardlessofthe platform'sCPU characteristics, memory cycle times, bus arbitration delays, etc. Our approach is to treat a real-time program as (1) an event-based timing speci cation, which represents the system's real-time requirements; and (2) a functional implementation, that is, the system'scode. We carryout this approachwith a real-time syntaxquite similar to thosefound in the abovementioned languages. However, in our approach the interpretation is quite di erent. Instead of constraining blocks of code, the timing constructs establish constraints between the observable events within the code. The language we propose is called Time-Constrained Event Language (TCEL). As an exam- ple of TCEL code, consider the following speci cation fragment, which is rendered pictorially in Figure 1.2: (1) Every 25ms, an external sensor sends a message to the controller, containing physical world measurement data. (2) The controller must receive every message. 3 (3) Using the sensor data and the current state,the controller computes a next-position command and sends it to an actuator. (4) To achieve steady state, transmission of cmd is made no earlier than 3.5ms after receipt of data. (5) To guarantee response-time threshold, transmission of cmd is made no later than 4.0ms after receipt of data. (6) Based on the sensor-input, the controller updates its current state. from Sensor toActuator start of period cmd=nextCmd(...); state=nextState(...); output(Actuator, cmd)input(Sensor, &data)  3:5ms  4:0ms 25:0ms Figure 1.2: Event-Based Speci cation of Sensor-Controller System Within ourevent-based framework,theprogramfragmentsin Figure 1.3realize thespeci cation. The system's only observable events occur instantaneously during the executions of the \output" and \input" operations. The \do" statement establishes timing constraints only between these two operations. On the other hand, the local statement \cmd = nextCmd(data, state)" is only constrained by the program's natural control and data dependences. Armed with this interpretation, our compiler treats both programs as having equivalent seman- tics. This is quite di erent from the approaches mentioned above, where timing constructs establish constraints on code. In that interpretation, program A would rst receive its data, then delay for 3.5ms and nally, evaluate nextCmd and send the result within the remaining 0.5ms. Program B would receive its data, evaluate nextCmd, then delay for 3.5ms and nally, send the result within 4.0ms of evaluating nextCmd. Both programs may fail to implement the speci cation on some hardware platforms. If nextCmd is a CPU-intensive function (and thereby requires over 0.5ms of execution time), program A is 4 / Program A / every 25ms do input(Sensor, &data); start after 3.5ms nish within 4.0ms f cmd = nextCmd(state, data); state = nextState(state, data); output(Actuator, cmd); g / Program B / every 25ms do f input(Sensor, &data); cmd = nextCmd(state, data); g start after 3.5ms nish within 4.0ms f state = nextState(state, data); output(Actuator, cmd); g Figure 1.3: Two TCEL Programs with the Same Semantics 5 inherently unschedulable. On the other hand, program B establishes a constraint between the evaluation of nextCmd and the nextState,and not between the twospeci ed events. Bothprograms would have to be rewritten to achieve the desired e ect. The necessary corrections would include manually decomposing nextCmd, as well as adjusting the timing constraints. The actual changes would heavily depend on the particular characteristics of the computer, and thus, the very reason for using high-level timing constructs would be defeated. There are several immediate bene ts to our semantics for real-time constructs. First, a source program is not hardware-speci c, and thus maintains the abstract, \portable" spirit of a high-level language. Since the timing constraints refer only to speci cation-based events, they need not be platform-speci c. Second, this decoupling of timing constraints from code blocks enables a more straightforward implementation of an event-based speci cation. But of most importance, some of the arduous, assembly-language level hand-tuning can now be accomplished semi-automatically { by compiler optimization techniques. In this dissertation we present twoof such techniques: one that relies on Trace Scheduling, and the other based on program slicing. Traditionally, Trace Scheduling has been used in instruction scheduling, etc., and program slicing has been used in programanalysis and debugging. Here, the objective is di erent: to achieve guaranteed real-time performance. In doing this we use the observable events as \signposts," which constrain the places where code can be moved. These events, as well as data dependences, establish the limiting constraints for the optimization algorithm. 1.2 Compiler Transformations Armed with the event-based semantics of TCEL, we have developed two compiler transformation techniques, which we call feasible code synthesis and real-time task slicing. The objective of feasible code synthesis is to achieve internal consistency between a task's execution time and its timing constraints speci ed with \do" constructs. On the other hand, the objective of real-time task slicing is to enhance inter-task schedulability. This transformationis applicable to \every"periodic constructs. 1.2.1 Feasible Code Synthesis We call a code segment infeasible if its execution time stretches over its speci ed deadline. Fea- sible code synthesis localizes and corrects such a fault via a two-step process. First, the compiler decomposes a \do" construct into a set of code blocks according to the control structure, and then 6 automatically derives a set of dispatch equations from the language-based timing constraints. This step is necessitated by a gap between the event-based semantics model of source programs and the code-based execution model of real-time schedulers. In fact, most real-time schedulers only accept timing constraints on the start and nish times of tasks. The compiler thereby satis es the scheduler's requirements by transforming event-driven source programs into constrained blocks of code, and providing the dispatch equations for the scheduler. Next, the compiler attempts to correct feasibility faults with respect to the derived dispatch equations. This is done by a variant of Trace Scheduling, in which worst-case execution time paths of the infeasible task are selected, and unobservable code is moved to shorten their execution time. For simple illustration, suppose that nextCmd in Program A of Figure 1.3 executes longer than 0.5ms and hence Program A is infeasible. As a solution, \nextCmd(data, state)" can be inlined, and then decomposed into two parts { one which is ow-dependent on the parameters data and state, and another which is invariant of them. This second part can be lifted out of the do statement altogether. We elaborate on this transformation method in Chapter 5. 1.2.2 Real-Time Task Slicing Real-time task slicing considers a more ambitious goal { inter-task transformations for schedulabil- ity. The e ect of this transformation is global, though it is individually applied to a small number of tasks selected from an unschedulable application. Thekey idea behind this method is based on asimple fact, thatis, anapplication's schedulability improves if we increase the deadlines of its constituent tasks. The same e ect is achieved by allowing atasktoslide pastitsdeadline, while maintaining the original event-based semantics. Wecan realize this bene t by transforming a task, so that its time-sensitive component always executes within its frame, while postponing the rest of the task. To systematically carry out the transformation, we harness a novel application of program slicing [41, 52, 53]. An unschedulable task is decomposed into two subthreads: one that is \time- critical" and the other \unobservable." The unobservable subthread is then appended to the end of time-critical thread, with TCEL semantics being maintained. Figure 1.4 pictorially illustrates the net e ect of this transformation. The downward (upward) arrow of the k th frame represents the execution of input (output) event in the same time frame. Since the goal of the transformation comes from the scheduler component of the real-time system, the framework consists of the following ingredients. 7 k BnZr1 st k th k + 1 st k + 2 nd period Figure 1.4: Run-Time Behavior of a TCEL Periodic Task (1) An algorithm which uses standard xed-priority preemptive scheduling analysis to nd un- schedulable tasks which need task slicing. (2) A program slicer which decomposes a task and isolates the thread that can be postponed. (3) An online component of the scheduler which enforces precedence constraints between task interactions. In Chapter 6 we will discuss in detail the xed-priority preemptive scheduling paradigm as well as the application of program slicing method. 1.3 Summary of Contributions The major contributions of this dissertation is itemized as follows:  The event-based abstraction is a widely used approach in the formal methods literature. We have extended it into a full-blown real-time programming language called TCEL. The event- based semantics of TCEL makes it straightforward to realize a high-level speci cation into a real-time program. This semantics allows a clear and unique interpretation of high-level timing constructs, and thus enables us to de ne safe compiler transformations for TCEL pro- grams. Although TCEL has been implemented on top of the C programming language, it can be used with other programming languages such as Pascal or Fortran. It is a general, exible mechanism for high-level timing annotations, and is not speci c to a particular language.  We have developed an automatic method of translating a high-level TCEL program into schedulable units of code. When implemented within a TCEL compiler, this method decom- 8 poses a TCEL program into tasks, and then derives code-based timing constraints from the language-based timing constraints. This method automatically re nes the original timing constraints to account for the e ects caused by programs' control structures.  We have developed an automatic compiler method that is used to transform an infeasible task into a feasible one. This problem is an intractable one, as will be proved. Thus we have invented an approximation approach based on Trace Scheduling.  We have developed another compiler transformation method that is used to enhance the real- time schedulability of an input task set. This method automatically decomposes a task so thatthe real-time scheduling method can guaranteetimely execution of observable operations, while local operations need not lead to scheduling overrun.  We have developed a priority ordering algorithm that not only assigns feasible priorities, but also selects a subset of tasks to be transformed. The guiding metric is schedulability. 1.4 Outline of the Dissertation This dissertation is organized as follows. In Chapter 2 we survey background and related work. In Chapter 3 we introduce TCEL with an example program. In Chapter 4 we introduce basic notations for ow graphs and standard compiler technologies which are used throughout the disser- tation. In Chapter 5 we present the rst transformation method, feasible code synthesis, along with a motivating example. Then, in Chapter 6 we present the second transformation method, real-time task slicing, and we discuss its underlying theory of xed-priority, preemptive scheduling. In Chap- ter 7 we consider the practical limitations of our compiler-based approach, and propose solutions to overcome these problems. We also brie y show TimeWare/SLICE, our prototype implementation of the real-time task slicer, which re ects our philosophy on tool-based system development. Finally, in Chapter 8 we conclude this dissertation with future research directions. 9 Chapter 2 Related Work In this chapter we survey related work in real-time programming. First, we review some of formal methods that have laid the foundation for our event-based semantics. We also study real-time programminglanguages, since they havein uenced the design ofTCEL'stiming constructs. Finally, we survey compiler-based real-time tools, and compare them with our approach. 2.1 Formal Methods TCEL's semantics was inspired by a principle commonly applied in formal methods. That is, when reasoning about a real-time concurrent system it is often useful to consider only \events of interest," and to abstract away local-state information. Indeed, almost all formal models ease this process by making some distinction between an \event" and a corresponding \action." In this section we survey four such methodologies: Real-Time Logic [24], RTRL [9], Timed IO Automata [37] and ACSR [29]. RTL. Real-Time Temporal Logic possesses an underlying event-action model. It captures the temporalordering between anapplication's actions andits events. An actionin RTL is theexecution of an operation which consumes a certain amount of system resources. The e ects of actions are revealed by events that are generated before and after they execute. The occurrence of an event is de ned to be instantaneous, while the execution of an action takes non-zero time. Thus a timing constraint is an assertion about temporal relationships between certain events. The conjunction of these timing assertions are \the system speci cation," and they must imply that key safety properties are maintained. 10 RTRL. RTRL (Real-Time Requirements Language) is a formal speci cation language developed for use in modeling telephone switching systems. In RTRL, a real-time system is viewed as a nite-state machine, in which a response at any instance is determined by the system's state and the external inputs. Hence the timing constraints are established between external inputs and responses. Unlike the events in RTL, however, these external inputs and responses denote signals (of a switching system) rather than points in time, and thus have duration times. Timed I/O Automata. A timed I/O automaton is essentially a state transition system consist- ing of state variables and events, where each of the events has an enabling condition and an action. A timed I/O automaton has a set of timing assumptions, each of which speci es an event set and an interval during which the event set can be continuously enabled since its last occurrence. Thus the timing assumptions place constraints on the event- ring times [46]. A \behavioral abstraction" of a timed I/O automatonallows reasoning about only the event sequences, since local state information can be ignored in the abstraction. ACSR. The computation model of ACSR (Algebra of Communicating Shared Resources) ad- dresses two key issues in modeling a real-time system: concurrency and communication. An ap- plication speci ed in ACSR consists of a set of communicating processes that use shared resources for execution and synchronization with one another. A timed action in ACSR takes unit time to execute and consumes a set of resources during that time. Synchronization between actions is supported by events that are instantaneous, and consume no resources. Impact of Formal Methods. Strongly motivated by the event-action models, we have extended the event-action abstraction into to a \full-blown" real-time programming language, in which the \events" correspond to actual IO operations within C code. A logical consequence of the event- action model is the ability to exploit this looser semantics, and to use compiler transformations to move unobservable instructions out of over-constrained code blocks. 2.2 Real-Time Programming Languages Most other real-time languages do not make such a distinction, and instead place constraints on the boundaries of code blocks. Two paradigms are used in these languages: either constraints are expressed directly in the program itself (as in [33, 30, 54]), or they are postulated in a separate interface, and then passed to the scheduler as directives. A common language-based approach ( rst 11 presented in [30]) is to provide constructs such as \within t do f:::g," \at t do f:::g" and \after t do f:::g." An alternative, taken in [33], is to set up linear constraint expressions on the the start times and deadlines of code blocks. We have borrowed from both approaches: in the TCEL source we use the higher-level constructs, while in our intermediate code we make use of the constraint representation. But in TCEL the semantics is quite di erent, as it establishes constraints between the observable events within the code, and not on the code's textual boundaries. 2.3 Real-Time Compiler Tools There have been many other compiler-based approaches to real-time programming, most of which address di erent real-time programming problems and rely on di erent techniques. However, they share a common goal, namely, enhancing predictability and schedulability of real-time applications. In this section we survey tools, and then show where we can place our tools in the realm of real-time programming. Schedulability Analyzer for Real-Time Euclid. Among theearly approaches wasthe schedu- lability analyzer [49], speci cally developed for the timing and scheduling analyses of programs written in Real-Time Euclid [27]. The schedulability analyzer consists of a front-end and a back- end. The front-end is incorporated into the code generator of the Real-Time Euclid compiler, and it produces a segment tree that represents compilation units such as modules, monitors or routines. Using a segment tree, the back-end computes the execution time pro le of each segment consid- ering synchronization, phasing, and IO between segments, as well as execution times of segments' instructions. We believe that such a technique is infeasible in a practical sense, because it presents an intractable circularity between the compiler-based analyzer and the scheduler. The result of the analysis a ects that of scheduling, and vice versa. Compiler-Assisted Adaptive Scheduling. Gopinath and Gupta introduced a technique called compiler-assisted adaptive scheduling in [18]. In their work, the compiler indexes a piece of code into four classes on the basis of predictability and monotonicity. Then it rearranges the code to support adaptive run-time scheduling. Unlike our approach, which is entirely static in terms of both program analysis and scheduling, their approach was developed to aid in dynamic runtime scheduling. However, their work still demonstrates that a successful interplay between the compiler and scheduler is possible without introducing a circularity between them. 12 Partial Evaluation. Partial evaluation may conceptually be understood as compile-time eval- uation of constant expressions. Partial evaluation of a source program not only reduces constant expressions in the program into simple values, but also simpli es some control structures such as loops and conditionals. In [39] Nirke and Pugh applied the technique of partial evaluation to real- time programming to produce residual code that is both more optimized and more deterministic. Safe Real-Time Code Optimizations. Conventional compiler optimizations are designed to reduce the expected or average execution time of programs without taking in account the exact timing behavior of programs. On the other hand, optimizations for hard real-time must meet stringent timing constraints. In [36] Marlowe and Masticola examine a large class of conventional code transformations,and then classify them forapplication in real-time programmingby the notion of safe real-time code transformations. To do so, they assume time-critical statements (or events) in the underlying programming language, and interpret deadlines as relationships between the executions of time-critical statements. This approach comes closest to our work, in that the timing behavior of real-time programs is described in terms of events or the executions of time-critical statements. Unlike the semantics for TCEL, however, the execution times of non-time-critical statements are not explicitly decoupled from timing constraints imposed on the events. This is because the start of a process or start of a critical section is still considered \time-critical." As a result, the applicability of some transformations may be unnecessarily restricted. Timing Tools. Predicting the worst-case execution time of a program is a fundamental require- ment to build a real-time application. Indeed, the results of our compiler-based methods rely on timing tools, since both the feasibility and schedulability of tasks are a function of predicted worst-case executions of the transformed tasks. The technique reported in [42] is based on a simple source-level timing schema, and it is fairly straightforward to implement in a tool. In [19] another approach for more accurate timing was proposed. Since the resulting tool is able to analyze micro-instruction streams using machine- description rules, it is retargetable to various architectures. On the other hand, both approaches do not address the problem of predicting architecture-speci c timing behaviors due to various latencies in the memory hierarchy and pipelines. Recently, several approaches have been developed to account for this timing variance. In [55] Zhang et al. presented a timing analyzer based on a mathematical model of the pipelined Intel 80C188 processor. This analysis method is able to take into account the overlap between instruction execution and fetching, which is an improvement over schemes where instruction executions are treated individually. In [44] Arnold et al. developed a 13 timing prediction method called static cache simulation to statically analyze memory and cache reference patterns. A similar but more advanced approach was reported in [32]. While the latter approach is able to predict pipeline stalls as well, both approaches essentially rely on attribute grammars [2] to propagate cache hit information backward in a ow graph. However, no static timing tool is precise enough to be used with complete con dence for de- veloping production-quality software. Moreover, even sophisticated timing analysis methods such as [32, 44] are not appropriate for ne-grained instruction timing. In Chapter 7 we explain how we can e ectively use these tools in spite of the limitations, by also taking advantage of software pro ling, as well as static timing prediction. Speci cally, our slicing technique does not require any static analyzer: it can be used to rst transform the program, with the timing carried out later by a runtime pro ler. 14 Chapter 3 The Language of Time-Constrained Events In this chapter we present TCEL's timing constructs to express timing constraints within a high- level program. These high-level timing constructs make explicit reference to observable events as well as time, so that the speci cation-level timing constraints can be extracted from the source code, and then conveyed to real-time schedulers. 3.1 Design Goals The objectives of the TCEL design are (1)to lay a semantic foundation for developing safe real-time compiler transformation techniques; and (2) to provide high-level timing constructs that enable a compiler to automatically translate a high-level real-time speci cation into executable code. In addition to these, the design of the timing constructs is motivated by the following practical goals:  To add a minimal set of features to existing languages. To this end, we embedded timing constructs in C, and then extended it as a real-time programming language. Our constructs are syntactic descendents of the temporal scope in [30].  To keep the de nition of an observable event as general as possible. The notion of an observ- able event is a relative term depending on the application system. We support a generalizable mechanism in that observable events are classi ed by the programmer, and the resulting event speci cation is externally provided for a TCEL compiler. Such a speci cation may include input and output operations, message passing primitives and memory accesses to shared vari- ables. For simplicity, in the sequel we consider all \input" and \output" operations to be observable. 15 3.2 Timing Constructs and Their Semantics We rst elaborate the \do" construct which establishes several types of relative timing constraints. Its general form is as follows. do hreference blocki [start after t min ] [start before t max1 ] [ nish within t max2 ] hconstraint blocki The reference block (RB) and the constraint block (CB) are simply C statements, or alterna- tively, timing constructs themselves. The \do" construct induces the following timing constraints:  start after t min : There is a minimum delay of t min between the last event executed in the RB, and the rst event executed in the CB.  start before t max1 : There is a maximum delay of t max1 between the last event executed in the RB, and the rst event executed in the CB.  nish within t max2 : There is a maximum delay of t max2 between the last event executed in the RB, and the last event executed in the CB. Since either block may contain conditionals, depending on the program's state there may be several such events executed either \ rst" or \last." For example, consider the fragment from a typical ow graph in Figure 3.1. Depending on the path taken, the last event executed in the reference block may be either E1 or E2. Similarly, the rst event in the constraint block will be E3 or E4, while the last event will be either E4 or E5. To denote such possibilities, we introduce two mappings FIRST and LAST from code blocks to sets of events. That is, LAST(RB) = fE1;E2g, FIRST(CB) = fE3;E4g and LAST(CB) = fE4;E5g. Thus, the \do" construct introduces two potential constraints between an executed event from LAST(RB)and another fromFIRST(CB),aswell as one constraint between two executed events from LAST(RB) and LAST(CB) each. Thesecond real-time constructdenotes astatementwith cyclic behavior ofapositive periodicity: every p [while hconditioni ] [start after t min ] [start before t max1 ] [ nish within t max2 ] hconstraint blocki 16 E1 E2 E3 B2 B1 B3 B4 B5 B6 B7 B9 B8 Reference Block Constraint Block (RB) (CB) event node local op. node E4 E5 Figure 3.1: Flow Graph of do Construct As long as the \while" condition is true, the observable events in the constraint block execute every p time units. Akin to an untimed while-loop, when the condition evaluates to false the statement terminates. However, unlike the untimed counterpart, event operations cannot be part of the condition. In its real-time behavior, the interpretation of the \every" construct is similar to that of \do." For example, assume that the statement is rst scheduled at time t, and that the \while" condition is true for periods 0 through i. The periodic constraints established by this statement are depicted in Figure 3.2, where the time-line shows the rst two instances of the statement. Examining the time-line, we see that the every statement is released at time t, and that within the rst frame, the rst observable event (denoted by an arrow) occurs between t + t min and t+t max1 . Similarly, the rst frame's last event occurs before t+t max2 . Generalizing, the following constraints are induced for period i where i  0:  start after t min : The rst event executed in the CB occurs after t+ ip+ t min .  start before t max1 : The rst event executed in the CB occurs before t+ ip+ t max1 .  nish within t max2 : The last event executed in the CB occurs before t +ip+t max2 . 17 observable event occurrence t t + p t + 2p t + t min t + t max1 t + t max2 t + p + t min t + p + t max1 t + p + t max2 t + ip Figure 3.2: Behavior of Periodic Timing Construct 3.3 Example TCEL Code As we have stated, timing constructs may be arbitrarily nested. Consider the program in Fig- ure 3.3(A), which is a (very gross) 2-dimensional abstraction of an aircraft navigation/control loop. A set of route coordinates are maintained in the array \GOAL," which is managed by another mod- ule. These coordinates are accessed using index variable \i," which is initialized before the periodic loop, and updated within the loop body. The TCEL program's role is to (1) sample the aircraft's current coordinates, its (true) heading, roll, and its ground speed; (2) get the next route coordinate to visit; (3) compute the relative attitude between the heading and the coordinate; and (4) adjust the course by updating throttle and roll. Adjustments are made in discrete increments, and are contingent on the current roll and velocity, as well as the amount that the course must be changed. The timing constraints are induced as follows: (1) Control updates are made periodically, with rate 50/second. (2) In order to give the actuators time to get updated (and for the craft to respond accordingly), all updates must be made within the rst 5ms of each period. (3) Velocity (ground speed) is obtained via a \request-response" protocol from an external unit; the response arrives with maximum latency of 0.75ms. (4) To correlate ground speed with outputs, all throttle and ap updates must be made within of 3.1ms of actual ground speed sample. In the best case this may be made upon issuing the request. If the speci cation mandated additional timing constraints, clearly we could employ further 18 levels of nesting to achieve them. For example, suppose we desired to add a fth timing requirement to the four listed above: (5) The nal two outputs must be correlated within 0.5ms of each other. (This is not unrealistic, since the two outputs control are coordinated to e ect the angular adjustment.) To accomplish this we would replace the sequential composition with an additional nested TCEL statement: do output(THROT, throttle); nish within 0.5ms output(FLAP Cntrl, w ap); The net runtime e ect would simply be a re nement of the potential behaviors; i.e., the time-event relationships exhibited by the altered program would be a subset of those in the original version. 19 every 20ms nish within 5ms do f / Get current position, heading & roll / input(GPS, &x, &y); input(NAV, &theta); input(IMU, &roll); / Request current velocity / output(Cntrl out, REQ VEL); g start after 0.75ms nish within 3.1ms f / Get current velocity / input(Cntrl in, &vel); / Update target position / if (GOAL[i].passed) f gx = GOAL[i].x; gy = GOAL[i].y; i = (i+1) % NCOORD; g / Using relative attitude w.r.t target, / / compute angular adjustment. / rtheta = compRelAtt(theta, x, y, gx, gy); if (jrthetaj < EPS) dtheta = 0.0; else f if (vel < VHIGH) dtheta = rtheta; else dtheta = safeDtheta(rtheta, roll); g / Adjust ap and throttle for heading. / w ap = compFlapw(roll, vel, dtheta); throttle = compThrot(roll, vel, dtheta); output(THROT, throttle); output(FLAP Cntrl, w ap); g _ _ _ Periodic t min max1 max2t t Entry Node Exit Node Reference Block (RB) Constraint Block (CB) _ _ true false input(GPS, &x,&y) if(GOAL[i].passed) rtheta=compRelAtt(theta, x,y, gx,gy) if(|rtheta| 0) =) v 0 = 10; do f v 1 = (v 0 ;v 2 ); v 2 = v 1 BnZr1; g while( v 2 > 0) 25 Here we denote the value initialized before the loop by v 0 (subscript 0). Gated Single Assignment Form. Although SSA form allows for ecient representation of data ows in a program, it possesses a weakness in terms of program transformability. For example, in the above code the -function is structurally bound to the conditional \if P," since the -function \v 3 =(v 1 ;v 2 )" can be executed only after the outcome of \P" is known. Such a coupling will be avoidable, if we parameterize the -function with the associated predicate. Gated single assignment (GSA) form solves this problem [20]. GSA form is an extension of SSA form with new functions that encode control over value assignment. The following code fragments show translation from a conditional into the code in GSA form. if cond then v = f(); else v = g(); a = v; =) P = cond; if P then v 1 = f(); else v 2 = g(); v 3 = (P;v 1 ;v 2 ); a = v 3 ; The -function denotes the extension. The rst argument of the -function is the predicate associated with the original -function and its remaining arguments are the same as the -function. Having our intermediate programs in GSA form, we can treat -functions in the same way as we deal with any other statement. In GSA, loops need a special form of pseudo-assignment function other than a -function. The reason is the con uence points at loop headers are not associated with particular conditional predicates. Thus for each variable v de ned in a loop body, a de nition v 0 = (v init ;v iter ) is inserted at the loop header, where v init is the initial value of v reaching the header from outside and v iter is the iterative value reaching along the back-edge of the loop [20]. In our example below we denote v init and v 0 by v 0 and v 1 , respectively. v = 10; do f v = vBnZr1; g while(v > 0) =) v 0 = 10; do f v 1 = (v 0 ;v 2 ); v 2 = v 1 BnZr1; g while( v 2 > 0) 26 Chapter 5 Transformation 1: Feasible Code Synthesis In this chapter we address translating a TCEL program into a low-level representation eligible for real-time scheduling. To give a clear understanding of the problem domain, consider Figure 5.1. Our problem domain at hand { denoted by the components with bold lines { actually contains two problems. The rst problem lies in compiling a TCEL program into the schedulable units, and the second problem deals with attaining the feasibility of an infeasible task. We approach the rst problem with a compiler-based decomposition method. This method en- ables a TCEL compiler to automatically translate a TCEL source programinto a set of decomposed tasks, and it derives a set of timing equations for them. Later, at runtime, a scheduler bene ts from this decomposition method, since it can eciently dispatch the tasks according to their time schedules (or priorities) based on both their execution times and associated timing equations. The second problem arises when any task is determined to have an execution time con icting with the derived timing constraints. In this case, it is impossible for any scheduler to generate a feasible schedule for the task set. We develop a compiler transformation method to automatically correct such feasibility faults based on a variant of Trace Scheduling [12]. We name this method feasible code synthesis. In this method, the compiler picks the worst-case execution time paths of an infeasible task and moves unobservable code to eliminate the overload. The remainder of this chapter is organized as follows. In Section 5.1 we formally state the feasible code synthesis problem, and show that it is essentially an intractable problem. Then we brie y discuss our solution strategy. In Sections 5.2 and 5.3 we explain the transformation strategy. 27 High-Level Real-Time Programs Timing Constraints on Tasks compilation synthesis Feasibility Tests (Intra-Task Consistency) Requirements Speci cation Decomposed Tasks, Schedulability Tests (Inter-Task Consistency) Compiler-based Transformation Methods Feasible Code Synthesis (Chapter 5) failure failure failure success re nement re nement Real-Time Task Slicing (Chapter 6) System-Tuning, Re-implementation failure Figure 5.1: Software Development Process: Revisited for Feasible Code Synthesis 5.1 The Problem and Our Solution Compiling a TCEL program into schedulable code raises twofundamental issues, which are endemic to all real-time systems. 1. A program may not be feasible, i.e., a single process's execution time may con ict with its own real-time constraints. 2. While the program may be feasible, it may not be schedulable under any tractable real-time scheduling algorithm. As for the feasibility problem, consider the following TCEL fragment: 28 do input(P, &m); start after 10ms nish within 20ms f input(Q, &x); S; [20ms] output(R, y); g The code's timing constraints mandate a 10ms latency between the events generated by \in- put(P, &m)" and \input(Q, &x)," as well as a 20ms deadline between the events generated by \input(P, &m)" and \output(R, y)." Meanwhile, the bracketed \20ms" denotes that the un- observable statement S requires a maximum of 20ms to execute, a bound obtained by a timing analysis tool (e.g., [19, 32, 42, 47, 55]). Consequently, the program possesses an inherent con ict, since S requires 20ms to execute while it is only allowed 10ms. We address this problem by an approach we call feasible code synthesis. In our example this would involve decomposing S and, if possible, moving instructions not dependent on \x" out of the overloaded section. However merely achieving feasibility may be of little help, since the transformed code may still not be schedulable under any known methods. For example, when a program contains if-then-else branches, then the actual execution paths (and the events executed) are determined dynamically. But since schedulers must provide guarantees, they do not have the exibility to instantaneously, dynamically reschedule a task set whenever an event is triggered. Indeed, while an event-based semantics makes conceptual sense at the source-program level, most real-time schedulers only accept timing constraints on the start and nish times of tasks. Since the strategies used to achieve feasibility have a profound a ect on the ultimate task structure, we solve these two problems together. Thus the role of the TCEL compiler is to partition event-driven source programs into time-constrained blocks of code, in which all of the blocks are feasible. 5.1.1 The Problem of Feasible Code Synthesis Even without the code block partitioning component, achieving feasibility is a nontrivial problem. This is obviously the case if we allowed potentially unbounded loops (or conditional jumps), which would render the problem undecidable. But since real-time programs must be amenable to worst- case timing estimates, we assume that upper bounds can be obtained for execution times. Formally, 29 we let \wt(S)" denote a statement's worst-case execution time, where we assume that wt(S) 2 [0;1) for any statement S. Obviously \wt" is implementation dependent, and its tightness is determined by the quality of the analysis tool used to generate the bound. 1 While the feasibility problem may be decidable for our domain, it is not necessarily trivial. Even when a program is structured like our example above, and where S is a basic block, simply deciding whether the program can be made feasible is still NP-hard. The problem can be stated as follows: given a TCEL timing construct do RB start after t min start before t max1 nish within t max2 CB is it possible to transform RB and CB to meet the following constraints? (1) On any execution path, the original ordering of observable events is maintained. (2) The original data and control dependences are not violated between instructions and events. (3) The code's execution time does not con ict with the timing constraints between the events. Clauses (1)-(2) imply that the transformed code must be functionally correct: events may not be reordered, and the original relationships between input and output data must be maintained. Clause (3) means that the new code is feasible. This problem is NP-hard, due to the existence of immovable operations and data dependences. Theorem 5.1 Feasible code synthesis is NP-hard. Proof: The proof follows by a straightforward transformation from \Partition[SP12]" [13] to fea- sible code synthesis. Consider an instance (A;s) of Partition, where A = fa 1 ;a 2 ;:::;a n g is a set of elements, and where s : A 7! N is the cost of each element. Letting P n i=1 s(a i ) = 2T, a partition of A is some A 0  A such that P a2A 0 s(a) = T = P b2ABnZrA 0 s(b). Determining whether such an A 0 exists is equivalent to determining whether the following TCEL program can be made feasible: 1 We return to this issue in Chapter 7. 30 do E 1 : input(P, &x); start after T start before T nish within 2T f E 2 : output(Q, g(x)); L 1 : x 1 =f 1 (x); [s(a 1 )] L 2 : x 2 =f 2 (x); [s(a 2 )] . . . . . . L n : x n =f n (x); [s(a n )] E 3 : output(R, h(x 1 ;:::;x n )); g Here E 1 BnZrE 3 generate events, L 1 BnZrL n are unobservable instructions, and each line is considered atomic (that is, a line must be relocated as a single entity). Then by the construction, (1) E 2 , L 1 BnZrL n are mutually data independent. (2) E 2 , L 1 BnZrL n are data dependent on E 1 . (3) E 3 is data dependent on L 1 BnZrL n . If we assume that wt(L i ) = s(a i ) for 1  i  n, and that wt(E j ) = 0 for 1  j  3, then there exists a partition of our original set A if and only if there is a feasible transformation for the program. As for the \only if" part, assume there is a partition A 0  A. Then for all a i 2 A 0 , moving the corresponding instruction L i between the events E 1 and E 2 ensures feasibility: exactly T execution time is consumed between E 1 and E 2 , and another T is used between E 2 and E 3 . As for the \if" part, assume there is a feasible transformation. Then, since 2T execution time is used overall, the constraints mandate that exactly T of it be used between E 1 and E 2 , and the rest between E 2 and E 3 . Thus we must move some set of instructions L  fL 1 ;::: L n g between E 1 and E 2 , where P L j 2L wt(L j ) = T. But then the corresponding elements in A form a partition. When both the constraint block and reference block consist of straight-line code, the problem is obviously in NP as well. A feasible ordering can always be \guessed" and then veri ed, which consequently yields the following corollary. Corollary: Feasible code synthesis is NP-complete for straight-line code. 31 5.1.2 Solution Strategy In proving Theorem 5.1 we used the simplest possible TCEL program, which possessed just two basic blocks. In this case a feasible transformation would simply reorder the instructions, while keeping the program's fundamental structure intact. But the situation gets signi cantly more complicated when the program possesses branches, and when the events that get executed are determined at runtime. Since attaining feasibility mandates that we statically guarantee the timing constraints along all execution paths, reordering the instructions along a single path may not be sucient. In the worst of all cases, all paths of a multi-branching program would have to be \expanded," and then each one reordered individually { potentially requiring time and space exponential in the original number of instructions. Clearly this is not an attractive solution approach. Moreover, it would render the problem of schedulable task decomposition { our second objective { next to impossible. Thus we take the following alternative approach, in which feasible tasks are synthesized in a two-step process { section decomposition (Section 5.2) and code scheduling (Section 5.3). Section Decomposition. First the code is translated into its gated single assignment (GSA) form [8, 20]. This representation serves two purposes: (1) it yields a compact means of representing the data-dependence relation discussed in Chapter 4; and (2) GSA's convention of uniquely naming each variable assignment is precisely what we require in the code transformations phase. Next, the timing construct is decomposed into sections, which represent the natural schedulable \task" units. This step involves determining the section boundaries, as well as generating a set of dispatch equations that constrain the start and nish times of each section. Code Scheduling. In this step the dispatch equations are checked for their consistency with the code's execution time. If there is an inconsistency, program transformations are used to relocate unobservable code across section boundaries. The algorithm is a greedy approximation, in that each section is processed locally, with the goal of attaining consistency for an entire program. The following strategy is used: if a given section is determined to be over-constrained, code is moved out of the section and \up" on the HFG. After the section at hand is determined consistent, another section can be processed in reverse topological order. Thus the net result is an \upward migration" of unobservable instructions, which terminates either when the program achieves consistency, or it is determined to be inconsistent. An analogous strategy is used for programs with nested timing constructs. In this case the HFG 32 is scheduled in a \bottom-up" manner. That is, the innermost nodes in the HFG are scheduled rst, with the goal of satisfying their local constraints. Then, the surrounding node is handled. If this level is found inconsistent, the inner nodes are \opened up" once again, and more aggressive optimization is carried out. The actual transformations are similar to those used in Trace Scheduling [12]. As the name implies, the Trace Scheduling algorithm works on speci c traces: it selects a path (or trace) from a given code block, and then selects instructions on that path to move. Such an approach is well suited to our purpose, because it can focus on the traces that have the maximum execution time among all traces within a given code block. 5.2 Section Decomposition Section decomposition is the process of decomposing the program into a set of \code blocks" (or sections). The input is the original HFG (e.g, that portrayed in Figure 3.3(B)), with the output being a slightly di erent HFG, which is more amenable to real-time dispatching. This involves dividing a timing construct into ve code sections, as portrayed in Figure 5.2. As can be seen, the reference block is decomposed into three sub-blocks. The unobservable code before the rst observable statement becomes an interface section (S1). The code containing the observable state- ments becomes the reference section (S2). The unobservable code after the observable statements becomes the rst part of the delay section (S3). Consequently, the topmost unobservable code of the constraint block becomes the second part of S3, and so on. 5.2.1 Determining Section Boundaries Recall the discussion of the FIRST and LAST functions in Chapter 3. Since a code block may contain complicated control structures, we require a convenient means of de ning the boundaries of S2 and S4 { the sections that contain observable events. We accomplish this by inserting \markers" in the ow graph, which consume no time and are not visible. The following marker de nitions guarantee that there are unique boundaries into and out of the sections containing observable events.  begin S2: This marker is inserted directly after the unobservable instruction most closely dominating LAST(RB) [ fexit(RB)g. 33 E1 E2 E3 B2 B1 B3 B4 B5 B6 B7 B9 B8 interface section delay section Entry/Exit interface section E4 E5 Reference Block(RB) Constraint Block(CB) S1 S2 S3 S4 S5 begin_S2 end_S2 begin_S4 end_S4 Figure 5.2: Flow Graph of do Construct and its Section Division  end S2: This marker is inserted directly before the unobservable instruction most closely post-dominating LAST(RB) [ fentry(RB)g.  begin S4: This marker is inserted directly after the unobservable instruction most closely dominating FIRST(CB) [ fexit(CB)g.  end S4: This marker is inserted directly before the unobservable instruction most closely post-dominating LAST(CB) [ fentry(CB)g. For example, consider the constraint block in Figure 5.2. The unobservable node B9 post- dominates LAST(CB) and the entry node. Thus, its logical place is in the interface section S5, which is not subject to the construct's timing constraints. Hence we need the marker end S4, which is the unique exit point for the constrained section S4. Now, let the variable S2.start correspond to the actual time that the marker begin S2 is \exe- cuted" (that is, the dispatch time of section S2), and let S2. nish correspond to the time that the section ends. Similarly, let S4.start and S4. nish represent the start and nish times of section S4. Using these variables we can represent the section decomposition of a TCEL construct in a manner similar to that found in the Flex language [25]. Recall the ight controller program from Figure 3.3. Figure 5.3 illustrates its constituent sec- tions. The constraint-expression for S6 corresponds to the program's outer, periodic loop. As the 34 program is in GSA form, -functions appear at con uence points where di erent values of the same variable in the original program merge. For example, \dtheta4= (c3,dtheta2,dtheta3)"is inserted at the place where two di erent values of dtheta merge. As a result, each use of a variable is reached by a unique assignment. Also, -functions for variables gx, gy and i are inserted at the periodic loop header so that the initialized values are merged with the iterative values. Again, the bracketed numbers denote the maximum execution times of the corresponding operations on the targeted CPU. On modern architectures, ne-grained operations like simple assignments possess minuscule execution times, and cannot be measured (in isolation) by any timing tool. For the sake of presentation we assume that such instructions take zero time, and concentrate on larger-grained function calls and the like. 2 5.2.2 Deriving Code-Based Timing Constraints As seen in Figure 5.3, the code-based timing constraints can be expressed as conjunctions of linear inequalities between start-times and nish-times of di erent sections. However, note the di erence between thecode-based constraintsandtheTCEL source-level constraints: In Figure 3.3the\ nish within" deadline is 3.1ms, while in Figure 5.3 it is tightened to 3.0ms. There is good reason for this { the new code-based timing constraints must be strong enough to guarantee the original semantics of the event-based constraints. That is, they must take into account the program's execution-time characteristics. In general, consider the TCEL construct such as do RB start after t min start before t max1 nish within t max2 CB Obviously, the original TCEL parameters are not tight enough to guarantee the correctness of the code-based constraints. For example, if we wish to maintain the \t max1 " requirement, it is not sucient to simply mandate that S4 starts within a maximum delay of t max1 after S2 ends (though this is certainly necessary). We can see in Figure 5.2 that the event actually executed in LAST(S2) may be E1, while the event executed in FIRST(S4) may be E4. Thus the naive strategy fails to factor in the execution times of B3 and B4. However, the event-based semantics is clear: the time between the executed event in LAST(S2) and the executed event in FIRST(S4) is at most t max1 . To guarantee that this occurs, we must account for all possible execution scenarios. Speci cally, we must tighten the constraints, allowing for the maximum amount of time between an event in LAST(S2) and end S2, as well as the 2 We revisit this issue in Chapter 7. 35 S6: (S6.start[p]p20ms, S6. nish[p]p20ms+5ms) f S1: f gx1 = (gx0, gx3); gy1 = (gy0, gy3); i1 = (i0, i3); input(GPS, &x1, &y1); [.1ms] input(NAV, &theta1); [.1ms] input(IMU, &roll1); [.1ms] g S2: f output(Cntrl out, REQ VEL); [.1ms] g S3: f /* Null */ g S4: (S4.startS2. nish+0.75ms, S4. nishS2. nish+3.0ms) f input(Cntrl in, &vel1); [.1ms] c1 = GOAL[i1].passed if (c1) f gx2 = GOAL[i1].x; gy2 = GOAL[i1].y; i2 = (i1+1) % NCOORD; g gx3 = (c1, gx2, gx1); gy3 = (c1, gy2, gy1); i3 = (c1, i2, i1); rtheta1 = compRelAtt(theta1, x1, y1, gx3, gy3); [.25ms] c2 = jrtheta1j < EPS; if (c2) dtheta1 = 0.0; elsef c3 = vel1 < VHIGH; if (c3) dtheta2 = rtheta1; else dtheta3 = safeDtheta(rtheta1, roll1); [.43ms] dtheta4 = (c3, dtheta2, dtheta3); g dtheta5 = (c2, dtheta1, dtheta4); w ap = compFlapw(roll1, vel1, dtheta5); [.95ms] throttle1 = compThrot(roll1, vel1, dtheta5); [.89ms] output(THROT, throttle1); [.1ms] output(FLAP Cntrl, w ap); [.1ms] g S5: f / null / g g Figure 5.3: Flight Controller Program: After Section Decomposition 36 maximum amount of execution time between begin S4 and an event in FIRST(S4). We must similarly adjust t max2 . To do this, we make the following de nitions:   S2 def = maxfwt(p) j e 2 LAST(S2); e ! p end S2g. 3   S4 def = maxfwt(p) j e 2 FIRST(S4); begin S4 ! p eg. Note that  S2 and  S4 are sensitive to not only code's execution time characteristics, but also changes made to some paths between events and markers during program translation. For example, changes to paths between end S2 and a node in LAST(S2) might require re-evaluation of  S2 . Now the code-based timing constraints can be postulated as follows: (1) S4:start  S2: nish +T min (where T min = t min ) (2) S4:start  S2: nish +T max1 (where T max1 = t max1 BnZr S2 BnZr  S4 ) (3) S4: nish  S2: nish +T max2 (where T max2 = t max2 BnZr S2 ) These timing constraintsare strongenough toguaranteethe original event-basedtiming constraints. (By convention, if the \start after" constraint is omitted, we consider t min to be 0. Similarly, when either the \start before" or \ nish within" constraints are missing, we consider t max1 = 1 or t max2 = 1, respectively.) Returning to Figure 5.3, we can see that equation (3) indeed mandates tightening the original 3.1ms to 3.0ms. Now we wish to determine when (1)-(3) can be met. That is, what do these equations infer about the program's allowable worst-case execution-time behavior? This can easily be derived if we add precedence constraints re ecting the natural ow of the program; i.e., that S4 executes after S3, which executes after S2: (4) S2: nish + wt(S3)  S4:start (5) S4:start+ wt(S4)  S4: nish Eliminating S2. nish, S4.start and S4. nish from (1)-(5), we end up with: (a) T min  T max1 (b) wt(S3)  T max1 (c) wt(S3)+ wt(S4)  T max2 (d) wt(S4)  T max2 BnZrT min 3 We overload the wt-function for paths, since a path can be thought of as a sequential composition of statements. 37 Section Duration Constraint (DUR(S)) S3 minfT max1 ;T max2 BnZrwt(S4)g S4 T max2 BnZrT min Table 5.1: Timing Constraints of S3 and S4 Obviously, (a) had better be true in order for the TCEL construct to make any sense. For the pur- poses of our algorithm we combine (b) and (c), yielding the following two constraints on execution times: () wt(S3)  minfT max1 ;T max2 BnZrwt(S4)g () wt(S4)  T max2 BnZrT min These are the necessary and sucient conditions to achieve feasibility, and they are summarized in Table 5.1. Returning to our example, we nd that section S4 violates its duration constraint (DUR(S4) = 2:25ms), 4 since wt(S4) by far exceeds it. (Adding up the time annotations yields wt(S4) = 2:82ms along the worst-case execution time path.) In the next section we discuss our code-scheduling techniques which handle cases such as this, in which the duration constraints fail to hold. 5.3 Code Scheduling The code scheduling algorithm is inspired by a common compiler strategy used for VLIW and superscalar architectures [3, 10, 12, 14, 38, 48]. In such domains, an optimizing compiler exploits a program's inherent ne-grained parallelism, and \packs" its computations into as many functional units as possible. Thus the objective is to keep each unit busy, and to achieve better overall throughput. Our problem context has an entirely di erent goal, and it cannot be solved by directly applying well-known techniques such as Trace Scheduling [12] or Percolation Scheduling [3, 10]. We are concerned not with enhancing average-case performance, but instead with ensuring feasibility. In fact, we will be satis ed with even increasing the program's overall execution time { as long as the timing constraints are met. 4 T max 2 BnZrT min = 3:0msBnZr0:75ms = 2:25ms 38 algorithmCode Scheduling(T) / T is a timing construct / input: the ordered set of sections fS1;S2;:::;S5g in T begin d = T max2 BnZr T min ; / Schedule code from S4 into S3 / call Schedule Section(S4, S3, d, ;); recompute T max1 ; / to re ect the change in  S4 / if (wt(S3)  T min ) then exit(\No scheduling needed for S3."); else d = minfT max1 ;T max2 BnZrwt(S4)g; / Schedule code from S3 into S1 / call Schedule Section(S3, S1, d, Def(S2)); end Figure 5.4: Top-Level Algorithm for Code Synthesis 5.3.1 The Top-Level Algorithm Our approach to code scheduling is a greedy approximation, and it attempts to attain the desired feasibility of a timing construct in a section-by-section manner. It inspects sections S4 and S3 (in reverse topological order), and checks whether they satisfy their duration constraints. If S4 violates its constraint, the algorithm attempts to reduce its surplus execution time by moving nodes into section S3. In turn it processes section S3, which may now contain newly moved code. To perform greedy code motion, we have adapted a technique from the approach to Trace Scheduling in [12], and we use it as a component of the code scheduling algorithm. In our approach, nodes lying on paths that exceed their section's duration constraint are considered for code motion. We distinguish such paths as critical traces. Formally, a critical trace p of section S is de ned as a path entry(S) ! p exit(S) such that wt(p) > DUR(S). The reason we use the trace-based approach is straightforward: op- timizing to avoid hard real-time exceptions demands scheduling only the critical traces, and no others. Figure 5.4 sketches the algorithm. Note that T max1 is recomputed afterscheduling S4 and before 39 scheduling S3. This is mandatory, since  S4 may be changed during the scheduling. Also observe that the code of S3 is moved into S1, while that of S4 is moved into S3. We disallow code from moving into S2 because it could potentially change the value of  S2 , which would in turn invalidate our assumptions about DUR(S4). In order to complete the procedure in a single pass, we assume that  S2 remains constant. In reality this restriction does not seriously limit the approach: from our experience, events in the RB typically lie in straight-line code (and thus S2 contains a single instruction, as in our example). The top-level algorithm calls subroutine \Schedule Section," which then schedules the over- loaded section at hand. Note that when code is scheduled from S3 into S1, the variables de ned in S2 are passed to the subroutine, which ensures the dependences from S2 to S3 are maintained. In the following section we discuss this subroutine, and the strategies it uses to solve the scheduling problem. 5.3.2 Subroutine Schedule Section(S,D,DUR(S),V bar ) For the ow graphs of source section S and destination section D, the critical trace scheduling problem is to construct new ow graphs for S and D such that: (1) The observable nodes of S remain in S and keep their relative order on all paths in S. (2) wt(p)  DUR(S) holds for all traces p in S. (3) Execution ordering established by code's original data dependences is preserved. When code is scheduled from S3 into S1 the parameter V bar { containing the variables de ned in S2 { is required to maintain property (3). As we have stated, the trace-based approach is attractive precisely because it allows us to concentrate on paths which violate the duration constraints. However, a direct application of Trace Scheduling induces a severe liability { extra code must be inserted to preserve the program's semantic integrity. In the parlance of instruction-scheduling, this is typically called bookkeeping code. Consider the GSA program in Figure 5.5(B) to see why Trace Scheduling requires bookkeeping. Since the instruction \z1=F(x)" is free of a data dependence on the variable\y," it may be eligible to be moved into S3. (This transformation { which we develop in the sequel { is called speculative, since it breaks a control dependence.) As shown in Figure 5.5(C), moving the instruction requires no additional code to maintain the program's semantics; GSA's naming conventions maintain the correctness. 40 (A) Code in TCEL (B) Code in GSA (C) Our Approach (D) Bookkeeping Approach do input(P, &x); start after t 1 nish within t 2 f input(Q, &y); c1 = p(y); if (c1) f a = E(y); z = F(x); g else z = G(y); c2 = q(y); if (c2) r = H(z); else r = K(z); s = I(r); . . . g . . . S2: f input(P, &x); g S3: f /* Null */ g S4: (S4.start :::, S4. nish :::) f input(Q, &y); c1 = p(y); if (c1) f a1 = E(y); z1 = F(x); g else z2 = G(y); a2 = (c1, a1, a0); z3 = (c1, z1, z2); c2 = q(y); if (c2) r1 = H(z3); else r2 = K(z3); r3 = (c2, r1, r2); s = I(r3); . . . g . . . S2:f input(P, &x); g S3:f z1=F(x); g S4:(S4.start :::, S4. nish :::) f input(Q, &y); c1 = p(y); if (c1) a1 = E(y); else z2 = G(y); a2 = (c1, a1, a0); z3 = (c1, z1, z2); c2 = q(y); if (c2) r1 = H(z3); else r2 = K(z3); r3 = (c2, r1, r2); s = I(r3); . . . g . . . S2:f input(P, &x); g S3:f z1 = F(x); r1 = H(z1); r2 = K(z1); g S4:(S4.start :::, S4. nish :::) f input(Q, &y); c1 = p(y); if (c1) f a1 = E(y); c3 = q(y); r3 = (c3, r1, r2); g else f z2 = G(y); c4 = q(y); if (c4) r4 = H(z2); else r5 = K(z2); r6 = (c4, r4, r5); g a2 = (c1, a1, a0); z3 = (c1, z1, z2); r7 = (c1, r3, r6); s = I(r7); . . . g Figure 5.5: (A) Source Code in TCEL, (B) Corresponding Intermediate Code in GSA Form, (C) In- termediate Code after Bookkeeping-Free Transformations, and (D) Intermediate Code after Trans- formations with Bookkeeping 41 However a more aggressive policy could be carried out, which is shown in Figure 5.5(D). Note that additional code may be moved without breaking data dependences; even the variable r may be split into movable parts (i.e., r1 and r2) and the parts that depend on y (i.e., r3 and r5). However, the price we pay is the additional bookkeeping code required to maintain correctness. One obvious problem with bookkeeping is that it induces a signi cant amount of extra code { indeed, if carried to extremes, the transformations in Figure 5.5(D) may result in an exponential blow-up. And in our problem context bookkeeping may have an additional, \fatal" e ect: Scheduling a critical trace may insert bookkeeping code on other, non-critical traces, and thereby increase their execution times. Hence a non-critical trace may become critical. To avoid this drawback of Trace Scheduling, we use the type of transformation depicted in Figure 5.5(C), and we use it as aggressively as possible. The strategy involves repetitively applying the following three steps: (i) nding a critical trace p; (ii) identifying a node n which can be moved into the destination section D; and (iii) moving n into D, along with n's ancestor nodes required to maintain the program's semantics. Since our objective is to keep the amount of new code to a minimum, step (iii) translates into the following rules for moving n into D: (1) Node n's data dependence predecessors are moved along with n; i.e., the nodes on which n is transitively data dependent. (2) The control-dependence predecessors (i.e., the if-then-else's guarding n) are treated as fol- lows: (a) If possible they are copied into D, so that they still guard the execution of n. (b) Otherwise (as in Figure 5.5(C)), n will now be unguarded in its destination section D. The end result of code scheduling appears as if large-grained control structures were rearranged, and hence we name the strategy structural code motion. Yet code scheduling is still trace-based, since it is driven by worst-case paths. Consider a node n to be moved into the destination section. When all of n's dependence predecessors (both dataand control)aremoved(orcopied) along with n, the new execution ordering is guaranteed to maintain the program's original semantics. But what are the rami cations of case 2(b) above, i.e., where control dependences are broken? Consider Figure 5.6(A), where we assume that the condition variable \c" is dependent on an input event. Examining the source code, assume that we wish to move the node  x=a+3 above  if c . 42 if c x=a+3 x=a+2 x3= (c, x1,x2) x3= (c, x1,x2) if c if c x2=a+3 x2=a+3 x1=a+2 x1=a+2 (A) (B) (C) Figure 5.6: Speculative Code Motion: (A) Original Code, (B) Corresponding GSA Code and (C) Speculatively Transformed Code The moved instruction is executed regardless of the control-predicate's outcome; hence the name \speculative transformation." Carrying out speculative transformations raises three critical issues: variable naming, execution time and safety. Naming: Consider what would happen if the transformation shown in Figure 5.6(C) were per- formedatthe sourcelevel. Since x would alwaysend up de ned as a+3, one branch oftheconditional would result in an incorrect state. Fortunately, the GSA form of the program ensures that multiply de ned source variables { and their corresponding -functions { maintain the original semantics, regardless of where assignments are moved. Examining Figure 5.6(C), we see that x1 and x2 are de ned sequentially. By GSA's naming conventions, the node   x3= (c,x1,x2) ensures that x3 always carries the assignment corresponding to the original source variable x. Timing: Figure 5.6 shows how speculative transformations can easily increase the execution time of the destination section D. This is not necessarily harmful, since D may in fact possess sucient slack for both instructions to execute. (Indeed, D may contain an explicit delay.) But if D itself exceeds its own duration constraint, excessive speculative transformations could make matters worse. Thus we take the following approach: the algorithm performs speculative code motion only when feasibility cannot be achieved with the non-speculative variety. 43 Safety: Perhaps the most critical issue is the correctness of the transformed program. After all, the source code is written by a human programmer. When an instruction appears within the body of a conditional (but is free of a transitive dependence on it), one should still assume that the programmer had a good reason for putting it there. Often the reason stems from a personal coding style, or perhaps for readability. Also, splitting variable de nitions in the style of GSA is a rather unnatural practice at the source level. Referring back to Figure 5.5(C), we note that the \eager" execution of \F" should be safe if: (1) it contains no observable events, (2) it induces no global side-e ects, and (3) it does not cause an exception. We can assume that (1)-(2) hold { otherwise \F" would not have been moved in the rst place. However, verifying property (3) may be dicult, since there may be an invariant relationship between \p" and \F" that only the programmer understands. While this seems to argue against speculative transformation, recall that our objective is to assist programmers in tuning faulty code. And production real-time programmers will nd this type of code reordering sadly familiar, since it is usually carried out by hand, and often under the pressure of an approaching release deadline. Our technique can help in this e ort, since it helps automate this process by (1) identifying the \good target" instructions to move, (2) by transferring them to their \correct"places, and (3)by analyzing the results. Nonetheless, we do believe that this should be aninteractive process (perhaps driven by a graphical front-end), in which the programmer visually checks each transformation. 5 For the sake of brevity, however, we present the algorithm in a fully automatic form. Thus we assume that any node n that can be speculatively executed is \pre-checked" and is denoted by the condition spec(n). Unconditional and Speculative Movability. The preceding discussion leads to three classes of instructions: those that can be unconditionally moved, those that can be moved to execute speculatively, and thosewhich cannot be moved atall. The following de nitions distinguish between these cases. De nition 5.1 (Unconditional Movability) Mu(S;V bar ) is the set of nodes in S that do not trigger events, and do not use any variables in V bar ; i.e. Mu(S;V bar ) = fm 2 Sjm is not an event ^ Use(m)\V bar = ;g 5 We discuss this issue in Chapter 7. 44 Then Umove(n) denotes that we can unconditionally schedule node n from S into D: Umove(n)  DC(n;S)  Mu(S;V bar ) That is, all of n's data and control dependence ancestors are also unconditionally movable { and when n is moved, they will be moved (or copied) as well. De nition 5.2 (Speculative Movability) Additional considerations comeintoplay when anode is speculatively executed. Consider the set Ms(S;V bar ): Ms(S;V bar ) = fm 2 Mu(S;V bar )jspec(m)g If a node n is in Ms(S;V bar ) then (1) it uses no variables in V bar , (2) it triggers no events, and (3) the programmer has checked that it does not cause a local exception. Then Smove(n) denotes that we can speculatively move node n: Smove(n)  DDC(n;S)  Ms(S;V bar ) That is, when Smove(n) holds true, all of n's data-dependent ancestors can be moved too, without mandating bookkeeping. The Algorithm. The code scheduling algorithm is presented in Figures 5.7 and 5.8. It is com- posed of three stages: pre-processing, marking/deleting and post-processing. In the pre-processing stage, S's ow graph is traversed in topological order, during which the conditions Umove and Smove are evaluated. A topological traversal ensures that whenever a node n is visited, all ances- tor nodes in DC(n;S) have already been processed; hence a single traversal is sucient to evaluate these conditions. Then a \clone" S 0 of S is created, which is used to hold the part of the ow graph to be transferred to D. Next the algorithm searches for a critical trace, and if one exists it invokes subroutine \Sched." Sched makes use of array \mark," each of whose entries corresponds to a node in S 0 . Whenever \mark[m;S 0 ] = true," it means that node m will be \moved" into the destination section. Sched examines the critical trace in topological order, looking for node n such that Umove(n) is true. If such a node n exists, then closure DC(n;S) is generated; its non-predicate members are deleted from S, while all corresponding nodes in S 0 are marked. If no unconditionally movable instruction exists, then a transformation of the speculative variety is attempted. And if no movable node is present, the program is forced to exit. 45 subroutine Schedule Section(S, D, d, V bar ) input: source section S, destination section D, duration constraint d, and a set V bar of variables de ned in sections lying between S and D. begin /* Pre-processing Stage */ foreach node n in S in topological order do evaluate Umove(n) and Smove(n); /* Marking/Deleting Stage */ make a copy S 0 of S; compute p in S such that wt(p) = maxfwt(p 0 ) j entry(S) ! p 0 exit(S)g; while (wt(p) > d) call Sched(p;S;S 0 ); recompute p in S such that wt(p) = maxfwt(p 0 ) j entry(S) ! p 0 exit(S)g; end /* Post-processing Stage */ delete all unmarked nodes from S 0 ; delete all predicate nodes guarding null code from S; append S 0 to end of D; end Figure 5.7: The Section Scheduling Algorithm (Schedule Section) At the end, if all critical traces were scheduled, the algorithm proceeds to a post-processing stage. If speculative transformation was carried out then S 0 will, by de nition, contain branching structures with empty predicate nodes. In this case, the nodes on the di erent branches of the predicate node are merged into a single block. Finally, S 0 is attached to the end of the destination section D. Cleaning up, the algorithm deletes control-predicates which guard empty nodes in section S { i.e., the \if" nodes whose corresponding bodies and -functions were completely transferred to S 0 . Example Revisited. We return to our original ight controller example from Figure 5.3, and subject it to the code scheduling algorithm. The end-result appears in Figure 5.9. In scheduling S4, \Sched" unconditionally moves the function call compRelAtt, as well as the other nodes in its dependence closure. This reduces wt(S4)by0.25ms,which nowstandsat2.57ms. Since DUR(S4) = 46 subroutine Sched(p;S;S 0 ) input: critical trace p, source section S, and a clone S 0 of S. begin if there is some n 2 p such that Umove(n) holds then begin Select rst such n 2 p such that Umove(n) holds; foreach node m 2 DC(n;S) do mark[m;S 0 ] := true; if m is not a control-predicate then Delete m from S; end end else if there is some n 2 p such that Smove(n) holds then begin Select rst such n 2 p such that Smove(n) holds; foreach node m 2 DDC(n;S) do mark[m;S 0 ] := true; if m is not a control-predicate then Delete m from S; end end else exit(\Unable to synthesize."); end Figure 5.8: The Section Scheduling Algorithm (Sched) 2:25ms), further reductions are made by entering the speculative transformation phase; this results in moving one conditional branch and the function call safeDtheta beyond the immovable control- predicate if (c3). After the transformation, the implementation satis es the necessary condition for consistency, since the body of S4 requires 2.14ms in the worst-case. Since wt(S3) = 0:68ms is still less than DUR(S3), the code scheduling successfully terminates without further scheduling S3. In addition to such an instant bene t, the transformation converts possibly wasteful delay into useful computation time, since the new code in S3 can be scheduled within the delay interval between S2 and S4. 47 S6: (S6.start[p]p20ms, S6. nish[p]p20ms+5ms) f S1: f gx1 = (gx0, gx3); gy1 = (gy0, gy3); i1 = (i0, i3); input(GPS, &x1, &y1); [.1ms] input(NAV, &theta1); [.1ms] input(IMU, &roll1); [.1ms] g S2: f output(Cntrl out, REQ VEL); [.1ms] g S3: f c1 = GOAL[i1].passed; if (c1) f gx2 = GOAL[i1].x; gy2 = GOAL[i1].y; i2 = (i1+1)% NCOORD; g gx3 = (c1, gx2, gx1); gy3 = (c1, gy2, gy1); i3 = (c1, i2, i1); rtheta1 = compRelAtt(theta1, x1, y1, gx3, gy3); [.25ms] c2 = jrtheta1j < EPS; if (c2) /* null */ else dtheta3 = safeDtheta(rtheta1, roll1); [.43ms] g S4: (S4.startS2. nish+0.75ms, S4. nishS2. nish+3.0ms) f input(Cntrl in, &vel1); [.1ms] if (c2) dtheta1 = 0.0; elsef c3 = vel1 < VHIGH; if (c3) dtheta2 = rtheta1; else /* null */ dtheta4 = (c3, dtheta2, dtheta3); g dtheta5 = (c2, dtheta1, dtheta4); w ap = compFlapw(roll1, vel1, dtheta5); [.95ms] throttle1 = compThrot(roll1, vel1, dtheta5); [.89ms] output(THROT, throttle1); [.1ms] output(FLAP Cntrl, w ap); [.1ms] g S5: f / null / g g Figure 5.9: Flight Controller Program: After Code Scheduling 48 5.4 Summary In this chapter we have addressed the problem of feasible code synthesis. We approach the problem via a two-step process, in that (1) an event-based TCEL program is decomposed into code-based tasks; and (2) unobservable code is scheduled to avoid feasibility faults. We attackthe intractability of the code scheduling problem with a greedy approximation approach, based on Trace Scheduling. 49 Chapter 6 Transformation 2: Real-Time Task Slicing After tasks have been successfully synthesized into feasible code, the next challenge lies in achieving schedulability of the entire application. Consider Figure 6.1. The two components at the bottom of the diagram correspond to the major elements involved in the tuning problem. \Schedulability" is, by de nition, a key metric that drives our compiler-based task transformation tools. Thus close interactions between the real-time scheduler and the compiler tool are necessary. But this presents two competing demands: (1) it is desirable to maintain the traditional separation of concerns between compilation and scheduling; and (2) schedulability depends on complex task interactions that are often exposed at runtime. In this chapter we present a compiler transformation method which satis es these demands. We name the technique real-time task slicing. This transformation is speci cally intended for periodic tasks in the domain of periodic, discrete control systems. Ideally real-time task slicing should be complementary to feasible code synthesis, since both address two distinct system-tuning problems. Each must be solved before we get a correct real- time system. In practice, however, it is possible that their successive applications to the same task will produce a con icting result. If we were to truly aim for optimal performance, feasible code synthesis would have to be geared for the speci c schedulers. However we have already shown that the synthesis problem is, by itself, highly complex. This would add an additional layer of complexity. In this thesis we treatthetwophases asorthogonal. In doing so, we restrictourselves to applying the real-time task slicing method only to simple \every"constructs with no nested \do" constructs within their bodies. Fortunately, many applications from discrete control domains possess this type of periodicity requirement. To take a closer look at the problem of schedulability tuning, we open up the process diagram of 50 High-Level Real-Time Programs Timing Constraints on Tasks compilation synthesis Feasibility Tests (Intra-Task Consistency) Requirements Speci cation Decomposed Tasks, Schedulability Tests (Inter-Task Consistency) Compiler-based Transformation Methods Feasible Code Synthesis (Chapter 5) failure failure failure success re nement re nement Real-Time Task Slicing (Chapter 6) System-Tuning, Re-implementation failure Figure 6.1: Software Development Process: Revisited for Real-Time Task Slicing Figure 6.1, and re-draw the related components in Figure 6.2. The generalized task transformation method consists of three interrelated components: (1) schedulability tests, (2) a priority ordering algorithm, and (3) a real-time task slicer. In this chapter we explain these components, as well as the interactions. First, in Section 6.1 we discuss the characteristics of control domain software and xed-priority preemptive scheduling algorithms. We also present the key idea of the generalized slicing method, and discuss the safety of the transformation method. We devote the remaining sections to the technical discussion of these three components. 6.1 Background In this section we study twoessential factorsrelated tothe real-time taskslicing method, namely the characteristics of discrete control software and xed priority preemptive scheduling. Since discrete 51 Task Set, Priority Ordering Algorithm (Schedulability Tests) Tasks To Be Sliced Schedulable Priority Order Task Slicer success failure Period Constraints Figure 6.2: Generalized Slicing Method for Schedulability Tuning control software possesses many representative properties that can be found in other applications (e.g., multimedia, vision, etc.), our approach can be easily adopted to other types of real-time systems. 6.1.1 Characterization of Discrete Control Software Manydiscrete controlalgorithms possess computations that t a xed-rate algorithm paradigm [28], i.e., control-loops which execute repetitively with xed periods. During each period, the physical world measurement data is sampled, and then actuator commands are computed. Meanwhile, a set of states is updated based on the current state and the sampled data. The dynamic behavior of discrete control systems can often be expressed by the following equations. Output k = g(State k ;Input k ) State k+1 = h(State k ;Input k ) In these equations, Input k , State k , and Output k respectively represent the input, current state, and output of the k th period, while g is an output generation function and h is a state evolution function. Since control equations are thought of as simultaneous relationships (and not as a computation procedure), there are usually many valid computational orderings. The usual practice is to choose a single ordering, and then to code it up as a cyclic control-loop whose k th iteration is rendered 52 Task at the k th period Input k Output k State k State k+1 Figure 6.3: Task Instance at the k th Period in Figure 6.3. The actual loop structure is often driven by one's personal programming style, or perhaps the availability of generic code modules. But regardless of the choice (unless the underlying control laws are stateless), g and h mandate key precedence constraints, denoted by \": Input k  Output k State k  Output k Input k  State k+1 State k  State k+1 The typical way to enforce these constraints is to use the \code-based" semantics, and ensure that each iteration of the control-loop completes by the end of its period. This means that the k + 1 st iteration starts only after the k th iteration ends. Figure 6.4 illustrates the e ect. kBnZr1 st k th period k +1 st k + 2 nd Input k Output k Figure 6.4: Dynamic Behavior of Periodically Implemented Control-Loop 53 6.1.2 Fixed-Priority Preemptive Scheduling In any nontrivial system, there are usually many such tasks that must share the CPU and other resources. Thus they must be scheduled in a way that allows each of them to adhere to their timing constraints. Fixed-priority, preemptive scheduling algorithms are well-suited for control domain applications, not only because they possess periodic behavior, but also because ecient schedulability tests can be applied. Rate-monotonic scheduling, originally developed by Lui and Layland, is the rst well-known algorithm of this kind. In their seminal paper [34] they proposed a priority assignment algorithm, in which a task with the shorter period is assigned the higher priority (hence the name rate-monotonic scheduling (RMS)). They also showed that such priority assignment is optimal in a sense that whenever it fails to nd a feasible priority ordering, neither can any other static priority algorithm. However, their algorithm is applicable only to the periodic task model where tasks have xed periods, deadlines are equal to periods, and tasks are totally independent of each other. Recent research has made signi cant enhancements to this model which relaxes the original restrictions. In [31] Leung and Merrill showed that deadline monotonic priority assignment is also optimal where deadlines are shorter than periods. In [45] Sha et al. presented two protocols which enable tasks to interact via shared resources, while still guaranteeing the tasks' deadlines. Most recently, a group of researchers at the University of York developed a set of analytical techniques which can provide schedulability tests for broad classes of tasks, including those whose deadlines are greater than their periods. [5, 51, 50]. In this dissertation work we choose the York model, mainly because of its generality. The following notation is used in the remainder of this chapter.  BnZr = f 1 ; 2 ;:::; n g denotes a set of n tasks to be scheduled.  T i denotes the period of task  i .  D i denotes the deadline of task  i .  c i denotes the worst-case execution time of  i . The xed priority scheduling theory of York is based on response time analysis. The response time of task  i is de ned as the time interval between when a request for  i arrives, and when  i nishes its execution servicing the request. If we can con rm that the maximum response time of  i is no greater than D i , we can guarantee that  i will meet its deadline even in the worst-case. We rst consider the simpler case where D i  T i . 54 Let R i denote the maximum response time of task  i . Then R i is computed as shown below. R i = c i + X  j 2hp(i) d R i T j ec j (Eq6.1) where hp(i) is the set of higher priority tasks than  i . Observe that R i is composed of two com- ponents, namely execution time c i and interference. The interference, the second term of Eq 6.1, is the amount of time during which  i is preempted by the higher priority tasks in hp(i) since the arrival of its request. As Eq 6.1 is a recurrence equation on R i , an iterative algorithm computes R i by initially assigning it c i , and then getting a new value until it converges on a xpoint. On the other hand, where there exists a task  i such that D i > T i , Eq 6.1 is not sucient. This is because uncompleted iterations of  i can now interfere with the current one. In this case the following general equation is used instead as discussed in [51]. R i = max q=0;1;2;::: fr i;q BnZrq T i g (Eq6.2) where r i;q = (q +1)c i + X  j 2hp(i) d r i;q T j ec j Consider the case of three periodic tasks, where the source of task  2 is given in Figure 6.5. Task Execution Time Period Deadline  1 c 1 = 400 T 1 = 1000 D 1 = 1000  2 c 2 = 400 T 2 = 1600 D 2 = 1600  3 c 3 = 570 T 3 = 2500 D 3 = 2500 Since the periods are equal to the deadlines, rate-monotonic priority assignment is a natural choice. In the above table the row order corresponds to the priority order; i.e.,  1 is assigned the highest priority. We can carry out the response time analysis for these tasks using Eq 6.1 as follows: For  1 : R 1 = 400 < D 1 = 1000 For  2 : R 2 = 400+ d800=1000e400= 800 < D 2 = 1600 For  3 : R 3 = 570+ d2570=1000e400+d2570=1600e400= 2570 > D 3 = 2500 We observe that the two high priority tasks  1 and  2 are schedulable, while  3 is not. (R 3 is greater than D 3 when  3 runs at priority 3.) In an e ort to make the task set schedulable, we might try some hacking: e.g., by promoting  3 to the highest priority level. Although this makes  3 schedulable, it does not achieve the desired schedulability, since  2 will now be unschedulable. 55 every 16ms f L1: input(Sensor, &data); L2: if (!null(data)) f L3: t1 = F1(state); L4: t2 = F2(state); L5: t3 = F3(data); L6: t4 = F4(data); L7: state = F5(t1, t2, t3); L8: cmd = F6(t1, t3, t4); L9: output(Actuator, cmd); g L10: status dump(\log le", cmd, state); g Figure 6.5: TCEL Program for Task  2 Indeed, since the rate-monotonic assignment is optimal, no xed priority assignment will suce here { unless the code-based semantics is abandoned! Thereason theabove taskset is unschedulable is obvious: the computation demands of 3 exceed the available time. The simulated time line given in Figure 6.6 pictorially illustrates an unschedu- lable instance of  3 . Thus, to make the task set schedulable, we must be able to postpone some computation out of the overloaded time frame. This transformation helps achieve schedulability, since we can utilize \idle" time intervals outside the congested period. 6.1.3 Scheduling with Compiler Transformations When the taskis found unschedulable, current engineering practice forces programmerstomanually pick some critical tasks from the task set, and then to hand-optimize them. Such system-tuning is often repeated many times, until the entire task set nally gets schedulable. We aim to ease this process by providing a semi-automatic task transformation method, real-time task slicing. 56 Deadline Miss preempted preempted 1000 2000  1  2  2  1  1 T 3 = 2500  3  3  3 Figure 6.6: Simulated Time Line for the Example Task Set Real-time task slicing is based on two simple observations; that is, (1) the current practice is to assume that an entire control-loop must nish by its deadline, but (2) the high-level TCEL semantics mandates only the observable event operations be nished within the task's time frame. The key idea of the task slicing method is as follows. We decompose a task  into two subtasks: one containing all observable event operations, and the other all remaining local operations. We call the former the IO-hander and the latter the state-update component, and we denote them by  IO and  State , respectively. Figure 6.7 demonstrates the decomposition of the control-loop task originally shown in Figure 6.3. After the decomposition, we ensure thatthe IO-handler subtask will execute within its allowable time frame. On the other hand, we may postpone the execution of the state-update subtask under the worst-case task phasing. Finally, we maintain precedence constraints between  IO and  State , originally induced by the task's data and control dependences. The task decomposition itself is carried out by the program slicing technique. As we stressed above, we put the greatest emphasis on preserving the timing behavior of observable events and the precedence constraints derived in Subsection 6.1.1. Before presenting the systematic slicing procedure we describe our approach using our example task set. Assume that slicing  2 yields the greatest bene t in schedulability. We decompose  2 's code into IO-handling  IO 2 and state-update  State 2 , as shown in Figure 6.8. Their computation times are separately calculated as follows: c IO 2 = 2:2ms; c State 2 = 1:9ms Note that the sum of the two execution times is slightly greater than the original execution time 57 IO Handler State-Update IO Handler Input k Output k State k State k+1 Figure 6.7: Decomposed Task at the k th Period / Subtask  IO 2 / input(Sensor, &data); c = !null(data); if (c) f t1 = F1(state); t3 = F3(data); t4 = F4(data); cmd = F6(t1, t3, t4); output(Actuator, cmd); g / Subtask  State 2 / if (c) f t2 = F2(state); state = F5(t1, t2, t3); g status dump(\log le", cmd, state); Figure 6.8: Two Decomposed Subtasks of Task  2 58 4.0ms of  2 . To enforce that the precedence constraints between the subtasks, we rewrite them for  2 : (1) the i th instance of  IO 2 must nish before the i th instance of  State 2 ; and (2) the k th instance of  State 2 must nish before the k +1 st instance of  IO 2 starts. Thus, the scheduling and dispatching subsystems must guarantee the following execution behavior of  2 .  IO 2 !  State 2 !  IO 2 !  State 2 ! ::: The net result of the transformation of  2 is as follows: within the worst-case time interval of 2T 2 , we may only see one whole instance of  State 2 . Remainder of This Chapter. Throughout the remainder of this chapter we discuss the gen- eralized slicing method for schedulability tuning, explaining the details of the three components. In Section 6.2 we present the program slicing algorithm which is the crux of our transformation. In Section 6.3 we discuss new schedulability tests, and then show how they are used, by way of a motivating example. In Section 6.4 we introduce a priority ordering algorithm that is capable of making feasible slicing decisions, as well as nding a feasible priority order for a given task set. To demonstrate the e ectiveness of this algorithm we show the result of an experiment we conducted on a task set drawn from an avionics platform. 6.2 Automatic Task Decomposition by Program Slicing The idea behind the task decomposition is, as discussed in Subsection 6.1.3, to accept a task and then generate its two code components: one that corresponds to a subthread triggering all ob- servable events, and the other that corresponds to a subthread computing the next-state update. Straightforward as it may look, the decomposition can be a very complex compiler problem. Many factors make this the case, among which are intertwined threads of control, nested control struc- tures, complex data dependences between statements, procedure calls in the task code, etc. To cope with these problems in a systematic manner, we harness a novel application of program slicing [41, 52, 53]. For the sake of brevity, we assume the following:  Function calls are inlined.  Loops are unrolled.  The intermediate code of programs is translated into static single assignment form [8, 21, 16]. 59 The rst assumption allows us to avoid interprocedural slicing [22]. The next two assumptions simplify problems induced by spurious data dependences such as anti-dependences and output de- pendences [4]. However, we can partially alleviate the restrictions, relying on dependence breaking transformations, such as scalar expansion [4]. Static single assignment is one such transformation. However, such methods will unfortunately lead to overly-conservative analysis { a limiting factor in the algorithm. 6.2.1 The Program Slicing Algorithm Informally a slice of program P with respect to program point p and expression e consists of P's statements and control predicates that may a ect the value of e at point p. We call a pair hp;ei a slicing criterion, and denote its associated slice by P=hp;ei. The result is that we can execute the slice P=hp;ei to obtain the value of e at location p. Recall our periodic controller task  2 of Figure 6.5. The following fragment is the slice  2 =hL9;cmdi. L1: input(Sensor, &data); L2: if (!null(data)) f L3: t1 = F1(state); L5: t3 = F3(data); L6: t4 = F4(data); L8: cmd = F6(t1, t3, t4); L9: output(Actuator, cmd); g Statements L1, L3, L5, L6 and L8 are included in the slice, because variable \cmd" depends on their computations (this is called data dependence). Also, statement L9 is included because it generates an observable event. 1 Finally, the predicate on line L2 is included, because the execution of statements L3, L5, L6, L8 and L9 (hence the value of \cmd") depends on the boolean outcome of the predicate (this is called control dependence). Thus the computation of slices is based on data dependence as well as control dependence. In this regard, using a program dependence graph [11, 22, 41] is ideal, since it represents both types of dependences in a single graph. (See Section 4.2 for de nition.) The program dependence graph PDG of our controller task  2 is shown in Figure 6.9. 1 We intentionally include L9, as will be discussed in Algorithm 6.1. 60 if(!null(data)) Entry control dependence data dependence t1=F1(state) t2=F2(state) t3=F3(data) t4=F4(data) status_log(?, cmd, state)input(?, &data) output(?, cmd)state =F5(t1,t2,t3) cmd =F6(t1,t3,t4) Figure 6.9: Program Dependence Graph The slice of program P with respect to program point p and expression e (i.e., P=hp;ei) can be obtained through a traversal of P's program dependence graph. We can extend the de nition of a program slice for a set of slicing criteria C in a way that P=C = S hp;ei2C P=hp;ei. A simple algorithm to compute the slice is given below. In the algorithm the program point p corresponds to a vertex of PDG. Algorithm 6.1 Computes the slice P=hp;ei: Step 1 Compute reaching de nitions RD(p;e). Step 2 Compute the slice by a backward traversal of PDG such that P=hp;ei = fm j 9n 2 RD(p;e) : m )  ng[fpg: Figure 6.10 shows the graph that results from taking a slice of the program dependence graph in Figure 6.9 with respect to criterion hL9;cmdi. One of the essential points in using our task decomposition algorithm is providing right slicing criteria for the algorithm, so that the computed I/O slice of a task \covers" all the observable behaviors of the original task. Criteria selection can be automated by means of the observable event speci cation, or it can be manually performed by way of graphical user interface. Let C IO () be a set of slicing criteria for I/O slice of task . Then the task decomposition 61 if(!null(data)) Entry control dependence data dependence t1=F1(state) t3=F3(data) t4=F4(data) input(?, &data) output(?, cmd)cmd =F6(t1,t3,t4) Figure 6.10: Slice with respect to Criterion hL9;cmdi algorithm is given below: Algorithm 6.2 Decompose task  into  IO and  State : Step 1 Compute the slice of  with respect to C IO () using Algorithm 6.1. The generated slice =C IO () becomes  IO . Step 2 Delete from  all repeated statements of  IO except for the conditional statements. The remaining code becomes  State . Figure 6.8 shows the two subtasks  IO 2 and  State 2 of  2 computed by Algorithm 6.2 with slicing criteria C IO () = fhL1;datai;hL9;cmdig. 6.2.2 Assigning Times to Subtasks Program slicing may well increase worst-case execution times of tasks for a number of reasons: (1) control structures are replicated and will be executed twice; (2) splitting a basic block may increase the number of register load and store operations [2]; and (3) worst-case execution time paths of the two resultant subtasks may be incorrectly derived. We take a close look at the last factor, since it tends to take up the greatest portion of the increase, though it is not a genuine cause of the increase, but an artifact of overly-conservative timing prediction. 62 After a conditional of a task is sliced and then spliced, the worst-case execution time of the new task may be increased unless we carefully correlate the duplicated condition predicates. Figure 6.11 pictorially depicts this case. The original task  consists of one conditional, one branch of which is IO-generating code \IO" and the other is state-update code \ST." The worst-case execution time of  is: wt() = wt(c)+maxfwt(IO);wt(ST)g: In Figure 6.11  is sliced into two subtasks  IO and  State . Their worst-case execution times are also given below. wt( IO ) = wt(c)+wt(IO) wt( State ) = wt(c)+wt(ST): Consequently, the worst case execution time of the transformed task  0 (  IO ; State ) may be measured as: wt( 0 ) = 2wt(c)+wt(IO)+wt(ST); which is much larger than wt(). ST c c ST c IO IO   IO  State Figure 6.11: Slicing a Conditional However, tighter worst-case execution time can be easily obtained by correlating the conditional predicate of the subtask  IO with that of the subtask  State . For example, in Figure 6.11 if \IO" is executed in subtask  IO , then we know that the empty left branch will be executed in subtask  State . Thus wt( 0 ) can be re ned as follows: wt( 0 ) = 2wt(c)+maxfwt(IO);wt(ST)g 63 For the given two subtasks of  i , we carry out the following simple steps which are based on the notion of predicate correlation to compute tight worst-case execution times of subtasks  IO i and  State i . Step 1 Calculate c 0 i by running a timing tool with the code of  0 i . Step 2 Calculate c IO i by running a timing tool with the code of  IO i . Step 3 Calculate c State i such that c State i = c 0 i BnZrc IO i . This will serve as a good rough estimate for the transformed task code. Then we can use a pro ler to account for the two other factors that incur timing overhead. 6.3 Scheduling Alternatives and Their Analyses We now sketch a simple, high-level procedure that uses slicing to transform an unschedulable task set into a schedulable one. The input is set BnZr of n tasks, processed in order from  n to  1 . If the task set is found unschedulable, the slicer algorithm is invoked to decompose  n into its two constituent threads, which then replace  n in BnZr. If the updated set is still deemed unschedulable, the procedure goes to work on  nBnZr1 , and so on. In [15] we present a detailed alternative to this approach, in which tasks are processed from  1 to  n ; i.e., the rst task found unschedulable is selected for slicing. One can imagine other alternatives as well. However, any such scheme is critically dependent on two elements: (1) A scheduling policy that can exploit our task model; i.e. while the  State threads can miss their original deadlines, the precedence constraints between instances  IO and  State must be maintained. (2) An oine schedulability analyzer for the given scheduling policy. In this section we present an analytic approach that systematically addresses (1) and (2). A static priority approach: In [15] we present a RMS based method which enjoys a simple priority assignment rule and analysis test for its dual-priority scheme. This is certainly one of its strengths. Its principal weakness is that the online component lacks the simplicity found in pure, static priority scheduling due to its semi-dynamic dual priority assignment. 64 Thus the following question arises: when can a set of transformed TCEL tasks be scheduled under a fully preemptive, static priority scheme? Burns [6] provides an answer to this question after identifying a simple, but essential fact about the TCEL task model. That is, whenever we let a task's deadline be greater than its period, this represents a relaxation of the classical rate- monotonic restrictions put forth in [34]. Thus the rate-monotonic priority assignment may not be the optimal one. Given set BnZr 0 of transformed TCEL tasks  0 1 =  IO 1 ; State 1  0 2 =  IO 2 ; State 2 . . .  0 n =  IO n ; State n it turns out the appropriate priority assignment is not only dependent on the deadlines (as in the pure deadline-monotonic model), but also on the respective execution times of each IO-handler and state-update component. In [6] Burns presents a search algorithm to generate the feasible static- priority order { or to detect when no such order exists. Thus the approach includes the following components. Online Scheduler: This is a simple, preemptive dispatching mechanism, in which priority \ties" are broken in favor of the task dispatched rst. Thus, for example, a task's current iteration will nish before the next one starts. Oine Analyzer: The analyzer is constructive, in that it produces a feasible priority assignment if one exists. If no such assignment exists, perhaps the programmer may have to go back to the system design step and reply on more aggressive system-tuning. For given task set BnZr = f 1 ; 2 ;:::; n g Burns' priority assignment algorithm accepts the pre- processed task set BnZr 0 = f 0 1 ; 0 2 ;:::; 0 n g as its input, where  0 i is a sliced version of  i . It then begins looking for a task that can run at the lowest priority (level n) 2 . After such a task, say,  0 k is found, the algorithm proceeds to search the new task set BnZr 0 BnZr f 0 k g for the second lowest priority task, and so on. There is an important fact that leads to the optimality of this algorithm: while a task is being tested for priority level p, all pBnZr 1 tasks whose priorities have not yet been assigned are assumed to run at a higher priority. In xed-priority, preemptive scheduling, since a lower priority task can never preempt the higher priority tasks, selections made for priority levels p or below will not a ect those above p. 2 For n tasks, n denotes the lowest priority level, and 1 the highest. 65 During this priority ordering, the schedulability test of  0 i (  IO i ; State i ) for priority p yields the following two conditions: (1) Whether  IO i can always run within time D i at priority p, and (2) Whether q consecutive iterations of  0 i can run within q T i +D i at priority p where q  1. Condition (1) is required by the TCEL's semantics; condition (2) accounts for the case where at least one iteration of  State i is delayed. The schedulability test boils down to a check to see if the maximum response time of  IO i is no greater than D i in either case. The maximum response time (denoted by R IO i ) of  IO i with respect to hp(i) is computed as below: r i;q = q(c IO i +c State i )+ c IO i + X  j 2hp(i) d r i;q T j ec j R IO i = max q=0;1;2;::: fr i;q BnZrq T i g (Eq6.3) We must subtract q  T i from r i;q to obtain the real response time, since r i;q is measured from the start of the q th period prior to the current period. Although q is denoted as an unbounded number in Eq 6.3, it can be trivially shown that there exists bounded response time R IO i as long as utilization of tasks in hp(i) and  i is less than 100% [51]. The value of q is bounded below by p such that r 0 i;p < pT i + D i where r 0 i;p = (p+1)(c IO i +c State i )+ X  j 2hp(i) d r 0 i;p T j ec j : The intuition behind this is the execution pattern of  0 i repeats after the entire execution of  0 i ts within its time frame. Now recall the unschedulable task set we showed in Subsection 6.1.2. Suppose that only  2 was sliced. This requires priority rearrangement among the tasks, since RMS is no longer optimal in the transformed task model. The result of new priority ordering is as follows:  3   1   0 2 In the next section we show how this ordering is obtained. But given that we have an ordering, we 66 can check it using Eq 6.3. For  3 : R 3 = 570 < D 3 = 2500 For  1 : R 1 = 400+ d970=2500e570= 970 < D 1 = 1000 For  0 2 : r 0 2;1 = 2410+ d2750=2500e570+d2750=1000e400= 2750 < T 2 +D 2 = 3200 r 2;0 = 220+ d1590=2500e570+d1590=1000e400= 1590 r 2;1 = 410+ 220+ d2400=2500e570+d2400=1000e400= 2400 R IO 2 = maxf1590;2400BnZr 1600g = 1590 < D 2 = 1600 As a result, the task set is shown to be schedulable under the new priority assignment. 6.4 Priority Ordering with Task Slicing As discussed in Section 6.3, Burns' priority assignment algorithm expects that all tasks in BnZr are sliced before they are submitted for priority assignment. However, it is typically not desirable to slice all tasks in the application due to execution time overhead incurred by task slicing. As an example, consider a task set whose utilization is 0.96. Suppose that task slicing uniformly increases the worst-case execution times of the tasks by 5%. If we naively slice all the tasks, this will result in utilization of 1.008 and render the task set permanently unschedulable. Moreover, since we view slicing as a means of tuning an application, it should selectively be applied to tasks which will realize the greatest bene t. To address this problem, we present an algorithm that not only nds a feasible task priority ordering, but also picks only a small subset of tasks to slice. For a given ordered list of tasks BnZr = [ 1 ; 2 ;:::; n ], we make the following de nitions.  sliced( i ) : a boolean variable denoting whether or not  i is sliced.  c 0 i = c IO i + c State i . We refer to a certain permutation BnZr 0 of BnZr as a con guration, i.e. BnZr 0 denotes a priority ordering 3 of the tasks in BnZr, and sliced( i ) is de ned for all  i 2 BnZr 0 . There are n! di erent priority orderings, and 3 The rst task in the list has the highest priority. 67 2 n possible slicing choices. Thus the algorithm's job is to choose a task con guration among 2 n n! distinct ones in an ecient manner. De nition 6.1 (Feasibility) For a given BnZr, a con guration BnZr 0 is said to be feasible i all tasks in BnZr 0 meet their deadlines under the priority ordering and slicing choice denoted by BnZr 0 . 6.4.1 Feasibility Test Since Burns' algorithm accepts a pre-sliced application, it can compute the exact amount of inter- ference from the higher priority tasks when it considers a task for a priority. On the other hand, our problem is to slice for schedulability. (Recall that for a task  i , c i 6= c IO i + c State i .) Thus it seems inevitable to search the entire solution space of size 2 n n! in order to nd a feasible task con guration. Fortunately, there are cases where we can make a slicing decision without exhaustively exploring the search space. We rely on the response time analysis summarized by equations Eq 6.2 and Eq 6.3 to nd these cases. To be speci c, we make use of the following schedulability test. Feasible(L; k )  if :sliced( k ) then max q=0;1;2::: fr k;q BnZrq T k g  D k else max q=0;1;2::: fr 0 k;q BnZrq T k g  D k where S = f i 2 L j sliced( i )g; r k;q = (q + 1)c k + P  j 2LBnZrS d r k;q T j ec j + P  j 2S d r k;q T j ec 0 j ;and r 0 k;q = q c 0 k + c IO k + P  j 2LBnZrS d r 0 k;q T j ec j + P  j 2S d r 0 k;q T j ec 0 j : \:sliced( k )^ Feasible(L; k )" denotes that the unsliced  k is schedulable with tasks in L running at higher priorities. Similarly, \sliced( k ) ^ Feasible(L; k )" means that  k , when being sliced, is schedulable with tasks in L. 6.4.2 The Algorithm We now present the priority ordering algorithm in Figure 6.12. We describe below variables we use in the algorithm.  BnZr = [ 1 ; 2 ;:::; n ] is the input task list which is initially ordered in nondecreasing order of the deadlines. Such a deadline monotonic ordering is desirable as a starting point, since most 68 tasks, except for a small number of tasks to be sliced will be consistent with the ordering.  The two parameters L 1 and L 2 of function Search collectively hold the list of tasks to be priority-ordered. In every invocation, function Search(L 1 ;L 2 ) returns either the priority- ordered list of the tasks or false, if it cannot nd any feasible ordering among them. We use operator \@" to denote list append operation. In every invocation, function Search attempts to assign the last task in L 1 ( in Figure 6.12) priority level jL 1 j+jL 2 j. The condition on line (1) denotes that the algorithm has already generated a complete task con guration of BnZr. The condition on line (2) means that the algorithm has checked all tasks in lists L 1 and L 2 for priority level jL 1 j+ jL 2 j, but it can assign none of them that priority. Thus false is returned. When the condition on line (3) holds, the algorithm checks if  is feasible with current high priority tasks. Before doing this, the algorithm attempts to nd a feasible priority ordering among the higher priority tasks (on line (4)). If it cannot do so,  is infeasible in the current priority level, and thus the algorithm tries other task for the current priority level, invoking tail recursion on line (12). On the other hand, if the higher priority tasks are schedulable, the algorithm checks if  is feasible. If so, it returns a new ordering L@[]. Otherwise, the algorithm slices  and sees if  0 is feasible. If  0 is deemed feasible, it returns L@[ 0 ]. 6.4.3 A Larger Example We constructed a task set consisting of eighteen periodic tasks, based on the avionics application described in [35, 51]. We added tighter deadlines to the original taskset, and modi ed the execution timesofsomeofthetasks. Asaresult, thetasksethadutilization of0.836,anditwasunschedulable. The resultant timing speci cation of our task set is given in Table 6.1 where the time unit is 1 microseconds. We make the following assumptions for the task set, which we have found representative. 1. Only small portion of a task { no more than 25 % of the original task code in terms of the worst case execution time { can be sliced. 2. Slicing incurs no more that 5 % increase in a task's worst-case execution time. 69 algorithm PriAssign(BnZr) begin return(Search(BnZr;[])); end list function Search(L 1 ;L 2 ) case (1) when L 1 = L 2 = []: return([]); (2) when L 1 = [];L 2 6= []: return(false); (3) when L 1 = L 0 1 @[]: (4) L = Search(L 0 1 @L 2 ;[]); (5) if L 6= false then (6) if Feasible(L;) then (7) return(L@[]); (8) else (9)  0 = Slice(); (10) if Feasible(L; 0 ) then (11) return(L@[ 0 ]); end end (12) return(Search(L 0 1 ;[]@L 2 )); end Figure 6.12: Algorithm for Priority Ordering with Slicing Decision When we ran the priority ordering algorithm with the task set in Table 6.1, it chose to slice tasks  4 ; 7 and  16 , and made the task set schedulable. The utilization grew slightly to 0.844. We show the result in Table 6.2, where \R" denotes the maximum response time of an unsliced task, and R IO and R State respectively represent the maximum response times of the two components of an sliced task. 6.5 Summary In this chapter we addressed the schedulability tuning problem for real-time applications in discrete control domains. We solved this problem with a two-tier approach. At one end was a semantics- based compiler transformation method that safely segregates two subthreads from given task code; at the other end was an extended schedulability analysis test, and its associated feasible priority ordering algorithm. While the compiler-based tool had to be carefully guided by the scheduling 70 T D c c IO c State  1 1000 1000 51 51 0  2 25000 5000 2000 1600 500  3 25000 5000 1000 800 250  4 40000 5000 2000 1600 500  5 50000 20000 3000 2400 750  6 200000 20000 3000 2400 750  7 50000 25000 5000 4000 1250  8 59000 25000 8000 6400 2000  9 80000 80000 9000 7200 2250  10 80000 80000 2000 1600 500  11 100000 80000 8000 6400 2000  12 100000 100000 5000 4000 1250  13 200000 100000 3000 2400 750  14 200000 100000 1000 800 250  15 200000 120000 1000 800 250  16 200000 140000 2000 1600 500  17 1000000 1000000 1000 800 250  18 1000000 1000000 1000 800 250 Table 6.1: Example Task Set component in our approach, the conventional separation of concerns between compilation and scheduling could still be maintained. 71 T D c c 0 R R IO R State Sliced?  1 1000 1000 51 51 51 0 0 n  2 25000 5000 2000 2100 2153 0 0 n  3 25000 5000 1000 1050 3204 0 0 n  4 40000 5000 2000 2100 0 4855 5406 y  6 200000 20000 3000 3150 8559 0 0 n  5 50000 20000 3000 3150 11712 0 0 n  8 59000 25000 8000 8400 20171 0 0 n  7 50000 25000 5000 5250 0 24375 28829 y  9 80000 80000 9000 9450 38339 0 0 n  10 80000 80000 2000 2100 42643 0 0 n  11 100000 80000 8000 8400 71372 0 0 n  12 100000 100000 5000 5250 79780 0 0 n  13 200000 100000 3000 3150 96747 0 0 n  14 200000 100000 1000 1050 97798 0 0 n  15 200000 120000 1000 1050 98849 0 0 n  16 200000 140000 2000 2100 0 139890 140441 y  17 1000000 1000000 1000 1050 141492 0 0 n  18 1000000 1000000 1000 1050 142543 0 0 n Table 6.2: Priority Assignment with Program Slicing 72 Chapter 7 Practical Considerations and Prototype Implementation The TCEL paradigm helps incorporate a higher level of abstraction into real-time domains. As we have shown, TCEL's event-based semantics constrains only those operations that are critical to real-time operation; i.e., the events denoted in the speci cation or those derived from it. As such, a source program is an appropriate representation of the designer's intentions, and it need not over- burden the system with unnecessary constraints. Moreover, the event-based semantics enables our compiler tools to transform the program, and helps resolve con icts between the timing constraints and the code's actual execution time. Since this is exactly the type of dirty work that compilers do best, a human programmer's time is probably better spent elsewhere. On the other hand, the success of our compiler tools is contingent upon the limitations of static program analysis techniques. In this chapter we consider such limitations, and show how we can work around the problems associated with them. We faced these problems in our implementation of the real-time slicer tool which uses an existing data and control dependence analyzer. We nish this chapter with a discussion of the prototype implementation which we call TimeWare/SLICE. 7.1 Practical Considerations. For the sake of brevity we presented the code scheduler and the program slicer in a rather idealized form, abstracting out some implementation-related considerations. For example, we assume that a given program is in its perfect GSA form so that all spurious dependences are e ectively removed. We also assume that available timing tools provide fairly tight execution time bounds for small code 73 segments. During our research these implementation-related factors revealed themselves via three sources: (1) experiences in building our prototypes, (2) experiences in dealing with large programs, and (3) discussions with colleagues who design and build production-quality real-time systems. In the following subsections we brie y summarize several of these considerations. 7.1.1 Limits of Data-Flow Analysis Both of the code scheduler and the program slicer heavily rely on contemporary compiler methods, including intra- and inter-procedural data and control analysis. And as with all program transfor- mation algorithms, the limitations of this enabling technology become a constraining factor of our approach. For example, current static data- ow analysis is incapable of disambiguating all pointer aliases (which at worst is an undecidable problem). Thus we cannot always translate the TCEL source into its corresponding \perfect" GSA form. We partially assuage the problem by adopting techniques such as (1) inlining procedures to avoid inter-procedural aliases; (2) rendering in GSA form only those assignments that contain statically analyzable variables; and (3) unrolling loop bodies. Of course these and similar methods will degrade the code scheduler's performance, either by increasing the amount of code, or by decreasing its ecacy. However, dependence analyzers are improving at a rapid rate, and our algorithm will improve along with them. For example, if we incorporate the recent advances in loop dependence analyses such as those in the Omega Test [43], we may not have to unroll loops to slice a real-time task. We can obtain better slices for loops using techniques like loop distribution. 7.1.2 Limits of Timing Analysis Another limiting factor is the diculty of achieving accurate, static timing analysis in the face of more complicated architectures. Quite simply, it has become incredibly dicult to use vendor- supplied benchmarks, and to model the interplay between pipelines, hierarchical caches, shared memories, register windows, etc. Thus with an approach like ours, it seems meaningless to predict the execution time of a single instruction (or even a small block). First, the CPU time will probably be too small to make a di erence in achieving feasibility, and second, the \noise" in the prediction will be too large. Thus we have adopted a hierarchical abstraction approach to deal with time predictions. For example, in the program of Figure 5.3 we accounted only for the CPU-intensive function calls that performed complex operations, while ignoring the execution time of ner-grained instructions. The same approach can be used on larger-grained structures. Our experience shows the compiler should 74 usually hunt for the \big-game targets," and forget about the smaller ones. However after code scheduling or program slicing is completed, it becomes imperative to verify the result with a more sophisticated timing tool; for example, a good pro ler. Performing such re-timing is especially important in a cached memory structure, where code scheduling will always change the instruction alignment. We note that all modern RISC compilers re-order instructions to some degree; thus the ecacy of any source-level timing analysis is diminishing. 7.1.3 User Interaction The above two factors argue against the fully automated code synthesis and program slicing tools. There is also a third factor, which we discussed in reference to speculative code motion. That is, programmers of production-quality, real-time systems will simply not accept a compiler technology that \outsmarts" them, and possibly \disobeys" their intentions. They will, however, accept a tool that helps tune their systems, but not at the price of sacri cing traceability to their original programs. A simple example illustrates the importance of this. Consider what might happen if an instruction that interacts with the environment fails to be annotated as an event (which could easily happen with memory-mapped IO). If the instruction is relocated outside of its source section, debugging the transformed program could become a nightmare. Even worse, a fatal timing fault may accrue at runtime, since the program slicer may place it in the state thread, which can be delayed past the original deadline. All of these considerations argue for a front-end that permits the programmer to interact with the tool during system-tuning. With our transformation engine as its foundation, a graphical interface allows a programmer to selectively apply the transformations { and also remain informed of the results. We have implemented such a tool for program slicing. In this tool, programmers are asked to pick a slicing criterion from the tool's program source screen, rather than the program slicer doing it automatically. In this way, programmers can selectively control the application of program slicing. 7.2 TimeWare/SLICE: the Prototype Implementation As a proof of concept, we have implemented a program slicer tool we named TimeWare/SLICE. The key features of the tool are as follows:  It computes a program slice with respect to a given slicing criterion. 75  It works on source code.  It allows users to save a computed slice, and to carry out operations between the current and the saved slices. These operations include intersection between slices, union of slices and subtracting one slice from another. These commands are used to segregate special threads from given task code.  It provides a graphical user interface.  It generates the transformed task code. In this section we discuss the implementation of TimeWare/SLICE. 7.2.1 TimeWare/SLICE Tool Screens A source program is displayed on the two tool windows: one called the primary window and the other the secondary window. Figure 7.1 demonstrates a possible layout on the tool screen of a SUN Sparc station equipped with a 17 inches display. The primary window provides users with a work place where they can pick a slicing criterion and get the slicing result. The result is shown as a set of highlighted source lines on the window. On the other hand, the secondary window provides with a bu er space where a user can temporarily store a pre-computed slice. When a user carries out operations between two slices (one on the primary window and the other on the secondary window), the primary window works as if it were an accumulator. That is, the primary window provides the rst operand and gets the result. We refer to the slice on the primary window as a current slice, and the one on the secondary window as a saved slice. 7.2.2 Commands Figure 7.2shows a screen dump of the primary window of TimeWare/SLICE. The highlighted source lines correspond to a computed slice. Now we describe the command buttons that are located below the text window. Picking a Slicing Criterion. A slicing criterion consists of a source line and a variable name. By pressing the leftmost button of a mouse and then dragging the pointer on the primary window, the user can select a variable name. Then the selected string is highlighted. Similarly, by moving the pointer and clicking the middle button of a mouse, the user can mark a source line. At that point, an arrow sign appears at the beginning of the line. 76 Figure 7.1: Tool Screens of TimeWare/SLICE Realtime Slice Command. The     Realtime Slice function computes a static program slice of the function one of whose source lines are picked as part of the slicing criterion. Figure 7.2 shows the slice with respect to the slicing criterion h line 29, state i. Addand Delete Line Commands. When     Add Line (or     Del Line )is pressed, TimeWare/SLICE adds (or deletes) a current line. These commands are useful to manually edit a slice. If the current slice is empty then     Del Line has no e ect. Subtract Command. The     Subtract function subtracts the saved slice from the current slice, and then displays the result on the primary window. As a result, all source lines common to both slices are removed from the current slice. Add Command. The     Add function adds the saved slice to the current slice, and then displays the result on the primary window. 77 Figure 7.2: Output of TimeWare/SLICE 78 Intersect Command. The     Intersect function computes the intersection of the saved slice with the current slice, and then displays the result on the primary window. Isolate Command. The     Isolate function deletes all repeated lines from the current slice, but leaves the control predicates of the current slice undeleted. The     Isolate command is used to isolate the state-update slice from the IO slice. Transform Command. The     Transform command produces transformed code by attaching the current slice to a user-speci ed target line. The result is copied into a le but the original code remains intact. Save, Load and Swap Commands. These buttons are used to save the current slice onto the secondary window, load the saved slice onto the primary window, and swap the two slices. 7.2.3 Implementation The prototype implementation of TimeWare/SLICE is based on a dynamic program slicing tool SPYDER developed at Purdue University [1]. SPYDER is originally a program debugging tool relying on dynamic slicing, and it consists of two components: a modi ed version of GCC (GNU C compiler) and GDB (GNU symbolic debugger). The role of the modi ed GCC is to produce the program dependence graph for an input program as well as the object code. SPYDER traverses the graph to compute a static program slice. We had to tailor the implementation of SPYDER due to the following limitations. (1) It does not allow users to pick a general slicing criterion. Instead, it limits the criterion to a variable name. (2) It is a program analysis tool where our implementation actually transforms the program. (3) Its static slicer is not complete, which results in incorrect slices being produced. For example, the static data ow analyzer of the modi ed GCC does not detect rede nitions of a global variable within a function, but SPYDER does not take into account of such limitation. We had to retool TimeWare/SLICE to conquer this problem. We have added several features to the TimeWare/SLICE implementation, as we described in the previous subsection. We made a strong assumption that every function call has a potential to 79 rede ne every global variable. This is due to SPYDER's limited interprocedural analysis. While our assumptions may result in too a large slice, users of TimeWare/SLICE can still modify the automatically generated slice using the editing facilities. In such a case it is always better to be safe than sorry. The implementation of TimeWare/SLICE enjoys all the bene ts of GNU-based software. For instance, it can easily be compiled and run on various hardware platforms. 7.3 Summary In this chapter we considered two fundamental technologies that enabled our tool-based method- ology. Unfortunately, these technologies still impose practical limitations and they impede the development of fully automated compiler tools. Thus it is desirable that programmers should be able to closely guide and trace tool applications. Keeping this philosophy as a principle, we de- veloped a program slicer tool that allowed user interactions and traceability of transformations through a graphical front-end. 80 Chapter 8 Conclusions Real-time computing, once the realm of small, hard-coded embedded systems, has evolved into a majorcomputing discipline. Many practical techniques and formal theories have been developed for real-time computing, and they are now widely available for various computer applications. The goal of our work is to provide engineers with software support where we believe they need it the most: in system-tuning. We identi ed three problems in this area and then presented a comprehensive solution to them. The rst problem arises well before system-tuning, but it results in many of the inconsistencies that only tuning can remedy. That is, a high-level real-time program must be translated into both scheduler-oriented tasks and associated timing constraints on the tasks. The second problem is in reconciling the di erence between task execution characteristics and the timing constraints. The last problem is attaining the desired real-time schedulability of the task set. Our alternative to system-tuning is based on semi-automatic, compiler-based transformation methods as shown in Figure 8.1. Our approach consists of three ingredients: a new real-time programming language and two novel compiler transformation methods. The TCEL Language. We designed the TCEL language, which possesses high-level timing constructs for deadlines, periodicity, etc. Unlike other languages that came before it, TCEL has a semantics based on time-constrained relationships between observable events. The semantics yields a clear interpretation of timing behavior. Feasible Code Synthesis. Armed with the event-based semantics, we developed a program transformation called feasible code synthesis, to help synthesize feasible task code from a TCEL program. The objective is to correct intra-task feasibility errors. 81 High-Level Real-Time Programs Timing Constraints on Tasks compilation synthesis Feasibility Tests (Intra-Task Consistency) Requirements Speci cation Decomposed Tasks, Schedulability Tests (Inter-Task Consistency) Compiler-based Transformation Methods Feasible Code Synthesis (Chapter 5) failure failure failure success re nement re nement Real-Time Task Slicing (Chapter 6) System-Tuning, Re-implementation failure Figure 8.1: Software Development Process: Two Alternatives Real-Time Task Slicing. After the feasibility of each task is achieved the entire task set may still be unschedulable. We developed a transformation method to address this problem, and we call it real-time task slicing. While the optimization is completely local, the improvement is realized globally, for the entire task set. Alan Burnsatthe University ofYorkindependently developed a xed-priority scheduling scheme to support the TCEL model. His analysis, in turn, led us to make a further improvement in the slicing tool; speci cally, we developed a priority ordering algorithm that derives only a small subset of tasks to be sliced for schedulability. 82 8.1 Future Directions We originally envisioned TCEL as a stand-alone real-time programming language. However, over time the emphasis of this work shifted from \programming languages" to \tuning." Eventually we realized that this tuning approach was applicable to many di erent languages where the minimal requirement is an annotation mechanism that can both distinguish observable events in a real- time program, and establish timing relationships between the events. Indeed, perhaps a graphical language may provebetter suited to this purpose. Though their application domain is di erent from ours, several recent multimedia tools have been developed, which allow users to specify temporal relationships between several continuous media [26]. We plan to borrow some of the ideas used in these languages to graphically describe timing relationships in a real-time program. When a real-time application is written in such a graphical language, the compiler has more free- dom to interleave operations because code blocks are only constrained by the speci ed precedences and data dependences. We are investigating this research direction, since it accords with our thrust in a graphical user interface. Moreover, we believe a wider spectrum of compiler transformations will be applicable to the program. Thereis anotherinteresting researchdirection which is closely related toourtool-based approach to real-time programming. The compiler-based approach we presented in this dissertation can help developers tune applications. However, when the design is awed, no amount of tuning can help achieve either feasibility or schedulability. Flaws are often the result of translating a high-level speci cation into a set of schedulable \tasks." For example, real-time programmers manually break a complex real-time design into tasks, while trying to maintain the functional correctness, so that the real-time schedulers can tractably guarantee the system'srequirements. Consequently, this decomposition introduces a new class of intermediate timing constraints, which are artifacts used to realize the original high-level requirements. If our tuning tools cannot help the programmers achieve an implementation consistent with the intermediate constraints, then the whole manual decomposition may have to be repeated. An important research direction is to automate this process with a comprehensive design methodology. Although our compiler-based approach provides a partial solution by means of a new programming language, we plan to push it into a higher level in the design hierarchy. We recently laid the foundation to achieve this goal, via an automated design methodology that works hand-in-hand with real-time design tools [17]. We call this end-to-end design. The methodology links two more phases in the design hierarchy, i.e., a high-level component derives intermediate constraints using the low-level tuning tools. The result is a way to aid pro- 83 grammers in decomposing tasks and deriving the intermediate rate constraints. We believe that this type of strategy can signi cantly streamline the design process, since it supports a variety of low-level resource-speci c considerations early on in the life-cycle. We are currently investigating a full-scale version of this method in order to incorporate network commu- nication, as well as the load on each processor. 84 Bibliography [1] H. Agrawal, R. DeMillo, and E. Spa ord. Debugging with dynamic slicing and backtracking. Software Practice and Experience, 23(6):590{616, June 1993. [2] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley Publishing Company, 1986. [3] A. Aiken and A. Nicolau. A development environment for horizontal microcode. IEEE Trans- actions on Software Engineering, pages 584{594, May 1988. [4] F. Allen, B. Rosen, and K. Zadeck. the forthcoming Optimization in Compilers. Addison Wesley Publishing Company, 1992. [5] N. Audsley. Optimal priority assignment and feasibility of static priority tasks with arbitrary start times. Technical Report YCS 164, Department of Computer Science, University of York, England, December 1991. [6] A. Burns. Fixed priority scheduling with deadlines prior to completion. Technical Report YCS 212 (1993), Department of Computer Science, University of York, England, October 1993. [7] J.-D. Choi, R. Cytron, and J. Ferrante. On the ecient engineering of ambitious program analysis. IEEE Transactions on Software Engineering, 20(2):105{114, February 1994. [8] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Eciently computing static single assignment formand the control dependence graph. ACM Transactions on Programming Languages and Systems, 13:451{490, October 1991. [9] B. Dasarathy. Timing constraints of real-time systems: Constructs for expressing them, method for validating them. IEEE Transactions on Software Engineering, 11(1):80{86, Jan- uary 1985. [10] K. Ebcioglu and A. Nicolau. A global resource-constrained parallelization technique. In In- ternational Conference on Supercomputing, pages 154{163. ACM Press, June 1989. 85 [11] J. Ferrante and K. Ottenstein. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9:319{345, July 1987. [12] J. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, 30:478{490, July 1981. [13] M. Garey and D. Johnson. Computer and Intractability: A Guide to the Theory of NP- Completeness. W. H. Freeman and Company, 1979. [14] F. Gasperoni. Compilation techniques for VLIW architectures. Technical Report RC 14915(#66741), IBM T. J. Watson Research Center, September 1989. [15] R. Gerber and S. Hong. Semantics-based compiler transformations for enhanced schedulabil- ity. In Proceedings of IEEE Real-Time Systems Symposium, pages 232{242. IEEE Computer Society Press, December 1993. [16] R. Gerber and S. Hong. Compiling real-time programs with timing constraint re nement and structural code motion. IEEE Transactions on Software Engineering, 1995. To Appear. [17] R. Gerber, S. Hong, and M. Saksena. Guaranteeing end-to-end timing constraints by cali- brating intermediate processes. In Proceedings of IEEE Real-Time Systems Symposium, pages 192{203.IEEE Computer Society Press, December 1994. Also to appear in IEEE Transactions on Software Engineering. [18] P. Gopinath and R. Gupta. Applying compiler techniques toscheduling in real-time systems. In Proceedings of IEEE Real-Time Systems Symposium, pages 247{256. IEEE Computer Society Press, December 1990. [19] M. Harmon, T. Baker, and D. Whalley. A retargetable technique for predicting execution time. In Proceedings of IEEE Real-Time Systems Symposium, pages 68{77. IEEE Computer Society Press, December 1992. [20] P. Havlak. Interprocedural Symbolic Analysis. PhD thesis, Department of Computer Science, Rice University, May 1994. [21] S. Hong and R. Gerber. Compiling real-time programs into schedulable code. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation. ACM Press, June 1993. SIGPLAN Notices, 28(6):166-176. [22] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graph. ACM Transactions on Programming Languages and Systems, 12:26{60, January 1990. 86 [23] Y. Ishikawa, H. Tokuda, and C. Mercer. Object-oriented real-time language design: Constructs for timing constraints. In Proceedings of OOPSLA-90, pages 289{298, October 1990. [24] F. Jahanian and A. Mok. Safety analysis of timing properties in real-time systems. IEEE Transactions on Software Engineering, 12(9):890{904, September 1986. [25] K. Kenny and K. Lin. Building exible real-time systems using the Flex language. IEEE Computer, pages 70{78, May 1991. [26] M. Kim. Hyperstories: Combining time, space and asynchrony in multimedia documents. Technical Report RC 19277 (#83726), IBM T. J. Watson Research Center, October 1993. [27] E. Kligerman and A. Stoyenko. Real-Time Euclid: A language for reliable real-time systems. IEEE Transactions on Software Engineering, 12:941{949, September 1986. [28] J. Krause. GN&C domain modeling: Functionality requirements for xed rate algorithms. Technical Report (DRAFT) version 0.2, Honeywell Systems and Research Center, December 1991. [29] I. Lee, P. Bremond-Gregoire, and R. Gerber. A process algebraic approach to the speci cation and analysis of resource-bound real-time systems. IEEE Proceedings, 82(1), January 1994. [30] I. Lee and V. Gehlot. Language constructs for real-time programming. In Proceedings of IEEE Real-Time Systems Symposium, pages 57{66. IEEE Computer Society Press, 1985. [31] J. Leung and M. Merill. A note on the preemptive scheduling of periodic, real-time tasks. Information Processing Letters, 11(3):115{118, November 1980. [32] S. Lim, Y. Bae, C. Jang, B. Rhee, S. Min, C. Park, H. Shin, K. Park, and C. Kim. An accurate worst case timing analysis for risc processors. In Proceedings of IEEE Real-Time Systems Symposium, pages 97{108. IEEE Computer Society Press, December 1994. [33] K. Lin and S. Natarajan. Expressing and maintaining timing constraints in FLEX. In Pro- ceedings of IEEE Real-Time Systems Symposium. IEEE Computer Society Press, December 1988. [34] C. Liu and J. Layland. Scheduling algorithm for multiprogramming in a hard real-time envi- ronment. Journal of the ACM, 20(1):46{61, January 1973. [35] C. Locke, D. Vogel, and T Mesler. Building a predictable avionics platform in ada: A case study. In Proceedings of IEEE Real-Time Systems Symposium, pages 181{189.IEEE Computer Society Press, December 1991. 87 [36] T. Marlowe and S. Masticola. Safe optimization for hard real-time programming. In Second International Conference on Systems Integration, pages 438{446, June 1992. [37] M. Merritt, F. Modungo, and M. Tuttle. Time-Constrained Automata. In CONCUR '91, August 1991. [38] A. Nicolau. Parallelism, Memory Anti-aliasing and Correctness Issues for a Trace Scheduling Compiler. PhD thesis, Yale University, June 1984. [39] V. Nirkhe. Application of Partial Evaluation to Hard Real-Time Programming. PhD thesis, Department of Computer Science, University of Maryland at College Park, May 1992. [40] V. Nirkhe and W. Pugh. Partial evaluator for the Maruti hard real-time system. In Pro- ceedings of IEEE Real-Time Systems Symposium, pages 64{73. IEEE Computer Society Press, December 1991. [41] K. Ottenstein and L. Ottenstein. The program dependence graph in a software development environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Sympo- sium on Practical Software Development Environments, pages 177{184, May 1984. [42] C. Park and A. Shaw. Experimenting with a program timing tool based on source-level timing schema. In Proceedings of IEEE Real-Time Systems Symposium, pages 72{81.IEEE Computer Society Press, December 1990. [43] W. Pugh and D. Wonnacott. Eliminating false data dependences using the Omega test. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation. ACM Press, June 1992. [44] D. Whalley R. Arnold, F. Mueller. Bounding worst-case instruction cache performance. In Proceedings of IEEE Real-Time Systems Symposium, pages 172{181. IEEE Computer Society Press, December 1994. [45] L. Sha, R. Rajkumar, and J. Lehoczky. Priority inheritance protocols: An approach to real- time synchronization. IEEE Transactions on Software Engineering, 39:1175{1185, September 1990. [46] U. Shankar. A simple assertional proof system for real-time systems. In Proceedings of IEEE Real-Time Systems Symposium, pages167{176.IEEE ComputerSociety Press, December1992. [47] A. Shaw. Reasoning about time in higher level language software. IEEE Transactions on Software Engineering, pages 875{889, July 1989. 88 [48] M. Smith, M. Horowitz, and M. Lam. Ecient superscalar performance through boosting. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248{259. ACM Press, October 1992. [49] A. Stoyenko. A schedulability analyzer for Real-Time Euclid. In Proceedings of IEEE Real- Time Systems Symposium, pages 218{227. IEEE Computer Society Press, December 1987. [50] K. Tindell. Using o set information to analyse static priority pre-emptively scheduled task sets. Technical Report YCS 182 (1992), Department of Computer Science, University of York, England, August 1992. [51] K. Tindell, A. Burns, and A. Wellings. An extendible approach for analysing xed priority hard real-time tasks. The Journal of Real-Time Systems, 6(2):133{152, March 1994. [52] G.Venkatesh. Thesemantic approachtoprogramslicing. In Proceedingsof the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, June 1991. [53] M. Weiser. Program slicing. IEEE Transactions on Software Engineering, 10:352{357, July 1984. [54] V. Wolfe, S. Davidson, and I. Lee. RTC: Language support for real-time concurrency. In Proceedings of IEEE Real-Time Systems Symposium, pages 43{52. IEEE Computer Society Press, December 1991. [55] N. Zhang, A. Burns, and M. Nicholson. Pipelined processors and worst case execution times. The Journal of Real-Time Systems, 5(4), October 1993. 89