We have argued for a systems approach to the optimization of I/O logic. We are not the first to propose optimization at this level [KKM+07], but it is definitely not common practice in operating system design. In distributed computing, especially in the context of grids, automated resource discovery, selection and allocation are well entrenched [KBM02,YB05]. As individual machines start to resemble distributed systems, one can argue that it is only natural that mechanisms from distributed systems find their way into host operating systems. To a certain extent, then, we only propose to reuse and combine proven ideas. Where this work differs from distributed systems is in the resources that it manages. On a single machine, communication is more reliable. Fault tolerance is less of a concern, as errors (e.g., power loss) are likely to affect the entire system at once. On the other hand, the view of a communication system as a set of fixed-bandwidth channels interconnecting isolated processes is untenable. We must adapt established control practice to kernel and embedded environments that lack process and memory isolation and other userland comforts.
We see three reasonable concerns regarding operating system-level automation: cost-effectiveness, determinism and model incorrectness. The first cause for hesitation is cost-effectiveness. Streamline is ideally structured for runtime optimization and is therefore a suitable candidate for testing whether at least some applications can benefit: a weak, because not universal, justification for automation.
We introduce the term application tailoring to mean the automatic process of generating quantitatively optimized software applications from a set of composite application and system specifications and a set of relocatable computation and communication operations. The method is a practical midway point between portable but inefficient compiled programs and fast but error-prone just-in-time recompilation of entire applications for each target platform. The approach is particularly well suited to high-throughput I/O because I/O application code is already composite: it crosscuts classical OS layering and naturally decomposes into computation and communication steps. I/O applications are also generally long-lived, which is necessary for the gain in operational efficiency to offset the cost of the tailoring process. Taken together, we call our approach to solving the issues set out in section one application-tailored I/O.
We take a purposefully conservative approach to software optimization by basing automation on linear programming, which has been applied to multiprocessor scheduling before [Sto77,NT93]. Even though quantitative optimization in systems has been frequently advocated in recent years [Sir06,KKM+07,GDFP09], practical examples remain few and far between. Successful implementation in an operating system should bolster confidence in the approach.
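To make the notion of quantitative optimization concrete, the following is a minimal sketch of a toy placement problem. It is not Streamline's model: the filter names, execution spaces and all cost figures are hypothetical, and the sketch enumerates every placement exhaustively where a real system would hand the same kind of objective to a linear programming solver.

```python
from itertools import product

# Hypothetical three-stage pipeline, in processing order.
FILTERS = ["rx", "filter", "app"]
# Hypothetical execution spaces in which a filter may run.
SPACES = ["nic", "kernel", "user"]

# Assumed compute cost per filter and space (arbitrary units).
# None marks a placement that is not available.
COMPUTE = {
    "rx":     {"nic": 1,    "kernel": 2,    "user": None},
    "filter": {"nic": 4,    "kernel": 2,    "user": 4},
    "app":    {"nic": None, "kernel": None, "user": 2},
}
# Assumed cost of moving a stream from one space to another.
CROSSING = 3


def cost(placement):
    """Return the total cost of a placement, or None if infeasible."""
    total = 0
    for name, space in zip(FILTERS, placement):
        c = COMPUTE[name][space]
        if c is None:
            return None
        total += c
    # Adjacent filters in different spaces pay one crossing each.
    for a, b in zip(placement, placement[1:]):
        if a != b:
            total += CROSSING
    return total


def best_placement():
    """Enumerate all placements and return (cost, placement) of the cheapest."""
    feasible = []
    for placement in product(SPACES, repeat=len(FILTERS)):
        c = cost(placement)
        if c is not None:
            feasible.append((c, placement))
    return min(feasible)
```

Exhaustive search is only workable for a handful of filters; it is used here solely to keep the example free of solver dependencies.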
Application-tailored I/O as espoused by Streamline avoids common I/O bottlenecks by adapting logic at runtime to match application type and hardware configuration. Runtime adaptation pushes some optimization complexity to the end-host. Streamline implements a midway point between static code and full recompilation because both have issues: the first cannot anticipate all computer architectures and applications, and the second is impractical (e.g., it requires a tool-chain for each device on each end-host).
Because a Streamline path does not have to be general purpose, it gets by with fewer control-flow statements per packet. Statement for statement, however, constructed paths are slower than their compiled equivalents. Why, then, did we not choose code generation and on-demand recompilation? In many cases, e.g., within third-party userspace processes, this is simply not feasible. For embedded devices, the development tools are often uncooperative. Moreover, the cost of constructed paths turns out to be small if streams are implemented correctly (we quantify this in Section ). In the few exceptional cases where a constructed datapath causes noticeable slowdown, we advocate the use of special-purpose state machines, such as BPF, as nodes within the DAG.
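The idea of a constructed datapath with a special-purpose state machine as one of its nodes can be sketched as follows. This is an illustration under stated assumptions, not Streamline code: `make_path`, `make_matcher` and `strip_header` are hypothetical names, the path is a simple chain rather than a full DAG, and the matcher is only loosely in the spirit of BPF.

```python
def make_path(nodes):
    """Compose per-packet callables into one datapath.

    Each node returns the (possibly modified) packet, or None to drop it.
    """
    def run(packet):
        for node in nodes:
            packet = node(packet)
            if packet is None:
                return None
        return packet
    return run


def make_matcher(program):
    """Build a tiny special-purpose matcher node (hypothetical, not BPF).

    program is a list of (offset, value) byte checks executed in order;
    the packet passes only if every check matches.
    """
    def node(packet):
        for offset, value in program:
            if offset >= len(packet) or packet[offset] != value:
                return None
        return packet
    return node


def strip_header(length):
    """Build a node that removes the first `length` bytes of a packet."""
    def node(packet):
        return packet[length:]
    return node


# A path constructed at runtime for one specific purpose: accept packets
# whose byte 0 is 0x45 and byte 9 is 17, then strip a 20-byte header.
path = make_path([make_matcher([(0, 0x45), (9, 17)]), strip_header(20)])
```

Because the path is assembled for a single purpose, the per-packet loop contains only the checks that this application needs, while the matcher confines the data-dependent branching to one node.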
willem 2010-02-03