Replication

To integrate peripheral resources efficiently, Streamline must apply copy, context switch and cache invalidation reduction techniques not just between processes and the kernel, but across peripheral buses, CPU interconnects and possibly even network links. The latency and bandwidth characteristics of these media differ substantially. Across peripheral buses and network links, caching, prefetching, burst transfers and hardware (DMA) offload can and must be employed to narrow the gap with local memory. The BMS optimizes access behind the Unix interface, without client interference, by replicating data on the consumer side of a slow transport medium. We call interfaces that optimize high-latency access shadow buffers. The diversity in media bottlenecks has led to four shadow buffer types: zero copy, copy once, hybrid and prefetching.

Zero copy accesses the imported data segment as if it were local. It uses the same read and write implementations as non-shadow buffers. Zero copy is efficient for shared memory communication, such as between kernel and userspace. When bandwidth is narrow, latency is high, or features such as burst mode or DMA offload can be exploited, zero copy is suboptimal.

Across the shared PCI bus, for example, bulk one-way transfers benefit from both burst mode and DMA offload. While zero-copy access is possible, every read incurs a round-trip delay. For high-speed sequential access, the common case in I/O processing, copying a block using hardware support (copy once) is more efficient. This method does not forward each read call to the imported buffer, but keeps a local shadow copy of the imported data segment and issues bulk copy requests between the two.

In networking, especially in forwarding and filtering applications, it is common to inspect only packet headers at first and to decide on that basis whether to access the larger payload. Both zero copy and copy once are inefficient for these applications: copy once in the common case of header-only processing, zero copy in the more expensive case of payload inspection. A hybrid solution supplies the first $n$ bytes of a block directly from the imported segment in zero-copy fashion and switches to copy once bulk transfer for payload requests. For TCP/IP networks the threshold is hardcoded to the default payload offset.

Finally, on high-throughput, high-latency links such as WANs, sequential throughput benefits from prefetching: speculatively copying more data than is directly requested. Table 4.3 summarizes the four methods.


Table 4.3: distributed memory access methods 
\begin{table}\centering
\begin{tabular*}{\linewidth}{lll}
\toprule
{\bf Acces...
...hiding & Very high latency networks\\
\bottomrule
\end{tabular*}
\end{table}
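The trade-offs between the four methods can be illustrated with a small model that counts how often a request crosses the slow medium. This is a minimal sketch in Python, not Streamline code: the class names, the fixed block size, the `ahead` prefetch depth and the `scan` helper are all hypothetical, and the remote segment is simulated by an in-memory byte string.

```python
class RemoteSegment:
    """Simulated data segment behind a slow medium (hypothetical model)."""

    def __init__(self, data):
        self.data = data
        self.transfers = 0    # requests that cross the medium
        self.bytes_moved = 0  # payload bytes that cross the medium

    def fetch(self, offset, length):
        self.transfers += 1
        chunk = self.data[offset:offset + length]
        self.bytes_moved += len(chunk)
        return chunk


class ZeroCopy:
    """Every read goes straight to the imported segment."""

    def __init__(self, remote, block_size):
        self.remote = remote
        self.block_size = block_size

    def read(self, block, offset, length):
        return self.remote.fetch(block * self.block_size + offset, length)


class CopyOnce(ZeroCopy):
    """First touch of a block bulk-copies it into a local shadow copy."""

    ahead = 0  # extra blocks copied speculatively (assumed parameter)

    def __init__(self, remote, block_size):
        super().__init__(remote, block_size)
        self.shadow = {}  # block index -> local replica

    def _replicate(self, block):
        if block not in self.shadow:
            count = 1 + self.ahead
            bulk = self.remote.fetch(block * self.block_size,
                                     count * self.block_size)
            for i in range(count):
                part = bulk[i * self.block_size:(i + 1) * self.block_size]
                if part or i == 0:
                    self.shadow[block + i] = part
        return self.shadow[block]

    def read(self, block, offset, length):
        return self._replicate(block)[offset:offset + length]


class Hybrid(CopyOnce):
    """Zero copy for the first `threshold` bytes, copy once beyond that."""

    def __init__(self, remote, block_size, threshold):
        super().__init__(remote, block_size)
        self.threshold = threshold

    def read(self, block, offset, length):
        if block not in self.shadow and offset + length <= self.threshold:
            return ZeroCopy.read(self, block, offset, length)
        return CopyOnce.read(self, block, offset, length)


class Prefetching(CopyOnce):
    """Copy once, plus speculative copying of the following blocks."""

    ahead = 3


def scan(buf, blocks, block_size, step):
    """Sequentially read `blocks` blocks in requests of `step` bytes."""
    out = bytearray()
    for block in range(blocks):
        for offset in range(0, block_size, step):
            out += buf.read(block, offset, step)
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 2  # eight blocks of 64 bytes
    for cls in (ZeroCopy, CopyOnce, Prefetching):
        remote = RemoteSegment(data)
        scan(cls(remote, 64), 8, 64, 16)
        print(cls.__name__, remote.transfers, remote.bytes_moved)
```

In this model a sequential scan moves the same number of bytes under all three methods in the loop, but the number of crossings drops from one per read call (zero copy) to one per block (copy once) to one per group of blocks (prefetching). The hybrid variant pays only for the header bytes as long as reads stay below the threshold, and falls back to a single bulk copy once the payload is requested.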


All presented methods have in common that they are pull-based, i.e., that the consumer initiates replication. This is a consequence of the fact that producers cannot predict consumer access patterns. They do not need to know, as dataplane signaling pushes events.........



willem 2010-02-03