Zero-copy accesses the imported data segment as if it
were local, using the same read and write
implementations as non-shadow buffers. It is efficient for shared-memory communication,
such as between kernel and userspace.
Zero-copy is suboptimal when bandwidth is low, when latency is high, or when
features such as burst mode or DMA offload can be exploited.
On the shared PCI bus, for example, bulk one-way
transfers benefit from both burst mode and DMA offload.
Zero-copy access is possible there, but every read incurs a round-trip delay.
For high-speed sequential access, the common case in I/O
processing, copying a block using hardware support (copy once)
is more efficient. Copy once does not forward read
requests directly to the imported buffer; instead, it keeps a local shadow
copy of the imported data segment and issues bulk copy requests between the two.
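To make the round-trip argument concrete, the following minimal Python simulation contrasts the two methods. It is a model under stated assumptions, not the system's implementation: `RemoteSegment`, `ZeroCopyReader`, `CopyOnceReader` and `sequential_scan` are hypothetical names, and each `fetch()` is assumed to cost exactly one round trip regardless of its size (transfer time is ignored).

```python
from typing import Optional


class RemoteSegment:
    """Hypothetical stand-in for an imported data segment on a remote host or device.

    Each fetch() models one transaction across the interconnect, i.e. one round trip.
    """

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.round_trips = 0

    def __len__(self) -> int:
        return len(self._data)

    def fetch(self, offset: int, length: int) -> bytes:
        self.round_trips += 1
        return self._data[offset:offset + length]


class ZeroCopyReader:
    """Zero-copy: every read goes straight to the imported segment."""

    def __init__(self, seg: RemoteSegment) -> None:
        self.seg = seg

    def read(self, offset: int, length: int) -> bytes:
        return self.seg.fetch(offset, length)


class CopyOnceReader:
    """Copy once: reads are served from a local shadow copy of the segment."""

    def __init__(self, seg: RemoteSegment) -> None:
        self.seg = seg
        self.shadow: Optional[bytes] = None

    def read(self, offset: int, length: int) -> bytes:
        if self.shadow is None:
            # A single bulk request (e.g. a DMA burst) fills the shadow copy.
            self.shadow = self.seg.fetch(0, len(self.seg))
        return self.shadow[offset:offset + length]


def sequential_scan(reader, size: int, chunk: int) -> bytes:
    """Read `size` bytes in `chunk`-sized requests, as a streaming consumer would."""
    out = bytearray()
    for off in range(0, size, chunk):
        out += reader.read(off, chunk)
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 16  # 4096 bytes
    for cls in (ZeroCopyReader, CopyOnceReader):
        seg = RemoteSegment(data)
        assert sequential_scan(cls(seg), len(seg), 64) == data
        print(cls.__name__, seg.round_trips)
    # ZeroCopyReader 64
    # CopyOnceReader 1
```

Both readers return identical data; they differ only in how many times the consumer waits on the interconnect (64 round trips versus one for a 4096-byte block read in 64-byte requests).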
In networking, especially in forwarding and filtering applications, it is
common to inspect only the packet headers at first and, based on them,
decide whether to access the larger payload.
Both zero-copy and copy once are inefficient for these applications:
copy once in the common case of header-only processing, where it transfers payload that is never read,
and zero-copy in the more expensive case of payload inspection, where every read pays a round trip.
A hybrid solution serves the first bytes of a block, up to a threshold,
directly from the imported segment in zero-copy fashion
and switches to copy once bulk transfer for requests beyond it, i.e., for the payload.
For TCP/IP networks, the threshold is hardcoded to the default payload offset.
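The hybrid policy can be sketched in the same simulated style. The class and constant names are hypothetical, and the threshold of 54 bytes is an assumption: Ethernet, IPv4 and TCP headers without options, which is one plausible reading of "default payload offset". The model also assumes that any request crossing the threshold triggers a bulk copy of the whole block.

```python
from typing import Optional

# Assumed threshold: Ethernet (14) + IPv4 (20) + TCP (20) headers, no options.
DEFAULT_PAYLOAD_OFFSET = 54


class RemoteSegment:
    """Hypothetical imported segment; each fetch() is one interconnect round trip."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.round_trips = 0
        self.bytes_moved = 0

    def __len__(self) -> int:
        return len(self._data)

    def fetch(self, offset: int, length: int) -> bytes:
        chunk = self._data[offset:offset + length]
        self.round_trips += 1
        self.bytes_moved += len(chunk)
        return chunk


class HybridReader:
    """Zero-copy below the threshold, copy once for anything beyond it."""

    def __init__(self, seg: RemoteSegment, threshold: int = DEFAULT_PAYLOAD_OFFSET) -> None:
        self.seg = seg
        self.threshold = threshold
        self.shadow: Optional[bytes] = None

    def read(self, offset: int, length: int) -> bytes:
        if self.shadow is not None:
            # The block was already copied: serve the request locally.
            return self.shadow[offset:offset + length]
        if offset + length <= self.threshold:
            # Header bytes: read directly from the imported segment.
            return self.seg.fetch(offset, length)
        # First request past the headers: bulk-copy the whole block once.
        self.shadow = self.seg.fetch(0, len(self.seg))
        return self.shadow[offset:offset + length]


if __name__ == "__main__":
    pkt = RemoteSegment(bytes(1500))
    r = HybridReader(pkt)
    r.read(0, 14)    # Ethernet header
    r.read(14, 20)   # IPv4 header
    r.read(34, 20)   # TCP header
    print(pkt.round_trips, pkt.bytes_moved)  # 3 54
    r.read(54, 200)   # payload inspection triggers the bulk copy
    r.read(254, 200)  # served from the shadow copy
    print(pkt.round_trips, pkt.bytes_moved)  # 4 1554
```

A header-only filter thus moves 54 bytes instead of the full 1500-byte block, while a consumer that does inspect the payload pays only one extra round trip before all further reads become local.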
Finally, on high-throughput,
high-latency links such as WANs, sequential
throughput benefits from prefetching: speculatively copying more data than is directly
requested.
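Prefetching can be modeled as a read-ahead window. In this hypothetical sketch, a miss fetches the requested range plus `window` extra bytes in a single round trip, and later requests that fall inside the fetched range are served locally; the class names and window size are assumptions for illustration only.

```python
class RemoteSegment:
    """Hypothetical imported segment; each fetch() is one round trip over the link."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.round_trips = 0
        self.bytes_moved = 0

    def __len__(self) -> int:
        return len(self._data)

    def fetch(self, offset: int, length: int) -> bytes:
        chunk = self._data[offset:offset + length]
        self.round_trips += 1
        self.bytes_moved += len(chunk)
        return chunk


class PrefetchReader:
    """On a miss, fetch the requested range plus `window` speculative bytes."""

    def __init__(self, seg: RemoteSegment, window: int) -> None:
        self.seg = seg
        self.window = window
        self.buf = b""
        self.buf_start = 0

    def read(self, offset: int, length: int) -> bytes:
        # Clamp to the segment end so requests at the tail can still hit.
        end = min(offset + length, len(self.seg))
        cached = self.buf_start <= offset and end <= self.buf_start + len(self.buf)
        if not cached:
            # Miss: one round trip brings in this request and the read-ahead window.
            self.buf = self.seg.fetch(offset, length + self.window)
            self.buf_start = offset
        rel = offset - self.buf_start
        return self.buf[rel:rel + length]


if __name__ == "__main__":
    data = bytes(range(256)) * 16  # 4096 bytes
    seg = RemoteSegment(data)
    reader = PrefetchReader(seg, window=960)
    out = b"".join(reader.read(off, 64) for off in range(0, len(data), 64))
    assert out == data
    print(seg.round_trips, seg.bytes_moved)  # 4 4096
```

With `window=0` the reader degenerates to zero-copy behavior (one round trip per request). A larger window hides more latency on a sequential stream, but any prefetched bytes the consumer never reads waste link bandwidth.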
Table 4.3 summarizes the four distributed memory access methods.
All presented methods have in common that they are pull-based, i.e., the consumer initiates replication. This is a consequence of the fact that producers cannot predict consumer access patterns. They do not need to, because dataplane signaling pushes events.