Moshe Malka, M.Sc. Thesis Seminar
Wednesday, 19.11.2014, 12:30
The IOMMU allows the OS to encapsulate I/O devices in
their own virtual memory spaces, thus restricting their DMAs
to specific memory pages. The OS uses the IOMMU to protect
itself against buggy drivers and malicious/errant devices. But
the added protection comes at a cost, degrading the throughput
of I/O-intensive workloads by up to an order of magnitude.
This cost has motivated system designers to trade off some
safety for performance, e.g., by leaving stale information in
the IOTLB for a while so as to amortize costly invalidations.
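To make this trade-off concrete, here is a toy Python model of a conventional IOMMU. It has a per-device page table, an IOTLB that caches translations, and two unmap policies. Strict mode issues an invalidation on every unmap. Deferred mode, with a hypothetical batch size of 4, leaves stale entries in the IOTLB until a batch flush. The class and method names are illustrative assumptions, not a real kernel or hardware interface, and a dict stands in for the multi-level page table hierarchy.

```python
PAGE_SIZE = 4096


class IOMMUFault(Exception):
    """Raised when a device DMAs to an IOVA with no valid translation."""


class ToyIOMMU:
    """Toy model (hypothetical): a per-device page table plus an IOTLB cache.

    strict=True invalidates the IOTLB on every unmap; strict=False defers
    invalidations and flushes them in batches of `batch` (assumed knob).
    """

    def __init__(self, strict=True, batch=4):
        self.page_table = {}    # IOVA page number -> physical page number
        self.iotlb = {}         # cached subset of page_table
        self.strict = strict
        self.batch = batch
        self.pending = []       # IOVA pages awaiting invalidation (deferred mode)
        self.invalidations = 0  # number of (costly) invalidation commands issued

    def map(self, iova_page, phys_page):
        self.page_table[iova_page] = phys_page

    def unmap(self, iova_page):
        del self.page_table[iova_page]
        if self.strict:
            self._invalidate([iova_page])
        else:
            self.pending.append(iova_page)
            if len(self.pending) >= self.batch:
                self._invalidate(self.pending)
                self.pending = []

    def _invalidate(self, pages):
        for p in pages:
            self.iotlb.pop(p, None)
        self.invalidations += 1  # one invalidation command per call

    def translate(self, iova):
        page, offset = divmod(iova, PAGE_SIZE)
        if page in self.iotlb:            # IOTLB hit, possibly stale
            return self.iotlb[page] * PAGE_SIZE + offset
        if page not in self.page_table:   # page walk fails: DMA is blocked
            raise IOMMUFault(hex(iova))
        self.iotlb[page] = self.page_table[page]
        return self.iotlb[page] * PAGE_SIZE + offset


def stale_dma_allowed(strict):
    """True if a device can still reach a buffer after the driver unmapped it."""
    mmu = ToyIOMMU(strict=strict)
    mmu.map(1, 42)
    mmu.translate(1 * PAGE_SIZE)          # device DMA warms the IOTLB
    mmu.unmap(1)                          # driver is done with the buffer
    try:
        mmu.translate(1 * PAGE_SIZE + 8)  # errant device DMAs again
        return True
    except IOMMUFault:
        return False
```

In this sketch, unmapping eight pages costs eight invalidations in strict mode but only two in deferred mode. The price is that deferred mode lets a device keep reaching a buffer it no longer owns until the next flush, which is the safety window the text describes.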
We observe that many devices, such as network and disk
controllers, typically interact with the OS via circular ring
buffers that induce a sequential, completely predictable workload.
We design a ring IOMMU (rIOMMU) that leverages this characteristic
by replacing the virtual memory page table hierarchy with a circular,
flat table. A flat table needs only a single IOTLB entry, so every
new translation implicitly invalidates the previous one, and explicit
invalidations are required only at the end of I/O bursts. Using
standard networking benchmarks, we show that rIOMMU
provides up to 7.56x higher throughput than the baseline IOMMU,
and that it achieves 0.77-1.00x the throughput of a system without IOMMU protection.
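The rIOMMU idea can be sketched with a similar toy Python model, assuming a single ring. The page table hierarchy is replaced by one flat circular array indexed by ring position, and the IOTLB holds exactly one entry. The entry layout (physical address plus length), the use of the ring index as the IOVA, and all class and method names are hypothetical simplifications for illustration, not the design's actual interface.

```python
class RingFault(Exception):
    """Raised when the device uses a ring entry that is not (or no longer) mapped."""


class ToyRIOMMU:
    """Toy rIOMMU (hypothetical): one flat circular table plus ONE IOTLB entry."""

    def __init__(self, size):
        self.table = [None] * size  # flat ring: index -> (phys_addr, length) or None
        self.size = size
        self.tail = 0               # next ring entry the OS will map
        self.iotlb = None           # the single cached translation: (index, phys, length)
        self.invalidations = 0      # explicit IOTLB invalidation commands issued

    def map(self, phys, length):
        idx = self.tail
        assert self.table[idx] is None, "ring full"
        self.table[idx] = (phys, length)
        self.tail = (idx + 1) % self.size
        return idx                  # in this sketch the IOVA is just the ring index

    def unmap(self, idx):
        self.table[idx] = None      # no invalidation command issued here

    def end_of_burst(self):
        self.iotlb = None           # one explicit invalidation per I/O burst
        self.invalidations += 1

    def translate(self, idx, offset):
        if self.iotlb is None or self.iotlb[0] != idx:
            entry = self.table[idx]
            if entry is None:
                raise RingFault(idx)
            # Loading the next entry overwrites the old one: an implicit invalidation.
            self.iotlb = (idx, *entry)
        _, phys, length = self.iotlb
        if not 0 <= offset < length:  # DMA outside the mapped buffer is blocked
            raise RingFault(idx)
        return phys + offset


def run_burst(mmu, n):
    """Map n buffers, let the device DMA to them in ring order, then unmap them."""
    idxs = [mmu.map(0x1000 * (i + 1), 1500) for i in range(n)]
    addrs = [mmu.translate(i, 0) for i in idxs]  # sequential, predictable access
    for i in idxs:
        mmu.unmap(i)
    mmu.end_of_burst()
    return addrs


def dma_faults(mmu, idx, offset=0):
    """True if a device DMA to ring entry `idx` at `offset` would be blocked."""
    try:
        mmu.translate(idx, offset)
        return False
    except RingFault:
        return True
```

Because the device walks the ring in order, each translation replaces the single cached entry, so in this sketch a burst of eight DMAs needs only one explicit invalidation (at `end_of_burst`), whereas a strict per-unmap policy would issue eight. After the burst, any further DMA to an unmapped entry faults.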