fbRegEx
Silicom Inline Regular Expression Search and Flow Steering
Inline RegEx search and dynamic flow steering on a NIC, as a method of accelerating DPI engines up to 40Gbps and 80Gbps
Introduction
The need to enhance and accelerate attack mitigation systems stems from the rise in bandwidth and the increase in traffic that must be monitored in real time. Common attack mitigation systems, whether open source projects such as Snort, Bro, and Suricata or commercial implementations, share the same drawback when handling high-bandwidth traffic, meaning anything above 10Gbps. Like other security applications (a firewall, for instance), they scale in performance as the processing core count climbs. However, somewhere between 10Gbps and 20Gbps there are not enough cores available, even on a high-end server, and functionality has to be expanded to a second server, which is both pricey and complicated.
Several approaches have been taken in an attempt to accelerate attack mitigation systems. One is to move the pattern match or regular expression search stage of deep packet inspection off the standard CPU and onto a sub engine. With its relatively limited data cache, a standard CPU is not the optimal machine for text matching, formal regular language expressions, and representations of grammar structures. These kinds of tasks are better carried out on a purpose-built pipeline, on a sub engine of an ASIC or FPGA.
Figure 1: Lookaside vs. fbRegEx inline acceleration
The sub engine approach, however, should also be carefully considered. With a lookaside sub engine (see Figure 1), the main processor issues workload requests and then waits for a response. For regular expression matching, this approach has proved to be only partially effective. Take, for example, a Snort machine inspecting streaming traffic. Snort instances, if so programmed, would hand search and match jobs over to a sub engine and then sit idle while waiting for the replies. Evidently the cost of offload is considerable [1].
A different approach to acceleration is to handle the data plane in a way that benefits the performance of the attack mitigation system as a whole, and exactly how this is done deserves careful consideration. Continuing with the Snort example, Snort’s data plane is very simple and consists of the Snort DAQ (Data AcQuisition) component, which simply streams every packet it receives straight into the Snort instances. What if the Snort DAQ could tell in advance which traffic is likely to interest the Snort instance, and throw away (or just count) the rest? This approach, if properly implemented, lets Snort focus on the traffic that matters, and thus could have a tremendous effect on overall performance.
[1] Often such an engine resides down a PCIe complex, so data traverses the PCIe bus several times. A software system that uses a lookaside engine is better off working with it asynchronously, as if it were a NIC.
Solution Overview
Taking the above challenges into account, Silicom came up with a solution that combines the two most effective acceleration strategies: it implements inline regular expression search offload, along with data plane improvement, on a NIC. This document describes how regular expression searches are applied to streaming data under inspection, in the NIC itself, before the data hits the CPU. The host-side cost of regular expression search is therefore reduced to virtually zero, leaving only the net gain. The regular expression search engine is based on the Helios RXPF core delivered by TitanIC and integrated by Silicom.
The core of the pipeline is, naturally, the Helios RXPF regular expression search engine, configured as an inline engine. A single instance of this engine is capable of line-rate search at up to 40Gbps and, depending on the complexity of the expressions, thousands or even tens of thousands of expressions can be compiled in. The engine includes numerous optimizations, such as a prefix match that typically enhances performance for layer 7 searches.
Built for complete interoperability with the inline regular expression search engine, a flow steering mechanism is implemented as part of the NIC processing. Once a packet payload matches an expression on the NIC itself, the flow table is updated and subsequent packets belonging to the same flow are steered accordingly. By default, matched flows are steered up to the host, where an attack mitigation system scrutinizes them more deeply, while flows that are not matched on the NIC, i.e., not of interest to the system, never reach the host at all. This saves an enormous number of CPU cycles, which can be dedicated to the application's core business logic.
Regular Expression Packet Evaluation Flow
Silicom’s fbRegEx implements a well-defined packet processing pipeline. All network ingress packets are pushed to a flow steering engine and then to a regular expression search engine, whose results are fed back to the flow steering engine. Based on the output of these engines, a decision is made whether to forward the packet. Once a flow is marked for forwarding, the rest of that flow’s packets are forwarded.
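The evaluation flow above can be modeled in a few lines of host-side Python. This is only a sketch for reasoning about the behavior, not Silicom's firmware or API: the FlowKey 5-tuple, the InlinePipeline class, and the example rule are hypothetical, Python's re module stands in for the hardware search engine, and the sketch assumes that the packet which triggers a match is itself forwarded.

```python
import re
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow identifier (classic 5-tuple); not taken from the product."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


class InlinePipeline:
    """Software model of the flow steering + regular expression pipeline."""

    def __init__(self, patterns):
        # rule id -> compiled expression; stands in for the compiled rule set
        self.rules = {rule_id: re.compile(p) for rule_id, p in patterns.items()}
        # flow table: flow key -> id of the rule that first matched the flow
        self.flow_table = {}

    def process(self, key, payload):
        """Return True if the packet is forwarded to the host, else False."""
        # Stage 1: flow steering lookup. A flow already marked is forwarded.
        if key in self.flow_table:
            return True
        # Stage 2: regular expression search over the payload.
        for rule_id, rx in self.rules.items():
            if rx.search(payload):
                # Feedback: the search result updates the flow table.
                self.flow_table[key] = rule_id
                return True  # assumption: the matching packet is forwarded too
        return False  # unmatched traffic never reaches the host


pipe = InlinePipeline({7: rb"/etc/passwd"})
flow = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
decisions = [
    pipe.process(flow, b"GET /index.html"),      # no match yet: dropped
    pipe.process(flow, b"GET /../etc/passwd"),   # match: flow is marked
    pipe.process(flow, b"unremarkable payload"), # marked flow: forwarded
]
print(decisions)
```

Note that in this model the third packet is forwarded without being searched again, which mirrors the statement that the rest of a marked flow's packets are forwarded.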
Software Host Interface
Silicom delivers the fbRegEx solution along with a complete software suite for operating it. Starting with a full set of data path host drivers and moving on to runtime utilities, the package from Silicom includes:
(1) Host driver, either Linux kernel driver or DPDK;
(2) Optional DPDK-based Snort DAQ;
(3) Host utilities, including:
a. Host interface to load a regular expression set onto the adapter;
b. Adapter firmware load interface.
(4) Walkthrough guidance and readme files.
The package enables quick and easy bring-up and smooth system integration.
Host Driver Control Messaging
The ingress data path, which carries packets streaming in from the network through the fbRegEx card and up to the host, consists of both the actual packet payload and control information such as the matched rule, the first matched packet, etc. That way the driver, if so programmed, can pass this information on to the application running on the host.
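As an illustration of how a host application might consume such control information, the sketch below pairs each received payload with a small control record. The RxControl fields and the deliver function are hypothetical names chosen for this example; the actual driver interface and the layout of the control data are not described in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RxControl:
    """Hypothetical per-packet control information (field names are assumed)."""
    matched_rule: Optional[int]  # id of the matched expression, None if not reported
    first_match: bool            # True on the packet that first matched in its flow


def deliver(payload: bytes, ctrl: RxControl,
            on_match: Callable[[int, bytes], None]) -> bytes:
    """Hand a received packet to the application, signaling a new match first."""
    if ctrl.first_match and ctrl.matched_rule is not None:
        # Notify the application once per flow, on the first matched packet.
        on_match(ctrl.matched_rule, payload)
    return payload


alerts = []
deliver(b"GET /../etc/passwd", RxControl(matched_rule=7, first_match=True),
        lambda rule, pkt: alerts.append((rule, pkt)))
deliver(b"follow-up packet", RxControl(matched_rule=7, first_match=False),
        lambda rule, pkt: alerts.append((rule, pkt)))
print(alerts)
```

Signaling only on the first matched packet keeps the callback rate proportional to the number of matched flows rather than the number of packets.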
PoC Ready, Production Ready
Silicom fbRegEx is a complete solution that includes a PCIe network adapter, delivered with the software host interface package. The choice of pipeline bandwidths and sizes is detailed in Table 1.
# | Num. of GbE links | Type of GbE links | Regular Expression Search Pipeline | Silicom Model |
1 | 8 | 10GbE | 40Gbps | fb8XG@V7 |
2 | 2 | 40GbE | | |
3 | 8 | 10GbE/25GbE | 80Gbps | fb4CGg3@VU/VU+ |
4 | 2 | 40GbE/100GbE | | |
5 | 16 | 10GbE/25GbE | | |
6 | 4 | 40GbE/100GbE | | |
Table 1: Choice of fbRegEx solution sizes
The software host interface supports multiple ingress queues, so that multiple application instances can run on top of the solution and spread the workload evenly across CPU cores. The fbRegEx framework exposes two types of statistics: receive DMA statistics and match statistics from the pipeline itself. The fbRegEx solution is a production-grade solution, ready for application integration. Silicom demonstrates fbRegEx together with Snort v2.9.8 using multiple receive queues, which allow multiple Snort instances to run side by side.
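One common way to spread flows across several receive queues is to hash the flow identifier, so that every packet of a flow, in both directions, reaches the same application instance. The sketch below shows the idea. How the adapter actually assigns flows to queues is not described here, so the select_queue function, the CRC32 hash, and the symmetric ordering of endpoints are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def select_queue(src, dst, proto, num_queues):
    """Map a flow to a receive queue; src and dst are (ip, port) tuples."""
    # Order the two endpoints so both directions of a flow hash identically.
    a, b = sorted([src, dst])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key) % num_queues


# Distribute 1000 synthetic client flows over 4 queues (one per Snort instance).
load = Counter(
    select_queue((f"10.0.{i // 250}.{i % 250}", 40000 + i), ("192.168.1.1", 80), "tcp", 4)
    for i in range(1000)
)
print(sorted(load.items()))
```

Keeping both directions of a flow on one queue matters for stateful inspection, since each instance then sees complete conversations without sharing state across cores.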