Software & High Performance: Why They Don’t Have to Be Mutually Exclusive

F5 Ecosystem | March 03, 2020

Tom Atkins, Senior Product Marketing Manager | F5

At this point the benefits of migrating away from hardware-dominated environments to cloud and software-defined architectures are well known—increased scalability, operational agility and economic flexibility, to name just a few. But there is also the common misconception that in order to realize these gains, organizations are forced to make a sacrifice regarding the performance of their apps. After all, how can shared, virtualized infrastructure come close to delivering the same performance as custom, dedicated hardware?

And while many new, stateless, cloud-native apps are designed to scale horizontally as demand for them grows, there are still thousands of monolithic workloads with stateful requirements that restrict them to scaling vertically to satisfy increased demand. For these apps, greater software performance (and scalability) is crucial as many will not—or simply cannot—be rearchitected as they move to the cloud.

Enter BIG-IP Virtual Edition (VE)

Since the inception of BIG-IP VE a decade ago, one of the questions customers ask most often goes something like, “What level of scaling and traffic processing performance are we likely to experience with VE compared to your hardware?” All those years ago, the gap between the two was sizeable, as early VEs were only really intended to replace app delivery hardware that fronted specific low-traffic apps. Back then, VE was only capable of processing around 1Gbps of traffic while handling a fraction of the L4/L7 requests and connections possible in hardware.

Fast forward to the present day, however, and given the right conditions VEs can now process over 100Gbps of application traffic and match all but the highest performing appliances across other traffic processing metrics. In this article, we’ll take a look at some recent VE enhancements and supported acceleration technologies that have helped VE all but close the performance gap with its physical counterparts, while providing a sneak peek into the next VE optimization project we’re working on with SmartNICs.

Custom VE Poll-Mode Drivers that Optimize Single-Root I/O Virtualization (SR-IOV)

For those unfamiliar with the fundamentals of virtualization, the core concept entails a physical server hosting a software layer (OS/hypervisor) that emulates the functions of the underlying hardware and enables multiple, distinct virtual machines (BIG-IP VE, for example), potentially with different operating systems, to run on top of it. While this is great for optimizing the resource utilization of the physical server and enabling app mobility, the additional hypervisor layer and its associated virtual switch increase latency and hamper performance, because every request must pass through them with all the associated copies and interrupts. SR-IOV, however, allows VE to interact directly with the network interface card (NIC) on the physical server, bypassing the virtual switch layer, reducing latency, and improving performance.

While SR-IOV is a fairly common technology these days (with most NIC vendors supporting it), the guest drivers included in the OS kernel or provided by the NIC vendors are generic and not optimized specifically for BIG-IP. This is why F5 has also invested heavily in developing VE poll-mode drivers for a range of leading NIC adapters, helping accelerate VE packet processing when using SR-IOV. It’s this approach that has allowed VEs to reach L4 throughputs of up to 20Gbps on AWS (using AWS Elastic Network Adapter in Gen5 instances) and up to 10Gbps on Azure (using Azure Accelerated Networking), and to exceed 85Gbps in private cloud environments (using Mellanox CX5 100G NIC).

Additionally, it’s possible to achieve over 100Gbps by performing link aggregation, which essentially combines multiple distinct NIC ports to create a single high-throughput datapath. You can learn how a single VE achieved 108Gbps with this approach, using three 40G Intel NICs, in this DevCentral article.
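As a rough illustration of the link aggregation arithmetic above, the sketch below sums the nominal line rates of the aggregated ports and compares the total with the measured figure. The function names are illustrative only, and the model assumes nominal port rates simply add up, which ignores traffic-hashing imbalance and protocol overhead.

```python
def aggregate_capacity_gbps(port_rates_gbps):
    """Nominal capacity of an aggregated link: the sum of its member port rates."""
    return sum(port_rates_gbps)


def utilization(measured_gbps, port_rates_gbps):
    """Fraction of the nominal aggregate capacity that a measured throughput represents."""
    return measured_gbps / aggregate_capacity_gbps(port_rates_gbps)


# Figures from the text: three 40G ports, 108Gbps measured on a single VE.
ports = [40, 40, 40]
print(aggregate_capacity_gbps(ports))      # 120
print(f"{utilization(108, ports):.0%}")    # 90%
```

Under these assumptions, the 108Gbps result corresponds to roughly 90% of the 120Gbps nominal capacity of the three ports.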

Offloading Cryptographic Processing with Intel QuickAssist Technology (QAT)

Well over half of web traffic is now encrypted, and with the explosive growth of IoT devices and the global movement to 5G, the amount of data that needs to be encrypted is set to increase sharply. Operating within a full-proxy architecture between clients and servers, BIG-IP VE decrypts all encrypted traffic, allowing it to inspect, analyze, and block malicious-looking payloads before re-encrypting the data and routing it to its intended destination. And while VE has been optimized to deliver high-performance software-based encryption, this process can still be a burden on CPU resources, reducing the number of processing cycles available for other L7 tasks, iRules, or policy enforcement. To mitigate this effect for workloads with significant cryptographic processing requirements, VE offloads encryption to Intel QAT, a hardware accelerator purpose-built for cryptographic processing and compression. In doing so, VE is able to offload these CPU-draining tasks, free up compute cycles, and increase overall performance. Evidence of this can be found in this recent study on the impact of using QAT with VE, which showed:
- CPU utilization reduction of up to 45%
- Bulk throughput increase of up to 200%
- Transactions per second (TPS) increase of up to 500%
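
Percentage increases like these are easy to misread, so the short sketch below converts them into multipliers relative to a software-only baseline. The helper names are illustrative, and the inputs are the "up to" ceilings quoted above rather than typical results.

```python
def multiplier_from_increase(percent_increase):
    """An increase of N% means the new value is (1 + N/100) times the baseline."""
    return 1 + percent_increase / 100


def fraction_after_reduction(percent_reduction):
    """A reduction of N% leaves (1 - N/100) of the baseline."""
    return 1 - percent_reduction / 100


# "Up to" figures quoted in the study summary above.
print(f"Bulk throughput: up to {multiplier_from_increase(200):.1f}x baseline")
print(f"TPS: up to {multiplier_from_increase(500):.1f}x baseline")
print(f"CPU utilization: as low as {fraction_after_reduction(45):.2f} of baseline")
```

In other words, a 200% increase corresponds to up to three times the baseline throughput, and a 500% increase to up to six times the baseline TPS.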

Introduction of Unthrottled High Performance VEs

Prior to the availability of F5’s High Performance VEs, all VEs used a throughput rate-limited license model in which licenses were aligned to set throughput levels and vCPU counts (200Mbps and 2vCPUs, for example). For smaller apps, a 25Mbps instance would likely suffice; for apps in high demand, the largest 10Gbps instance might be more appropriate.

But what about apps with greater demands? Or those with unpredictable requirements? As F5 moved to support higher-bandwidth NICs and exceed 10Gbps, we introduced High Performance VEs, which are licensed according to the number of vCPUs they are permitted to use rather than a throughput cap. The maximum performance attainable with a High Performance VE therefore depends on the number of vCPUs assigned to it, from 8vCPUs to 24vCPUs in increments of 4vCPUs. In addition to allowing VE to squeeze every last ‘packet per second’ out of each CPU, this approach is also more supportive of CPU-intensive use cases, including DDoS mitigation and SSL/TLS encryption.
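
To make the vCPU range concrete, the sketch below enumerates the sizes described above (8 to 24 vCPUs in steps of 4) and picks the smallest one that covers a given requirement. The helper is hypothetical and not an F5 tool; the number of vCPUs a workload actually needs is an input you would have to determine from your own sizing exercise.

```python
# vCPU sizes described in the text: 8 to 24 in increments of 4.
HP_VE_VCPU_OPTIONS = list(range(8, 25, 4))  # [8, 12, 16, 20, 24]


def smallest_size_for(required_vcpus):
    """Return the smallest listed vCPU size that covers the requirement.

    Returns None when the requirement exceeds the largest listed size.
    """
    for size in HP_VE_VCPU_OPTIONS:
        if size >= required_vcpus:
            return size
    return None


print(HP_VE_VCPU_OPTIONS)      # [8, 12, 16, 20, 24]
print(smallest_size_for(10))   # 12
print(smallest_size_for(30))   # None
```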

Learn more about the capabilities of High Performance VE in the BIG-IP VE datasheet.

Future – Hyperscale DDoS Mitigation using a SmartNIC from Intel

Distributed denial-of-service (DDoS) attacks continue to be one of the most effective and widely used forms of cyberattack, with everybody from frustrated online gamers to nation-state cyber teams using them to take targeted apps and services offline. Potentially using thousands of disparate connections from machines located across the globe, these attacks can quickly overwhelm security solutions, especially those with insufficient capacity to mitigate them. And with the global transition to 5G, DDoS attacks are only going to increase in size, severity, and complexity as it becomes easier to form massive, resource-crippling botnets with fewer devices.

Fortunately, however, you will soon have the ability to offload specific CPU-intensive functions, including DDoS mitigation, from BIG-IP VE onto Intel’s N3000 Programmable Acceleration Card, a SmartNIC with an embedded field-programmable gate array (FPGA). When programmed by F5, drawing on our more than 10 years of experience using FPGAs, this SmartNIC is capable of greatly increasing VE’s DDoS mitigation capacity. In fact, early testing by F5 has shown that this combined solution will be able to withstand a DDoS attack 70 times greater in magnitude than VE using CPU only, helping to keep your apps and network secure.

This integration is expected to be generally available later this year, and additional information is available in this solution brief.