Welcome!

Open Source Cloud Authors: Elizabeth White, Yeshim Deniz, Pat Romanski, Liz McMillan, Zakia Bouachraoui

Related Topics: Microsoft Cloud, Open Source Cloud, Machine Learning , Silverlight, Release Management , CRM

Microsoft Cloud: Blog Feed Post

Performance Tuning Windows 2012: Network Subsystem | Part 1

NDIS, the protocol stack, and user mode applications

Offload Capabilities
Offloading tasks can reduce CPU usage on the server, which improves the overall system performance. The network stack in Windows 2012 (and prior versions of the OS) can offload one or more tasks to a network adapter permitted you have an adapter with offload capabilities. The table below lists the details about offload capabilities:

Receive-side scaling (RSS) is a network driver technology that enables the efficient distribution of network receive processing across multiple CPUs in multiprocessor systems.

Checksum calculation

The network stack can offload the calculation and validation of Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) checksums on send and receive code paths. It can also offload the calculation and validation of IPv4 and IPv6 checksums on send and receive code paths.

IP security authentication and encryption

The TCP/IP transport layer can offload the calculation and validation of encrypted checksums for authentication headers and Encapsulating Security Payloads (ESPs). The TCP/IP transport layer can also offload the encryption and decryption of ESPs.

Segmentation of large TCP packets

The TCP/IP transport layer supports Large Send Offload v2 (LSOv2). With LSOv2, the TCP/IP transport layer can offload the segmentation of large TCP packets to the hardware.

Receive Segment Coalescing (RSC)

RSC is the ability to group packets together to minimize the header processing that is necessary for the host to perform. A maximum of 64 KB of received payload can be coalesced into a single larger packet for processing.

Receive-Side Scaling (RSS)

Receive-Side Scaling (RSS)
Windows Server 2012 (as well as Windows Server 2008 R2, and Windows Server 2008) supports Receive Side Scaling (RSS). RSS directs network processing to up to one logical processor per core. For example, given a server with Intel Hyper-Threading and 4 cores (8 logical processors), RSS will use no more than 4 logical processors for network processing.
RSS distributes incoming network I/O packets among logical processors so that packets that belong to the same TCP connection are processed on the same logical processor. RSS also load balances UDP unicast and multicast traffic from Windows Server 2012, and it routes related flows (as determined by hashing the source and destination addresses) to the same logical processor, preserving the order of related arrivals. Windows Server 2012 provides the following methods to tune RSS behavior:

· Windows PowerShell: Get-NetAdapterRSS, Set-NetAdapterRSS, Enable-NetAdapterRss, Disable-NetAdapterRss. These cmdlets allow you to view and modify RSS parameters.

· RSS Profiles: Used to determine which logical processors are assigned to which network adapter. Possible profiles are:

o Closest. Logical processor numbers near the network adapter’s base RSS processor are preferred. Windows may rebalance logical processors dynamically based on load.

o ClosestStatic. Logical processor numbers near the network adapter’s base RSS processor are preferred. Windows will not rebalance logical processors dynamically based on load.

o NUMA. Logical processor numbers will tend to be selected on different NUMA nodes to distribute the load. Windows may rebalance logical processors dynamically based on load.

o NUMAStatic. This is the default profile. Logical processor numbers will tend to be selected on different NUMA nodes to distribute the load. Windows will not rebalance logical processors dynamically based on load.

o Conservative: RSS uses as few processors as possible to sustain the load. This option helps reduce the number of interrupts.

You can use the set-netadapterRSS cmdlet to choose how many logical processors can be used for RSS on a per-network adapter basis, the starting offset for the range of logical processors, and which node the network adapter allocates memory from:

· MaxProcessors: Sets the maximum number of RSS processors to be used, ensuring application traffic is bound to a maximum number of processors on an interface.

set-netadapterRSS –Name “Ethernet” –MaxProcessors <value>

· BaseProcessorGroup: Sets the base processor group of a NUMA node, affecting the processor array used by RSS.

set-netadapterRSS –Name “Ethernet” –BaseProcessorGroup <value>

· MaxProcessorGroup: Sets the Max processor group of a NUMA node, affecting the processor array used by RSS.

set-netadapterRSS –Name “Ethernet” –MaxProcessorGroup <value>

· BaseProcessorNumber: Sets the base processor number of a NUMA node, allowing partitioning processors across network adapters and specifying the first logical processor in the range of RSS processors that is assigned to each adapter.

set-netadapterRSS –Name “Ethernet” –BaseProcessorNumber <Byte Value>

· NumaNode: The NUMA node that each network adapter can allocate memory from.

set-netadapterRSS –Name “Ethernet” –NumaNodeID <value>

· NumberofReceiveQueues: If your logical processors seem to be underutilized for receive traffic, you can try increasing the number of RSS queues from the default of 2 to the maximum number supported.

set-netadapterRSS –Name “Ethernet” –NumberOfReceiveQueues <value>

RSS does not provide any interaction with virtual machines, instead you can configure VMQ. RSS can be enabled for virtual machines in the case of SR-IOV because the virtual function driver supports RSS capability. In this case, the guest and the host will have the benefit of RSS. The host however, does not get RSS capability because the virtual switch is enabled with SR-IOV.

Receive-Segment Coalescing (RSC)
Receive Segment Coalescing can improve performance by reducing the number of IP headers that are processed for a given amount of received data.  You should use RSC to tune performance of received data by grouping (or coalescing) smaller packets into larger units. This can reduce latency and increase throughput for received heavy workloads. On network adapters supporting RSC, make sure that it is enabled, unless you have low latency, low throughput networking needs that benefit from RSC being turned off.

In Windows Server 2012 you can use the following PowerShell cmdlets to configure RSC capable adapters: Enable-NetAdapterRsc, Disable-NetRsc, Get-NetAdapterAdvancedProperty, and Set-NetAdapterAdvancedProperty. RSC can be examined using the cmdlets Get-NetAdapterRSC and Get-NetAdapterStatistics. The Get cmdlet shows if RSC is enabled and if TCP enables RSC to be in operational state. In the example above, IPv4 RSC is enabled. To understand failures, you can view the coalesced bytes or exceptions caused by entering the following command:

PS C:\Users\Administrator> $x = Get-NetAdapterStatistics “myAdapter”

PS C:\Users\Administrator> $x.rscstatistics

CoalescedBytes : 0

CoalescedPackets : 0

CoalescingEvents : 0

CoalescingExceptions : 0

RSC and virtualization
If the host adapter is not bound to the virtual switch, RSC is supported on the physical host. If the adapter is bound to the virtual switch, Windows 2012 will disable RSC on the physical host.
RSC can be enabled for a virtual machine when SR-IOV is enabled. In this case, virtual functions will support RSC capability; hence, virtual machines will also get the benefit of RSC.

Network Adapter Resources

A few network adapters actively manage their resources to achieve optimum performance. Several network adapters let the administrator manually configure resources by using the Advanced Networking tab for the adapter. For such adapters, you can set the values of a number of parameters including the number of receive buffers and send buffers.  In Windows Server 2012, you can configure advanced network settings using the following PowerShell cmdlets:

  • Get-NetAdapterAdvancedProperty
  • SetNetAdapterAdvancedProperty
  • Enable-NetAdapter
  • Enable-NetAdapterBinding
  • Enable-NetAdapterChecksumOffload
  • Enable-NetAdapterLso
  • Enable-NetAdapterIPSecOffload
  • Enable-NetAdapterPowerManagemetn
  • Enable-NetAdapterQos
  • Enable-NetAdapterRDMA
  • Enable-NetAdapter
  • Enable-NetAdapterSriov

Message-Signaled Interrupts (MSI/MSI-X)
Network adapters that support MSI/MSI-X can target specific logical processors. If your network adapter also support RSS, then a logical processor can be dedicated to servicing interrupts and deferred procedure calls (DPCs) for a given TCP connection. This will greatly improve performance, by preserving the TCP cache.

Interrupt Moderation
Lastly, we’ll discuss interrupt moderation. Some network adapters expose different interrupt moderation levels, or buffer coalescing parameters, or both. You definitely should consider buffer coalescing when the network adapter does not perform interrupt moderation. Interrupt moderation will reduce CPU utilization because it minimizes the per-buffer processing cost, but you should consider that interrupt-moderation  and buffer coalescing can have a negative impact on latency-sensitive situations. The table below lists the suggested adapter features for various server roles.

Role

Checksum offload

Large Send Offload (LSO)

Receive-side scaling (RSS)

Receive Segment Coalescing (RSC)

File server

X

X

X

X

Web server

X

X

X

Mail server (short-lived connections)

X

X

Database server

X

X

X

FTP server

X

X

X

Media server

X

X

X

These settings serve as guidelines only . Depending on the workload, your network adapter(s), and your specific situation, your experience can be different. In our next article we’ll go deeper into tuning the network adapter and utilizing some of the features we discussed.

Read the original blog entry...

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,

IoT & Smart Cities Stories
Digital Transformation and Disruption, Amazon Style - What You Can Learn. Chris Kocher is a co-founder of Grey Heron, a management and strategic marketing consulting firm. He has 25+ years in both strategic and hands-on operating experience helping executives and investors build revenues and shareholder value. He has consulted with over 130 companies on innovating with new business models, product strategies and monetization. Chris has held management positions at HP and Symantec in addition to ...
Dynatrace is an application performance management software company with products for the information technology departments and digital business owners of medium and large businesses. Building the Future of Monitoring with Artificial Intelligence. Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more busine...
The challenges of aggregating data from consumer-oriented devices, such as wearable technologies and smart thermostats, are fairly well-understood. However, there are a new set of challenges for IoT devices that generate megabytes or gigabytes of data per second. Certainly, the infrastructure will have to change, as those volumes of data will likely overwhelm the available bandwidth for aggregating the data into a central repository. Ochandarena discusses a whole new way to think about your next...
CloudEXPO | DevOpsSUMMIT | DXWorldEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
All in Mobile is a place where we continually maximize their impact by fostering understanding, empathy, insights, creativity and joy. They believe that a truly useful and desirable mobile app doesn't need the brightest idea or the most advanced technology. A great product begins with understanding people. It's easy to think that customers will love your app, but can you justify it? They make sure your final app is something that users truly want and need. The only way to do this is by ...
DXWorldEXPO LLC announced today that Big Data Federation to Exhibit at the 22nd International CloudEXPO, colocated with DevOpsSUMMIT and DXWorldEXPO, November 12-13, 2018 in New York City. Big Data Federation, Inc. develops and applies artificial intelligence to predict financial and economic events that matter. The company uncovers patterns and precise drivers of performance and outcomes with the aid of machine-learning algorithms, big data, and fundamental analysis. Their products are deployed...
Cell networks have the advantage of long-range communications, reaching an estimated 90% of the world. But cell networks such as 2G, 3G and LTE consume lots of power and were designed for connecting people. They are not optimized for low- or battery-powered devices or for IoT applications with infrequently transmitted data. Cell IoT modules that support narrow-band IoT and 4G cell networks will enable cell connectivity, device management, and app enablement for low-power wide-area network IoT. B...
The hierarchical architecture that distributes "compute" within the network specially at the edge can enable new services by harnessing emerging technologies. But Edge-Compute comes at increased cost that needs to be managed and potentially augmented by creative architecture solutions as there will always a catching-up with the capacity demands. Processing power in smartphones has enhanced YoY and there is increasingly spare compute capacity that can be potentially pooled. Uber has successfully ...
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
When talking IoT we often focus on the devices, the sensors, the hardware itself. The new smart appliances, the new smart or self-driving cars (which are amalgamations of many ‘things'). When we are looking at the world of IoT, we should take a step back, look at the big picture. What value are these devices providing. IoT is not about the devices, its about the data consumed and generated. The devices are tools, mechanisms, conduits. This paper discusses the considerations when dealing with the...