Linux networking stack from the ground up, part 2


Part 1 | Part 2 | Part 3 | Part 4 | Part 5

Overview

This post picks up where Part 1 left off, starting by explaining what ethtool is, how device drivers register ethtool code, how drivers enable NAPI, and how drivers enable interrupts.

ethtool configuration

ethtool is a command line program that you can use to get and set driver information. You can install it on Ubuntu by running apt-get install ethtool.

Some ethtool parameters of interest are described later in this document.

The ethtool program talks to device drivers using the ioctl system call. Device drivers register a series of functions that implement the ethtool operations, and the kernel provides the glue.

When an ioctl call is made from ethtool, the kernel finds the ethtool structure registered by the appropriate driver and executes the registered functions.

e1000e

The ethtool functions are installed in the e1000e driver in the PCI probe function (drivers/net/ethernet/intel/e1000e/netdev.c:6627):

 e1000e_set_ethtool_ops(netdev);

which registers a structure of function pointers, one for each of the ethtool functions supported by e1000e (accessing statistics, changing the size of the ring buffers, etc.), from drivers/net/ethernet/intel/e1000e/ethtool.c:2316.

igb

The ethtool functions are installed in the igb driver in the PCI probe function (drivers/net/ethernet/intel/igb/igb_main.c:2091):

 igb_set_ethtool_ops(netdev);

From drivers/net/ethernet/intel/igb/igb_ethtool.c:105.

ixgbe

The ethtool functions are installed in the ixgbe driver in the PCI probe function (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7883):

 ixgbe_set_ethtool_ops(netdev);

From drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:7686.

tg3

The ethtool functions are installed in the tg3 driver in the PCI probe function (drivers/net/ethernet/broadcom/tg3.c:17456):

 dev->ethtool_ops = &tg3_ethtool_ops;

be2net

The ethtool functions are installed in the be2net driver in a function called from the PCI probe function (drivers/net/ethernet/emulex/benet/be_main.c:4094):

 SET_ETHTOOL_OPS(netdev, &be_ethtool_ops);

bnx2

The ethtool functions are installed in the bnx2 driver in a function called from the PCI probe function (drivers/net/ethernet/broadcom/bnx2.c:8539):

 dev->ethtool_ops = &bnx2_ethtool_ops;

NAPI poll

Before NAPI existed, NICs generated an interrupt for each packet received, indicating that data was available for processing by the kernel.

NAPI changes this by allowing a device driver to register a poll function that the NAPI subsystem will call to harvest packets. This method of gathering packets reduces overhead compared to the old method, because many packets can be consumed at once instead of processing only one packet per interrupt.

The device driver implements a poll function and registers it with NAPI using netif_napi_add. When registering a NAPI poll function with netif_napi_add, the driver also specifies the "weight". Most drivers hardcode a value of 64. This value and its significance will be described in more detail below.

Generally, drivers register their NAPI poll functions as part of driver initialization. In the drivers examined in this post, the poll function is registered in the PCI probe function itself or in a helper function called from there.

e1000e

The e1000e driver registers its NAPI poll function in the e1000_probe function (drivers/net/ethernet/intel/e1000e/netdev.c:6629):

 netif_napi_add(netdev, &adapter->napi, e1000e_poll, 64);

e1000e registers a single NAPI poll function because this device supports only a single receive queue. The other drivers examined in this post support multiple receive queues and call this function multiple times to register multiple NAPI poll functions.

igb

The igb driver registers its NAPI poll function in igb_alloc_q_vector (drivers/net/ethernet/intel/igb/igb_main.c:1180):

 netif_napi_add(adapter->netdev, &q_vector->napi, igb_poll, 64);

This function is called from igb_alloc_q_vectors. igb_alloc_q_vector is called multiple times to set up each of the RX and TX queues. igb_alloc_q_vectors is called from igb_init_interrupt_scheme, which is called from several places, one of which is igb_probe.

ixgbe

The ixgbe driver registers its NAPI poll function in ixgbe_alloc_q_vector (drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c:813):

 netif_napi_add(adapter->netdev, &q_vector->napi, ixgbe_poll, 64);

Similar to igb, this function is called from ixgbe_alloc_q_vectors, and it is called multiple times to set up each of the RX and TX queues. ixgbe_alloc_q_vectors is called from ixgbe_init_interrupt_scheme, which is called from several places, one of which is ixgbe_probe.

tg3

The tg3 driver registers its NAPI poll function in tg3_napi_init (drivers/net/ethernet/broadcom/tg3.c:7366):

 netif_napi_add(tp->dev, &tp->napi[i].napi, tg3_poll_msix, 64);

This is called in a loop, registering a NAPI poll function for each of the RX and TX queues. It is called from tg3_start, which is called from tg3_open. Unlike the other drivers, tg3 registers its NAPI poll functions in ndo_open and not during the PCI probe.

be2net

The be2net driver registers its NAPI poll function in be_evt_queues_create (drivers/net/ethernet/emulex/benet/be_main.c:2053):

 netif_napi_add(adapter->netdev, &eqo->napi, be_poll, BE_NAPI_WEIGHT);

This is called in a loop, registering a NAPI poll function for each of the RX and TX queues. It is called from be_setup_queues, which is called from be_setup, which is called from be_probe.

bnx2

The bnx2 driver registers its NAPI poll function in bnx2_init_napi (drivers/net/ethernet/broadcom/bnx2.c:6322):

 netif_napi_add(bp->dev, &bp->bnx2_napi[i].napi, poll, 64);

This is called in a loop, registering a NAPI poll function for each of the RX and TX queues. Like tg3, bnx2 registers its NAPI poll functions in its ndo_open function, bnx2_open, and not during the PCI probe.

Interrupt number

The interrupt number is obtained from the struct pci_dev structure and stored on the net_device structure:

 netdev->irq = pdev->irq;

Later in device initialization, interrupt handlers will be registered for this IRQ number.

driver initialization

When a network device is brought up (for example, with ifconfig eth0 up), the driver's open function is called. A pointer to this function is installed in a net_device_ops structure, in a field called ndo_open. In e1000e, this function is called e1000_open (drivers/net/ethernet/intel/e1000e/netdev.c:4241).

The open function will usually do things like:

  1. Allocate RX and TX queue memory
  2. Enable NAPI
  3. Register an interrupt handler
  4. Enable hardware interrupts

and more.

RX and TX queue memory allocation

For example, the e1000e driver (found in drivers/net/ethernet/intel/e1000e/) does this in netdev.c around line 4279:

 /* allocate transmit descriptors */
 err = e1000e_setup_tx_resources(adapter->tx_ring);
 if (err)
         goto err_setup_tx;

 /* allocate receive descriptors */
 err = e1000e_setup_rx_resources(adapter->rx_ring);
 if (err)
         goto err_setup_rx;

The e1000e_setup_rx_resources and e1000e_setup_tx_resources functions allocate the receive and transmit queues and initialize the associated data structures. It is important to note that these queues are read and written directly by the NIC via DMA. In other words: when data arrives from the network, it is written directly into the receive queue by the network card via DMA. The queue size defaults to E1000_DEFAULT_RXD (256) and the maximum is E1000_MAX_RXD (4096). These values are driver specific.

If data arrives faster than it can be processed, it will fill the queue. Once the queue is full, any additional data that arrives will be dropped.

You can determine whether drops are occurring, and increase the queue size, using the command line tool ethtool. ethtool communicates with the device driver via the ioctl system call.

Most drivers have a file named ethtool.c or *_ethtool.c implementing this interface. Not all drivers implement every possible ethtool method, so you should check the driver code and the ethtool output to determine whether what you want is supported by your driver.

You can get statistics from ethtool using the -S flag, for example:

 ethtool -S eth0

The names of the statistics differ between drivers, so you should read the output carefully and grep for things like "drop", "miss", and "error".

As far as e1000e is concerned:

  • The rx_no_buffer_count statistic (also known as RNBC) indicates that there was nowhere to DMA the packet. Increasing the size of the rx ring (explained below) can help reduce the number of rx_no_buffer_count events seen over time.
  • The rx_missed_errors statistic indicates that rx_no_buffer_count happened enough times that packets were dropped. Increasing the size of the rx queue can help reduce this number.

To increase the rx (or tx) queue size, you can run:

 ethtool -G eth0 rx 4096 

to increase the rx queue for eth0 to 4096.

Some cards support multiple RX and TX queues for increased performance. We will soon see why having more than one RX queue can be beneficial.

You can check whether your network card supports multiple queues using ethtool and the -l flag:

 ethtool -l eth0

You can increase the number of queues using the -L flag:

 ethtool -L eth0 rx 8

Note that not all device drivers support this ethtool function, so you may need to check your device driver's source code.

Enable NAPI

e1000e enables NAPI by calling napi_enable (drivers/net/ethernet/intel/e1000e/netdev.c:4332), a static inline function defined in include/linux/netdevice.h:500:

 napi_enable (& adapter-> napi); 

This simply clears a bit in the state field of the napi_struct.

Registers an interrupt handler

There are different methods a device may use to signal an interrupt: MSI-X, MSI, and legacy interrupts.

The driver must determine which method is supported by the device and register the appropriate handler function that will execute when the interrupt is received.

The e1000e driver tries to register an MSI-X interrupt handler first, falling back to MSI on failure, and falling back to a legacy interrupt handler if MSI registration fails.

This logic is abstracted into e1000_request_irq, which is called during driver initialization (drivers/net/ethernet/intel/e1000e/netdev.c:4303) and can be found in drivers/net/ethernet/intel/e1000e/netdev.c:2132.

MSI-X interrupts are the preferred method, especially for NICs that support multiple RX and TX queues. This is because each RX and TX queue can be assigned its own hardware interrupt, which can then be handled by a specific CPU (with irqbalance or by modifying /proc/irq/IRQ_NUMBER/smp_affinity). This way, arriving packets can be processed by separate CPUs starting at the hardware interrupt level.

If MSI-X is unavailable, MSI still has advantages over legacy interrupts (read more here and here).

In the e1000e driver, the functions e1000_intr_msix_rx, e1000_intr_msi, and e1000_intr are used as the interrupt handlers for the MSI-X, MSI, and legacy interrupt modes, respectively.

The handler is registered for the IRQ number that was obtained when the PCI system called the probe function earlier.

For example, here is the registration of the interrupt handler for an MSI interrupt, from drivers/net/ethernet/intel/e1000e/netdev.c:2147:

 err = request_irq(adapter->pdev->irq, e1000_intr_msi, 0, netdev->name, netdev);

Enable interrupts

Finally, once initialization is complete, interrupts are enabled on the device. Incoming packets will now cause an interrupt to be raised, which causes the registered handler function to be executed to process the incoming data.

Enabling interrupts is device specific, but in e1000e the function e1000_irq_enable is called, which writes a value to a device register to enable interrupts.
