Overview
This post picks up where Part 1 left off. It begins by explaining what ethtool is, how device drivers register ethtool code, how drivers enable NAPI, and how drivers enable interrupts.
ethtool configuration
ethtool is a command line program that you can use to get and set driver information. You can install it on Ubuntu by running apt-get install ethtool.
Some ethtool settings of interest are described later in this document.
The ethtool program talks to device drivers using the ioctl system call.
Device drivers register a series of functions that run for the ethtool operations, and the kernel provides the glue.
When an ioctl call is made from ethtool, the kernel finds the ethtool structure registered by the appropriate driver and executes the registered functions.
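The registration pattern described here can be sketched in plain userspace C. Everything below is a toy model: the toy_ names, the command numbers, and the two operations are invented for illustration and are not the kernel's real ethtool_ops layout. The point is only the shape: the driver fills in a table of function pointers, and the dispatching side looks up the table and calls whichever entry matches the request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command numbers; the real ones live in linux/ethtool.h. */
enum toy_cmd { TOY_GET_RING = 1, TOY_SET_RING = 2 };

struct toy_dev;

/* Table of callbacks a driver fills in, loosely modeled on ethtool_ops. */
struct toy_ethtool_ops {
    int (*get_ring)(struct toy_dev *dev);
    int (*set_ring)(struct toy_dev *dev, int size);
};

struct toy_dev {
    const struct toy_ethtool_ops *ethtool_ops;
    int rx_ring_size;
};

/* "Driver" side: one supported operation. */
static int toy_get_ring(struct toy_dev *dev)
{
    return dev->rx_ring_size;
}

static const struct toy_ethtool_ops toy_ops = {
    .get_ring = toy_get_ring,
    /* .set_ring is left NULL: this toy driver cannot resize its ring. */
};

/* "Driver" probe: install the table on the device. */
void toy_probe(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->rx_ring_size = 256;
    dev->ethtool_ops = &toy_ops;
}

/* "Kernel" side: find the registered table and run the matching function.
 * Returns the operation's result, or -1 if it is not supported. */
int toy_ethtool_ioctl(struct toy_dev *dev, enum toy_cmd cmd, int arg)
{
    const struct toy_ethtool_ops *ops;

    assert(dev != NULL);
    ops = dev->ethtool_ops;
    if (ops == NULL)
        return -1;

    switch (cmd) {
    case TOY_GET_RING:
        return ops->get_ring ? ops->get_ring(dev) : -1;
    case TOY_SET_RING:
        return ops->set_ring ? ops->set_ring(dev, arg) : -1;
    }
    return -1;
}
```

A missing function pointer is how this sketch models a driver that does not support a given operation.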
e1000e
The ethtool functions are installed in the e1000e driver in the PCI probe function (drivers/net/ethernet/intel/e1000e/netdev.c:6627):
e1000e_set_ethtool_ops(netdev);
which registers a structure of function pointers for each of the ethtool functions that e1000e supports (getting statistics, changing ring buffer sizes, etc.) from drivers/net/ethernet/intel/e1000e/ethtool.c:2316.
igb
The ethtool functions are installed in the igb driver in the PCI probe function (drivers/net/ethernet/intel/igb/igb_main.c:2091):
igb_set_ethtool_ops(netdev);
This comes from drivers/net/ethernet/intel/igb/igb_ethtool.c:105.
ixgbe
The ethtool functions are installed in the ixgbe driver in the PCI probe function (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7883):
ixgbe_set_ethtool_ops(netdev);
This comes from drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:7686.
tg3
The ethtool functions are installed in the tg3 driver in the PCI probe function (drivers/net/ethernet/broadcom/tg3.c:17456):
dev->ethtool_ops = &tg3_ethtool_ops;
be2net
The ethtool functions are installed in the be2net driver in a function called from the PCI probe function (drivers/net/ethernet/emulex/benet/be_main.c:4094):
SET_ETHTOOL_OPS(netdev, &be_ethtool_ops);
bnx2
The ethtool functions are installed in the bnx2 driver in a function called from the PCI probe function (drivers/net/ethernet/broadcom/bnx2.c:8539):
dev->ethtool_ops = &bnx2_ethtool_ops;
NAPI poll
Before NAPI existed, NICs would generate an interrupt for each packet received, indicating that data was available for the kernel to process.
NAPI changes this by allowing a device driver to register a poll function that the NAPI subsystem calls to harvest packets. This method of harvesting packets has lower overhead than the old method, because many packets can be consumed at once instead of only one packet per interrupt.
The device driver implements a poll function and registers it with NAPI by calling netif_napi_add. When registering a NAPI poll function with netif_napi_add, the driver also specifies the "weight". Most drivers hardcode a value of 64. This value and its meaning will be described in more detail below.
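The difference between one interrupt per packet and polling can be seen in a small userspace simulation. This is a toy model with invented names (toy_nic, toy_poll, toy_service), and it assumes that the weight acts as a cap on how many packets one call to the poll function may process. It does not reproduce how the kernel schedules polling.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: just a count of packets waiting in the RX queue. */
struct toy_nic {
    int pending;     /* packets waiting to be processed */
    int interrupts;  /* interrupts taken so far */
};

/* Poll function: processes at most `budget` packets and
 * returns how many it actually processed. */
int toy_poll(struct toy_nic *nic, int budget)
{
    int done = 0;

    assert(nic != NULL);
    while (done < budget && nic->pending > 0) {
        nic->pending--;
        done++;
    }
    return done;
}

/* One interrupt, followed by repeated polls until the queue is empty.
 * Returns the number of poll calls, or -1 for a non-positive weight. */
int toy_service(struct toy_nic *nic, int weight)
{
    int polls = 0;

    assert(nic != NULL);
    if (weight <= 0)
        return -1;

    nic->interrupts++;
    while (nic->pending > 0) {
        toy_poll(nic, weight);
        polls++;
    }
    return polls;
}
```

With 150 packets waiting and a weight of 64, this model takes a single interrupt and three poll calls (64, 64, and 22 packets), where the older scheme would have taken 150 interrupts.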
Typically, drivers register their NAPI poll functions during driver initialization. In the drivers this document examines, the poll function is registered in the PCI probe function itself or in a helper function called from there.
e1000e
The e1000e driver registers its NAPI poll function in the e1000_probe function (drivers/net/ethernet/intel/e1000e/netdev.c:6629):
netif_napi_add(netdev, &adapter->napi, e1000e_poll, 64);
e1000e registers a single NAPI poll function because this device supports only one receive queue. All the other drivers examined support multiple receive queues and call this function to register multiple NAPI poll functions.
igb
The igb driver registers its NAPI poll function in igb_alloc_q_vector (drivers/net/ethernet/intel/igb/igb_main.c:1180):
netif_napi_add(adapter->netdev, &q_vector->napi, igb_poll, 64);
This function is called from igb_alloc_q_vectors. igb_alloc_q_vector is called multiple times to set up each of the RX and TX queues. igb_alloc_q_vectors is called from igb_init_interrupt_scheme, which is called from several places, one of which is igb_probe.
ixgbe
The ixgbe driver registers its NAPI poll function in ixgbe_alloc_q_vector (drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c:813):
netif_napi_add(adapter->netdev, &q_vector->napi, ixgbe_poll, 64);
Similar to igb, this function is called from ixgbe_alloc_q_vectors, and it is called multiple times to set up each of the RX and TX queues. ixgbe_alloc_q_vectors is called from ixgbe_init_interrupt_scheme, which is called from several places, one of which is ixgbe_probe.
tg3
The tg3 driver registers its NAPI poll function in tg3_napi_init (drivers/net/ethernet/broadcom/tg3.c:7366):
netif_napi_add(tp->dev, &tp->napi[i].napi, tg3_poll_msix, 64);
This is called in a loop, registering a NAPI poll function for each RX and TX queue. It is called from tg3_start, which is called from tg3_open. Unlike the other drivers, tg3 registers its NAPI poll function in ndo_open and not from PCI probe.
be2net
The be2net driver registers its NAPI poll function in be_evt_queues_create (drivers/net/ethernet/emulex/benet/be_main.c:2053):
netif_napi_add(adapter->netdev, &eqo->napi, be_poll, BE_NAPI_WEIGHT);
This is called in a loop, registering a NAPI poll function for each RX and TX queue. It is called from be_setup_queues, which is called from be_setup, which is called from be_probe.
bnx2
The bnx2 driver registers its NAPI poll function in bnx2_init_napi (drivers/net/ethernet/broadcom/bnx2.c:6322):
netif_napi_add(bp->dev, &bp->bnx2_napi[i].napi, poll, 64);
This is called in a loop, registering a NAPI poll function for each RX and TX queue. Like tg3, bnx2 does this (and allocates its queue memory) in its ndo_open function, bnx2_open, and not from PCI probe.
Interrupt number
The interrupt number is obtained from the struct pci_dev structure and stored on the net_device structure:
netdev->irq = pdev->irq;
Later, during device initialization, the IRQ handlers for this IRQ number will be registered.
Driver initialization
When a network device is brought up (for example, with ifconfig eth0 up), an open function is called in the device driver. A pointer to this function is installed in a net_device_ops structure in a field called ndo_open. In e1000e, this function is called e1000_open (drivers/net/ethernet/intel/e1000e/netdev.c:4241).
The open function will typically do things like:
- Allocate RX and TX queue memory
- Enable NAPI
- Register an interrupt handler
- Enable hardware interrupts
and more.
RX and TX queue memory allocation
For example, the e1000e driver (found in drivers/net/ethernet/intel/e1000e/) in netdev.c around line 4279:
/* allocate transmit descriptors */
err = e1000e_setup_tx_resources(adapter->tx_ring);
if (err)
    goto err_setup_tx;
/* allocate receive descriptors */
err = e1000e_setup_rx_resources(adapter->rx_ring);
if (err)
    goto err_setup_rx;
The e1000e_setup_rx_resources and e1000e_setup_tx_resources functions allocate the receive and transmit queues and initialize the associated data structures. It is important to note that these queues are read and written directly by the NIC via DMA. In other words, when data arrives from the network, the NIC writes it directly into the receive queue via DMA. The size of the queue defaults to E1000_DEFAULT_RXD (256) and the maximum is E1000_MAX_RXD (4096). These values are specific to the driver.
If data arrives faster than it can be processed, the queue fills up. Once the queue is full, any additional data that arrives is dropped.
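The drop behavior is easy to see in a toy model of a fixed-size receive ring. The names below are invented, and the ring holds only counters instead of real DMA descriptors, so treat it as a sketch of the bookkeeping and not as how e1000e lays out its rings. The ring size of 256 mirrors the e1000e default quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 256  /* mirrors the e1000e default mentioned above */

struct toy_rx_ring {
    unsigned int head;      /* next slot the "NIC" writes to */
    unsigned int tail;      /* next slot the "kernel" reads from */
    unsigned int count;     /* slots currently occupied */
    unsigned long dropped;  /* packets that arrived while the ring was full */
};

/* "NIC" side: returns 1 if the packet was stored, 0 if it was dropped. */
int toy_ring_receive(struct toy_rx_ring *r)
{
    assert(r != NULL);
    if (r->count == TOY_RING_SIZE) {
        r->dropped++;
        return 0;
    }
    r->head = (r->head + 1) % TOY_RING_SIZE;
    r->count++;
    return 1;
}

/* "Kernel" side: consumes up to n packets, returns how many were consumed. */
unsigned int toy_ring_consume(struct toy_rx_ring *r, unsigned int n)
{
    unsigned int done = 0;

    assert(r != NULL);
    while (done < n && r->count > 0) {
        r->tail = (r->tail + 1) % TOY_RING_SIZE;
        r->count--;
        done++;
    }
    return done;
}
```

If 300 packets arrive before anything is consumed, this model stores 256 and counts 44 as dropped. A larger ring absorbs a longer burst, but it does not help if the consumer is slower than the arrival rate on average.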
You can determine whether drops are occurring, and increase the queue size, using the command line tool ethtool. ethtool communicates with the device driver using the ioctl system call.
Most drivers have a file named ethtool.c or *_ethtool.c that implements this interface. Not all drivers implement every possible ethtool method, so you should check the driver code and the ethtool output to determine whether what you want to do is supported by the driver.
You can get statistics from ethtool using the -S option, for example:
ethtool -S eth0
The names of the statistics differ between drivers, so you should read the output carefully and grep for things like "drop", "miss", and "error".
For e1000e:
- The rx_no_buffer_count statistic (also known as RNBC) indicates that there was nowhere to DMA the packet. Increasing the rx ring size (explained below) can help reduce the number of rx_no_buffer_count events seen over time.
- The rx_missed_errors statistic indicates that rx_no_buffer_count happened enough times that packets were dropped. Increasing the size of the rx queue can help reduce this number.
To increase the rx (or tx) queue size, you can run:
ethtool -G eth0 rx 4096
to increase the rx queue for eth0 to 4096.
Some cards have multiple RX and TX queues to increase performance. We will see shortly why having more than one RX queue can be beneficial.
You can check whether your NIC supports multiple queues using ethtool and the -l flag:
ethtool -l eth0
You can increase the number of queues using the -L flag:
ethtool -L eth0 rx 8
Note that not all device drivers support this ethtool function, so you may need to consult your device driver source code.
Enable NAPI
e1000e enables NAPI by calling napi_enable (drivers/net/ethernet/intel/e1000e/netdev.c:4332), a static inline function (from include/linux/netdevice.h:500):
napi_enable(&adapter->napi);
This simply clears a bit in the state field of the napi_struct.
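As a rough picture of what "clearing a bit in the state" means, here is a userspace sketch. It assumes the bit in question is a "scheduled" flag that is set when the NAPI instance is created, so that the instance cannot be scheduled until it is enabled. The toy_ names are invented, and the real kernel code uses atomic bit operations that are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit position for the "scheduled" flag in this toy model. */
enum { TOY_NAPI_STATE_SCHED = 0 };

struct toy_napi {
    unsigned long state;
};

/* Create the instance disabled: the SCHED bit starts out set. */
void toy_napi_init(struct toy_napi *n)
{
    assert(n != NULL);
    n->state = 1UL << TOY_NAPI_STATE_SCHED;
}

/* Enable: clear the SCHED bit so the instance can be scheduled. */
void toy_napi_enable(struct toy_napi *n)
{
    assert(n != NULL);
    /* Enabling an instance that is not disabled is treated as a bug. */
    assert(n->state & (1UL << TOY_NAPI_STATE_SCHED));
    n->state &= ~(1UL << TOY_NAPI_STATE_SCHED);
}

/* An instance can be scheduled only while its SCHED bit is clear. */
int toy_napi_can_schedule(const struct toy_napi *n)
{
    assert(n != NULL);
    return !(n->state & (1UL << TOY_NAPI_STATE_SCHED));
}
```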
Register an interrupt handler
There are different methods a device can use to signal an interrupt: MSI-X, MSI, and legacy interrupts.
The driver must determine which method is supported by the device and register the appropriate handler function that will execute when the interrupt is received.
The e1000e driver tries to register an MSI-X interrupt handler first, falling back to MSI on failure, and falling back to a legacy interrupt handler if MSI handler registration fails.
This logic is abstracted in e1000_request_irq, which is called during driver initialization (drivers/net/ethernet/intel/e1000e/netdev.c:4303) and can be found at drivers/net/ethernet/intel/e1000e/netdev.c:2132.
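The fallback order can be sketched as a chain of attempts. The code below is a userspace toy with invented names. The toy_irq_caps structure stands in for whatever makes a real registration succeed or fail, and the three toy_request_ helpers are placeholders for the driver's real registration calls. It is meant to show the control flow only, not the body of e1000_request_irq.

```c
#include <assert.h>
#include <stddef.h>

enum toy_int_mode { TOY_INT_NONE, TOY_INT_LEGACY, TOY_INT_MSI, TOY_INT_MSIX };

/* Hypothetical stand-in for the device: nonzero means that kind of
 * handler registration would succeed. */
struct toy_irq_caps {
    int msix_ok;
    int msi_ok;
    int legacy_ok;
};

/* Placeholders for the real registration calls: 0 on success, -1 on failure. */
static int toy_request_msix(const struct toy_irq_caps *c)   { return c->msix_ok ? 0 : -1; }
static int toy_request_msi(const struct toy_irq_caps *c)    { return c->msi_ok ? 0 : -1; }
static int toy_request_legacy(const struct toy_irq_caps *c) { return c->legacy_ok ? 0 : -1; }

/* Try MSI-X first, then MSI, then legacy; report which mode was registered. */
enum toy_int_mode toy_request_irq(const struct toy_irq_caps *c)
{
    assert(c != NULL);
    if (toy_request_msix(c) == 0)
        return TOY_INT_MSIX;
    if (toy_request_msi(c) == 0)
        return TOY_INT_MSI;
    if (toy_request_legacy(c) == 0)
        return TOY_INT_LEGACY;
    return TOY_INT_NONE;
}
```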
MSI-X interrupts are the preferred method, especially for NICs that support multiple RX and TX queues. This is because each RX and TX queue can have its own hardware interrupt assigned, which can then be handled by a specific CPU (with irqbalance or by modifying /proc/irq/IRQ_NUMBER/smp_affinity). This way, arriving packets can be processed by separate CPUs starting at the hardware interrupt level.
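The smp_affinity file takes a hexadecimal bitmask of CPUs, with bit N standing for CPU N. A small helper can build the mask for a single CPU. The function name is invented for this example, and the sketch only handles CPUs 0 through 31 to keep the mask in one 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the hex affinity mask that selects only
 * `cpu` into `buf`. Returns 0 on success, -1 if `cpu` is outside 0..31
 * or the buffer has no room. */
int toy_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
    assert(buf != NULL);
    if (cpu >= 32 || len == 0)
        return -1;
    snprintf(buf, len, "%x", 1u << cpu);
    return 0;
}
```

For CPU 3 this produces 8, so writing 8 into /proc/irq/IRQ_NUMBER/smp_affinity (as root) would ask for that IRQ to be handled by CPU 3. irqbalance may later rewrite the value if it is running.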
If MSI-X is unavailable, MSI still has advantages over legacy interrupts.
In the e1000e driver, the functions e1000_intr_msix_rx, e1000_intr_msi, and e1000_intr are the interrupt handlers used for the MSI-X, MSI, and legacy interrupt modes, respectively.
The handler is registered for the IRQ number obtained when the PCI system called the probe function earlier.
For example, the interrupt handler for an MSI interrupt is registered at drivers/net/ethernet/intel/e1000e/netdev.c:2147:
err = request_irq(adapter->pdev->irq, e1000_intr_msi, 0, netdev->name, netdev);
Enable interrupts
Finally, once initialization is complete, interrupts are enabled on the device. Incoming packets will now cause an interrupt to be raised, which causes the function registered above to run and process the incoming data.
Enabling interrupts is device specific, but on the e1000e the function e1000_irq_enable is called, which writes a value to a device register to enable interrupts.