Linux networking stack from the ground up, part 1
Purpose
This multi-part blog series describes the path of a packet from the wire, through the network driver and the kernel, until it reaches the receive queue for a socket. The information pertains to the Linux kernel, release 3.13.0. Links to the source code on GitHub are provided throughout to help with context.
This document describes code throughout the Linux networking stack, as well as the following Ethernet device drivers:
- e1000e: Intel PRO/1000 Linux driver
- igb: Intel Gigabit Linux driver
- ixgbe: Intel 10 Gigabit PCI Express Linux driver
- tg3: Broadcom Tigon3 Ethernet driver
- be2net: HP Emulex 10 Gigabit PCI Express Linux driver
- bnx2: Broadcom NX2 network driver
Other kernels or drivers will probably be similar, but line numbers and the detailed inner workings will likely differ.
Datasheets / programmer reference manuals
Driver code can be cryptic, especially when trying to understand the meaning of the various statistics counters that each driver exposes. In many cases, referring to the documentation for the device can help clarify things.
WARNING: All of these PDF files are large. You may not want to download them on a mobile device.
- e1000e:
  - Intel 82574 Gigabit Ethernet Controller
  - Intel 82579 Gigabit Ethernet Controller
- igb:
  - Intel Ethernet Server Adapter I350
  - Intel Ethernet Server Adapter I210
- ixgbe:
  - Intel Ethernet Controller X540
- tg3:
  - Broadcom NetXtreme/NetLink BCM5717, BCM5718, BCM5719, BCM5720
- be2net:
  - Emulex® OneConnect™ UCNAs and LightPulse® Fibre Channel HBAs
- bnx2:
  - Broadcom NetXtreme II BCM5706, BCM5708S, BCM5708C, BCM5709C, BCM5709S, BCM5716
  - Broadcom BCM57XX
Overview
High-level overview of the path of a packet:
- The driver is loaded and initialized.
- A packet arrives at the NIC from the network.
- The packet is copied (via DMA) to a ring buffer in kernel memory.
- A hardware interrupt is generated to let the system know that a packet is in memory.
- The driver calls into NAPI to start a poll loop if one was not already running.
- ksoftirqd processes run on each CPU on the system. They are registered at boot time. The ksoftirqd processes pull packets off the ring buffer by calling the NAPI poll function that the device driver registered during initialization.
- Memory regions in the ring buffer that have had network data written to them are unmapped.
- Data that was DMA'd into memory is passed up to the networking layer as an "skb" for further processing.
- Packet steering distributes packet processing load across multiple CPUs (in lieu of a NIC with multiple receive queues), if enabled.
- Packets are handed to the protocol layers from the queues.
- The protocol layers add them to receive buffers attached to sockets.
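The ring buffer and poll loop steps above can be sketched in userspace C. This is only a toy model under stated assumptions: every `demo_` name, the ring size, and the `budget` parameter (loosely modeled on the per-poll limit that NAPI uses) are made up for illustration, and nothing here is kernel code or the drivers' actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_RING_SIZE 8    /* hypothetical; real rings are much larger */
#define DEMO_PKT_MAX   64   /* hypothetical maximum packet size */

/* One slot in the ring: the "NIC" fills buf and marks the slot ready. */
struct demo_slot {
    unsigned char buf[DEMO_PKT_MAX];
    size_t len;
    int ready;
};

struct demo_ring {
    struct demo_slot slots[DEMO_RING_SIZE];
    unsigned int head;  /* next slot the "NIC" writes */
    unsigned int tail;  /* next slot the poll loop reads */
};

/* Stand-in for the NIC DMA-ing a packet into the ring.
 * Returns 0 on success, -1 if the ring is full (packet dropped). */
static int demo_nic_rx(struct demo_ring *r, const void *data, size_t len)
{
    struct demo_slot *s = &r->slots[r->head % DEMO_RING_SIZE];

    assert(len <= DEMO_PKT_MAX);
    if (s->ready)
        return -1;
    memcpy(s->buf, data, len);
    s->len = len;
    s->ready = 1;
    r->head++;
    return 0;
}

/* Stand-in for a poll function: harvest at most `budget` packets and
 * hand each one to `deliver`. Returns the number of packets harvested. */
static int demo_poll(struct demo_ring *r, int budget,
                     void (*deliver)(const unsigned char *, size_t))
{
    int done = 0;

    while (done < budget) {
        struct demo_slot *s = &r->slots[r->tail % DEMO_RING_SIZE];

        if (!s->ready)
            break;          /* ring is empty */
        deliver(s->buf, s->len);
        s->ready = 0;       /* give the slot back to the "NIC" */
        r->tail++;
        done++;
    }
    return done;
}

/* Example upper layer: just counts the bytes it was handed. */
static size_t demo_bytes_delivered;

static void demo_count_bytes(const unsigned char *pkt, size_t len)
{
    (void)pkt;
    demo_bytes_delivered += len;
}
```

Calling `demo_nic_rx` twice and then `demo_poll` with a budget of 1 harvests only the first packet; a second poll picks up the rest. The real path adds interrupts, DMA mapping, and skb allocation, which later parts of this series cover.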
Detailed look
Driver loading / PCI
PCI devices identify themselves with a series of registers in the PCI configuration space.
When a device driver is compiled, a macro named MODULE_DEVICE_TABLE is used to export a table of PCI device IDs identifying the devices that the driver can control. The kernel uses this table to determine which device driver to load to control the device.
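To make the idea of matching a device against an ID table concrete, here is a minimal userspace sketch. It assumes a simplified entry with only vendor and device fields (real PCI ID entries carry more), and the `demo_` names and the IDs in the table are illustrative, not taken from any driver. The all-zero terminator mirrors the convention kernel ID tables use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PCI ID table entry (vendor and device only). */
struct demo_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Hypothetical table with illustrative IDs; an all-zero entry ends it. */
static const struct demo_pci_id demo_tbl[] = {
    { 0x8086, 0x10d3 },
    { 0x8086, 0x1502 },
    { 0, 0 }
};

/* Return the matching entry, or NULL if the table does not list
 * this vendor/device pair. */
static const struct demo_pci_id *
demo_match(const struct demo_pci_id *tbl, uint16_t vendor, uint16_t device)
{
    assert(tbl != NULL);
    for (; tbl->vendor != 0 || tbl->device != 0; tbl++) {
        if (tbl->vendor == vendor && tbl->device == device)
            return tbl;
    }
    return NULL;
}
```

A lookup such as `demo_match(demo_tbl, 0x8086, 0x1502)` returns the second entry, while an unlisted pair returns NULL, which is the signal that this driver does not claim the device.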
When the driver is loaded, a function named pci_register_driver is called in its initialization function.
This function saves a structure of function pointers that the kernel can use to initialize the PCI device.
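The register-then-probe pattern can be modeled in a few lines of userspace C. This is a sketch of the general idea only: `demo_register_driver`, `demo_bind`, and the structures are hypothetical stand-ins, and the kernel's PCI core is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_dev {
    unsigned int id;    /* simplified single-number device ID */
    int bound;          /* set by a probe function on success */
};

/* Stand-in for a driver structure: a name, the IDs it handles,
 * and a function pointer the "core" calls to initialize a device. */
struct demo_driver {
    const char *name;
    const unsigned int *id_table;   /* zero-terminated */
    int (*probe)(struct demo_dev *dev);
};

#define DEMO_MAX_DRIVERS 4
static struct demo_driver *demo_drivers[DEMO_MAX_DRIVERS];
static size_t demo_ndrivers;

/* Save the driver structure so it can be consulted later. */
static int demo_register_driver(struct demo_driver *drv)
{
    assert(drv != NULL && drv->probe != NULL);
    if (demo_ndrivers == DEMO_MAX_DRIVERS)
        return -1;
    demo_drivers[demo_ndrivers++] = drv;
    return 0;
}

/* Find a registered driver that lists dev->id and call its probe.
 * Returns the name of the driver that claimed the device, or NULL. */
static const char *demo_bind(struct demo_dev *dev)
{
    for (size_t i = 0; i < demo_ndrivers; i++) {
        const unsigned int *id = demo_drivers[i]->id_table;

        for (; *id != 0; id++) {
            if (*id == dev->id && demo_drivers[i]->probe(dev) == 0)
                return demo_drivers[i]->name;
        }
    }
    return NULL;
}

/* Example driver used to exercise the sketch. */
static int demo_probe(struct demo_dev *dev)
{
    dev->bound = 1;
    return 0;
}

static const unsigned int demo_ids[] = { 0x10d3, 0x1502, 0 };

static struct demo_driver demo_drv = {
    .name     = "demo",
    .id_table = demo_ids,
    .probe    = demo_probe,
};
```

After `demo_register_driver(&demo_drv)`, binding a device whose ID is in `demo_ids` runs `demo_probe` and marks the device as bound; a device with an unlisted ID is left untouched. The driver sections that follow show the real structures each driver registers.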
e1000e
In the e1000e driver, this structure can be found in drivers/net/ethernet/intel/e1000e/netdev.c around line 7035:
    static struct pci_driver e1000_driver = {
        .name     = e1000e_driver_name,
        .id_table = e1000_pci_tbl,
        .probe    = e1000_probe,
        /* more stuff */
    };
It is registered in e1000_init_module in the same file around line 7043:
    /**
     * e1000_init_module - Driver Registration Routine
     *
     * e1000_init_module is the first routine called when the driver is
     * loaded. All it does is register with the PCI subsystem.
     **/
    static int __init e1000_init_module(void)
    {
        int ret;
        pr_info("Intel(R) PRO/1000 Network Driver - %s\n",
                e1000e_driver_version);
        pr_info("Copyright(c) 1999 - 2013 Intel Corporation.\n");
        ret = pci_register_driver(&e1000_driver);
        return ret;
    }
    module_init(e1000_init_module);
igb
In the igb driver, this structure can be found in drivers/net/ethernet/intel/igb/igb_main.c around line 238:
    static struct pci_driver igb_driver = {
        .name     = igb_driver_name,
        .id_table = igb_pci_tbl,
        .probe    = igb_probe,
        .remove   = igb_remove,
    #ifdef CONFIG_PM
        .driver.pm = &igb_pm_ops,
    #endif
        .shutdown = igb_shutdown,
        .sriov_configure = igb_pci_sriov_configure,
        .err_handler = &igb_err_handler
    };
It is registered in igb_init_module in the same file around line 682:
    static int __init igb_init_module(void)
    {
        int ret;
        pr_info("%s - version %s\n", igb_driver_string, igb_driver_version);
        pr_info("%s\n", igb_copyright);
    #ifdef CONFIG_IGB_DCA
        dca_register_notify(&dca_notifier);
    #endif
        ret = pci_register_driver(&igb_driver);
        return ret;
    }
ixgbe
In the ixgbe driver, this structure can be found in drivers/net/ethernet/intel/ixgbe/ixgbe_main.c around line 8448:
    static struct pci_driver ixgbe_driver = {
        .name     = ixgbe_driver_name,
        .id_table = ixgbe_pci_tbl,
        .probe    = ixgbe_probe,
        .remove   = ixgbe_remove,
    #ifdef CONFIG_PM
        .suspend  = ixgbe_suspend,
        .resume   = ixgbe_resume,
    #endif
        .shutdown = ixgbe_shutdown,
        .sriov_configure = ixgbe_pci_sriov_configure,
        .err_handler = &ixgbe_err_handler
    };
It is registered in ixgbe_init_module in the same file around line 8468:
    static int __init ixgbe_init_module(void)
    {
        int ret;
        pr_info("%s - version %s\n", ixgbe_driver_string, ixgbe_driver_version);
        pr_info("%s\n", ixgbe_copyright);
        ixgbe_dbg_init();
        ret = pci_register_driver(&ixgbe_driver);
        if (ret) {
            ixgbe_dbg_exit();
            return ret;
        }
    #ifdef CONFIG_IXGBE_DCA
        dca_register_notify(&dca_notifier);
    #endif
        return 0;
    }
tg3
In the tg3 driver, this structure can be found in drivers/net/ethernet/broadcom/tg3.c around line 17999:
    static struct pci_driver tg3_driver = {
        .name        = DRV_MODULE_NAME,
        .id_table    = tg3_pci_tbl,
        .probe       = tg3_init_one,
        .remove      = tg3_remove_one,
        .err_handler = &tg3_err_handler,
        .driver.pm   = &tg3_pm_ops,
        .shutdown    = tg3_shutdown,
    };
It is registered in the same file, just below the structure definition, using the module_pci_driver macro (defined in include/linux/pci.h:1104):
    module_pci_driver(tg3_driver);
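A macro like module_pci_driver exists so that drivers whose init and exit functions do nothing except register and unregister need not write them by hand. The userspace sketch below shows how a macro can generate that boilerplate. `DEMO_MODULE_DRIVER` and the other `demo_` names are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct demo_driver {
    const char *name;
    int registered;
};

/* Stand-ins for the register and unregister calls. */
static int demo_register(struct demo_driver *drv)
{
    assert(drv != NULL);
    drv->registered = 1;
    return 0;
}

static void demo_unregister(struct demo_driver *drv)
{
    assert(drv != NULL);
    drv->registered = 0;
}

/* Generate <drv>_init() and <drv>_exit(), which do nothing
 * except register and unregister the named driver. */
#define DEMO_MODULE_DRIVER(drv, reg, unreg)                 \
    static int drv##_init(void) { return reg(&(drv)); }     \
    static void drv##_exit(void) { unreg(&(drv)); }

static struct demo_driver demo_drv = { .name = "demo" };

/* Expands to demo_drv_init() and demo_drv_exit(). */
DEMO_MODULE_DRIVER(demo_drv, demo_register, demo_unregister)
```

With this in place, `demo_drv_init()` registers the driver and `demo_drv_exit()` unregisters it. Drivers that need extra work at load time, such as the module parameter check in be2net below, write their init function by hand instead.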
be2net
In the be2net driver, this structure can be found in drivers/net/ethernet/emulex/benet/be_main.c around line 4819:
    static struct pci_driver be_driver = {
        .name        = DRV_NAME,
        .id_table    = be_dev_ids,
        .probe       = be_probe,
        .remove      = be_remove,
        .suspend     = be_suspend,
        .resume      = be_resume,
        .shutdown    = be_shutdown,
        .err_handler = &be_eeh_handlers
    };
It is registered in be_init_module in the same file around line 4764:
    static int __init be_init_module(void)
    {
        /* Fall back to 2048 if the module parameter is not a supported size. */
        if (rx_frag_size != 8192 && rx_frag_size != 4096 &&
            rx_frag_size != 2048) {
            printk(KERN_WARNING DRV_NAME
                   " : Module param rx_frag_size must be 2048/4096/8192."
                   " Using 2048\n");
            rx_frag_size = 2048;
        }
        return pci_register_driver(&be_driver);
    }
bnx2
In the bnx2 driver, this structure can be found in drivers/net/ethernet/broadcom/bnx2.c around line 8788:
    static struct pci_driver bnx2_pci_driver = {
        .name        = DRV_MODULE_NAME,
        .id_table    = bnx2_pci_tbl,
        .probe       = bnx2_init_one,
        .remove      = bnx2_remove_one,
        .driver.pm   = BNX2_PM_OPS,
        .err_handler = &bnx2_err_handler,
        .shutdown    = bnx2_shutdown,
    };
It is registered in the same file, just below the structure definition, using the module_pci_driver macro (include/linux/pci.h):
    module_pci_driver(bnx2_pci_driver);
PCI probe
Each driver registers a probe function with the PCI system in the kernel. The kernel calls this function to initialize the device.
Most drivers have a lot of code that runs to get the device ready for use. The exact things done vary from driver to driver.
The name of the function registered as the probe function, along with a (very) high-level overview of what it does, is provided below for each driver.
In general, drivers are quite similar in terms of what they set up at this stage:
- The ethtool functions the driver supports (ethtool is described in later parts of this series)
- The NAPI poll function for harvesting incoming packets (described further in later parts of this series)
- The MAC address of the NIC
- The higher-level net_device structure
- The hardware IRQ number that will be used by the device when interrupts are (eventually) enabled
- Any watchdog tasks needed (e.g., e1000e has a watchdog task to check if the hardware is hung)
- Other device-specific things, such as workarounds or handling of hardware quirks
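As a rough illustration of the list above, the following userspace sketch shows a probe-like function filling in a device structure. Every name here (`demo_probe`, `demo_netdev`, `demo_hw`, and so on) is hypothetical; the real net_device structure and the real probe functions are far larger and do considerably more work.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a table of ethtool callbacks. */
struct demo_ethtool_ops {
    const char *(*get_drvname)(void);
};

/* Hypothetical, heavily simplified stand-in for net_device. */
struct demo_netdev {
    unsigned char mac[6];
    int irq;
    int (*poll)(struct demo_netdev *dev, int budget);
    const struct demo_ethtool_ops *ethtool_ops;
    int watchdog_armed;
};

/* What the sketch pretends to have read from the hardware. */
struct demo_hw {
    unsigned char mac[6];
    int irq;
};

static const char *demo_drvname(void)
{
    return "demo";
}

static const struct demo_ethtool_ops demo_ethtool = {
    .get_drvname = demo_drvname,
};

/* Placeholder poll function: harvests nothing. */
static int demo_poll(struct demo_netdev *dev, int budget)
{
    (void)dev;
    (void)budget;
    return 0;
}

/* Probe-like setup: record everything the device will need later.
 * Interrupts are NOT enabled here; only the IRQ number is saved. */
static int demo_probe(const struct demo_hw *hw, struct demo_netdev *dev)
{
    assert(hw != NULL && dev != NULL);
    memset(dev, 0, sizeof(*dev));
    dev->ethtool_ops = &demo_ethtool;             /* ethtool functions */
    dev->poll = demo_poll;                        /* poll function */
    memcpy(dev->mac, hw->mac, sizeof(dev->mac));  /* MAC address */
    dev->irq = hw->irq;                           /* IRQ number */
    dev->watchdog_armed = 1;                      /* watchdog task */
    return 0;
}
```

The point of the sketch is the ordering: the probe step only records callbacks and identifiers, and the device starts receiving packets later, once interrupts are enabled.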
To dig deeper into what each driver's probe function does, see:
- e1000_probe (drivers/net/ethernet/intel/e1000e/netdev.c:6517) for e1000e
- igb_probe (drivers/net/ethernet/intel/igb/igb_main.c:05) for igb
- ixgbe_probe (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7796) for ixgbe
- tg3_init_one (drivers/net/ethernet/broadcom/tg3.c:17315) for tg3
- be_probe (drivers/net/ethernet/emulex/benet/be_main.c:4501) for be2net
- bnx2_init_one (drivers/net/ethernet/broadcom/bnx2.c:8517) for bnx2