Linux networking stack from the ground up, part 3


Part 1 | Part 2 | Part 3 | Part 4 | Part 5

Overview

This post picks up where Part 2 left off, starting with a description of a packet arriving, an examination of softirqs, and a look at how a packet received by the e1000e driver is passed up the network stack.

a packet arrives!

So, finally, a packet arrives from the network. Assuming the rx ring buffer has enough space, the packet is written into the ring buffer via DMA and the device raises the interrupt assigned to it (or, in the case of MSI-X, the IRQ associated with the rx queue the packet arrived on).

You can find statistics about hardware interrupts by checking the /proc/interrupts file.

In general, the interrupt handler that runs when an interrupt is raised should try to defer as much processing as possible to happen outside of the interrupt context. This is crucial because while an interrupt is being processed, other interrupts are blocked.

If we look at the e1000e function e1000_intr_msi, we can see that after some device-specific code and hardware bug workarounds, the interrupt handler signals NAPI (drivers/net/ethernet/intel/e1000e/netdev.c:1777):

    if (napi_schedule_prep(&adapter->napi)) {
        adapter->total_tx_bytes = 0;
        adapter->total_tx_packets = 0;
        adapter->total_rx_bytes = 0;
        adapter->total_rx_packets = 0;
        __napi_schedule(&adapter->napi);
    }

This code checks if NAPI is already running; if it is not, the statistics counters are reset and NAPI is scheduled to run to process the packets.

At a high level, NAPI is scheduled to run from the hardware interrupt handler, but the NAPI code that performs the packet processing is executed outside of the hardware interrupt context. This is accomplished with softirqs, which will be described in more detail next.

The __napi_schedule function

    /**
     * __napi_schedule - schedule for receive
     * @n: entry to schedule
     *
     * The entry's receive function will be scheduled to run.
     */
    void __napi_schedule(struct napi_struct *n)
    {
        unsigned long flags;

        local_irq_save(flags);
        ____napi_schedule(&__get_cpu_var(softnet_data), n);
        local_irq_restore(flags);
    }
    EXPORT_SYMBOL(__napi_schedule);

This schedules the driver's NAPI poll function to run. It does this by getting the softnet_data structure for the current CPU and passing it, along with the driver-provided napi_struct, to ____napi_schedule:

    /* Called with irq disabled */
    static inline void ____napi_schedule(struct softnet_data *sd,
                                         struct napi_struct *napi)
    {
        list_add_tail(&napi->poll_list, &sd->poll_list);
        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
    }

This adds the driver-provided NAPI poll structure to the softnet_data poll list for the current CPU.

Next, the function calls __raise_softirq_irqoff (from kernel/softirq.c), which leads to the following code being run:

    struct task_struct *tsk = __this_cpu_read(ksoftirqd);

    if (tsk && tsk->state != TASK_RUNNING)
        wake_up_process(tsk);

This raises an important point: the hardware interrupt handler wakes up the NAPI softirq process on the same CPU as the hardware interrupt handler.

softirq

softirq is a mechanism for executing code outside of the hardware interrupt handler context. As mentioned above, this is important because only minimal work should be done in a hardware interrupt handler; the heavy lifting should be left for later processing.

The softirq system is a series of kernel threads, one per CPU, which run handler functions that have been registered for the various softirqs.

The softirq threads are started early in the kernel boot process (`kernel/softirq.c:754`):

    static int __init spawn_ksoftirqd(void)
    {
        register_cpu_notifier(&cpu_nfb);

        BUG_ON(smpboot_register_percpu_thread(&softirq_threads));

        return 0;
    }
    early_initcall(spawn_ksoftirqd);

The softirq_threads structure exposes a few fields, but the two most important ones are thread_should_run and thread_fn, both of which are called from kernel/smpboot.c.
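For reference, the softirq_threads definition in kernel/softirq.c looks roughly like this on kernels of this era (a condensed sketch; the exact contents vary by version):

    static struct smp_hotplug_thread softirq_threads = {
        .store              = &ksoftirqd,
        .thread_should_run  = ksoftirqd_should_run,
        .thread_fn          = run_ksoftirqd,
        .thread_comm        = "ksoftirqd/%u",
    };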

The thread_should_run function checks whether any softirqs are pending, and if one or more are, the code in kernel/smpboot.c calls thread_fn, which for the softirq system happens to be run_ksoftirqd.

The run_ksoftirqd function runs the registered handler function for each pending softirq and increments the statistics counters found in /proc/softirqs.
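As a rough sketch (based on kernel/softirq.c of this era; details vary by version), the two hooks look like this:

    static int ksoftirqd_should_run(unsigned int cpu)
    {
        return local_softirq_pending();
    }

    static void run_ksoftirqd(unsigned int cpu)
    {
        local_irq_disable();
        if (local_softirq_pending()) {
            /* __do_softirq runs the registered handlers and bumps
             * the counters seen in /proc/softirqs */
            __do_softirq();
            rcu_note_context_switch(cpu);
            local_irq_enable();
            cond_resched();
            return;
        }
        local_irq_enable();
    }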

NAPI and softirq

Recall from earlier that we saw the device driver call __napi_schedule, which eventually calls __raise_softirq_irqoff(NET_RX_SOFTIRQ).

The __raise_softirq_irqoff function marks the NET_RX_SOFTIRQ softirq as pending and wakes up the softirq thread on the current CPU to run the NET_RX_SOFTIRQ handler.

NET_RX_SOFTIRQ and NET_TX_SOFTIRQ handlers

Early in the initialization code of the networking subsystem (net/core/dev.c:7114) we find the following code:

    open_softirq(NET_TX_SOFTIRQ, net_tx_action);
    open_softirq(NET_RX_SOFTIRQ, net_rx_action);

These function calls register the net_tx_action and net_rx_action functions as the softirq handlers that will be executed when the NET_TX_SOFTIRQ and NET_RX_SOFTIRQ softirqs are pending.

rx packet processing begins

Once the softirq code determines that a softirq is pending and should be handled, and invokes net_rx_action, the function registered for NET_RX_SOFTIRQ, packet processing begins.

net_rx_action processing loop

net_rx_action begins the processing of packets from the memory the packets were DMA'd into by the device.

The function iterates through the list of NAPI structures queued for the current CPU, dequeuing each structure one at a time and operating on it.
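As a rough orientation, the top of net_rx_action looks something like this on kernels of this era (a condensed sketch from net/core/dev.c, with the bodies elided where the post quotes them below):

    static void net_rx_action(struct softirq_action *h)
    {
        struct softnet_data *sd = &__get_cpu_var(softnet_data);
        unsigned long time_limit = jiffies + 2;   /* roughly 2 jiffies of runtime */
        int budget = netdev_budget;               /* tunable, see below */

        local_irq_disable();

        while (!list_empty(&sd->poll_list)) {
            struct napi_struct *n;
            int work, weight;

            /* budget / time-limit check (quoted below) ... */

            /* take the next queued napi_struct and invoke its ->poll()
             * (quoted further below) ... */
            n = list_first_entry(&sd->poll_list, struct napi_struct,
                                 poll_list);
            /* ... */
        }
        /* ... */
    }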

The processing loop bounds the amount of work and the execution time that can be consumed by the registered poll functions. It does this by keeping track of a work "budget" (which is adjustable) and by checking the elapsed time (net/core/dev.c:4366):

    /* If softirq window is exhausted then punt.
     * Allow it to run for 2 jiffies since that will allow
     * an average latency of 1.5/HZ.
     */
    if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit)))
        goto softnet_break;

NAPI also uses a "weight" to prevent individual calls to poll from consuming the entire CPU and never completing. The weight is passed to the call to netif_napi_add during device driver initialization; recall that it was hardcoded to 64 in the driver.
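For example, in the e1000e driver the weight shows up in the netif_napi_add call made during probe; a sketch of that single line (the surrounding code is omitted, and the exact form varies slightly between kernel versions):

    netif_napi_add(netdev, &adapter->napi, e1000e_poll, 64);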

The weight is passed to the poll function the driver registered with NAPI. It dictates the maximum amount of work poll may do before it must return. In this case, poll can process up to 64 packets.

The poll function returns the number of packets processed, which can be less than or equal to the weight. This work is then subtracted from the budget:

    weight = n->weight;

    /* This NAPI_STATE_SCHED test is for avoiding a race
     * with netpoll's poll_napi().  Only the entity which
     * obtains the lock and sees NAPI_STATE_SCHED set will
     * actually make the ->poll() call.  Therefore we avoid
     * accidentally calling ->poll() when NAPI is not scheduled.
     */
    work = 0;
    if (test_bit(NAPI_STATE_SCHED, &n->state)) {
        work = n->poll(n, weight);
        trace_napi_poll(n);
    }

    WARN_ON_ONCE(work > weight);

    budget -= work;

    local_irq_disable();

If the work done was equal to the weight, the NAPI structure is moved to the end of the queue and will be examined again later. The loop then starts over, beginning with the budget and time check.
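The requeue logic in net_rx_action looks roughly like this on kernels of this era (condensed from net/core/dev.c):

    /* Drivers must not modify the NAPI state if they
     * consume the entire weight.  In such cases this code
     * still "owns" the NAPI instance and therefore can
     * move the instance around on the list at will.
     */
    if (unlikely(work == weight)) {
        if (unlikely(napi_disable_pending(n))) {
            local_irq_enable();
            napi_complete(n);
            local_irq_disable();
        } else {
            if (n->gro_list) {
                /* flush packets that are too old */
                local_irq_enable();
                napi_gro_flush(n, HZ >= 1000);
                local_irq_disable();
            }
            list_move_tail(&n->poll_list, &sd->poll_list);
        }
    }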

So, you can adjust the number of packets processed during the NAPI polling loop by setting the netdev_budget sysctl:

    sysctl -w net.core.netdev_budget=600

You can get detailed statistics about the networking softirq system by examining the /proc/net/softnet_stat file, which provides information on the number of packets received, the number of drops, and the time squeeze counter, which tracks the number of times the budget or the time limit were consumed but more work was available.
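As an illustration (this program is not from the original post), here is a small, self-contained user-space C sketch that reads /proc/net/softnet_stat and prints its first three columns, which are the processed-packet count, the drop count, and the time squeeze counter, one row per CPU:

    /* softnet_stat_dump.c: print processed, dropped, and time_squeeze
     * counters from /proc/net/softnet_stat, one row per CPU.
     * Build with: cc -o softnet_stat_dump softnet_stat_dump.c */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/net/softnet_stat", "r");
        char line[256];
        unsigned int processed, dropped, squeezed;
        int cpu = 0;

        if (!f) {
            perror("fopen");
            return 1;
        }

        while (fgets(line, sizeof(line), f)) {
            /* each row is one CPU; the columns are hexadecimal */
            if (sscanf(line, "%x %x %x", &processed, &dropped, &squeezed) == 3)
                printf("cpu %d: processed=%u dropped=%u time_squeeze=%u\n",
                       cpu++, processed, dropped, squeezed);
        }

        fclose(f);
        return 0;
    }

A steadily rising time squeeze column is a hint that the budget or the poll weight is too small for the incoming packet rate.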

NAPI poll function: e1000e_poll

It is up to the device driver to harvest the packets that were DMA'd into the rx ring buffer. This is accomplished by the poll method invoked in the code sample above, which in the case of e1000e is actually a function pointer to e1000e_poll (drivers/net/ethernet/intel/e1000e/netdev.c:2638).

The e1000e driver's e1000e_poll function calls a function via the clean_rx function pointer. It is passed the weight (which was hardcoded to 64 during driver initialization) and a location to write the amount of work done (drivers/net/ethernet/intel/e1000e/netdev.c:2638):

    adapter->clean_rx(adapter->rx_ring, &work_done, weight);
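After clean_rx returns, e1000e_poll compares the work done against the weight; when less than the full weight was used, NAPI is completed and the device's interrupts are re-enabled. A condensed sketch of that tail end of e1000e_poll (based on the e1000e driver of this era; details vary by kernel version):

    if (work_done < weight) {
        /* all pending rx work was handled: leave polling mode */
        napi_complete(napi);
        if (!test_bit(__E1000_DOWN, &adapter->state)) {
            /* re-enable the device's interrupts (MSI-X or legacy path) */
            if (adapter->msix_entries)
                ew32(IMS, adapter->rx_ring->ims_val);
            else
                e1000_irq_enable(adapter);
        }
    }

    return work_done;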

This function pointer is set in e1000_open when the driver is initialized and the device is brought up. It is set to the appropriate function based on the MTU.
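The selection looks roughly like this (condensed from the rx setup code reached from e1000_open, e1000_configure_rx in drivers/net/ethernet/intel/e1000e/netdev.c; the exact conditions vary by kernel version):

    if (adapter->rx_ps_pages) {
        /* packet-split receive path */
        adapter->clean_rx = e1000_clean_rx_irq_ps;
    } else if (adapter->netdev->mtu > ETH_FRAME_LEN + ETH_FCS_LEN) {
        /* jumbo frames */
        adapter->clean_rx = e1000_clean_jumbo_rx_irq;
    } else {
        adapter->clean_rx = e1000_clean_rx_irq;
    }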

For our purposes, this is the e1000_clean_rx_irq function.

e1000_clean_rx_irq - unmap DMA regions and pass data up the stack

The e1000_clean_rx_irq function runs in a loop and breaks out when the work done reaches the weight that was passed into the function.

The function unmaps the memory regions the device has DMA'd data into. Once unmapped, these memory regions can no longer be written to by the device.

The total_rx_bytes and total_rx_packets stat counters are incremented, and additional memory regions for DMA are added back to the rx ring.

Finally, e1000_receive_skb is called to hand the skb up the network stack.

e1000_receive_skb - pass data up the stack

The e1000_receive_skb function kicks off a chain of function calls that deal with bookkeeping for things like hardware-accelerated VLAN tagging and generic receive offload.

The function call chain is (a sketch of the napi_skb_finish step follows the list):

  • e1000_receive_skb calls napi_gro_receive
  • napi_gro_receive calls napi_skb_finish
  • napi_skb_finish calls netif_receive_skb
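As a condensed sketch of the middle step, napi_skb_finish from net/core/dev.c of this era looks roughly like this; only the GRO_NORMAL case, which hands the skb to netif_receive_skb, is spelled out:

    static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
    {
        switch (ret) {
        case GRO_NORMAL:
            if (netif_receive_skb(skb))
                ret = GRO_DROP;
            break;
        /* ... GRO_DROP, GRO_MERGED_FREE, GRO_HELD, GRO_MERGED cases ... */
        }

        return ret;
    }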

It is netif_receive_skb that carries out the heavy lifting. Before we can examine what happens there, we must first describe Receive Packet Steering.
