Linux networking stack from the ground up, part 5

11:30:00 AM

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

Overview

This post picks up where Part 4 left off and begins by examining the last piece of the IP protocol stack, the hand-off to the UDP protocol layer, and finally the queuing of data to a socket's receive queue so that it can be read by user programs.

ip_rcv_finish

Once netfilter has had a chance to look at the packet and decide what to do with it, ip_rcv_finish is called.

ip_rcv_finish begins with an optimization. In order to deliver the packet to its destination, a dst_entry from the routing system needs to be in place. To obtain one, the code first tries to call the early_demux function of the higher-level protocol.

The early_demux routine is an optimization that attempts to find the dst_entry needed to deliver the packet by checking whether a dst_entry is cached on the socket.

Here's what that looks like (net/ipv4/ip_input.c:317):

    if (sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) {
        const struct net_protocol *ipprot;
        int protocol = iph->protocol;

        ipprot = rcu_dereference(inet_protos[protocol]);
        if (ipprot && ipprot->early_demux) {
            ipprot->early_demux(skb);
            /* must reload iph, skb->head might have changed */
            iph = ip_hdr(skb);
        }
    }

If the optimization is disabled or there is no cached entry (because this is the first UDP packet arriving), the packet will be handed to the routing system in the kernel, where the dst_entry will be computed and assigned.

Once the routing layer completes, statistics counters are updated and the function ends by calling dst_input(skb), which in turn calls the input function pointer on the packet's dst_entry structure that was set by the routing system.

If the final destination of the packet is the local system, the routing system will attach ip_local_deliver to the input function pointer in the dst_entry structure on the packet.

ip_local_deliver and netfilter

Remember how we saw the following pattern in the IP protocol layer:

  1. ip_rcv is called and does some initial bookkeeping.
  2. The packet is handed off to netfilter for processing, with a pointer to a callback function to be run when netfilter processing finishes.
  3. ip_rcv_finish is that callback; it finishes processing and continues pushing the packet up the network stack.

ip_local_deliver follows the same pattern (net/ipv4/ip_input.c:242):

    /*
     * 	Deliver IP Packets to the higher protocol layers.
     */
    int ip_local_deliver(struct sk_buff *skb)
    {
        /*
         *	Reassemble IP fragments.
         */

        if (ip_is_fragment(ip_hdr(skb))) {
            if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER))
                return 0;
        }

        return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
                       ip_local_deliver_finish);
    }

Except that in this case the netfilter chain is NF_INET_LOCAL_IN, and the okfn to be called when netfilter processing completes is ip_local_deliver_finish.

We briefly examined how packets move through netfilter earlier, so we'll skip ahead to ip_local_deliver_finish.

ip_local_deliver_finish

ip_local_deliver_finish obtains the packet's protocol, looks up a net_protocol structure registered for that protocol, and calls the function pointed to by the handler field of the net_protocol structure.

This gives the packet to the higher level protocol layer.

Higher-level protocol registration

In our case, we care mostly about UDP, but the TCP protocol handler is registered in the same way and at the same time.

At net/ipv4/af_inet.c:1553, we find the structure definitions that contain the handler functions connecting the UDP, TCP, and ICMP protocols to the IP layer:

    static const struct net_protocol tcp_protocol = {
        .early_demux	=	tcp_v4_early_demux,
        .handler	=	tcp_v4_rcv,
        .err_handler	=	tcp_v4_err,
        .no_policy	=	1,
        .netns_ok	=	1,
    };

    static const struct net_protocol udp_protocol = {
        .early_demux	=	udp_v4_early_demux,
        .handler	=	udp_rcv,
        .err_handler	=	udp_err,
        .no_policy	=	1,
        .netns_ok	=	1,
    };

    static const struct net_protocol icmp_protocol = {
        .handler	=	icmp_rcv,
        .err_handler	=	icmp_err,
        .no_policy	=	1,
        .netns_ok	=	1,
    };

These structures are registered in the initialization code of the inet address family (net/ipv4/af_inet.c:1716):

    /*
     *	Add all the base protocols.
     */

    if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)
        pr_crit("%s: Cannot add ICMP protocol\n", __func__);
    if (inet_add_protocol(&udp_protocol, IPPROTO_UDP) < 0)
        pr_crit("%s: Cannot add UDP protocol\n", __func__);
    if (inet_add_protocol(&tcp_protocol, IPPROTO_TCP) < 0)
        pr_crit("%s: Cannot add TCP protocol\n", __func__);

In our case study, we care especially about UDP, so let's examine the UDP handler function that is called from ip_local_deliver_finish.

As seen in the structure definition above, this function is called udp_rcv.

UDP

The UDP protocol layer code can be found in net/ipv4/udp.c.

udp_rcv

The udp_rcv function (net/ipv4/udp.c:1954) is just one line, which calls directly into __udp4_lib_rcv to handle receiving the datagram.

__udp4_lib_rcv

The __udp4_lib_rcv function (net/ipv4/udp.c:1708) checks that the packet is valid and obtains the UDP header, UDP datagram length, source address, and destination address. Then come some additional integrity checks and checksum verification.

Recall that earlier in the IP layer, we saw an optimization performed to attach a dst_entry to the packet before it is handed off to the upper-layer protocol (UDP in our case).

If a socket and corresponding dst_entry were found, __udp4_lib_rcv will queue the datagram to be received by the socket:

    sk = skb_steal_sock(skb);
    if (sk) {
        struct dst_entry *dst = skb_dst(skb);
        int ret;

        if (unlikely(sk->sk_rx_dst != dst))
            udp_sk_rx_dst_set(sk, dst);

        ret = udp_queue_rcv_skb(sk, skb);
        sock_put(sk);
        /* a return value > 0 means to resubmit the input, but
         * it wants the return to be -protocol, or 0
         */
        if (ret > 0)
            return -ret;
        return 0;
    } else {

If no socket was attached by the early_demux operation, a receiving socket will now be looked up by calling __udp4_lib_lookup_skb.

In both cases described above, the datagram will be queued to the socket:

    ret = udp_queue_rcv_skb(sk, skb);
    sock_put(sk);

If no socket was found, the datagram will be dropped:

    /* No socket. Drop packet silently, if checksum is wrong */
    if (udp_lib_checksum_complete(skb))
        goto csum_error;

    UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
    icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

    /*
     * Hmm.  We got an UDP packet to a port to which we
     * don't wanna listen.  Ignore it.
     */
    kfree_skb(skb);
    return 0;

udp_queue_rcv_skb

The initial parts of this function are:

  1. Determine if the socket associated with the datagram is an encapsulation socket. If so, pass the packet up to that layer's handler function before proceeding.
  2. Determine if the datagram is a UDP-Lite datagram and perform some integrity checks.
  3. Verify the UDP checksum of the datagram and drop it if the checksum fails.

Finally, we arrive at the receive queue logic (net/ipv4/udp.c:1548), which begins by checking whether the receive queue for the socket is full:

    if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf))
        goto drop;

sk_rcvqueues_full and tuning receive queue memory

The sk_rcvqueues_full function (include/net/sock.h:788) checks the socket's backlog queue length and the socket's sk_rmem_alloc to determine if their sum is greater than sk_rcvbuf for the socket (sk->sk_rcvbuf above):

    /*
     * Take into account size of receive queue and backlog queue
     * Do not take into account this skb truesize,
     * to allow even a single big packet to come.
     */
    static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb,
                                         unsigned int limit)
    {
        unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);

        return qsize > limit;
    }

Tuning these values is a bit tricky because there are many things that can be adjusted.

The sk->sk_rcvbuf (called limit in the function above) value can be increased up to
net.core.rmem_max. You can set that maximum via sysctl: sysctl -w net.core.rmem_max=8388608.

sk->sk_rcvbuf starts at the net.core.rmem_default value, which can also be adjusted via sysctl: sysctl -w net.core.rmem_default=8388608.

You can also set the sk->sk_rcvbuf size by calling setsockopt and passing SO_RCVBUF. The maximum you can set with setsockopt is net.core.rmem_max.

You can override the SO_RCVBUF limit by calling setsockopt and passing SO_RCVBUFFORCE, but the user running the application will need the CAP_NET_ADMIN capability.

The sk->sk_rmem_alloc value is incremented by calls to skb_set_owner_r, which sets the owner socket of a datagram. We'll see this called later in the UDP layer.

The sk->sk_backlog.len value is incremented by calls to sk_add_backlog, which we'll see next.

back to udp_queue_rcv_skb

Once we've verified that the queue is not full, we can continue queuing the datagram:

    bh_lock_sock(sk);
    if (!sock_owned_by_user(sk))
        rc = __udp_queue_rcv_skb(sk, skb);
    else if (sk_add_backlog(sk, skb, sk->sk_rcvbuf)) {
        bh_unlock_sock(sk);
        goto drop;
    }
    bh_unlock_sock(sk);

    return rc;

The first step is to determine whether any system calls are currently being made against the socket from a userland program. If not, the datagram can be added to the receive queue with a call to __udp_queue_rcv_skb. If so, the datagram is queued on the backlog queue instead.

Datagrams on the backlog queue are added to the receive queue when socket system calls release the socket with a call to release_sock.

__udp_queue_rcv_skb

The __udp_queue_rcv_skb function (net/ipv4/udp.c:1422) adds the datagram to the receive queue and bumps statistics counters if the datagram could not be added to the socket's receive queue:

    rc = sock_queue_rcv_skb(sk, skb);
    if (rc < 0) {
        int is_udplite = IS_UDPLITE(sk);

        /* Note that an ENOMEM error is charged twice */
        if (rc == -ENOMEM)
            UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_RCVBUFERRORS,
                             is_udplite);
        UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
        kfree_skb(skb);
        trace_udp_fail_queue_rcv_skb(rc, sk);
        return -1;
    }

To add the datagram to the queue, sock_queue_rcv_skb is called.

sock_queue_rcv_skb

sock_queue_rcv_skb (net/core/sock.c:388) does a few things before adding the datagram to the socket's receive queue:

  1. The socket's allocated memory is checked to determine if it exceeds the receive buffer size. If so, the drop count for the socket is incremented.
  2. Next, sk_filter is used to process any Berkeley Packet Filter filters that have been applied to the socket.
  3. sk_rmem_schedule is run to ensure sufficient receive buffer space exists to accept this datagram.
  4. Next, the size of the datagram is charged to the socket with a call to skb_set_owner_r. This increments sk->sk_rmem_alloc.
  5. The data is added to the queue with a call to __skb_queue_tail.
  6. Finally, any processes waiting on data to arrive on the socket are notified with a call to the sk_data_ready notification handler function.

end

And that is how data arrives from the network and ends up on the receive queue for a socket, ready to be read by a user process.
