Overview
This post picks up where Part 4 left off, examining the last portion of the IP protocol stack, the hand-off to the UDP protocol layer, and finally the queuing of data to a socket's receive queue so that it can be read by user programs.
ip_rcv_finish
Once netfilter has had a chance to look at the packet and decide what to do with it, ip_rcv_finish is called.
ip_rcv_finish begins with an optimization. In order to deliver the packet to its destination, a dst_entry from the routing system needs to be attached. To obtain one, the code first attempts to call the early_demux function of the higher-level protocol.
The early_demux routine is an optimization that attempts to find the dst_entry needed to deliver the packet by checking whether a dst_entry is cached on the socket. Here is what that looks like (net/ipv4/ip_input.c:317):
if (sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) {
        const struct net_protocol *ipprot;
        int protocol = iph->protocol;

        ipprot = rcu_dereference(inet_protos[protocol]);
        if (ipprot && ipprot->early_demux) {
                ipprot->early_demux(skb);
                /* must reload iph, skb->head might have changed */
                iph = ip_hdr(skb);
        }
}
If the optimization is disabled or there is no cached entry (because this is the first UDP packet arriving), the packet will be handed off to the kernel's routing system, where the dst_entry will be computed and assigned.
Once the routing layer completes, statistics counters are updated and the function ends by calling dst_input(skb), which in turn calls the input function pointer on the packet's dst_entry structure that was set by the routing system.
If the final destination of the packet is the local system, the routing system will attach the ip_local_deliver function to the input function pointer in the dst_entry structure on the packet.
ip_local_deliver and netfilter
Remember how we saw the following pattern in the IP protocol layer:
- ip_rcv is called and does some initial bookkeeping.
- The packet is handed off to netfilter for processing, along with a pointer to a callback function to be run when processing finishes.
- ip_rcv_finish is the callback that finishes processing and continues the work of pushing the packet up the network stack.
ip_local_deliver follows the same pattern (net/ipv4/ip_input.c:242):
/*
 *      Deliver IP Packets to the higher protocol layers.
 */
int ip_local_deliver(struct sk_buff *skb)
{
        /*
         *      Reassemble IP fragments.
         */

        if (ip_is_fragment(ip_hdr(skb))) {
                if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER))
                        return 0;
        }

        return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
                       ip_local_deliver_finish);
}
Except that in this case, the netfilter chain is NF_INET_LOCAL_IN and the okfn to be called on completion is ip_local_deliver_finish.
We briefly examined how packets move through netfilter earlier, so we'll skip ahead to ip_local_deliver_finish.
ip_local_deliver_finish
ip_local_deliver_finish obtains the protocol from the packet, looks up a net_protocol structure registered for that protocol, and calls the function pointed to by handler in the net_protocol structure. This hands the packet up to the higher-level protocol layer.
higher level protocol registration
In our case, we care primarily about UDP, but TCP and ICMP protocol handlers are registered in the same way and at the same time. At net/ipv4/af_inet.c:1553, we find the structure definitions that contain the handler functions for connecting the UDP, TCP, and ICMP protocols to the IP layer:
static const struct net_protocol tcp_protocol = {
        .early_demux    =       tcp_v4_early_demux,
        .handler        =       tcp_v4_rcv,
        .err_handler    =       tcp_v4_err,
        .no_policy      =       1,
        .netns_ok       =       1,
};

static const struct net_protocol udp_protocol = {
        .early_demux    =       udp_v4_early_demux,
        .handler        =       udp_rcv,
        .err_handler    =       udp_err,
        .no_policy      =       1,
        .netns_ok       =       1,
};

static const struct net_protocol icmp_protocol = {
        .handler        =       icmp_rcv,
        .err_handler    =       icmp_err,
        .no_policy      =       1,
        .netns_ok       =       1,
};
These structures are registered in the initialization code of the inet address family (net/ipv4/af_inet.c:1716):
/*
 *      Add all the base protocols.
 */

if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)
        pr_crit("%s: Cannot add ICMP protocol\n", __func__);
if (inet_add_protocol(&udp_protocol, IPPROTO_UDP) < 0)
        pr_crit("%s: Cannot add UDP protocol\n", __func__);
if (inet_add_protocol(&tcp_protocol, IPPROTO_TCP) < 0)
        pr_crit("%s: Cannot add TCP protocol\n", __func__);
In our case study, we care especially about UDP, so let's examine the UDP handler function that is called from ip_local_deliver_finish.
As seen in the structure definitions above, this function is called udp_rcv.
UDP
The code for the UDP protocol layer can be found in net/ipv4/udp.c.
udp_rcv
The udp_rcv function (net/ipv4/udp.c:1954) is one line that calls directly into __udp4_lib_rcv to handle receipt of the packet.
__udp4_lib_rcv
__udp4_lib_rcv (net/ipv4/udp.c:1708) checks that the packet is valid and obtains the UDP header, UDP datagram length, source address, and destination address. Then come some additional integrity checks and checksum verification.
Recall that earlier, in the IP protocol layer, we saw an optimization performed to attach a dst_entry to the packet before it is handed off to the upper-layer protocol (UDP in our case).
If a socket and corresponding dst_entry were found, __udp4_lib_rcv will queue the packet to be received by the socket:
sk = skb_steal_sock(skb);
if (sk) {
        struct dst_entry *dst = skb_dst(skb);
        int ret;

        if (unlikely(sk->sk_rx_dst != dst))
                udp_sk_rx_dst_set(sk, dst);

        ret = udp_queue_rcv_skb(sk, skb);
        sock_put(sk);
        /* a return value > 0 means to resubmit the input, but
         * it wants the return to be -protocol, or 0
         */
        if (ret > 0)
                return -ret;
        return 0;
} else {
If no socket was attached by the early_demux operation, a receiving socket will now be looked up by calling __udp4_lib_lookup_skb.
In either of the cases described above, the datagram will be queued to the socket:
ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
If no socket was found, the datagram will be dropped:
/* No socket. Drop packet silently, if checksum is wrong */
if (udp_lib_checksum_complete(skb))
        goto csum_error;

UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

/*
 * Hmm.  We got an UDP packet to a port to which we
 * don't wanna listen.  Ignore it.
 */
kfree_skb(skb);
return 0;
udp_queue_rcv_skb
The initial parts of this function are:
- Determine whether the socket associated with the datagram is an encapsulation socket. If so, pass the packet up to that layer's handler function before proceeding.
- Determine if the datagram is a UDP-Lite datagram and do some integrity checks.
- Verify the UDP checksum of the datagram and drop it if the checksum fails.
Finally, we arrive at the receive queue logic (net/ipv4/udp.c:1548), which begins by checking whether the receive queue for the socket is full:
if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf))
        goto drop;
sk_rcvqueues_full and tuning receive queue memory
The sk_rcvqueues_full function (include/net/sock.h:788) checks the socket's backlog length and the socket's sk_rmem_alloc to determine whether the sum is greater than sk_rcvbuf for the socket (sk->sk_rcvbuf above):
/*
 * Take into account size of receive queue and backlog queue
 * Do not take into account this skb truesize,
 * to allow even a single big packet to come in.
 */
static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb,
                                     unsigned int limit)
{
        unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);

        return qsize > limit;
}
Tuning these values is a bit difficult because there are many things that can be adjusted.
The sk->sk_rcvbuf (called limit in the function above) value can be increased to at most net.core.rmem_max. You can set that maximum with sysctl:

sysctl -w net.core.rmem_max=8388608
sk->sk_rcvbuf starts at the net.core.rmem_default value, which can also be adjusted with sysctl:

sysctl -w net.core.rmem_default=8388608
You can also set the sk->sk_rcvbuf size by calling setsockopt and passing SO_RCVBUF. The maximum you can set with setsockopt is net.core.rmem_max.
You can exceed the net.core.rmem_max limit by calling setsockopt and passing SO_RCVBUFFORCE, but the user running the application will need the CAP_NET_ADMIN capability.
The sk->sk_rmem_alloc value is incremented by calls to skb_set_owner_r, which sets the owner socket of a datagram. We'll see this called later in the UDP layer.
The sk->sk_backlog.len is incremented by calls to sk_add_backlog, which we'll see next.
back to udp_queue_rcv_skb
Once we've verified that the queue is not full, we can continue toward queuing the datagram:
bh_lock_sock(sk);
if (!sock_owned_by_user(sk))
        rc = __udp_queue_rcv_skb(sk, skb);
else if (sk_add_backlog(sk, skb, sk->sk_rcvbuf)) {
        bh_unlock_sock(sk);
        goto drop;
}
bh_unlock_sock(sk);

return rc;
The first step is to determine whether any userland program currently has a system call outstanding against the socket. If not, the datagram can be added to the receive queue with a call to __udp_queue_rcv_skb. If so, the datagram is queued to the backlog instead.
Datagrams on the backlog are added to the receive queue when system calls release the socket with a call to release_sock.
__udp_queue_rcv_skb
__udp_queue_rcv_skb (net/ipv4/udp.c:1422) adds datagrams to the socket's receive queue and bumps statistics counters if the datagram could not be added:
rc = sock_queue_rcv_skb(sk, skb);
if (rc < 0) {
        int is_udplite = IS_UDPLITE(sk);

        /* Note that an ENOMEM error is charged twice */
        if (rc == -ENOMEM)
                UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_RCVBUFERRORS,
                                 is_udplite);

        UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
        kfree_skb(skb);
        trace_udp_fail_queue_rcv_skb(rc, sk);
        return -1;
}
To add the datagram to the queue, sock_queue_rcv_skb is called.
sock_queue_rcv_skb
sock_queue_rcv_skb (net/core/sock.c:388) does a few things before adding the datagram to the queue:
- The socket's allocated memory is checked to determine if it exceeds the receive buffer size. If so, the drop count for the socket is incremented.
- Next, sk_filter is used to process any Berkeley Packet Filter filters that have been attached to the socket.
- sk_rmem_schedule is run to ensure sufficient receive buffer space exists to accept this datagram.
- Next, the size of the datagram is charged to the socket with a call to skb_set_owner_r. This increments sk->sk_rmem_alloc.
- The data is added to the queue with a call to __skb_queue_tail.
- Finally, any processes waiting for data to arrive on the socket are notified with a call to the sk_data_ready notification handler function.
end
This is how data arrives from the network and ends up on the receive queue of a socket, ready to be read by a user process.