Linux routing tables: the route command
2022-06-02 18:56:14
1. Preface
https://blog.csdn.net/vevenlcf/article/details/48026965
The difference between host routes and network routes is described at: https://blog.csdn.net/buhuiguowang/article/details/81026050
2. route command options
~/workspace/DCU-LEDE$ man route > log.txt
~/workspace/DCU-LEDE$ cat log.txt
ROUTE(8) Linux System Administrator's Manual ROUTE(8)
NAME
route - show / manipulate the IP routing table
SYNOPSIS
route [-CFvnNee] [-A family |-4|-6]
route [-v] [-A family |-4|-6] add [-net|-host] target [netmask Nm]
[gw Gw] [metric N] [mss M] [window W] [irtt I] [reject] [mod]
[dyn] [reinstate] [[dev] If]
route [-v] [-A family |-4|-6] del [-net|-host] target [gw Gw] [net‐
mask Nm] [metric M] [[dev] If]
route [-V] [--version] [-h] [--help]
DESCRIPTION
Route manipulates the kernel's IP routing tables. Its primary use is
to set up static routes to specific hosts or networks via an interface
after it has been configured with the ifconfig(8) program.
When the add or del options are used, route modifies the routing
tables. Without these options, route displays the current contents of
the routing tables.
OPTIONS
-A family
use the specified address family (eg `inet'). Use route --help
for a full list. You can use -6 as an alias for --inet6 and -4
as an alias for -A inet
-F operate on the kernel's FIB (Forwarding Information Base) rout‐
ing table. This is the default.
-C operate on the kernel's routing cache.
-v select verbose operation.
-n show numerical addresses instead of trying to determine sym‐
bolic host names. This is useful if you are trying to determine
why the route to your nameserver has vanished.
-e use netstat(8)-format for displaying the routing table. -ee
will generate a very long line with all parameters from the
routing table.
del delete a route.
add add a new route.
target the destination network or host. You can provide an addresses
or symbolic network or host name. Optionally you can use /pre‐
fixlen notation instead of using the netmask option.
-net the target is a network.
-host the target is a host.
netmask NM
when adding a network route, the netmask to be used.
gw GW route packets via a gateway.
NOTE: The specified gateway must be reachable first. This usu‐
ally means that you have to set up a static route to the gate‐
way beforehand. If you specify the address of one of your local
interfaces, it will be used to decide about the interface to
which the packets should be routed to. This is a BSDism compat‐
ibility hack.
metric M
set the metric field in the routing table (used by routing dae‐
mons) to M. If this option is not specified the metric for
inet6 (IPv6) address family defaults to '1', for inet (IPv4) it
defaults to '0'. You should always specify an explicit metric
value to not rely on those defaults - they also differ from
iproute2.
mss M sets MTU (Maximum Transmission Unit) of the route to M bytes.
Note that the current implementation of the route command does
not allow the option to set the Maximum Segment Size (MSS).
window W
set the TCP window size for connections over this route to W
bytes. This is typically only used on AX.25 networks and with
drivers unable to handle back to back frames.
irtt I set the initial round trip time (irtt) for TCP connections over
this route to I milliseconds (1-12000). This is typically only
used on AX.25 networks. If omitted the RFC 1122 default of
300ms is used.
reject install a blocking route, which will force a route lookup to
fail. This is for example used to mask out networks before
using the default route. This is NOT for firewalling.
mod, dyn, reinstate
install a dynamic or modified route. These flags are for diag‐
nostic purposes, and are generally only set by routing daemons.
dev If force the route to be associated with the specified device, as
the kernel will otherwise try to determine the device on its
own (by checking already existing routes and device specifica‐
tions, and where the route is added to). In most normal net‐
works you won't need this.
If dev If is the last option on the command line, the word dev
may be omitted, as it's the default. Otherwise the order of the
route modifiers (metric netmask gw dev) doesn't matter.
EXAMPLES
route add -net 127.0.0.0 netmask 255.0.0.0 metric 1024 dev lo
adds the normal loopback entry, using netmask 255.0.0.0 and
associated with the "lo" device (assuming this device was pre‐
viously set up correctly with ifconfig(8)).
route add -net 192.56.76.0 netmask 255.255.255.0 metric 1024 dev eth0
adds a route to the local network 192.56.76.x via "eth0". The
word "dev" can be omitted here.
route del default
deletes the current default route, which is labeled "default"
or 0.0.0.0 in the destination field of the current routing ta‐
ble.
route del -net 192.56.76.0 netmask 255.255.255.0
deletes the route. Since the Linux routing kernel uses class‐
less addressing, you pretty much always have to specify the
netmask that is same as as seen in 'route -n' listing.
route add default gw mango
adds a default route (which will be used if no other route
matches). All packets using this route will be gatewayed
through the address of a node named "mango". The device which
will actually be used for that route depends on how we can
reach "mango" - "mango" must be on directly reachable route.
route add mango sl0
Adds the route to the host named "mango" via the SLIP interface
(assuming that "mango" is the SLIP host).
route add -net 192.57.66.0 netmask 255.255.255.0 gw mango
This command adds the net "192.57.66.x" to be gatewayed through
the former route to the SLIP interface.
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
This is an obscure one documented so people know how to do it.
This sets all of the class D (multicast) IP routes to go via
"eth0". This is the correct normal configuration line with a
multicasting kernel.
route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject
This installs a rejecting route for the private network
"10.x.x.x."
route -6 add 2001:0002::/48 metric 1 dev eth0
This adds a IPv6 route with the specified metric to be directly
reachable via eth0.
OUTPUT
The output of the kernel routing table is organized in the following
columns
Destination
The destination network or destination host.
Gateway
The gateway address or '*' if none set.
Genmask
The netmask for the destination net; '255.255.255.255' for a
host destination and '0.0.0.0' for the default route.
Flags Possible flags include
U (route is up)
H (target is a host)
G (use gateway)
R (reinstate route for dynamic routing)
D (dynamically installed by daemon or redirect)
M (modified from routing daemon or redirect)
A (installed by addrconf)
C (cache entry)
! (reject route)
Metric The 'distance' to the target (usually counted in hops).
Ref Number of references to this route. (Not used in the Linux ker‐
nel.)
Use Count of lookups for the route. Depending on the use of -F and
-C this will be either route cache misses (-F) or hits (-C).
Iface Interface to which packets for this route will be sent.
MSS Default maximum segment size for TCP connections over this
route.
Window Default window size for TCP connections over this route.
irtt Initial RTT (Round Trip Time). The kernel uses this to guess
about the best TCP protocol parameters without waiting on (pos‐
sibly slow) answers.
HH (cached only)
The number of ARP entries and cached routes that refer to the
hardware header cache for the cached route. This will be -1 if
a hardware address is not needed for the interface of the
cached route (e.g. lo).
Arp (cached only)
Whether or not the hardware address for the cached route is up
to date.
FILES
/proc/net/ipv6_route
/proc/net/route
/proc/net/rt_cache
SEE ALSO
ifconfig(8), netstat(8), arp(8), rarp(8), ip(8)
HISTORY
Route for Linux was originally written by Fred N. van Kempen,
<[email protected]> and then modified by Johannes Stille and
Linus Torvalds for pl15. Alan Cox added the mss and window options for
Linux 1.1.22. irtt support and merged with netstat from Bernd Ecken‐
fels.
AUTHOR
Currently maintained by Phil Blundell <[email protected]> and
Bernd Eckenfels <[email protected]>.
net-tools 2014-02-17 ROUTE(8)
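The Flags column described in the OUTPUT section above is a compact rendering of a bitmask. As a rough illustration, the sketch below turns an rt_flags value into such a string using the RTF_* constants from glibc's <net/route.h>. The function name route_flags_to_str and the order in which the letters are appended are assumptions made for this example, not something taken from the net-tools source.

```c
#include <assert.h>
#include <net/route.h>   /* RTF_UP, RTF_GATEWAY, RTF_HOST, ... (Linux/glibc) */
#include <stddef.h>

/* Hypothetical helper: write the flag letters for `flags` into `out`.
 * `out` must have room for at least 8 bytes (7 letters plus the NUL). */
void route_flags_to_str(unsigned int flags, char *out)
{
    size_t n = 0;

    assert(out != NULL);
    if (flags & RTF_UP)        out[n++] = 'U';   /* route is up */
    if (flags & RTF_GATEWAY)   out[n++] = 'G';   /* use gateway */
    if (flags & RTF_HOST)      out[n++] = 'H';   /* target is a host */
    if (flags & RTF_REINSTATE) out[n++] = 'R';   /* reinstate route */
    if (flags & RTF_DYNAMIC)   out[n++] = 'D';   /* dynamically installed */
    if (flags & RTF_MODIFIED)  out[n++] = 'M';   /* modified */
    if (flags & RTF_REJECT)    out[n++] = '!';   /* reject route */
    out[n] = '\0';
}
```

With this sketch, a default route through a gateway (RTF_UP | RTF_GATEWAY) is rendered as UG, and a host route on a directly attached interface (RTF_UP | RTF_HOST) as UH.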
3. route source analysis: busybox
Suppose the following routes are configured from a shell.
Add routes to hosts
# route add -host 192.168.1.2 dev eth0
# route add -host 10.20.30.148 gw 10.20.30.40 # add a route to host 10.20.30.148 via gateway 10.20.30.40
Add routes to networks
# route add -net 10.20.30.40 netmask 255.255.255.248 eth0 # add the network 10.20.30.40
# route add -net 10.20.30.48 netmask 255.255.255.248 gw 10.20.30.41 # add the network 10.20.30.48
# route add -net 192.168.1.0/24 eth1
Add the default route
# route add default gw 192.168.1.1
Delete routes
# route del -host 192.168.1.2 dev eth0:0
# route del -host 10.20.30.148 gw 10.20.30.40
# route del -net 10.20.30.40 netmask 255.255.255.248 eth0
# route del -net 10.20.30.48 netmask 255.255.255.248 gw 10.20.30.41
# route del -net 192.168.1.0/24 eth1
# route del default gw 192.168.1.1
Each of these commands ends up in busybox's own route applet, route.c:
int route_main(int argc UNUSED_PARAM, char **argv)
{
unsigned opt;
int what;
char *family;
char **p;
/*
route add -net 192.56.76.0 netmask 255.255.255.0 dev eth0 // add a static route
route add default gw 192.168.0.1 // add the default route
route del -net 192.168.1.0/24 gw 192.168.0.1 // delete a route
route -n // show the routing table
*/
/* First, remap '-net' and '-host' to avoid getopt problems. */
p = argv;
while (*++p) {
if (strcmp(*p, "-net") == 0 || strcmp(*p, "-host") == 0) {
p[0][0] = '#';
}
}
opt = getopt32(argv, "A:ne", &family);
if ((opt & ROUTE_OPT_A) && strcmp(family, "inet") != 0) {
#if ENABLE_FEATURE_IPV6
if (strcmp(family, "inet6") == 0) {
opt |= ROUTE_OPT_INET6; /* Set flag for ipv6. */
} else
#endif
bb_show_usage();
}
argv += optind;
/* No more args means display the routing table. */
if (!*argv) { // the command was plain "route": display all routes
int noresolve = (opt & ROUTE_OPT_n) ? 0x0fff : 0;
#if ENABLE_FEATURE_IPV6
if (opt & ROUTE_OPT_INET6)
INET6_displayroutes();
else
#endif
bb_displayroutes(noresolve, opt & ROUTE_OPT_e);
fflush_stdout_and_exit(EXIT_SUCCESS);
}
/* Check verb. At the moment, must be add, del, or delete. */
what = kw_lookup(tbl_verb, &argv);
if (!what || !*argv) { /* Unknown verb or no more args. */
bb_show_usage();
}
#if ENABLE_FEATURE_IPV6
if (opt & ROUTE_OPT_INET6)
INET6_setroute(what, argv);
else
#endif
INET_setroute(what, argv); // what means: add, del or delete
return EXIT_SUCCESS;
}
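The first loop in route_main is easy to overlook: it overwrites the leading '-' of "-net" and "-host" with '#' so that the option parser does not treat them as option clusters. The sketch below isolates that step in plain C. The function name remap_net_host is made up for this example, and the strings passed in must be writable (string literals are not).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-alone version of the remapping loop in route_main.
 * args must be NULL-terminated; args[0] (the program name) is skipped. */
void remap_net_host(char **args)
{
    assert(args != NULL && args[0] != NULL);
    for (char **p = args + 1; *p; p++) {
        if (strcmp(*p, "-net") == 0 || strcmp(*p, "-host") == 0)
            (*p)[0] = '#';   /* "-net" becomes "#net", "-host" becomes "#host" */
    }
}
```

After the call, an argument vector such as route add -net 192.168.1.0/24 eth1 carries "#net" in place of "-net", and every other argument is unchanged.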
Because this is IPv4, the function above calls INET_setroute():
static NOINLINE void INET_setroute(int action, char **args)
{
/* char buffer instead of bona-fide struct avoids aliasing warning */
char rt_buf[sizeof(struct rtentry)];
struct rtentry *const rt = (void *)rt_buf;
const char *netmask = NULL;
int skfd, isnet, xflag;
...
/* Create a socket to the INET kernel. */
skfd = xsocket(AF_INET, SOCK_DGRAM, 0);
if (action == RTACTION_ADD)
xioctl(skfd, SIOCADDRT, rt);
else
xioctl(skfd, SIOCDELRT, rt);
if (ENABLE_FEATURE_CLEAN_UP) close(skfd);
}
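This function mainly initializes the structure behind struct rtentry *const rt = (void *)rt_buf. At the end it calls xioctl with cmd = SIOCADDRT or SIOCDELRT and passes rt as the argument, which hands the request to the kernel.
4. route in the kernel source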
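To make the initialization step more concrete, here is a minimal user-space sketch of how a struct rtentry for a command like route add -host 10.20.30.148 gw 10.20.30.40 could be filled in. The helper names set_ipv4 and fill_host_route are hypothetical and this is not busybox's code. It only prepares the structure and does not call ioctl.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <net/route.h>     /* struct rtentry, RTF_* (Linux/glibc) */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Store a dotted-quad IPv4 address in a generic sockaddr slot.
 * Returns 0 on success, -1 if the text is not a valid address. */
int set_ipv4(struct sockaddr *sa, const char *text)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, text, &sin.sin_addr) != 1)
        return -1;
    memcpy(sa, &sin, sizeof(sin));
    return 0;
}

/* Prepare rt for a host route to dst that goes through gateway gw.
 * rt_genmask is left zeroed: the kernel code shown in section 4 only
 * reads it when RTF_HOST is not set. */
int fill_host_route(struct rtentry *rt, const char *dst, const char *gw)
{
    assert(rt != NULL);
    memset(rt, 0, sizeof(*rt));
    if (set_ipv4(&rt->rt_dst, dst) != 0)
        return -1;
    if (set_ipv4(&rt->rt_gateway, gw) != 0)
        return -1;
    rt->rt_flags = RTF_UP | RTF_HOST | RTF_GATEWAY;
    return 0;
}
```

A caller would then pass the structure to ioctl(fd, SIOCADDRT, &rt) on an AF_INET datagram socket, which is the call that xioctl wraps. The kernel rejects that call with EPERM unless the process has CAP_NET_ADMIN.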
int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
{
struct sock *sk = sock->sk;
int err = 0;
struct net *net = sock_net(sk);
switch (cmd) {
......
case SIOCADDRT:
case SIOCDELRT:
case SIOCRTMSG:
err = ip_rt_ioctl(net, cmd, (void __user *)arg);
break;
......
}
int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
{
struct fib_config cfg;
struct rtentry rt;
int err;
switch (cmd) {
case SIOCADDRT: /* Add a route */
case SIOCDELRT: /* Delete a route */
if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
return -EPERM;
if (copy_from_user(&rt, arg, sizeof(rt))) // copy the user-space argument into a struct rtentry
return -EFAULT;
rtnl_lock();
err = rtentry_to_fib_config(net, cmd, &rt, &cfg); // fill cfg from the fields of the rtentry
if (err == 0) {
struct fib_table *tb;
if (cmd == SIOCDELRT) {
tb = fib_get_table(net, cfg.fc_table); //cfg.fc_table = 0
if (tb)
err = fib_table_delete(tb, &cfg);
else
err = -ESRCH;
} else {
tb = fib_new_table(net, cfg.fc_table); //cfg.fc_table = 0
if (tb)
err = fib_table_insert(tb, &cfg);
else
err = -ENOBUFS;
}
/* allocated by rtentry_to_fib_config() */
kfree(cfg.fc_mx);
}
rtnl_unlock();
return err;
}
return -EINVAL;
}
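This function mainly does the following:
a. rtentry_to_fib_config() converts the rtentry structure into a fib_config;
First, look at the members of struct fib_config (the macros and enums that give the fields their values are shown inline):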
struct fib_config {
u8 fc_dst_len; // number of significant bits in the destination address (prefix length)
u8 fc_tos;
/* rtm_protocol */
#define RTPROT_UNSPEC 0
#define RTPROT_REDIRECT 1 /* Route installed by ICMP redirects;
not used by current IPv4 */
#define RTPROT_KERNEL 2 /* Route installed by kernel */
#define RTPROT_BOOT 3 /* Route installed during boot */
#define RTPROT_STATIC 4 /* Route installed by administrator */
u8 fc_protocol; // see the RTPROT_* macros above
enum rt_scope_t {
RT_SCOPE_UNIVERSE=0,
/* User defined values */
RT_SCOPE_SITE=200,
RT_SCOPE_LINK=253,
RT_SCOPE_HOST=254,
RT_SCOPE_NOWHERE=255
};
u8 fc_scope; // e.g. RT_SCOPE_HOST, RT_SCOPE_LINK
/* rtm_type */
enum {
RTN_UNSPEC,
RTN_UNICAST, /* Gateway or direct route */
RTN_LOCAL, /* Accept locally */
RTN_BROADCAST, /* Accept locally as broadcast,
send as broadcast */
RTN_ANYCAST, /* Accept locally as broadcast,
but send as unicast */
RTN_MULTICAST, /* Multicast route */
RTN_BLACKHOLE, /* Drop */
RTN_UNREACHABLE, /* Destination is unreachable */
RTN_PROHIBIT, /* Administratively prohibited */
RTN_THROW, /* Not in this table */
RTN_NAT, /* Translate this address */
RTN_XRESOLVE, /* Use external resolver */
__RTN_MAX
};
u8 fc_type; // route type, see the RTN_* enum above
/* 3 bytes unused */
u32 fc_table;
__be32 fc_dst; // destination address
__be32 fc_gw; // gateway address
int fc_oif; // = dev->ifindex, index of the output interface
u32 fc_flags;
u32 fc_priority;
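__be32 fc_prefsrc; // = ifa->ifa_local, the preferred local source address
struct nlattr *fc_mx; // route metrics, stored as a sequence of netlink attributes
struct rtnexthop *fc_mp; // multipath nexthop configuration (not a count of fc_mx entries)
int fc_mx_len; // length in bytes of the data that fc_mx points to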
int fc_mp_len;
u32 fc_flow;
/* Modifiers to NEW request */
#define NLM_F_REPLACE 0x100 /* Override existing */
#define NLM_F_EXCL 0x200 /* Do not touch, if it exists */
#define NLM_F_CREATE 0x400 /* Create, if it does not exist */
#define NLM_F_APPEND 0x800 /* Add to end of list */
u32 fc_nlflags; // see the NLM_F_* macros above
struct nl_info fc_nlinfo;
};
static int rtentry_to_fib_config(struct net *net, int cmd, struct rtentry *rt,
struct fib_config *cfg)
{
__be32 addr;
int plen;
memset(cfg, 0, sizeof(*cfg));
cfg->fc_nlinfo.nl_net = net;
if (rt->rt_dst.sa_family != AF_INET) // address family of the destination
return -EAFNOSUPPORT;
/*
* Check mask for validity:
* a) it must be contiguous.
* b) destination must have all host bits clear.
* c) if application forgot to set correct family (AF_INET),
* reject request unless it is absolutely clear i.e.
* both family and mask are zero.
*/
plen = 32;
addr = sk_extract_addr(&rt->rt_dst); // extract the destination address
if (!(rt->rt_flags & RTF_HOST)) { // not a host route, i.e. a network route
__be32 mask = sk_extract_addr(&rt->rt_genmask); // extract the netmask
if (rt->rt_genmask.sa_family != AF_INET) { // the netmask family is not AF_INET
if (mask || rt->rt_genmask.sa_family) // only an all-zero mask with family 0 is tolerated
return -EAFNOSUPPORT;
}
// this is a network route; for example, with 192.168.1.0/24, addr=192.168.1.0 and mask=255.255.255.0
// bad_mask() returns true when the mask or address is not valid for a network route
if (bad_mask(mask, addr))
return -EINVAL;
plen = inet_mask_len(mask); // prefix length of the mask
}
}
cfg->fc_dst_len = plen;
cfg->fc_dst = addr;
if (cmd != SIOCDELRT) {
cfg->fc_nlflags = NLM_F_CREATE;
cfg->fc_protocol = RTPROT_BOOT;
}
if (rt->rt_metric)
cfg->fc_priority = rt->rt_metric - 1;
if (rt->rt_flags & RTF_REJECT) {
cfg->fc_scope = RT_SCOPE_HOST;
cfg->fc_type = RTN_UNREACHABLE;
return 0;
}
cfg->fc_scope = RT_SCOPE_NOWHERE;
cfg->fc_type = RTN_UNICAST;
if (rt->rt_dev) {
char *colon;
struct net_device *dev;
char devname[IFNAMSIZ];
if (copy_from_user(devname, rt->rt_dev, IFNAMSIZ-1))
return -EFAULT;
devname[IFNAMSIZ-1] = 0;
colon = strchr(devname, ':'); // a colon marks an alias, e.g. eth0:1 is an alias of eth0
if (colon)
*colon = 0;
dev = __dev_get_by_name(net, devname); // look up the device by its interface name
if (!dev)
return -ENODEV;
cfg->fc_oif = dev->ifindex;
if (colon) {
struct in_ifaddr *ifa;
struct in_device *in_dev = __in_dev_get_rtnl(dev);
if (!in_dev)
return -ENODEV;
*colon = ':';
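for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) // walk the addresses configured on the device
if (strcmp(ifa->ifa_label, devname) == 0) // does the address label match the alias name?
break;
if (ifa == NULL)
return -ENODEV;
cfg->fc_prefsrc = ifa->ifa_local; // use the alias's local address as the preferred source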
}
}
addr = sk_extract_addr(&rt->rt_gateway); // extract the gateway address
if (rt->rt_gateway.sa_family == AF_INET && addr) {
cfg->fc_gw = addr; // gateway address
if (rt->rt_flags & RTF_GATEWAY && // the route goes through a gateway
inet_addr_type(net, addr) == RTN_UNICAST)
cfg->fc_scope = RT_SCOPE_UNIVERSE;
}
if (cmd == SIOCDELRT)
return 0;
if (rt->rt_flags & RTF_GATEWAY && !cfg->fc_gw) // a gateway route without a gateway address is rejected
return -EINVAL;
if (cfg->fc_scope == RT_SCOPE_NOWHERE)
cfg->fc_scope = RT_SCOPE_LINK;
if (rt->rt_flags & (RTF_MTU | RTF_WINDOW | RTF_IRTT)) { // optional metrics requested through the route flags
struct nlattr *mx;
int len = 0;
mx = kzalloc(3 * nla_total_size(4), GFP_KERNEL);
if (mx == NULL)
return -ENOMEM;
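// note the put_rtax() interface: mx points to the attribute buffer (it is not a function pointer); put_rtax() writes at offset len inside mx and returns the new length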
if (rt->rt_flags & RTF_MTU)
len = put_rtax(mx, len, RTAX_ADVMSS, rt->rt_mtu - 40);
if (rt->rt_flags & RTF_WINDOW)
len = put_rtax(mx, len, RTAX_WINDOW, rt->rt_window);
if (rt->rt_flags & RTF_IRTT)
len = put_rtax(mx, len, RTAX_RTT, rt->rt_irtt << 3);
cfg->fc_mx = mx;
cfg->fc_mx_len = len;
}
return 0;
}
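The mask validation in rtentry_to_fib_config (the bad_mask() check followed by inet_mask_len()) can be tried out in user space. The sketch below combines both steps into one hypothetical function, checked_prefix_len. It follows the two rules from the comment in the kernel code, that the mask must be contiguous and the destination must have all host bits clear, but it is a re-implementation for illustration and not the kernel's own helper.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask_be and addr_be are in network byte order.
 * Returns the prefix length (0..32), or -1 if the mask is not contiguous
 * or the address has bits set in the host part. */
int checked_prefix_len(uint32_t mask_be, uint32_t addr_be)
{
    uint32_t hostbits = ~ntohl(mask_be);   /* 1-bits mark the host part */
    uint32_t addr = ntohl(addr_be);
    int plen = 32;

    if (addr & hostbits)                   /* rule b: host bits must be clear */
        return -1;
    if (hostbits & (hostbits + 1))         /* rule a: host part must look like 0..01..1 */
        return -1;
    while (hostbits) {                     /* each host bit shortens the prefix by one */
        plen--;
        hostbits >>= 1;
    }
    return plen;
}
```

For the routes used earlier in this post, 192.168.1.0 with mask 255.255.255.0 gives 24 and 10.20.30.40 with mask 255.255.255.248 gives 29, while 192.168.1.5 with mask 255.255.255.0 is rejected because host bits are set.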
b. Handling of cmd == SIOCDELRT;
if (cmd == SIOCDELRT) {
tb = fib_get_table(net, cfg.fc_table); //cfg.fc_table = 0
if (tb)
err = fib_table_delete(tb, &cfg); // see its implementation below
// look up the id argument in the hash chains; return tb on a match, otherwise NULL
struct fib_table *fib_get_table(struct net *net, u32 id)
{
struct fib_table *tb;
struct hlist_head *head;
unsigned int h;
if (id == 0)
id = RT_TABLE_MAIN;
h = id & (FIB_TABLE_HASHSZ - 1); // h = id & 0xff when FIB_TABLE_HASHSZ is 256
rcu_read_lock();
// for how fib_table_hash[*] is created, see: https://blog.csdn.net/guodong1010/article/details/52245555
head = &net->ipv4.fib_table_hash[h]; // the chains are populated inside fib_new_table()
hlist_for_each_entry_rcu(tb, head, tb_hlist) { // walk the net->ipv4.fib_table_hash chain looking for a table with a matching id
if (tb->tb_id == id) {
rcu_read_unlock();
return tb;
}
}
rcu_read_unlock();
return NULL;
}
int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
{
struct trie *t = (struct trie *) tb->tb_data;
u32 key, mask;
int plen = cfg->fc_dst_len;
u8 tos = cfg->fc_tos;
struct fib_alias *fa, *fa_to_delete;
struct list_head *fa_head;
struct leaf *l;
struct leaf_info *li;
if (plen > 32)
return -EINVAL;
key = ntohl(cfg->fc_dst); // destination address in host byte order
mask = ntohl(inet_make_mask(plen)); // netmask for the prefix length
if (key & ~mask) // reject a key with bits set in the host part (mask bits 1 = network part, 0 = host part)
return -EINVAL;
key = key & mask; // network address
l = fib_find_node(t, key); // returns the leaf for this key
if (!l)
return -ESRCH;
li = find_leaf_info(l, plen); // find the leaf_info with this prefix length on the leaf's list
if (!li)
return -ESRCH;
fa_head = &li->falh; // head of the fib_alias list held by the leaf_info
fa = fib_find_alias(fa_head, tos, 0);
if (!fa)
return -ESRCH;
pr_debug("Deleting %08x/%d tos=%d t=%p\n", key, plen, tos, t);
fa_to_delete = NULL;
fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list); // step back one entry so that the loop below starts at fa
list_for_each_entry_continue(fa, fa_head, fa_list) {
struct fib_info *fi = fa->fa_info;
if (fa->fa_tos != tos)
break;
if ((!cfg->fc_type || fa->fa_type == cfg->fc_type) && // route type
(cfg->fc_scope == RT_SCOPE_NOWHERE ||
fa->fa_info->fib_scope == cfg->fc_scope) && // route scope, e.g. host, link...
(!cfg->fc_prefsrc ||
fi->fib_prefsrc == cfg->fc_prefsrc) && // preferred source address matches
(!cfg->fc_protocol ||
fi->fib_protocol == cfg->fc_protocol) && // routing protocol
fib_nh_match(cfg, fi) == 0) {
fa_to_delete = fa;
break;
}
}
if (!fa_to_delete)
return -ESRCH;
fa = fa_to_delete;
rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
&cfg->fc_nlinfo, 0);
list_del_rcu(&fa->fa_list);
if (!plen)
tb->tb_num_default--;
if (list_empty(fa_head)) {
hlist_del_rcu(&li->hlist);
free_leaf_info(li);
}
if (hlist_empty(&l->list))
trie_leaf_remove(t, l);
if (fa->fa_state & FA_S_ACCESSED)
rt_cache_flush(cfg->fc_nlinfo.nl_net);
fib_release_info(fa->fa_info);
alias_free_mem_rcu(fa);
return 0;
}
c. Handling of cmd == SIOCADDRT;
tb = fib_new_table(net, cfg.fc_table); //cfg.fc_table = 0
if (tb)
err = fib_table_insert(tb, &cfg);
struct fib_table *fib_new_table(struct net *net, u32 id)
{
struct fib_table *tb;
unsigned int h;
if (id == 0)
id = RT_TABLE_MAIN;
tb = fib_get_table(net, id); // check whether a table with this id is already on its hash chain; if so return it, otherwise fall through to fib_trie_table()
if (tb)
return tb;
tb = fib_trie_table(id); // allocate a new fib_table
if (!tb)
return NULL;
switch (id) {
case RT_TABLE_LOCAL:
net->ipv4.fib_local = tb;
break;
case RT_TABLE_MAIN:
net->ipv4.fib_main = tb;
break;
case RT_TABLE_DEFAULT:
net->ipv4.fib_default = tb;
break;
default:
break;
}
h = id & (FIB_TABLE_HASHSZ - 1);
hlist_add_head_rcu(&tb->tb_hlist, &net->ipv4.fib_table_hash[h]); // add tb (struct fib_table *tb) to its hash chain
return tb;
}
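The relationship between fib_get_table and fib_new_table is a plain "look up, else create" pattern over an array of hash chains. The sketch below reproduces that pattern in user space with a singly linked list per bucket. It leaves out RCU, the per-namespace state and the trie behind each table, and the names struct table, get_table and new_table are hypothetical. The bucket count of 256 is an assumption; 254 is the value of RT_TABLE_MAIN.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_HASHSZ 256u   /* assumed bucket count, must be a power of two */
#define TABLE_MAIN   254u   /* same value as RT_TABLE_MAIN */

struct table {
    uint32_t id;
    struct table *next;     /* next table on the same hash chain */
};

static struct table *buckets[TABLE_HASHSZ];

/* Return the table with this id, or NULL if none was created yet. */
struct table *get_table(uint32_t id)
{
    if (id == 0)
        id = TABLE_MAIN;    /* id 0 means "the main table" */
    for (struct table *t = buckets[id & (TABLE_HASHSZ - 1)]; t; t = t->next)
        if (t->id == id)
            return t;
    return NULL;
}

/* Return the existing table with this id, or create and link a new one. */
struct table *new_table(uint32_t id)
{
    struct table *t;

    assert((TABLE_HASHSZ & (TABLE_HASHSZ - 1)) == 0);
    if (id == 0)
        id = TABLE_MAIN;
    t = get_table(id);
    if (t)
        return t;
    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->id = id;
    t->next = buckets[id & (TABLE_HASHSZ - 1)];   /* push on the chain head */
    buckets[id & (TABLE_HASHSZ - 1)] = t;
    return t;
}
```

This also shows why the delete path can use the read-only lookup while the add path uses the creating variant: deleting from a table that was never created can only fail, whereas the first add has to bring the table into existence.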
int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
{
struct trie *t = (struct trie *) tb->tb_data;
struct fib_alias *fa, *new_fa;
struct list_head *fa_head = NULL;
struct fib_info *fi;
int plen = cfg->fc_dst_len; // prefix length of the destination (number of network bits)
u8 tos = cfg->fc_tos; // type of service
u32 key, mask;
int err;
struct leaf *l;
if (plen > 32)
return -EINVAL;
key = ntohl(cfg->fc_dst); // destination address in host byte order
pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
mask = ntohl(inet_make_mask(plen)); // compute the netmask from the prefix length
if (key & ~mask) // the host bits must all be zero
return -EINVAL;
key = key & mask; // address & mask = network address
fi = fib_create_info(cfg); // obtain a struct fib_info for this configuration
if (IS_ERR(fi)) {
err = PTR_ERR(fi);
goto err;
}
l = fib_find_node(t, key); // look up the leaf by key
fa = NULL;
if (l) { // a leaf for this key exists
fa_head = get_fa_head(l, plen); // head of the fib_alias list in the leaf_info for this prefix length
fa = fib_find_alias(fa_head, tos, fi->fib_priority); // search the list for an fa with this tos and priority
}
/* Now fa, if non-NULL, points to the first fib alias
* with the same keys [prefix,tos,priority], if such key already
* exists or to the node before which we will insert new one.
*
* If fa is NULL, we will need to allocate a new one and
* insert to the head of f.
*
* If f is NULL, no fib node matched the destination key
* and we need to allocate a new one of those as well.
*/
if (fa && fa->fa_tos == tos &&
fa->fa_info->fib_priority == fi->fib_priority) { // an fa with the same tos and priority already exists
struct fib_alias *fa_first, *fa_match;
err = -EEXIST;
if (cfg->fc_nlflags & NLM_F_EXCL)
goto out;
/* We have 2 goals:
* 1. Find exact match for type, scope, fib_info to avoid
* duplicate routes
* 2. Find next 'fa' (or head), NLM_F_APPEND inserts before it
*/
fa_match = NULL;
fa_first = fa;
fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
list_for_each_entry_continue(fa, fa_head, fa_list) {
if (fa->fa_tos != tos)
break;
if (fa->fa_info->fib_priority != fi->fib_priority)
break;
if (fa->fa_type == cfg->fc_type &&
fa->fa_info == fi) {
fa_match = fa;
break;
}
}
if (cfg->fc_nlflags & NLM_F_REPLACE) { // the entry exists: replace the old one
struct fib_info *fi_drop;
u8 state;
fa = fa_first;
if (fa_match) {
if (fa == fa_match)
err = 0;
goto out; // an exact match was found above, so leave; otherwise a new_fa is created below
}
err = -ENOBUFS;
new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
if (new_fa == NULL)
goto out;
fi_drop = fa->fa_info;
new_fa->fa_tos = fa->fa_tos;
new_fa->fa_info = fi;
new_fa->fa_type = cfg->fc_type;
state = fa->fa_state;
new_fa->fa_state = state & ~FA_S_ACCESSED;
list_replace_rcu(&fa->fa_list, &new_fa->fa_list);
alias_free_mem_rcu(fa);
fib_release_info(fi_drop);
if (state & FA_S_ACCESSED)
rt_cache_flush(cfg->fc_nlinfo.nl_net);
rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
goto succeeded;
}
/* Error if we find a perfect match which
* uses the same scope, type, and nexthop
* information.
*/
if (fa_match) // an exact match exists: leave (err is still -EEXIST)
goto out;
if (!(cfg->fc_nlflags & NLM_F_APPEND))
fa = fa_first;
}
err = -ENOENT;
if (!(cfg->fc_nlflags & NLM_F_CREATE))
goto out;
err = -ENOBUFS;
new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL); // no identical fa was found above, so allocate a new one
if (new_fa == NULL)
goto out;
// initialize the fib_alias structure
new_fa->fa_info = fi; // attach the fi (fib_info) obtained above
// set the key fields
new_fa->fa_tos = tos;
new_fa->fa_type = cfg->fc_type;
new_fa->fa_state = 0;
/*
* Insert new entry to the list.
*/
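if (!fa_head) { // NULL: there is no alias list for this key and prefix length yet
fa_head = fib_insert_node(t, key, plen); // insert a node into the trie; this is the core part, and its internals are not analysed in this post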
if (unlikely(!fa_head)) {
err = -ENOMEM;
goto out_free_new_fa;
}
}
if (!plen)
tb->tb_num_default++;
list_add_tail_rcu(&new_fa->fa_list,
(fa ? &fa->fa_list : fa_head));
rt_cache_flush(cfg->fc_nlinfo.nl_net);
rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
&cfg->fc_nlinfo, 0);
succeeded:
return 0;
out_free_new_fa:
kmem_cache_free(fn_alias_kmem, new_fa);
out:
fib_release_info(fi);
err:
return err;
}
Finally, the function announces the change by sending a routing message:
rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
&cfg->fc_nlinfo, 0);
The handling on the receiving side of this routing message is not repeated here; see https://blog.csdn.net/chenliang0224/article/details/82534489, which describes the receive path.
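On the user-space side, the RTM_NEWROUTE and RTM_DELROUTE notifications produced by rtmsg_fib arrive as netlink messages whose header carries the message type. The sketch below shows only the classification step for a received header. The function name route_event_name and its labels are made up for this example; the constants come from <linux/rtnetlink.h>.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>     /* struct nlmsghdr */
#include <linux/rtnetlink.h>   /* RTM_NEWROUTE, RTM_DELROUTE */
#include <stddef.h>

/* Hypothetical helper: map the type of a routing notification to a label. */
const char *route_event_name(const struct nlmsghdr *nh)
{
    assert(nh != NULL);
    switch (nh->nlmsg_type) {
    case RTM_NEWROUTE:
        return "route added";
    case RTM_DELROUTE:
        return "route deleted";
    default:
        return "other";
    }
}
```

A listener would typically open socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), bind it with nl_groups set to RTMGRP_IPV4_ROUTE, and walk each received buffer with NLMSG_OK and NLMSG_NEXT, calling a function like this one on every header.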