forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathbonding.txt
2827 lines (2179 loc) · 112 KB
/
bonding.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Linux Ethernet Bonding Driver HOWTO
Latest update: 27 April 2011
Initial release : Thomas Davis <tadavis at lbl.gov>
Corrections, HA extensions : 2000/10/03-15 :
- Willy Tarreau <willy at meta-x.org>
- Constantine Gavrilov <const-g at xpert.com>
- Chad N. Tindel <ctindel at ieee dot org>
- Janice Girouard <girouard at us dot ibm dot com>
- Jay Vosburgh <fubar at us dot ibm dot com>
Reorganized and updated Feb 2005 by Jay Vosburgh
Added Sysfs information: 2006/04/24
- Mitch Williams <mitch.a.williams at intel.com>
Introduction
============
The Linux bonding driver provides a method for aggregating
multiple network interfaces into a single logical "bonded" interface.
The behavior of the bonded interfaces depends upon the mode; generally
speaking, modes provide either hot standby or load balancing services.
Additionally, link integrity monitoring may be performed.
The bonding driver originally came from Donald Becker's
beowulf patches for kernel 2.0. It has changed quite a bit since, and
the original tools from extreme-linux and beowulf sites will not work
with this version of the driver.
For new versions of the driver, updated userspace tools, and
who to ask for help, please follow the links at the end of this file.
Table of Contents
=================
1. Bonding Driver Installation
2. Bonding Driver Options
3. Configuring Bonding Devices
3.1 Configuration with Sysconfig Support
3.1.1 Using DHCP with Sysconfig
3.1.2 Configuring Multiple Bonds with Sysconfig
3.2 Configuration with Initscripts Support
3.2.1 Using DHCP with Initscripts
3.2.2 Configuring Multiple Bonds with Initscripts
3.3 Configuring Bonding Manually with Ifenslave
3.3.1 Configuring Multiple Bonds Manually
3.4 Configuring Bonding Manually via Sysfs
3.5 Configuration with Interfaces Support
3.6 Overriding Configuration for Special Cases
3.7 Configuring LACP for 802.3ad mode in a more secure way
4. Querying Bonding Configuration
4.1 Bonding Configuration
4.2 Network Configuration
5. Switch Configuration
6. 802.1q VLAN Support
7. Link Monitoring
7.1 ARP Monitor Operation
7.2 Configuring Multiple ARP Targets
7.3 MII Monitor Operation
8. Potential Trouble Sources
8.1 Adventures in Routing
8.2 Ethernet Device Renaming
8.3 Painfully Slow Or No Failed Link Detection By Miimon
9. SNMP agents
10. Promiscuous mode
11. Configuring Bonding for High Availability
11.1 High Availability in a Single Switch Topology
11.2 High Availability in a Multiple Switch Topology
11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
11.2.2 HA Link Monitoring for Multiple Switch Topology
12. Configuring Bonding for Maximum Throughput
12.1 Maximum Throughput in a Single Switch Topology
12.1.1 MT Bonding Mode Selection for Single Switch Topology
12.1.2 MT Link Monitoring for Single Switch Topology
12.2 Maximum Throughput in a Multiple Switch Topology
12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
12.2.2 MT Link Monitoring for Multiple Switch Topology
13. Switch Behavior Issues
13.1 Link Establishment and Failover Delays
13.2 Duplicated Incoming Packets
14. Hardware Specific Considerations
14.1 IBM BladeCenter
15. Frequently Asked Questions
16. Resources and Links
1. Bonding Driver Installation
==============================
Most popular distro kernels ship with the bonding driver
already available as a module. If your distro does not, or you
have need to compile bonding from source (e.g., configuring and
installing a mainline kernel from kernel.org), you'll need to perform
the following steps:
1.1 Configure and build the kernel with bonding
-----------------------------------------------
The current version of the bonding driver is available in the
drivers/net/bonding subdirectory of the most recent kernel source
(which is available on http://kernel.org). Most users "rolling their
own" will want to use the most recent kernel from kernel.org.
Configure kernel with "make menuconfig" (or "make xconfig" or
"make config"), then select "Bonding driver support" in the "Network
device support" section. It is recommended that you configure the
driver as module since it is currently the only way to pass parameters
to the driver or configure more than one bonding device.
Build and install the new kernel and modules.
1.2 Bonding Control Utility
-------------------------------------
It is recommended to configure bonding via iproute2 (netlink)
or sysfs, the old ifenslave control utility is obsolete.
2. Bonding Driver Options
=========================
Options for the bonding driver are supplied as parameters to the
bonding module at load time, or are specified via sysfs.
Module options may be given as command line arguments to the
insmod or modprobe command, but are usually specified in either the
/etc/modrobe.d/*.conf configuration files, or in a distro-specific
configuration file (some of which are detailed in the next section).
Details on bonding support for sysfs is provided in the
"Configuring Bonding Manually via Sysfs" section, below.
The available bonding driver parameters are listed below. If a
parameter is not specified the default value is used. When initially
configuring a bond, it is recommended "tail -f /var/log/messages" be
run in a separate window to watch for bonding driver error messages.
It is critical that either the miimon or arp_interval and
arp_ip_target parameters be specified, otherwise serious network
degradation will occur during link failures. Very few devices do not
support at least miimon, so there is really no reason not to use it.
Options with textual values will accept either the text name
or, for backwards compatibility, the option value. E.g.,
"mode=802.3ad" and "mode=4" set the same mode.
The parameters are as follows:
active_slave
Specifies the new active slave for modes that support it
(active-backup, balance-alb and balance-tlb). Possible values
are the name of any currently enslaved interface, or an empty
string. If a name is given, the slave and its link must be up in order
to be selected as the new active slave. If an empty string is
specified, the current active slave is cleared, and a new active
slave is selected automatically.
Note that this is only available through the sysfs interface. No module
parameter by this name exists.
The normal value of this option is the name of the currently
active slave, or the empty string if there is no active slave or
the current mode does not use an active slave.
ad_actor_sys_prio
In an AD system, this specifies the system priority. The allowed range
is 1 - 65535. If the value is not specified, it takes 65535 as the
default value.
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
ad_actor_system
In an AD system, this specifies the mac-address for the actor in
protocol packet exchanges (LACPDUs). The value cannot be NULL or
multicast. It is preferred to have the local-admin bit set for this
mac but driver does not enforce it. If the value is not given then
system defaults to using the masters' mac address as actors' system
address.
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
ad_select
Specifies the 802.3ad aggregation selection logic to use. The
possible values and their effects are:
stable or 0
The active aggregator is chosen by largest aggregate
bandwidth.
Reselection of the active aggregator occurs only when all
slaves of the active aggregator are down or the active
aggregator has no slaves.
This is the default value.
bandwidth or 1
The active aggregator is chosen by largest aggregate
bandwidth. Reselection occurs if:
- A slave is added to or removed from the bond
- Any slave's link state changes
- Any slave's 802.3ad association state changes
- The bond's administrative state changes to up
count or 2
The active aggregator is chosen by the largest number of
ports (slaves). Reselection occurs as described under the
"bandwidth" setting, above.
The bandwidth and count selection policies permit failover of
802.3ad aggregations when partial failure of the active aggregator
occurs. This keeps the aggregator with the highest availability
(either in bandwidth or in number of ports) active at all times.
This option was added in bonding version 3.4.0.
ad_user_port_key
In an AD system, the port-key has three parts as shown below -
Bits Use
00 Duplex
01-05 Speed
06-15 User-defined
This defines the upper 10 bits of the port key. The values can be
from 0 - 1023. If not given, the system defaults to 0.
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
all_slaves_active
Specifies that duplicate frames (received on inactive ports) should be
dropped (0) or delivered (1).
Normally, bonding will drop duplicate frames (received on inactive
ports), which is desirable for most users. But there are some times
it is nice to allow duplicate frames to be delivered.
The default value is 0 (drop duplicate frames received on inactive
ports).
arp_interval
Specifies the ARP link monitoring frequency in milliseconds.
The ARP monitor works by periodically checking the slave
devices to determine whether they have sent or received
traffic recently (the precise criteria depends upon the
bonding mode, and the state of the slave). Regular traffic is
generated via ARP probes issued for the addresses specified by
the arp_ip_target option.
This behavior can be modified by the arp_validate option,
below.
If ARP monitoring is used in an etherchannel compatible mode
(modes 0 and 2), the switch should be configured in a mode
that evenly distributes packets across all links. If the
switch is configured to distribute the packets in an XOR
fashion, all replies from the ARP targets will be received on
the same link which could cause the other team members to
fail. ARP monitoring should not be used in conjunction with
miimon. A value of 0 disables ARP monitoring. The default
value is 0.
arp_ip_target
Specifies the IP addresses to use as ARP monitoring peers when
arp_interval is > 0. These are the targets of the ARP request
sent to determine the health of the link to the targets.
Specify these values in ddd.ddd.ddd.ddd format. Multiple IP
addresses must be separated by a comma. At least one IP
address must be given for ARP monitoring to function. The
maximum number of targets that can be specified is 16. The
default value is no IP addresses.
arp_validate
Specifies whether or not ARP probes and replies should be
validated in any mode that supports arp monitoring, or whether
non-ARP traffic should be filtered (disregarded) for link
monitoring purposes.
Possible values are:
none or 0
No validation or filtering is performed.
active or 1
Validation is performed only for the active slave.
backup or 2
Validation is performed only for backup slaves.
all or 3
Validation is performed for all slaves.
filter or 4
Filtering is applied to all slaves. No validation is
performed.
filter_active or 5
Filtering is applied to all slaves, validation is performed
only for the active slave.
filter_backup or 6
Filtering is applied to all slaves, validation is performed
only for backup slaves.
Validation:
Enabling validation causes the ARP monitor to examine the incoming
ARP requests and replies, and only consider a slave to be up if it
is receiving the appropriate ARP traffic.
For an active slave, the validation checks ARP replies to confirm
that they were generated by an arp_ip_target. Since backup slaves
do not typically receive these replies, the validation performed
for backup slaves is on the broadcast ARP request sent out via the
active slave. It is possible that some switch or network
configurations may result in situations wherein the backup slaves
do not receive the ARP requests; in such a situation, validation
of backup slaves must be disabled.
The validation of ARP requests on backup slaves is mainly helping
bonding to decide which slaves are more likely to work in case of
the active slave failure, it doesn't really guarantee that the
backup slave will work if it's selected as the next active slave.
Validation is useful in network configurations in which multiple
bonding hosts are concurrently issuing ARPs to one or more targets
beyond a common switch. Should the link between the switch and
target fail (but not the switch itself), the probe traffic
generated by the multiple bonding instances will fool the standard
ARP monitor into considering the links as still up. Use of
validation can resolve this, as the ARP monitor will only consider
ARP requests and replies associated with its own instance of
bonding.
Filtering:
Enabling filtering causes the ARP monitor to only use incoming ARP
packets for link availability purposes. Arriving packets that are
not ARPs are delivered normally, but do not count when determining
if a slave is available.
Filtering operates by only considering the reception of ARP
packets (any ARP packet, regardless of source or destination) when
determining if a slave has received traffic for link availability
purposes.
Filtering is useful in network configurations in which significant
levels of third party broadcast traffic would fool the standard
ARP monitor into considering the links as still up. Use of
filtering can resolve this, as only ARP traffic is considered for
link availability purposes.
This option was added in bonding version 3.1.0.
arp_all_targets
Specifies the quantity of arp_ip_targets that must be reachable
in order for the ARP monitor to consider a slave as being up.
This option affects only active-backup mode for slaves with
arp_validation enabled.
Possible values are:
any or 0
consider the slave up only when any of the arp_ip_targets
is reachable
all or 1
consider the slave up only when all of the arp_ip_targets
are reachable
downdelay
Specifies the time, in milliseconds, to wait before disabling
a slave after a link failure has been detected. This option
is only valid for the miimon link monitor. The downdelay
value should be a multiple of the miimon value; if not, it
will be rounded down to the nearest multiple. The default
value is 0.
fail_over_mac
Specifies whether active-backup mode should set all slaves to
the same MAC address at enslavement (the traditional
behavior), or, when enabled, perform special handling of the
bond's MAC address in accordance with the selected policy.
Possible values are:
none or 0
This setting disables fail_over_mac, and causes
bonding to set all slaves of an active-backup bond to
the same MAC address at enslavement time. This is the
default.
active or 1
The "active" fail_over_mac policy indicates that the
MAC address of the bond should always be the MAC
address of the currently active slave. The MAC
address of the slaves is not changed; instead, the MAC
address of the bond changes during a failover.
This policy is useful for devices that cannot ever
alter their MAC address, or for devices that refuse
incoming broadcasts with their own source MAC (which
interferes with the ARP monitor).
The down side of this policy is that every device on
the network must be updated via gratuitous ARP,
vs. just updating a switch or set of switches (which
often takes place for any traffic, not just ARP
traffic, if the switch snoops incoming traffic to
update its tables) for the traditional method. If the
gratuitous ARP is lost, communication may be
disrupted.
When this policy is used in conjunction with the mii
monitor, devices which assert link up prior to being
able to actually transmit and receive are particularly
susceptible to loss of the gratuitous ARP, and an
appropriate updelay setting may be required.
follow or 2
The "follow" fail_over_mac policy causes the MAC
address of the bond to be selected normally (normally
the MAC address of the first slave added to the bond).
However, the second and subsequent slaves are not set
to this MAC address while they are in a backup role; a
slave is programmed with the bond's MAC address at
failover time (and the formerly active slave receives
the newly active slave's MAC address).
This policy is useful for multiport devices that
either become confused or incur a performance penalty
when multiple ports are programmed with the same MAC
address.
The default policy is none, unless the first slave cannot
change its MAC address, in which case the active policy is
selected by default.
This option may be modified via sysfs only when no slaves are
present in the bond.
This option was added in bonding version 3.2.0. The "follow"
policy was added in bonding version 3.3.0.
lacp_rate
Option specifying the rate in which we'll ask our link partner
to transmit LACPDU packets in 802.3ad mode. Possible values
are:
slow or 0
Request partner to transmit LACPDUs every 30 seconds
fast or 1
Request partner to transmit LACPDUs every 1 second
The default is slow.
max_bonds
Specifies the number of bonding devices to create for this
instance of the bonding driver. E.g., if max_bonds is 3, and
the bonding driver is not already loaded, then bond0, bond1
and bond2 will be created. The default value is 1. Specifying
a value of 0 will load bonding, but will not create any devices.
miimon
Specifies the MII link monitoring frequency in milliseconds.
This determines how often the link state of each slave is
inspected for link failures. A value of zero disables MII
link monitoring. A value of 100 is a good starting point.
The use_carrier option, below, affects how the link state is
determined. See the High Availability section for additional
information. The default value is 0.
min_links
Specifies the minimum number of links that must be active before
asserting carrier. It is similar to the Cisco EtherChannel min-links
feature. This allows setting the minimum number of member ports that
must be up (link-up state) before marking the bond device as up
(carrier on). This is useful for situations where higher level services
such as clustering want to ensure a minimum number of low bandwidth
links are active before switchover. This option only affect 802.3ad
mode.
The default value is 0. This will cause carrier to be asserted (for
802.3ad mode) whenever there is an active aggregator, regardless of the
number of available links in that aggregator. Note that, because an
aggregator cannot be active without at least one available link,
setting this option to 0 or to 1 has the exact same effect.
mode
Specifies one of the bonding policies. The default is
balance-rr (round robin). Possible values are:
balance-rr or 0
Round-robin policy: Transmit packets in sequential
order from the first available slave through the
last. This mode provides load balancing and fault
tolerance.
active-backup or 1
Active-backup policy: Only one slave in the bond is
active. A different slave becomes active if, and only
if, the active slave fails. The bond's MAC address is
externally visible on only one port (network adapter)
to avoid confusing the switch.
In bonding version 2.6.2 or later, when a failover
occurs in active-backup mode, bonding will issue one
or more gratuitous ARPs on the newly active slave.
One gratuitous ARP is issued for the bonding master
interface and each VLAN interfaces configured above
it, provided that the interface has at least one IP
address configured. Gratuitous ARPs issued for VLAN
interfaces are tagged with the appropriate VLAN id.
This mode provides fault tolerance. The primary
option, documented below, affects the behavior of this
mode.
balance-xor or 2
XOR policy: Transmit based on the selected transmit
hash policy. The default policy is a simple [(source
MAC address XOR'd with destination MAC address XOR
packet type ID) modulo slave count]. Alternate transmit
policies may be selected via the xmit_hash_policy option,
described below.
This mode provides load balancing and fault tolerance.
broadcast or 3
Broadcast policy: transmits everything on all slave
interfaces. This mode provides fault tolerance.
802.3ad or 4
IEEE 802.3ad Dynamic link aggregation. Creates
aggregation groups that share the same speed and
duplex settings. Utilizes all slaves in the active
aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according
to the transmit hash policy, which may be changed from
the default simple XOR policy via the xmit_hash_policy
option, documented below. Note that not all transmit
policies may be 802.3ad compliant, particularly in
regards to the packet mis-ordering requirements of
section 43.2.4 of the 802.3ad standard. Differing
peer implementations will have varying tolerances for
noncompliance.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link
aggregation.
Most switches will require some type of configuration
to enable 802.3ad mode.
balance-tlb or 5
Adaptive transmit load balancing: channel bonding that
does not require any special switch support.
In tlb_dynamic_lb=1 mode; the outgoing traffic is
distributed according to the current load (computed
relative to the speed) on each slave.
In tlb_dynamic_lb=0 mode; the load balancing based on
current load is disabled and the load is distributed
only using the hash distribution.
Incoming traffic is received by the current slave.
If the receiving slave fails, another slave takes over
the MAC address of the failed receiving slave.
Prerequisite:
Ethtool support in the base drivers for retrieving the
speed of each slave.
balance-alb or 6
Adaptive load balancing: includes balance-tlb plus
receive load balancing (rlb) for IPV4 traffic, and
does not require any special switch support. The
receive load balancing is achieved by ARP negotiation.
The bonding driver intercepts the ARP Replies sent by
the local system on their way out and overwrites the
source hardware address with the unique hardware
address of one of the slaves in the bond such that
different peers use different hardware addresses for
the server.
Receive traffic from connections created by the server
is also balanced. When the local system sends an ARP
Request the bonding driver copies and saves the peer's
IP information from the ARP packet. When the ARP
Reply arrives from the peer, its hardware address is
retrieved and the bonding driver initiates an ARP
reply to this peer assigning it to one of the slaves
in the bond. A problematic outcome of using ARP
negotiation for balancing is that each time that an
ARP request is broadcast it uses the hardware address
of the bond. Hence, peers learn the hardware address
of the bond and the balancing of receive traffic
collapses to the current slave. This is handled by
sending updates (ARP Replies) to all the peers with
their individually assigned hardware address such that
the traffic is redistributed. Receive traffic is also
redistributed when a new slave is added to the bond
and when an inactive slave is re-activated. The
receive load is distributed sequentially (round robin)
among the group of highest speed slaves in the bond.
When a link is reconnected or a new slave joins the
bond the receive traffic is redistributed among all
active slaves in the bond by initiating ARP Replies
with the selected MAC address to each of the
clients. The updelay parameter (detailed below) must
be set to a value equal or greater than the switch's
forwarding delay so that the ARP Replies sent to the
peers will not be blocked by the switch.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed of each slave.
2. Base driver support for setting the hardware
address of a device while it is open. This is
required so that there will always be one slave in the
team using the bond hardware address (the
curr_active_slave) while having a unique hardware
address for each slave in the bond. If the
curr_active_slave fails its hardware address is
swapped with the new curr_active_slave that was
chosen.
num_grat_arp
num_unsol_na
Specify the number of peer notifications (gratuitous ARPs and
unsolicited IPv6 Neighbor Advertisements) to be issued after a
failover event. As soon as the link is up on the new slave
(possibly immediately) a peer notification is sent on the
bonding device and each VLAN sub-device. This is repeated at
each link monitor interval (arp_interval or miimon, whichever
is active) if the number is greater than 1.
The valid range is 0 - 255; the default value is 1. These options
affect only the active-backup mode. These options were added for
bonding versions 3.3.0 and 3.4.0 respectively.
From Linux 3.0 and bonding version 3.7.1, these notifications
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.
packets_per_slave
Specify the number of packets to transmit through a slave before
moving to the next one. When set to 0 then a slave is chosen at
random.
The valid range is 0 - 65535; the default value is 1. This option
has effect only in balance-rr mode.
primary
A string (eth0, eth2, etc) specifying which slave is the
primary device. The specified device will always be the
active slave while it is available. Only when the primary is
off-line will alternate devices be used. This is useful when
one slave is preferred over another, e.g., when one slave has
higher throughput than another.
The primary option is only valid for active-backup(1),
balance-tlb (5) and balance-alb (6) mode.
primary_reselect
Specifies the reselection policy for the primary slave. This
affects how the primary slave is chosen to become the active slave
when failure of the active slave or recovery of the primary slave
occurs. This option is designed to prevent flip-flopping between
the primary slave and other slaves. Possible values are:
always or 0 (default)
The primary slave becomes the active slave whenever it
comes back up.
better or 1
The primary slave becomes the active slave when it comes
back up, if the speed and duplex of the primary slave is
better than the speed and duplex of the current active
slave.
failure or 2
The primary slave becomes the active slave only if the
current active slave fails and the primary slave is up.
The primary_reselect setting is ignored in two cases:
If no slaves are active, the first slave to recover is
made the active slave.
When initially enslaved, the primary slave is always made
the active slave.
Changing the primary_reselect policy via sysfs will cause an
immediate selection of the best active slave according to the new
policy. This may or may not result in a change of the active
slave, depending upon the circumstances.
This option was added for bonding version 3.6.0.
tlb_dynamic_lb
Specifies if dynamic shuffling of flows is enabled in tlb
mode. The value has no effect on any other modes.
The default behavior of tlb mode is to shuffle active flows across
slaves based on the load in that interval. This gives nice lb
characteristics but can cause packet reordering. If re-ordering is
a concern use this variable to disable flow shuffling and rely on
load balancing provided solely by the hash distribution.
xmit-hash-policy can be used to select the appropriate hashing for
the setup.
The sysfs entry can be used to change the setting per bond device
and the initial value is derived from the module parameter. The
sysfs entry is allowed to be changed only if the bond device is
down.
The default value is "1" that enables flow shuffling while value "0"
disables it. This option was added in bonding driver 3.7.1
updelay
Specifies the time, in milliseconds, to wait before enabling a
slave after a link recovery has been detected. This option is
only valid for the miimon link monitor. The updelay value
should be a multiple of the miimon value; if not, it will be
rounded down to the nearest multiple. The default value is 0.
use_carrier
Specifies whether or not miimon should use MII or ETHTOOL
ioctls vs. netif_carrier_ok() to determine the link
status. The MII or ETHTOOL ioctls are less efficient and
utilize a deprecated calling sequence within the kernel. The
netif_carrier_ok() relies on the device driver to maintain its
state with netif_carrier_on/off; at this writing, most, but
not all, device drivers support this facility.
If bonding insists that the link is up when it should not be,
it may be that your network device driver does not support
netif_carrier_on/off. The default state for netif_carrier is
"carrier on," so if a driver does not support netif_carrier,
it will appear as if the link is always up. In this case,
setting use_carrier to 0 will cause bonding to revert to the
MII / ETHTOOL ioctl method to determine the link state.
A value of 1 enables the use of netif_carrier_ok(), a value of
0 will use the deprecated MII / ETHTOOL ioctls. The default
value is 1.
xmit_hash_policy
Selects the transmit hash policy to use for slave selection in
balance-xor, 802.3ad, and tlb modes. Possible values are:
layer2
Uses XOR of hardware MAC addresses and packet type ID
field to generate the hash. The formula is
hash = source MAC XOR destination MAC XOR packet type ID
slave number = hash modulo slave count
This algorithm will place all traffic to a particular
network peer on the same slave.
This algorithm is 802.3ad compliant.
layer2+3
This policy uses a combination of layer2 and layer3
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
generate the hash. The formula is
hash = source MAC XOR destination MAC XOR packet type ID
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
And then hash is reduced modulo slave count.
If the protocol is IPv6 then the source and destination
addresses are first hashed using ipv6_addr_hash.
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
the formula is the same as for the layer2 transmit
hash policy.
This policy is intended to provide a more balanced
distribution of traffic than layer2 alone, especially
in environments where a layer3 gateway device is
required to reach most destinations.
This algorithm is 802.3ad compliant.
layer3+4
This policy uses upper layer protocol information,
when available, to generate the hash. This allows for
traffic to a particular network peer to span multiple
slaves, although a single connection will not span
multiple slaves.
The formula for unfragmented TCP and UDP packets is
hash = source port, destination port (as in the header)
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
And then hash is reduced modulo slave count.
If the protocol is IPv6 then the source and destination
addresses are first hashed using ipv6_addr_hash.
For fragmented TCP or UDP packets and all other IPv4 and
IPv6 protocol traffic, the source and destination port
information is omitted. For non-IP traffic, the
formula is the same as for the layer2 transmit hash
policy.
This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
striped across two interfaces. This may result in out
of order delivery. Most traffic types will not meet
this criteria, as TCP rarely fragments traffic, and
most UDP traffic is not involved in extended
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
encap2+3
This policy uses the same formula as layer2+3 but it
relies on skb_flow_dissect to obtain the header fields
which might result in the use of inner headers if an
encapsulation protocol is used. For example this will
improve the performance for tunnel users because the
packets will be distributed according to the encapsulated
flows.
encap3+4
This policy uses the same formula as layer3+4 but it
relies on skb_flow_dissect to obtain the header fields
which might result in the use of inner headers if an
encapsulation protocol is used. For example this will
improve the performance for tunnel users because the
packets will be distributed according to the encapsulated
flows.
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
layer2+3 value was added for bonding version 3.2.2.
resend_igmp
Specifies the number of IGMP membership reports to be issued after
a failover event. One membership report is issued immediately after
the failover, subsequent packets are sent in each 200ms interval.
The valid range is 0 - 255; the default value is 1. A value of 0
prevents the IGMP membership report from being issued in response
to the failover event.
This option is useful for bonding modes balance-rr (0), active-backup
(1), balance-tlb (5) and balance-alb (6), in which a failover can
switch the IGMP traffic from one slave to another. Therefore a fresh
IGMP report must be issued to cause the switch to forward the incoming
IGMP traffic over the newly selected slave.
This option was added for bonding version 3.7.0.
lp_interval
Specifies the number of seconds between instances where the bonding
driver sends learning packets to each slaves peer switch.
The valid range is 1 - 0x7fffffff; the default value is 1. This Option
has effect only in balance-tlb and balance-alb modes.
3. Configuring Bonding Devices
==============================
You can configure bonding using either your distro's network
initialization scripts, or manually using either iproute2 or the
sysfs interface. Distros generally use one of three packages for the
network initialization scripts: initscripts, sysconfig or interfaces.
Recent versions of these packages have support for bonding, while older
versions do not.
We will first describe the options for configuring bonding for
distros using versions of initscripts, sysconfig and interfaces with full
or partial support for bonding, then provide information on enabling
bonding without support from the network initialization scripts (i.e.,
older versions of initscripts or sysconfig).
If you're unsure whether your distro uses sysconfig,
initscripts or interfaces, or don't know if it's new enough, have no fear.
Determining this is fairly straightforward.
First, look for a file called interfaces in /etc/network directory.
If this file is present in your system, then your system use interfaces. See
Configuration with Interfaces Support.
Else, issue the command:
$ rpm -qf /sbin/ifup
It will respond with a line of text starting with either
"initscripts" or "sysconfig," followed by some numbers. This is the
package that provides your network initialization scripts.
Next, to determine if your installation supports bonding,
issue the command: