diff options
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/00-INDEX | 3 | ||||
-rw-r--r-- | Documentation/networking/NAPI_HOWTO.txt | 766 | ||||
-rw-r--r-- | Documentation/networking/dccp.txt | 21 | ||||
-rw-r--r-- | Documentation/networking/dgrs.txt | 52 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 17 | ||||
-rw-r--r-- | Documentation/networking/mac80211-injection.txt | 32 | ||||
-rw-r--r-- | Documentation/networking/multiqueue.txt | 10 | ||||
-rw-r--r-- | Documentation/networking/netconsole.txt | 99 | ||||
-rw-r--r-- | Documentation/networking/netdevices.txt | 15 | ||||
-rw-r--r-- | Documentation/networking/sk98lin.txt | 568 |
10 files changed, 741 insertions, 842 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index d63f480afb7..153d84d281e 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -96,6 +96,9 @@ routing.txt - the new routing mechanism shaper.txt - info on the module that can shape/limit transmitted traffic. +sk98lin.txt + - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit + Ethernet Adapter family driver info skfp.txt - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. smc9.txt diff --git a/Documentation/networking/NAPI_HOWTO.txt b/Documentation/networking/NAPI_HOWTO.txt deleted file mode 100644 index 7907435a661..00000000000 --- a/Documentation/networking/NAPI_HOWTO.txt +++ /dev/null @@ -1,766 +0,0 @@ -HISTORY: -February 16/2002 -- revision 0.2.1: -COR typo corrected -February 10/2002 -- revision 0.2: -some spell checking ;-> -January 12/2002 -- revision 0.1 -This is still work in progress so may change. -To keep up to date please watch this space. - -Introduction to NAPI -==================== - -NAPI is a proven (www.cyberus.ca/~hadi/usenix-paper.tgz) technique -to improve network performance on Linux. For more details please -read that paper. -NAPI provides a "inherent mitigation" which is bound by system capacity -as can be seen from the following data collected by Robert on Gigabit -ethernet (e1000): - - Psize Ipps Tput Rxint Txint Done Ndone - --------------------------------------------------------------- - 60 890000 409362 17 27622 7 6823 - 128 758150 464364 21 9301 10 7738 - 256 445632 774646 42 15507 21 12906 - 512 232666 994445 241292 19147 241192 1062 - 1024 119061 1000003 872519 19258 872511 0 - 1440 85193 1000003 946576 19505 946569 0 - - -Legend: -"Ipps" stands for input packets per second. -"Tput" == packets out of total 1M that made it out. -"txint" == transmit completion interrupts seen -"Done" == The number of times that the poll() managed to pull all -packets out of the rx ring. Note from this that the lower the -load the more we could clean up the rxring -"Ndone" == is the converse of "Done". Note again, that the higher -the load the more times we couldn't clean up the rxring. - -Observe that: -when the NIC receives 890Kpackets/sec only 17 rx interrupts are generated. -The system cant handle the processing at 1 interrupt/packet at that load level. -At lower rates on the other hand, rx interrupts go up and therefore the -interrupt/packet ratio goes up (as observable from that table). So there is -possibility that under low enough input, you get one poll call for each -input packet caused by a single interrupt each time. And if the system -cant handle interrupt per packet ratio of 1, then it will just have to -chug along .... - - -0) Prerequisites: -================== -A driver MAY continue using the old 2.4 technique for interfacing -to the network stack and not benefit from the NAPI changes. -NAPI additions to the kernel do not break backward compatibility. -NAPI, however, requires the following features to be available: - -A) DMA ring or enough RAM to store packets in software devices. - -B) Ability to turn off interrupts or maybe events that send packets up -the stack. - -NAPI processes packet events in what is known as dev->poll() method. -Typically, only packet receive events are processed in dev->poll(). -The rest of the events MAY be processed by the regular interrupt handler -to reduce processing latency (justified also because there are not that -many of them). -Note, however, NAPI does not enforce that dev->poll() only processes -receive events. -Tests with the tulip driver indicated slightly increased latency if -all of the interrupt handler is moved to dev->poll(). Also MII handling -gets a little trickier. -The example used in this document is to move the receive processing only -to dev->poll(); this is shown with the patch for the tulip driver. -For an example of code that moves all the interrupt driver to -dev->poll() look at the ported e1000 code. - -There are caveats that might force you to go with moving everything to -dev->poll(). Different NICs work differently depending on their status/event -acknowledgement setup. -There are two types of event register ACK mechanisms. - I) what is known as Clear-on-read (COR). - when you read the status/event register, it clears everything! - The natsemi and sunbmac NICs are known to do this. - In this case your only choice is to move all to dev->poll() - - II) Clear-on-write (COW) - i) you clear the status by writing a 1 in the bit-location you want. - These are the majority of the NICs and work the best with NAPI. - Put only receive events in dev->poll(); leave the rest in - the old interrupt handler. - ii) whatever you write in the status register clears every thing ;-> - Cant seem to find any supported by Linux which do this. If - someone knows such a chip email us please. - Move all to dev->poll() - -C) Ability to detect new work correctly. -NAPI works by shutting down event interrupts when there's work and -turning them on when there's none. -New packets might show up in the small window while interrupts were being -re-enabled (refer to appendix 2). A packet might sneak in during the period -we are enabling interrupts. We only get to know about such a packet when the -next new packet arrives and generates an interrupt. -Essentially, there is a small window of opportunity for a race condition -which for clarity we'll refer to as the "rotting packet". - -This is a very important topic and appendix 2 is dedicated for more -discussion. - -Locking rules and environmental guarantees -========================================== - --Guarantee: Only one CPU at any time can call dev->poll(); this is because -only one CPU can pick the initial interrupt and hence the initial -netif_rx_schedule(dev); -- The core layer invokes devices to send packets in a round robin format. -This implies receive is totally lockless because of the guarantee that only -one CPU is executing it. -- contention can only be the result of some other CPU accessing the rx -ring. This happens only in close() and suspend() (when these methods -try to clean the rx ring); -****guarantee: driver authors need not worry about this; synchronization -is taken care for them by the top net layer. --local interrupts are enabled (if you dont move all to dev->poll()). For -example link/MII and txcomplete continue functioning just same old way. -This improves the latency of processing these events. It is also assumed that -the receive interrupt is the largest cause of noise. Note this might not -always be true. -[according to Manfred Spraul, the winbond insists on sending one -txmitcomplete interrupt for each packet (although this can be mitigated)]. -For these broken drivers, move all to dev->poll(). - -For the rest of this text, we'll assume that dev->poll() only -processes receive events. - -new methods introduce by NAPI -============================= - -a) netif_rx_schedule(dev) -Called by an IRQ handler to schedule a poll for device - -b) netif_rx_schedule_prep(dev) -puts the device in a state which allows for it to be added to the -CPU polling list if it is up and running. You can look at this as -the first half of netif_rx_schedule(dev) above; the second half -being c) below. - -c) __netif_rx_schedule(dev) -Add device to the poll list for this CPU; assuming that _prep above -has already been called and returned 1. - -d) netif_rx_reschedule(dev, undo) -Called to reschedule polling for device specifically for some -deficient hardware. Read Appendix 2 for more details. - -e) netif_rx_complete(dev) - -Remove interface from the CPU poll list: it must be in the poll list -on current cpu. This primitive is called by dev->poll(), when -it completes its work. The device cannot be out of poll list at this -call, if it is then clearly it is a BUG(). You'll know ;-> - -All of the above methods are used below, so keep reading for clarity. - -Device driver changes to be made when porting NAPI -================================================== - -Below we describe what kind of changes are required for NAPI to work. - -1) introduction of dev->poll() method -===================================== - -This is the method that is invoked by the network core when it requests -for new packets from the driver. A driver is allowed to send upto -dev->quota packets by the current CPU before yielding to the network -subsystem (so other devices can also get opportunity to send to the stack). - -dev->poll() prototype looks as follows: -int my_poll(struct net_device *dev, int *budget) - -budget is the remaining number of packets the network subsystem on the -current CPU can send up the stack before yielding to other system tasks. -*Each driver is responsible for decrementing budget by the total number of -packets sent. - Total number of packets cannot exceed dev->quota. - -dev->poll() method is invoked by the top layer, the driver just sends if it -can to the stack the packet quantity requested. - -more on dev->poll() below after the interrupt changes are explained. - -2) registering dev->poll() method -=================================== - -dev->poll should be set in the dev->probe() method. -e.g: -dev->open = my_open; -. -. -/* two new additions */ -/* first register my poll method */ -dev->poll = my_poll; -/* next register my weight/quanta; can be overridden in /proc */ -dev->weight = 16; -. -. -dev->stop = my_close; - - - -3) scheduling dev->poll() -============================= -This involves modifying the interrupt handler and the code -path which takes the packet off the NIC and sends them to the -stack. - -it's important at this point to introduce the classical D Becker -interrupt processor: - ------------------- -static irqreturn_t -netdevice_interrupt(int irq, void *dev_id, struct pt_regs *regs) -{ - - struct net_device *dev = (struct net_device *)dev_instance; - struct my_private *tp = (struct my_private *)dev->priv; - - int work_count = my_work_count; - status = read_interrupt_status_reg(); - if (status == 0) - return IRQ_NONE; /* Shared IRQ: not us */ - if (status == 0xffff) - return IRQ_HANDLED; /* Hot unplug */ - if (status & error) - do_some_error_handling() - - do { - acknowledge_ints_ASAP(); - - if (status & link_interrupt) { - spin_lock(&tp->link_lock); - do_some_link_stat_stuff(); - spin_lock(&tp->link_lock); - } - - if (status & rx_interrupt) { - receive_packets(dev); - } - - if (status & rx_nobufs) { - make_rx_buffs_avail(); - } - - if (status & tx_related) { - spin_lock(&tp->lock); - tx_ring_free(dev); - if (tx_died) - restart_tx(); - spin_unlock(&tp->lock); - } - - status = read_interrupt_status_reg(); - - } while (!(status & error) || more_work_to_be_done); - return IRQ_HANDLED; -} - ----------------------------------------------------------------------- - -We now change this to what is shown below to NAPI-enable it: - ----------------------------------------------------------------------- -static irqreturn_t -netdevice_interrupt(int irq, void *dev_id, struct pt_regs *regs) -{ - struct net_device *dev = (struct net_device *)dev_instance; - struct my_private *tp = (struct my_private *)dev->priv; - - status = read_interrupt_status_reg(); - if (status == 0) - return IRQ_NONE; /* Shared IRQ: not us */ - if (status == 0xffff) - return IRQ_HANDLED; /* Hot unplug */ - if (status & error) - do_some_error_handling(); - - do { -/************************ start note *********************************/ - acknowledge_ints_ASAP(); // dont ack rx and rxnobuff here -/************************ end note *********************************/ - - if (status & link_interrupt) { - spin_lock(&tp->link_lock); - do_some_link_stat_stuff(); - spin_unlock(&tp->link_lock); - } -/************************ start note *********************************/ - if (status & rx_interrupt || (status & rx_nobuffs)) { - if (netif_rx_schedule_prep(dev)) { - - /* disable interrupts caused - * by arriving packets */ - disable_rx_and_rxnobuff_ints(); - /* tell system we have work to be done. */ - __netif_rx_schedule(dev); - } else { - printk("driver bug! interrupt while in poll\n"); - /* FIX by disabling interrupts */ - disable_rx_and_rxnobuff_ints(); - } - } -/************************ end note note *********************************/ - - if (status & tx_related) { - spin_lock(&tp->lock); - tx_ring_free(dev); - - if (tx_died) - restart_tx(); - spin_unlock(&tp->lock); - } - - status = read_interrupt_status_reg(); - -/************************ start note *********************************/ - } while (!(status & error) || more_work_to_be_done(status)); -/************************ end note note *********************************/ - return IRQ_HANDLED; -} - ---------------------------------------------------------------------- - - -We note several things from above: - -I) Any interrupt source which is caused by arriving packets is now -turned off when it occurs. Depending on the hardware, there could be -several reasons that arriving packets would cause interrupts; these are the -interrupt sources we wish to avoid. The two common ones are a) a packet -arriving (rxint) b) a packet arriving and finding no DMA buffers available -(rxnobuff) . -This means also acknowledge_ints_ASAP() will not clear the status -register for those two items above; clearing is done in the place where -proper work is done within NAPI; at the poll() and refill_rx_ring() -discussed further below. -netif_rx_schedule_prep() returns 1 if device is in running state and -gets successfully added to the core poll list. If we get a zero value -we can _almost_ assume are already added to the list (instead of not running. -Logic based on the fact that you shouldn't get interrupt if not running) -We rectify this by disabling rx and rxnobuf interrupts. - -II) that receive_packets(dev) and make_rx_buffs_avail() may have disappeared. -These functionalities are still around actually...... - -infact, receive_packets(dev) is very close to my_poll() and -make_rx_buffs_avail() is invoked from my_poll() - -4) converting receive_packets() to dev->poll() -=============================================== - -We need to convert the classical D Becker receive_packets(dev) to my_poll() - -First the typical receive_packets() below: -------------------------------------------------------------------- - -/* this is called by interrupt handler */ -static void receive_packets (struct net_device *dev) -{ - - struct my_private *tp = (struct my_private *)dev->priv; - rx_ring = tp->rx_ring; - cur_rx = tp->cur_rx; - int entry = cur_rx % RX_RING_SIZE; - int received = 0; - int rx_work_limit = tp->dirty_rx + RX_RING_SIZE - tp->cur_rx; - - while (rx_ring_not_empty) { - u32 rx_status; - unsigned int rx_size; - unsigned int pkt_size; - struct sk_buff *skb; - /* read size+status of next frame from DMA ring buffer */ - /* the number 16 and 4 are just examples */ - rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset)); - rx_size = rx_status >> 16; - pkt_size = rx_size - 4; - - /* process errors */ - if ((rx_size > (MAX_ETH_FRAME_SIZE+4)) || - (!(rx_status & RxStatusOK))) { - netdrv_rx_err (rx_status, dev, tp, ioaddr); - return; - } - - if (--rx_work_limit < 0) - break; - - /* grab a skb */ - skb = dev_alloc_skb (pkt_size + 2); - if (skb) { - . - . - netif_rx (skb); - . - . - } else { /* OOM */ - /*seems very driver specific ... some just pass - whatever is on the ring already. */ - } - - /* move to the next skb on the ring */ - entry = (++tp->cur_rx) % RX_RING_SIZE; - received++ ; - - } - - /* store current ring pointer state */ - tp->cur_rx = cur_rx; - - /* Refill the Rx ring buffers if they are needed */ - refill_rx_ring(); - . - . - -} -------------------------------------------------------------------- -We change it to a new one below; note the additional parameter in -the call. - -------------------------------------------------------------------- - -/* this is called by the network core */ -static int my_poll (struct net_device *dev, int *budget) -{ - - struct my_private *tp = (struct my_private *)dev->priv; - rx_ring = tp->rx_ring; - cur_rx = tp->cur_rx; - int entry = cur_rx % RX_BUF_LEN; - /* maximum packets to send to the stack */ -/************************ note note *********************************/ - int rx_work_limit = dev->quota; - -/************************ end note note *********************************/ - do { // outer beginning loop starts here - - clear_rx_status_register_bit(); - - while (rx_ring_not_empty) { - u32 rx_status; - unsigned int rx_size; - unsigned int pkt_size; - struct sk_buff *skb; - /* read size+status of next frame from DMA ring buffer */ - /* the number 16 and 4 are just examples */ - rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset)); - rx_size = rx_status >> 16; - pkt_size = rx_size - 4; - - /* process errors */ - if ((rx_size > (MAX_ETH_FRAME_SIZE+4)) || - (!(rx_status & RxStatusOK))) { - netdrv_rx_err (rx_status, dev, tp, ioaddr); - return 1; - } - -/************************ note note *********************************/ - if (--rx_work_limit < 0) { /* we got packets, but no quota */ - /* store current ring pointer state */ - tp->cur_rx = cur_rx; - - /* Refill the Rx ring buffers if they are needed */ - refill_rx_ring(dev); - goto not_done; - } -/********************** end note **********************************/ - - /* grab a skb */ - skb = dev_alloc_skb (pkt_size + 2); - if (skb) { - . - . -/************************ note note *********************************/ - netif_receive_skb (skb); -/********************** end note **********************************/ - . - . - } else { /* OOM */ - /*seems very driver specific ... common is just pass - whatever is on the ring already. */ - } - - /* move to the next skb on the ring */ - entry = (++tp->cur_rx) % RX_RING_SIZE; - received++ ; - - } - - /* store current ring pointer state */ - tp->cur_rx = cur_rx; - - /* Refill the Rx ring buffers if they are needed */ - refill_rx_ring(dev); - - /* no packets on ring; but new ones can arrive since we last - checked */ - status = read_interrupt_status_reg(); - if (rx status is not set) { - /* If something arrives in this narrow window, - an interrupt will be generated */ - goto done; - } - /* done! at least that's what it looks like ;-> - if new packets came in after our last check on status bits - they'll be caught by the while check and we go back and clear them - since we havent exceeded our quota */ - } while (rx_status_is_set); - -done: - -/************************ note note *********************************/ - dev->quota -= received; - *budget -= received; - - /* If RX ring is not full we are out of memory. */ - if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) - goto oom; - - /* we are happy/done, no more packets on ring; put us back - to where we can start processing interrupts again */ - netif_rx_complete(dev); - enable_rx_and_rxnobuf_ints(); - - /* The last op happens after poll completion. Which means the following: - * 1. it can race with disabling irqs in irq handler (which are done to - * schedule polls) - * 2. it can race with dis/enabling irqs in other poll threads - * 3. if an irq raised after the beginning of the outer beginning - * loop (marked in the code above), it will be immediately - * triggered here. - * - * Summarizing: the logic may result in some redundant irqs both - * due to races in masking and due to too late acking of already - * processed irqs. The good news: no events are ever lost. - */ - - return 0; /* done */ - -not_done: - if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/2 || - tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) - refill_rx_ring(dev); - - if (!received) { - printk("received==0\n"); - received = 1; - } - dev->quota -= received; - *budget -= received; - return 1; /* not_done */ - -oom: - /* Start timer, stop polling, but do not enable rx interrupts. */ - start_poll_timer(dev); - return 0; /* we'll take it from here so tell core "done"*/ - -/************************ End note note *********************************/ -} -------------------------------------------------------------------- - -From above we note that: -0) rx_work_limit = dev->quota -1) refill_rx_ring() is in charge of clearing the bit for rxnobuff when -it does the work. -2) We have a done and not_done state. -3) instead of netif_rx() we call netif_receive_skb() to pass the skb. -4) we have a new way of handling oom condition -5) A new outer for (;;) loop has been added. This serves the purpose of -ensuring that if a new packet has come in, after we are all set and done, -and we have not exceeded our quota that we continue sending packets up. - - ------------------------------------------------------------ -Poll timer code will need to do the following: - -a) - - if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/2 || - tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) - refill_rx_ring(dev); - - /* If RX ring is not full we are still out of memory. - Restart the timer again. Else we re-add ourselves - to the master poll list. - */ - - if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) - restart_timer(); - - else netif_rx_schedule(dev); /* we are back on the poll list */ - -5) dev->close() and dev->suspend() issues -========================================== -The driver writer needn't worry about this; the top net layer takes -care of it. - -6) Adding new Stats to /proc -============================= -In order to debug some of the new features, we introduce new stats -that need to be collected. -TODO: Fill this later. - -APPENDIX 1: discussion on using ethernet HW FC -============================================== -Most chips with FC only send a pause packet when they run out of Rx buffers. -Since packets are pulled off the DMA ring by a softirq in NAPI, -if the system is slow in grabbing them and we have a high input -rate (faster than the system's capacity to remove packets), then theoretically -there will only be one rx interrupt for all packets during a given packetstorm. -Under low load, we might have a single interrupt per packet. -FC should be programmed to apply in the case when the system cant pull out -packets fast enough i.e send a pause only when you run out of rx buffers. -Note FC in itself is a good solution but we have found it to not be -much of a commodity feature (both in NICs and switches) and hence falls -under the same category as using NIC based mitigation. Also, experiments -indicate that it's much harder to resolve the resource allocation -issue (aka lazy receiving that NAPI offers) and hence quantify its usefulness -proved harder. In any case, FC works even better with NAPI but is not -necessary. - - -APPENDIX 2: the "rotting packet" race-window avoidance scheme -============================================================= - -There are two types of associations seen here - -1) status/int which honors level triggered IRQ - -If a status bit for receive or rxnobuff is set and the corresponding -interrupt-enable bit is not on, then no interrupts will be generated. However, -as soon as the "interrupt-enable" bit is unmasked, an immediate interrupt is -generated. [assuming the status bit was not turned off]. -Generally the concept of level triggered IRQs in association with a status and -interrupt-enable CSR register set is used to avoid the race. - -If we take the example of the tulip: -"pending work" is indicated by the status bit(CSR5 in tulip). -the corresponding interrupt bit (CSR7 in tulip) might be turned off (but -the CSR5 will continue to be turned on with new packet arrivals even if -we clear it the first time) -Very important is the fact that if we turn on the interrupt bit on when -status is set that an immediate irq is triggered. - -If we cleared the rx ring and proclaimed there was "no more work -to be done" and then went on to do a few other things; then when we enable -interrupts, there is a possibility that a new packet might sneak in during -this phase. It helps to look at the pseudo code for the tulip poll -routine: - --------------------------- - do { - ACK; - while (ring_is_not_empty()) { - work-work-work - if quota is exceeded: exit, no touching irq status/mask - } - /* No packets, but new can arrive while we are doing this*/ - CSR5 := read - if (CSR5 is not set) { - /* If something arrives in this narrow window here, - * where the comments are ;-> irq will be generated */ - unmask irqs; - exit poll; - } - } while (rx_status_is_set); ------------------------- - -CSR5 bit of interest is only the rx status. -If you look at the last if statement: -you just finished grabbing all the packets from the rx ring .. you check if -status bit says there are more packets just in ... it says none; you then -enable rx interrupts again; if a new packet just came in during this check, -we are counting that CSR5 will be set in that small window of opportunity -and that by re-enabling interrupts, we would actually trigger an interrupt -to register the new packet for processing. - -[The above description nay be very verbose, if you have better wording -that will make this more understandable, please suggest it.] - -2) non-capable hardware - -These do not generally respect level triggered IRQs. Normally, -irqs may be lost while being masked and the only way to leave poll is to do -a double check for new input after netif_rx_complete() is invoked -and re-enable polling (after seeing this new input). - -Sample code: - ---------- - . - . -restart_poll: - while (ring_is_not_empty()) { - work-work-work - if quota is exceeded: exit, not touching irq status/mask - } - . - . - . - enable_rx_interrupts() - netif_rx_complete(dev); - if (ring_has_new_packet() && netif_rx_reschedule(dev, received)) { - disable_rx_and_rxnobufs() - goto restart_poll - } while (rx_status_is_set); ---------- - -Basically netif_rx_complete() removes us from the poll list, but because a -new packet which will never be caught due to the possibility of a race -might come in, we attempt to re-add ourselves to the poll list. - - - - -APPENDIX 3: Scheduling issues. -============================== -As seen NAPI moves processing to softirq level. Linux uses the ksoftirqd as the -general solution to schedule softirq's to run before next interrupt and by putting -them under scheduler control. Also this prevents consecutive softirq's from -monopolize the CPU. This also have the effect that the priority of ksoftirq needs -to be considered when running very CPU-intensive applications and networking to -get the proper balance of softirq/user balance. Increasing ksoftirq priority to 0 -(eventually more) is reported cure problems with low network performance at high -CPU load. - -Most used processes in a GIGE router: -USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND -root 3 0.2 0.0 0 0 ? RWN Aug 15 602:00 (ksoftirqd_CPU0) -root 232 0.0 7.9 41400 40884 ? S Aug 15 74:12 gated - --------------------------------------------------------------------- - -relevant sites: -================== -ftp://robur.slu.se/pub/Linux/net-development/NAPI/ - - --------------------------------------------------------------------- -TODO: Write net-skeleton.c driver. -------------------------------------------------------------- - -Authors: -======== -Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> -Jamal Hadi Salim <hadi@cyberus.ca> -Robert Olsson <Robert.Olsson@data.slu.se> - -Acknowledgements: -================ -People who made this document better: - -Lennert Buytenhek <buytenh@gnu.org> -Andrew Morton <akpm@zip.com.au> -Manfred Spraul <manfred@colorfullife.com> -Donald Becker <becker@scyld.com> -Jeff Garzik <jgarzik@pobox.com> diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index 4504cc59e40..afb66f9a8af 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -38,8 +38,13 @@ Socket options DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, the socket will fall back to 0 (which means that no meaningful service code -is present). Connecting sockets set at most one service option; for -listening sockets, multiple service codes can be specified. +is present). On active sockets this is set before connect(); specifying more +than one code has no effect (all subsequent service codes are ignored). The +case is different for passive sockets, where multiple service codes (up to 32) +can be set before calling bind(). + +DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet +size (application payload size) in bytes, see RFC 4340, section 14. DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums @@ -50,12 +55,13 @@ be enabled at the receiver, too with suitable choice of CsCov. DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the range 0..15 are acceptable. The default setting is 0 (full coverage), values between 1..15 indicate partial coverage. -DCCP_SOCKOPT_SEND_CSCOV is for the receiver and has a different meaning: it +DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it sets a threshold, where again values 0..15 are acceptable. The default of 0 means that all packets with a partial coverage will be discarded. Values in the range 1..15 indicate that packets with minimally such a coverage value are also acceptable. The higher the number, the more - restrictive this setting (see [RFC 4340, sec. 9.2.1]). + restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage + settings are inherited to the child socket after accept(). The following two options apply to CCID 3 exclusively and are getsockopt()-only. In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned. @@ -112,9 +118,14 @@ tx_qlen = 5 The size of the transmit buffer in packets. A value of 0 corresponds to an unbounded transmit buffer. +sync_ratelimit = 125 ms + The timeout between subsequent DCCP-Sync packets sent in response to + sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit + of this parameter is milliseconds; a value of 0 disables rate-limiting. + Notes ===== DCCP does not travel through NAT successfully at present on many boxes. This is -because the checksum covers the psuedo-header as per TCP and UDP. Linux NAT +because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT support for DCCP has been added. diff --git a/Documentation/networking/dgrs.txt b/Documentation/networking/dgrs.txt deleted file mode 100644 index 1aa1bb3f94a..00000000000 --- a/Documentation/networking/dgrs.txt +++ /dev/null @@ -1,52 +0,0 @@ - The Digi International RightSwitch SE-X (dgrs) Device Driver - -This is a Linux driver for the Digi International RightSwitch SE-X -EISA and PCI boards. These are 4 (EISA) or 6 (PCI) port Ethernet -switches and a NIC combined into a single board. This driver can -be compiled into the kernel statically or as a loadable module. - -There is also a companion management tool, called "xrightswitch". -The management tool lets you watch the performance graphically, -as well as set the SNMP agent IP and IPX addresses, IEEE Spanning -Tree, and Aging time. These can also be set from the command line -when the driver is loaded. The driver command line options are: - - debug=NNN Debug printing level - dma=0/1 Disable/Enable DMA on PCI card - spantree=0/1 Disable/Enable IEEE spanning tree - hashexpire=NNN Change address aging time (default 300 seconds) - ipaddr=A,B,C,D Set SNMP agent IP address i.e. 199,86,8,221 - iptrap=A,B,C,D Set SNMP agent IP trap address i.e. 199,86,8,221 - ipxnet=NNN Set SNMP agent IPX network number - nicmode=0/1 Disable/Enable multiple NIC mode - -There is also a tool for setting up input and output packet filters -on each port, called "dgrsfilt". - -Both the management tool and the filtering tool are available -separately from the following FTP site: - - ftp://ftp.dgii.com/drivers/rightswitch/linux/ - -When nicmode=1, the board and driver operate as 4 or 6 individual -NIC ports (eth0...eth5) instead of as a switch. All switching -functions are disabled. In the future, the board firmware may include -a routing cache when in this mode. - -Copyright 1995-1996 Digi International Inc. - -This software may be used and distributed according to the terms -of the GNU General Public License, incorporated herein by reference. - -For information on purchasing a RightSwitch SE-4 or SE-6 -board, please contact Digi's sales department at 1-612-912-3444 -or 1-800-DIGIBRD. Outside the U.S., please check our Web page at: - - http://www.dgii.com - -for sales offices worldwide. Tech support is also available through -the channels listed on the Web site, although as long as I am -employed on networking products at Digi I will be happy to provide -any bug fixes that may be needed. - --Rick Richardson, rick@dgii.com diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 32c2e9da5f3..6ae2feff308 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -180,13 +180,20 @@ tcp_fin_timeout - INTEGER to live longer. Cf. tcp_max_orphans. tcp_frto - INTEGER - Enables F-RTO, an enhanced recovery algorithm for TCP retransmission + Enables Forward RTO-Recovery (F-RTO) defined in RFC4138. + F-RTO is an enhanced recovery algorithm for TCP retransmission timeouts. It is particularly beneficial in wireless environments where packet loss is typically due to random radio interference - rather than intermediate router congestion. If set to 1, basic - version is enabled. 2 enables SACK enhanced F-RTO, which is - EXPERIMENTAL. The basic version can be used also when SACK is - enabled for a flow through tcp_sack sysctl. + rather than intermediate router congestion. FRTO is sender-side + only modification. Therefore it does not require any support from + the peer, but in a typical case, however, where wireless link is + the local access link and most of the data flows downlink, the + faraway servers should have FRTO enabled to take advantage of it. + If set to 1, basic version is enabled. 2 enables SACK enhanced + F-RTO if flow uses SACK. The basic version can be used also when + SACK is in use though scenario(s) with it exists where FRTO + interacts badly with the packet counting of the SACK enabled TCP + flow. tcp_frto_response - INTEGER When F-RTO has detected that a TCP retransmission timeout was diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt index 53ef7a06f49..84906ef3ed6 100644 --- a/Documentation/networking/mac80211-injection.txt +++ b/Documentation/networking/mac80211-injection.txt @@ -13,15 +13,35 @@ The radiotap format is discussed in ./Documentation/networking/radiotap-headers.txt. Despite 13 radiotap argument types are currently defined, most only make sense -to appear on received packets. Currently three kinds of argument are used by -the injection code, although it knows to skip any other arguments that are -present (facilitating replay of captured radiotap headers directly): +to appear on received packets. The following information is parsed from the +radiotap headers and used to control injection: - - IEEE80211_RADIOTAP_RATE - u8 arg in 500kbps units (0x02 --> 1Mbps) + * IEEE80211_RADIOTAP_RATE - - IEEE80211_RADIOTAP_ANTENNA - u8 arg, 0x00 = ant1, 0x01 = ant2 + rate in 500kbps units, automatic if invalid or not present - - IEEE80211_RADIOTAP_DBM_TX_POWER - u8 arg, dBm + + * IEEE80211_RADIOTAP_ANTENNA + + antenna to use, automatic if not present + + + * IEEE80211_RADIOTAP_DBM_TX_POWER + + transmit power in dBm, automatic if not present + + + * IEEE80211_RADIOTAP_FLAGS + + IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated + IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available + IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the + current fragmentation threshold. Note that + this flag is only reliable when software + fragmentation is enabled) + +The injection code can also skip all other currently defined radiotap fields +facilitating replay of captured radiotap headers directly. Here is an example valid radiotap header defining these three parameters diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt index 00b60cce222..ea5a42e8f79 100644 --- a/Documentation/networking/multiqueue.txt +++ b/Documentation/networking/multiqueue.txt @@ -58,9 +58,13 @@ software, so it's a straight round-robin qdisc. It uses the same syntax and classification priomap that sch_prio uses, so it should be intuitive to configure for people who've used sch_prio. -The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been -built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of -bands requested is equal to the number of queues on the hardware. If they +In order to utilitize the multiqueue features of the qdiscs, the network +device layer needs to enable multiple queue support. This can be done by +selecting NETDEVICES_MULTIQUEUE under Drivers. + +The PRIO qdisc naturally plugs into a multiqueue device. If +NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of +bands requested is compared to the number of queues on the hardware. If they are equal, it sets a one-to-one mapping up between the queues and bands. If they're not equal, it will not load the qdisc. This is the same behavior for RR. Once the association is made, any skb that is classified will have diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt index 1caa6c73469..3c2f2b32863 100644 --- a/Documentation/networking/netconsole.txt +++ b/Documentation/networking/netconsole.txt @@ -3,6 +3,10 @@ started by Ingo Molnar <mingo@redhat.com>, 2001.09.17 2.6 port and netpoll api by Matt Mackall <mpm@selenic.com>, Sep 9 2003 Please send bug reports to Matt Mackall <mpm@selenic.com> +and Satyam Sharma <satyam.sharma@gmail.com> + +Introduction: +============= This module logs kernel printk messages over UDP allowing debugging of problem where disk logging fails and serial consoles are impractical. @@ -13,6 +17,9 @@ the specified interface as soon as possible. While this doesn't allow capture of early kernel panics, it does capture most of the boot process. +Sender and receiver configuration: +================================== + It takes a string configuration parameter "netconsole" in the following format: @@ -34,21 +41,113 @@ Examples: insmod netconsole netconsole=@/,@10.0.0.2/ +It also supports logging to multiple remote agents by specifying +parameters for the multiple agents separated by semicolons and the +complete string enclosed in "quotes", thusly: + + modprobe netconsole netconsole="@/,@10.0.0.2/;@/eth1,6892@10.0.0.3/" + Built-in netconsole starts immediately after the TCP stack is initialized and attempts to bring up the supplied dev at the supplied address. The remote host can run either 'netcat -u -l -p <port>' or syslogd. +Dynamic reconfiguration: +======================== + +Dynamic reconfigurability is a useful addition to netconsole that enables +remote logging targets to be dynamically added, removed, or have their +parameters reconfigured at runtime from a configfs-based userspace interface. +[ Note that the parameters of netconsole targets that were specified/created +from the boot/module option are not exposed via this interface, and hence +cannot be modified dynamically. ] + +To include this feature, select CONFIG_NETCONSOLE_DYNAMIC when building the +netconsole module (or kernel, if netconsole is built-in). + +Some examples follow (where configfs is mounted at the /sys/kernel/config +mountpoint). + +To add a remote logging target (target names can be arbitrary): + + cd /sys/kernel/config/netconsole/ + mkdir target1 + +Note that newly created targets have default parameter values (as mentioned +above) and are disabled by default -- they must first be enabled by writing +"1" to the "enabled" attribute (usually after setting parameters accordingly) +as described below. + +To remove a target: + + rmdir /sys/kernel/config/netconsole/othertarget/ + +The interface exposes these parameters of a netconsole target to userspace: + + enabled Is this target currently enabled? (read-write) + dev_name Local network interface name (read-write) + local_port Source UDP port to use (read-write) + remote_port Remote agent's UDP port (read-write) + local_ip Source IP address to use (read-write) + remote_ip Remote agent's IP address (read-write) + local_mac Local interface's MAC address (read-only) + remote_mac Remote agent's MAC address (read-write) + +The "enabled" attribute is also used to control whether the parameters of +a target can be updated or not -- you can modify the parameters of only +disabled targets (i.e. if "enabled" is 0). + +To update a target's parameters: + + cat enabled # check if enabled is 1 + echo 0 > enabled # disable the target (if required) + echo eth2 > dev_name # set local interface + echo 10.0.0.4 > remote_ip # update some parameter + echo cb:a9:87:65:43:21 > remote_mac # update more parameters + echo 1 > enabled # enable target again + +You can also update the local interface dynamically. This is especially +useful if you want to use interfaces that have newly come up (and may not +have existed when netconsole was loaded / initialized). + +Miscellaneous notes: +==================== + WARNING: the default target ethernet setting uses the broadcast ethernet address to send packets, which can cause increased load on other systems on the same ethernet segment. +TIP: some LAN switches may be configured to suppress ethernet broadcasts +so it is advised to explicitly specify the remote agents' MAC addresses +from the config parameters passed to netconsole. + +TIP: to find out the MAC address of, say, 10.0.0.2, you may try using: + + ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2 + +TIP: in case the remote logging agent is on a separate LAN subnet than +the sender, it is suggested to try specifying the MAC address of the +default gateway (you may use /sbin/route -n to find it out) as the +remote MAC address instead. + NOTE: the network device (eth1 in the above case) can run any kind of other network traffic, netconsole is not intrusive. Netconsole might cause slight delays in other traffic if the volume of kernel messages is high, but should have no other impact. +NOTE: if you find that the remote logging agent is not receiving or +printing all messages from the sender, it is likely that you have set +the "console_loglevel" parameter (on the sender) to only send high +priority messages to the console. You can change this at runtime using: + + dmesg -n 8 + +or by specifying "debug" on the kernel command line at boot, to send +all kernel messages to the console. A specific value for this parameter +can also be set using the "loglevel" kernel boot option. See the +dmesg(8) man page and Documentation/kernel-parameters.txt for details. + Netconsole was designed to be as instantaneous as possible, to enable the logging of even the most critical kernel bugs. It works from IRQ contexts as well, and does not enable interrupts while diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index 37869295fc7..d0f71fc7f78 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -73,7 +73,8 @@ dev->hard_start_xmit: has to lock by itself when needed. It is recommended to use a try lock for this and return NETDEV_TX_LOCKED when the spin lock fails. The locking there should also properly protect against - set_multicast_list. + set_multicast_list. Note that the use of NETIF_F_LLTX is deprecated. + Dont use it for new drivers. Context: Process with BHs disabled or BH (timer), will be called with interrupts disabled by netconsole. @@ -95,9 +96,13 @@ dev->set_multicast_list: Synchronization: netif_tx_lock spinlock. Context: BHs disabled -dev->poll: - Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See - dev_close code and comments in net/core/dev.c for more info. +struct napi_struct synchronization rules +======================================== +napi->poll: + Synchronization: NAPI_STATE_SCHED bit in napi->state. Device + driver's dev->close method will invoke napi_disable() on + all NAPI instances which will do a sleeping poll on the + NAPI_STATE_SCHED napi->state bit, waiting for all pending + NAPI activity to cease. Context: softirq will be called with interrupts disabled by netconsole. - diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt new file mode 100644 index 00000000000..8590a954df1 --- /dev/null +++ b/Documentation/networking/sk98lin.txt @@ -0,0 +1,568 @@ +(C)Copyright 1999-2004 Marvell(R). +All rights reserved +=========================================================================== + +sk98lin.txt created 13-Feb-2004 + +Readme File for sk98lin v6.23 +Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX + +This file contains + 1 Overview + 2 Required Files + 3 Installation + 3.1 Driver Installation + 3.2 Inclusion of adapter at system start + 4 Driver Parameters + 4.1 Per-Port Parameters + 4.2 Adapter Parameters + 5 Large Frame Support + 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) + 7 Troubleshooting + +=========================================================================== + + +1 Overview +=========== + +The sk98lin driver supports the Marvell Yukon and SysKonnect +SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has +been tested with Linux on Intel/x86 machines. +*** + + +2 Required Files +================= + +The linux kernel source. +No additional files required. +*** + + +3 Installation +=============== + +It is recommended to download the latest version of the driver from the +SysKonnect web site www.syskonnect.com. If you have downloaded the latest +driver, the Linux kernel has to be patched before the driver can be +installed. For details on how to patch a Linux kernel, refer to the +patch.txt file. + +3.1 Driver Installation +------------------------ + +The following steps describe the actions that are required to install +the driver and to start it manually. These steps should be carried +out for the initial driver setup. Once confirmed to be ok, they can +be included in the system start. + +NOTE 1: To perform the following tasks you need 'root' access. + +NOTE 2: In case of problems, please read the section "Troubleshooting" + below. + +The driver can either be integrated into the kernel or it can be compiled +as a module. Select the appropriate option during the kernel +configuration. + +Compile/use the driver as a module +---------------------------------- +To compile the driver, go to the directory /usr/src/linux and +execute the command "make menuconfig" or "make xconfig" and proceed as +follows: + +To integrate the driver permanently into the kernel, proceed as follows: + +1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" +2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" + with (*) +3. Build a new kernel when the configuration of the above options is + finished. +4. Install the new kernel. +5. Reboot your system. + +To use the driver as a module, proceed as follows: + +1. Enable 'loadable module support' in the kernel. +2. For automatic driver start, enable the 'Kernel module loader'. +3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" +4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" + with (M) +5. Execute the command "make modules". +6. Execute the command "make modules_install". + The appropriate modules will be installed. +7. Reboot your system. + + +Load the module manually +------------------------ +To load the module manually, proceed as follows: + +1. Enter "modprobe sk98lin". +2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in + your computer and you have a /proc file system, execute the command: + "ls /proc/net/sk98lin/" + This should produce an output containing a line with the following + format: + eth0 eth1 ... + which indicates that your adapter has been found and initialized. + + NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx + adapter installed, the adapters will be listed as 'eth0', + 'eth1', 'eth2', etc. + For each adapter, repeat steps 3 and 4 below. + + NOTE 2: If you have other Ethernet adapters installed, your Marvell + Yukon or SysKonnect SK-98xx adapter will be mapped to the + next available number, e.g. 'eth1'. The mapping is executed + automatically. + The module installation message (displayed either in a system + log file or on the console) prints a line for each adapter + found containing the corresponding 'ethX'. + +3. Select an IP address and assign it to the respective adapter by + entering: + ifconfig eth0 <ip-address> + With this command, the adapter is connected to the Ethernet. + + SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter + is now active, the link status LED of the primary port is active and + the link status LED of the secondary port (on dual port adapters) is + blinking (if the ports are connected to a switch or hub). + SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. + In addition, you will receive a status message on the console stating + "ethX: network connection up using port Y" and showing the selected + connection parameters (x stands for the ethernet device number + (0,1,2, etc), y stands for the port name (A or B)). + + NOTE: If you are in doubt about IP addresses, ask your network + administrator for assistance. + +4. Your adapter should now be fully operational. + Use 'ping <otherstation>' to verify the connection to other computers + on your network. +5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. + For example by executing: + "cat /proc/net/sk98lin/eth0" + +Unload the module +----------------- +To stop and unload the driver modules, proceed as follows: + +1. Execute the command "ifconfig eth0 down". +2. Execute the command "rmmod sk98lin". + +3.2 Inclusion of adapter at system start +----------------------------------------- + +Since a large number of different Linux distributions are +available, we are unable to describe a general installation procedure +for the driver module. +Because the driver is now integrated in the kernel, installation should +be easy, using the standard mechanism of your distribution. +Refer to the distribution's manual for installation of ethernet adapters. + +*** + +4 Driver Parameters +==================== + +Parameters can be set at the command line after the module has been +loaded with the command 'modprobe'. +In some distributions, the configuration tools are able to pass parameters +to the driver module. + +If you use the kernel module loader, you can set driver parameters +in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). +To set the driver parameters in this file, proceed as follows: + +1. Insert a line of the form : + options sk98lin ... + For "...", the same syntax is required as described for the command + line parameters of modprobe below. +2. To activate the new parameters, either reboot your computer + or + unload and reload the driver. + The syntax of the driver parameters is: + + modprobe sk98lin parameter=value1[,value2[,value3...]] + + where value1 refers to the first adapter, value2 to the second etc. + +NOTE: All parameters are case sensitive. Write them exactly as shown + below. + +Example: +Suppose you have two adapters. You want to set auto-negotiation +on the first adapter to ON and on the second adapter to OFF. +You also want to set DuplexCapabilities on the first adapter +to FULL, and on the second adapter to HALF. +Then, you must enter: + + modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half + +NOTE: The number of adapters that can be configured this way is + limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). + The current limit is 16. If you happen to install + more adapters, adjust this and recompile. + + +4.1 Per-Port Parameters +------------------------ + +These settings are available for each port on the adapter. +In the following description, '?' stands for the port for +which you set the parameter (A or B). + +Speed +----- +Parameter: Speed_? +Values: 10, 100, 1000, Auto +Default: Auto + +This parameter is used to set the speed capabilities. It is only valid +for the SK-98xx V2.0 copper adapters. +Usually, the speed is negotiated between the two ports during link +establishment. If this fails, a port can be forced to a specific setting +with this parameter. + +Auto-Negotiation +---------------- +Parameter: AutoNeg_? +Values: On, Off, Sense +Default: On + +The "Sense"-mode automatically detects whether the link partner supports +auto-negotiation or not. + +Duplex Capabilities +------------------- +Parameter: DupCap_? +Values: Half, Full, Both +Default: Both + +This parameters is only relevant if auto-negotiation for this port is +not set to "Sense". If auto-negotiation is set to "On", all three values +are possible. If it is set to "Off", only "Full" and "Half" are allowed. +This parameter is useful if your link partner does not support all +possible combinations. + +Flow Control +------------ +Parameter: FlowCtrl_? +Values: Sym, SymOrRem, LocSend, None +Default: SymOrRem + +This parameter can be used to set the flow control capabilities the +port reports during auto-negotiation. It can be set for each port +individually. +Possible modes: + -- Sym = Symmetric: both link partners are allowed to send + PAUSE frames + -- SymOrRem = SymmetricOrRemote: both or only remote partner + are allowed to send PAUSE frames + -- LocSend = LocalSend: only local link partner is allowed + to send PAUSE frames + -- None = no link partner is allowed to send PAUSE frames + +NOTE: This parameter is ignored if auto-negotiation is set to "Off". + +Role in Master-Slave-Negotiation (1000Base-T only) +-------------------------------------------------- +Parameter: Role_? +Values: Auto, Master, Slave +Default: Auto + +This parameter is only valid for the SK-9821 and SK-9822 adapters. +For two 1000Base-T ports to communicate, one must take the role of the +master (providing timing information), while the other must be the +slave. Usually, this is negotiated between the two ports during link +establishment. If this fails, a port can be forced to a specific setting +with this parameter. + + +4.2 Adapter Parameters +----------------------- + +Connection Type (SK-98xx V2.0 copper adapters only) +--------------- +Parameter: ConType +Values: Auto, 100FD, 100HD, 10FD, 10HD +Default: Auto + +The parameter 'ConType' is a combination of all five per-port parameters +within one single parameter. This simplifies the configuration of both ports +of an adapter card! The different values of this variable reflect the most +meaningful combinations of port parameters. + +The following table shows the values of 'ConType' and the corresponding +combinations of the per-port parameters: + + ConType | DupCap AutoNeg FlowCtrl Role Speed + ----------+------------------------------------------------------ + Auto | Both On SymOrRem Auto Auto + 100FD | Full Off None Auto (ignored) 100 + 100HD | Half Off None Auto (ignored) 100 + 10FD | Full Off None Auto (ignored) 10 + 10HD | Half Off None Auto (ignored) 10 + +Stating any other port parameter together with this 'ConType' variable +will result in a merged configuration of those settings. This due to +the fact, that the per-port parameters (e.g. Speed_? ) have a higher +priority than the combined variable 'ConType'. + +NOTE: This parameter is always used on both ports of the adapter card. + +Interrupt Moderation +-------------------- +Parameter: Moderation +Values: None, Static, Dynamic +Default: None + +Interrupt moderation is employed to limit the maximum number of interrupts +the driver has to serve. That is, one or more interrupts (which indicate any +transmit or receive packet to be processed) are queued until the driver +processes them. When queued interrupts are to be served, is determined by the +'IntsPerSec' parameter, which is explained later below. + +Possible modes: + + -- None - No interrupt moderation is applied on the adapter card. + Therefore, each transmit or receive interrupt is served immediately + as soon as it appears on the interrupt line of the adapter card. + + -- Static - Interrupt moderation is applied on the adapter card. + All transmit and receive interrupts are queued until a complete + moderation interval ends. If such a moderation interval ends, all + queued interrupts are processed in one big bunch without any delay. + The term 'static' reflects the fact, that interrupt moderation is + always enabled, regardless how much network load is currently + passing via a particular interface. In addition, the duration of + the moderation interval has a fixed length that never changes while + the driver is operational. + + -- Dynamic - Interrupt moderation might be applied on the adapter card, + depending on the load of the system. If the driver detects that the + system load is too high, the driver tries to shield the system against + too much network load by enabling interrupt moderation. If - at a later + time - the CPU utilization decreases again (or if the network load is + negligible) the interrupt moderation will automatically be disabled. + +Interrupt moderation should be used when the driver has to handle one or more +interfaces with a high network load, which - as a consequence - leads also to a +high CPU utilization. When moderation is applied in such high network load +situations, CPU load might be reduced by 20-30%. + +NOTE: The drawback of using interrupt moderation is an increase of the round- +trip-time (RTT), due to the queueing and serving of interrupts at dedicated +moderation times. + +Interrupts per second +--------------------- +Parameter: IntsPerSec +Values: 30...40000 (interrupts per second) +Default: 2000 + +This parameter is only used if either static or dynamic interrupt moderation +is used on a network adapter card. Using this parameter if no moderation is +applied will lead to no action performed. + +This parameter determines the length of any interrupt moderation interval. +Assuming that static interrupt moderation is to be used, an 'IntsPerSec' +parameter value of 2000 will lead to an interrupt moderation interval of +500 microseconds. + +NOTE: The duration of the moderation interval is to be chosen with care. +At first glance, selecting a very long duration (e.g. only 100 interrupts per +second) seems to be meaningful, but the increase of packet-processing delay +is tremendous. On the other hand, selecting a very short moderation time might +compensate the use of any moderation being applied. + + +Preferred Port +-------------- +Parameter: PrefPort +Values: A, B +Default: A + +This is used to force the preferred port to A or B (on dual-port network +adapters). The preferred port is the one that is used if both are detected +as fully functional. + +RLMT Mode (Redundant Link Management Technology) +------------------------------------------------ +Parameter: RlmtMode +Values: CheckLinkState,CheckLocalPort, CheckSeg, DualNet +Default: CheckLinkState + +RLMT monitors the status of the port. If the link of the active port +fails, RLMT switches immediately to the standby link. The virtual link is +maintained as long as at least one 'physical' link is up. + +Possible modes: + + -- CheckLinkState - Check link state only: RLMT uses the link state + reported by the adapter hardware for each individual port to + determine whether a port can be used for all network traffic or + not. + + -- CheckLocalPort - In this mode, RLMT monitors the network path + between the two ports of an adapter by regularly exchanging packets + between them. This mode requires a network configuration in which + the two ports are able to "see" each other (i.e. there must not be + any router between the ports). + + -- CheckSeg - Check local port and segmentation: This mode supports the + same functions as the CheckLocalPort mode and additionally checks + network segmentation between the ports. Therefore, this mode is only + to be used if Gigabit Ethernet switches are installed on the network + that have been configured to use the Spanning Tree protocol. + + -- DualNet - In this mode, ports A and B are used as separate devices. + If you have a dual port adapter, port A will be configured as eth0 + and port B as eth1. Both ports can be used independently with + distinct IP addresses. The preferred port setting is not used. + RLMT is turned off. + +NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations + where a network path between the ports on one adapter exists. + Moreover, they are not designed to work where adapters are connected + back-to-back. +*** + + +5 Large Frame Support +====================== + +The driver supports large frames (also called jumbo frames). Using large +frames can result in an improved throughput if transferring large amounts +of data. +To enable large frames, set the MTU (maximum transfer unit) of the +interface to the desired value (up to 9000), execute the following +command: + ifconfig eth0 mtu 9000 +This will only work if you have two adapters connected back-to-back +or if you use a switch that supports large frames. When using a switch, +it should be configured to allow large frames and auto-negotiation should +be set to OFF. The setting must be configured on all adapters that can be +reached by the large frames. If one adapter is not set to receive large +frames, it will simply drop them. + +You can switch back to the standard ethernet frame size by executing the +following command: + ifconfig eth0 mtu 1500 + +To permanently configure this setting, add a script with the 'ifconfig' +line to the system startup sequence (named something like "S99sk98lin" +in /etc/rc.d/rc2.d). +*** + + +6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) +================================================================== + +The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and +Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. +These features are only available after installation of open source +modules available on the Internet: +For VLAN go to: http://www.candelatech.com/~greear/vlan.html +For Link Aggregation go to: http://www.st.rim.or.jp/~yumo + +NOTE: SysKonnect GmbH does not offer any support for these open source + modules and does not take the responsibility for any kind of + failures or problems arising in connection with these modules. + +NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may + cause problems when unloading the driver. + + +7 Troubleshooting +================== + +If any problems occur during the installation process, check the +following list: + + +Problem: The SK-98xx adapter cannot be found by the driver. +Solution: In /proc/pci search for the following entry: + 'Ethernet controller: SysKonnect SK-98xx ...' + If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has + been found by the system and should be operational. + If this entry does not exist or if the file '/proc/pci' is not + found, there may be a hardware problem or the PCI support may + not be enabled in your kernel. + The adapter can be checked using the diagnostics program which + is available on the SysKonnect web site: + www.syskonnect.com + + Some COMPAQ machines have problems dealing with PCI under Linux. + This problem is described in the 'PCI howto' document + (included in some distributions or available from the + web, e.g. at 'www.linux.org'). + + +Problem: Programs such as 'ifconfig' or 'route' cannot be found or the + error message 'Operation not permitted' is displayed. +Reason: You are not logged in as user 'root'. +Solution: Logout and login as 'root' or change to 'root' via 'su'. + + +Problem: Upon use of the command 'ping <address>' the message + "ping: sendto: Network is unreachable" is displayed. +Reason: Your route is not set correctly. +Solution: If you are using RedHat, you probably forgot to set up the + route in the 'network configuration'. + Check the existing routes with the 'route' command and check + if an entry for 'eth0' exists, and if so, if it is set correctly. + + +Problem: The driver can be started, the adapter is connected to the + network, but you cannot receive or transmit any packets; + e.g. 'ping' does not work. +Reason: There is an incorrect route in your routing table. +Solution: Check the routing table with the command 'route' and read the + manual help pages dealing with routes (enter 'man route'). + +NOTE: Although the 2.2.x kernel versions generate the routing entry + automatically, problems of this kind may occur here as well. We've + come across a situation in which the driver started correctly at + system start, but after the driver has been removed and reloaded, + the route of the adapter's network pointed to the 'dummy0'device + and had to be corrected manually. + + +Problem: Your computer should act as a router between multiple + IP subnetworks (using multiple adapters), but computers in + other subnetworks cannot be reached. +Reason: Either the router's kernel is not configured for IP forwarding + or the routing table and gateway configuration of at least one + computer is not working. + +Problem: Upon driver start, the following error message is displayed: + "eth0: -- ERROR -- + Class: internal Software error + Nr: 0xcc + Msg: SkGeInitPort() cannot init running ports" +Reason: You are using a driver compiled for single processor machines + on a multiprocessor machine with SMP (Symmetric MultiProcessor) + kernel. +Solution: Configure your kernel appropriately and recompile the kernel or + the modules. + + + +If your problem is not listed here, please contact SysKonnect's technical +support for help (linux@syskonnect.de). +When contacting our technical support, please ensure that the following +information is available: +- System Manufacturer and HW Informations (CPU, Memory... ) +- PCI-Boards in your system +- Distribution +- Kernel version +- Driver version +*** + + + +***End of Readme File*** |