From bfd00722ac230a39bc5234c5f7a514ea6a77996d Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 26 Sep 2005 11:28:47 +0900 Subject: [PATCH] libata EH document update Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik --- Documentation/DocBook/libata.tmpl | 356 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 356 insertions(+) (limited to 'Documentation') diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index 375ae760dc1..fcbb58fc0fe 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -413,6 +413,362 @@ and other resources, etc. + + Error handling + + + This chapter describes how errors are handled under libata. + Readers are advised to read SCSI EH + (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. + + + Origins of commands + + In libata, a command is represented with struct ata_queued_cmd + or qc. qc's are preallocated during port initialization and + repetitively used for command executions. Currently only one + qc is allocated per port but yet-to-be-merged NCQ branch + allocates one for each tag and maps each qc to NCQ tag 1-to-1. + + + libata commands can originate from two sources - libata itself + and SCSI midlayer. libata internal commands are used for + initialization and error handling. All normal blk requests + and commands for SCSI emulation are passed as SCSI commands + through queuecommand callback of SCSI host template. + + + + How commands are issued + + + + Internal commands + + + First, qc is allocated and initialized using + ata_qc_new_init(). Although ata_qc_new_init() doesn't + implement any wait or retry mechanism when qc is not + available, internal commands are currently issued only during + initialization and error recovery, so no other command is + active and allocation is guaranteed to succeed. + + + Once allocated qc's taskfile is initialized for the command to + be executed. qc currently has two mechanisms to notify + completion. One is via qc->complete_fn() callback and the + other is completion qc->waiting. qc->complete_fn() callback + is the asynchronous path used by normal SCSI translated + commands and qc->waiting is the synchronous (issuer sleeps in + process context) path used by internal commands. + + + Once initialization is complete, host_set lock is acquired + and the qc is issued. + + + + + SCSI commands + + + All libata drivers use ata_scsi_queuecmd() as + hostt->queuecommand callback. scmds can either be simulated + or translated. No qc is involved in processing a simulated + scmd. The result is computed right away and the scmd is + completed. + + + For a translated scmd, ata_qc_new_init() is invoked to + allocate a qc and the scmd is translated into the qc. SCSI + midlayer's completion notification function pointer is stored + into qc->scsidone. + + + qc->complete_fn() callback is used for completion + notification. ATA commands use ata_scsi_qc_complete() while + ATAPI commands use atapi_qc_complete(). Both functions end up + calling qc->scsidone to notify upper layer when the qc is + finished. After translation is completed, the qc is issued + with ata_qc_issue(). + + + Note that SCSI midlayer invokes hostt->queuecommand while + holding host_set lock, so all above occur while holding + host_set lock. + + + + + + + + How commands are processed + + Depending on which protocol and which controller are used, + commands are processed differently. For the purpose of + discussion, a controller which uses taskfile interface and all + standard callbacks is assumed. + + + Currently 6 ATA command protocols are used. They can be + sorted into the following four categories according to how + they are processed. + + + + ATA NO DATA or DMA + + + ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. + These types of commands don't require any software + intervention once issued. Device will raise interrupt on + completion. + + + + + ATA PIO + + + ATA_PROT_PIO is in this category. libata currently + implements PIO with polling. ATA_NIEN bit is set to turn + off interrupt and pio_task on ata_wq performs polling and + IO. + + + + + ATAPI NODATA or DMA + + + ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this + category. packet_task is used to poll BSY bit after + issuing PACKET command. Once BSY is turned off by the + device, packet_task transfers CDB and hands off processing + to interrupt handler. + + + + + ATAPI PIO + + + ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set + and, as in ATAPI NODATA or DMA, packet_task submits cdb. + However, after submitting cdb, further processing (data + transfer) is handed off to pio_task. + + + + + + + How commands are completed + + Once issued, all qc's are either completed with + ata_qc_complete() or time out. For commands which are handled + by interrupts, ata_host_intr() invokes ata_qc_complete(), and, + for PIO tasks, pio_task invokes ata_qc_complete(). In error + cases, packet_task may also complete commands. + + + ata_qc_complete() does the following. + + + + + + + DMA memory is unmapped. + + + + + + ATA_QCFLAG_ACTIVE is clared from qc->flags. + + + + + + qc->complete_fn() callback is invoked. If the return value of + the callback is not zero. Completion is short circuited and + ata_qc_complete() returns. + + + + + + __ata_qc_complete() is called, which does + + + + + qc->flags is cleared to zero. + + + + + + ap->active_tag and qc->tag are poisoned. + + + + + + qc->waiting is claread & completed (in that order). + + + + + + qc is deallocated by clearing appropriate bit in ap->qactive. + + + + + + + + + + + So, it basically notifies upper layer and deallocates qc. One + exception is short-circuit path in #3 which is used by + atapi_qc_complete(). + + + For all non-ATAPI commands, whether it fails or not, almost + the same code path is taken and very little error handling + takes place. A qc is completed with success status if it + succeeded, with failed status otherwise. + + + However, failed ATAPI commands require more handling as + REQUEST SENSE is needed to acquire sense data. If an ATAPI + command fails, ata_qc_complete() is invoked with error status, + which in turn invokes atapi_qc_complete() via + qc->complete_fn() callback. + + + This makes atapi_qc_complete() set scmd->result to + SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As + the sense data is empty but scmd->result is CHECK CONDITION, + SCSI midlayer will invoke EH for the scmd, and returning 1 + makes ata_qc_complete() to return without deallocating the qc. + This leads us to ata_scsi_error() with partially completed qc. + + + + + ata_scsi_error() + + ata_scsi_error() is the current hostt->eh_strategy_handler() + for libata. As discussed above, this will be entered in two + cases - timeout and ATAPI error completion. This function + calls low level libata driver's eng_timeout() callback, the + standard callback for which is ata_eng_timeout(). It checks + if a qc is active and calls ata_qc_timeout() on the qc if so. + Actual error handling occurs in ata_qc_timeout(). + + + If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and + completes the qc. Note that as we're currently in EH, we + cannot call scsi_done. As described in SCSI EH doc, a + recovered scmd should be either retried with + scsi_queue_insert() or finished with scsi_finish_command(). + Here, we override qc->scsidone with scsi_finish_command() and + calls ata_qc_complete(). + + + If EH is invoked due to a failed ATAPI qc, the qc here is + completed but not deallocated. The purpose of this + half-completion is to use the qc as place holder to make EH + code reach this place. This is a bit hackish, but it works. + + + Once control reaches here, the qc is deallocated by invoking + __ata_qc_complete() explicitly. Then, internal qc for REQUEST + SENSE is issued. Once sense data is acquired, scmd is + finished by directly invoking scsi_finish_command() on the + scmd. Note that as we already have completed and deallocated + the qc which was associated with the scmd, we don't need + to/cannot call ata_qc_complete() again. + + + + + Problems with the current EH + + + + + + Error representation is too crude. Currently any and all + error conditions are represented with ATA STATUS and ERROR + registers. Errors which aren't ATA device errors are treated + as ATA device errors by setting ATA_ERR bit. Better error + descriptor which can properly represent ATA and other + errors/exceptions is needed. + + + + + + When handling timeouts, no action is taken to make device + forget about the timed out command and ready for new commands. + + + + + + EH handling via ata_scsi_error() is not properly protected + from usual command processing. On EH entrance, the device is + not in quiescent state. Timed out commands may succeed or + fail any time. pio_task and atapi_task may still be running. + + + + + + Too weak error recovery. Devices / controllers causing HSM + mismatch errors and other errors quite often require reset to + return to known state. Also, advanced error handling is + necessary to support features like NCQ and hotplug. + + + + + + ATA errors are directly handled in the interrupt handler and + PIO errors in pio_task. This is problematic for advanced + error handling for the following reasons. + + + First, advanced error handling often requires context and + internal qc execution. + + + Second, even a simple failure (say, CRC error) needs + information gathering and could trigger complex error handling + (say, resetting & reconfiguring). Having multiple code + paths to gather information, enter EH and trigger actions + makes life painful. + + + Third, scattered EH code makes implementing low level drivers + difficult. Low level drivers override libata callbacks. If + EH is scattered over several places, each affected callbacks + should perform its part of error handling. This can be error + prone and painful. + + + + + + + -- cgit v1.2.3-70-g09d2 From a1213499b0ef75d8c627b461047805a235c9dd00 Mon Sep 17 00:00:00 2001 From: Jeff Garzik Date: Wed, 28 Sep 2005 13:26:47 -0400 Subject: libata: move EH docs to separate DocBook chapter --- Documentation/DocBook/libata.tmpl | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index fcbb58fc0fe..b2ec780bcda 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -413,7 +413,9 @@ and other resources, etc. - + + + Error handling @@ -422,7 +424,7 @@ and other resources, etc. (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. - Origins of commands + Origins of commands In libata, a command is represented with struct ata_queued_cmd or qc. qc's are preallocated during port initialization and @@ -437,9 +439,9 @@ and other resources, etc. and commands for SCSI emulation are passed as SCSI commands through queuecommand callback of SCSI host template. - + - How commands are issued + How commands are issued @@ -501,9 +503,9 @@ and other resources, etc. - + - How commands are processed + How commands are processed Depending on which protocol and which controller are used, commands are processed differently. For the purpose of @@ -562,9 +564,9 @@ and other resources, etc. - + - How commands are completed + How commands are completed Once issued, all qc's are either completed with ata_qc_complete() or time out. For commands which are handled @@ -660,9 +662,9 @@ and other resources, etc. This leads us to ata_scsi_error() with partially completed qc. - + - ata_scsi_error() + ata_scsi_error() ata_scsi_error() is the current hostt->eh_strategy_handler() for libata. As discussed above, this will be entered in two @@ -697,9 +699,9 @@ and other resources, etc. to/cannot call ata_qc_complete() again. - + - Problems with the current EH + Problems with the current EH @@ -766,9 +768,7 @@ and other resources, etc. - - - + -- cgit v1.2.3-70-g09d2 From fe998aa7e27f125f6768ec6b137b0ce2c9790509 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sun, 2 Oct 2005 11:54:29 +0900 Subject: [PATCH] libata: add ATA exceptions chapter to doc Hello, Jeff. This patch adds ATA errors & exceptions chapter to Documentation/DocBook/libata.tmpl. As suggested, the chapter is placed before low level driver specific chapters. Contents are unchanged from the last posting. Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik --- Documentation/DocBook/libata.tmpl | 716 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 716 insertions(+) (limited to 'Documentation') diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index b2ec780bcda..d260d92089a 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -787,6 +787,722 @@ and other resources, etc. !Idrivers/scsi/libata-scsi.c + + ATA errors & exceptions + + + This chapter tries to identify what error/exception conditions exist + for ATA/ATAPI devices and describe how they should be handled in + implementation-neutral way. + + + + The term 'error' is used to describe conditions where either an + explicit error condition is reported from device or a command has + timed out. + + + + The term 'exception' is either used to describe exceptional + conditions which are not errors (say, power or hotplug events), or + to describe both errors and non-error exceptional conditions. Where + explicit distinction between error and exception is necessary, the + term 'non-error exception' is used. + + + + Exception categories + + Exceptions are described primarily with respect to legacy + taskfile + bus master IDE interface. If a controller provides + other better mechanism for error reporting, mapping those into + categories described below shouldn't be difficult. + + + + In the following sections, two recovery actions - reset and + reconfiguring transport - are mentioned. These are described + further in . + + + + HSM violation + + This error is indicated when STATUS value doesn't match HSM + requirement during issuing or excution any ATA/ATAPI command. + + + + Examples + + + + ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying + to issue a command. + + + + + + !BSY && !DRQ during PIO data transfer. + + + + + + DRQ on command completion. + + + + + + !BSY && ERR after CDB tranfer starts but before the + last byte of CDB is transferred. ATA/ATAPI standard states + that "The device shall not terminate the PACKET command + with an error before the last byte of the command packet has + been written" in the error outputs description of PACKET + command and the state diagram doesn't include such + transitions. + + + + + + + In these cases, HSM is violated and not much information + regarding the error can be acquired from STATUS or ERROR + register. IOW, this error can be anything - driver bug, + faulty device, controller and/or cable. + + + + As HSM is violated, reset is necessary to restore known state. + Reconfiguring transport for lower speed might be helpful too + as transmission errors sometimes cause this kind of errors. + + + + + ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION) + + + These are errors detected and reported by ATA/ATAPI devices + indicating device problems. For this type of errors, STATUS + and ERROR register values are valid and describe error + condition. Note that some of ATA bus errors are detected by + ATA/ATAPI devices and reported using the same mechanism as + device errors. Those cases are described later in this + section. + + + + For ATA commands, this type of errors are indicated by !BSY + && ERR during command execution and on completion. + + + For ATAPI commands, + + + + + + !BSY && ERR && ABRT right after issuing PACKET + indicates that PACKET command is not supported and falls in + this category. + + + + + + !BSY && ERR(==CHK) && !ABRT after the last + byte of CDB is transferred indicates CHECK CONDITION and + doesn't fall in this category. + + + + + + !BSY && ERR(==CHK) && ABRT after the last byte + of CDB is transferred *probably* indicates CHECK CONDITION and + doesn't fall in this category. + + + + + + + Of errors detected as above, the followings are not ATA/ATAPI + device errors but ATA bus errors and should be handled + according to . + + + + + + CRC error during data transfer + + + This is indicated by ICRC bit in the ERROR register and + means that corruption occurred during data transfer. Upto + ATA/ATAPI-7, the standard specifies that this bit is only + applicable to UDMA transfers but ATA/ATAPI-8 draft revision + 1f says that the bit may be applicable to multiword DMA and + PIO. + + + + + + ABRT error during data transfer or on completion + + + Upto ATA/ATAPI-7, the standard specifies that ABRT could be + set on ICRC errors and on cases where a device is not able + to complete a command. Combined with the fact that MWDMA + and PIO transfer errors aren't allowed to use ICRC bit upto + ATA/ATAPI-7, it seems to imply that ABRT bit alone could + indicate tranfer errors. + + + However, ATA/ATAPI-8 draft revision 1f removes the part + that ICRC errors can turn on ABRT. So, this is kind of + gray area. Some heuristics are needed here. + + + + + + + + ATA/ATAPI device errors can be further categorized as follows. + + + + + + Media errors + + + This is indicated by UNC bit in the ERROR register. ATA + devices reports UNC error only after certain number of + retries cannot recover the data, so there's nothing much + else to do other than notifying upper layer. + + + READ and WRITE commands report CHS or LBA of the first + failed sector but ATA/ATAPI standard specifies that the + amount of transferred data on error completion is + indeterminate, so we cannot assume that sectors preceding + the failed sector have been transferred and thus cannot + complete those sectors successfully as SCSI does. + + + + + + Media changed / media change requested error + + + <<TODO: fill here>> + + + + + Address error + + + This is indicated by IDNF bit in the ERROR register. + Report to upper layer. + + + + + Other errors + + + This can be invalid command or parameter indicated by ABRT + ERROR bit or some other error condition. Note that ABRT + bit can indicate a lot of things including ICRC and Address + errors. Heuristics needed. + + + + + + + + Depending on commands, not all STATUS/ERROR bits are + applicable. These non-applicable bits are marked with + "na" in the output descriptions but upto ATA/ATAPI-7 + no definition of "na" can be found. However, + ATA/ATAPI-8 draft revision 1f describes "N/A" as + follows. + + +
+ + 3.2.3.3a N/A + + + A keyword the indicates a field has no defined value in + this standard and should not be checked by the host or + device. N/A fields should be cleared to zero. + + + + +
+ + + So, it seems reasonable to assume that "na" bits are + cleared to zero by devices and thus need no explicit masking. + + +
+ + + ATAPI device CHECK CONDITION + + + ATAPI device CHECK CONDITION error is indicated by set CHK bit + (ERR bit) in the STATUS register after the last byte of CDB is + transferred for a PACKET command. For this kind of errors, + sense data should be acquired to gather information regarding + the errors. REQUEST SENSE packet command should be used to + acquire sense data. + + + + Once sense data is acquired, this type of errors can be + handled similary to other SCSI errors. Note that sense data + may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR + && ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such + cases, the error should be considered as an ATA bus error and + handled according to . + + + + + + ATA device error (NCQ) + + + NCQ command error is indicated by cleared BSY and set ERR bit + during NCQ command phase (one or more NCQ commands + outstanding). Although STATUS and ERROR registers will + contain valid values describing the error, READ LOG EXT is + required to clear the error condition, determine which command + has failed and acquire more information. + + + + READ LOG EXT Log Page 10h reports which tag has failed and + taskfile register values describing the error. With this + information the failed command can be handled as a normal ATA + command error as in and all + other in-flight commands must be retried. Note that this + retry should not be counted - it's likely that commands + retried this way would have completed normally if it were not + for the failed command. + + + + Note that ATA bus errors can be reported as ATA device NCQ + errors. This should be handled as described in . + + + + If READ LOG EXT Log Page 10h fails or reports NQ, we're + thoroughly screwed. This condition should be treated + according to . + + + + + + ATA bus error + + + ATA bus error means that data corruption occurred during + transmission over ATA bus (SATA or PATA). This type of errors + can be indicated by + + + + + + + ICRC or ABRT error as described in . + + + + + + Controller-specific error completion with error information + indicating transmission error. + + + + + + On some controllers, command timeout. In this case, there may + be a mechanism to determine that the timeout is due to + transmission error. + + + + + + Unknown/random errors, timeouts and all sorts of weirdities. + + + + + + + As described above, transmission errors can cause wide variety + of symptoms ranging from device ICRC error to random device + lockup, and, for many cases, there is no way to tell if an + error condition is due to transmission error or not; + therefore, it's necessary to employ some kind of heuristic + when dealing with errors and timeouts. For example, + encountering repetitive ABRT errors for known supported + command is likely to indicate ATA bus error. + + + + Once it's determined that ATA bus errors have possibly + occurred, lowering ATA bus transmission speed is one of + actions which may alleviate the problem. See for more information. + + + + + + PCI bus error + + + Data corruption or other failures during transmission over PCI + (or other system bus). For standard BMDMA, this is indicated + by Error bit in the BMDMA Status register. This type of + errors must be logged as it indicates something is very wrong + with the system. Resetting host controller is recommended. + + + + + + Late completion + + + This occurs when timeout occurs and the timeout handler finds + out that the timed out command has completed successfully or + with error. This is usually caused by lost interrupts. This + type of errors must be logged. Resetting host controller is + recommended. + + + + + + Unknown error (timeout) + + + This is when timeout occurs and the command is still + processing or the host and device are in unknown state. When + this occurs, HSM could be in any valid or invalid state. To + bring the device to known state and make it forget about the + timed out command, resetting is necessary. The timed out + command may be retried. + + + + Timeouts can also be caused by transmission errors. Refer to + for more details. + + + + + + Hotplug and power management exceptions + + + <<TODO: fill here>> + + + + +
+ + + EH recovery actions + + + This section discusses several important recovery actions. + + + + Clearing error condition + + + Many controllers require its error registers to be cleared by + error handler. Different controllers may have different + requirements. + + + + For SATA, it's strongly recommended to clear at least SError + register during error handling. + + + + + Reset + + + During EH, resetting is necessary in the following cases. + + + + + + + HSM is in unknown or invalid state + + + + + + HBA is in unknown or invalid state + + + + + + EH needs to make HBA/device forget about in-flight commands + + + + + + HBA/device behaves weirdly + + + + + + + Resetting during EH might be a good idea regardless of error + condition to improve EH robustness. Whether to reset both or + either one of HBA and device depends on situation but the + following scheme is recommended. + + + + + + + When it's known that HBA is in ready state but ATA/ATAPI + device in in unknown state, reset only device. + + + + + + If HBA is in unknown state, reset both HBA and device. + + + + + + + HBA resetting is implementation specific. For a controller + complying to taskfile/BMDMA PCI IDE, stopping active DMA + transaction may be sufficient iff BMDMA state is the only HBA + context. But even mostly taskfile/BMDMA PCI IDE complying + controllers may have implementation specific requirements and + mechanism to reset themselves. This must be addressed by + specific drivers. + + + + OTOH, ATA/ATAPI standard describes in detail ways to reset + ATA/ATAPI devices. + + + + + PATA hardware reset + + + This is hardware initiated device reset signalled with + asserted PATA RESET- signal. There is no standard way to + initiate hardware reset from software although some + hardware provides registers that allow driver to directly + tweak the RESET- signal. + + + + + Software reset + + + This is achieved by turning CONTROL SRST bit on for at + least 5us. Both PATA and SATA support it but, in case of + SATA, this may require controller-specific support as the + second Register FIS to clear SRST should be transmitted + while BSY bit is still set. Note that on PATA, this resets + both master and slave devices on a channel. + + + + + EXECUTE DEVICE DIAGNOSTIC command + + + Although ATA/ATAPI standard doesn't describe exactly, EDD + implies some level of resetting, possibly similar level + with software reset. Host-side EDD protocol can be handled + with normal command processing and most SATA controllers + should be able to handle EDD's just like other commands. + As in software reset, EDD affects both devices on a PATA + bus. + + + Although EDD does reset devices, this doesn't suit error + handling as EDD cannot be issued while BSY is set and it's + unclear how it will act when device is in unknown/weird + state. + + + + + ATAPI DEVICE RESET command + + + This is very similar to software reset except that reset + can be restricted to the selected device without affecting + the other device sharing the cable. + + + + + SATA phy reset + + + This is the preferred way of resetting a SATA device. In + effect, it's identical to PATA hardware reset. Note that + this can be done with the standard SCR Control register. + As such, it's usually easier to implement than software + reset. + + + + + + + + One more thing to consider when resetting devices is that + resetting clears certain configuration parameters and they + need to be set to their previous or newly adjusted values + after reset. + + + + Parameters affected are. + + + + + + + CHS set up with INITIALIZE DEVICE PARAMETERS (seldomly used) + + + + + + Parameters set with SET FEATURES including transfer mode setting + + + + + + Block count set with SET MULTIPLE MODE + + + + + + Other parameters (SET MAX, MEDIA LOCK...) + + + + + + + ATA/ATAPI standard specifies that some parameters must be + maintained across hardware or software reset, but doesn't + strictly specify all of them. Always reconfiguring needed + parameters after reset is required for robustness. Note that + this also applies when resuming from deep sleep (power-off). + + + + Also, ATA/ATAPI standard requires that IDENTIFY DEVICE / + IDENTIFY PACKET DEVICE is issued after any configuration + parameter is updated or a hardware reset and the result used + for further operation. OS driver is required to implement + revalidation mechanism to support this. + + + + + + Reconfigure transport + + + For both PATA and SATA, a lot of corners are cut for cheap + connectors, cables or controllers and it's quite common to see + high transmission error rate. This can be mitigated by + lowering transmission speed. + + + + The following is a possible scheme Jeff Garzik suggested. + + +
+ + If more than $N (3?) transmission errors happen in 15 minutes, + + + + + if SATA, decrease SATA PHY speed. if speed cannot be decreased, + + + + + decrease UDMA xfer speed. if at UDMA0, switch to PIO4, + + + + + decrease PIO xfer speed. if at PIO3, complain, but continue + + + +
+ +
+ +
+ +
+ ata_piix Internals !Idrivers/scsi/ata_piix.c -- cgit v1.2.3-70-g09d2 From 4cac018ae30dac8f7fc089a8ff4963904c99725e Mon Sep 17 00:00:00 2001 From: "John W. Linville" Date: Tue, 18 Oct 2005 21:30:59 -0400 Subject: [PATCH] bonding: fix typos in bonding documentation Fix some simple typos in the bonding.txt file. The typos are in areas relating to loading the bonding driver multiple times. Signed-off-by: John W. Linville Signed-off-by: Jeff Garzik --- Documentation/networking/bonding.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index a55f0f95b17..b0fe41da007 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -777,7 +777,7 @@ doing so is the same as described in the "Configuring Multiple Bonds Manually" section, below. NOTE: It has been observed that some Red Hat supplied kernels -are apparently unable to rename modules at load time (the "-obonding1" +are apparently unable to rename modules at load time (the "-o bond1" part). Attempts to pass that option to modprobe will produce an "Operation not permitted" error. This has been reported on some Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels @@ -883,7 +883,8 @@ the above does not work, and the second bonding instance never sees its options. In that case, the second options line can be substituted as follows: -install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50 +install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ + mode=balance-alb miimon=50 This may be repeated any number of times, specifying a new and unique name in place of bond1 for each subsequent instance. -- cgit v1.2.3-70-g09d2 From 4c9f7836406f41ef9da6ee68d7a0448fdb97b5ef Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 20 Oct 2005 16:47:40 +0200 Subject: [PATCH] 05/05 update biodoc to match new generic dispatch api Updates biodoc to reflect changes in elevator API Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe --- Documentation/block/biodoc.txt | 113 +++++++++++++++++++---------------------- 1 file changed, 52 insertions(+), 61 deletions(-) (limited to 'Documentation') diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 6dd274d7e1c..2d65c218216 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -906,9 +906,20 @@ Aside: 4. The I/O scheduler -I/O schedulers are now per queue. They should be runtime switchable and modular -but aren't yet. Jens has most bits to do this, but the sysfs implementation is -missing. +I/O scheduler, a.k.a. elevator, is implemented in two layers. Generic dispatch +queue and specific I/O schedulers. Unless stated otherwise, elevator is used +to refer to both parts and I/O scheduler to specific I/O schedulers. + +Block layer implements generic dispatch queue in ll_rw_blk.c and elevator.c. +The generic dispatch queue is responsible for properly ordering barrier +requests, requeueing, handling non-fs requests and all other subtleties. + +Specific I/O schedulers are responsible for ordering normal filesystem +requests. They can also choose to delay certain requests to improve +throughput or whatever purpose. As the plural form indicates, there are +multiple I/O schedulers. They can be built as modules but at least one should +be built inside the kernel. Each queue can choose different one and can also +change to another one dynamically. A block layer call to the i/o scheduler follows the convention elv_xxx(). This calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh, @@ -921,44 +932,36 @@ keeping work. The functions an elevator may implement are: (* are mandatory) elevator_merge_fn called to query requests for merge with a bio -elevator_merge_req_fn " " " with another request +elevator_merge_req_fn called when two requests get merged. the one + which gets merged into the other one will be + never seen by I/O scheduler again. IOW, after + being merged, the request is gone. elevator_merged_fn called when a request in the scheduler has been involved in a merge. It is used in the deadline scheduler for example, to reposition the request if its sorting order has changed. -*elevator_next_req_fn returns the next scheduled request, or NULL - if there are none (or none are ready). +elevator_dispatch_fn fills the dispatch queue with ready requests. + I/O schedulers are free to postpone requests by + not filling the dispatch queue unless @force + is non-zero. Once dispatched, I/O schedulers + are not allowed to manipulate the requests - + they belong to generic dispatch queue. -*elevator_add_req_fn called to add a new request into the scheduler +elevator_add_req_fn called to add a new request into the scheduler elevator_queue_empty_fn returns true if the merge queue is empty. Drivers shouldn't use this, but rather check if elv_next_request is NULL (without losing the request if one exists!) -elevator_remove_req_fn This is called when a driver claims ownership of - the target request - it now belongs to the - driver. It must not be modified or merged. - Drivers must not lose the request! A subsequent - call of elevator_next_req_fn must return the - _next_ request. - -elevator_requeue_req_fn called to add a request to the scheduler. This - is used when the request has alrnadebeen - returned by elv_next_request, but hasn't - completed. If this is not implemented then - elevator_add_req_fn is called instead. - elevator_former_req_fn elevator_latter_req_fn These return the request before or after the one specified in disk sort order. Used by the block layer to find merge possibilities. -elevator_completed_req_fn called when a request is completed. This might - come about due to being merged with another or - when the device completes the request. +elevator_completed_req_fn called when a request is completed. elevator_may_queue_fn returns true if the scheduler wants to allow the current context to queue a new request even if @@ -967,13 +970,33 @@ elevator_may_queue_fn returns true if the scheduler wants to allow the elevator_set_req_fn elevator_put_req_fn Must be used to allocate and free any elevator - specific storate for a request. + specific storage for a request. + +elevator_activate_req_fn Called when device driver first sees a request. + I/O schedulers can use this callback to + determine when actual execution of a request + starts. +elevator_deactivate_req_fn Called when device driver decides to delay + a request by requeueing it. elevator_init_fn elevator_exit_fn Allocate and free any elevator specific storage for a queue. -4.2 I/O scheduler implementation +4.2 Request flows seen by I/O schedulers +All requests seens by I/O schedulers strictly follow one of the following three +flows. + + set_req_fn -> + + i. add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn -> + (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn + ii. add_req_fn -> (merged_fn ->)* -> merge_req_fn + iii. [none] + + -> put_req_fn + +4.3 I/O scheduler implementation The generic i/o scheduler algorithm attempts to sort/merge/batch requests for optimal disk scan and request servicing performance (based on generic principles and device capabilities), optimized for: @@ -993,18 +1016,7 @@ request in sort order to prevent binary tree lookups. This arrangement is not a generic block layer characteristic however, so elevators may implement queues as they please. -ii. Last merge hint -The last merge hint is part of the generic queue layer. I/O schedulers must do -some management on it. For the most part, the most important thing is to make -sure q->last_merge is cleared (set to NULL) when the request on it is no longer -a candidate for merging (for example if it has been sent to the driver). - -The last merge performed is cached as a hint for the subsequent request. If -sequential data is being submitted, the hint is used to perform merges without -any scanning. This is not sufficient when there are multiple processes doing -I/O though, so a "merge hash" is used by some schedulers. - -iii. Merge hash +ii. Merge hash AS and deadline use a hash table indexed by the last sector of a request. This enables merging code to quickly look up "back merge" candidates, even when multiple I/O streams are being performed at once on one disk. @@ -1013,29 +1025,8 @@ multiple I/O streams are being performed at once on one disk. are far less common than "back merges" due to the nature of most I/O patterns. Front merges are handled by the binary trees in AS and deadline schedulers. -iv. Handling barrier cases -A request with flags REQ_HARDBARRIER or REQ_SOFTBARRIER must not be ordered -around. That is, they must be processed after all older requests, and before -any newer ones. This includes merges! - -In AS and deadline schedulers, barriers have the effect of flushing the reorder -queue. The performance cost of this will vary from nothing to a lot depending -on i/o patterns and device characteristics. Obviously they won't improve -performance, so their use should be kept to a minimum. - -v. Handling insertion position directives -A request may be inserted with a position directive. The directives are one of -ELEVATOR_INSERT_BACK, ELEVATOR_INSERT_FRONT, ELEVATOR_INSERT_SORT. - -ELEVATOR_INSERT_SORT is a general directive for non-barrier requests. -ELEVATOR_INSERT_BACK is used to insert a barrier to the back of the queue. -ELEVATOR_INSERT_FRONT is used to insert a barrier to the front of the queue, and -overrides the ordering requested by any previous barriers. In practice this is -harmless and required, because it is used for SCSI requeueing. This does not -require flushing the reorder queue, so does not impose a performance penalty. - -vi. Plugging the queue to batch requests in anticipation of opportunities for - merge/sort optimizations +iii. Plugging the queue to batch requests in anticipation of opportunities for + merge/sort optimizations This is just the same as in 2.4 so far, though per-device unplugging support is anticipated for 2.5. Also with a priority-based i/o scheduler, @@ -1069,7 +1060,7 @@ Aside: blk_kick_queue() to unplug a specific queue (right away ?) or optionally, all queues, is in the plan. -4.3 I/O contexts +4.4 I/O contexts I/O contexts provide a dynamically allocated per process data area. They may be used in I/O schedulers, and in the block layer (could be used for IO statis, priorities for example). See *io_context in drivers/block/ll_rw_blk.c, and -- cgit v1.2.3-70-g09d2