summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/arm/OMAP/omap_pm129
-rw-r--r--Documentation/cpu-freq/user-guide.txt9
-rw-r--r--Documentation/dontdiff1
-rw-r--r--Documentation/feature-removal-schedule.txt10
-rw-r--r--Documentation/hwmon/pcf859128
-rw-r--r--Documentation/hwmon/tmp42136
-rw-r--r--Documentation/hwmon/wm831x37
-rw-r--r--Documentation/hwmon/wm835026
-rw-r--r--Documentation/kernel-doc-nano-HOWTO.txt4
-rw-r--r--Documentation/kernel-parameters.txt6
-rw-r--r--Documentation/kref.txt1
-rw-r--r--Documentation/trace/events.txt184
-rw-r--r--Documentation/trace/ftrace-design.txt233
-rw-r--r--Documentation/trace/ftrace.txt6
14 files changed, 680 insertions, 30 deletions
diff --git a/Documentation/arm/OMAP/omap_pm b/Documentation/arm/OMAP/omap_pm
new file mode 100644
index 00000000000..5389440aade
--- /dev/null
+++ b/Documentation/arm/OMAP/omap_pm
@@ -0,0 +1,129 @@
+
+The OMAP PM interface
+=====================
+
+This document describes the temporary OMAP PM interface. Driver
+authors use these functions to communicate minimum latency or
+throughput constraints to the kernel power management code.
+Over time, the intention is to merge features from the OMAP PM
+interface into the Linux PM QoS code.
+
+Drivers need to express PM parameters which:
+
+- support the range of power management parameters present in the TI SRF;
+
+- separate the drivers from the underlying PM parameter
+ implementation, whether it is the TI SRF or Linux PM QoS or Linux
+ latency framework or something else;
+
+- specify PM parameters in terms of fundamental units, such as
+ latency and throughput, rather than units which are specific to OMAP
+ or to particular OMAP variants;
+
+- allow drivers which are shared with other architectures (e.g.,
+ DaVinci) to add these constraints in a way which won't affect non-OMAP
+ systems,
+
+- can be implemented immediately with minimal disruption of other
+ architectures.
+
+
+This document proposes the OMAP PM interface, including the following
+five power management functions for driver code:
+
+1. Set the maximum MPU wakeup latency:
+ (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t)
+
+2. Set the maximum device wakeup latency:
+ (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t)
+
+3. Set the maximum system DMA transfer start latency (CORE pwrdm):
+ (*pdata->set_max_sdma_lat)(struct device *dev, long t)
+
+4. Set the minimum bus throughput needed by a device:
+ (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r)
+
+5. Return the number of times the device has lost context
+ (*pdata->get_dev_context_loss_count)(struct device *dev)
+
+
+Further documentation for all OMAP PM interface functions can be
+found in arch/arm/plat-omap/include/mach/omap-pm.h.
+
+
+The OMAP PM layer is intended to be temporary
+---------------------------------------------
+
+The intention is that eventually the Linux PM QoS layer should support
+the range of power management features present in OMAP3. As this
+happens, existing drivers using the OMAP PM interface can be modified
+to use the Linux PM QoS code; and the OMAP PM interface can disappear.
+
+
+Driver usage of the OMAP PM functions
+-------------------------------------
+
+As the 'pdata' in the above examples indicates, these functions are
+exposed to drivers through function pointers in driver .platform_data
+structures. The function pointers are initialized by the board-*.c
+files to point to the corresponding OMAP PM functions:
+.set_max_dev_wakeup_lat will point to
+omap_pm_set_max_dev_wakeup_lat(), etc. Other architectures which do
+not support these functions should leave these function pointers set
+to NULL. Drivers should use the following idiom:
+
+ if (pdata->set_max_dev_wakeup_lat)
+ (*pdata->set_max_dev_wakeup_lat)(dev, t);
+
+The most common usage of these functions will probably be to specify
+the maximum time from when an interrupt occurs, to when the device
+becomes accessible. To accomplish this, driver writers should use the
+set_max_mpu_wakeup_lat() function to to constrain the MPU wakeup
+latency, and the set_max_dev_wakeup_lat() function to constrain the
+device wakeup latency (from clk_enable() to accessibility). For
+example,
+
+ /* Limit MPU wakeup latency */
+ if (pdata->set_max_mpu_wakeup_lat)
+ (*pdata->set_max_mpu_wakeup_lat)(dev, tc);
+
+ /* Limit device powerdomain wakeup latency */
+ if (pdata->set_max_dev_wakeup_lat)
+ (*pdata->set_max_dev_wakeup_lat)(dev, td);
+
+ /* total wakeup latency in this example: (tc + td) */
+
+The PM parameters can be overwritten by calling the function again
+with the new value. The settings can be removed by calling the
+function with a t argument of -1 (except in the case of
+set_max_bus_tput(), which should be called with an r argument of 0).
+
+The fifth function above, omap_pm_get_dev_context_loss_count(),
+is intended as an optimization to allow drivers to determine whether the
+device has lost its internal context. If context has been lost, the
+driver must restore its internal context before proceeding.
+
+
+Other specialized interface functions
+-------------------------------------
+
+The five functions listed above are intended to be usable by any
+device driver. DSPBridge and CPUFreq have a few special requirements.
+DSPBridge expresses target DSP performance levels in terms of OPP IDs.
+CPUFreq expresses target MPU performance levels in terms of MPU
+frequency. The OMAP PM interface contains functions for these
+specialized cases to convert that input information (OPPs/MPU
+frequency) into the form that the underlying power management
+implementation needs:
+
+6. (*pdata->dsp_get_opp_table)(void)
+
+7. (*pdata->dsp_set_min_opp)(u8 opp_id)
+
+8. (*pdata->dsp_get_opp)(void)
+
+9. (*pdata->cpu_get_freq_table)(void)
+
+10. (*pdata->cpu_set_freq)(unsigned long f)
+
+11. (*pdata->cpu_get_freq)(void)
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
index 5d5f5fadd1c..2a5b850847c 100644
--- a/Documentation/cpu-freq/user-guide.txt
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -176,7 +176,9 @@ scaling_governor, and by "echoing" the name of another
work on some specific architectures or
processors.
-cpuinfo_cur_freq : Current speed of the CPU, in KHz.
+cpuinfo_cur_freq : Current frequency of the CPU as obtained from
+ the hardware, in KHz. This is the frequency
+ the CPU actually runs at.
scaling_available_frequencies : List of available frequencies, in KHz.
@@ -196,7 +198,10 @@ related_cpus : List of CPUs that need some sort of frequency
scaling_driver : Hardware driver for cpufreq.
-scaling_cur_freq : Current frequency of the CPU, in KHz.
+scaling_cur_freq : Current frequency of the CPU as determined by
+ the governor and cpufreq core, in KHz. This is
+ the frequency the kernel thinks the CPU runs
+ at.
If you have selected the "userspace" governor which allows you to
set the CPU operating frequency to a specific value, you can read out
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 88519daab6e..e1efc400bed 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -152,7 +152,6 @@ piggy.gz
piggyback
pnmtologo
ppc_defs.h*
-promcon_tbl.c
pss_boot.h
qconf
raid6altivec*.c
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 503d21216d5..fa75220f8d3 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -428,16 +428,6 @@ Who: Johannes Berg <johannes@sipsolutions.net>
----------------------------
-What: CONFIG_X86_OLD_MCE
-When: 2.6.32
-Why: Remove the old legacy 32bit machine check code. This has been
- superseded by the newer machine check code from the 64bit port,
- but the old version has been kept around for easier testing. Note this
- doesn't impact the old P5 and WinChip machine check handlers.
-Who: Andi Kleen <andi@firstfloor.org>
-
-----------------------------
-
What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
exported interface anymore.
When: 2.6.33
diff --git a/Documentation/hwmon/pcf8591 b/Documentation/hwmon/pcf8591
index 5628fcf4207..e76a7892f68 100644
--- a/Documentation/hwmon/pcf8591
+++ b/Documentation/hwmon/pcf8591
@@ -2,11 +2,11 @@ Kernel driver pcf8591
=====================
Supported chips:
- * Philips PCF8591
+ * Philips/NXP PCF8591
Prefix: 'pcf8591'
Addresses scanned: I2C 0x48 - 0x4f
- Datasheet: Publicly available at the Philips Semiconductor website
- http://www.semiconductors.philips.com/pip/PCF8591P.html
+ Datasheet: Publicly available at the NXP website
+ http://www.nxp.com/pip/PCF8591_6.html
Authors:
Aurelien Jarno <aurelien@aurel32.net>
@@ -16,9 +16,10 @@ Authors:
Description
-----------
+
The PCF8591 is an 8-bit A/D and D/A converter (4 analog inputs and one
-analog output) for the I2C bus produced by Philips Semiconductors. It
-is designed to provide a byte I2C interface to up to 4 separate devices.
+analog output) for the I2C bus produced by Philips Semiconductors (now NXP).
+It is designed to provide a byte I2C interface to up to 4 separate devices.
The PCF8591 has 4 analog inputs programmable as single-ended or
differential inputs :
@@ -58,8 +59,8 @@ Accessing PCF8591 via /sys interface
-------------------------------------
! Be careful !
-The PCF8591 is plainly impossible to detect ! Stupid chip.
-So every chip with address in the interval [48..4f] is
+The PCF8591 is plainly impossible to detect! Stupid chip.
+So every chip with address in the interval [0x48..0x4f] is
detected as PCF8591. If you have other chips in this address
range, the workaround is to load this module after the one
for your others chips.
@@ -67,19 +68,20 @@ for your others chips.
On detection (i.e. insmod, modprobe et al.), directories are being
created for each detected PCF8591:
-/sys/bus/devices/<0>-<1>/
+/sys/bus/i2c/devices/<0>-<1>/
where <0> is the bus the chip was detected on (e. g. i2c-0)
and <1> the chip address ([48..4f])
Inside these directories, there are such files:
-in0, in1, in2, in3, out0_enable, out0_output, name
+in0_input, in1_input, in2_input, in3_input, out0_enable, out0_output, name
Name contains chip name.
-The in0, in1, in2 and in3 files are RO. Reading gives the value of the
-corresponding channel. Depending on the current analog inputs configuration,
-files in2 and/or in3 do not exist. Values range are from 0 to 255 for single
-ended inputs and -128 to +127 for differential inputs (8-bit ADC).
+The in0_input, in1_input, in2_input and in3_input files are RO. Reading gives
+the value of the corresponding channel. Depending on the current analog inputs
+configuration, files in2_input and in3_input may not exist. Values range
+from 0 to 255 for single ended inputs and -128 to +127 for differential inputs
+(8-bit ADC).
The out0_enable file is RW. Reading gives "1" for analog output enabled and
"0" for analog output disabled. Writing accepts "0" and "1" accordingly.
diff --git a/Documentation/hwmon/tmp421 b/Documentation/hwmon/tmp421
new file mode 100644
index 00000000000..0cf07f82474
--- /dev/null
+++ b/Documentation/hwmon/tmp421
@@ -0,0 +1,36 @@
+Kernel driver tmp421
+====================
+
+Supported chips:
+ * Texas Instruments TMP421
+ Prefix: 'tmp421'
+ Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html
+ * Texas Instruments TMP422
+ Prefix: 'tmp422'
+ Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html
+ * Texas Instruments TMP423
+ Prefix: 'tmp423'
+ Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html
+
+Authors:
+ Andre Prendel <andre.prendel@gmx.de>
+
+Description
+-----------
+
+This driver implements support for Texas Instruments TMP421, TMP422
+and TMP423 temperature sensor chips. These chips implement one local
+and up to one (TMP421), up to two (TMP422) or up to three (TMP423)
+remote sensors. Temperature is measured in degrees Celsius. The chips
+are wired over I2C/SMBus and specified over a temperature range of -40
+to +125 degrees Celsius. Resolution for both the local and remote
+channels is 0.0625 degree C.
+
+The chips support only temperature measurement. The driver exports
+the temperature values via the following sysfs files:
+
+temp[1-4]_input
+temp[2-4]_fault
diff --git a/Documentation/hwmon/wm831x b/Documentation/hwmon/wm831x
new file mode 100644
index 00000000000..24f47d8f6a4
--- /dev/null
+++ b/Documentation/hwmon/wm831x
@@ -0,0 +1,37 @@
+Kernel driver wm831x-hwmon
+==========================
+
+Supported chips:
+ * Wolfson Microelectronics WM831x PMICs
+ Prefix: 'wm831x'
+ Datasheet:
+ http://www.wolfsonmicro.com/products/WM8310
+ http://www.wolfsonmicro.com/products/WM8311
+ http://www.wolfsonmicro.com/products/WM8312
+
+Authors: Mark Brown <broonie@opensource.wolfsonmicro.com>
+
+Description
+-----------
+
+The WM831x series of PMICs include an AUXADC which can be used to
+monitor a range of system operating parameters, including the voltages
+of the major supplies within the system. Currently the driver provides
+reporting of all the input values but does not provide any alarms.
+
+Voltage Monitoring
+------------------
+
+Voltages are sampled by a 12 bit ADC. Voltages in milivolts are 1.465
+times the ADC value.
+
+Temperature Monitoring
+----------------------
+
+Temperatures are sampled by a 12 bit ADC. Chip and battery temperatures
+are available. The chip temperature is calculated as:
+
+ Degrees celsius = (512.18 - data) / 1.0983
+
+while the battery temperature calculation will depend on the NTC
+thermistor component.
diff --git a/Documentation/hwmon/wm8350 b/Documentation/hwmon/wm8350
new file mode 100644
index 00000000000..98f923bd2e9
--- /dev/null
+++ b/Documentation/hwmon/wm8350
@@ -0,0 +1,26 @@
+Kernel driver wm8350-hwmon
+==========================
+
+Supported chips:
+ * Wolfson Microelectronics WM835x PMICs
+ Prefix: 'wm8350'
+ Datasheet:
+ http://www.wolfsonmicro.com/products/WM8350
+ http://www.wolfsonmicro.com/products/WM8351
+ http://www.wolfsonmicro.com/products/WM8352
+
+Authors: Mark Brown <broonie@opensource.wolfsonmicro.com>
+
+Description
+-----------
+
+The WM835x series of PMICs include an AUXADC which can be used to
+monitor a range of system operating parameters, including the voltages
+of the major supplies within the system. Currently the driver provides
+simple access to these major supplies.
+
+Voltage Monitoring
+------------------
+
+Voltages are sampled by a 12 bit ADC. For the internal supplies the ADC
+is referenced to the system VRTC.
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index 4d04572b654..348b9e5e28f 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -66,7 +66,9 @@ Example kernel-doc function comment:
* The longer description can have multiple paragraphs.
*/
-The first line, with the short description, must be on a single line.
+The short description following the subject can span multiple lines
+and ends with an @argument description, an empty line or the end of
+the comment block.
The @argument descriptions must begin on the very next line following
this opening short function description line, with no intervening
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4c12a290bee..0f17d16dc10 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1286,6 +1286,10 @@ and is between 256 and 4096 characters. It is defined in the file
(machvec) in a generic kernel.
Example: machvec=hpzx1_swiotlb
+ machtype= [Loongson] Share the same kernel image file between different
+ yeeloong laptop.
+ Example: machtype=lemote-yeeloong-2f-7inch
+
max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater
than or equal to this physical address is ignored.
@@ -1561,7 +1565,7 @@ and is between 256 and 4096 characters. It is defined in the file
of returning the full 64-bit number.
The default is to return 64-bit inode numbers.
- nmi_debug= [KNL,AVR32] Specify one or more actions to take
+ nmi_debug= [KNL,AVR32,SH] Specify one or more actions to take
when a NMI is triggered.
Format: [state][,regs][,debounce][,die]
diff --git a/Documentation/kref.txt b/Documentation/kref.txt
index 130b6e87aa7..ae203f91ee9 100644
--- a/Documentation/kref.txt
+++ b/Documentation/kref.txt
@@ -84,7 +84,6 @@ int my_data_handler(void)
task = kthread_run(more_data_handling, data, "more_data_handling");
if (task == ERR_PTR(-ENOMEM)) {
rv = -ENOMEM;
- kref_put(&data->refcount, data_release);
goto out;
}
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 90e8b3383ba..78c45a87be5 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -1,7 +1,7 @@
Event Tracing
Documentation written by Theodore Ts'o
- Updated by Li Zefan
+ Updated by Li Zefan and Tom Zanussi
1. Introduction
===============
@@ -97,3 +97,185 @@ The format of this boot option is the same as described in section 2.1.
See The example provided in samples/trace_events
+4. Event formats
+================
+
+Each trace event has a 'format' file associated with it that contains
+a description of each field in a logged event. This information can
+be used to parse the binary trace stream, and is also the place to
+find the field names that can be used in event filters (see section 5).
+
+It also displays the format string that will be used to print the
+event in text mode, along with the event name and ID used for
+profiling.
+
+Every event has a set of 'common' fields associated with it; these are
+the fields prefixed with 'common_'. The other fields vary between
+events and correspond to the fields defined in the TRACE_EVENT
+definition for that event.
+
+Each field in the format has the form:
+
+ field:field-type field-name; offset:N; size:N;
+
+where offset is the offset of the field in the trace record and size
+is the size of the data item, in bytes.
+
+For example, here's the information displayed for the 'sched_wakeup'
+event:
+
+# cat /debug/tracing/events/sched/sched_wakeup/format
+
+name: sched_wakeup
+ID: 60
+format:
+ field:unsigned short common_type; offset:0; size:2;
+ field:unsigned char common_flags; offset:2; size:1;
+ field:unsigned char common_preempt_count; offset:3; size:1;
+ field:int common_pid; offset:4; size:4;
+ field:int common_tgid; offset:8; size:4;
+
+ field:char comm[TASK_COMM_LEN]; offset:12; size:16;
+ field:pid_t pid; offset:28; size:4;
+ field:int prio; offset:32; size:4;
+ field:int success; offset:36; size:4;
+ field:int cpu; offset:40; size:4;
+
+print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
+ REC->prio, REC->success, REC->cpu
+
+This event contains 10 fields, the first 5 common and the remaining 5
+event-specific. All the fields for this event are numeric, except for
+'comm' which is a string, a distinction important for event filtering.
+
+5. Event filtering
+==================
+
+Trace events can be filtered in the kernel by associating boolean
+'filter expressions' with them. As soon as an event is logged into
+the trace buffer, its fields are checked against the filter expression
+associated with that event type. An event with field values that
+'match' the filter will appear in the trace output, and an event whose
+values don't match will be discarded. An event with no filter
+associated with it matches everything, and is the default when no
+filter has been set for an event.
+
+5.1 Expression syntax
+---------------------
+
+A filter expression consists of one or more 'predicates' that can be
+combined using the logical operators '&&' and '||'. A predicate is
+simply a clause that compares the value of a field contained within a
+logged event with a constant value and returns either 0 or 1 depending
+on whether the field value matched (1) or didn't match (0):
+
+ field-name relational-operator value
+
+Parentheses can be used to provide arbitrary logical groupings and
+double-quotes can be used to prevent the shell from interpreting
+operators as shell metacharacters.
+
+The field-names available for use in filters can be found in the
+'format' files for trace events (see section 4).
+
+The relational-operators depend on the type of the field being tested:
+
+The operators available for numeric fields are:
+
+==, !=, <, <=, >, >=
+
+And for string fields they are:
+
+==, !=
+
+Currently, only exact string matches are supported.
+
+Currently, the maximum number of predicates in a filter is 16.
+
+5.2 Setting filters
+-------------------
+
+A filter for an individual event is set by writing a filter expression
+to the 'filter' file for the given event.
+
+For example:
+
+# cd /debug/tracing/events/sched/sched_wakeup
+# echo "common_preempt_count > 4" > filter
+
+A slightly more involved example:
+
+# cd /debug/tracing/events/sched/sched_signal_send
+# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
+
+If there is an error in the expression, you'll get an 'Invalid
+argument' error when setting it, and the erroneous string along with
+an error message can be seen by looking at the filter e.g.:
+
+# cd /debug/tracing/events/sched/sched_signal_send
+# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
+-bash: echo: write error: Invalid argument
+# cat filter
+((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
+^
+parse_error: Field not found
+
+Currently the caret ('^') for an error always appears at the beginning of
+the filter string; the error message should still be useful though
+even without more accurate position info.
+
+5.3 Clearing filters
+--------------------
+
+To clear the filter for an event, write a '0' to the event's filter
+file.
+
+To clear the filters for all events in a subsystem, write a '0' to the
+subsystem's filter file.
+
+5.3 Subsystem filters
+---------------------
+
+For convenience, filters for every event in a subsystem can be set or
+cleared as a group by writing a filter expression into the filter file
+at the root of the subsytem. Note however, that if a filter for any
+event within the subsystem lacks a field specified in the subsystem
+filter, or if the filter can't be applied for any other reason, the
+filter for that event will retain its previous setting. This can
+result in an unintended mixture of filters which could lead to
+confusing (to the user who might think different filters are in
+effect) trace output. Only filters that reference just the common
+fields can be guaranteed to propagate successfully to all events.
+
+Here are a few subsystem filter examples that also illustrate the
+above points:
+
+Clear the filters on all events in the sched subsytem:
+
+# cd /sys/kernel/debug/tracing/events/sched
+# echo 0 > filter
+# cat sched_switch/filter
+none
+# cat sched_wakeup/filter
+none
+
+Set a filter using only common fields for all events in the sched
+subsytem (all events end up with the same filter):
+
+# cd /sys/kernel/debug/tracing/events/sched
+# echo common_pid == 0 > filter
+# cat sched_switch/filter
+common_pid == 0
+# cat sched_wakeup/filter
+common_pid == 0
+
+Attempt to set a filter using a non-common field for all events in the
+sched subsytem (all events but those that have a prev_pid field retain
+their old filters):
+
+# cd /sys/kernel/debug/tracing/events/sched
+# echo prev_pid == 0 > filter
+# cat sched_switch/filter
+prev_pid == 0
+# cat sched_wakeup/filter
+common_pid == 0
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
new file mode 100644
index 00000000000..7003e10f10f
--- /dev/null
+++ b/Documentation/trace/ftrace-design.txt
@@ -0,0 +1,233 @@
+ function tracer guts
+ ====================
+
+Introduction
+------------
+
+Here we will cover the architecture pieces that the common function tracing
+code relies on for proper functioning. Things are broken down into increasing
+complexity so that you can start simple and at least get basic functionality.
+
+Note that this focuses on architecture implementation details only. If you
+want more explanation of a feature in terms of common code, review the common
+ftrace.txt file.
+
+
+Prerequisites
+-------------
+
+Ftrace relies on these features being implemented:
+ STACKTRACE_SUPPORT - implement save_stack_trace()
+ TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h
+
+
+HAVE_FUNCTION_TRACER
+--------------------
+
+You will need to implement the mcount and the ftrace_stub functions.
+
+The exact mcount symbol name will depend on your toolchain. Some call it
+"mcount", "_mcount", or even "__mcount". You can probably figure it out by
+running something like:
+ $ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount
+ call mcount
+We'll make the assumption below that the symbol is "mcount" just to keep things
+nice and simple in the examples.
+
+Keep in mind that the ABI that is in effect inside of the mcount function is
+*highly* architecture/toolchain specific. We cannot help you in this regard,
+sorry. Dig up some old documentation and/or find someone more familiar than
+you to bang ideas off of. Typically, register usage (argument/scratch/etc...)
+is a major issue at this point, especially in relation to the location of the
+mcount call (before/after function prologue). You might also want to look at
+how glibc has implemented the mcount function for your architecture. It might
+be (semi-)relevant.
+
+The mcount function should check the function pointer ftrace_trace_function
+to see if it is set to ftrace_stub. If it is, there is nothing for you to do,
+so return immediately. If it isn't, then call that function in the same way
+the mcount function normally calls __mcount_internal -- the first argument is
+the "frompc" while the second argument is the "selfpc" (adjusted to remove the
+size of the mcount call that is embedded in the function).
+
+For example, if the function foo() calls bar(), when the bar() function calls
+mcount(), the arguments mcount() will pass to the tracer are:
+ "frompc" - the address bar() will use to return to foo()
+ "selfpc" - the address bar() (with _mcount() size adjustment)
+
+Also keep in mind that this mcount function will be called *a lot*, so
+optimizing for the default case of no tracer will help the smooth running of
+your system when tracing is disabled. So the start of the mcount function is
+typically the bare min with checking things before returning. That also means
+the code flow should usually kept linear (i.e. no branching in the nop case).
+This is of course an optimization and not a hard requirement.
+
+Here is some pseudo code that should help (these functions should actually be
+implemented in assembly):
+
+void ftrace_stub(void)
+{
+ return;
+}
+
+void mcount(void)
+{
+ /* save any bare state needed in order to do initial checking */
+
+ extern void (*ftrace_trace_function)(unsigned long, unsigned long);
+ if (ftrace_trace_function != ftrace_stub)
+ goto do_trace;
+
+ /* restore any bare state */
+
+ return;
+
+do_trace:
+
+ /* save all state needed by the ABI (see paragraph above) */
+
+ unsigned long frompc = ...;
+ unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
+ ftrace_trace_function(frompc, selfpc);
+
+ /* restore all state needed by the ABI */
+}
+
+Don't forget to export mcount for modules !
+extern void mcount(void);
+EXPORT_SYMBOL(mcount);
+
+
+HAVE_FUNCTION_TRACE_MCOUNT_TEST
+-------------------------------
+
+This is an optional optimization for the normal case when tracing is turned off
+in the system. If you do not enable this Kconfig option, the common ftrace
+code will take care of doing the checking for you.
+
+To support this feature, you only need to check the function_trace_stop
+variable in the mcount function. If it is non-zero, there is no tracing to be
+done at all, so you can return.
+
+This additional pseudo code would simply be:
+void mcount(void)
+{
+ /* save any bare state needed in order to do initial checking */
+
++ if (function_trace_stop)
++ return;
+
+ extern void (*ftrace_trace_function)(unsigned long, unsigned long);
+ if (ftrace_trace_function != ftrace_stub)
+...
+
+
+HAVE_FUNCTION_GRAPH_TRACER
+--------------------------
+
+Deep breath ... time to do some real work. Here you will need to update the
+mcount function to check ftrace graph function pointers, as well as implement
+some functions to save (hijack) and restore the return address.
+
+The mcount function should check the function pointers ftrace_graph_return
+(compare to ftrace_stub) and ftrace_graph_entry (compare to
+ftrace_graph_entry_stub). If either of those are not set to the relevant stub
+function, call the arch-specific function ftrace_graph_caller which in turn
+calls the arch-specific function prepare_ftrace_return. Neither of these
+function names are strictly required, but you should use them anyways to stay
+consistent across the architecture ports -- easier to compare & contrast
+things.
+
+The arguments to prepare_ftrace_return are slightly different than what are
+passed to ftrace_trace_function. The second argument "selfpc" is the same,
+but the first argument should be a pointer to the "frompc". Typically this is
+located on the stack. This allows the function to hijack the return address
+temporarily to have it point to the arch-specific function return_to_handler.
+That function will simply call the common ftrace_return_to_handler function and
+that will return the original return address with which, you can return to the
+original call site.
+
+Here is the updated mcount pseudo code:
+void mcount(void)
+{
+...
+ if (ftrace_trace_function != ftrace_stub)
+ goto do_trace;
+
++#ifdef CONFIG_FUNCTION_GRAPH_TRACER
++ extern void (*ftrace_graph_return)(...);
++ extern void (*ftrace_graph_entry)(...);
++ if (ftrace_graph_return != ftrace_stub ||
++ ftrace_graph_entry != ftrace_graph_entry_stub)
++ ftrace_graph_caller();
++#endif
+
+ /* restore any bare state */
+...
+
+Here is the pseudo code for the new ftrace_graph_caller assembly function:
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+void ftrace_graph_caller(void)
+{
+ /* save all state needed by the ABI */
+
+ unsigned long *frompc = &...;
+ unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
+ prepare_ftrace_return(frompc, selfpc);
+
+ /* restore all state needed by the ABI */
+}
+#endif
+
+For information on how to implement prepare_ftrace_return(), simply look at
+the x86 version. The only architecture-specific piece in it is the setup of
+the fault recovery table (the asm(...) code). The rest should be the same
+across architectures.
+
+Here is the pseudo code for the new return_to_handler assembly function. Note
+that the ABI that applies here is different from what applies to the mcount
+code. Since you are returning from a function (after the epilogue), you might
+be able to skimp on things saved/restored (usually just registers used to pass
+return values).
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+void return_to_handler(void)
+{
+ /* save all state needed by the ABI (see paragraph above) */
+
+ void (*original_return_point)(void) = ftrace_return_to_handler();
+
+ /* restore all state needed by the ABI */
+
+ /* this is usually either a return or a jump */
+ original_return_point();
+}
+#endif
+
+
+HAVE_FTRACE_NMI_ENTER
+---------------------
+
+If you can't trace NMI functions, then skip this option.
+
+<details to be filled>
+
+
+HAVE_FTRACE_SYSCALLS
+---------------------
+
+<details to be filled>
+
+
+HAVE_FTRACE_MCOUNT_RECORD
+-------------------------
+
+See scripts/recordmcount.pl for more info.
+
+<details to be filled>
+
+
+HAVE_DYNAMIC_FTRACE
+---------------------
+
+<details to be filled>
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 355d0f1f8c5..1b6292bbdd6 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -26,6 +26,12 @@ disabled, and more (ftrace allows for tracer plugins, which
means that the list of tracers can always grow).
+Implementation Details
+----------------------
+
+See ftrace-design.txt for details for arch porters and such.
+
+
The File System
---------------