Linux-2.6.12-rc2v2.6.12-rc2

Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
author: Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-16 15:20:36 -0700
committer: Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-16 15:20:36 -0700
commit: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (patch)
tree: 0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/fujitsu/frv/atomic-ops.txt
1 files changed, 134 insertions, 0 deletions
diff --git a/Documentation/fujitsu/frv/atomic-ops.txt b/Documentation/fujitsu/frv/atomic-ops.txt
new file mode 100644
index 00000000000..96638e9b9fe
--- /dev/null
+++ b/Documentation/fujitsu/frv/atomic-ops.txt
@@ -0,0 +1,134 @@
+			       =====================================
+			       FUJITSU FR-V KERNEL ATOMIC OPERATIONS
+			       =====================================
+
+On the FR-V CPUs, there is only one atomic Read-Modify-Write operation: the SWAP/SWAPI
+instruction. Unfortunately, this alone can't be used to implement the following operations:
+
+ (*) Atomic add to memory
+
+ (*) Atomic subtract from memory
+
+ (*) Atomic bit modification (set, clear or invert)
+
+ (*) Atomic compare and exchange
+
+On such CPUs, the standard way of emulating such operations in uniprocessor mode is to disable
+interrupts, but on the FR-V CPUs, modifying the PSR takes a lot of clock cycles, and it has to be
+done twice. This means the CPU runs for a relatively long time with interrupts disabled,
+potentially having a great effect on interrupt latency.
+
+
+=============
+NEW ALGORITHM
+=============
+
+To get around this, the following algorithm has been implemented. It operates in a way similar to
+the LL/SC instruction pairs supported on a number of platforms.
+
+ (*) The CCCR.CC3 register is reserved within the kernel to act as an atomic modify abort flag.
+
+ (*) In the exception prologues run on kernel->kernel entry, CCCR.CC3 is set to 0 (Undefined
+     state).
+
+ (*) All atomic operations can then be broken down into the following algorithm:
+
+     (1) Set ICC3.Z to true and set CC3 to True (ORCC/CKEQ/ORCR).
+
+     (2) Load the value currently in the memory to be modified into a register.
+
+     (3) Make changes to the value.
+
+     (4) If CC3 is still True, simultaneously and atomically (by VLIW packing):
+
+	 (a) Store the modified value back to memory.
+
+	 (b) Set ICC3.Z to false (CORCC on GR29 is sufficient for this - GR29 holds the current
+	     task pointer in the kernel, and so is guaranteed to be non-zero).
+
+     (5) If ICC3.Z is still true, go back to step (1).
+
+This works in a non-SMP environment because any interrupt or other exception that happens between
+steps (1) and (4) will set CC3 to the Undefined, thus aborting the store in (4a), and causing the
+condition in ICC3 to remain with the Z flag set, thus causing step (5) to loop back to step (1).
+
+
+This algorithm suffers from two problems:
+
+ (1) The condition CCCR.CC3 is cleared unconditionally by an exception, irrespective of whether or
+     not any changes were made to the target memory location during that exception.
+
+ (2) The branch from step (5) back to step (1) may have to happen more than once until the store
+     manages to take place. In theory, this loop could cycle forever because there are too many
+     interrupts coming in, but it's unlikely.
+
+
+=======
+EXAMPLE
+=======
+
+Taking an example from include/asm-frv/atomic.h:
+
+	static inline int atomic_add_return(int i, atomic_t *v)
+	{
+		unsigned long val;
+
+		asm("0:						\n"
+
+It starts by setting ICC3.Z to true for later use, and also transforming that into CC3 being in the
+True state.
+
+		    "	orcc		gr0,gr0,gr0,icc3	\n"	<-- (1)
+		    "	ckeq		icc3,cc7		\n"	<-- (1)
+
+Then it does the load. Note that the final phase of step (1) is done at the same time as the
+load. The VLIW packing ensures they are done simultaneously. The ".p" on the load must not be
+removed without swapping the order of these two instructions.
+
+		    "	ld.p		%M0,%1			\n"	<-- (2)
+		    "	orcr		cc7,cc7,cc3		\n"	<-- (1)
+
+Then the proposed modification is generated. Note that the old value can be retained if required
+(such as in test_and_set_bit()).
+
+		    "	add%I2		%1,%2,%1		\n"	<-- (3)
+
+Then it attempts to store the value back, contingent on no exception having cleared CC3 since it
+was set to True.
+
+		    "	cst.p		%1,%M0		,cc3,#1	\n"	<-- (4a)
+
+It simultaneously records the success or failure of the store in ICC3.Z.
+
+		    "	corcc		gr29,gr29,gr0	,cc3,#1	\n"	<-- (4b)
+
+Such that the branch can then be taken if the operation was aborted.
+
+		    "	beq		icc3,#0,0b		\n"	<-- (5)
+		    : "+U"(v->counter), "=&r"(val)
+		    : "NPr"(i)
+		    : "memory", "cc7", "cc3", "icc3"
+		    );
+
+		return val;
+	}
+
+
+=============
+CONFIGURATION
+=============
+
+The atomic ops implementation can be made inline or out-of-line by changing the
+CONFIG_FRV_OUTOFLINE_ATOMIC_OPS configuration variable. Making it out-of-line has a number of
+advantages:
+
+ - The resulting kernel image may be smaller
+ - Debugging is easier as atomic ops can just be stepped over and they can be breakpointed
+
+Keeping it inline also has a number of advantages:
+
+ - The resulting kernel may be Faster
+   - no out-of-line function calls need to be made
+   - the compiler doesn't have half its registers clobbered by making a call
+
+The out-of-line implementations live in arch/frv/lib/atomic-ops.S.
author	Linus Torvalds <torvalds@ppc970.osdl.org>	2005-04-16 15:20:36 -0700
committer	Linus Torvalds <torvalds@ppc970.osdl.org>	2005-04-16 15:20:36 -0700
commit	1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (patch)
tree	0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/fujitsu/frv/atomic-ops.txt