summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2012-07-11crypto: atmel - add Atmel AES driverNicolas Royer
Signed-off-by: Nicolas Royer <nicolas@eukrea.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Eric Bénard <eric@eukrea.com> Tested-by: Eric Bénard <eric@eukrea.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11ARM: AT91SAM9G45: add crypto peripheralsNicolas Royer
Signed-off-by: Nicolas Royer <nicolas@eukrea.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Eric Bénard <eric@eukrea.com> Tested-by: Eric Bénard <eric@eukrea.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11crypto: testmgr - allow aesni-intel and ghash_clmulni-intel in fips modeMilan Broz
Patch 863b557a88f8c033f7419fabafef4712a5055f85 added NULL entries for intel accelerated drivers but did not marked these fips allowed. This cause panic if running tests with fips=1. For ghash, fips_allowed flag was added in patch 18c0ebd2d8194cce4b3f67e2903fa01bea892cbc. Without patch, "modprobe tcrypt" fails with alg: skcipher: Failed to load transform for cbc-aes-aesni: -2 cbc-aes-aesni: cbc(aes) alg self test failed in fips mode! (panic) Also add missing cryptd(__driver-cbc-aes-aesni) and cryptd(__driver-gcm-aes-aesni) test to complement null tests above, otherwise system complains with alg: No test for __cbc-aes-aesni (cryptd(__driver-cbc-aes-aesni)) alg: No test for __gcm-aes-aesni (cryptd(__driver-gcm-aes-aesni)) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Paul Wouters <pwouters@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11hwrng: exynos - Add support for Exynos random number generatorJonghwa Lee
This patch supports Exynos SOC's PRNG driver. Exynos's PRNG has 5 seeds and 5 random number outputs. Module is excuted under runtime power management control, so it activates only while it's in use. Otherwise it will be suspended generally. It was tested on PQ board by rngtest program. Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11crypto: aesni-intel - fix wrong kfree pointerMilan Broz
kfree(new_key_mem) in rfc4106_set_key() should be called on malloced pointer, not on aligned one, otherwise it can cause invalid pointer on free. (Seen at least once when running tcrypt tests with debug kernel.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11crypto: caam - ERA retrieval and printing for SEC deviceAlex Porosanu
This patch adds support for retrieving and printing of SEC ERA information. It is useful for knowing beforehand what features exist from the SEC point of view on a certain SoC. Only era-s 1 to 4 are currently supported; other eras will appear as unknown. Signed-off-by: Alex Porosanu <alexandru.porosanu@freescale.com> - rebased onto current cryptodev master - made caam_eras static Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11crypto: caam - Using alloc_coherent for caam job ringsBharat Bhushan
The caam job rings (input/output job ring) are allocated using dma_map_single(). These job rings can be visualized as the ring buffers in which the jobs are en-queued/de-queued. The s/w enqueues the jobs in input job ring which h/w dequeues and after processing it copies the jobs in output job ring. Software then de-queues the job from output ring. Using dma_map/unmap_single() is not preferred way to allocate memory for this type of requirements because this adds un-necessary complexity. Example, if bounce buffer (SWIOTLB) will get used then to make any change visible in this memory to other processing unit requires dmap_unmap_single() or dma_sync_single_for_cpu/device(). The dma_unmap_single() can not be used as this will free the bounce buffer, this will require changing the job rings on running system and I seriously doubt that it will be not possible or very complex to implement. Also using dma_sync_single_for_cpu/device() will also add unnecessary complexity. The simple and preferred way is using dma_alloc_coherent() for these type of memory requirements. This resolves the Linux boot crash issue when "swiotlb=force" is set in bootargs on systems which have memory more than 4G. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Acked-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: algapi - Fix hang on crypto allocationSteffen Klassert
git commit 398710379 (crypto: algapi - Move larval completion into algboss) replaced accidentally a call to complete_all() by a call to complete(). This causes a hang on crypto allocation if we have more than one larval waiter. This pach restores the call to complete_all(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: arc4 - now arc needs blockcipher supportSebastian Andrzej Siewior
Since commit ce6dd368 ("crypto: arc4 - improve performance by adding ecb(arc4)) we need to pull in a blkcipher. |ERROR: "crypto_blkcipher_type" [crypto/arc4.ko] undefined! |ERROR: "blkcipher_walk_done" [crypto/arc4.ko] undefined! |ERROR: "blkcipher_walk_virt" [crypto/arc4.ko] undefined! Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - one tasklet per job ringKim Phillips
there is no noticeable benefit for multiple cores to process one job ring's output ring: in fact, we can benefit from cache effects of having the back-half stay on the core that receives a particular ring's interrupts, and further relax general contention and the locking involved with reading outring_used, since tasklets run atomically. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - consolidate memory barriers from job ring en/dequeueKim Phillips
Memory barriers are implied by the i/o register write implementation (at least on Power). So we can remove the redundant wmb() in caam_jr_enqueue, and, in dequeue(), hoist the h/w done notification write up to before we need to increment the head of the ring, and save an smp_mb. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - only query h/w in job ring dequeue pathKim Phillips
Code was needlessly checking the s/w job ring when there would be nothing to process if the h/w's output completion ring were empty anyway. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - use non-irq versions of spinlocks for job ringsKim Phillips
The enqueue lock isn't used in any interrupt context, and the dequeue lock isn't used in the h/w interrupt context, only in bh context. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - disable IRQ coalescing by defaultKim Phillips
It has been observed that in zero-loss benchmarks, when a slow traffic rate is being tested, the IRQ timer coalescing parameter was set too high, and the ethernet controller would start dropping packets because the job ring back half wouldn't be executed in time before the ethernet controller would fill its buffers, thereby significantly reducing the zero-loss performance figures. Empirical testing has shown that the best zero-loss performance is achieved when IRQ coalescing is set to minimum values and/or turned off, since apparently the job ring driver already implements an adequately-performing general-purpose IRQ mitigation strategy in software. Whilst we could go with minimal count (2-8) and timing settings (192-256), we prefer just turning h/w coalescing altogether off to minimize setkey latency (due to split key generation), and for consistent cross-SoC performance (the SEC vs. core clock ratio changes). Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - add support for SEC v5.x RNG4Kim Phillips
The SEC v4.x' RNGB h/w block self-initialized. RNG4, available on SEC versions 5 and beyond, is based on a different standard that requires manual initialization. Also update any new errors From the SEC v5.2 reference manual: The SEC v5.2's RNG4 unit reuses some error IDs, thus the addition of rng_err_id_list over the CHA-independent err_id_list. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - assign 40-bit masks on SEC v5.0 and aboveKim Phillips
SEC v4.x were only 36-bit, SEC v5+ are 40-bit capable. Also set a DMA mask for any job ring devices created. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - hwrng supportYuan Kang
caam_read copies random bytes from two buffers into output. caam rng can fill empty buffer 0xffff bytes at a time, but the buffer sizes are rounded down to multiple of cacheline size. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - chaining supportYuan Kang
support chained scatterlists for aead, ablkcipher and ahash. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> - fix dma unmap leak - un-unlikely src == dst, due to experience with AF_ALG Signed-off-by: Kudupudi Ugendreshwar <B38865@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - unkeyed ahash supportYuan Kang
caam supports and registers unkeyed sha algorithms and md5. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - ahash hmac supportYuan Kang
caam supports ahash hmac with sha algorithms and md5. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - link_tbl renameYuan Kang
- rename scatterlist and link_tbl functions - link_tbl changed to sec4_sg - sg_to_link_tbl_one changed to dma_to_sec4_sg_one, since no scatterlist is use Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - refactor key_gen, sgYuan Kang
create separate files for split key generation and scatterlist functions. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - remove jr register/deregisterYuan Kang
remove caam_jr_register and caam_jr_deregister to allow sharing of job rings. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - support external seq in/out lengthsYuan Kang
functions for external storage of seq in/out lengths, i.e., for 32-bit lengths. These type-dependent functions automatically determine whether to store the length internally (embedded in the command header word) or externally (after the address pointer), based on size of the type given. Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - add PDB (Protocol Descriptor Block) definitionsHemant Agrawal
Add a PDB header file to support building protocol descriptors. Signed-off-by: Steve Cornelius <sec@pobox.com> Signed-off-by: Hemant Agrawal <hemant@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - fix descriptor length adjustments for protocol descriptorsKim Phillips
init_desc, by always ORing with 1 for the descriptor header inclusion into the descriptor length, and init_sh_desc_pdb, by always specifying the descriptor length modification for the PDB via options, would not allow for odd length PDBs to be embedded in the constructed descriptor length. Fix this by simply changing the OR to an addition. also round-up pdb_bytes to the next SEC command unit size, to allow for, e.g., optional packet header bytes that aren't a multiple of CAAM_CMD_SZ. Reported-by: Radu-Andrei BULIE <radu.bulie@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Cc: Yashpal Dutta <yashpal.dutta@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - fix start index for Protocol shared descriptorsYashpal Dutta
In case of protocol acceleration descriptors, Shared descriptor header must carry size of header length + PDB length in words which will be skipped by DECO while processing descriptor to provide first command word offset Signed-off-by: Yashpal Dutta <yashpal.dutta@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - fix input job ring element dma mapping sizeKim Phillips
SEC4 h/w gets configured in 32- vs. 36-bit physical addressing modes depending on the size of dma_addr_t, which is not always equal to sizeof(u32 *). Also fixed alignment of a dma_unmap call whilst in there. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: caam - remove line continuations from ablkcipher_append_src_dstKim Phillips
presumably leftovers from possible macro development. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: move arch/x86/include/asm/aes.h to arch/x86/include/asm/crypto/Jussi Kivilinna
Move AES header to the new asm/crypto directory. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: move arch/x86/include/asm/serpent-{sse2|avx}.h to ↵Jussi Kivilinna
arch/x86/include/asm/crypto/ Move serpent crypto headers to the new asm/crypto/ directory. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: twofish-avx - remove duplicated glue code and use shared glue code ↵Jussi Kivilinna
from glue_helper Now that shared glue code is available, convert twofish-avx to use it. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: twofish-x86_64-3way - remove duplicated glue code and use shared ↵Jussi Kivilinna
glue code from glue_helper Now that shared glue code is available, convert twofish-x86_64-3way to use it. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: camellia-x86_64 - remove duplicated glue code and use shared glue ↵Jussi Kivilinna
code from glue_helper Now that shared glue code is available, convert camellia-x86_64 to use it. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: serpent-avx: remove duplicated glue code and use shared glue code ↵Jussi Kivilinna
from glue_helper Now that shared glue code is available, convert serpent-avx to use it. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: serpent-sse2 - split generic glue code to new helper moduleJussi Kivilinna
Now that serpent-sse2 glue code has been made generic, it can be split to separate module. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: serpent-sse2 - prepare serpent-sse2 glue code into generic x86 glue ↵Jussi Kivilinna
code for 128bit block ciphers Block cipher implementations in arch/x86/crypto/ contain common glue code that is currently duplicated in each module (camellia-x86_64, twofish-x86_64-3way, twofish-avx, serpent-sse2 and serpent-avx). This patch prepares serpent-sse2 glue into generic glue code for all 128bit block ciphers to use in arch/x86/crypto. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: aes_ni - change to use shared ablk_* functionsJussi Kivilinna
Remove duplicate ablk_* functions and make use of ablk_helper module instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: twofish-avx - change to use shared ablk_* functionsJussi Kivilinna
Remove duplicate ablk_* functions and make use of ablk_helper module instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code ↵Jussi Kivilinna
to shared module Move ablk-* functions to separate module to share common code between cipher implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: nx - fix typo in nx driver config optionSeth Jennings
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Kent Yoder <key@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: nx - move nx build to driver/crypto MakefileSeth Jennings
When the nx driver was pulled, the Makefile that actually builds it is arch/powerpc/Makefile. This is unnatural. This patch moves the line that builds the nx driver from arch/powerpc/Makefile to drivers/crypto/Makefile where it belongs. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Kent Yoder <key@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27hwrng: mxc-rnga - fix data_present APIBenoît Thébaudeau
Commit 45001e9, which added support for RNGA, ignored the previous commit 984e976, which changed the data_present API. Cc: Matt Mackall <mpm@selenic.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Alan Carvalho de Assis <acassis@gmail.com> Cc: <linux-arm-kernel@lists.infradead.org> Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-22crypto: algapi - Move larval completion into algbossHerbert Xu
It has been observed that sometimes the crypto allocation code will get stuck for 60 seconds or multiples thereof. This is usually caused by an algorithm failing to pass the self-test. If an algorithm fails to be constructed, we will immediately notify all larval waiters. However, if it succeeds in construction, but then fails the self-test, we won't notify anyone at all. This patch fixes this by merging the notification in the case where the algorithm fails to be constructed with that of the the case where it pases the self-test. This way regardless of what happens, we'll give the larval waiters an answer. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-14crypto: serpent-sse2/avx - allow both to be built into kernelJussi Kivilinna
Rename serpent-avx assembler functions so that they do not collide with serpent-sse2 assembler functions when linking both versions in to same kernel image. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-14crypto: arc4 - improve performance by using u32 for ctx and variablesJussi Kivilinna
This patch changes u8 in struct arc4_ctx and variables to u32 (as AMD seems to have problem with u8 array). Below are tcrypt results of old 1-byte block cipher versus ecb(arc4) with u8 and ecb(arc4) with u32. tcrypt results, x86-64 (speed ratios: new-u32/old, new-u8/old): u32 u8 AMD Phenom II : x3.6 x2.7 Intel Core 2 : x2.0 x1.9 tcrypt results, i386 (speed ratios: new-u32/old, new-u8/old): u32 u8 Intel Atom N260 : x1.5 x1.4 Cc: Jon Oberheide <jon@oberheide.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-14crypto: arc4 - improve performance by adding ecb(arc4)Jussi Kivilinna
Currently arc4.c provides simple one-byte blocksize cipher which is wrapped by ecb() module, giving function call overhead on every encrypted byte. This patch adds ecb(arc4) directly into arc4.c for higher performance. tcrypt results (speed ratios: new/old): AMD Phenom II, x86-64 : x2.7 Intel Core 2, x86-64 : x1.9 Intel Atom N260, i386 : x1.4 Cc: Jon Oberheide <jon@oberheide.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-14crypto: testmgr - add ecb(arc4) speed testsJussi Kivilinna
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-14crypto: s390 - clean up DES code a bit morePaul Bolle
Commit 98971f8439b1bb9a61682fe24a865ddd25167a6b ("crypto: s390 - cleanup DES code") should have also removed crypto_des.h. That file is unused and unneeded since that commit. So let's clean up that file too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-12crypto: serpent - add x86_64/avx assembler implementationJohannes Goetzfried
This patch adds a x86_64/avx assembler implementation of the Serpent block cipher. The implementation is very similar to the sse2 implementation and processes eight blocks in parallel. Because of the new non-destructive three operand syntax all move-instructions can be removed and therefore a little performance increase is provided. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) serpent-avx-x86_64 vs. serpent-sse2-x86_64 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.03x 1.01x 1.01x 1.01x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x 64B 1.00x 1.00x 1.00x 1.00x 1.00x 0.99x 1.00x 1.01x 1.00x 1.00x 256B 1.05x 1.03x 1.00x 1.02x 1.05x 1.06x 1.05x 1.02x 1.05x 1.02x 1024B 1.05x 1.02x 1.00x 1.02x 1.05x 1.06x 1.05x 1.03x 1.05x 1.02x 8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.03x 1.04x 1.02x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.01x 1.00x 1.01x 1.01x 1.00x 1.00x 0.99x 1.03x 1.01x 1.01x 64B 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x 1.00x 1.02x 256B 1.05x 1.02x 1.00x 1.02x 1.05x 1.02x 1.04x 1.05x 1.05x 1.02x 1024B 1.06x 1.02x 1.00x 1.02x 1.07x 1.06x 1.05x 1.04x 1.05x 1.02x 8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.05x 1.05x 1.02x serpent-avx-x86_64 vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.26x 1.73x ecb-dec 1.20x 1.64x cbc-enc 0.33x 0.45x cbc-dec 1.24x 1.67x ctr-enc 1.32x 1.76x ctr-dec 1.32x 1.76x lrw-enc 1.20x 1.60x lrw-dec 1.15x 1.54x xts-enc 1.22x 1.64x xts-dec 1.17x 1.57x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>