Skip to content

Commit

Permalink
powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_…
Browse files Browse the repository at this point in the history
…from_user

Version 2.06 of the POWER ISA introduced enhanced touch instructions,
allowing us to specify a number of attributes including the length of
a stream.

This patch adds a software stream for both loads and stores in the
POWER7 copy_tofrom_user loop. Since the setup is quite complicated
and we have to use an eieio to ensure correct ordering of the "GO"
command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set
bigger than the caches so we operate on a 1GB file:

# dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

# dd if=/tmp/foo of=/dev/null bs=1M

before: 7.7 GB/s
after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained
copy of just over 4kB. We can test this with the copy_to_user
testcase we used to tune copy_tofrom_user originally:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

# time ./copy_to_user2 -l 4224 -i 10000000

before: 6.807 s
after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely
to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  • Loading branch information
Anton Blanchard authored and Benjamin Herrenschmidt committed Jul 3, 2012
1 parent 8127e72 commit bce4b4b
Showing 1 changed file with 31 additions and 0 deletions.
31 changes: 31 additions & 0 deletions arch/powerpc/lib/copyuser_power7.S
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,37 @@ err1; stb r0,0(r3)
lis r8,0x8000 /* GO=1 */
clrldi r8,r8,32

.machine push
.machine "power4"
dcbt r0,r6,0b01000
dcbt r0,r7,0b01010
dcbtst r0,r9,0b01000
dcbtst r0,r10,0b01010
eieio
dcbt r0,r8,0b01010 /* GO */
.machine pop

/*
* We prefetch both the source and destination using enhanced touch
* instructions. We use a stream ID of 0 for the load side and
* 1 for the store side.
*/
clrrdi r6,r4,7
clrrdi r9,r3,7
ori r9,r9,1 /* stream=1 */

srdi r7,r5,7 /* length in cachelines, capped at 0x3FF */
cmpldi cr1,r7,0x3FF
ble cr1,1f
li r7,0x3FF
1: lis r0,0x0E00 /* depth=7 */
sldi r7,r7,7
or r7,r7,r0
ori r10,r7,1 /* stream=1 */

lis r8,0x8000 /* GO=1 */
clrldi r8,r8,32

.machine push
.machine "power4"
dcbt r0,r6,0b01000
Expand Down

0 comments on commit bce4b4b

Please sign in to comment.