SSE3

Not to be confused with SSSE3.

SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU. In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs. The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow! (developed by AMD), SSE and SSE2.

SSE3 contains 13 new instructions over SSE2.

Changes

The most notable change is the ability to operate horizontally within a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, SSE3 adds instructions that add and subtract the multiple values stored within a single register. These instructions can speed up a number of DSP and 3D operations. There is also a new instruction (FISTTP) that converts floating-point values to integers without having to change the global rounding mode, thus avoiding costly pipeline stalls. Finally, the extension adds LDDQU, an alternative misaligned integer vector load that performs better on NetBurst-based platforms for loads that cross cache-line boundaries.
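
To make the vertical/horizontal distinction concrete, here is a minimal scalar model in Python. It is only a sketch of the lane arithmetic, not real SIMD code, and the helper names `vertical_mul`, `hadd` and `dot4` are hypothetical. `hadd` follows the lane order of the four-lane single-precision horizontal add, and the dot product is just one plausible example of a 3D operation that can use it.

```python
def vertical_mul(a, b):
    """Vertical operation: lane i of a combines with lane i of b."""
    return [x * y for x, y in zip(a, b)]


def hadd(a, b):
    """Horizontal add: adjacent lanes of the same register are summed."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def dot4(a, b):
    """4-element dot product: one vertical multiply, two horizontal adds."""
    products = vertical_mul(a, b)
    partial = hadd(products, products)   # [p0+p1, p2+p3, p0+p1, p2+p3]
    total = hadd(partial, partial)       # every lane now holds the full sum
    return total[0]


print(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # 70.0
```

Without a horizontal add, the final reduction would need extra shuffle instructions to line the partial sums up in matching lanes before a vertical add could combine them.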

CPUs with SSE3

AMD:
- Athlon 64 (since Venice Stepping E3 and San Diego Stepping E4)
- Athlon 64 X2
- Athlon 64 FX (since San Diego Stepping E4)
- Opteron (since Stepping E4)
- Sempron (since Palermo, Stepping E3)
- Phenom
- Phenom II
- Athlon II
- Turion 64
- Turion 64 X2
- Turion X2
- Turion X2 Ultra
- Turion II X2 Mobile
- Turion II X2 Ultra
- APU
- FX Series
Intel:
- Celeron D
- Celeron (starting with Core microarchitecture)
- Pentium 4 (since Prescott)
- Pentium D
- Pentium Extreme Edition (but NOT Pentium 4 Extreme Edition)
- Pentium Dual-Core
- Pentium (starting with Core microarchitecture)
- Core
- Xeon (since Nocona)
- Atom
VIA/Centaur:
- C7
- Nano
Transmeta:
- Efficeon TM88xx (NOT model numbers TM86xx)

New instructions

Common instructions

Arithmetic

ADDSUBPD — (Add-Subtract-Packed-Double)
- Input: { A0, A1 }, { B0, B1 }
- Output: { A0 − B0, A1 + B1 }
ADDSUBPS — (Add-Subtract-Packed-Single)
- Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
- Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 }
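
The alternating subtract/add pattern above can be modelled in scalar Python. This is a sketch under stated assumptions: the function names are hypothetical, and the complex-multiplication usage is an illustration rather than something stated in the list above. It relies on duplicating the real and imaginary parts of one operand, which is what the SSE3 instructions MOVSLDUP and MOVSHDUP do, plus an ordinary pair-swapping shuffle that is not part of SSE3.

```python
def addsub(a, b):
    """Even lanes: a - b. Odd lanes: a + b. Works for 2 or 4 lanes."""
    return [x - y if i % 2 == 0 else x + y
            for i, (x, y) in enumerate(zip(a, b))]


def dup_even(v):
    """Model of MOVSLDUP: copy each even lane into the lane above it."""
    return [v[0], v[0], v[2], v[2]]


def dup_odd(v):
    """Model of MOVSHDUP: copy each odd lane into the lane below it."""
    return [v[1], v[1], v[3], v[3]]


def swap_pairs(v):
    """Swap neighbouring lanes (a plain shuffle, not an SSE3 instruction)."""
    return [v[1], v[0], v[3], v[2]]


def complex_mul(a, b):
    """Multiply two pairs of complex numbers stored as [re, im, re, im]."""
    re_terms = [x * y for x, y in zip(dup_even(a), b)]
    im_terms = [x * y for x, y in zip(dup_odd(a), swap_pairs(b))]
    return addsub(re_terms, im_terms)


print(addsub([10, 20], [1, 2]))                 # [9, 22]
print(complex_mul([1, 2, 3, 4], [5, 6, 7, 8]))  # [-7, 16, -11, 52]
```

The second call multiplies (1 + 2i)(5 + 6i) and (3 + 4i)(7 + 8i); the subtract in the even lanes supplies the minus sign of the real part, and the add in the odd lanes forms the imaginary part.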

AOS (Array Of Structures)

HADDPD — (Horizontal-Add-Packed-Double)
- Input: { A0, A1 }, { B0, B1 }
- Output: { A0 + A1, B0 + B1 }
HADDPS (Horizontal-Add-Packed-Single)
- Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
- Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 }
HSUBPD — (Horizontal-Subtract-Packed-Double)
- Input: { A0, A1 }, { B0, B1 }
- Output: { A0 − A1, B0 − B1 }
HSUBPS — (Horizontal-Subtract-Packed-Single)
- Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
- Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 }
LDDQU — As stated above, this is an alternative misaligned integer vector load. It can be helpful for video compression tasks.
MOVDDUP, MOVSHDUP, MOVSLDUP — These duplicate selected elements within a register. They are useful for complex-number arithmetic and for waveform calculations such as audio processing.
FISTTP — Like the older x87 FISTP instruction, but ignores the rounding mode set in the floating-point control register and always uses "chop" (truncate) mode instead. This avoids the expensive saving, modifying and restoring of the control register in languages such as C, where the standard requires float-to-int conversion to truncate.
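
The difference between the two conversions can be illustrated with ordinary Python, used here only as an analogy. `math.trunc` chops toward zero, as FISTTP always does, while the built-in `round` rounds to nearest with ties to even, which matches what FISTP would do if the x87 control word were left at its default rounding mode. The function names are hypothetical.

```python
import math


def fistp_default(x):
    """FISTP analogue under the default mode (nearest, ties to even)."""
    return round(x)


def fisttp(x):
    """FISTTP analogue: always truncate toward zero, as a C cast does."""
    return math.trunc(x)


for value in (2.7, -2.7, 2.5, 3.5):
    print(value, fistp_default(value), fisttp(value))
# 2.7 3 2
# -2.7 -3 -2
# 2.5 2 2
# 3.5 4 3
```

Because the two results differ for most non-integer inputs, a compiler targeting x87 without FISTTP has to switch the rounding mode around every C-style cast; with FISTTP the control register can stay untouched.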

Intel instructions

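MONITOR, MWAIT - MONITOR arms monitoring of an address range, and MWAIT puts the logical processor into an optimized wait state until that range is written to or another wake-up event occurs. This lets a waiting thread idle without spinning, which benefits multi-threaded applications on processors with Hyper-threading.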



This article is issued from Wikipedia - version of the 9/17/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.