Xeon Phi

Not to be confused with the ATI Xenos, or Xenon.

The Tianhe-2 supercomputer uses Xeon Phi processors

Instruction set: x64

Xeon Phi^[1] is a brand name given to a series of massively-parallel multicore processors designed, manufactured, marketed, and sold by Intel, targeted at supercomputing, enterprise, and high-end workstation markets.

Initial products took the form of PCIe-based add-on cards. A second-generation product, codenamed Knights Landing and built on a 14 nm process, was announced in June 2013.

In June 2013, the Tianhe-2 supercomputer at the National Supercomputing Center in Guangzhou (NSCC-GZ) was announced^[2] as the world's fastest supercomputer (it is currently second, behind a non-Intel-based computer). It uses Intel Xeon Phi coprocessors and Ivy Bridge-EP Xeon processors to achieve 33.86 petaFLOPS.^[3]

Competitors include Nvidia's Tesla-branded product lines.

History

Background

The Larrabee microarchitecture (in development since 2006^[4]) introduced very wide (512-bit) SIMD units to an x86-based processor design, extended to a cache-coherent multiprocessor system connected via a ring bus to memory; each core was capable of four-way multithreading. Because the design was intended for GPU as well as general-purpose computing, the Larrabee chips also included specialised hardware for texture sampling.^[5]^[6] The project to produce a retail GPU product directly from the Larrabee research project was terminated in May 2010.^[7]

Another contemporary Intel research project implementing x86 architecture on a many-multicore processor was the 'Single-chip Cloud Computer' (prototype introduced 2009^[8]), a design mimicking a cloud computing computer datacentre on a single chip with multiple independent cores: the prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for interchip messaging. The design lacked cache-coherent cores and focused on principles that would allow the design to scale to many more cores.^[9]

The Teraflops Research Chip (prototype unveiled 2007^[10]) is an experimental 80-core chip with two floating point units per core, implementing a 96-bit VLIW architecture instead of the x86 architecture.^[11] The project investigated intercore communication methods and per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz while consuming 62 W of power.^[12]^[13]
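The quoted 1.01 TFLOPS figure can be roughly reproduced from the core count, unit count, and clock speed given above. The sketch below is illustrative arithmetic only; it assumes (this is not stated in the text) that each floating point unit completes one multiply-accumulate per cycle, counted as two floating point operations.

```python
# Peak-throughput arithmetic for the Teraflops Research Chip.
# From the text: 80 cores, 2 floating point units per core, 3.16 GHz, 62 W.
CORES = 80
FPUS_PER_CORE = 2
CLOCK_GHZ = 3.16
POWER_W = 62

# ASSUMPTION (not in the text): one multiply-accumulate per unit per cycle,
# counted as 2 floating point operations.
FLOPS_PER_FPU_PER_CYCLE = 2


def peak_gflops(cores, fpus_per_core, flops_per_fpu_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS: units x operations per cycle x GHz."""
    return cores * fpus_per_core * flops_per_fpu_per_cycle * clock_ghz


peak = peak_gflops(CORES, FPUS_PER_CORE, FLOPS_PER_FPU_PER_CYCLE, CLOCK_GHZ)
efficiency = peak / POWER_W  # GFLOPS per watt at the quoted power draw

print(f"peak:       {peak / 1000:.2f} TFLOPS")
print(f"efficiency: {efficiency:.1f} GFLOPS/W")
```

Under that assumption the product comes out at about 1011 GFLOPS, in line with the 1.01 TFLOPS figure cited above.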

Knights Ferry

Intel's MIC prototype board, named Knights Ferry and incorporating a processor codenamed Aubrey Isle, was announced on May 31, 2010. The product was stated to be a derivative of the Larrabee project and other Intel research including the Single-chip Cloud Computer.^[14]^[15]

The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with four threads per core, 2 GB GDDR5 memory,^[16] and 8 MB coherent L2 cache (256 KB per core with 32 KB L1 cache), and a power requirement of ~300 W,^[16] built at a 45 nm process.^[17] In the Aubrey Isle core a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory.^[18] Single board performance has exceeded 750 GFLOPS.^[17] The prototype boards only support single precision floating point instructions.^[19]
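The Knights Ferry figures above can be cross-checked against each other with a few lines of arithmetic. This is a rough sketch, not a vendor specification: the cache and ring-bus checks use only numbers from the text, while the peak estimate assumes a 512-bit vector unit per core (as inherited from Larrabee) and one fused multiply-add per lane per cycle.

```python
# Consistency checks for the Knights Ferry figures quoted in the text.
CORES = 32
CLOCK_GHZ = 1.2
L2_PER_CORE_KB = 256
RING_BITS_PER_DIRECTION = 512
REPORTED_GFLOPS = 750  # "has exceeded 750 GFLOPS" (a lower bound)

# 32 cores x 256 KB of L2 each should give the quoted 8 MB total.
total_l2_mb = CORES * L2_PER_CORE_KB / 1024

# A bi-directional 512-bit ring gives the quoted 1,024-bit total width.
ring_bits_total = 2 * RING_BITS_PER_DIRECTION

# ASSUMPTIONS (not stated for Knights Ferry in the text):
# a 512-bit vector unit per core and one FMA (2 operations) per lane per cycle.
SP_LANES = 512 // 32
FLOPS_PER_LANE_PER_CYCLE = 2
peak_sp_gflops = CORES * CLOCK_GHZ * SP_LANES * FLOPS_PER_LANE_PER_CYCLE

print(f"total L2 cache:  {total_l2_mb:.0f} MB")
print(f"ring bus width:  {ring_bits_total} bits")
print(f"estimated peak:  {peak_sp_gflops:.1f} GFLOPS (single precision)")
print(f"reported / peak: {REPORTED_GFLOPS / peak_sp_gflops:.0%}")
```

If those assumptions hold, the reported 750 GFLOPS would correspond to somewhat over half of the theoretical single-precision peak.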

Initial developers included CERN, Korea Institute of Science and Technology Information (KISTI) and Leibniz Supercomputing Centre. Hardware vendors for prototype boards included IBM, SGI, HP, Dell and others.^[20]

Knights Corner

The Knights Corner product line is made at a 22 nm process size, using Intel's Tri-gate technology with more than 50 cores per chip, and is Intel's first many-cores commercial product.^[14]^[17]

In June 2011, SGI announced a partnership with Intel to use the MIC architecture in its high performance computing products.^[21] In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will use Knights Corner cards in their 10 petaFLOPS "Stampede" supercomputer, providing 8 petaFLOPS of compute power.^[22] According to "Stampede: A Comprehensive Petascale Computing Environment" the "second generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS."^[23]

On November 15, 2011, Intel showed an early silicon version of a Knights Corner processor.^[24]^[25]

On June 5, 2012, Intel released open source software and documentation regarding Knights Corner.^[26]

On June 18, 2012, Intel announced at the 2012 Hamburg International Supercomputing Conference that Xeon Phi will be the brand name used for all products based on their Many Integrated Core architecture.^[1]^[27]^[28]^[29]^[30]^[31]^[32] In June 2012, Cray announced it would be offering 22 nm 'Knight's Corner' chips (branded as 'Xeon Phi') as a co-processor in its 'Cascade' systems.^[33]^[34]

In June 2012, ScaleMP announced it would provide virtualization software allowing 'Knight's Corner' chips (branded as 'Xeon Phi') to be used as a transparent extension of the main processor. The virtualization software will allow 'Knight's Corner' to run legacy MMX/SSE code and access an unlimited amount of (host) memory without the need for code changes.^[35]

An important component of the Intel Xeon Phi coprocessor's core is its vector processing unit (VPU).^[36] The VPU features a novel 512-bit SIMD instruction set, officially known as Intel Initial Many Core Instructions (Intel IMCI). Because a 512-bit register holds sixteen 32-bit or eight 64-bit values, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports fused multiply-add (FMA) instructions, each of which counts as two floating point operations, and hence can execute 32 SP or 16 DP floating point operations per cycle. It also provides support for integers. The VPU also features an Extended Math Unit (EMU) that can execute operations such as reciprocal, square root, and logarithm, thereby allowing these operations to be executed in a vector fashion with high bandwidth. The EMU operates by calculating polynomial approximations of these functions.
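The per-cycle figures for the VPU follow directly from the 512-bit vector width. A minimal sketch of that arithmetic is given below; the function names are illustrative and not part of any Intel API.

```python
# How the VPU's per-cycle operation counts follow from its 512-bit width.
VECTOR_BITS = 512


def lanes(element_bits):
    """Number of elements of the given width that fit in one vector register."""
    return VECTOR_BITS // element_bits


def flops_per_cycle(element_bits, fma=True):
    """Floating point operations per cycle for one vector instruction.

    A fused multiply-add performs a multiply and an add on every lane,
    so it is counted as two operations per lane.
    """
    return lanes(element_bits) * (2 if fma else 1)


for name, bits in (("single precision", 32), ("double precision", 64)):
    print(f"{name}: {lanes(bits)} lanes, "
          f"{flops_per_cycle(bits)} floating point operations per cycle with FMA")
```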

On November 12, 2012, Intel announced two Xeon Phi coprocessor families using the 22 nm process size: the Xeon Phi 3100 and the Xeon Phi 5110P.^[37]^[38]^[39] The Xeon Phi 3100 will be capable of more than 1 teraFLOPS of double precision floating point instructions with 240 GB/sec memory bandwidth at 300 W.^[37]^[38]^[39] The Xeon Phi 5110P will be capable of 1.01 teraFLOPS of double precision floating point instructions with 320 GB/sec memory bandwidth at 225 W.^[37]^[38]^[39] The Xeon Phi 7120P will be capable of 1.2 teraFLOPS of double precision floating point instructions with 352 GB/sec memory bandwidth at 300 W.
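The three sets of figures above can be compared more easily as derived ratios: performance per watt and memory bandwidth per floating point operation. The sketch below uses only the numbers quoted in the text; for the Xeon Phi 3100, "more than 1 teraFLOPS" is treated as exactly 1 teraFLOPS, so its ratios are bounds rather than exact values.

```python
# Derived ratios from the quoted figures:
# (double-precision GFLOPS, memory bandwidth in GB/s, power in W).
MODELS = {
    "Xeon Phi 3100": (1000.0, 240.0, 300.0),   # "more than 1 teraFLOPS": lower bound
    "Xeon Phi 5110P": (1010.0, 320.0, 225.0),
    "Xeon Phi 7120P": (1200.0, 352.0, 300.0),
}


def derived_ratios(gflops, bandwidth_gbs, power_w):
    """Return (GFLOPS per watt, bytes of memory bandwidth per operation)."""
    return gflops / power_w, bandwidth_gbs / gflops


metrics = {name: derived_ratios(*spec) for name, spec in MODELS.items()}

for name, (gflops_per_watt, bytes_per_flop) in metrics.items():
    print(f"{name}: {gflops_per_watt:.2f} GFLOPS/W, "
          f"{bytes_per_flop:.2f} bytes per floating point operation")
```

On these figures the 5110P has the highest performance per watt of the three, mainly because of its lower 225 W power rating.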

On June 17, 2013, the Tianhe-2 supercomputer was announced^[2] by TOP500 as the world's fastest. It used Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS. According to the TOP500 list, Tianhe-2 remained the world's fastest supercomputer from its introduction in June 2013 through the November 2015 list.^[40]

Knights Landing

Knights Landing is the code name for the second-generation MIC architecture product from Intel.^[23] Intel first officially revealed details of its second-generation Intel Xeon Phi products on June 17, 2013.^[3] Intel said that the next generation of Intel MIC Architecture-based products will be available in two forms, as a coprocessor or as a host processor (CPU), and will be manufactured using Intel's 14 nm process technology. Knights Landing products will include integrated on-package memory for significantly higher memory bandwidth.

Knights Landing will be built using up to 72 Airmont (Atom) cores with four threads per core,^[41]^[42] using the LGA 3647 socket^[43] and supporting up to 384 GB of "far" DDR4 RAM and 8–16 GB of stacked "near" 3D MCDRAM, a version of High Bandwidth Memory. Each core will have two 512-bit vector units and will support AVX-512 SIMD instructions, specifically the Intel AVX-512 Foundational Instructions (AVX-512F) with Intel AVX-512 Conflict Detection Instructions (AVX-512CD), Intel AVX-512 Exponential and Reciprocal Instructions (AVX-512ER), and Intel AVX-512 Prefetch Instructions (AVX-512PF).^[44]
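On a Linux system, one way to see which of these four AVX-512 subsets a processor reports is to inspect the flags line of /proc/cpuinfo. The sketch below is a minimal illustration under that assumption; the flag spellings (avx512f, avx512cd, avx512er, avx512pf) are those used by the Linux kernel, and the helper names are hypothetical.

```python
# Report which of the four AVX-512 subsets named in the text a CPU advertises.
# ASSUMPTION: a Linux-style /proc/cpuinfo with a "flags : ..." line.
KNL_SUBSETS = ("avx512f", "avx512cd", "avx512er", "avx512pf")


def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def knl_avx512_support(cpuinfo_text):
    """Map each AVX-512 subset name to True if it is advertised."""
    flags = parse_flags(cpuinfo_text)
    return {name: name in flags for name in KNL_SUBSETS}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, or return an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


for subset, present in knl_avx512_support(read_cpuinfo()).items():
    print(f"{subset}: {'yes' if present else 'no'}")
```

On systems without /proc/cpuinfo the script simply reports every subset as absent.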

The National Energy Research Scientific Computing Center announced that Phase 2 of its newest supercomputing system "Cori" would use Knights Landing Xeon Phi coprocessors.^[45]

On June 20, 2016, Intel launched the Intel Xeon Phi product family x200 based on the Knights Landing architecture, stressing its applicability to not just traditional simulation workloads, but also to machine learning.^[46]^[47] The model lineup announced at launch included only Xeon Phi processors in a bootable form factor, in two versions: standard processors and processors with integrated Intel Omni-Path architecture fabric.^[48] The latter is denoted by the suffix F in the model number. Integrated fabric is expected to provide better latency at a lower cost than discrete high-performance network cards.^[46]

On November 14, 2016, the 48th TOP500 list contained 10 systems using Knights Landing platforms.

Knights Hill

Knights Hill is the codename for the third-generation MIC architecture, for which Intel announced the first details at SC14. It will be manufactured in a 10 nm process.^[49]

In April 2015, the United States Department of Energy announced that a supercomputer named Aurora will be deployed at Argonne National Laboratory^[50] based upon the "third-generation Intel Xeon Phi" processor.^[51]

Knights Mill

Knights Mill is Intel's codename for a Xeon Phi product specialized in deep learning.^[52] While little is known about Knights Mill yet, it has been announced that it will improve efficiency. It is also expected to support reduced-precision variables, which have been used to accelerate machine learning in other products, such as half-precision floating-point variables in Nvidia's Tesla.

Design

The cores of Knights Corner are based on a modified version of the P54C design, used in the original Pentium.^[53] The basis of the Intel MIC architecture is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can use existing parallelization software tools.^[17] Programming tools include OpenMP, OpenCL,^[54] Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++^[55] and math libraries.^[56]

Design elements inherited from the Larrabee project include x86 ISA, 4-way SMT per core, 512-bit SIMD units, 32 KB L1 instruction cache, 32 KB L1 data cache, coherent L2 cache (512 KB per core^[57]), and ultra-wide ring bus connecting processors and memory.

The Knights Corner instruction set documentation is available from Intel.^[58]^[59]^[60]

Programming

An empirical performance and programmability study has been performed by researchers,^[61] in which the authors claim that achieving high performance with Xeon Phi still requires effort from programmers, and that relying on compilers with traditional programming models alone is not yet sufficient. However, research in various domains, such as life sciences,^[62] deep learning^[63] and computer-aided engineering,^[64] has demonstrated that exploiting both the thread- and SIMD-parallelism of Xeon Phi achieves significant speed-ups.
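Exploiting both levels of parallelism means dividing a loop's iterations first among threads and then, within each thread, into vector-width chunks. The following sketch only models that two-level partitioning; it is not taken from the cited studies, the function names are hypothetical, and a real program would express the same split with tools such as OpenMP.

```python
# Two-level partitioning of a loop: across threads, then across SIMD lanes.
def split_among_threads(n, threads):
    """Contiguous, near-equal [start, stop) ranges, one per thread."""
    base, extra = divmod(n, threads)
    ranges = []
    start = 0
    for t in range(threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more
        ranges.append((start, start + size))
        start += size
    return ranges


def vectorize(start, stop, lanes):
    """Return (full-width vector iterations, leftover scalar iterations)."""
    return divmod(stop - start, lanes)


def plan(n, threads, lanes):
    """Per-thread work plan as (start, stop, full_vectors, remainder) tuples."""
    return [
        (start, stop) + vectorize(start, stop, lanes)
        for start, stop in split_among_threads(n, threads)
    ]


# Example: 1000 single-precision elements, 4 threads, 16 lanes per vector.
for thread_id, (start, stop, vectors, remainder) in enumerate(plan(1000, 4, 16)):
    print(f"thread {thread_id}: elements [{start}, {stop}), "
          f"{vectors} full vectors + {remainder} scalar iterations")
```

In this example each thread receives 250 elements, which it processes as 15 full 16-wide vector iterations plus 10 leftover scalar iterations; the leftover work is one reason hand-tuning of loop sizes and data layout can matter.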

Competitors

Nvidia Tesla, a direct competitor in the HPC market^[65]
AMD FireStream, another direct competitor in the HPC market^[66]

References

1 2 Radek (June 18, 2012). "Chip Shot: Intel Names the Technology to Revolutionize the Future of HPC - Intel® Xeon® Phi™ Product Family". Intel. Retrieved December 12, 2012.
1 2 "TOP500 - June 2013". TOP500 - June 2013. TOP500. Retrieved June 18, 2013.
1 2 "Intel Powers the World's Fastest Supercomputer, Reveals New and Future High Performance Computing Technologies". Retrieved June 21, 2013.
↑ Charlie Demerjian (July 3, 2006), "New from Intel: It's Mini-Cores!", www.theinquirer.net, The Inquirer
↑ Seiler, L.; Cavin, D.; Espasa, E.; Grochowski, T.; Juan, M.; Hanrahan, P.; Carmean, S.; Sprangle, A.; Forsyth, J.; Abrash, R.; Dubey, R.; Junkins, E.; Lake, T.; Sugerman, P. (August 2008). "Larrabee: A Many-Core x86 Architecture for Visual Computing" (PDF). ACM Transactions on Graphics. Proceedings of ACM SIGGRAPH 2008. 27 (3): 18:11–18:11. doi:10.1145/1360612.1360617. ISSN 0730-0301. Retrieved 2008-08-06.
↑ Tom Forsyth, "SIMD Programming with Larrabee" (PDF), www.stanford.edu, Intel
↑ Ryan Smith (May 25, 2010), "Intel Kills Larrabee GPU, Will Not Bring a Discrete Graphics Product to Market", www.anandtech.com, AnandTech
↑ Tony Bradley (December 3, 2009), "Intel 48-Core "Single-Chip Cloud Computer" Improves Power Efficiency", www.pcworld.com, PCWorld
↑ "Intel Research : Single-Chip Cloud Computer", techresearch.intel.com, Intel
↑ Ben Ames (February 11, 2007), "Intel Tests Chip Design With 80-Core Processor", www.pcworld.com, IDG News
↑ "Intel Details 80-Core Teraflops Research Chip - X-bit labs". xbitlabs.com. Retrieved August 27, 2015.
↑ "Intel's Teraflops Research Chip" (PDF), download.intel.com, Intel
↑ Anton Shilov (February 12, 2007), "Intel Details 80-Core Teraflops Research Chip", www.xbitlabs.com, Xbit laboratories
1 2 Rupert Goodwins (June 1, 2010), "Intel unveils many-core Knights platform for HPC", www.zdnet.co.uk, ZDNet
↑ "Intel News Release : Intel Unveils New Product Plans for High-Performance Computing", www.intel.com, Intel, May 31, 2010
1 2 Mike Giles (June 24, 2010), "Runners and riders in GPU steeplechase" (PDF), people.maths.ox.ac.uk, pp. 8–10
1 2 3 4 Gareth Halfacree (June 20, 2011), "Intel pushes for HPC space with Knights Corner", www.thinq.co.uk, Net Communities Limited, UK
↑ "Intel Many Integrated Core Architecture" (PDF), www.many-core.group.cam.ac.uk, Intel, December 2010
↑ Rick Merritt (June 20, 2011), "OEMs show systems with Intel MIC chips", www.eetimes.com, EE Times
↑ Tom R. Halfhill (July 18, 2011), "Intel Shows MIC Progress", www.linleygroup.com, The Linley Group
↑ Andrea Petrou (June 20, 2011), "SGI wants Intel for super supercomputer", news.techeye.net
↑ ""Stampede's" Comprehensive Capabilities to Bolster U.S. Open Science Computational Resources", www.tacc.utexas.edu, Texas Advanced Computing Center, September 22, 2011
1 2 "Stampede: A Comprehensive Petascale Computing Environment" (PDF). IEEE Cluster 2011 Special Topic. Retrieved November 16, 2011.
↑ Yam, Marcus (November 16, 2011), "Intel's Knights Corner: 50+ Core 22nm Co-processor", www.tomshardware.com, Tom's Hardware, retrieved November 16, 2011
↑ Sylvie Barak (November 16, 2011), "Intel unveils 1 TFLOP/s Knights Corner", www.eetimes.com, EE Times, retrieved November 16, 2011
↑ James Reinders (June 5, 2012), Knights Corner: Open source software stack, Intel
↑ Prickett Morgan, Timothy (June 18, 2012), "Intel slaps Xeon Phi brand on MIC coprocessors", www.theregister.co.uk
↑ Intel Corporation (June 18, 2012), "Latest Intel(R) Xeon(R) Processors E5 Product Family Achieves Fastest Adoption of New Technology on Top500 List", www.marketwatch.com, Intel(R) Xeon(R) Phi(TM) is the new brand name for all future Intel(R) Many Integrated Core Architecture based products targeted at HPC, enterprise, datacenters and workstations. The first Intel(R) Xeon(R) Phi(TM) product family member is scheduled for volume production by the end of 2012
↑ Raj Hazra (June 18, 2012). "Intel Xeon Phi coprocessors accelerate the pace of discovery and innovation". Intel. Retrieved December 12, 2012.
↑ Rick Merritt (June 18, 2012). "Cray will use Intel MIC, branded Xeon Phi". EETimes. Retrieved December 12, 2012.
↑ Terrence O'Brien (June 18, 2012). "Intel christens its 'Many Integrated Core' products Xeon Phi, eyes exascale milestone". Engadget. Retrieved December 12, 2012.
↑ Jeffrey Burt (June 18, 2012). "Intel Wraps Xeon Phi Branding Around MIC Coprocessors". EWeek. Retrieved December 12, 2012.
↑ Merritt, Rick (June 8, 2012), "Cray will use Intel MIC, branded Xeon Phi", www.eetimes.com
↑ Latif, Lawrence (June 19, 2012), "Cray to support Intel's Xeon Phi in Cascade clusters", www.theinquirer.net
↑ "ScaleMP vSMP Foundation to Support Intel Xeon Phi", www.ScaleMP.com, ScaleMP, June 20, 2012
↑ https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner
1 2 3 IntelPR (November 12, 2012). "Intel Delivers New Architecture for Discovery with Intel® Xeon Phi™ Coprocessors". Intel. Retrieved December 12, 2012.
1 2 3 Agam Shah (November 12, 2012). "Intel ships 60-core Xeon Phi processor". Computerworld. Retrieved December 12, 2012.
1 2 3 Johan De Gelas (November 14, 2012). "The Xeon Phi at work at TACC". AnandTech. Retrieved December 12, 2012.
↑ "Tianhe-2 (MilkyWay-2)". Top500.org. November 14, 2015. Retrieved May 6, 2016.
↑ "Intel Xeon Phi 'Knights Landing' Features Integrated Memory With 500 GB/s Bandwidth and DDR4 Memory Support - Architecture Detailed". WCCFtech. Retrieved August 27, 2015.
↑ Sebastian Anthony (November 26, 2013), Intel unveils 72-core x86 Knights Landing CPU for exascale supercomputing, ExtremeTech
↑ Tom's Hardware: Intel Xeon Phi Knights Landing Now Shipping; Omni Path Update, Too. June 20, 2016
↑ James Reinders (July 23, 2013), AVX-512 Instructions, Intel
↑ http://www.nersc.gov/users/computational-systems/cori
1 2 2016 ISC High Performance: Intel's Rajeeb Hazra Delivers Keynote Address
↑ How Intel® Xeon Phi™ Processors Benefit Machine Learning/Deep Learning Apps and Frameworks
↑ Introducing the Intel® Xeon Phi™ Processor – Your Path to Deeper Insight
↑ Eric Gardner (November 25, 2014), What public disclosures has Intel made about Knights Landing?, Intel Corporation
↑ ALCF staff (April 9, 2015), Introducing Aurora
↑ ALCF staff (April 9, 2015), Aurora
↑ Smith, Ryan (17 August 2016). "Intel Announces Knight's Mill: A Xeon Phi for Deep Learning". Anandtech. Retrieved 17 August 2016.
↑ Hruska, Joel (July 30, 2012). "Intel's 50-core champion: In-depth on Xeon Phi". ExtremeTech. Ziff Davis, Inc. Retrieved December 2, 2012.
↑ Rick Merritt (June 20, 2011), "OEMs show systems with Intel MIC chips", www.eetimes.com, EE Times
↑ Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor, November 23, 2012, arXiv:1211.5530
↑ "News Fact Sheet: Intel Many Integrated Core (Intel MIC) Architecture ISC'11 Demos and Performance Description" (PDF), newsroom.intel.com, Intel, June 20, 2011, archived from the original (PDF) on 24 March 2012
↑ Tesla vs. Xeon Phi vs. Radeon. A Compiler Writer’s Perspective // The Portland Group (PGI), CUG 2013 Proceedings
↑ "Intel® Many Integrated Core Architecture (Intel MIC Architecture) - RESOURCES (including downloads)". Intel. Retrieved January 6, 2014.
↑ "Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual" (PDF). Intel. September 7, 2012. Retrieved January 6, 2014.
↑ "Intel® Developer Zone: Intel Xeon Phi Coprocessor". Intel. Retrieved January 6, 2014.
↑ Fang, Jianbin; Sips, Henk; Zhang, Lilun; Xu, Chuanfu; Yonggang, Che; Varbanescu, Ana Lucia (2014). "Test-Driving Intel Xeon Phi" (PDF). Retrieved December 30, 2013.
↑ Accelerating DNA Sequence Analysis using Intel Xeon Phi, June 29, 2015, arXiv:1506.08612
↑ The Potential of the Intel Xeon Phi for Supervised Deep Learning, June 30, 2015, arXiv:1506.09067
↑ Margetts, L.; Arregui-Mena, J.D.; Hewitt, W.T.; Mason, L. (2–3 June 2016). Parallel finite element analysis using the Intel Xeon Phi. Emerging Technology Conference EMiT2016. Barcelona, Spain.
↑ Jon Stokes (June 20, 2011). "Intel takes wraps off 50-core supercomputing processor plans". Ars Technica.
↑ Johan De Gelas. "Conclusions - Intel's Xeon E5-2600 V2: 12-core Ivy Bridge EP for Servers". anandtech.com. Retrieved August 27, 2015.
1 2 Johan De Gelas (September 11, 2012). "Intel's Xeon Phi in 10 Petaflops supercomputer". AnandTech. Retrieved December 12, 2012.

External links

Intel pages: Xeon Phi Product Family
Hazra, Raj (June 18, 2012), "Intel® Xeon® Phi™ coprocessors accelerate the pace of discovery and innovation", blogs.intel.com, Today, with the announcement of Intel® Xeon® Phi™ coprocessors, we’re going to accelerate the pace of these discoveries and innovations. Intel® Xeon Phi products extend the Intel® Xeon® brand..
Intel teaches Xeon Phi x86 coprocessor snappy new tricks

This article is issued from Wikipedia - version of the 12/3/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.