Intel Hard- and Software

7 Levels of Parallelism

  1. Node level
  2. Socket level
    • NUMA
    • 2-, 4-, or 8-socket
  3. Hyperthreading: 2 logical threads on one core
    • less cache per thread
    • useful when jobs use different parts of the core; traditionally disabled on HPC systems
    • now not much of a penalty when enabled
  4. Mesh
  5. GPU-CPU
    • subslice = core (8 on Intel Graphics Gen 11)
    • exec-unit = thread (8)
  6. Instruction level: out-of-order execution on different ports/execution units (max 4 IPC [instructions per cycle])
  7. Data parallelism
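
Level 6 can be made concrete with a reduction. The sketch below is only an illustration, not something shown in the talk; the function names `sum_single_chain` and `sum_four_chains` are hypothetical. A single accumulator forms one long dependency chain, while several independent accumulators give the out-of-order core adds it can issue in the same cycle.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: one accumulator, so every add has to wait
// for the result of the previous add (a single dependency chain).
float sum_single_chain(const float* data, std::size_t n) {
    assert(n == 0 || data != nullptr);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

// Hypothetical helper: four independent accumulators. The four adds in
// the loop body do not depend on each other, so the core is free to
// overlap them on different execution ports.
float sum_four_chains(const float* data, std::size_t n) {
    assert(n == 0 || data != nullptr);
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    // Remaining 0 to 3 elements that did not fill a full group of four.
    float tail = 0.0f;
    for (; i < n; ++i) {
        tail += data[i];
    }
    return (a0 + a1) + (a2 + a3) + tail;
}
```

Both functions add the same values, but in a different order, so for general floating-point input the results can differ in the last bits. That reordering is also why a compiler will usually not perform this transformation on its own without relaxed floating-point settings.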

Vectorization

![ParStudio](2020_02_06-10_23-07-iPhone 8-1924.jpg "Inside Parallel Studio XE")

  • Performance libraries
  • Compiler (fully automatic)
  • Compiler with vectorization hints (#pragma)
  • user-mandated vectorization (SIMD directive)
  • SIMD intrinsic class (F32vec4 add)
  • Vector intrinsic (`_mm_add_ps()`)
  • assembler code (`addps`)
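
Two rungs of this ladder side by side, as a minimal sketch rather than code from the talk: a plain loop that is left to the compiler's auto-vectorizer, and the same addition written with the `_mm_add_ps()` intrinsic. The function names and the `HAVE_SSE` macro are hypothetical; the intrinsic path is guarded so the code still builds (as a scalar loop) on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_SSE 1  // hypothetical macro name, used only in this sketch
#endif

// Rung "Compiler": a plain loop; whether it gets vectorized is up to
// the compiler and the optimization flags in use.
void add_auto(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Rung "Vector intrinsic": four floats per step with _mm_add_ps.
// Unaligned loads/stores are used so no alignment is assumed.
void add_intrinsic(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#ifdef HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar remainder (or the whole range when SSE is unavailable).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The further down the list one goes, the more control and the less portability: the intrinsic version fixes the vector width at 4 floats, while the plain loop can be recompiled for wider vector units without source changes.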

oneAPI: New Foundation for Exascale Computing

  • unified memory (CPU/GPU)
  • all-to-all connectivity

Summary

  • Code modernization not always easy (analyze & optimize)
  • data / task parallelism

![IntelPy](2020_02_06-10_24-57-iPhone 8-1925.jpg "Intel Distribution for Python") ![MKL](2020_02_06-10_28-11-iPhone 8-1927.jpg "Math Kernel Library") ![DAAL](2020_02_06-10_36-47-iPhone 8-1928.jpg "Data Analytics Acceleration Library") ![DAAL Algo](2020_02_06-10_37-46-iPhone 8-1929.jpg "DAAL Algorithms") ![Diagnostic Tool](2020_02_06-10_46-13-iPhone 8-1930.jpg "Diagnostic Toolset for High Performance Compute Clusters") ![Demo](2020_02_06-10_55-10-iPhone 8-1931.jpg "Demo") ![noFP](2020_02_06-11_00-12-iPhone 8-1932.jpg "Removing FP converts")

Parallel Studio

transition to oneAPI in 2020

in 2020 version

  • VNNI (Vector Neural Network Instructions) for AI inference speedup
  • persistent memory (Optane) compatible with RAM
  • expanded standard support
    • Fortran 2018
    • C++17 (C++20 in initial stage)
    • move to LLVM (slowly), backend switchable
  • Extended Coarse Grain Profiling
  • HPC cloud support
  • New OS support (e.g., Amazon)

Compiler (v19.0)

Python

Take advantage of Intel's Python distribution: in general, add the Intel channel to Anaconda

Speedup achieved by optimizing

  • numba (utilizing MKL)
  • scikit-learn (utilizing DAAL)
URLs

Performance tools

  • VTune: HPC tuning - now even broader coverage (demo in afternoon)
  • performance snapshot for high level view
  • Advisor provides information on
    • threads
    • vectorization
    • GPU (Offload Advisor, upcoming; so far only Intel Gen9 or Gen11)

MPI library

Intel cluster checker

Vectorization issues

  • Default: the compiler targets an early-2000s CPU (SSE2), i.e. only vector length 2 for doubles => need compiler flag(s) for better optimization (e.g. `-xHost` with the Intel compiler)
  • technical bit: when using AVX instructions, the clock frequency is reduced to compensate for the extra power consumption

Example N-body problem

Convert Array of Structures (AoS) -> Structure of Arrays (SoA) for better aligned memory access
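
A minimal sketch of that conversion, assuming a particle with only position and mass (the real N-body demo code is not reproduced in these notes, so all type and function names below are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: one struct per particle. The fields of a particle are adjacent,
// so a loop over all x values walks memory with a stride of 4 floats.
struct ParticleAoS {
    float x, y, z, mass;
};

// SoA: one contiguous array per field. A loop over all x values reads
// consecutive floats, which maps directly onto vector loads.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Hypothetical helper: copy an AoS particle list into SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    soa.mass.reserve(aos.size());
    for (const ParticleAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
        soa.mass.push_back(p.mass);
    }
    return soa;
}

// Example kernel on the SoA layout: x coordinate of the center of mass.
// Both arrays are traversed with unit stride.
float center_of_mass_x(const ParticlesSoA& p) {
    assert(p.x.size() == p.mass.size());
    float weighted = 0.0f;
    float total = 0.0f;
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        weighted += p.mass[i] * p.x[i];
        total += p.mass[i];
    }
    return total > 0.0f ? weighted / total : 0.0f;
}
```

With the AoS layout the same kernel would touch `y` and `z` of every particle as a side effect of loading cache lines, even though it never uses them; the SoA layout avoids that wasted bandwidth.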


oneAPI

open standard for unified programming model across hardware platforms ![oneAPI base](2020_02_06-14_11-18-iPhone 8-1935.jpg "oneAPI Base Toolkit")

C++ (11) + SYCL + Extensions

API libraries

  • Math
  • Analytics/ML
  • DNN
  • ...

oneAPI Toolkits

  • Base
    • currently beta
    • direct programming (DPC++)
  • HPC
  • DL
  • Rendering
  • OpenVINO
  • AI Analytics

![DPC++](2020_02_06-14_16-21-iPhone 8-1936.jpg "DPC++ Compatibility Tool") ![OA.py](2020_02_06-14_47-42-iPhone 8-1937.jpg "Behind run_oa.py")

Developer Access

GPU target is currently OpenCL-based; NVIDIA support is unclear

Impressions on the way home

![pano](2020_02_06-18_11-49-iPhone 8-1939.jpg "Messesee Panorama") ![across](2020_02_06-18_12-18-iPhone 8-1941.jpg "across the lake") ![Entrance](2020_02_06-18_12-49-iPhone 8-1942.jpg "Entrance") ![across2](2020_02_06-18_12-58-iPhone 8-1943.jpg "across2")