# Intel Hard- and Software
### 7 Levels of Parallelism
1. Node level
2. Socket level
   - NUMA
   - 2-, 4-, or 8-socket
3. Hyperthreading: 2 logical threads on one core
   - less cache per thread
   - useful when jobs use different parts of the core; traditionally disabled on HPC systems
   - nowadays not much of a penalty when enabled
4. Mesh
5. GPU-CPU
   - subslice = core (8 on Intel Graphics Gen 11)
   - exec-unit = thread (8)
6. Instruction level: out-of-order execution on different ports/execution units (max 4 IPC [instructions per cycle])
7. Data parallelism
<!--- ![](2020_02_06-13_48-01-iPhone 8-1933.jpg "")
![](2020_02_06-13_48-59-iPhone 8-1934.jpg "")
--->
### Vectorization
![ParStudio](2020_02_06-10_23-07-iPhone 8-1924.jpg "Inside Parallel Studio XE")
- Performance libraries
- Compiler (fully automatic)
- Compiler with vectorization hints (`#pragma`)
- User-mandated vectorization (SIMD directive)
- SIMD intrinsic class (`F32vec4 add`)
- Vector intrinsic (`_mm_add_ps()`)
- Assembler code (`addps`)
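
As a rough illustration of three rungs of this ladder, the sketch below adds two float arrays in three ways: a plain loop left to the compiler's auto-vectorizer, the same loop with an OpenMP `#pragma omp simd` directive, and SSE intrinsics built around `_mm_add_ps()`. The function names `add_auto`, `add_simd`, and `add_intrin` and the macro `EXAMPLE_HAVE_SSE` are made up for this example and do not come from the talk. The pragma only has an effect when the compiler's OpenMP SIMD support is enabled, and the intrinsic path is guarded so the code still builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: SSE is detected through common compiler macros (hypothetical guard name).
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define EXAMPLE_HAVE_SSE 1
#endif

using Vec = std::vector<float>;

// Fully automatic: a plain loop the compiler may vectorize on its own.
Vec add_auto(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// User-mandated: the OpenMP SIMD directive asks the compiler to vectorize.
// Without OpenMP SIMD support enabled, the pragma is simply ignored.
Vec add_simd(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* po = out.data();
    const std::size_t n = a.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) po[i] = pa[i] + pb[i];
    return out;
}

// Vector intrinsics: four floats per _mm_add_ps, scalar loop for the remainder.
Vec add_intrin(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
#ifdef EXAMPLE_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a.data() + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b.data() + i);
        _mm_storeu_ps(out.data() + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) out[i] = a[i] + b[i];  // tail (or everything, without SSE)
    return out;
}
```

All three functions return the same result; they differ only in how much of the vectorization work is left to the compiler.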
### oneAPI: New Foundation for Exascale Computing
- unified memory (CPU/GPU)
- all-to-all connectivity
### Summary
- Code modernization not always easy (analyze & optimize)
- data / task parallelism
![IntelPy](2020_02_06-10_24-57-iPhone 8-1925.jpg "Intel Distribution for Python")
![MKL](2020_02_06-10_28-11-iPhone 8-1927.jpg "Math Kernel Library")
![DAAL](2020_02_06-10_36-47-iPhone 8-1928.jpg "Data Analytics Acceleration Library")
![DAAL Algo](2020_02_06-10_37-46-iPhone 8-1929.jpg "DAAL Algorithms")
![Diagnostic Tool](2020_02_06-10_46-13-iPhone 8-1930.jpg "Diagnostic Toolset for High Performance Compute Clusters")
![Demo](2020_02_06-10_55-10-iPhone 8-1931.jpg "Demo")
![noFP](2020_02_06-11_00-12-iPhone 8-1932.jpg "Removing FP converts")
### Parallel Studio
Transition to **oneAPI** in 2020.
New in the 2020 version:
- VNNI (Vector Neural Network Instructions) for AI inference speedup
- persistent memory (Optane) compatible with RAM
- expanded standard support
  - Fortran 2018
  - C++17 (C++20 in initial stage)
- move to LLVM (slowly), backend switchable
- Extended Coarse Grain Profiling
- HPC cloud support
- New OS support (e.g., Amazon)
#### Compiler (v19.0)
#### Python
Take advantage of Intel's Python distribution:
[add Intel channel for *anaconda*](https://software.intel.com/en-us/articles/using-intel-distribution-for-python-with-anaconda) ([in general](https://software.intel.com/en-us/distribution-for-python))
Speedup achieved by optimizing
- *numba* (utilizing MKL)
- *scikit-learn* (utilizing [DAAL](http://software.intel.com/daal))
##### URLs
- [Installation](https://software.intel.com/en-us/articles/using-intel-distribution-for-python-with-anaconda)
- [Anaconda packages](https://anaconda.org/intel/repo)
### Performance tools
- VTune: HPC tuning, now with even broader coverage (demo in afternoon)
- [performance snapshot](https://intel.com/performance-snapshot) for high level view
- [Advisor](https://software.intel.com/advisor) provides information on
  - threads
  - vectorization
  - GPU (offload advisor, coming; so far only Intel *gen9* or *gen11*)
### [MPI library](https://software.intel.com/intel-mpi-library)
### Intel cluster checker
### Vectorization issues
- Default: the compiler targets an early-2000s CPU, so the vector length is only 2 => compiler flag(s) are needed to use the wider vector units of newer CPUs
- Technical bit: when AVX instructions are used, the clock frequency is reduced to compensate for the extra power consumption
#### Example *N*-body problem
Convert Array of Structures (AoS) -> Structure of Arrays (SoA) for contiguous, better aligned memory access
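
A minimal sketch of the AoS -> SoA conversion, assuming a particle with only position and mass fields. The type and function names (`ParticleAoS`, `ParticlesSoA`, `to_soa`, `center_of_mass_x`) are hypothetical and not taken from the workshop demo. In the SoA layout each field is its own contiguous array, so a loop over one field reads memory with unit stride, which is the access pattern vector loads prefer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures: the fields of ONE particle sit next to each other,
// so a loop over all x values strides over y, z and mass as well.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure of Arrays: each field is stored contiguously for ALL particles.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Convert an AoS container into the SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    soa.mass.reserve(aos.size());
    for (const ParticleAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
        soa.mass.push_back(p.mass);
    }
    assert(soa.x.size() == aos.size());
    return soa;
}

// Example kernel on the SoA layout: x coordinate of the center of mass.
// Both loops inputs are unit-stride arrays, which suits vectorization.
float center_of_mass_x(const ParticlesSoA& p) {
    float weighted = 0.0f;
    float total = 0.0f;
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        weighted += p.mass[i] * p.x[i];
        total += p.mass[i];
    }
    assert(total > 0.0f);  // assumption: at least one particle with positive mass
    return weighted / total;
}
```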
---
## [oneAPI](https://software.intel.com/en-us/oneapi)
open standard for unified programming model across hardware platforms
![oneAPI base](2020_02_06-14_11-18-iPhone 8-1935.jpg "oneAPI Base Toolkit")
**C++** (11) + **SYCL** + **Extensions**
### API libraries
- Math
- Analytics/ML
- DNN
- ...
### oneAPI Toolkits
- Base
  - currently beta
  - direct programming (DPC++)
- HPC
- DL
- Rendering
- OpenVINO
- AI Analytics
![DPC++](2020_02_06-14_16-21-iPhone 8-1936.jpg "DPC++ Compatibility Tool")
![OA.py](2020_02_06-14_47-42-iPhone 8-1937.jpg "Behind run_oa.py")
[Developer Access](http://software.intel.com/devcloud/oneapi)
GPU target is currently **OpenCL-based**; NVIDIA support is unclear
- https://software.intel.com/en-us/oneapi
- https://github.com/intel/llvm
## Impressions on the way home
![pano](2020_02_06-18_11-49-iPhone 8-1939.jpg "Messesee Panorama")
![across](2020_02_06-18_12-18-iPhone 8-1941.jpg "across the lake")
![Entrance](2020_02_06-18_12-49-iPhone 8-1942.jpg "Entrance")
![across2](2020_02_06-18_12-58-iPhone 8-1943.jpg "across2")