Intel Advisor
Introduction
A user-friendly performance analysis tool for Intel platforms.
Excellent Video Resource
We're still on the lookout for an exceptional blog or overview paper to complement our understanding of this topic. Stay tuned for updates!
Outstanding Blog or Overview Paper
The key words are "rethink", "perspective"
Overview
CPU / Memory Roofline Modeling
Guess: judging from the comment, the data comes from L1 traffic.
- The diagram shows the L1, L2, L3, and DRAM bandwidth roofs and the theoretical compute bound.
- Roofline Arithmetic Intensity: $$ AI = \frac{\text{Performance} \times \text{Self Time}}{\text{Self Memory Traffic}} $$
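The arithmetic intensity formula above can be sketched in a few lines. This is only an illustration: the function name and the sample numbers are hypothetical and not taken from an Advisor report. It assumes performance is given in GFLOP/s, Self Time in seconds, and Self Memory traffic in GB, so the result is in FLOP per byte.

```python
def arithmetic_intensity(gflops: float, self_time_s: float, self_memory_gb: float) -> float:
    """Return arithmetic intensity in FLOP/byte.

    (GFLOP/s * s) gives GFLOP of work; dividing by GB of traffic gives FLOP/byte.
    """
    if self_memory_gb <= 0:
        raise ValueError("self_memory_gb must be positive")
    return gflops * self_time_s / self_memory_gb


# Hypothetical loop: 12 GFLOP/s sustained for 2 s while moving 48 GB.
ai = arithmetic_intensity(gflops=12.0, self_time_s=2.0, self_memory_gb=48.0)
print(f"AI = {ai:.2f} FLOP/byte")
```

A low value such as this one means the loop moves a lot of data per floating-point operation, so it is more likely to sit under a bandwidth roof than under the compute roof.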
CARM(L1 + NTS)
Cache Aware Roofline Model (CARM)
NTS: Non-Temporal Store, a store that bypasses the caches and writes directly to DRAM.
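As a rough sketch of how a roofline with several memory levels bounds performance: each level contributes a bandwidth roof, and the attainable performance under a roof is the smaller of the compute peak and arithmetic intensity times that bandwidth. The bandwidth and peak values below are made-up placeholders, not measurements from any machine, and the helper name is hypothetical.

```python
def attainable_gflops(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Classic roofline bound: min(compute peak, AI * bandwidth)."""
    return min(peak_gflops, ai * bandwidth_gbs)


# Hypothetical roofs in GB/s (assumed values for illustration only).
roofs = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}
peak = 100.0  # hypothetical compute peak in GFLOP/s
ai = 0.5      # arithmetic intensity in FLOP/byte

bounds = {level: attainable_gflops(ai, peak, bw) for level, bw in roofs.items()}
for level, bound in bounds.items():
    print(f"{level:>4}: at most {bound:.1f} GFLOP/s")
```

With these placeholder numbers, the same loop is compute-bound if its data is served from L1 or L2, but bandwidth-bound under the L3 and DRAM roofs.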
CPU Metrics: Self / Total Time / Memory
Self Time: Time actively executing a function/loop, excluding time for callees.1
Total Time: Time actively executing a function/loop, including time for callees.
Total Elapsed Time: Wall-clock time, based on Total Time, from the beginning to the end of loop/function execution, including time for callees.
Self Memory (GB): Data transfers between the CPU and the memory subsystem (total traffic, including caches and DRAM) in gigabytes, excluding transfers for callees. (Still confusing.)
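The Self / Total distinction can be illustrated with a small call tree. This is a minimal sketch with hypothetical function names and timings, not Advisor's own data model: a node's total time is its self time plus the total time of every callee.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One function or loop in a call tree (hypothetical structure)."""
    name: str
    self_time: float  # seconds spent in this node, excluding callees
    children: List["Node"] = field(default_factory=list)


def total_time(node: Node) -> float:
    """Total time = self time + total time of all callees."""
    return node.self_time + sum(total_time(child) for child in node.children)


# Hypothetical call tree: main -> {compute -> kernel, io}
kernel = Node("kernel", self_time=5.0)
compute = Node("compute", self_time=3.0, children=[kernel])
io = Node("io", self_time=0.5)
main = Node("main", self_time=1.0, children=[compute, io])

for node in (main, compute, kernel, io):
    print(f"{node.name:>8}: self={node.self_time:.1f}s total={total_time(node):.1f}s")
```

Here `main` has a small self time but the largest total time, because almost all of its time is spent in callees. The same reasoning applies to Self Memory versus total memory traffic.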
- Each node corresponds one-to-one with a function call in the Top-Down stack at the bottom of the diagram.
- A deeper color and a larger node size mean a larger share of Self Time.