
Intel Advisor

Introduction

Intel Advisor is a user-friendly performance analysis tool for Intel platforms.

Excellent Video Resource

We're still on the lookout for an exceptional blog or overview paper to complement our understanding of this topic. Stay tuned for updates!

Outstanding Blog or Overview Paper

The key words are "rethink" and "perspective".

Overview

Overview of Perspectives

CPU / Memory Roofline Modeling

Roofline Summary

Guess: according to the comment, the data comes from L1 traffic.

Roofline under three grouped bandwidths (CARM, L2, L3)
  1. The diagram shows the L1, L2, L3, and DRAM bandwidth roofs and the theoretical compute bound.
  2. Roofline Arithmetic Intensity: $$ AI = \frac{\text{Performance} \times \text{Self Time}}{\text{Self Memory Traffic}} $$
CARM (L1 + NTS)
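As a rough sketch of the arithmetic intensity formula above, the following Python computes AI from performance, self time, and self memory traffic, then applies the standard roofline bound `min(peak, AI × bandwidth)` for each memory level. The peak GFLOPS, the bandwidths, and the loop measurements are made-up placeholders, not values read from Advisor, and the function names are hypothetical.

```python
def arithmetic_intensity(gflops, self_time_s, self_memory_gb):
    """AI in FLOP/byte: (GFLOP/s * s) / GB."""
    return gflops * self_time_s / self_memory_gb


def roofline_bound(ai, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s under a single bandwidth roof."""
    return min(peak_gflops, ai * bandwidth_gbs)


# Hypothetical machine parameters (placeholders, not measured values).
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 20.0}

# Hypothetical loop: 10 GFLOP/s sustained for 2 s while moving 40 GB.
ai = arithmetic_intensity(gflops=10.0, self_time_s=2.0, self_memory_gb=40.0)

# One bound per memory level, mirroring the stacked roofs in the diagram.
bounds = {
    level: roofline_bound(ai, PEAK_GFLOPS, bw)
    for level, bw in BANDWIDTH_GBS.items()
}

print(f"AI = {ai} FLOP/byte")
for level, bound in bounds.items():
    print(f"{level}: at most {bound} GFLOP/s")
```

With these placeholder numbers the loop sits below the compute roof only for L3 and DRAM, which is the kind of reading the grouped-bandwidth roofline is meant to support.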

Cache Aware Roofline Model (CARM)

NTS: Non-Temporal Store, a store that bypasses the caches and writes directly to DRAM.

CPU Metrics : Self / Total Time/Memory

Self Time: Time actively executing a function/loop, excluding time for callees. [1]

Total Time: Time actively executing a function/loop, including time for callees.

Total Elapsed Time: Wall-clock time, based on Total Time, from the beginning to the end of loop/function execution, including time for callees.

Self Memory (GB): Data transfers between the CPU and the memory subsystem (total traffic, including caches and DRAM) in gigabytes, excluding transfers for callees. (This definition is still confusing to me.)
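To make the self/total distinction concrete, here is a minimal sketch that treats a profile as a call tree: "self" values belong to a node's own code, and "total" values add everything from its callees. The `Node` class, the function names, and all numbers are hypothetical illustrations; this is not how Advisor stores or computes its metrics internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    self_time: float        # seconds spent in this function's own code
    self_memory_gb: float   # GB of traffic issued by this function's own code
    callees: list = field(default_factory=list)


def total_time(node):
    """Self time plus the total time of every callee."""
    return node.self_time + sum(total_time(c) for c in node.callees)


def total_memory_gb(node):
    """Self memory traffic plus the total traffic of every callee."""
    return node.self_memory_gb + sum(total_memory_gb(c) for c in node.callees)


# Hypothetical call tree: main -> compute -> kernel, and main -> io.
kernel = Node("kernel", self_time=2.0, self_memory_gb=4.0)
compute = Node("compute", self_time=3.0, self_memory_gb=1.0, callees=[kernel])
io = Node("io", self_time=0.5, self_memory_gb=0.25)
main = Node("main", self_time=1.0, self_memory_gb=0.5, callees=[compute, io])

for node in (main, compute, kernel, io):
    print(
        f"{node.name}: self {node.self_time} s, total {total_time(node)} s, "
        f"self {node.self_memory_gb} GB, total {total_memory_gb(node)} GB"
    )
```

In this toy tree `main` has a small self time but the largest total time, which is why the two columns can rank functions very differently.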

Roofline with nodes of different sizes and colors
  1. Each node corresponds one-to-one with a function call in the Top-Down stack at the bottom of the diagram.
  2. A darker color and a larger size mean the node accounts for a larger share of Self Time.

References
