LLVM-MCA: Install & Run Tests
GitHub¶
https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-mca
Quick Start¶
Installation¶
Download the prebuilt binaries, upload them to the server, and extract them.
Installation problems¶
- cannot find libtinfo.so.5
- sudo apt install libncurses5
- ln -s /usr/lib/libncursesw.so.6 /usr/lib/libtinfo.so.5, or similarly ln -s /usr/lib/libncurses.so.5 /usr/lib/libtinfo.so.5
- A copy was found under /snap/core, but what is this directory? It belongs to Snap, Ubuntu's package manager, which is no longer used on this machine.
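Before creating a symlink, it can help to see which copies of the library actually exist on the system. The following is a minimal sketch; the helper name `find_lib` is hypothetical, and the directories in the usage comment are only examples.

```shell
#!/bin/sh
# find_lib NAME DIR...: print every file or symlink called NAME below the given directories.
# Hypothetical helper for illustration only.
find_lib() {
    name=$1
    shift
    # Errors from unreadable or missing directories are discarded.
    find "$@" \( -type f -o -type l \) -name "$name" 2>/dev/null
}

# Usage (example directories, adjust as needed):
#   find_lib libtinfo.so.5 /lib /usr/lib /snap
```

Any path it prints is a candidate target for the `ln -s` command above.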
Building from source¶
node5¶
Since I will be writing code against LLVM later, building from source is the better choice.
cd llvm-project
mkdir build
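# llvm-mca lives in llvm/tools and is built with LLVM itself, so only clang is listed in LLVM_ENABLE_PROJECTS
cmake -S llvm -B build -G "Unix Makefiles" -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_INSTALL_PREFIX="$HOME/Install/llvm" -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=On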
cd build
make -j32
make install
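When passing an install prefix to cmake, note that the shell does not expand `~` inside double quotes, whereas `$HOME` is expanded, so the two forms hand different strings to cmake. A small sketch of the difference; the variable names are illustrative only, and nothing is actually configured or built.

```shell
#!/bin/sh
# Inside double quotes the shell leaves "~" untouched.
quoted_prefix="~/Install/llvm"
# $HOME is expanded inside double quotes, giving an absolute path.
home_prefix="$HOME/Install/llvm"

echo "quoted: $quoted_prefix"
echo "home:   $home_prefix"

# The option string that would be passed to cmake (only printed here).
echo "-DCMAKE_INSTALL_PREFIX=$home_prefix"
```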
kunpeng¶
cmake -S llvm -B build -G "Unix Makefiles" -DLLVM_ENABLE_PROJECTS=all -DCMAKE_INSTALL_PREFIX="$HOME/Install/llvm" -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=On
# With -DLLVM_ENABLE_PROJECTS=all the build fails: Clang-only options such as -mllvm are passed to g++
g++: error: unrecognized command line option ‘-mllvm’
g++: error: unrecognized command line option ‘--tail-merge-threshold=0’
g++: error: unrecognized command line option ‘-combiner-global-alias-analysis’
# Fix: change the cmake options so that only clang and the AArch64 target are built (llvm-mca is built with LLVM itself)
cmake -S llvm -B build -G "Unix Makefiles" -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_INSTALL_PREFIX="$HOME/Install/llvm" -DLLVM_TARGETS_TO_BUILD=AArch64 -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=On
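The value given to `LLVM_TARGETS_TO_BUILD` has to match the machine's architecture. A minimal sketch that maps `uname -m` output to an LLVM target name; the helper `llvm_target_for_arch` is hypothetical and deliberately covers only two architectures.

```shell
#!/bin/sh
# llvm_target_for_arch ARCH: print the LLVM target name for a `uname -m` style
# architecture string. Returns non-zero when no mapping is known.
llvm_target_for_arch() {
    case "$1" in
        aarch64|arm64) echo "AArch64" ;;
        x86_64|amd64)  echo "X86" ;;
        *)             return 1 ;;
    esac
}

# Print the cmake option for the current machine (nothing is configured here).
if target=$(llvm_target_for_arch "$(uname -m)"); then
    echo "-DLLVM_TARGETS_TO_BUILD=$target"
else
    echo "no mapping for $(uname -m); choose a target manually" >&2
fi
```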
Usage¶
Because this machine is not x86, run llc --version to check the default target; it reports aarch64-unknown-linux-gnu.
clang /home/shaojiemike/Download/llvm-project-main/lldb/test/API/lang/c/forward/foo.c -O2 -target aarch64-unknown-linux-gnu -S -o -|llvm-mca -timeline -show-encoding -all-stats -all-views
The Resources section of the output lists TSV110Unit ports, so the default CPU is tsv110.
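The resource names can be pulled out of a saved report to see which scheduling model was used. A minimal sketch, assuming the report is available on standard input; the helper `mca_resource_units` is hypothetical, and the inline text is an excerpt of the Resources section shown in the sample output.

```shell
#!/bin/sh
# mca_resource_units: read an llvm-mca report on stdin and print the distinct
# resource names from its "Resources:" section.
mca_resource_units() {
    # Resource lines look like "[0.0] - TSV110UnitAB"; the name is the third field.
    awk '/^\[[0-9.]+\] +- / { print $3 }' | sort -u
}

mca_resource_units <<'EOF'
Resources:
[0.0] - TSV110UnitAB
[0.1] - TSV110UnitAB
[1] - TSV110UnitALU
[5] - TSV110UnitMDU
EOF
```

Each printed name carries the TSV110 prefix, which is how the default CPU can be recognised.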
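Glossary¶
ALU/BRU¶
The arithmetic logic unit (ALU) executes integer arithmetic instructions, and the branch unit (BRU) executes branch instructions. The BRU can be merged with the ALU, reusing the ALU logic to compute branch conditions and target addresses, or it can be attached to the pipeline as a separate functional unit.
MDU¶
The multiply-divide unit (MDU) executes multiplication and division instructions.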
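Further study¶
- How llvm-mca implements micro-ops, and how it turns assembly into micro-ops
- Implementing a memory view among the views
- Accounting for effects such as cache hits: https://github.com/andreas-abel/uiCA uops
- Kunpeng architecture: https://bbs.huaweicloud.com/community/usersnew/id_1513665626477516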
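Problems encountered¶
llvm-mca -mcpu=help
- Surprisingly, this command hangs, and I do not know why.
- So has Huawei already written a model called tsv110 that implements two functions?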
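Regarding the `llvm-mca -mcpu=help` hang, one possible explanation (an assumption, not a verified diagnosis): llvm-mca reads assembly from standard input when no input file is given, so the command may simply be waiting for input. Below is a sketch of a hypothetical wrapper, `run_no_stdin`, that attaches standard input to /dev/null and adds a time limit with the coreutils `timeout` command.

```shell
#!/bin/sh
# run_no_stdin SECONDS CMD [ARGS...]: run CMD with stdin attached to /dev/null
# and stop it if it is still running after SECONDS.
run_no_stdin() {
    limit=$1
    shift
    timeout "$limit" "$@" < /dev/null
}

# Demonstration with cat, which would otherwise wait for terminal input.
run_no_stdin 5 cat
echo "cat exit status: $?"

# Possible use with llvm-mca (untested assumption):
#   run_no_stdin 10 llvm-mca -mcpu=help
```

If the command returns promptly with this wrapper, the hang was most likely llvm-mca waiting on standard input.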
Sample output¶
Iterations: 100
Instructions: 200
Total Cycles: 70
Total uOps: 200
Dispatch Width: 4
uOps Per Cycle: 2.86
IPC: 2.86
Block RThroughput: 0.5
No resource or data dependency bottlenecks discovered.
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[7]: Encoding Size
[1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:       Instructions:
 1      1     0.33                         4     20 00 80 52      mov w0, #1
 1      1     0.50                  U      4     c0 03 5f d6      ret
Dynamic Dispatch Stall Cycles:
RAT - Register unavailable: 0
RCU - Retire tokens unavailable: 0
SCHEDQ - Scheduler full: 0
LQ - Load queue full: 0
SQ - Store queue full: 0
GROUP - Static restrictions on the dispatch group: 0
Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
[# dispatched], [# cycles]
0, 20 (28.6%)
4, 50 (71.4%)
Schedulers - number of cycles where we saw N micro opcodes issued:
[# issued], [# cycles]
0, 3 (4.3%)
2, 1 (1.4%)
3, 66 (94.3%)
Scheduler's queue usage:
No scheduler resources used.
Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
0, 3 (4.3%)
2, 1 (1.4%)
3, 66 (94.3%)
Total ROB Entries: 128
Max Used ROB Entries: 59 ( 46.1% )
Average Used ROB Entries per cy: 32 ( 25.0% )
Register File statistics:
Total number of mappings created: 100
Max number of mappings used: 29
Resources:
[0.0] - TSV110UnitAB
[0.1] - TSV110UnitAB
[1] - TSV110UnitALU
[2] - TSV110UnitFSU1
[3] - TSV110UnitFSU2
[4.0] - TSV110UnitLdSt
[4.1] - TSV110UnitLdSt
[5] - TSV110UnitMDU
Resource pressure per iteration:
[0.0]  [0.1]  [1]    [2]    [3]    [4.0]  [4.1]  [5]
0.66   0.67   0.67    -      -      -      -      -
Resource pressure by instruction:
[0.0]  [0.1]  [1]    [2]    [3]    [4.0]  [4.1]  [5]    Instructions:
0.33    -     0.67    -      -      -      -      -     mov w0, #1
0.33   0.67    -      -      -      -      -      -     ret
Timeline view:
Index     0123456789
[0,0]     DeER .   .   mov w0, #1
[0,1]     DeER .   .   ret
[1,0]     DeER .   .   mov w0, #1
[1,1]     D=eER.   .   ret
[2,0]     .DeER.   .   mov w0, #1
[2,1]     .DeER.   .   ret
[3,0]     .D=eER   .   mov w0, #1
[3,1]     .D=eER   .   ret
[4,0]     . DeER   .   mov w0, #1
[4,1]     . D=eER  .   ret
[5,0]     . D=eER  .   mov w0, #1
[5,1]     . D=eER  .   ret
[6,0]     .  D=eER .   mov w0, #1
[6,1]     .  D=eER .   ret
[7,0]     .  D=eER .   mov w0, #1
[7,1]     .  D==eER.   ret
[8,0]     .   D=eER.   mov w0, #1
[8,1]     .   D=eER.   ret
[9,0]     .   D==eER   mov w0, #1
[9,1]     .   D==eER   ret
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage
      [0]    [1]    [2]    [3]
0.     10    1.7    1.7    0.0       mov w0, #1
1.     10    2.0    2.0    0.0       ret
       10    1.9    1.9    0.0       <total>
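The summary figures at the top of this report can be cross-checked with simple arithmetic: IPC is Instructions divided by Total Cycles, and uOps Per Cycle is Total uOps divided by Total Cycles. A minimal sketch using awk; the helper `mca_ratio` is hypothetical, and the inline text is an excerpt of the summary above.

```shell
#!/bin/sh
# mca_ratio NUM_KEY DEN_KEY: read an llvm-mca summary on stdin and print the
# ratio of the two named fields, rounded to two decimals.
mca_ratio() {
    awk -F': *' -v num="$1" -v den="$2" '
        $1 == num { n = $2 }
        $1 == den { d = $2 }
        END { if (d > 0) printf "%.2f\n", n / d; else exit 1 }
    '
}

summary='Iterations: 100
Instructions: 200
Total Cycles: 70
Total uOps: 200'

# IPC = Instructions / Total Cycles
printf '%s\n' "$summary" | mca_ratio "Instructions" "Total Cycles"
# uOps Per Cycle = Total uOps / Total Cycles
printf '%s\n' "$summary" | mca_ratio "Total uOps" "Total Cycles"
```

Both ratios are 200 / 70, which rounds to the 2.86 shown in the report. The Dispatch Logic table agrees as well: 50 cycles dispatching 4 uOps each account for all 200 uOps.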