
谭邵杰的计算机奇妙之旅 (Tan Shaojie's Wonderful Journey Through Computing)

IPCC Preliminary SLIC Optimization 3

Metadata
- August 6, 2021
- Category: Tutorials
- Reading time: 2 minutes


## node6

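The earlier example was so small that the measured times fluctuated too much to analyze. So I wrote a larger example and added timing output to every function, which makes it easier to tell whether a change actually gives a speedup. (Orz, someone else is using node5.)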
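
The post does not show how the per-function timing was added. One common way is a small scoped timer; the sketch below uses only the standard library, and the class name `ScopedTimer` is a hypothetical helper, not something from the original code.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper (not from the original code): prints the wall-clock
// time spent in the enclosing scope when the object is destroyed.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(Clock::now()) {}

    ~ScopedTimer() {
        std::printf("%s %.3f s\n", name_.c_str(), elapsed_seconds());
    }

    // Seconds elapsed since construction.
    double elapsed_seconds() const {
        return std::chrono::duration<double>(Clock::now() - start_).count();
    }

private:
    std::string name_;
    Clock::time_point start_;
};
```

Placing `ScopedTimer t("DoRGBtoLABConversion");` as the first statement of a function prints one line per call when the function returns. A monotonic clock is used so that system clock adjustments do not distort the measurement.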

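| Approach | Description | Total time | Speedup | Notes |
|---|---|---|---|---|
| Baseline | Serial program | 207 s | 1 | |
| simpleomp | Two OpenMP sites | 57 s | | |
| more1omp | maxlab | 48 s | | |
| more2omp | sigma + delete maxxy | 24.8 s | 8.35 | |
| more3omp | DetectLabEdges + EnforceLabelConnectivity (this algorithm cannot be parallelized) | 21.2 s | | |
| icpc | | 13.4 s | | |
| + `-O3` | | 13.2 s | | |
| + `-xHost` | | 13.09 s | | |
| + `-Ofast -xHost` | On top of icpc | 12.97 s | | |
| + `-ipo` | | 12.73 s | 16.26 | |
| `-no-prec-div -static -fp-model fast=2` | | 14.2 s | | Slower, not faster; the remaining options need to be tried on the AMD machine |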
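
The speedup column is the baseline total time divided by the optimized total time on the same machine. The symbols $S$ and $T$ are notation introduced here, not the author's. Checking the two entries that are filled in:

$$
S = \frac{T_{\text{baseline}}}{T_{\text{optimized}}}, \qquad
S_{\text{more2omp}} = \frac{207}{24.8} \approx 8.35, \qquad
S_{\text{ipo}} = \frac{207}{12.73} \approx 16.26
$$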

### Baseline 207s
1. DoRGBtoLABConversion 10.4s
2. PerformSuperpixelSegmentation_VariableSandM 187.3s
   1. core 15.3s
   2. maxlab 1s
   3. sigma 2.3s

### simpleomp 57s
1. DoRGBtoLABConversion 0.89s
2. PerformSuperpixelSegmentation_VariableSandM 46s
   1. core 0.94-1.8s
   2. maxlab 1s
   3. sigma 2.3-2.6s
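
The post does not say which two loops received the OpenMP pragmas, but the drop in DoRGBtoLABConversion from 10.4s to 0.89s suggests the color conversion is one of them. The following is a minimal sketch of that kind of change, assuming pixels packed as 0x00RRGGBB and the standard sRGB to CIELAB formulas; the function names `rgb_to_lab` and `convert_image` are illustrative, not the author's.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Convert one sRGB pixel (0..255 per channel) to CIELAB (D65 white point).
inline void rgb_to_lab(int r8, int g8, int b8, double& L, double& a, double& b) {
    auto linearize = [](double c) {
        return c <= 0.04045 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
    };
    const double r = linearize(r8 / 255.0);
    const double g = linearize(g8 / 255.0);
    const double bl = linearize(b8 / 255.0);

    const double X = r * 0.4124564 + g * 0.3575761 + bl * 0.1804375;
    const double Y = r * 0.2126729 + g * 0.7151522 + bl * 0.0721750;
    const double Z = r * 0.0193339 + g * 0.1191920 + bl * 0.9503041;

    const double eps = 0.008856, kappa = 903.3;
    auto f = [&](double t) {
        return t > eps ? std::cbrt(t) : (kappa * t + 16.0) / 116.0;
    };
    const double fx = f(X / 0.950456), fy = f(Y / 1.0), fz = f(Z / 1.088754);

    L = 116.0 * fy - 16.0;
    a = 500.0 * (fx - fy);
    b = 200.0 * (fy - fz);
}

// Every pixel is independent, so the loop can be split across threads.
// Build with OpenMP enabled (for example -fopenmp or -qopenmp); without it
// the pragma is ignored and the loop runs serially.
void convert_image(const std::vector<unsigned int>& argb,
                   std::vector<double>& lvec,
                   std::vector<double>& avec,
                   std::vector<double>& bvec) {
    const long n = static_cast<long>(argb.size());
    lvec.resize(argb.size());
    avec.resize(argb.size());
    bvec.resize(argb.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const int r = (argb[i] >> 16) & 0xFF;
        const int g = (argb[i] >> 8) & 0xFF;
        const int b = argb[i] & 0xFF;
        rgb_to_lab(r, g, b, lvec[i], avec[i], bvec[i]);
    }
}
```

Because each iteration writes only to its own index, no locking or reduction is needed here.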

### more1omp 48s
1. DoRGBtoLABConversion 0.82s
2. PerformSuperpixelSegmentation_VariableSandM 37s
   1. core 1-2.3s
   2. maxlab 0.04-0.1s
   3. sigma 2.3s
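
The maxlab step drops from about 1s to 0.04-0.1s here. The post does not show the code, so the following is only a sketch of one way to parallelize a per-cluster maximum, assuming that maxlab holds the largest distance seen for each cluster and that distances are non-negative. The function name `cluster_max` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// For every cluster k, find the largest distance among the pixels assigned
// to it. labels[i] is the cluster of pixel i (assumed to lie in [0, numk)),
// dist[i] is that pixel's distance (assumed non-negative).
std::vector<double> cluster_max(const std::vector<int>& labels,
                                const std::vector<double>& dist,
                                int numk) {
    std::vector<double> maxlab(numk, 0.0);
    const long n = static_cast<long>(labels.size());
    #pragma omp parallel
    {
        // Thread-private maxima avoid a data race on the shared array.
        std::vector<double> local(numk, 0.0);
        #pragma omp for nowait
        for (long i = 0; i < n; ++i) {
            if (dist[i] > local[labels[i]]) local[labels[i]] = dist[i];
        }
        // Merge the private results one thread at a time.
        #pragma omp critical
        for (int k = 0; k < numk; ++k) {
            if (local[k] > maxlab[k]) maxlab[k] = local[k];
        }
    }
    return maxlab;
}
```

A plain `parallel for` over the pixels would be incorrect, since two threads could update the same cluster entry at once. The private arrays cost `numk` doubles per thread, which is small compared with the image.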

### more2omp 24.8s
1. DoRGBtoLABConversion 0.85s
2. PerformSuperpixelSegmentation_VariableSandM 13.5s
   1. core 0.8-1.7s
   2. maxlab 0.02-0.1s
   3. sigma 0.1s
3. DetectLabEdges 3.7s
4. EnforceLabelConnectivity 5.2s

### more3omp 21.2s
1. DoRGBtoLABConversion 0.74s
2. PerformSuperpixelSegmentation_VariableSandM 12.3s
   1. core 1.1s
   2. maxlab 0.02-0.1s
   3. sigma 0.1s
3. DetectLabEdges 0.7s
4. EnforceLabelConnectivity 5.8s (needs a different algorithm)
5. PerformSuperpixelSegmentation_VariableSandM 1.6s (time spent declaring the vectors; consider hoisting them out of the function)

### icpc 13.4s
1. DoRGBtoLABConversion 0.44s
2. PerformSuperpixelSegmentation_VariableSandM 8.49s
   1. core 0.5-1.1s
   2. maxlab 0.04s
   3. sigma 0.05s
3. DetectLabEdges 0.54s
4. EnforceLabelConnectivity 2.79s (needs a different algorithm)
5. PerformSuperpixelSegmentation_VariableSandM 1.16s (time spent declaring the vectors; consider hoisting them out of the function)

### + -ipo 12.7s
1. DoRGBtoLABConversion 0.42s
2. PerformSuperpixelSegmentation_VariableSandM 7.98s
   1. core 0.5-1.1s
   2. maxlab 0.04s
   3. sigma 0.05s
3. DetectLabEdges 0.49s
4. EnforceLabelConnectivity 2.69s (needs a different algorithm)
5. PerformSuperpixelSegmentation_VariableSandM 1.13s (time spent declaring the vectors; consider hoisting them out of the function)

## IPCC AMD

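| Approach | Description | Total time | Speedup | Notes |
|---|---|---|---|---|
| Baseline | Serial program | 161.7 s | 1 | |
| more3omp | All of the preceding optimizations that were shown to be effective, omp_num=32 | 14.08 s | | |
| more3omp | All of the preceding optimizations that were shown to be effective, omp_num=64 | 11.4 s | | |
| deletevector | Move the three sz-sized vectors into global variables; this requires knowing sz in advance or declaring a very large buffer | 10.64 s | | Shows that making them global does not hurt access time |
| enforce_Lscan | ipcc opt 4 | 8.49 s | | |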

### Baseline 161.7s
1. DoRGBtoLABConversion 11.5s
2. PerformSuperpixelSegmentation_VariableSandM 143s
   1. core 11.5s
   2. maxlab 0.8s
   3. sigma 1.7s
3. DetectLabEdges 2.74s
4. EnforceLabelConnectivity 3.34s
5. PerformSuperpixelSegmentation_VariableSandM 1.11s

### more3omp (omp_num=32) 14.08s
1. DoRGBtoLABConversion 0.69s
2. PerformSuperpixelSegmentation_VariableSandM 8.08s
   1. core 0.73s
   2. maxlab 0.02s
   3. sigma 0.05s
3. DetectLabEdges 0.37s
4. EnforceLabelConnectivity 3.8s
5. PerformSuperpixelSegmentation_VariableSandM 1.1s

### more3omp (omp_num=64) 11.4s
1. DoRGBtoLABConversion 0.61s
2. PerformSuperpixelSegmentation_VariableSandM 5.86s
   1. core 0.53s
   2. maxlab 0.02s
   3. sigma 0.03s
3. DetectLabEdges 0.33s
4. EnforceLabelConnectivity 3.5s
5. PerformSuperpixelSegmentation_VariableSandM 1.02s

### deletevector 10.64s
1. DoRGBtoLABConversion 0.59s
2. PerformSuperpixelSegmentation_VariableSandM 5.75s
   1. core 0.53s
   2. maxlab 0.02s
   3. sigma 0.03s
3. DetectLabEdges 0.41s
4. EnforceLabelConnectivity 3.84s
5. PerformSuperpixelSegmentation_VariableSandM 0s
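
The last entry falling to 0s reflects the vector declarations no longer happening inside the function. The post moved them into global variables. A closely related sketch, shown below, keeps the buffers in one long-lived object that grows on first use, which sidesteps having to know sz up front. `Workspace`, its members `a`, `b`, `c`, and `workspace()` are hypothetical names, and the element type is assumed to be `double`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workspace holding the three sz-sized buffers.
struct Workspace {
    std::vector<double> a, b, c;

    // Grow the buffers if they are too small. A repeat call with the same
    // or a smaller sz performs no allocation.
    void reserve_for(std::size_t sz) {
        if (a.size() < sz) {
            a.resize(sz);
            b.resize(sz);
            c.resize(sz);
        }
    }
};

// One instance shared by all calls, standing in for the global variables.
Workspace& workspace() {
    static Workspace ws;
    return ws;
}
```

The buffers keep whatever values the previous call left in them, so any code that relied on freshly initialized vectors has to reset the entries it reads.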

### enforce_Lscan 8.49s
1. DoRGBtoLABConversion 0.56s
2. PerformSuperpixelSegmentation_VariableSandM 5.52s
   1. core 0.53s
   2. maxlab 0.02s
   3. sigma 0.03s
3. DetectLabEdges 0.31s
4. EnforceLabelConnectivity 1.19s
5. PerformSuperpixelSegmentation_VariableSandM 0.88s
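
The details of enforce_Lscan are in the next part of the series and are not shown here. Judging by the name, it is a scanline-based variant of EnforceLabelConnectivity. The following is a generic, serial scanline flood fill written under that assumption; it is not the author's implementation, and `scanline_fill` and its parameters are illustrative names.

```cpp
#include <stack>
#include <utility>
#include <vector>

// Scanline flood fill: write new_label into `out` for the 4-connected region
// of equal values in `labels` that contains (sx, sy). `out` must hold -1 for
// every pixel that has not been labelled yet. Returns the pixel count.
int scanline_fill(const std::vector<int>& labels, std::vector<int>& out,
                  int width, int height, int sx, int sy, int new_label) {
    const int target = labels[sy * width + sx];
    auto open = [&](int x, int y) {
        return out[y * width + x] < 0 && labels[y * width + x] == target;
    };
    if (!open(sx, sy)) return 0;

    int count = 0;
    std::stack<std::pair<int, int> > seeds;
    seeds.push(std::make_pair(sx, sy));
    while (!seeds.empty()) {
        const int x = seeds.top().first;
        const int y = seeds.top().second;
        seeds.pop();
        if (!open(x, y)) continue;  // already filled through another seed

        // Extend the span as far as possible along the row.
        int left = x, right = x;
        while (left > 0 && open(left - 1, y)) --left;
        while (right + 1 < width && open(right + 1, y)) ++right;
        for (int i = left; i <= right; ++i) {
            out[y * width + i] = new_label;
            ++count;
        }

        // Push one seed per open run in the rows above and below.
        for (int dy = -1; dy <= 1; dy += 2) {
            const int ny = y + dy;
            if (ny < 0 || ny >= height) continue;
            bool in_run = false;
            for (int i = left; i <= right; ++i) {
                const bool o = open(i, ny);
                if (o && !in_run) seeds.push(std::make_pair(i, ny));
                in_run = o;
            }
        }
    }
    return count;
}
```

Compared with a pixel-by-pixel fill, each stack entry covers a whole horizontal run, so most reads and writes walk consecutive addresses within a row.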

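## Needs further study

- Declare the vectors outside the function
- Replace EnforceLabelConnectivity with a parallel algorithm
- Data structure requirements:
  1. Store the positions of the region that has already been colored, since it may have to be restored later
     1. Unordered storage works, but ordered is best because memory access is then contiguous
     2. Either x,y or a flat index works. x,y makes boundary checks easier
  2. 4-way or 8-way expansion? Since pixels get visited repeatedly, record the incoming direction or path and move only in certain directions. 4-way matches the theory, 8-way does not meet the requirements, and 2-way fails to reach every pixel in some cases
  3. 3-way would work, but is a little fiddly to implement
- A specific combination of flood fill and PBFS
- OpenMP thread pool + locks (two sz-sized arrays store x and y, nlabels stores the new labelling) + time the declarations separately from the flood fill + move these sz-sized declarations outside
- OpenMP thread pool + queue (the last stage can probably be processed in parallel; or does it have to pop one element at a time?) + is a lock needed? (this depends on whether the queue implementation relies on a counter)
- OpenMP for + double queues, ×4 or ×2? + is a lock needed?
- Scanline implementation + spawn threads for the up and down directions, run left and right inside each thread
  1. Contiguity of memory access across threads
- How are the queue and stack implemented, and how fast are they (push, pop, and size)?
- Does a stack have size?
- Add MPI on the AMD machine for hybrid programming, running on 2 nodes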
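
The data structure notes above (two sz-sized arrays for x and y, nlabels for the new labelling, 4-way expansion, restoring a region after it has been colored) can be sketched as a serial flood fill. This is a minimal sketch under those assumptions, not the author's code: the names `enforce_connectivity`, `xvec`, `yvec`, and `min_size` are illustrative, and the policy of merging small regions into a neighbouring label is an assumption about what "restore" means.

```cpp
#include <vector>

// Relabel 4-connected regions of equal value in `labels`. Regions with fewer
// than min_size pixels are merged into a neighbouring, already labelled
// region. Returns the number of labels in use.
int enforce_connectivity(const std::vector<int>& labels,
                         std::vector<int>& nlabels,
                         int width, int height, int min_size) {
    const int dx4[4] = {-1, 0, 1, 0};
    const int dy4[4] = {0, -1, 0, 1};
    const int sz = width * height;
    nlabels.assign(sz, -1);
    std::vector<int> xvec(sz), yvec(sz);  // positions of the current region
    int label = 0;
    int adjlabel = 0;  // most recently seen neighbouring label
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int start = y * width + x;
            if (nlabels[start] >= 0) continue;
            nlabels[start] = label;
            xvec[0] = x;
            yvec[0] = y;
            // Remember a neighbouring label in case this region is too small.
            for (int n = 0; n < 4; ++n) {
                const int nx = x + dx4[n], ny = y + dy4[n];
                if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                if (nlabels[ny * width + nx] >= 0) adjlabel = nlabels[ny * width + nx];
            }
            // xvec/yvec act as a queue: c is the head, count is the size.
            int count = 1;
            for (int c = 0; c < count; ++c) {
                for (int n = 0; n < 4; ++n) {
                    const int nx = xvec[c] + dx4[n], ny = yvec[c] + dy4[n];
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    const int ni = ny * width + nx;
                    if (nlabels[ni] < 0 && labels[ni] == labels[start]) {
                        xvec[count] = nx;
                        yvec[count] = ny;
                        nlabels[ni] = label;
                        ++count;
                    }
                }
            }
            // Too small: undo the colouring using the stored positions.
            if (count < min_size) {
                for (int c = 0; c < count; ++c)
                    nlabels[yvec[c] * width + xvec[c]] = adjlabel;
                --label;
            }
            ++label;
        }
    }
    return label;
}
```

Using two plain arrays with a head index and a count avoids a separate container, and the count doubles as the region size. For comparison, `std::stack` and `std::queue` both provide `size()`, `push()`, and `pop()`.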

## Problems encountered

None so far.

## Motivation, summary, reflections, and rants
