
2023

Diary 230827: An ACG Trip in Shanghai

Motivation

My Huawei internship is about to end. As an anime fan, how could I not take a proper tour of the "Akihabara of China"?

Targets

  1. 百联ZX and the 外文书店
  2. 百米香榭
  3. 迪美地下城 (香港名街 is closed for renovation)
  4. 第一百货 and 新世界
  5. 华漫潮玩 on 4F of 大丸百货
  6. The SPY×FAMILY pop-up at 静安大悦城
  7. 徐家汇: the JUMP store and Totoro store on 1F, the GSC store on 2F
  8. miHoYo headquarters

Loving Shanghai

Shanghai is an extraordinarily inclusive place. The love for anime I used to keep to myself turns out to be shared by so many people. No need to hide it or stay in disguise all the time; being able to relax and be myself for a while feels wonderful.

On the Affection for 2D Characters

Defining love

Love, or passion, is the most intense emotion. Its object is usually a person one can interact with, though perhaps objects qualify too. At minimum, you must be able to keep building good memories and small moments together to sustain the feeling.

For example, I have always wanted to love my work, which requires creating small staged successes and victories to keep myself going.

Distinguishing affection from lust

  1. First, being with the other person feels comfortable; you enjoy the companionship and want it to last.
  2. It is not about having a head full of lewd thoughts.
  3. Good looks are certainly a plus, but temperament, ideas, and the spiritual side matter more.

3D versus 2D characters

3D figures include idol singers and actors. They need performances and concerts to build shared memories with their fans; actors likewise need film and TV works.

2D characters mostly come from anime, because games generally do not aim at characterization (console games, say), with galgames and anime-style mobile games as the obvious exceptions.

Japanese anime, with subject matter and characterization far more delicate than Western or Chinese animation (worthy of the galgame superpower: the psychological portraits in BanG Dream! It's MyGO!!!!! are simply superb), has created many lovable characters.

Comparative advantages

  1. In terms of the ceiling on expressiveness, animation far exceeds games (why else would games need animated CGs?) and live-action film and TV.
  2. Fan creation around 2D characters has a low barrier to entry (both in reproduction difficulty and in legal constraints, since 3D figures are usually tied to real people) and enjoys high tolerance in public opinion (traditional 2D communities are far cleaner than celebrity fandoms), huge advantages over 3D.
  3. Besides, cosplay brings an approachable, almost tangible realness. Fan works can create memories of and bonds with a character far beyond the original work.
  4. Another point may be the creative joy enabled by that low barrier, which I mentioned before when analyzing the joy of music. Fan works mainly include music, MMD, and iwara animations.
  5. Cosplay can tint an otherwise ordinary life with the extraordinary experiences of a character.
  6. Finally, permanence. As I analyzed before, people like to pursue constancy amid a changing life, or the reverse. A 3D figure or actor ages, but a 2D character can come back to life in a new movie.
  7. And one more: a 2D character will never betray you.

Comparative drawbacks

  1. Affection for a 2D character is one-way over the long run; short of projecting onto the protagonist, you can hardly receive the character's affection back (galgames partially make up for this). Interaction is blocked by the dimensional barrier.
  2. Slightly limited growth potential: once a work has ended, the character's image is basically fixed, apart from a small amount of fan creation, unless you feed the character into an AI model to extend its life.
  3. Loss of surprise: real people are multifaceted and uncontrollable, while a 2D character's plot twists exist only within the show.

Preliminary conclusion

Girlfriend > loving 2D characters (ongoing series > finished ones) >> celebrity worship

Photo dump

23.08.27 to do

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.

UnimportantView: Anime Recommendation

Origin and goals

  1. I watch anime without having formed my own taste, so watching the wrong shows can actually backfire.
  2. Which shows can become a spiritual pillar, rather than leaving me more drained after watching? (After Happy Sugar Life I went straight into depression.)

Notes

Unlike classifications such as romance anime or tearjerkers, what I care about is the theme a work wants to express and what the author wants to show the audience, whether that is some truth or simply a particular setting or imaginary world.

Bonds: love for people, in romance, family, and friendship.

A journey in search of what love is

Title | Core theme | Comment | Favorite characters | Music
Happy Sugar Life | "Protecting you is my word of love" | In a world where love is hard to understand, two lost girls meet, redeem each other, and come to know love in a honey-sweet sugar life | Satou (sugar) and Shio (salt) | "Canary", the ED, the sad violin piece

Oshi no Ko (episode 1)

Violet Evergarden

Bonds broken and rebuilt

BanG Dream! It's MyGO!!!!! — the breaking and reunion of first bonds (friendship, yuri, heavy girl drama)

The disease named love

Future Diary (Mirai Nikki)

Domestic Girlfriend, Scum's Wish

Love in little moments

Yuri coming-of-age: Bloom Into You

The Dangers in My Heart

Time loops and fate

"Even across time and space, nothing can stop me from loving you"

Steins;Gate

Re:Zero

"The simple, happy future that can never be reached"

Higurashi: When They Cry

Puella Magi Madoka Magica

Epics

Complex, tense, grand works, mostly not reducible to a single core theme, and mostly ensemble dramas.

Fantasy-world epics

Fate/Zero

Fullmetal Alchemist

EVA

to do

Sword Art Online

Your Lie in April

CLANNAD

Toradora!

Attack on Titan

A Certain Scientific Railgun

The Melancholy of Haruhi Suzumiya

Code Geass

K-On!

Backlog

  1. The Monogatari series

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.

TLB: real pagewalk overhead

Introduction

For an introduction to the TLB itself, see:

Page tables

Theory

Roughly speaking, the more random an application's memory accesses and the larger its data set, the higher the page-walk (pgw) overhead.

ISCA 2013 shows the pgw overhead in big-memory servers:

Basu et al., "Efficient Virtual Memory for Big Memory Servers"

or ISCA 2020: Guvenilir and Patt, "Tailored Page Sizes"

Machine configuration

# shaojiemike @ snode6 in ~/github/hugoMinos on git:main x [11:17:05]
$ cpuid -1 -l 2
CPU:
      0x63: data TLB: 2M/4M pages, 4-way, 32 entries
            data TLB: 1G pages, 4-way, 4 entries
      0x03: data TLB: 4K pages, 4-way, 64 entries
      0x76: instruction TLB: 2M/4M pages, fully, 8 entries
      0xff: cache data is in CPUID leaf 4
      0xb5: instruction TLB: 4K, 8-way, 64 entries
      0xf0: 64 byte prefetching
      0xc3: L2 TLB: 4K/2M pages, 6-way, 1536 entries
# if above command turns out empty
cpuid -1 |grep TLB -A 10 -B 5
# will show sth like

L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
    instruction # entries     = 0x40 (64)
    instruction associativity = 0xff (255)
    data # entries            = 0x40 (64)
    data associativity        = 0xff (255)
L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
    instruction # entries     = 0x40 (64)
    instruction associativity = 0xff (255)
    data # entries            = 0x40 (64)
    data associativity        = 0xff (255)
L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
    instruction # entries     = 0x200 (512)
    instruction associativity = 2-way (2)
    data # entries            = 0x800 (2048)
    data associativity        = 4-way (4)
L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
    instruction # entries     = 0x200 (512)
    instruction associativity = 4-way (4)
    data # entries            = 0x800 (2048)
    data associativity        = 8-way (6)

OS config

By default there are no hugepages (here 2 MB, per Hugepagesize below) to use.

$ cat /proc/meminfo | grep huge -i
AnonHugePages:      8192 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB

An explanation is here.

Setting the page size

Other ways: changing the source code

  1. way 1: Linux transparent huge page (THP) support lets the kernel automatically promote regular memory pages into huge pages, see cat /sys/kernel/mm/transparent_hugepage/enabled, but getting this to apply takes some care.
  2. way 2: huge pages are allocated from a reserved pool, which needs a sysctl change, for example echo 20 > /proc/sys/vm/nr_hugepages. And you need to write special C++ code to use the huge pages (see the sketch after the mount command below).
# using mmap system call to request huge page
mount -t hugetlbfs \
    -o uid=<value>,gid=<value>,mode=<value>,pagesize=<value>,size=<value>,\
    min_size=<value>,nr_inodes=<value> none /mnt/huge
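
A minimal sketch of that "special C++ code", assuming the pool above has been reserved (an illustration, not the only way): an anonymous mmap with MAP_HUGETLB requests hugepage-backed memory directly, while a file-backed mapping would instead go through the hugetlbfs mount shown above.

#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main() {
    const size_t len = 2 * 1024 * 1024;   // one 2 MB huge page
    // MAP_HUGETLB draws from the pool reserved via /proc/sys/vm/nr_hugepages
    void *p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    memset(p, 0, len);                    // touch the page so it is really backed
    munmap(p, len);
    return 0;
}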

Without recompiling

There is, however, a blog post that uses the unmaintained tool hugeadm together with the iodlr library to do this.

sudo apt install libhugetlbfs-bin
sudo hugeadm --create-global-mounts
sudo hugeadm --pool-pages-min 2M:64

Afterwards meminfo changes:

$ cat /proc/meminfo | grep huge -i
AnonHugePages:      8192 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:          131072 kB

using iodlr library

git clone 

Application measurements

Measurement tools from code

# shaojiemike @ snode6 in ~/github/PIA_huawei on git:main x [17:40:50]
$ ./investigation/pagewalk/tlbstat -c '/staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/sssp.inj -f /staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/benchmark/kron-20.wsg -n1'
command is /staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/sssp.inj -f /staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/benchmark/kron-20.wsg -n1
K_CYCLES   K_INSTR      IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC  K_ITLBCYC  DTLB% ITLB%
324088     207256      0.64 733758     3276       18284      130         5.64  0.04
21169730   11658340    0.55 11802978   757866     316625     24243       1.50  0.11

Average per-walk cost (from startup to steady state), computed from the rows above as K_DTLBCYC/DTLB_WALKS and K_ITLBCYC/ITLB_WALKS: a DTLB walk takes roughly 25-27 cycles, an ITLB walk roughly 40 dropping to 32 cycles.

Time breakdown across cases:

  • Reading the input data is a small share of the overhead, around 2.5%.
  • For graph applications like pagerank, it spikes to 22% during parallel compute.
  • bfs peaks at about 5%; its accesses are not that random.
  • But gemv, before exceeding memory at sizes 65000/100000, stays at 0.24% even when fully compute-bound.
  • Reason, the access pattern: graph applications access memory randomly and irregularly. Unlike matrix-vector multiplication (gemv), which accesses memory contiguously, they cannot exploit spatial locality; contiguous accesses reduce TLB misses through prefetching and cache-block granularity.
  • github - GUPS can achieve 90%
  • DAMOV - ligra - pagerank can achieve 90% on the 20M input case

gemm

  • Normal gemm can reach 100% in some situations.
  • When the matrices are too big to fit in cache, matrix2's accesses jump across rows, so it always misses in cache.
  • The O3 flag seems to bring no time reduction, because there is no SIMD assembly in the code.
  • memory access time = pgw + TLB access time + time to load data into cache

gemm

gemm's core loop is:

for(int i=0; i<N; i++){
   // ignore the overflow; it does not influence the running time.
   for(int j=0; j<N; j++){
      for(int l=0; l<N; l++){
            // gemm
            // ans[i * N + j] += matrix1[i * N + l] * matrix2[l * N + j];

            // sequential gemm variant: both matrices are read row-wise
            ans[i * N + j] += matrix1[i * N + l] * matrix2[j * N + l];
      }
   }
}

The real time breakdown is as follows. to do

  1. First, use perf to get the detailed timing.

bigJump

Hand-written code to test whether the TLB entries run out:

$ ./tlbstat -c '../../test/manual/bigJump.exe 1 10 100'
command is ../../test/manual/bigJump.exe 1 10 100
K_CYCLES   K_INSTR      IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC  K_ITLBCYC  DTLB% ITLB%
2002404    773981      0.39 104304528  29137      2608079    684        130.25  0.03

$ perf stat -e mem_uops_retired.all_loads -e mem_uops_retired.all_stores -e mem_uops_retired.stlb_miss_loads -e mem_uops_retired.stlb_miss_stores ./bigJump.exe 1 10 500
Number read from command line: 1 10 (N,J should not big, [0,5] is best.)
result 0
 Performance counter stats for './bigJump.exe 1 10 500':

          10736645      mem_uops_retired.all_loads
         532100339      mem_uops_retired.all_stores
             57715      mem_uops_retired.stlb_miss_loads
         471629056      mem_uops_retired.stlb_miss_stores

In this case, the store TLB miss rate reaches 47/53 ≈ 88.6% (stlb_miss_stores / all_stores).

Big bucket hash table

using big hash table

other apps

Any algorithm that does random accesses into a large memory region will likely suffer from TLB misses. Examples are plenty: binary search in a big array, large hash tables, histogram-like algorithms, etc.
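
As a sketch of the binary-search case (array size and query count here are made up for illustration): each probe of a large sorted array lands on a distant page, so with 4 KB pages most probes can also miss in the TLB.

#include <vector>
#include <cstdlib>
#include <cstdio>

int main() {
    const size_t n = 1ul << 28;                // 2^28 ints = 1 GiB of data
    std::vector<int> a(n);
    for (size_t i = 0; i < n; ++i) a[i] = (int)i;
    long hits = 0;
    for (int q = 0; q < 1000000; ++q) {
        int key = rand() % (int)n;
        size_t lo = 0, hi = n;                 // search in [lo, hi)
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;   // each probe touches a far-away page
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        hits += (a[lo] == key);
    }
    printf("%ld\n", hits);                     // run under ./tlbstat -c to see DTLB%
}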

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.

AI Compiler

Baidu

At an autumn-recruitment interview I met Gao Tiezhu, who asked some related questions (probably basics for people with an AI background):

  1. Isn't the nvcc compiler good enough? Why develop compilers like TVM?
  2. Answer: first, nvcc is a traditional compiler along the lines of gcc or MSVC (Microsoft Visual C++); it compiles CUDA C/C++ code.
  3. TVM, by contrast, is a tensor compiler: it takes Python-level code, decomposes the network design into operators, and then executes them using cuDNN or efficient machine code for the specific hardware.

NIO

Digital signal processor (DSP)

HLO can be understood, roughly, as the compiler IR.

TVM introduction

https://tvm.apache.org

  1. The problem TVM solves:
  2. In 2017: deploy deep learning (TF, PyTorch) everywhere (any hardware).
  3. Before TVM:
    1. manual tuning: loop tiling for locality
    2. operator fusion: high performance, but inefficient to deploy
  4. Bring compiler-optimization thinking into deep learning.
  5. Define the mapping from operator description to deployment space. The core is making the schedule space explicit, with compute/schedule separation.
  6. TVM's development since then:
  7. High-level graph representations: NNVM, Relay, Relax
  8. Low-level optimization: manual -> AutoTVM (searching for optimal schedule parameters with an AI-based cost model) -> Ansor (no more hand-written AutoTVM templates; code is generated from template rules)
  9. TVM spin-off work:
  10. HeteroCL: TVM + FPGA

  1. output fusion
  2. reducing global-memory copies

Replace the intermediate operator library with a compiler?

Tensor support is still awkward for now.

AI-driven auto-tuning of transformations

Auto-tuning. Drawbacks:

  1. templates must be written by hand
  2. hand-written templates shrink the solution space

Randomly apply optimization strategies (parallelization, loop unrolling, vectorization) at each loop level.

Ansor was introduced; its results are very good.

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Graph Algorithms: Pagerank

Pagerank

  1. Networks and social networks can be modeled as weighted graphs.
  2. PageRank is how to rank their importance.

How should a graph be designed, given that PageRank's execution jumps through the data array so randomly?
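
A minimal sketch of PageRank power iteration on a toy 4-node graph (the graph and constants are made up for illustration); the scattered updates to next[v] are exactly the random jumps through the data array that the question refers to.

#include <vector>
#include <algorithm>
#include <cstdio>

int main() {
    const int n = 4;
    std::vector<std::vector<int>> out = {{1,2},{2},{0},{0,2}};  // out-edges
    std::vector<double> pr(n, 1.0/n), next(n);
    const double d = 0.85;                      // damping factor
    for (int iter = 0; iter < 20; ++iter) {
        std::fill(next.begin(), next.end(), (1.0 - d)/n);
        for (int u = 0; u < n; ++u)
            for (int v : out[u])                // scatter: random writes into next[]
                next[v] += d * pr[u] / out[u].size();
        pr.swap(next);
    }
    for (int u = 0; u < n; ++u) printf("node %d: %.4f\n", u, pr[u]);
}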

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.

https://zhuanlan.zhihu.com/p/137561088

Python: DataStructure

check if it is empty?

strings, lists, tuples

# Correct:
if not seq:
if seq:

# Wrong:
if len(seq):
if not len(seq):

debug

import pprint

try:
    ...  # sth
except Exception as e:
    pprint.pprint(data)  # dump the structure you care about
    raise
finally:
    un_set()

for

step

Parameter tuning needs to test values at an interval:

for i in range(1, 101, 3):
    print(i)

Modifying values while iterating

  • Use enumerate with a for loop to traverse the list and modify its elements.
  • enumerate returns an iterator of tuples, each holding the index and the value of the current element; in the loop we modify the list through the index i.
# for a 2-D list appDataDict
baseline = appDataDict[0][0] # CPU Total
for i, line in enumerate(appDataDict):
    for j, entry in enumerate(line):
        appDataDict[i][j] = round(entry/baseline, 7)

itertools

itertools — functions creating iterators for efficient looping

from itertools import permutations
for a, b, c in permutations((1, 2, 3)): print(a, b, c)

Strings

%c  character and its ASCII code
%s  string
%d  signed integer
%u  unsigned integer
%o  unsigned octal
%x  unsigned hexadecimal
%X  unsigned hexadecimal (uppercase)
%f  floating point, with configurable precision after the decimal point
%e  floating point in scientific notation
%E  same as %e, scientific notation
%g  shorthand for %f and %e
%G  shorthand for %F and %E
%p  address of the variable in hexadecimal
print("My name is %s and weight is %d kg!" % ('Zara', 21))

string <-> list

' '.join(pass_list) and pass_list.split(" ")

Alignment: "\n".join(["%-10s" % item for item in List_A])

Checking a string prefix

text = "Hello, world!"

if text.startswith("Hello"):
    print("The string starts with 'Hello'")
else:
    print("The string does not start with 'Hello'")

format

Since Python 2.6, {} with : replaces the old %.

>>>"{} {}".format("hello", "world")    # 不设置指定位置,按默认顺序
'hello world'

>>> "{1} {0} {1}".format("hello", "world")  # 设置指定位置
'world hello world'

# 字符串补齐100位,<表示左对齐
variable = "Hello"
padded_variable = "{:<100}".format(variable)

Number formatting

print("{:.2f}".format(3.1415926)) # keep two decimal places

{:>10d} right-aligned (default, width 10)
{:^10d} centered (width 10)

Decimal places

x = round(x,3) # keep three decimal places

Containers: list

https://www.runoob.com/python/python-lists.html

Initialization and access

list = ['physics', 'chemistry', 1997, 2000]
list = []          ## empty list
print(list[0])

Slicing

Format: [start_index:end_index:step]

The element at end_index is not included.

2-D arrays

list_three = [[0 for i in range(3)] for j in range(3)]

# numpy creates contiguous arrays, enabling automatic vectorization and thread parallelism
import numpy as np
# create a 3x4 array with all values 0
x3 = np.zeros((3, 4), dtype=int)
# create a 3x4 array, filling every element with 2
x5 = np.full((3, 4), 2, dtype=int)

Size

len(day)

Sorting

# sort key: the third field (index 2)
def takeSecond(elem):
    return elem[2]

LCData.sort(key=takeSecond)

# [1740, '黄业琦', 392, '第 196 场周赛'],
# [1565, '林坤贤', 458, '第 229 场周赛'],
# [1740, '黄业琦', 458, '第 229 场周赛'],
# [1509, '林坤贤', 460, '第 230 场周赛'],
# [1740, '黄业琦', 460, '第 230 场周赛'],
# [1779, '黄业琦', 558, '第 279 场周赛'],

Appending corresponding elements into an accumulator

tmp_list = [[],[],[],[]]
# note: the comprehension is evaluated for its side effect; its value needs no assignment
import copy
[x.append(copy.deepcopy(entry)) for x,entry in zip(tmp_list, to_add)]

Element-wise addition of two lists

For equal lengths:

list1 = [1, 2, 3, 4, 5]
list2 = [6, 7, 8, 9, 10]

result = [x + y for x, y in zip(list1, list2)]
print(result)

If the two lists have different lengths, use zip_longest(), which handles unequal lists and fills the missing elements with a specified fill value.

from itertools import zip_longest

list1 = [1, 2, 3, 4, 5]
list2 = [6, 7, 8]

result = [x + y for x, y in zip_longest(list1, list2, fillvalue=0)]
print(result)

For 2-D lists:

list1 = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

list2 = [[10, 11, 12],
         [13, 14, 15]]

rows = max(len(list1), len(list2))
cols = max(len(row) for row in list1 + list2)

result = [[0] * cols for _ in range(rows)]

for i in range(rows):
    for j in range(cols):
        if i < len(list1) and j < len(list1[i]):
            result[i][j] += list1[i][j]
        if i < len(list2) and j < len(list2[i]):
            result[i][j] += list2[i][j]

print(result)

# divide every element of a 2-D list by a number A
result = [[element / A for element in row] for row in list1]

Assignment, shallow copy, and deep copy

Python append() with deep and shallow copies

Assignment in Python only binds a reference, an alias.

list.append('Google')   ## add an element with append()
alist.append( num ) # shallow copy: modifying num afterwards affects the value inside alist

import copy
alist.append( copy.deepcopy( num ) ) # deep copy

# delete
del list[2]

Elements iterated by a for loop are references too

original_list = [1, 2, 3]

for item in original_list:
    item *= 2 # each element is immutable

print(original_list) 

original_list = [[1,2,3], [2], [3]]

for item in original_list:
    item.append("xxx") # each element is mutable

print(original_list) 

# [1, 2, 3]
# [[1, 2, 3, 'xxx'], [2, 'xxx'], [3, 'xxx']]

Function arguments are passed by reference, but slicing gives pointer-like behavior

Parameter passing: a function's formal parameters act like local variables inside the body. Since everything in Python is an object, arguments are passed as object references, but usage splits into two cases: 1. passing a reference to an immutable object (numbers, strings, tuples, functions) behaves like pass-by-value in other languages; 2. passing a reference to a mutable object (dicts, lists, sets, custom objects) behaves like pass-by-reference.

def fun0(a):
    a = [0,0] # rebinding a changes what it points to, effectively creating a new [0,0]

def fun(a):
    a[0] = [1,2]

def fun2(a):
    a[:] = [10,20]

b = [3,4]
fun0(b)
print(b)
fun(b)
print(b)
fun2(b)
print(b)

# [3, 4]
# [[1, 2], 4]
# [10, 20]

Return values

Mutable return values are references too.

def fun1(l):
    l.append("0")
    return l 

def fun2(l):
    return l

if __name__=="__main__":
    l = [1,2,3,4,5]

    rel2 = fun2(l)
    print(rel2)   
    rel1 = fun1(l)
    print(rel1)   
    print(rel2)   
    l.append("xxx")
    print(rel1)   
    print(rel2)   
    del rel1[2]
    print(rel1)   
    print(rel2)  

# [1, 2, 3, 4, 5]
# [1, 2, 3, 4, 5, '0']
# [1, 2, 3, 4, 5, '0']
# [1, 2, 3, 4, 5, '0', 'xxx']
# [1, 2, 3, 4, 5, '0', 'xxx']
# [1, 2, 4, 5, '0', 'xxx']
# [1, 2, 4, 5, '0', 'xxx']

Containers: tuple

  • Tuples are similar to lists, except that a tuple cannot be modified; tuples can be concatenated, and they use parentheses.
  • A tuple containing a single element needs a trailing comma, otherwise the parentheses are treated as an operator.
# create tuples
tup = (1, 2, 3, 4, 5)
tup1 = (23, 78)
tup2 = ('ab', 'cd')
tup3 = tup1 + tup2

Containers: dict

empty dict

a= {}
a=dict()

Keys can be tuples

like C++'s pair<int,int>:

bblHashDict[(tmpHigherHash,tmpLowerHash)]=tmpBBL

But this breaks json.dump: json.dump() cannot serialize a Python dict whose keys are tuples, which makes json.dump() hang or get stuck when writing such data.

Initialization and access

>>> tinydict = {'a': 1, 'b': 2, 'b': '3'}
>>> tinydict['b']
'3'
a_dict = {'color': 'blue'}
for key in a_dict:
 print(key)
# color
for key in a_dict:
    print(key, '->', a_dict[key])
# color -> blue
for item in a_dict.items():
    print(item)
# ('color', 'blue')
for key, value in a_dict.items():
 print(key, '->', value)
# color -> blue

Checking whether a key exists

Two common methods:

Method 1: the in operator. It returns a boolean: True if the key exists, False otherwise.

my_dict = {"key1": "value1", "key2": "value2", "key3": "value3"}

# 判断是否存在指定的键
if "key2" in my_dict:
    print("Key 'key2' exists in the dictionary.")
else:
    print("Key 'key2' does not exist in the dictionary.")

Method 2: dict.get(). It returns the value when the key exists and None otherwise. Pick whichever fits.

my_dict = {"key1": "value1", "key2": "value2", "key3": "value3"}

# 判断是否存在指定的键
if my_dict.get("key2") is not None:
    print("Key 'key2' exists in the dictionary.")
else:
    print("Key 'key2' does not exist in the dictionary.")

Both methods can be used to check whether a key exists in a dict.

Size

len(day)

Update and add

tinydict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

tinydict['Age'] = 8 # update
tinydict['School'] = "RUNOOB" # add

Merging

dict1 = {'a': 10, 'b': 8} 
dict2 = {'d': 6, 'c': 4} 

# dict2 keeps the merged result
dict2.update(dict1)
print(dict2)
{'d': 6, 'c': 4, 'a': 10, 'b': 8}

Deleting

del tinydict['Name']  # delete the entry whose key is 'Name'
tinydict.clear()      # clear all entries
del tinydict          # delete the dict itself
from pprint import pprint
pprint

Containers: set

An unordered sequence without duplicates.

Initialization

a=  set() # empty set

thisset = set(("Google", "Runoob", "Taobao"))
>>> basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
>>> print(basket)                      # demonstrates deduplication

list2set

setL=set(listV)

set2list

my_set = {'Geeks', 'for', 'geeks'}

s = list(my_set)
print(s)
# ['Geeks', 'for', 'geeks']

Adding

thisset.add("Facebook")

Union

x = {"apple", "banana", "cherry"}
y = {"google", "runoob", "apple"}

z = x.union(y) 

print(z)
# {'cherry', 'runoob', 'google', 'banana', 'apple'}

Removing and clearing

s.remove( x )
a.clear()

Modifying the original value

Modifying passed-in arguments

In Python, rebinding a parameter inside a function does not affect variables outside the function.

But several approaches achieve the effect of modifying an argument:

  1. Return the modified value and re-assign it outside the function
def func(x):
    x = x + 1 
    return x

a = 10
a = func(a) 
print(a) # 11
  1. Pass a mutable object and modify its contents
def func(lst):
    lst.append(1)

lst = [1,2,3]
func(lst)
print(lst) # [1,2,3,1]

Here lst is a list; func modifies it internally, and since lst is mutable, the outer lst is changed too.

  1. Use a global variable
count = 0
def func():
    global count
    count += 1

func()
print(count) # 1

Declaring count global with the global keyword lets the function modify the global count.

So, to modify a passed-in value, the main options are:

  1. return the modified value and re-assign it
  2. pass a mutable object and modify its contents
  3. use a global variable

These tricks simulate modifying the parameter.

Modifying the object a for loop iterates over

In Python, a for loop traverses an iterator, assigning its next element to the loop variable on each pass.

To modify the iterated elements inside the loop, you can:

  1. Convert the iterator to a list, then modify the list's elements:
all_for_one = ['a', 'b', 'c']

for app_info in list(all_for_one):
    if app_info == 'b':
        all_for_one[1] = 'x' 

print(all_for_one) # ['a', 'x', 'c']

Here list() turns the iterator into a list, and the list's element is then modified.

  1. Use the loop index instead of the element directly:
all_for_one = ['a', 'b', 'c']

for i in range(len(all_for_one)):
    if i == 1:
        all_for_one[i] = 'x'

print(all_for_one) # ['a', 'x', 'c']

Access and modify the element through the index i.

  1. Use enumerate() to get the index inside the loop:
all_for_one = ['a', 'b', 'c']

for i, app_info in enumerate(all_for_one):
    if i == 1:
        all_for_one[i] = 'x'

print(all_for_one) # ['a', 'x', 'c']

enumerate() iterates over indexes and elements simultaneously.

The main idea: do not modify the loop variable directly; modify the underlying elements through an index or a temporary list.

Modifying a set inside a for loop

A set supports neither indexing nor slicing, so it cannot be modified via indexes or enumerate.

For a set, the following approaches work inside a loop:

  1. Convert the set to a list, modify it, then convert back (a list built from a set has arbitrary order, so replace by value rather than by position):
s = {'a', 'b', 'c'}

lst = list(s)
for i, item in enumerate(lst):
    if item == 'b':
        lst[i] = 'x'
s = set(lst)

print(s) # {'a', 'x', 'c'}
  1. Build a new set, adding the (possibly modified) elements in the loop
s = {'a', 'b', 'c'}
new_s = set()

for item in s:
    if item == 'b':
        new_s.add('x')
    else:
        new_s.add(item)

s = new_s

print(s) # {'a', 'x', 'c'}
  1. Use the set's discard() and add() methods, iterating over a copy (mutating a set while iterating over it raises a RuntimeError)
s = {'a', 'b', 'c'}

for item in list(s):   # iterate over a copy so the set itself can be mutated
    if item == 'b':
        s.discard(item)
        s.add('x')

print(s) # {'a', 'x', 'c'}

The key idea behind all of these:

  1. move the data into a form that supports the modification (a list, or a fresh set)
  2. perform the modification there
  3. convert back to a set

This achieves the effect of modifying a set inside a loop.

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

https://blog.csdn.net/weixin_63719049/article/details/125680242

Benchmark

Differentiated-Bounded Applications

In the context of data movement, the DAMOV framework identifies six distinct types. While the authors primarily use temporal locality for app classification, Figure 3 (locality-based clustering of 44 representative functions) offers a broader view and highlights specific cases that warrant further examination.

Take, for instance, the four cases situated in the lower right corner:

function short name | benchmark | class
CHAHsti | Chai-Hito | 1b: DRAM Latency
CHAOpad | Chai-Padding | 1c: L1/L2 Cache Capacity
PHELinReg | Phoenix-Linear Regression | 1b
PHEStrMat | Phoenix-String Matching | 1b

High TLB percentage -> high TLB miss rate -> memory accesses span a large range -> spatial locality is low.

benchmark

Chai Benchmark

The Chai benchmark code can be sourced either from DAMOV or directly from its GitHub repository. Chai stands for "Collaborative Heterogeneous Applications for Integrated Architectures."

Installing Chai is a straightforward process. You can achieve it by executing the command python3 compile.py.

One notable feature of the Chai benchmark is its adaptability in terms of input size. Modifying the input size of the following applications is a simple and flexible task:

# cd application directory
./bfs_00 -t 4 -f input/xxx.input
./hsti -n 1024000             # Image Histogram - Input Partitioning (HSTI)
./hsto -n 1024000             # Image Histogram - Output Partitioning (HSTO)
./ooppad -m 1000 -n 1000   # Padding (PAD)
./ooptrns -m 100 -n 10000  # Transpose / In-place Transposition (TRNS)
./sc -n 1024000            # Stream Compaction (SC)
./sri -n 1024000           # select application

# vector pack , 2048 = 1024 * 2, 1024 = 2^n
./vpack -m 2048 -n 1024 -i 2 
# vector unpack , 2048 = 1024 * 2, 1024 = 2^n
./vupack -m 2048 -n 1024 -i 2 

Parboil (how to run)

The Parboil suite was developed from a collection of benchmarks used at the University of Illinois to measure and compare the performance of computation-intensive algorithms executing on either a CPU or a GPU. Each implementation of a GPU algorithm is either in CUDA or OpenCL, and requires a system capable of executing applications using those APIs.

# compile , vim compile.py 
# python2.7 ./parboil compile bfs omp_base 
python3 compile.py

# no idea how to run, failed command: (skip)
python2.7 ./parboil run bfs cuda default 
# exe in benchmarks/*, but need some nowhere input.

Phoenix

Phoenix is a shared-memory implementation of Google's MapReduce model for data-intensive processing tasks.

# with code from DAMOV
import os
import sys

os.chdir("phoenix-2.0/tests") # app is changed from sample_apps/*
os.system("make")
os.chdir("../../")

# generates executables at phoenix-2.0/tests/{app}/{app}

# running for example
./phoenix-2.0/tests/linear_regression/linear_regression ./phoenix-2.0/datasets/linear_regression_datafiles/key_file_500MB.txt 

PolyBench

PolyBench is a benchmark suite of 30 numerical computations with static control flow, extracted from operations in various application domains (linear algebra computations, image processing, physics simulation, dynamic programming, statistics, etc.).

PolyBench features include:

  • A single file, tunable at compile-time, used for the kernel instrumentation. It performs extra operations such as cache flushing before the kernel execution, and can set real-time scheduling to prevent OS interference.
# compile using DAMOV code
python compile.py
# exe in OpenMP/compiled, and all running in no parameter

PrIM

Real applications for the first real PIM platform.

Rodinia (developed)

Rodinia, a benchmark suite for heterogeneous parallel computing targeting multi-core CPU and GPU platforms, was first introduced at IISWC 2009.

zsim hooked code from github

  1. Install the CUDA/OCL drivers, SDK and toolkit on your machine.
  2. Modify the common/make.config file to change the settings of the Rodinia home directory and the CUDA/OCL library paths.
  3. It seems to need Intel OpenCL, but Intel has folded everything into oneAPI.
  4. To compile all the programs of the Rodinia benchmark suite, simply use the universal makefile, or go to each benchmark directory and make the individual programs.
  5. The full code with its data can be downloaded from the website.
mkdir -p ./bin/linux/omp
make OMP

Running the zsim hooked apps

cd bin/linux/omp
./pathfinder 100000 100 7
./myocyte.out 100 1 0 4
./lavaMD -cores 4 -boxes1d 10 # -boxes1d  (number of boxes in one dimension, the total number of boxes will be that^3)
./omp/lud_omp -s 8000
./srad 2048 2048 0 127 0 127 2 0.5 2
./backprop 4 65536 # OMP_NUM_THREADS=4


# need to download data or file
./hotspot 1024 1024 2 4 ../../data/hotspot/temp_1024 ../../data/hotspot/power_1024 output.out
./OpenMP/leukocyte 5 4 ../../data/leukocyte/testfile.avi
# streamcluster
./sc_omp k1 k2 d n chunksize clustersize infile outfile nproc
./sc_omp 10 20 256 65536 65536 1000 none output.txt 4
./bfs 4 ../../data/bfs/graph1MW_6.txt 
./kmeans_serial/kmeans -i ../../data/kmeans/kdd_cup
./kmeans_openmp/kmeans -n 4 -i ../../data/kmeans/kdd_cup

dynamic data structures

We choose this specific suite because dynamic data structures are the core of many server workloads (e.g., Memcached's hash table, RocksDB's skip list), and are a great match for near-memory processing.

ASCYLIB + OPTIK

Graph Apps

Graph500 Benchmark Exploration

Official Version

The official version of the Graph500 benchmark can be downloaded from its GitHub repository. Notable features of this version include:

  • Primarily MPI Implementation: The benchmark is built as an MPI (Message Passing Interface) version, without an accompanying OpenMP version. This can be disappointing for those utilizing tools like zsim.
  • Flexible n Value: By default, the value n is set to powers of 2, but it's possible to change this behavior through configuration adjustments.
  • Customization Options: Environment variables can be altered to modify the execution process. For instance, the BFS (Breadth-First Search) portion can be skipped or the destination path for saved results can be changed.
Unofficial Repository

An alternative unofficial repository also exists. However, it requires OpenCL for compilation. The process can be broken down as follows:

  • OpenCL Dependency: The unofficial repository mandates the presence of OpenCL. To set up OpenCL, you can refer to this tutorial.
sudo apt-get install clinfo
sudo apt-get install opencl-headers
sudo apt install opencl-dev

After completing the OpenCL setup and the compilation process using cmake & make, we obtain the executable file named benchmark. By default, running this executable without any arguments appears to utilize only a single core, despite attempts to set the environment variable with export OMP_NUM_THREADS=32. This default behavior led to a runtime of approximately 5 minutes to generate a report related to edges-node-verify status (or similar). However, for someone without an in-depth technical background, this report can be confusing, especially when trying to locate the BFS (Breadth-First Search) and SSSP (Single-Source Shortest Path) components.

What is even more disheartening is that the TLB (Translation Lookaside Buffer) result is disappointingly low, similar to the performance of the GUPS (Giga Updates Per Second) OpenMP version.

In order to gain a clearer understanding and potentially address these issues, further investigation and potentially adjustments to the program configuration may be necessary.

$ ./tlbstat -c '/staff/shaojiemike/github/graph500_openmp/build/benchmark' 
command is /staff/shaojiemike/github/graph500_openmp/build/benchmark                       
K_CYCLES   K_INSTR      IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC  K_ITLBCYC  DTLB% ITLB%         
20819312   10801013    0.52 7736938    1552557    369122     51902       1.77  0.25
20336549   10689727    0.53 7916323    1544469    354426     48123       1.74  0.24

GAP

zsim(?)/LLVM-pass instrumentation code from the PIMProf paper's GitHub.

But the graph dataset should be generated by yourself following the github:

# 2^17 nodes 
./converter -g17 -k16 -b kron-17.sg
./converter -g17 -k16 -wb kron-17.wsg

Kernels Included

  • Breadth-First Search (BFS) - direction optimizing
  • Single-Source Shortest Paths (SSSP) - delta stepping
  • PageRank (PR) - iterative method in pull direction
  • Connected Components (CC) - Afforest & Shiloach-Vishkin
  • Betweenness Centrality (BC) - Brandes
  • Triangle Counting (TC) - Order invariant with possible relabelling

ligra

Code also from DAMOV

# compile
python3 compile.py

# 3 kinds of exe for each app; the relevant code is in the /ligra directory
# emd: edgeMapDense(), probably processing dense data
# ems: edgeMapSparse(), analysing edge data
# compute: of course, the core compute part

The source code is difficult to read; skipped.

The graph format: it seems lines_num = offsets + edges + 3

AdjacencyGraph
16777216 # number of offsets, i.e. vertices
100000000 #   uintE* edges = newA(uintE,m);
0
470
794 # must be monotonically increasing, in range [0, edges); marks where each vertex's edge list begins
……
14680024 # random values in range [0, vertices-1]; each entry is a neighbor of the corresponding vertex (forming pairs)
16644052
16284631
15850460

$ wc -l  /staff/qcjiang/codes/DAMOV/workloads/ligra/inputs/rMat_10M
116777219 /staff/qcjiang/codes/DAMOV/workloads/ligra/inputs/rMat_10M

The PageRank algorithm is covered in another post of mine.

HPC

maybe more computation-intensive than graph applications

parsec

From DAMOV

The Princeton Application Repository for Shared-Memory Computers (PARSEC) is a collection of parallel programs which can be used for performance studies of multiprocessor machines.

# compile
python3 compile_parsec.py

# exe in pkgs/binaries
./pkgs/binaries/blackscholes 4 ./pkgs/inputs/blackscholes/in_64K.txt black.out
./pkgs/binaries/fluidanimate 4 10 ./pkgs/inputs/fluidanimate/in_300K.fluid

STREAM apps

DAMOV code for memory-bandwidth testing, referencing J. D. McCalpin, "Memory Bandwidth and Machine Balance in Current High Performance Computers," IEEE TCCA Newsletter, 1995.

# compile
python3 compile.py

# the default run reports a Failed Validation error (ignore it)
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            5072.0     0.205180     0.157728     0.472323
Add:             6946.6     0.276261     0.172746     0.490767

Hardware Effects

Two very interesting repositories of CPU and GPU hardware effects that can degrade application performance in surprising ways and that may be very hard to explain without knowledge of the low-level CPU/GPU and OS architecture.

Test also using DAMOV code

# compile
python3 compile.py

# Every example directory has a README that explains the individual effects.
./build/bandwidth-saturation/bandwidth-saturation 0 1
./build/false-sharing/false-sharing 3 8

Most cases use perf to learn more about your system's capabilities.

HPCC

From DAMOV and The HPC Challenge (HPCC) Benchmark Suite,” in SC, 2006

  • RandomAccess, OpenMP version (this is also GUPS)
# install
python compile.py

# must export
export OMP_NUM_THREADS=32
./OPENMP/ra_omp 26 # 26 needs almost 20 GB of memory, 27 needs 30 GB, and so on.

But the OpenMP version shows no big pgw overhead.

GUPS

Measures the system's random-access capability.

From GitHub or the HPCC official site; this is the RandomAccess MPI version.

# download using github
make -f Makefile.linux gups_vanilla
# running
gups_vanilla N M chunk
  • N: the length of the global table is 2^N
  • Thus N = 30 would run with a billion-element table.
  • M: number of update sets per proc; larger means a longer (more random-access-heavy) run
  • chunk: number of updates in one set on each proc
  • In the official HPCC benchmark this is specified to be no larger than 1024, but you can run the code with any value you like. GUPS performance typically decreases for smaller chunk sizes.

After testing, the best combination is mpirun -n 32 ./gups_vanilla 32 100000 2048, where n can be at most 32, otherwise memory is exhausted.

n | 32 | 31 | 30 | 29
DTLB% | 66.81 | 45.22 | 19.50 | 13.20
ITLB% | 0.06 | 0.07 | 0.06 | 0.09

Alternatively, run ./gups_vanilla 30 100000 k standalone (n at most 30), e.g. ./tlbstat -c '/usr/bin/mpirun -np 1 /staff/shaojiemike/github/gups/gups_vanilla 30 100000 8192'

n\k 1024 2048 4096 8192
30 44% 90% 80%
27 88%
24 83% 83%
20 58% 62%
15 0.27% 0.3%
Hand-crafted bigJump
#include <bits/stdc++.h>
#include "../zsim_hooks.h"
using namespace std;

#define MOD int(1e9)

// 2000 TLB entries is a typical CPU config;
// with 4KB pages, make each data access trigger a TLB miss by jumping over 1000+ ints,
// and repeat after 2000 entries, so at least 2000 * 4KB = 8MB of space

// #define BASIC_8MB 2000000 * 2
#define BASIC_8MB (1 << 22)

// ~1 second program: stream add runs at ~6GB/s, int is 4B, repeated 10^9 times
// #define all_loops 1000000
#define all_loops (1 << 20)

int main(int argc, char* argv[]) {
   if (argc != 4) {
      std::cerr << "Usage: " << argv[0] << " <space scale> <jump distance scale> <loop times>" << std::endl;
      return 1;
   }

   // Convert the second command-line argument (argv[1]) to an integer
   int N = std::atoi(argv[1]);
   int J = std::atoi(argv[2]);
   int M = std::atoi(argv[3]);

   std::cout << "Number read from command line: " << N << " " << J << " (N,J should not big, [0,5] is best.)" <<std::endl;

   const int size = BASIC_8MB << N;
   const int size_mask = size - 1;
   int * jump_space = (int *)malloc(size * sizeof(int));

   zsim_begin();
   int result = 0;
   int mem_access_count = 0;
   int mem_access_index = 0;
   // int mask = (1<<10<<J)-1;
   // int ran = 0x12345678;
   int mask = (1<<J)-1;
   int ran = (1<<30)-1;
   // even without random addresses, TLB occupancy is also high
   // ran = (ran << 1) ^ ((int) ran < 0 ? 0x87654321 : 0);
   while(mem_access_count++ < all_loops*M){
      // read & write 
      jump_space[mem_access_index] = ran;
      mem_access_index = (mem_access_index + (1024 + ran & mask) ) & (size_mask);
      // cout << "mem_access_index = " << mem_access_index << endl;
   }
   zsim_end();

   // print the result so the loop is not optimized away
   printf("result %d\n",result);
}

HPCG

From DAMOV and High Performance Conjugate Gradient Benchmark (HPCG)

HPCG is a software package that performs a fixed number of multigrid preconditioned (using a symmetric Gauss-Seidel smoother) conjugate gradient (PCG) iterations using double precision (64 bit) floating point values, i.e. conjugate-gradient solves over floating-point data.

Follow the instructions in INSTALL and analyze compile.py:

  1. choose a makefile like setup/Make.GCC_OMP
  2. configure values like MPdir, but we can leave them alone because we use GCC_OMP, which sets -DHPCG_NO_MPI
  3. add -DHPGSym to CXXFLAGS or HPCG_OPTS
  4. cd build and ../configure GCC_OMP
  5. run compile.py to compile the executables
  6. get 4 ComputeProlongation variants in build/bin
  7. test the exe with xhpcg 32 24 16 for the three dimensions
  8. or xhpcg --nx=16 --rt=120 for NX=NY=NZ=16 and a 120-second run
  9. change int refMaxIters = 50; to int refMaxIters = 1; to limit the number of CG iterations
  10. note: --nx=16 must be a multiple of 8
  11. if there are no geometry arguments on the command line, hpcg will ReadHpcgDat and use the defaults --nx=104 --rt=60
value\nx | 96 | 240 | 360 | 480
mem | 17GB | 40GB | 72.8GB | 
time(icarus0) | 8s | 84s | 4min40s | core dumped (7 mins)

Core dump from xhpcg_HPGPro: ../src/GenerateProblem_ref.cpp:204: void GenerateProblem_ref(SparseMatrix&, Vector*, Vector*, Vector*): Assertion 'totalNumberOfNonzeros>0' failed.

MPdir        = 
MPinc        = 
MPlib        = 

HPCG_OPTS     = -DHPCG_NO_MPI -DHPGSym
compile error
../src/ComputeResidual.cpp:59:13: error: 'n' not specified in enclosing 'parallel'

Just add n to the shared clause.

Database

Hash Joins

This package provides implementations of the main-memory hash join algorithms described and studied in C. Balkesen, J. Teubner, G. Alonso, and M. T. Ozsu, “Main-Memory Hash Joins on Modern Processor Architectures,” TKDE, 2015.

Test also in DAMOV

# install
python compile.py

# runing
./src/mchashjoins_* -r 12800000 -s 12000000 -x 12345 -y 54321

These cases show the strain on TLB resources.

AI

Darknet for CV using multiple cores

From DAMOV; the official documentation is detailed.

# shaojiemike @ snode6 in ~/github/DAMOV/workloads/Darknet on git:main x [14:13:02]
$ ./darknet detect cfg/yolo.cfg ./weights/yolo.weights data/dog.jpg
  • models are in the weights files; models of different sizes (28MB-528MB) can be downloaded from the website
  • picture data is in the data files (70KB-100MB)
  • the exe must be run from its own directory. SHIT.

genomics / DNA

BWA

From DAMOV; BWA maps DNA sequences against a large reference genome, such as the human genome.

  • Download DNA data like ref.fa following the data-download steps.
  • Running sratoolkit.3.0.6-ubuntu64/bin/fastq-dump --split-files SRR7733443 got stuck, because it generates several 96GB SRR7733443_X.fastq files, X from 1 to n.
  • sratool cannot limit the file size, but head -c 100MB SRR7733443_1.fastq > ref_100MB.fastq produces a file of the wanted size.
  • For further commands, read the GitHub page.
  • ./bwa index -p abc ref_100MB.fastq generates several abc.* files in about 50 seconds.
  • Then run ./bwa mem -t 4 abc ref_100MB.fastq or ./bwa aln -t 4 abc ref_100MB.fastq

GASE

GASE - Generic Aligner for Seed-and-Extend


GASE is a DNA read aligner developed for measuring the mapping accuracy and execution time of different combinations of seeding and extension techniques. GASE is implemented by extending BWA (version 0.7.13) developed by Heng Li.

Code also from DAMOV. But there seem to be some syntax errors in the program; skip this app.

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Perf

Introduction to perf

Perf is a profiler for Linux, built on the kernel's perf_event subsystem.

Perf can analyze not only PMU (hardware) events but also various software events, such as context switches, page faults, network and file I/O.

Common perf commands

Interval option

-I 100 prints data every 100 ms.

Properties of performance events

Hardware performance events are provided by the processor's PMU. Because modern processors run at very high frequencies with deeply pipelined execution, by the time a performance event fires and the processor responds to the PMI interrupt, the pipeline may already have processed hundreds of instructions.

The instruction address sampled at the PMI is then no longer the address of the instruction that triggered the event, and the skew can be severe.

To solve this, Intel processors implement precise event sampling through the PEBS mechanism. With PEBS, the hardware saves the processor state directly to memory when the counter overflows (instead of saving registers only when the interrupt is serviced), so perf can capture the address of the very instruction that triggered the event, improving sampling precision.

By default, perf does not use PEBS. To get precise sampling, append the suffix ":p" or ":pp" to the event name. Perf defines four precision levels, listed below.

Precision levels

  0 no precision guarantee
  1 the skid between the sampled instruction and the triggering instruction is a constant (:p)
  2 the skid should be 0 wherever possible (:pp)
  3 the skid must be 0 (:ppp)
  Current x86 processors, Intel and AMD alike, implement only the first three precision levels.

Besides the precision level, events have several other modifiers, all specified as "event:X".

Modifiers

  u count only events triggered by user-space code
  k count only events triggered by the kernel; some register-counter-derived values cannot distinguish kernel mode from user mode
  uk count the sum of both
  h count only events triggered by the hypervisor
  G in a KVM virtual machine, count only events triggered by the guest
  H count only events triggered by the host
  p precision level

perf list

Lists the events available for collection (to be used with -e).

Commonly used metric groups: perf list metricgroup

perf stat

Prints basic performance statistics for a program run; specific events can also be requested explicitly.

$ perf stat ./bin/pivot "../run/uniformvector-2dim-5h.txt"
dim = 2, n = 500, k = 2
Using time : 236.736000 ms
max : 143 351 58880.823709
min : 83 226 21884.924801

 Performance counter stats for './bin/pivot ../run/uniformvector-2dim-5h.txt':

          7,445.60 msec task-clock                #   30.814 CPUs utilized
               188      context-switches          #    0.025 K/sec
                33      cpu-migrations            #    0.004 K/sec
               678      page-faults               #    0.091 K/sec
    14,181,698,360      cycles                    #    1.905 GHz                      (75.63%)
    46,455,227,542      instructions              #    3.28  insn per cycle           (74.37%)
     2,008,507,493      branches                  #  269.758 M/sec                    (74.18%)
        13,872,537      branch-misses             #    0.69% of all branches          (75.82%)

       0.241629599 seconds time elapsed

       7.448593000 seconds user
       0.000000000 seconds sys

Using Metric Groups: perf stat -M Summary,TLB --metric-only ./exe

--metric-only prints only the computed metrics, without the raw counts.

perf record

Run a command and record its profile into perf.data

You must use perf record -g xxx to generate the call graph for the perf report topdown view.

Without -e, the default is -e cycles:u. To count all cycles, use -e cycles:uk.

Note that the cycle and instruction counts cover all the cores used; the -a option selects all cores.

cycles and task-clock are roughly interchangeable.

When measuring, add the p suffixes (as in :uppp) to raise precision and stop samples from skidding onto other instructions.

perf record -g -e branch-misses:uppp ./bin/pivot "../run/uniformvector-2dim-5h.txt"
perf record -g -e task-clock:uppp ./bin/pivot "../run/uniformvector-2dim-5h.txt"

The branch misses can be seen to land in this for loop.

echo 0 > /proc/sys/kernel/kptr_restrict is needed.

perf report

Read perf.data (created by perf record) and display the profile

An interactive terminal UI. It not only shows the correspondence between assembly and source code, but also supports automatic jumps.

perf annotate

  1. the program must be compiled with the -g flag
  2. perf annotate can then show the percentage of execution cycles for each source line

percentage at the end of each line

When you select many events with the -e option, there is only limited PMU hardware to go around, so perf uses event multiplexing: each event is measured part of the time and scaled up to estimate the full count. The % shown is parttime / fulltime.

pmc-tools

Practice

Because perf estimates through multiplexing, the arithmetic relationships are more convincing when the counts exceed 10^9.

Integer operations per second (IntOps)

There are no integer-related events in perf list, only floating-point ones:
  1. perf makes it hard to measure integer ops
  2. checking raw hardware event codes (to be used with -rNNN) is also a difficult route

Roofline: PMU & Perf

Maybe we should research integer PMU events further, or just use VTune.


TLB miss rate

A reference inspired me on snode6:

  • Total number of memory references ( X ) = mem_uops_retired.all_loads + mem_uops_retired.all_stores
  • Total number of memory references that missed in TLB ( Y ) = mem_uops_retired.stlb_miss_loads + mem_uops_retired.stlb_miss_stores
  • TLB miss rate = Y/X
perf stat \
    -e mem_uops_retired.all_loads \
    -e mem_uops_retired.all_stores \
    -e mem_uops_retired.stlb_miss_loads \
    -e mem_uops_retired.stlb_miss_stores xxx

According to the following experiment, mem_uops_retired.stlb_miss_stores roughly equals dtlb_store_misses.miss_causes_a_walk.

$ perf stat -e mem_uops_retired.all_loads -e mem_uops_retired.all_stores -e mem_uops_retired.stlb_miss_loads -e mem_uops_retired.stlb_miss_stores -e dtlb_load_misses.miss_causes_a_walk\
    -e dtlb_store_misses.miss_causes_a_walk \
    -e itlb_misses.miss_causes_a_walk  \
    -e dtlb_load_misses.walk_duration \
    -e dtlb_store_misses.walk_duration \
    -e itlb_misses.walk_duration \
    -e instructions:uk \
    -e cycles:uk ./bigJump.exe 1 10 500
Number read from command line: 1 10 (N,J should not big, [0,5] is best.)
result 0
 Performance counter stats for './bigJump.exe 1 10 500':

           3253636      mem_uops_retired.all_loads                                     (41.53%)
         529570049      mem_uops_retired.all_stores                                     (41.62%)
             59111      mem_uops_retired.stlb_miss_loads                                     (41.71%)
         471688964      mem_uops_retired.stlb_miss_stores                                     (33.50%)
            101474      dtlb_load_misses.miss_causes_a_walk                                     (33.56%)
         477591045      dtlb_store_misses.miss_causes_a_walk                                     (33.47%)
             61667      itlb_misses.miss_causes_a_walk                                     (33.37%)
           5591102      dtlb_load_misses.walk_duration                                     (33.28%)
       16489869334      dtlb_store_misses.walk_duration                                     (33.22%)
           2202174      itlb_misses.walk_duration                                     (33.22%)
        3712587926      instructions:uk           #    0.34  insn per cycle           (41.52%)
       10791067051      cycles:uk                                                     (41.52%)

Measuring the page-walk share of time with perf

The script is as follows:

# Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
USER="/staff/qcjiang/codes/PIA_workspace"
APP="$USER/workloads/pagerank/cpp/pagerank $USER/workloads/pagerank/test/barabasi-100000-pr-p.txt"
perf stat \
    -e dtlb_load_misses.miss_causes_a_walk\
    -e dtlb_store_misses.miss_causes_a_walk \
    -e itlb_misses.miss_causes_a_walk  \
    -e dtlb_load_misses.walk_duration \
    -e dtlb_store_misses.walk_duration \
    -e itlb_misses.walk_duration \
    -e instructions:uk \
    -e cycles:uk \
    $APP

# dtlb_load_misses.walk_duration
# [Cycles when PMH(Page Miss Handling) is busy with page walks Spec update: BDM69]

# Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GH
perf stat\
    -e dtlb_load_misses.walk_completed\
    -e dtlb_load_misses.walk_active\
    -e dtlb_store_misses.walk_completed\
    -e dtlb_store_misses.walk_active\
    -e itlb_misses.walk_completed\
    -e itlb_misses.walk_active\


# perf differs on the hades0 AMD cpu
# AMD does not record page-walk duration
perf stat\
 -e ls_l1_d_tlb_miss.all\
 -e l1_dtlb_misses\
 -e l2_dtlb_misses\
 -e l2_itlb_misses\
 -e bp_l1_tlb_miss_l2_tlb_miss\
 -e bp_l1_tlb_miss_l2_tlb_miss.if2m\
 -e bp_l1_tlb_miss_l2_tlb_miss.if4k\
 -e bp_l1_tlb_miss_l2_tlb_miss.if1g\
 -e ls_tablewalker.dc_type0\
 -e ls_tablewalker.dc_type1\
 -e ls_tablewalker.dside\
 -e ls_tablewalker.ic_type0\
 -e ls_tablewalker.ic_type1\
 -e ls_tablewalker.iside -e instructions:uk -e cycles:uk /staff/shaojiemike/github/DAMOV/workloads/gemm/gemm 2000

Counting TLB-miss events helps determine whether a program has an address-translation bottleneck. Many STLB misses usually mean the stores cannot exploit locality and touch too many random addresses; optimizing the data-access pattern reduces TLB misses.

-I 100 prints data every 100 ms.

$ ./investigation/pagewalk/tlbstat -c '/staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/sssp.inj -f /staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/benchmark/kron-20.wsg -n1'
command is /staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/sssp.inj -f /staff/shaojiemike/github/sniper_PIMProf/PIMProf/gapbs/benchmark/kron-20.wsg -n1
K_CYCLES   K_INSTR      IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC  K_ITLBCYC  DTLB% ITLB%
186001     83244       0.45 753584     63         27841      1          14.97  0.00
229121     89734       0.39 1009100    213        31103      10         13.58  0.00
6233833    3629907     0.58 1227653    316159     45877      8347        0.74  0.13
16579329   8756681     0.53 10860225   524414     264655     15348       1.60  0.09
  • obs1: the initial data-loading phase has high DTLB overhead.
  • obs2: the cycle counts are single-core at the start and 32-core later; the cores' dynamic frequencies also differ between loading and computing.
  • Per lscpu: max 3.3 GHz, min 1.2 GHz. 2*32*33/12 = 176 ≈ 165, matching expectations.
  • obs3: the kernel can be ignored; kernel work (such as malloc-ing space) fluctuates a lot.

Why "retired" is emphasized

In Intel processors, not every store uop retires. Specifically:

  1. Stores that are reordered away or elided never truly execute, so they do not retire.
  2. Stores cancelled by an exception during execution do not retire either.
  3. Optimized micro-op fusion may cancel redundant stores, which then do not retire.
  4. Hardware may coalesce multiple stores into one; the redundant ones do not retire.

In short, not every store uop reaches retirement; some are cancelled or merged along the way.

So the event "retired store uops" deliberately counts only stores that completed and retired. Counting only retired stores reflects more accurately how many stores the program actually performed; including cancelled ones would add noise and distort the analysis.

Moreover, a TLB miss happens only when an instruction actually executes, at which point it is already certain to retire rather than be cancelled.

Advanced usage:

Vary the data size and count the cycles the function takes; the slope is the CPE (cycles per element).

perf stat example: estimating L1 cache latency via CPE

Vary the linked-list length n and count the cycles spent in the function; the slope is the CPE (cycles per element).
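
A sketch of that kind of kernel, assuming a circular list small enough to fit in L1: each load depends on the previous one, so the cycles per element approximate the L1 load-to-use latency. Run it under perf stat -e cycles:u for several n and fit the slope.

#include <vector>
#include <numeric>
#include <cstdio>

int main() {
    const int n = 1024;                 // vary n and plot cycles versus elements
    std::vector<int> next(n);
    std::iota(next.begin(), next.end(), 1);
    next[n - 1] = 0;                    // circular linked list, laid out in order
    int idx = 0;
    for (long i = 0; i < 100000000L; ++i)
        idx = next[idx];                // serialized dependent loads
    printf("%d\n", idx);                // print so the chase is not optimized away
}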

perf stat example: estimating the branch-misprediction penalty via CPE
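
A sketch of the classic kernel for this, with a random 0/1 array as assumed input: unsorted data makes the branch below mispredict about half the time, while a sorted run makes it predictable, so the CPE difference between the two runs estimates the misprediction penalty.

#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cstdio>

int main(int argc, char **argv) {
    const int n = 1 << 24;
    std::vector<int> v(n);
    for (int &x : v) x = rand() & 1;              // unpredictable 0/1 data
    if (argc > 1) std::sort(v.begin(), v.end());  // any argument: predictable run
    long sum = 0;
    for (int rep = 0; rep < 10; ++rep)
        for (int x : v)
            if (x) sum += x;                      // the data-dependent branch
    printf("%ld\n", sum);  // compare perf stat -e branch-misses for the two runs
}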

A real example

$ perf record -g ./bin/pivot "../run/uniformvector-2dim-5h.txt"
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
dim = 2, n = 500, k = 2
Using time : 240.525000 ms
max : 143 351 58880.823709
min : 83 226 21884.924801
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.794 MB perf.data (30470 samples) ]

Without changing /proc/sys/kernel:

$ sudo perf record ./bin/pivot "../run/uniformvector-2dim-5h.txt"
dim = 2, n = 500, k = 2
Using time : 389.349000 ms
max : 143 351 58880.823709
min : 83 226 21884.924801
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.909 MB perf.data (49424 samples) ]

Using perf on a supercomputer

Sometimes VTune cannot be used on a supercomputer because of the architecture, so perf it is.

srun

srun -p IPCC -N 1 -n 1 -c 64 -t 1  perf record -g ../build/bin/pivot uniformvector-4dim-1h.txt

No perf.data file was generated, and a pile of garbled output appeared at the end of the terminal. (I later learned the garbage was because no -o output file was specified; the default, surprisingly, writes to the terminal.)

srun -p IPCC -N 1 -n 1 -c 64 -t 1  /usr/bin/bash

Requesting a bash shell and running there gives the same result.

salloc

This works:

ipcc22_0029@ln121 ~/github/MarchZnver1/IPCC2022-preliminary/run (main*) [07:38:46]           
> salloc -p IPCC -N 1 -t 10:00                                            
salloc: Granted job allocation 2175282                                       
salloc: Waiting for resource configuration                                         
salloc: Nodes fb0101 are ready for job                                     
bash-4.2$ ls                                                             
check.py  manual.log  refer-2dim-5h.txt  refer-4dim-1h.txt  result.txt  run.2022-08-10.log  run_case1.sh  run-ipcc-mpi.sh  run-ipcc.sh  run-mpi.sh  run.sh  uniformvector-2dim-5h.txt  uniformvector-4dim-1h.txt   
bash-4.2$ perf record -g ../build/bin/pivot uniformvector-4dim-1h.txt

But it actually ran on the login node, and the allocated node could not be reached over ssh.

sbatch

#!/bin/bash
#SBATCH -p IPCC
#SBATCH -t 3:00
#SBATCH --nodes=1
#SBATCH --exclude=
#SBATCH --cpus-per-task=64
#SBATCH --mail-type=FAIL
#SBATCH [email protected]

source /public1/soft/modules/module.sh
module purge

module load gcc/8.1.0
module load mpich/3.1.4-gcc8.1.0

logname=vtune
export OMP_PROC_BIND=close; export OMP_PLACES=cores
perf record -g -e task-clock:uppp /public1/home/ipcc22_0029/shaojiemike/github/IPCC2022-preliminary/build/bin/pivot /public1/home/ipcc22_0029/shaojiemike/slurm/uniformvector-4dim-1h.txt |tee ./$logname

This returns:

ipcc22_0029@ln121 ~/slurm  [11:18:06]
> cat slurm-2180072.out 
Error:
task-clock:uppp: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Changing the command to perf record -g -o perfData -e task-clock:upp works; pp is supported.

Analyzing CVT

It looks like only about 10 registers are used; the loop could be unrolled once more.

Analyzing CVT turned into sub

To form an add pipeline, the compiler turned the subs into adds.

vmovaps 0xf98(%rip),%ymm9 # 81a0 <blockSize+0x20> presumably reads from a static variable.

Analysis: unrolling float once more

The results are wrong now, and it is not faster either (probably still a resource/pipeline issue; registers are tight and the subs are no longer grouped).

But it does show that the unroll_SumDistance(j) style can achieve unrolling at the assembly level.

Comparison: unrolling double once more

With registers even tighter, the loads can no longer be grouped; the ymm registers run short (ymm9).

The first red vadd could probably be moved down. Maybe add and sub share the same pipeline?

perf stat

Single node

 Performance counter stats for '/public1/home/ipcc22_0029/shaojiemike/github/IPCC2022-preliminary/build/bin/pivot /public1/home/ipcc22_0029/shaojiemike/slurm/case2/uniformvector-4dim-1h.txt':

         73,637.63 msec task-clock:u              #   54.327 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
             9,433      page-faults:u             #    0.128 K/sec                  
   192,614,020,152      cycles:u                  #    2.616 GHz                      (83.34%)
     4,530,181,367      stalled-cycles-frontend:u #    2.35% frontend cycles idle     (83.32%)
    25,154,915,770      stalled-cycles-backend:u  #   13.06% backend cycles idle      (83.33%)
   698,720,546,628      instructions:u            #    3.63  insn per cycle         
                                                  #    0.04  stalled cycles per insn  (83.34%)
    27,780,261,977      branches:u                #  377.256 M/sec                    (83.33%)
        11,900,773      branch-misses:u           #    0.04% of all branches          (83.33%)

       1.355446229 seconds time elapsed

      73.465281000 seconds user
       0.181156000 seconds sys

The perf data for two nodes looks wrong, and the perf record results are also strange (perf likely only saw the local mpirun launcher, not the remote ranks).

Performance counter stats for 'mpirun -n 2 /public1/home/ipcc22_0029/shaojiemike/github/IPCC2022-preliminary/build/bin/pivot /public1/home/ipcc22_0029/shaojiemike/slurm/case2/uniformvector-4dim-1h.txt':

             51.37 msec task-clock:u              #    0.060 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
             2,278      page-faults:u             #    0.044 M/sec                  
        39,972,793      cycles:u                  #    0.778 GHz                      (84.56%)
         2,747,434      stalled-cycles-frontend:u #    6.87% frontend cycles idle     (85.16%)
        10,620,259      stalled-cycles-backend:u  #   26.57% backend cycles idle      (88.39%)
        58,479,982      instructions:u            #    1.46  insn per cycle         
                                                  #    0.18  stalled cycles per insn  (89.18%)
        14,068,620      branches:u                #  273.884 M/sec                    (77.31%)
           365,530      branch-misses:u           #    2.60% of all branches          (75.40%)

       0.850258803 seconds time elapsed

       0.015115000 seconds user
       0.038139000 seconds sys

A concrete analysis example

For carefully unrolled, hand-vectorized code (which should saturate AVX2's 16 YMM registers):

-march=znver1 still yields a 1/3 speedup.

A few differences are visible in the faster code (grouping instructions of the same type probably has little to do with it): 1. the form of the loads changed, no longer using insertf128 (0.71 down to 0.06); 2. there are fewer loads (vmaxpd with a memory operand reduces the instruction count); 3. the instructions are less scattered, more compact.

Below is the code with the core unrolled twice. In the fast version: 1. are unified loads faster? Loads are hoisted outside the loop, or disappear entirely (vmax with a memory operand); 2. the sub instructions show the compiler would love to group the loads, but there are only 16 registers (ymm0 holds the running total, ymm9 the mask).

Overall the instruction count dropped considerably (from the screenshots: 36 instructions in the fast version versus 54 in the slow one). Is the bottleneck instruction retirement? perf stat can verify this; IPC turns out essentially unchanged.

This explains why data reuse plus -march=znver1 became faster while the original version did not: data reuse added n*n worth of data reads, and the flag merged away many instructions (folding loads into vmaxpd), hence the speedup. The original implementation had almost no data reads, so there was nothing to optimize away and essentially no speedup.

Factors affecting a machine's IPC

The optimizations change IPC slightly, mainly because different instructions have different throughput (instructions executable per cycle).

The "CPUs utilized" figure is literally userTime/elapsedTime, which is rather crude.

Different runs weight the program's code differently, so IPC varies. (In the small case, 400 of the 600 units of time sit in MPI_Init, so IPC is naturally lower.)

Common problems

Permission restrictions

perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling

Edit /etc/sysctl.conf, add kernel.perf_event_paranoid = -1, save and exit; then reload the sysctl configuration to apply it: sudo sysctl -p

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Lab-mate Yuan Fuyan's group-meeting presentation

https://zhuanlan.zhihu.com/p/141694060

Intel SDM(Software Developer's Manual)

Introduction

This set consists of:

volume | description | pages (size)
volume 1 | Basic Architecture | 500 pages (3MB)
volume 2 (combined 2A, 2B, 2C, and 2D) | full instruction set reference | 2522 pages (10.8MB)
volume 3 (combined 3A, 3B, 3C, and 3D) | system programming guide | 1534 pages (8.5MB)
volume 4 | model-specific registers (MSRs) | 520 pages

volume3: Memory management(paging), protection, task management, interrupt and exception handling, multi-processor support, thermal and power management features, debugging, performance monitoring, system management mode, virtual machine extensions (VMX) instructions, Intel® Virtualization Technology (Intel® VT), and Intel® Software Guard Extensions (Intel® SGX).

AMD64 Architecture Programmer's Manual (3336 pages)

It has more figures and is easier to read.

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.

https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

Git Submodule: Data & Code Repository Separate

Just following my own approach:

  1. the parent repository ignores the subdirectory in .gitignore
  2. use git normally in the child repository, e.g. init, commit, push
  3. what are submodules for?
    1. automatically uploading and initializing multiple sub-repos at a target git commit

submodule

Submodules are hard to use; they stall under big sub-repositories.

I recommend using .gitignore, or just ln -s xxx yyy.

ChatGPT's take:

Adding a Git repository inside another Git repository, often referred to as a "submodule," is a way to manage a separate repository as a part of your main project. This is particularly useful when you want to include another project within your project and keep them separate.

Here's how you can add a submodule and set up .gitignore for the outer repository:

  1. Add a Submodule:

    Navigate to the root of your outer Git repository and run the following command:

    git submodule add <repository_url> <submodule_path>
    

    Replace <repository_url> with the URL of the repository you want to add as a submodule, and <submodule_path> with the relative path within your outer repository where you want to place the submodule.

  2. Update and Initialize Submodule:

    After adding the submodule, you need to update and initialize it:

    git submodule update --init --recursive
    

    This command fetches the submodule's content and initializes it.

  3. Set Up .gitignore:

    To ignore the submodule's content and avoid committing it to the outer repository, you need to add the submodule path to the .gitignore file of the outer repository. For example, if your submodule is located in a directory called "submodule_folder," add the following line to the .gitignore file:

    submodule_folder/
    
  4. Working with Submodules:

    When you clone the outer repository on another system or share it with others, they will need to run git submodule update --init --recursive to fetch and initialize the submodule's content.

    To make changes to the submodule, navigate into the submodule directory and work as you would in a regular Git repository. When you commit changes in the submodule, you will need to push them to its remote repository. After that, commit the updated submodule reference in the outer repository.

    To update the submodule to the latest version available, you can navigate into the submodule directory and run git pull origin master (or the appropriate branch) to update its content. Then, commit the updated reference in the outer repository.

Remember that submodules have their own independent version control, so you need to manage them separately. Submodules provide a way to incorporate external projects into your main project while keeping their development and versioning separate.

Further research and study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.