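Notes

May 6, 2022
Category: toLearn
Reading time: 1 minute

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

None

May 4, 2022
Category: Tutorials
Reading time: 1 minute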

Hugo

Hugo is a Go-based static site generator, first released in 2013, known for its speed and flexibility.
Hugo has set itself apart by being fast. More precisely, it has set itself apart by being much faster than Jekyll.
Jekyll uses Liquid as its templating language. Hugo uses Go templating. Most people seem to agree that it is a little bit easier to learn Jekyll’s syntax than Hugo’s.

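April 28, 2022
Category: thinking
Reading time: 4 minutes

Presentation & Visualization: PPT

Introduction

Academic talk:

Goal: help the audience understand the underlying principle.

Work report:

Goal: the audience follows the talk and understands the background, the key points and difficulties of the work, and the results of the current stage.
Structure: organize the content with the STAR method (Situation, Task, Action, Result).

April 24, 2022
Category: network
Reading time: 6 minutes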

Tcpdump & wireshark

Check the machine's public IP from the command line

> curl myip.ipip.net
当前 IP：117.136.101.72  来自于：中国 安徽   移动

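Check whether a port on a machine is open

# For a web service, simply download the page and inspect the content
wget 4.shaojiemike.top:28096
# -z makes nc only scan for listening ports without sending any data; -v prints more detail.
nc -z -v 4.shaojiemike.top 28096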
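
The same check can be sketched in Python with only the standard library. This is a minimal sketch, not a replacement for nc or nmap: the function name is_port_open and its timeout default are assumptions made for illustration, and it only tests whether a TCP connection can be established.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name (IPv4 or IPv6) and connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the host used above, the call would be is_port_open("4.shaojiemike.top", 28096).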

Or scan a specific port

# IPv6 works too
$ nmap -6 -p 8096 2001:da8:d800:611:5464:f7ab:9560:a646
Starting Nmap 7.80 ( https://nmap.org ) at 2023-01-04 19:33 CST
Nmap scan report for 2001:da8:d800:611:5464:f7ab:9560:a646
Host is up (0.00099s latency).

PORT     STATE SERVICE
8096/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.05 seconds

$ nmap -p 28096 4.shaojiemike.top
Starting Nmap 7.80 ( https://nmap.org ) at 2023-01-04 19:19 CST
Nmap scan report for 4.shaojiemike.top (114.214.181.97)
Host is up (0.0011s latency).

PORT      STATE SERVICE
28096/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.05 seconds

Scan all ports. This is very slow (about 50 minutes).

sudo nmap -sT -p- 4.shaojiemike.top

wireshark

Display filters

Entered in the filter bar at the top of the window

tcp.port==80&&(ip.dst==192.168.1.2||ip.dst==192.168.1.3)

ip.addr ==192.168.1.1 // show all packets whose source or destination address is 192.168.1.1
eth.addr== 80:f6:2e:ce:3f:00 // filter by MAC address; see "Filtering MAC/physical addresses in Wireshark"
tcp.port==23

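Capture filters

Set in the capture options before capturing. Only packets that match are captured, which avoids large capture files and high memory use, but the network environment during the test cannot be fully reproduced afterwards.

host 192.168.1.1 // capture all packets received or sent by 192.168.1.1
src host 192.168.1.1 // source address: all packets sent by 192.168.1.1
dst host 192.168.1.1 // destination address: all packets received by 192.168.1.1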

Meaning of the colors

tcpdump

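The traditional command-line packet capture tool

Common options

Note the "and" between filter rules.

-nn :
a single n disables host name resolution and prints IP addresses directly;
two n's disable both host name and port name resolution.
This makes IPs and port numbers easy to read,
and skipping name resolution is much faster.
-i selects the interface; -D lists the available interfaces
-v, -vv and -vvv print progressively more detail
port 80 captures traffic on port 80, usually HTTP. Prefix it with the src or dst qualifier to restrict the direction.
tcpdump -i eth0 -n arp net 192.168.199 captures ARP packets on the 192.168.199.* network; arp can be replaced by tcp, udp, etc.
-A, -X and -xx print progressively more of the packet contents
-e : print link-layer information.
By default tcpdump does not print link-layer information; with -e it shows the source and destination MAC addresses and any VLAN tag.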

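Reading the output

192.168.1.106.56166 > 124.192.132.54.80

The source IP is 192.168.1.106 and the source port is 56166;
the destination address is 124.192.132.54 and the destination port is 80.
The > symbol shows the direction of the data.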
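
The address notation can be taken apart mechanically. The following is a minimal sketch that assumes IPv4 output printed with -nn (numeric address, then a dot, then the numeric port); the helper names split_endpoint and parse_direction are made up for this example.

```python
from typing import Dict, Tuple, Union


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split tcpdump's 'a.b.c.d.port' IPv4 notation into (ip, port)."""
    # The port is whatever follows the last dot.
    ip, _, port = endpoint.rpartition(".")
    return ip, int(port)


def parse_direction(line: str) -> Dict[str, Union[str, int]]:
    """Parse 'SRC > DST' as printed by tcpdump -nn for IPv4 traffic."""
    # tcpdump ends the destination with a colon when more fields follow.
    src, dst = (part.strip().rstrip(":") for part in line.split(">", 1))
    src_ip, src_port = split_endpoint(src)
    dst_ip, dst_port = split_endpoint(dst)
    return {"src_ip": src_ip, "src_port": src_port,
            "dst_ip": dst_ip, "dst_port": dst_port}
```

Calling parse_direction("192.168.1.106.56166 > 124.192.132.54.80") yields the source and destination fields described above.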

Flags

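Flags commonly seen in TCP segments, for example during the three-way handshake:

[S] : SYN (start a connection)
[.] : ACK (acknowledgement, shown on its own when no other flag is set)
[P] : PSH (push data)
[F] : FIN (finish a connection)
[R] : RST (reset a connection)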
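
Combined fields such as [S.] or [P.] are just several of these symbols side by side. Below is a minimal decoding sketch; it follows the current tcpdump convention that "." stands for ACK and "none" means no flag is set. The names TCP_FLAGS and decode_flags are illustrative, and only the most common symbols are covered.

```python
# Symbol-to-name table for the flag characters tcpdump prints.
TCP_FLAGS = {
    "S": "SYN",
    ".": "ACK",
    "P": "PSH",
    "F": "FIN",
    "R": "RST",
    "U": "URG",
}


def decode_flags(field: str) -> list:
    """Turn a tcpdump flags field such as '[S.]' into a list of flag names."""
    symbols = field.strip("[]")
    if symbols == "none":
        # tcpdump prints 'none' when no flag is set at all.
        return []
    return [TCP_FLAGS[symbol] for symbol in symbols]
```

With this table, the second packet of a handshake, printed as Flags [S.], decodes to SYN plus ACK.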

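Common uses

Given a destination IP, find which interface and port the traffic goes through
Capture packets of all kinds of protocols, such as ping and ssh

Case study

curl --trace-ascii - www.github.com

The GitHub IP is 20.205.243.166

ifconfig shows a bandwidth cap of 21 TB for the ibs5 interface, so it must be the InfiniBand card.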

sudo tcpdump -i ibs5 '((tcp) and (host 20.205.243.166))'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ibs5, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:53:53.848619 IP snode0.59878 > 20.205.243.166.http: Flags [S], seq 879685062, win 64128, options [mss 2004,sackOK,TS val 4096492456 ecr 0,nop,wscale 7], length 0
15:53:53.952705 IP 20.205.243.166.http > snode0.59878: Flags [S.], seq 1917452372, ack 879685063, win 65535, options [mss 1436,sackOK,TS val 1127310087 ecr 4096492456,nop,wscale 10], length 0
15:53:53.952728 IP snode0.59878 > 20.205.243.166.http: Flags [.], ack 1, win 501, options [nop,nop,TS val 4096492560 ecr 1127310087], length 0
15:53:53.953208 IP snode0.59878 > 20.205.243.166.http: Flags [P.], seq 1:79, ack 1, win 501, options [nop,nop,TS val 4096492561 ecr 1127310087], length 78: HTTP: GET / HTTP/1.1
15:53:54.058654 IP 20.205.243.166.http > snode0.59878: Flags [P.], seq 1:89, ack 79, win 64, options [nop,nop,TS val 1127310193 ecr 4096492561], length 88: HTTP: HTTP/1.1 301 Moved Permanently
15:53:54.058668 IP snode0.59878 > 20.205.243.166.http: Flags [.], ack 89, win 501, options [nop,nop,TS val 4096492666 ecr 1127310193], length 0
15:53:54.059092 IP snode0.59878 > 20.205.243.166.http: Flags [F.], seq 79, ack 89, win 501, options [nop,nop,TS val 4096492667 ecr 1127310193], length 0
15:53:54.162608 IP 20.205.243.166.http > snode0.59878: Flags [F.], seq 89, ack 80, win 64, options [nop,nop,TS val 1127310297 ecr 4096492667], length 0

$ sudo tcpdump -i ibs5 -nn -vvv -e '((port 80) and (tcp) and (host 20.205.243.166))'
tcpdump: listening on ibs5, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
16:09:38.743478 Out ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 64, id 15215, offset 0, flags [DF], proto TCP (6), length 60)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [S], cksum 0x1fd5 (incorrect -> 0x98b6), seq 1489092902, win 64128, options [mss 2004,sackOK,TS val 4097437351 ecr 0,nop,wscale 7], length 0
16:09:38.848164  In ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 48, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    20.205.243.166.80 > 10.1.13.50.38376: Flags [S.], cksum 0x69ba (correct), seq 3753100548, ack 1489092903, win 65535, options [mss 1436,sackOK,TS val 3712395681 ecr 4097437351,nop,wscale 10], length 0
16:09:38.848212 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15216, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [.], cksum 0x1fcd (incorrect -> 0x9613), seq 1, ack 1, win 501, options [nop,nop,TS val 4097437456 ecr 3712395681], length 0
16:09:38.848318 Out ethertype IPv4 (0x0800), length 146: (tos 0x0, ttl 64, id 15217, offset 0, flags [DF], proto TCP (6), length 130)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [P.], cksum 0x201b (incorrect -> 0x9f0a), seq 1:79, ack 1, win 501, options [nop,nop,TS val 4097437456 ecr 3712395681], length 78: HTTP, length: 78
        GET / HTTP/1.1
        Host: www.github.com
        User-Agent: curl/7.68.0
        Accept: */*

16:09:38.954152  In ethertype IPv4 (0x0800), length 156: (tos 0x0, ttl 48, id 45056, offset 0, flags [DF], proto TCP (6), length 140)
    20.205.243.166.80 > 10.1.13.50.38376: Flags [P.], cksum 0x024d (correct), seq 1:89, ack 79, win 64, options [nop,nop,TS val 3712395786 ecr 4097437456], length 88: HTTP, length: 88
        HTTP/1.1 301 Moved Permanently
        Content-Length: 0
        Location: https://www.github.com/

16:09:38.954207 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15218, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [.], cksum 0x1fcd (incorrect -> 0x949a), seq 79, ack 89, win 501, options [nop,nop,TS val 4097437562 ecr 3712395786], length 0
16:09:38.954884 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15219, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [F.], cksum 0x1fcd (incorrect -> 0x9498), seq 79, ack 89, win 501, options [nop,nop,TS val 4097437563 ecr 3712395786], length 0
16:09:39.060177  In ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 48, id 45057, offset 0, flags [DF], proto TCP (6), length 52)
    20.205.243.166.80 > 10.1.13.50.38376: Flags [F.], cksum 0x95e2 (correct), seq 89, ack 80, win 64, options [nop,nop,TS val 3712395892 ecr 4097437563], length 0
16:09:39.060221 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15220, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [.], cksum 0x1fcd (incorrect -> 0x93c4), seq 80, ack 90, win 501, options [nop,nop,TS val 4097437668 ecr 3712395892], length 0
16:09:46.177269 Out ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 64, id 38621, offset 0, flags [DF], proto TCP (6), length 60)

The IP of snode0 is 10.1.13.50

traceroute

mtr = traceroute+ping

$ traceroute www.baidu.com
traceroute to www.baidu.com (182.61.200.6), 30 hops max, 60 byte packets
1  acsa-nfs (10.1.13.1)  0.179 ms  0.180 ms  0.147 ms
2  192.168.252.1 (192.168.252.1)  2.016 ms  1.954 ms  1.956 ms
3  202.38.75.254 (202.38.75.254)  4.942 ms  3.941 ms  4.866 ms

The traceroute command shows the path that packets take to a host.

NetworkManager management

# shaojiemike @ snode0 in /etc/NetworkManager [16:49:55]
$ nmcli general status
STATE         CONNECTIVITY  WIFI-HW  WIFI     WWAN-HW  WWAN
disconnected  unknown       enabled  enabled  enabled  enabled

# shaojiemike @ snode0 in /etc/NetworkManager [16:50:40]
$ nmcli connection show
NAME                     UUID                                  TYPE        DEVICE
InfiniBand connection 1  7edf4eea-0591-48ba-868a-e66e8cb720ce  infiniband  --

It looks like it was used at some point before.

# shaojiemike @ snode0 in /etc/NetworkManager [16:56:36] C:127
$ service network-manager status
● NetworkManager.service - Network Manager
     Loaded: loaded (/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-03-14 11:52:06 CST; 1 months 10 days ago
       Docs: man:NetworkManager(8)
   Main PID: 1339 (NetworkManager)
      Tasks: 3 (limit: 154500)
     Memory: 12.0M
     CGroup: /system.slice/NetworkManager.service
             └─1339 /usr/sbin/NetworkManager --no-daemon

Warning: some journal files were not opened due to insufficient permissions.

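It was probably set up following "Secure site-to-site connection with Linux IPsec VPN".

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

FJW says that all network traffic leaves together through the NFS server.

References

None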

April 24, 2022
Category: Tutorials
Reading time: 1 minute

Servers

Rebooting and configuring a machine remotely through the static IP of its IPMI chip

https://cloud.tencent.com/developer/article/1448642

Group

Current groups

shaojiemike@snode6:~$ groups shaojiemike
shaojiemike : staff sudo

All groups

cat /etc/group

User

whoami

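Where ordinary users are listed

/etc/passwd

LDAP tutorial

If you cannot find yourself in /etc/passwd, the machine most likely uses LDAP for centralized authentication, which lets the same account log in on multiple machines.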

 getent passwd

first reboot server

ctrl + alt + F3     #jump into command line
login
su - {user-name}
sudo -s
sudo -i
# If invoked without a user name, su defaults to becoming the superuser
ip a |less          # check the IP address; after fjw set up a static IP this is no longer an issue

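Preventing the current shell user from exhausting memory

Crashes are usually caused by running out of memory; the number of processes is normally kept within the number of physical cores anyway.

Put a memory limit of 25*1024*1024 KB = 25 GB into zshrc:

ulimit -v 26214400

When a program started from this shell exceeds the limit, it terminates with a Memory Error.

Test: read a 200 GB file into memory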

with open("/home/shaojiemike/test/DynamoRIO/OpenBLASRawAssembly/openblas_utest.log", 'r') as f:
    data= f.readlines()
    print(len(data))

Some articles say that this does not work on certain Linux kernel versions.
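
The same cap can also be applied from inside a Python process instead of the shell. This is a minimal sketch assuming Linux, where ulimit -v corresponds to RLIMIT_AS in the standard resource module (measured in bytes there, not KB); the function name limit_address_space is an assumption made for this example.

```python
import resource


def limit_address_space(max_bytes: int) -> int:
    """Cap the virtual memory of this process; return the soft limit applied."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Never raise a limit that is already stricter than the requested one.
    for current in (soft, hard):
        if current != resource.RLIM_INFINITY:
            max_bytes = min(max_bytes, current)
    # Only the soft limit changes; the hard limit stays as it was.
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return max_bytes
```

After limit_address_space(25 * 1024 ** 3), an allocation that would push the process past 25 GB of address space is expected to fail with a MemoryError rather than taking the machine down.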

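April 13, 2022
Category: Programming
Reading time: 6 minutes

PyTorchGeometric

PyTorch Geometric Library

PyG is a PyTorch-based library for irregular data such as graphs; put differently, it is a framework for quickly implementing representation learning on graphs and similar data. It is fast: according to the paper, training can be up to 40 times faster than DGL (Deep Graph Library) v0.2. Besides speed, PyG bundles many methods proposed in papers (GCN, SGC, GAT, SAGE, etc.) and common datasets, which makes it convenient for reproducing papers.

Ready-made functions exist only for the classic models. For your own model you have to implement the layers yourself on top of automatic differentiation, and write the GPU parallelism yourself as well.

MessagePassing is the core of how nodes in the network interact.

Data

How data is stored

torch_geometric.data.Data (Data for short below) is used to build a graph:

Node features x
Shape [num_nodes, num_node_features].
Edges between nodes edge_index
Shape [2, num_edges].
Node labels y
If present. Shape [num_nodes, *].
Edge features edge_attr
Shape [num_edges, num_edge_features].

Custom attributes are supported

Data can be extended with extra attributes such as data.face.

Getting data

PyG does not use this style. Instead, the get() function returns an object of type torch_geometric.data.Data for a given index, and that Data object holds both the data and the label.

Data handling example

Because the graph is undirected, it has 4 edges: (0 -> 1), (1 -> 0), (1 -> 2), (2 -> 1). Every node has its own feature. The graph above can be represented with torch_geometric.data.Data as follows: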

import torch
from torch_geometric.data import Data
# The graph is undirected, so it has 4 edges: (0 -> 1), (1 -> 0), (1 -> 2), (2 -> 1)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
# Node features
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)

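Note how edge_index stores the edges: it holds two lists, the first with the source node of each edge and the second with the target node. Compare this with the storage format below.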

import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1],
                           [1, 0],
                           [1, 2],
                           [2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index.t().contiguous())

In this case edge_index must be transposed first and then made contiguous with contiguous(). For what contiguous() does, see the article on contiguous in PyTorch.
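
The relation between the two layouts can be sketched in plain Python, without torch. The helper undirected_to_coo is hypothetical: it expands each undirected edge into both directions and returns the [2, num_edges] layout as two lists.

```python
def undirected_to_coo(edges):
    """Expand undirected edges (u, v) into [[sources...], [targets...]]."""
    pairs = []
    for u, v in edges:
        # An undirected edge is stored as two directed edges.
        pairs.append((u, v))
        pairs.append((v, u))
    # Transpose the list of pairs: one row of sources, one row of targets.
    sources = [src for src, _ in pairs]
    targets = [dst for _, dst in pairs]
    return [sources, targets]
```

For the three-node path graph, undirected_to_coo([(0, 1), (1, 2)]) gives the same two rows as the first edge_index tensor in this section.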

Datasets

Dataset

import torch
from torch_geometric.data import InMemoryDataset


class MyOwnDataset(InMemoryDataset): # or (Dataset)
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    # Returns a list with the names of the raw (unprocessed) files. With a single file, the list has one element. You can also return an empty list and locate your files later inside process().
    @property
    def raw_file_names(self):
        return ['some_file_1', 'some_file_2', ...]

    # Similar to the previous property: returns a list with the names of the processed files. After process() has run, the list usually holds a single element, the file that stores the processed data.
    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        pass
        # Download to `self.raw_dir`. or just pass

    # Gather your data into a list of Data objects, then call self.collate() to compute the slices that the DataLoader will use.
    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

DataLoader

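The DataLoader class lets you feed data in batches. To create a DataLoader instance, simply specify the dataset and the batch size you want.

loader = DataLoader(dataset, batch_size=512, shuffle=True)

Each iteration of the DataLoader yields a Batch object. It is very similar to a Data object but carries an extra batch attribute, which records the graph each node belongs to. Because the DataLoader merges x, y and edge_index from different graphs into one batch, the GNN model needs the batch information to know which node belongs to which graph.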

for batch in loader:
    batch
    >>> Batch(x=[1024, 21], edge_index=[2, 1568], y=[512], batch=[1024])

MessagePassing (core)
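
For reference, the general message passing scheme can be written as follows. The notation follows the formulation in the PyG documentation as far as I recall it, so treat the exact symbols as an assumption rather than a quotation:

\[ \mathbf{x}_i^{(k)} = \gamma^{(k)} \left( \mathbf{x}_i^{(k-1)}, \square_{j \in \mathcal{N}(i)} \, \phi^{(k)}\left( \mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)}, \mathbf{e}_{j,i} \right) \right) \]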

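Here, x denotes the node embeddings, e the edge features, ϕ the message function, □ the aggregation function, and γ the update function. The superscript is the layer index; for example, when k = 1, the input x^(0) is the graph-structured data fed into the network.

To implement this, we need to define:

message
Defines how a message is generated for each node pair (x_i, x_j).
update
aggregation scheme
propagate(edge_index, size=None, **kwargs)
Calls the message, aggregate and update functions in that order.
update(aggr_out, **kwargs)
Uses the aggregated messages to update the embedding of each node.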

propagate(edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None, **kwargs)

edge_index (Tensor or SparseTensor)
The edge information passed in; it defines the underlying graph connectivity / message passing flow.
As a torch.LongTensor
1. its shape must be defined as [2, num_messages], where messages from nodes in edge_index[0] are sent to nodes in edge_index[1]
As a torch_sparse.SparseTensor
1. its sparse indices (row, col) should relate to row = edge_index[1] and col = edge_index[0].
The adjacency matrix does not have to be square either: for two different node sets, pass x=(x_N, x_M).

MessagePassing.message(...)

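It distinguishes which end of each edge is handled according to flow="source_to_target" or flow="target_to_source", or via x_i and x_j.

x_j denotes a lifted tensor that holds the source node features of each edge, i.e. the neighbors of each node. Node features can be lifted automatically by appending _i or _j to the variable name. In fact, any tensor can be converted this way, as long as it holds source or target node features.

_j refers to the start node of each edge and _i to the end node. x_j is therefore the x value (the feature) of each edge's start node. Any extra tensor you pass in manually gets its _j and _i variants handled automatically as well; stepping through the code once makes this clear.

When implementing message, node features are automatically mapped to their respective source and target nodes.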

aggregate(inputs: torch.Tensor, index: torch.Tensor, ptr: Optional[torch.Tensor] = None, dim_size: Optional[int] = None, aggr: Optional[str] = None) → torch.Tensor

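For the aggregation scheme you only need to set a parameter: the "add", "mean", "min", "max" and "mul" operations are available.

MessagePassing.update(aggr_out, ...)

The aggregation output is the first argument; the remaining arguments are those passed to propagate().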
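
The division of labour between message, aggregate and update can be sketched without PyG. The code below is a plain-Python illustration with scalar node features, not the library's implementation; the function propagate here is a stand-in that only mimics the call order.

```python
from collections import defaultdict


def propagate(edge_index, x, message, aggregate, update):
    """Run one message-passing step over a graph given as [sources, targets]."""
    sources, targets = edge_index
    inbox = defaultdict(list)
    for j, i in zip(sources, targets):
        # A message flows from source node j to target node i.
        inbox[i].append(message(x[i], x[j]))
    # Each node aggregates its inbox (possibly empty), then updates itself.
    return [update(x[i], aggregate(inbox[i])) for i in range(len(x))]


# Path graph 0 - 1 - 2 with one scalar feature per node.
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
x = [1.0, 2.0, 4.0]
out = propagate(
    edge_index,
    x,
    message=lambda x_i, x_j: x_j,            # send the neighbor's feature
    aggregate=sum,                           # "add" aggregation
    update=lambda x_i, aggr_out: aggr_out,   # keep the aggregated value
)
print(out)  # [2.0, 5.0, 2.0]
```

Swapping sum for max or a mean changes only the aggregate argument, which mirrors how the aggr parameter is the only thing that changes in PyG.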

实现GCN 例子

\[ \mathbf{x}_i^{(k)} = \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}} \cdot \left( \mathbf{\Theta}^{\top} \cdot \mathbf{x}_j^{(k-1)} \right) \]

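The formula first multiplies the neighboring node features by the weight matrix \Theta, then normalizes them by the node degrees, and finally sums them up.

The steps can be broken down as follows:

Add self-loops to the adjacency matrix.
Linearly transform the node features.
Compute the normalization coefficients.
Normalize the node features.
Sum up the features of neighboring nodes ("add" aggregation).

Steps 1 to 3 are computed before message passing takes place. Steps 4 and 5 are handled by the torch_geometric.nn.MessagePassing class.

The self-loops make sure that each node's own feature is included in the aggregation; without them, only the neighbors' information would be aggregated.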

import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # "Add" aggregation (Step 5).
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        # Step 1: Add self-loops to the adjacency matrix.
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        # Step 2: Linearly transform node feature matrix.
        x = self.lin(x)

        # Step 3: Compute normalization.
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]

        # Step 4-5: Start propagating messages.
        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]

        # Step 4: Normalize node features.
        return norm.view(-1, 1) * x_j

All the logic lives in forward(). When we call propagate(), it internally calls message(), aggregate() and update().

Example of using GCN

conv = GCNConv(16, 32)
x = conv(x, edge_index)
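
To check the arithmetic of the five steps by hand, here is a dependency-free sketch with scalar node features. It is an illustration under simplifying assumptions, not the PyG layer: a single scalar weight stands in for the learned matrix, and the function name gcn_layer is made up.

```python
import math


def gcn_layer(x, edge_index, weight=1.0):
    """One GCN step for scalar node features, following the five steps."""
    num_nodes = len(x)
    sources, targets = edge_index
    # Step 1: add a self-loop (i, i) for every node.
    sources = list(sources) + list(range(num_nodes))
    targets = list(targets) + list(range(num_nodes))
    # Step 2: linear transform (a scalar weight stands in for Theta).
    h = [weight * value for value in x]
    # Step 3: degree of each node, counted over incoming edges.
    deg = [0] * num_nodes
    for i in targets:
        deg[i] += 1
    # Steps 4-5: normalize each message and sum it into its target node.
    out = [0.0] * num_nodes
    for j, i in zip(sources, targets):
        out[i] += h[j] / math.sqrt(deg[i] * deg[j])
    return out


# Path graph 0 - 1 - 2 with features -1, 0, 1.
result = gcn_layer([-1.0, 0.0, 1.0], [[0, 1, 1, 2], [1, 0, 2, 1]])
```

With self-loops the degrees are 2, 3 and 2, so node 0 ends up with -1/2, node 1 with 0 (its two neighbors cancel) and node 2 with 1/2.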

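SAGE example

For the aggregation function we use max pooling, so the AGGREGATE in the formula above can be written as follows: each neighbor node is multiplied by a weight matrix, a bias is added, and the result is passed through an activation function. The corresponding code (matching the second figure) is: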

class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr='max')
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.act = torch.nn.ReLU()

    def message(self, x_j):
        # x_j has shape [E, in_channels]

        x_j = self.lin(x_j)
        x_j = self.act(x_j)

        return x_j

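For the update method, we combine the aggregated result with each node's own embedding, then apply the weight matrix and bias (matching the second line of the first figure):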

class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr='max')
        self.update_lin = torch.nn.Linear(in_channels + out_channels, in_channels, bias=False)
        self.update_act = torch.nn.ReLU()

    def update(self, aggr_out, x):
        # aggr_out has shape [N, out_channels]

        new_embedding = torch.cat([aggr_out, x], dim=1)
        new_embedding = self.update_lin(new_embedding)
        new_embedding = self.update_act(new_embedding)

        return new_embedding

Putting it all together, the SAGEConv layer is defined as follows:

import torch
from torch.nn import Sequential as Seq, Linear, ReLU
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import remove_self_loops, add_self_loops
class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr='max') #  "Max" aggregation.
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.act = torch.nn.ReLU()
        self.update_lin = torch.nn.Linear(in_channels + out_channels, in_channels, bias=False)
        self.update_act = torch.nn.ReLU()

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        # Removes every self-loop in the graph given by edge_index, so that (i,i)∉E for every i ∈ V.
        edge_index, _ = remove_self_loops(edge_index)
        # Adds a self-loop (i,i)∈ E to every node i ∈ V in the graph given by edge_index
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))


        return self.propagate(edge_index, size=(x.size(0), x.size(0)), x=x)

    def message(self, x_j):
        # x_j has shape [E, in_channels]

        x_j = self.lin(x_j)
        x_j = self.act(x_j)

        return x_j

    def update(self, aggr_out, x):
        # aggr_out has shape [N, out_channels]


        new_embedding = torch.cat([aggr_out, x], dim=1)

        new_embedding = self.update_lin(new_embedding)
        new_embedding = self.update_act(new_embedding)

        return new_embedding

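Implementing batches

Batching in a GNN differs from the conventional approach.

zzq's view

Replicate the network batch times, so that batchSize samples produce batchSize losses. Combine the losses with Sum or Max and update all network parameters together. As for the H^(t-1) and H^t that are fed back through the network as input and output, simply averaging them seems good enough.

Some possible problems: 1. When the parameters are not in linear layers, as in a CNN, does PyTorch parallelize automatically, or does that have to be done by hand? 2. If you still want to use PyG's x and edges, you cannot add an extra dimension.

The conventional approach in image and language processing:

Use rescaling or padding to bring all examples to the same size and stack them along a newly added dimension, whose size is the batch_size.

Because of how graph neural networks represent edges and nodes, however, the conventional approach is either infeasible or duplicates data and wastes a lot of memory.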

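ADVANCED MINI-BATCHING in PyG

PyG therefore introduces advanced mini-batching to parallelize over a large number of examples.

https://pytorch-geometric.readthedocs.io/en/latest/notes/batching.html

Implementation:

Adjacency matrices are stacked diagonally (creating one giant graph that holds multiple isolated subgraphs).
Node and target features are simply concatenated along the node dimension.

Advantages

GNN operators that rely on a message passing scheme need no modification, because messages still cannot be exchanged between two nodes that belong to different graphs.
There is no computational or memory overhead. For example, this batching procedure works entirely without padding node or edge features. Note that the adjacency matrices cause no extra memory overhead, since they are stored sparsely and hold only the non-zero entries, i.e. the edges.

torch_geometric.loader.DataLoader

It merges multiple graphs into one big graph as a batch. This is done by overriding collate(), and it inherits all arguments of the PyTorch DataLoader, such as num_workers.

When merging, edge_index [2, num_edges] grows along its second dimension; everything else (the node attributes) grows along the first dimension.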

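The most important effect

# Each graph originally has an edge_index of shape [2, 4]
# A hand-written version that simply concatenates gives
 >>> tensor([[0, 0, 1, 1, 0, 0, 1, 1],
             [0, 1, 1, 2, 0, 1, 1, 2]])
# The DataLoader shifts the indices so that they form new edges
 print(batch.edge_index)
 >>> tensor([[0, 0, 1, 1, 2, 2, 3, 3],
             [0, 1, 1, 2, 3, 4, 4, 5]])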
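
The index shifting behind diagonal stacking can be sketched in plain Python. This sketch assumes ordinary graphs in which both rows of edge_index refer to the same node set, so both rows are shifted by the same cumulative node count; the function name batch_graphs is hypothetical and this is not PyG's collate code.

```python
def batch_graphs(graphs):
    """Merge (num_nodes, edge_index) pairs into one big disconnected graph."""
    sources, targets, batch = [], [], []
    offset = 0
    for graph_id, (num_nodes, edge_index) in enumerate(graphs):
        # Shift node ids so that each graph occupies its own index range.
        sources += [s + offset for s in edge_index[0]]
        targets += [t + offset for t in edge_index[1]]
        # batch[n] records which graph node n came from.
        batch += [graph_id] * num_nodes
        offset += num_nodes
    return [sources, targets], batch


# Two copies of the three-node path graph 0 - 1 - 2.
path = (3, [[0, 1, 1, 2], [1, 0, 2, 1]])
merged_edges, batch_vector = batch_graphs([path, path])
```

The second copy is shifted by three, so its edges connect nodes 3, 4 and 5 only, and batch_vector marks the first three nodes as graph 0 and the last three as graph 1. No edge crosses between the two graphs, which is why message passing needs no changes.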

torch_geometric.loader.DataLoader example 1

from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

data_list = [Data(...), ..., Data(...)]
loader = DataLoader(data_list, batch_size=32)

torch_geometric.loader.DataLoader example 2

from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    batch
    >>> DataBatch(batch=[1082], edge_index=[2, 4066], x=[1082, 21], y=[32])

    batch.num_graphs
    >>> 32


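April 13, 2022
Category: Artificial Intelligence
Reading time: 2 minutes

GNN

Graph Neural Networks (GNN) and their characteristics

GNNs can analyze the relationships between objects, which enables accurate recommendations.
Problems
Graphs are irregular: each graph has a variable number of unordered nodes, and each node has a different number of neighbors, so operations such as convolution do not apply to graphs directly.
A core assumption of existing deep learning algorithms is that data samples are independent of each other. In a graph, every data sample (node) is connected by edges to other data samples (nodes), and this information can be used to capture the dependencies between instances.

Graph embedding & network embedding

Research on graph neural networks is closely related to graph embedding (readers unfamiliar with graph embedding can refer to the article 《图嵌入综述》, a survey of graph embedding) or network embedding.

Real-world graphs (networks) are often high-dimensional and hard to handle; the goal of graph embedding is to find low-dimensional vector representations of high-dimensional graphs.

Graph analysis tasks

Node classification,
link prediction,
clustering,
visualization

Categories of graph neural networks

Graph Convolution Networks (GCN)
Graph Attention Networks
A Graph Attention Network (GAT) is a spatial-based graph convolution network; when aggregating feature information, it uses an attention mechanism to determine the weights of a node's neighbors.
Graph Autoencoders
Graph Generative Networks
Graph Spatial-temporal Networks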

https://mp.weixin.qq.com/s/PSrgm7frsXIobSrlcoCWxw

https://zhuanlan.zhihu.com/p/142948273

https://developer.huaweicloud.com/hero/forum.php?mod=viewthread&tid=109580

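March 31, 2022
Category: Tutorials
Reading time: 2 minutes

https://www.cnblogs.com/dufeixiang/p/11624210.html

Changing the shell

It is complicated and buggy, so I will just edit the profile instead.

https://ibug.io/blog/2022/03/linux-openldap-server/#user-chsh

Mounts

Both are mounted from the same place, so the contents are necessarily identical.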

# shaojiemike @ snode2 in ~ [20:18:20]
$ df -h .
Filesystem       Size  Used Avail Use% Mounted on
10.1.13.1:/home   15T   11T  3.1T  78% /staff

# shaojiemike @ snode0 in ~ [20:25:51]
$ mount|grep staff
10.1.13.1:/home on /staff type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.13.50,local_lock=none,addr=10.1.13.1)

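tmpfs is a filesystem that lives in virtual memory (RAM and, if needed, swap) rather than on disk.

Configuration

The actual configuration has to be done by logging in to the central machine.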

# shaojiemike @ hades1 in ~ [20:41:06]
$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 hades1
# 222.195.72.30 hades0
# 202.38.72.64 hades1
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

114.214.198.26  synology
10.1.13.1       acsa-nfs
10.1.13.6       discovery
10.1.13.50      snode0
10.1.13.51      snode1
10.1.13.52      snode2
10.1.13.53      snode3
10.1.13.54      snode4
10.1.13.55      snode5
10.1.13.56      snode6
10.1.13.114     swabl
10.1.13.119     node19
10.1.13.102     node2
10.1.13.58      hades0
10.1.13.57      hades1

# shaojiemike @ snode0 in ~ [20:36:26]
$ sudo cat /etc/nslcd.conf
# /etc/nslcd.conf
# nslcd configuration file. See nslcd.conf(5)
# for details.

# The user and group nslcd should run as.
uid nslcd
gid nslcd

# The location at which the LDAP server(s) should be reachable.
uri ldaps://ldap.swangeese.fun

Further study needed

How many machines are involved in total

March 30, 2022
Category: Tutorials
Reading time: 1 minute

March 30, 2022
Category: Tutorials
Reading time: 5 minutes

Python MPI

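Global Interpreter Lock (GIL)

The execution of Python code is controlled by the Python virtual machine (the interpreter).

Access to the Python virtual machine is controlled by the Global Interpreter Lock (GIL), which guarantees that only one thread runs at any given time. As a result, even if you set up a multithreaded job, only one thread executes Python code at a time.

I/O-bound programs (such as crawlers) fare better: I/O operations call into built-in operating system C code, and the GIL is released during that time, which gives a partial multithreading effect.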
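
The I/O-bound case can be sketched with the standard library. In this minimal sketch, time.sleep stands in for a blocking I/O call (it releases the GIL while waiting); the names fake_io and run_concurrently are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(seconds)
    return seconds


def run_concurrently(n_tasks=5, seconds=0.2):
    """Run n_tasks sleeping tasks in threads; return the elapsed wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # All tasks wait at the same time instead of one after another.
        list(pool.map(fake_io, [seconds] * n_tasks))
    return time.perf_counter() - start
```

Five tasks of 0.2 s each would take about 1 s one after another; run in threads they overlap and finish in roughly the time of a single task. A CPU-bound function in place of fake_io would not overlap in this way, because it holds the GIL while it computes.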

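The interpreter we normally use is CPython, the official implementation. To truly use multiple cores from threads, you would need an interpreter without the GIL; running several processes, as MPI does, avoids the problem because each process has its own interpreter.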