教學課程

在這個學習筆記中，我們將概覽 scikit-network，一個開源的 Python 套件，可協助您在圖形上進行機器學習。

1. 安裝

安裝 scikit-network 最簡單的方法就是透過 pip。

[1]:

# uncomment to install
# !pip install scikit-network

2. 數據

鄰接矩陣

每個包含 \(n\) 個節點的圖形都使用一個大小為 \(n \times n\) 的方形鄰接矩陣 \(A\) 表示。最簡單的形式下，矩陣 \(A\) 是二元的，且只有在節點 \(i\) 與節點 \(j\) 之間有連結時，\(A_{ij} =1\)。如果圖形無導向，矩陣 \(A\) 是對稱的。

[2]:

import numpy as np

[3]:

adjacency = np.array([[0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [1, 0, 0, 1, 0], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0]])

[4]:

adjacency

[4]:

array([[0, 1, 1, 1, 0],
       [1, 0, 0, 0, 1],
       [1, 0, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0]])

稀疏矩陣

在 scikit-network 中，鄰接矩陣使用 scipy 的稀疏 CSR 格式表示。

[5]:

from scipy import sparse

[6]:

adjacency = sparse.csr_matrix(adjacency)

[7]:

adjacency

[7]:

<5x5 sparse matrix of type '<class 'numpy.int64'>'
        with 12 stored elements in Compressed Sparse Row format>

視覺化

以下是上述圖形的視覺化。

[8]:

from IPython.display import SVG
from sknetwork.visualization import visualize_graph

[9]:

# name nodes by indices
n_nodes, _ = adjacency.shape
names = np.arange(n_nodes)

[10]:

# visualization
SVG(visualize_graph(adjacency, names=names))

[10]:

../../_images/tutorials_overview_get_started_16_0.svg

有向圖

鄰接矩陣不一定對稱。可能從節點 \(i\) 到節點 \(j\) 有連線，但從節點 \(j\) 到節點 \(i\) 沒有連線。

[11]:

adjacency = np.array([[0, 1, 1, 1, 0], [0, 0, 0, 1, 1], [0, 1, 0, 0, 0], [1, 0, 0, 0, 1], [0, 1, 0, 0, 0]])

[12]:

adjacency

[12]:

array([[0, 1, 1, 1, 0],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 1],
       [0, 1, 0, 0, 0]])

[13]:

adjacency = sparse.csr_matrix(adjacency)

[14]:

# visualization
SVG(visualize_graph(adjacency, names=names))

[14]:

../../_images/tutorials_overview_get_started_21_0.svg

二部圖

二部圖是在兩組節點間有邊緣的特殊圖形。二部圖可以用其雙鄰接矩陣 \(B\) 來表示，\(B\) 是連結兩組節點的矩形矩陣。當且僅當第一組的節點 \(i\) (\(B\) 的行列表示) 和第二組的節點 \(j\) (\(B\) 的欄位表示) 之間有邊緣時，\(B_{ij}=1\)。

[15]:

biadjacency = np.array([[1, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 1, 1, 1, 0], [0, 0, 0, 1, 1]])

[16]:

biadjacency

[16]:

array([[1, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [1, 1, 1, 1, 0],
       [0, 0, 0, 1, 1]])

[17]:

biadjacency = sparse.csr_matrix(biadjacency)

[18]:

biadjacency

[18]:

<4x5 sparse matrix of type '<class 'numpy.int64'>'
        with 11 stored elements in Compressed Sparse Row format>

[19]:

from sknetwork.visualization import visualize_bigraph

[20]:

# name nodes by indices
n_row, n_col = biadjacency.shape
names_row = np.arange(n_row)
names_col = np.arange(n_col)

[21]:

# visualization
SVG(visualize_bigraph(biadjacency, names_row=names_row, names_col=names_col, reorder=False))

[21]:

../../_images/tutorials_overview_get_started_29_0.svg

加權圖形

邊緣的權重可以用具有非負輸入項目的鄰接矩陣 (或二部圖的雙鄰接矩陣) 來表示。

[22]:

adjacency = np.array([[0, 2, 1, 3, 0], [2, 0, 0, 0, 3], [1, 0, 0, 2, 0], [3, 0, 2, 0, 1], [0, 3, 0, 1, 0]])

[23]:

adjacency

[23]:

array([[0, 2, 1, 3, 0],
       [2, 0, 0, 0, 3],
       [1, 0, 0, 2, 0],
       [3, 0, 2, 0, 1],
       [0, 3, 0, 1, 0]])

[24]:

adjacency = sparse.csr_matrix(adjacency)

[25]:

# visualization
SVG(visualize_graph(adjacency, names=names))

[25]:

../../_images/tutorials_overview_get_started_34_0.svg

子圖形

子圖形可以透過切割鄰接矩陣輕鬆取得。

[26]:

index = [0, 2, 3, 4]
sub_adjacency = adjacency[index][:, index]

[27]:

# visualization
SVG(visualize_graph(sub_adjacency, names=names[index]))

[27]:

../../_images/tutorials_overview_get_started_37_0.svg

3. 算算法

基本工具

一些用於讀取、處理和視覺化圖形的工具可用作函式。

模組	說明	函式
數據	讀取和儲存圖形	`from_csv`, `save`, `load`, `load_netset`, …
拓撲	連線性和結構	`get_connected_components`, `is_acyclic`, …
路徑	最短路徑和距離	`get_distances`, `get_shortest_path`, …
線性代數	矩陣運算	`normalize`, `diagonal_pseudo_inverse`, …
公用程式	有用工具	`directed2undirected`, `get_degrees`, `get_neighbors`, …
視覺化	視覺化工具	`visualize_graph`, `svg_bigraphs`, …

[28]:

from sknetwork.data import karate_club
from sknetwork.utils import get_degrees

[29]:

adjacency = karate_club()

[30]:

get_degrees(adjacency)

[30]:

array([16,  9, 10,  6,  3,  4,  4,  4,  5,  2,  3,  1,  2,  5,  2,  2,  2,
        2,  2,  3,  2,  2,  2,  5,  3,  3,  2,  4,  3,  4,  4,  6, 12, 17],
      dtype=int32)

機器學習

Scikit-network 適用於圖形的機器學習。

每種演算法可用作具有下列方法之一的物件

fit：根據圖形調整演算法。此方法為強制選項。
predict：預測輸出（例如：節點的標籤）。
predict_proba：預測帶有機率的輸出（例如：標籤上的機率分布）。
transform：轉換圖形。
fit_predict：套用 fit + predict。
fit_predict_proba：套用 fit + predict_proba。
fit_transform：套用 fit + transform。

輸出是物件的屬性，並帶了下底線（例如：labels_）。

模組	說明	演算法	輸出
分群	建立節點叢集	`Louvain`、`PropagationClustering`	`labels_`
分類	對節點分類	`DiffusionClassifier`、`NNClassifier`，…	`labels_`
GNN	圖形神經網路	`GNNClassifier`	`labels_`
迴歸	將值指派給節點	`Diffusion`、`Dirichlet`	`values_`
階層	取得節點的階層表示	`Paris`、`LouvainHierarchy`	`dendrogram_`
嵌入	取得節點的向量表示	`Spectral`、`SVD`、`LouvainEmbedding`，…	`embedding_`
排名	針對節點賦予重要性分數	`PageRank`、`Katz`、`Betweenness`，…	`scores_`
連結預測	預測節點之間的連結	`NNLinker`	`links_`

[31]:

from sknetwork.data import karate_club
from sknetwork.classification import DiffusionClassifier

[32]:

dataset = karate_club(metadata=True)
adjacency = dataset.adjacency
position = dataset.position
labels = dataset.labels

[33]:

classifier = DiffusionClassifier()
classifier.fit(adjacency, {node: labels[node] for node in [0, 1, 33]})

[33]:

DiffusionClassifier(n_iter=10, centering=True, scale=5)

[34]:

labels_pred = classifier.predict()

[35]:

SVG(visualize_graph(adjacency, position, labels=labels_pred))

[35]:

../../_images/tutorials_overview_get_started_51_0.svg

[36]:

probs = classifier.predict_proba()

[37]:

scores = probs[:, 1]

[38]:

SVG(visualize_graph(adjacency, position, scores=scores))

[38]:

../../_images/tutorials_overview_get_started_54_0.svg

4. 範例

在此，我們示範如何使用 scikit-network 為圖形進行分群，這張圖形最初儲存在純文字檔案中，並呈邊緣清單的形式。

資料

[39]:

# show first lines
with open('miserables.tsv', 'r') as f:
    data = f.readlines()
    print(''.join(data[:5]))

Myriel  Napoleon
Myriel  Mlle Baptistine
Myriel  Mme Magloire
Myriel  Countess de Lo
Myriel  Geborand

載入

[40]:

from sknetwork.data import from_csv

[41]:

graph = from_csv('miserables.tsv')

[42]:

adjacency = graph.adjacency
names = graph.names

[43]:

adjacency

[43]:

<77x77 sparse matrix of type '<class 'numpy.int64'>'
        with 508 stored elements in Compressed Sparse Row format>

內嵌

在此，我們計算一個 2D 內嵌以供視覺化。

[44]:

from sknetwork.embedding import Spring

[45]:

algorithm = Spring()

[46]:

position = algorithm.fit_transform(adjacency)

[47]:

SVG(visualize_graph(adjacency, position, names=names))

[47]:

../../_images/tutorials_overview_get_started_67_0.svg

分群

最後，我們使用 Louvain 演算法為該圖形進行分群。

[48]:

from sknetwork.clustering import Louvain

[49]:

algorithm = Louvain()
labels = algorithm.fit_predict(adjacency)

[50]:

SVG(visualize_graph(adjacency, position, names=names, labels=labels))

[50]:

../../_images/tutorials_overview_get_started_72_0.svg

二部圖

請注意，大多數演算法都適用於二部圖，並使用 fit 函式套用至二鄰接矩陣。我們展示電影和演員之間二部圖範例。

[51]:

# show first lines
with open('movie_actor.tsv', 'r') as f:
    data = f.readlines()
    print(''.join(data[:5]))

Inception       Leonardo DiCaprio
Inception       Marion Cotillard
Inception       Joseph Gordon Lewitt
The Dark Knight Rises   Marion Cotillard
The Dark Knight Rises   Joseph Gordon Lewitt

[52]:

graph = from_csv('movie_actor.tsv', bipartite=True)

[53]:

biadjacency = graph.biadjacency
movies = graph.names_row
actors = graph.names_col

[54]:

algorithm = Louvain()
algorithm.fit(biadjacency)
labels_row = algorithm.labels_row_
labels_col = algorithm.labels_col_

[55]:

SVG(visualize_bigraph(biadjacency, names_row=movies, names_col=actors, labels_row=labels_row, labels_col=labels_col))

[55]:

../../_images/tutorials_overview_get_started_79_0.svg