Graph Neural Networks
This notebook shows how to use graph neural networks for node classification on a graph.
[1]:
from IPython.display import SVG
[2]:
import numpy as np
from scipy import sparse
[3]:
from sknetwork.data import art_philo_science
from sknetwork.classification import get_accuracy_score
from sknetwork.gnn import GNNClassifier
from sknetwork.visualization import visualize_graph
Graph with features
Let us load the art_philo_science toy dataset. It consists of 30 Wikipedia articles and the links between them. Each article is described by some of the words used in its summary, from a list of 11 words. Each article belongs to one of 3 categories: arts, philosophy, or science.
The goal is to retrieve the categories of some articles (the test set) from the categories of the other articles (the training set).
[4]:
graph = art_philo_science(metadata=True)
adjacency = graph.adjacency
features = graph.biadjacency
names = graph.names
names_features = graph.names_col
names_labels = graph.names_labels
labels_true = graph.labels
position = graph.position
[5]:
adjacency
[5]:
<30x30 sparse matrix of type '<class 'numpy.bool_'>'
with 240 stored elements in Compressed Sparse Row format>
[6]:
print(names)
['Isaac Newton' 'Albert Einstein' 'Carl Linnaeus' 'Charles Darwin'
'Ptolemy' 'Gottfried Wilhelm Leibniz' 'Carl Friedrich Gauss'
'Galileo Galilei' 'Leonhard Euler' 'John von Neumann' 'Leonardo da Vinci'
'Richard Wagner' 'Ludwig van Beethoven' 'Bob Dylan' 'Igor Stravinsky'
'The Beatles' 'Wolfgang Amadeus Mozart' 'Richard Strauss' 'Raphael'
'Pablo Picasso' 'Aristotle' 'Plato' 'Augustine of Hippo' 'Thomas Aquinas'
'Immanuel Kant' 'Bertrand Russell' 'David Hume' 'René Descartes'
'John Stuart Mill' 'Socrates']
[7]:
print(len(names))
30
[8]:
features
[8]:
<30x11 sparse matrix of type '<class 'numpy.int64'>'
with 101 stored elements in Compressed Sparse Row format>
[9]:
print(names_features)
['contribution' 'theory' 'invention' 'time' 'modern' 'century' 'study'
'logic' 'school' 'author' 'compose']
[10]:
len(names_features)
[10]:
11
[11]:
print(names_labels)
['science' 'arts' 'philosophy']
[12]:
# Number of labels
n_labels = len(set(labels_true))
GCN
The default GNN is a spatial graph convolutional network (GCN). We use a single hidden layer here. More hidden layers can be specified through the parameter dims (a list of dimensions).
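For instance, a classifier with two hidden layers could be built as follows; the hidden sizes 16 and 8, and the name gnn_deep, are illustrative only:

# Sketch: a GCN with two hidden layers (hidden sizes chosen arbitrarily)
gnn_deep = GNNClassifier(dims=[16, 8, n_labels],
                         layer_types='Conv',
                         activations='ReLu')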
[13]:
# GNN classifier with a single hidden layer
hidden_dim = 5
gnn = GNNClassifier(dims=[hidden_dim, n_labels],
                    layer_types='Conv',
                    activations='ReLu',
                    verbose=True)
[14]:
print(gnn)
GNNClassifier(
    Convolution(layer_type: conv, out_channels: 5, activation: ReLu, use_bias: True, normalization: both, self_embeddings: True)
    Convolution(layer_type: conv, out_channels: 3, activation: Cross entropy, use_bias: True, normalization: both, self_embeddings: True)
)
[15]:
# Training set: labels of the test nodes are masked with -1
labels = labels_true.copy()
np.random.seed(42)
test_mask = np.random.random(size=len(labels)) < 0.5
labels[test_mask] = -1
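As a quick sanity check (a sketch, relying only on the -1 convention above), one can count how many labels remain visible for training:

# Nodes labeled -1 are hidden from the classifier during training
print(f'{(labels >= 0).sum()} training nodes, {(labels < 0).sum()} test nodes')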
[16]:
# Training
labels_pred = gnn.fit_predict(adjacency, features, labels, n_epochs=200, random_state=42)
In epoch 0, loss: 1.053, train accuracy: 0.462
In epoch 20, loss: 0.834, train accuracy: 0.692
In epoch 40, loss: 0.819, train accuracy: 0.692
In epoch 60, loss: 0.831, train accuracy: 0.692
In epoch 80, loss: 0.839, train accuracy: 0.692
In epoch 100, loss: 0.839, train accuracy: 0.692
In epoch 120, loss: 0.825, train accuracy: 0.692
In epoch 140, loss: 0.771, train accuracy: 0.769
In epoch 160, loss: 0.557, train accuracy: 1.000
In epoch 180, loss: 0.552, train accuracy: 1.000
[17]:
# History for each training epoch
gnn.history_.keys()
[17]:
dict_keys(['loss', 'train_accuracy'])
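These per-epoch histories can be plotted to inspect convergence; a minimal sketch using matplotlib (not imported elsewhere in this notebook):

import matplotlib.pyplot as plt

# Loss and train accuracy recorded at each epoch
plt.plot(gnn.history_['loss'], label='loss')
plt.plot(gnn.history_['train_accuracy'], label='train accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()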
[18]:
# Accuracy on test set
get_accuracy_score(labels_true[test_mask], labels_pred[test_mask])
[18]:
1.0
[19]:
# Visualization
image = visualize_graph(adjacency, position=position, names=names, labels=labels_pred)
SVG(image)
[19]:
(SVG output: the graph with nodes colored by predicted label)
[20]:
# Probability distribution over labels
probs = gnn.predict_proba()
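Each row of probs is the distribution over the 3 labels for one node; a quick check, assuming the output layer is a softmax (as the cross-entropy activation shown above suggests):

# One row per node, one column per label; rows should sum to 1
assert probs.shape == (adjacency.shape[0], n_labels)
assert np.allclose(probs.sum(axis=1), 1)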
[21]:
label = 1  # 'arts'
scores = probs[:, label]
[22]:
# Visualization
image = visualize_graph(adjacency, position=position, names=names, scores=scores)
SVG(image)
[22]:
(SVG output: the graph with nodes shaded by the probability of label 'arts')
GraphSAGE
Another available GNN is GraphSAGE.
[23]:
# GraphSAGE layers
gnn = GNNClassifier(dims=[5, 3], layer_types='Sage')
[24]:
print(gnn)
GNNClassifier(
    Convolution(layer_type: sage, out_channels: 5, activation: ReLu, use_bias: True, normalization: left, self_embeddings: True, sample_size: 25)
    Convolution(layer_type: sage, out_channels: 3, activation: Cross entropy, use_bias: True, normalization: left, self_embeddings: True, sample_size: 25)
)
[25]:
# Training
labels_pred = gnn.fit_predict(adjacency, features, labels, n_epochs=200, random_state=42)
[26]:
# Accuracy on test set
get_accuracy_score(labels_true[test_mask], labels_pred[test_mask])
[26]:
1.0
[27]:
# Parameters of the GNN
weights = [layer.weight for layer in gnn.layers]
biases = [layer.bias for layer in gnn.layers]
[28]:
[weight.shape for weight in weights]
[28]:
[(11, 5), (5, 3)]
[29]:
[bias.shape for bias in biases]
[29]:
[(1, 5), (1, 3)]
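These shapes simply reflect the architecture: the first layer maps the 11 input features to 5 hidden units, the second maps the 5 hidden units to the 3 labels. A quick consistency check (sketch):

# Weight shapes follow dims=[5, 3] with 11 input features
n_features = features.shape[1]
assert [weight.shape for weight in weights] == [(n_features, 5), (5, n_labels)]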
[30]:
# Probability distribution over labels
probs = gnn.predict_proba()
[31]:
label = 1  # 'arts'
scores = probs[:, label]
[32]:
# Visualization
image = visualize_graph(adjacency, position=position, names=names, scores=scores)
SVG(image)
[32]:
(SVG output: the graph with nodes shaded by the probability of label 'arts')