Tutorial 4: Convolutional Layers - Spectral methods¶
Outline¶
Why convolution in ML
Some theory on convolution
Convolution on graphs
Spectral-convolutional layers in PyTorch Geometric
[ ]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)
!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
1.12.1+cu113
|████████████████████████████████| 7.9 MB 5.5 MB/s
|████████████████████████████████| 3.5 MB 5.7 MB/s
Building wheel for torch-geometric (setup.py) ... done
Imports¶
[ ]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 16})
Why convolution in ML¶
Weight sharing
Detection of translational invariant and local features
Some theory on convolution¶
Definition¶
\begin{align*} c[n] = (v * w)[n] = \sum_{m=0}^{N-1} v[m] \cdot w[n-m] \end{align*}
[ ]:
def conv(v, w):
c = np.zeros(v.shape)
for n in range(len(v)):
c[n] = 0
for m in range(len(v)):
c[n] += v[m] * w[n - m]
return c
[ ]:
N = 20
v = np.zeros(N)
v[8:12] = 1
w = np.zeros(N)
w[1:5] = 1
c = conv(v, w)
fig = plt.figure()
ax = fig.gca()
ax.plot(v, '.-')
ax.plot(w, '.-')
ax.plot(c, '.-')
ax.legend(['v', 'w', 'c'])
ax.grid(True)
Fourier transform¶
Transformation \(\mathcal F: \mathbb{R}^N \to \mathbb{R}^N\) with
\begin{align*} \mathcal F^{-1}(\mathcal F (v)) &= v\\ \mathcal F(v * w) &= \mathcal F(v) \cdot \mathcal F(w). \end{align*}
This implies \begin{align*} v * w &= \mathcal F^{-1}(\mathcal F (v * w))\\ &= \mathcal F^{-1}(\mathcal F(v) \cdot \mathcal F(w)) \end{align*}
[ ]:
v, w = np.random.rand(N), np.random.rand(N)
conv(v, w)
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])
[ ]:
from scipy.fft import fft, ifft # Fast Fourier Transform / Inverse FFT
np.abs(ifft(fft(v) * fft(w)))
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])
Definition of the Fourier transform¶
The Fourier transform can be computed as
\begin{align*} \mathcal F(v) = U\cdot v, \;\;\mathcal F^{-1}(v) = \frac{1}{N}\ U^H \cdot v \end{align*}
where the \(N\times N\) matrix \(U\) is defined as \begin{align*} \\ U = \begin{bmatrix} u_0(0) & u_1(0) & \dots & u_{N-1}(0)\\ u_0(1) & u_1(1) & \dots & u_{N-1}(1)\\ \vdots & \vdots& & \vdots\\ u_0(N-1) & u_1(N-1) & \dots & u_{N-1}(N-1)\\ \end{bmatrix} \end{align*}
and \(u_0, \dots, u_{N-1}\) are functions defined as
\begin{align*} u_n(x)&:= \cos\left(2 \pi \frac{n}{N} x\right) - i \sin\left(2 \pi \frac{n}{N} x\right). \end{align*}
[ ]:
def matrix_U(N):
u = lambda n, N: np.cos(2 * np.pi / N * n * np.arange(N)) - 1j * np.sin(2 * np.pi / N * n * np.arange(N))
U = np.empty((N, 0))
for n in range(N):
U = np.c_[U, u(n, N)]
return U
def fourier_transform(v):
N = v.shape[0]
U = matrix_U(N)
return U @ v
def inverse_fourier_transform(v):
N = v.shape[0]
U = matrix_U(N)
return (U.conj().transpose() @ v) / N
[ ]:
fft(v) - fourier_transform(v)
array([-1.77635684e-15-0.00000000e+00j, 0.00000000e+00+2.77555756e-16j,
4.44089210e-16-3.33066907e-16j, 1.11022302e-16+6.66133815e-16j,
3.88578059e-16-3.33066907e-16j, 5.55111512e-15+1.22124533e-15j,
-2.10942375e-15-9.99200722e-16j, -3.55271368e-15+2.22044605e-15j,
4.44089210e-16+1.55431223e-15j, -4.10782519e-15+2.77555756e-15j,
-8.88178420e-16+3.00350636e-15j, 1.44328993e-15+1.22124533e-15j,
1.33226763e-15-1.33226763e-15j, -5.99520433e-15-4.21884749e-15j,
-1.27675648e-15+1.05471187e-15j, 0.00000000e+00+4.88498131e-15j,
-1.38777878e-15+1.11022302e-16j, -5.88418203e-15-1.31006317e-14j,
3.83026943e-15+3.83026943e-15j, -5.77315973e-15-7.21644966e-16j])
[ ]:
ifft(v) - inverse_fourier_transform(v)
array([ 0.00000000e+00-0.00000000e+00j, 2.77555756e-17-5.20417043e-18j,
2.90566182e-17+2.25514052e-17j, -3.46944695e-18-4.16333634e-17j,
2.08166817e-17+2.77555756e-17j, 2.77555756e-16-5.55111512e-17j,
-1.04083409e-16+4.51028104e-17j, -1.80411242e-16-1.00613962e-16j,
2.08166817e-17-7.63278329e-17j, -1.90819582e-16-1.52655666e-16j,
-4.16333634e-17-1.50175318e-16j, 5.20417043e-17-4.85722573e-17j,
6.24500451e-17+6.24500451e-17j, -2.91433544e-16+2.04697370e-16j,
-6.41847686e-17-5.55111512e-17j, 0.00000000e+00-2.42861287e-16j,
-6.93889390e-17+6.93889390e-18j, -2.81025203e-16+6.66133815e-16j,
1.98408998e-16-1.89084859e-16j, -2.91433544e-16+5.03069808e-17j])
Connection with the Laplacian¶
The functions \(u_n\) (the columns of the Fourier transform matrix) are eigenvectors of the Laplacian:
\begin{align*} u_n(x)&:= \cos\left(2 \pi \frac{n}{N} x\right) - i \sin\left(2 \pi \frac{n}{N} x\right)\\ \Delta u_n(x)&:= \left(-4 \pi ^ 2\frac{n^2}{N^2}\right) u_n(x) \end{align*}
Summary¶
\begin{align*} v * w = U^H ((U w) \odot (U v)) \end{align*}
or if \(g_w=\mbox{diag}(U w)\) is filter \begin{align*} v * w = U^H g_w U w \end{align*}
[ ]:
U = matrix_U(N)
np.abs((U.conj().transpose() / N) @ ((U @ v) * (U @ w)))
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])
[ ]:
conv(v, w)
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])
Convolution on graphs¶
Plan: - Define the graph Laplacian - Compute the spectrum - Define a Fourier transform - Define convolution on a graph
Note: From now on \(G = (V, E)\) is an undirected, unweighted, simple graph.
Graph Laplacian¶
Adjacency matrix \begin{align*} A_{ij} = \left\{ \begin{array}{ll} 1 &\text{ if } e_{ij}\in E\\ 0 &\text{ if } e_{ij}\notin E \end{array} \right. \end{align*}
Degree matrix \begin{align*} D_{ij} = \left\{ \begin{array}{ll} \mbox{deg}(v_i) &\text{ if } i=j\\ 0 &\text{ if } i\neq j \end{array} \right. \end{align*}
Laplacian \begin{align*} L &= D - A. \end{align*}
Normalized Laplacian \begin{align*} L &= I - D^{-1/2} A D^{-1/2}. \end{align*}
Graph spectrum, Fourier transform, and convolution¶
Spectral decomposition of the Laplacian: \begin{align*} L = U \Lambda U^T\\ \end{align*}
Fourier transform: if \(v\) is a vector of features on the graph, then \begin{align*} \mathcal F (v) = U \cdot v, \;\;\mathcal F^{-1} (v) = U^T \cdot v\\ \end{align*}
Convolution with a filter \(U \cdot w\) \begin{align*} v * w = U ((U^T w) \odot (U^T v) ) \end{align*}
Or \(g_w = \mbox{diag}(U^T w)\) is a filter, then \begin{align*} v * w = U g_w U^T v \end{align*}
Spectral-convolutional layers in PyTorch Geometric¶
Problem: Computing the spectrum is a global and very expensive property.
Goal: Implementation as message passing.
ChebConv¶
Goal:¶
Compute \(U g_w U^T x\) with \(g_w = g_w(\Lambda)\) a filter.
Chebyshev approximation¶
Chebyshev polynomials \(T_k\): \begin{align*} T_{k}(x) = 2 x T_{k-1}(x) - T_{k-2}(x), \;\; T_0(x) = 1, T_1(x) = x \end{align*}
Chebyshev approximation of the filter¶
Aproximation of the filter: \begin{align*} g_w(\Lambda) = \sum_{k=0}^K \theta_k T_k(\tilde \Lambda),\;\;\;\;\tilde \Lambda = \frac{2}{\lambda_\max} \Lambda - I \end{align*}
Property¶
If \(L = U \Lambda U^T\) then \(T_k(L) = U T_k(\Lambda) U^T\).
Fast approximated convolution¶
\begin{align*} v * w &= U g_w U^T x = U \left(\sum_{k=0}^K \theta_k T_k(\tilde \Lambda) \right)U^T x =\sum_{k=0}^K \theta_k U T_k(\tilde \Lambda) U^T x\\ &=\sum_{k=0}^K \theta_k T_k(\tilde L) x \end{align*}
\begin{align*} \tilde L = \frac{2}{\lambda_\max} L - I \end{align*}
Properties:¶
Depends on \(L\) and \(\lambda_\max\), not on \(U, \Sigma\)
Uses only \(K\)-powers \(\Rightarrow\) only the \(K\)-th neighborhood of each node, localized filter
As message passing: 



GCNConv¶
Start from ChebConv and assume 1. \(K=1\) (linear approximation) so \begin{align*}
v * w
&=\sum_{k=0}^1 \theta_k T_k(\tilde L) x
= \theta_0 x + \theta_1 \tilde L x\\
\end{align*}
\(\lambda_\max =2\) so \begin{align*} v * w &= \theta_0 x + \theta_1 (L - I) x\\ &= \theta_0 x - \theta_1 D^{-1/2} A D^{1/2} x\\ \end{align*}
\(\theta_0=-\theta_1= \theta\) so \begin{align*} v * w = \left(I + D^{-1/2} A D^{1/2}\right) x \theta \end{align*}
Renormalization of \(\theta\) by using \begin{align*} \tilde A&:= I + A\\ \tilde D_{ii}&:= \sum_j \tilde A_{ij} \end{align*} so \begin{align*} v * w = \left(D^{-1/2} A D^{1/2}\right) x \theta \end{align*}
If \(x\) is a \(F\)-dimensional feature vector, and we want an \(F'\)-dimensional feature vector as output: use \(W'\in \mathbb{R}^{F\times F'}\) \begin{align*} v * w = \left(D^{-1/2} A D^{1/2}\right) x \Theta \end{align*}
Nodewise: 
As message passing¶
See Tutorial3