Tutorial 4: Convolutional Layers - Spectral methods

Outline

  • Why convolution in ML

  • Some theory on convolution

  • Convolution on graphs

  • Spectral-convolutional layers in PyTorch Geometric

[ ]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)

!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
1.12.1+cu113
     |████████████████████████████████| 7.9 MB 5.5 MB/s
     |████████████████████████████████| 3.5 MB 5.7 MB/s
  Building wheel for torch-geometric (setup.py) ... done

Imports

[ ]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 16})

Why convolution in ML

  • Weight sharing

  • Detection of translational invariant and local features

fully_connected Source

cnn.gif Source

Convolution.png Source

Some theory on convolution

Convolution_of_box_signal_with_itself2.gif Source

Definition

\begin{align*} c[n] = (v * w)[n] = \sum_{m=0}^{N-1} v[m] \cdot w[n-m] \end{align*}

[ ]:
def conv(v, w):
    c = np.zeros(v.shape)
    for n in range(len(v)):
        c[n] = 0
        for m in range(len(v)):
            c[n] += v[m] * w[n - m]
    return c
[ ]:
N = 20
v = np.zeros(N)
v[8:12] = 1
w = np.zeros(N)
w[1:5] = 1
c = conv(v, w)

fig = plt.figure()
ax = fig.gca()
ax.plot(v, '.-')
ax.plot(w, '.-')
ax.plot(c, '.-')
ax.legend(['v', 'w', 'c'])
ax.grid(True)
../../../_images/ipynbs_colabs_pyg_tutorial_project_Tutorial04_16_0.png

Fourier transform

Transformation \(\mathcal F: \mathbb{R}^N \to \mathbb{R}^N\) with

\begin{align*} \mathcal F^{-1}(\mathcal F (v)) &= v\\ \mathcal F(v * w) &= \mathcal F(v) \cdot \mathcal F(w). \end{align*}

This implies \begin{align*} v * w &= \mathcal F^{-1}(\mathcal F (v * w))\\ &= \mathcal F^{-1}(\mathcal F(v) \cdot \mathcal F(w)) \end{align*}

[ ]:
v, w = np.random.rand(N), np.random.rand(N)
conv(v, w)
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
       3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
       3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
       4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])
[ ]:
from scipy.fft import fft, ifft # Fast Fourier Transform / Inverse FFT
np.abs(ifft(fft(v) * fft(w)))
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
       3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
       3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
       4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])

Definition of the Fourier transform

The Fourier transform can be computed as

\begin{align*} \mathcal F(v) = U\cdot v, \;\;\mathcal F^{-1}(v) = \frac{1}{N}\ U^H \cdot v \end{align*}

where the \(N\times N\) matrix \(U\) is defined as \begin{align*} \\ U = \begin{bmatrix} u_0(0) & u_1(0) & \dots & u_{N-1}(0)\\ u_0(1) & u_1(1) & \dots & u_{N-1}(1)\\ \vdots & \vdots& & \vdots\\ u_0(N-1) & u_1(N-1) & \dots & u_{N-1}(N-1)\\ \end{bmatrix} \end{align*}

and \(u_0, \dots, u_{N-1}\) are functions defined as

\begin{align*} u_n(x)&:= \cos\left(2 \pi \frac{n}{N} x\right) - i \sin\left(2 \pi \frac{n}{N} x\right). \end{align*}

[ ]:
def matrix_U(N):
    u = lambda n, N: np.cos(2 * np.pi / N * n * np.arange(N)) - 1j * np.sin(2 * np.pi / N * n * np.arange(N))
    U = np.empty((N, 0))
    for n in range(N):
        U = np.c_[U, u(n, N)]
    return U


def fourier_transform(v):
    N = v.shape[0]
    U = matrix_U(N)
    return U @ v


def inverse_fourier_transform(v):
    N = v.shape[0]
    U = matrix_U(N)
    return (U.conj().transpose() @ v) / N
[ ]:
fft(v) - fourier_transform(v)
array([-1.77635684e-15-0.00000000e+00j,  0.00000000e+00+2.77555756e-16j,
        4.44089210e-16-3.33066907e-16j,  1.11022302e-16+6.66133815e-16j,
        3.88578059e-16-3.33066907e-16j,  5.55111512e-15+1.22124533e-15j,
       -2.10942375e-15-9.99200722e-16j, -3.55271368e-15+2.22044605e-15j,
        4.44089210e-16+1.55431223e-15j, -4.10782519e-15+2.77555756e-15j,
       -8.88178420e-16+3.00350636e-15j,  1.44328993e-15+1.22124533e-15j,
        1.33226763e-15-1.33226763e-15j, -5.99520433e-15-4.21884749e-15j,
       -1.27675648e-15+1.05471187e-15j,  0.00000000e+00+4.88498131e-15j,
       -1.38777878e-15+1.11022302e-16j, -5.88418203e-15-1.31006317e-14j,
        3.83026943e-15+3.83026943e-15j, -5.77315973e-15-7.21644966e-16j])
[ ]:
ifft(v) - inverse_fourier_transform(v)
array([ 0.00000000e+00-0.00000000e+00j,  2.77555756e-17-5.20417043e-18j,
        2.90566182e-17+2.25514052e-17j, -3.46944695e-18-4.16333634e-17j,
        2.08166817e-17+2.77555756e-17j,  2.77555756e-16-5.55111512e-17j,
       -1.04083409e-16+4.51028104e-17j, -1.80411242e-16-1.00613962e-16j,
        2.08166817e-17-7.63278329e-17j, -1.90819582e-16-1.52655666e-16j,
       -4.16333634e-17-1.50175318e-16j,  5.20417043e-17-4.85722573e-17j,
        6.24500451e-17+6.24500451e-17j, -2.91433544e-16+2.04697370e-16j,
       -6.41847686e-17-5.55111512e-17j,  0.00000000e+00-2.42861287e-16j,
       -6.93889390e-17+6.93889390e-18j, -2.81025203e-16+6.66133815e-16j,
        1.98408998e-16-1.89084859e-16j, -2.91433544e-16+5.03069808e-17j])

Connection with the Laplacian

The functions \(u_n\) (the columns of the Fourier transform matrix) are eigenvectors of the Laplacian:

\begin{align*} u_n(x)&:= \cos\left(2 \pi \frac{n}{N} x\right) - i \sin\left(2 \pi \frac{n}{N} x\right)\\ \Delta u_n(x)&:= \left(-4 \pi ^ 2\frac{n^2}{N^2}\right) u_n(x) \end{align*}

Summary

\begin{align*} v * w = U^H ((U w) \odot (U v)) \end{align*}

or if \(g_w=\mbox{diag}(U w)\) is filter \begin{align*} v * w = U^H g_w U w \end{align*}

[ ]:
U = matrix_U(N)
np.abs((U.conj().transpose() / N) @ ((U @ v) * (U @ w)))
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
       3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
       3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
       4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])
[ ]:
conv(v, w)
array([3.71066741, 2.82560427, 3.60638731, 3.31932246, 3.04970839,
       3.02831807, 3.39914657, 3.41759529, 4.00571303, 3.43251439,
       3.76970824, 3.84467728, 3.81935431, 3.73835745, 3.81441997,
       4.22630189, 3.84332602, 3.45189449, 3.26571067, 3.43742144])

Convolution on graphs

Plan: - Define the graph Laplacian - Compute the spectrum - Define a Fourier transform - Define convolution on a graph

Note: From now on \(G = (V, E)\) is an undirected, unweighted, simple graph.

Graph Laplacian

Adjacency matrix \begin{align*} A_{ij} = \left\{ \begin{array}{ll} 1 &\text{ if } e_{ij}\in E\\ 0 &\text{ if } e_{ij}\notin E \end{array} \right. \end{align*}

Degree matrix \begin{align*} D_{ij} = \left\{ \begin{array}{ll} \mbox{deg}(v_i) &\text{ if } i=j\\ 0 &\text{ if } i\neq j \end{array} \right. \end{align*}

Laplacian \begin{align*} L &= D - A. \end{align*}

Normalized Laplacian \begin{align*} L &= I - D^{-1/2} A D^{-1/2}. \end{align*}

Graph spectrum, Fourier transform, and convolution

  1. Spectral decomposition of the Laplacian: \begin{align*} L = U \Lambda U^T\\ \end{align*}

  2. Fourier transform: if \(v\) is a vector of features on the graph, then \begin{align*} \mathcal F (v) = U \cdot v, \;\;\mathcal F^{-1} (v) = U^T \cdot v\\ \end{align*}

  3. Convolution with a filter \(U \cdot w\) \begin{align*} v * w = U ((U^T w) \odot (U^T v) ) \end{align*}

Or \(g_w = \mbox{diag}(U^T w)\) is a filter, then \begin{align*} v * w = U g_w U^T v \end{align*}

Spectral-convolutional layers in PyTorch Geometric

Problem: Computing the spectrum is a global and very expensive property.

Goal: Implementation as message passing.

ChebConv

Goal:

Compute \(U g_w U^T x\) with \(g_w = g_w(\Lambda)\) a filter.

Chebyshev approximation

Chebyshev polynomials \(T_k\): \begin{align*} T_{k}(x) = 2 x T_{k-1}(x) - T_{k-2}(x), \;\; T_0(x) = 1, T_1(x) = x \end{align*}

Chebyshev approximation of the filter

Aproximation of the filter: \begin{align*} g_w(\Lambda) = \sum_{k=0}^K \theta_k T_k(\tilde \Lambda),\;\;\;\;\tilde \Lambda = \frac{2}{\lambda_\max} \Lambda - I \end{align*}

Property

If \(L = U \Lambda U^T\) then \(T_k(L) = U T_k(\Lambda) U^T\).

Fast approximated convolution

\begin{align*} v * w &= U g_w U^T x = U \left(\sum_{k=0}^K \theta_k T_k(\tilde \Lambda) \right)U^T x =\sum_{k=0}^K \theta_k U T_k(\tilde \Lambda) U^T x\\ &=\sum_{k=0}^K \theta_k T_k(\tilde L) x \end{align*}

\begin{align*} \tilde L = \frac{2}{\lambda_\max} L - I \end{align*}

Properties:

  • Depends on \(L\) and \(\lambda_\max\), not on \(U, \Sigma\)

  • Uses only \(K\)-powers \(\Rightarrow\) only the \(K\)-th neighborhood of each node, localized filter

As message passing: cheb_init.png

cheb_norm.png

cheb_forward.png

cheb_message.png

GCNConv

Start from ChebConv and assume 1. \(K=1\) (linear approximation) so \begin{align*} v * w &=\sum_{k=0}^1 \theta_k T_k(\tilde L) x = \theta_0 x + \theta_1 \tilde L x\\ \end{align*}

  1. \(\lambda_\max =2\) so \begin{align*} v * w &= \theta_0 x + \theta_1 (L - I) x\\ &= \theta_0 x - \theta_1 D^{-1/2} A D^{1/2} x\\ \end{align*}

  2. \(\theta_0=-\theta_1= \theta\) so \begin{align*} v * w = \left(I + D^{-1/2} A D^{1/2}\right) x \theta \end{align*}

  3. Renormalization of \(\theta\) by using \begin{align*} \tilde A&:= I + A\\ \tilde D_{ii}&:= \sum_j \tilde A_{ij} \end{align*} so \begin{align*} v * w = \left(D^{-1/2} A D^{1/2}\right) x \theta \end{align*}

If \(x\) is a \(F\)-dimensional feature vector, and we want an \(F'\)-dimensional feature vector as output: use \(W'\in \mathbb{R}^{F\times F'}\) \begin{align*} v * w = \left(D^{-1/2} A D^{1/2}\right) x \Theta \end{align*}

Nodewise: image.png

As message passing

See Tutorial3