
Lecture 17 - Geometric Deep Learning: Part 2


  • Speaker: Dr. Elena Gal

Groups

A symmetry of an object is a transformation that preserves a specific property or structure.

  • Symmetries can be composed.
  • Symmetries are invertible.

Hence, symmetries form a group.

Definition

A group \(G\) is a set equipped with a composition operator \(\circ: G \times G \rightarrow G\), satisfying:

  • Associativity: \((f \circ g) \circ h = f \circ (g \circ h)\) for all \(f, g, h \in G\)
  • Identity: There exists an element \(e \in G\) such that \(e \circ g = g \circ e = g\) for all \(g \in G\)
  • Inverse: For every \(g \in G\), there exists a unique \(g^{-1} \in G\) such that \(g \circ g^{-1} = g^{-1} \circ g = e\)
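As a concrete sanity check, the group axioms can be verified numerically for the cyclic group \(\mathbb{Z}_4\) (quarter-turn rotations). This is a minimal sketch, assuming composition is addition modulo 4; the names are illustrative:

```python
# The cyclic group Z_4: element k stands for rotation by k * 90 degrees.
G = [0, 1, 2, 3]
compose = lambda f, g: (f + g) % 4   # composition of rotations
e = 0                                # identity element
inverse = lambda g: (-g) % 4         # inverse rotation

# Closure and associativity
assert all(compose(f, g) in G for f in G for g in G)
assert all(compose(compose(f, g), h) == compose(f, compose(g, h))
           for f in G for g in G for h in G)
# Identity and inverses
assert all(compose(e, g) == g == compose(g, e) for g in G)
assert all(compose(g, inverse(g)) == e for g in G)
```

Note that \(\mathbb{Z}_4\) happens to be commutative; the axioms above do not require this in general.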

Remark

Group operations are not necessarily commutative; \(g \circ h \neq h \circ g\) in general.

Group Actions

A group action of \(G\) on a set \(\Omega\) is a map \((g, u) \rightarrow g(u) \in \Omega\) satisfying \((g \circ h)(u) = g(h(u))\) for all \(g, h \in G\), and \(e(u) = u\) for all \(u \in \Omega\).
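The two defining properties can be checked numerically. The sketch below assumes \(\mathbb{Z}_4\) as the group, acting on the plane \(\Omega = \mathbb{R}^2\) by integer rotation matrices:

```python
import numpy as np

# Z_4 acting on the plane: R(k) rotates a point by k * 90 degrees about the origin.
def R(k):
    c, s = np.cos(k * np.pi / 2), np.sin(k * np.pi / 2)
    return np.round(np.array([[c, -s], [s, c]])).astype(int)

u = np.array([2, 1])
g, h = 1, 3
# Compatibility with composition: (g o h)(u) = g(h(u))
assert np.array_equal(R((g + h) % 4) @ u, R(g) @ (R(h) @ u))
# The identity acts trivially: e(u) = u
assert np.array_equal(R(0) @ u, u)
```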

Example

The group \(E_{2}\) (Euclidean isometries) is generated by reflections, rotations, and translations in the plane, preserving Euclidean distances.

\(E_{2}\) acts on both the plane and the space of planar images (modelled as pixel grids).

Claim

\(E_{2}\) acts on the signal space \(\mathrm{X}(\Omega)\) via \((g \cdot x)(u) = x(g^{-1}(u))\).

Proof

For \(g, h \in G\) and \(x \in \mathrm{X}(\Omega)\), we verify:

\[(g \circ h) \cdot x = g \cdot (h \cdot x)\]

Specifically, \(((g \circ h) \cdot x)(u) = x((g \circ h)^{-1}(u)) = x(h^{-1}(g^{-1}(u))) = (h \cdot x)(g^{-1}(u)) = (g \cdot (h \cdot x))(u)\).

An action on the signal space can always be induced from a group action on \(\Omega\).

The signal space is a vector space, and the group action on it is linear:

\[g \cdot (\alpha x + \beta y) = \alpha (g \cdot x) + \beta (g \cdot y)\]
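Both the induced action and its linearity can be checked numerically. The sketch below assumes \(\mathbb{Z}_4\) rotations of square images, with `np.rot90` standing in for the induced action \((g \cdot x)(u) = x(g^{-1}(u))\):

```python
import numpy as np

# Z_4 acting on square images: act(k, x) rotates the pixel grid by k * 90 degrees.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = rng.standard_normal((8, 8))

act = lambda k, x: np.rot90(x, k)

g, h = 1, 2
# Compatibility with composition on signals: (g o h) . x = g . (h . x)
assert np.allclose(act((g + h) % 4, x), act(g, act(h, x)))
# Linearity of the action: g . (a x + b y) = a (g . x) + b (g . y)
assert np.allclose(act(g, 2 * x + 3 * y), 2 * act(g, x) + 3 * act(g, y))
```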

A linear group action is equivalent to a representation \(\rho: G \rightarrow \text{GL}(V)\), where each \(g \in G\) maps to an invertible matrix \(\rho(g)\).
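For a finite grid this representation can be made explicit. The sketch below (an illustration, not part of the lecture) builds \(\rho(g)\) as a permutation matrix acting on flattened \(4 \times 4\) images and checks the homomorphism property \(\rho(g \circ h) = \rho(g)\,\rho(h)\):

```python
import numpy as np

# The linear action of Z_4 on flattened n x n images is a representation
# rho: G -> GL(R^{n^2}); each rho(k) is a permutation matrix.
n = 4
def rho(k):
    # Column j is the image of the j-th basis vector under the rotation action.
    M = np.zeros((n * n, n * n))
    for j in range(n * n):
        e = np.zeros(n * n)
        e[j] = 1.0
        M[:, j] = np.rot90(e.reshape(n, n), k).ravel()
    return M

g, h = 1, 3
# Homomorphism property: rho(g o h) = rho(g) rho(h)
assert np.allclose(rho((g + h) % 4), rho(g) @ rho(h))
# rho(g) is invertible, with inverse rho(g^{-1})
assert np.allclose(rho(g) @ rho((-g) % 4), np.eye(n * n))
```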

Subgroups and Levels of Structure

Symmetries preserve specific structures, and sets of such symmetries form groups. When multiple structures coexist on a domain \(\Omega\), they define hierarchical levels of structure. As more structures are imposed, the corresponding symmetry groups become smaller.

Example

In histopathology slide segmentation, flipped samples are considered equivalent. However, spatial distances between points must remain invariant.

In road sign classification, flipping a sign may alter its semantic meaning.

Definition

A subset \(H \subset G\) is a subgroup if \(h_{1} \circ h_{2} \in H\) for all \(h_{1}, h_{2} \in H\), \(e \in H\), and \(h^{-1} \in H\) for all \(h \in H\).

In previous examples, the hierarchy \(SE_{2} \subset E_{2} \subset \text{Diff}(\mathbb{R}^2)\) holds, where \(SE_{2}\) is the special Euclidean group (rigid motions), \(E_{2}\) includes reflections, and \(\text{Diff}(\mathbb{R}^2)\) is the diffeomorphism group.

Invariant and Equivariant Functions

Definition

A function \(f: \mathrm{X}(\Omega) \rightarrow \mathrm{Y}\) is \(G\)-invariant if \(f(\rho(g) \cdot x) = f(x)\) for all \(g \in G\) and \(x \in \mathrm{X}(\Omega)\).

The output of \(f\) is unaffected by the group action on the input.

Example: Image classification is \(G\)-invariant if rotated or translated inputs yield identical predictions.
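A toy numerical check of invariance, using the global mean of an image as \(f\) (an illustrative choice of invariant function, not a classifier):

```python
import numpy as np

# The global mean is invariant under Z_4 rotations and cyclic translations:
# f(rho(g) . x) = f(x) for every g in G.
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8))
f = lambda x: x.mean()

assert np.isclose(f(np.rot90(x, 1)), f(x))               # rotation invariance
assert np.isclose(f(np.roll(x, (2, 3), (0, 1))), f(x))   # translation invariance
```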

A function \(f: \mathrm{X}(\Omega) \rightarrow \mathrm{Y}\) is \(G\)-equivariant if \(f(\rho(g) \cdot x) = \rho'(g) \cdot f(x)\) for all \(g \in G\) and \(x \in \mathrm{X}(\Omega)\), where \(\rho'\) is a group action on \(\mathrm{Y}\).

The output of \(f\) is affected by the group action in the same way as the input.

Example: Image segmentation is \(G\)-equivariant if rotating the input image rotates the output segmentation mask correspondingly.
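A toy numerical check of equivariance, using a hypothetical local averaging map (circular boundary conditions) as \(f\) and cyclic translations as the group:

```python
import numpy as np

# A local averaging map commutes with cyclic translations: f(g . x) = g . f(x),
# with the same action rho = rho' on input and output.
def local_avg(x):
    # Average each pixel with its 4 cyclic neighbours.
    return (x + np.roll(x, 1, 0) + np.roll(x, -1, 0)
              + np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 5.0

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))
shift = lambda x: np.roll(x, (3, 1), (0, 1))   # a translation g

assert np.allclose(local_avg(shift(x)), shift(local_avg(x)))
```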

Deformation Stability

Concept of Deformation Stability

Small deformations of the input signal \(x\) should induce only small changes in the output \(f(x)\).

Mathematical Formulation

Let \(\tau \in \text{Diff}(\Omega)\) be a deformation and \(c(\tau)\) a complexity measure quantifying the deviation of \(\tau\) from the subgroup \(G \subseteq \text{Diff}(\Omega)\).

Exact invariance is generalised to deformation stability:

\[|f(\rho(\tau) \cdot x) - f(x)| \leq C \cdot c(\tau) \cdot \|x\|\]

Here, \(C\) is a constant, and \(c(\tau) = 0\) for \(\tau \in G\), ensuring consistency with exact invariance when deformations lie within \(G\).

A function \(f\) satisfying this inequality is termed geometrically stable.

Example

For planar images, define \(c^{2}(\tau) := \int_{\Omega} \|\nabla \tau(u)\|^{2} \, du\), measuring the elastic energy of \(\tau\) relative to rigid translations.
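A discretised sketch of this complexity measure, treating \(\tau\) as a displacement field on a pixel grid and approximating the integral by finite differences (the grid size and field names are illustrative):

```python
import numpy as np

# Approximate c^2(tau) = integral of ||grad tau(u)||^2 du by finite differences.
# A constant displacement field (a pure translation) has zero cost; a warp does not.
def elastic_energy(tau):
    # tau has shape (H, W, 2): a 2-D displacement at each pixel.
    grads = np.gradient(tau, axis=(0, 1))       # d(tau)/du_1, d(tau)/du_2
    return sum((g ** 2).sum() for g in grads)

H = W = 16
translation = np.full((H, W, 2), 0.5)           # tau(u) = (0.5, 0.5) everywhere
uu, vv = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
warp = np.stack([np.sin(2 * np.pi * uu), np.zeros_like(vv)], axis=-1)

assert elastic_energy(translation) == 0.0       # translations cost nothing
assert elastic_energy(warp) > 0.0               # genuine deformations do
```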

Scale Separation

Coarsening the domain \(\Omega\) involves aggregating nearby points. Making "nearby" precise requires the concept of a metric.

Definition

A metric or distance is a function \(d: \Omega \times \Omega \rightarrow [0, \infty)\) satisfying for all \(u, v, w \in \Omega\)

  • \(d(u, v) = 0\) if and only if \(u = v\)
  • \(d(u, v) = d(v, u)\)
  • \(d(u, v) \leq d(u, w) + d(w, v)\)

A space equipped with a metric, \((\Omega, d)\), is called a metric space.

Example

The Euclidean distance on the plane is given by the formula

\[d_{E}(u, v) := \sqrt{(u_{1} - v_{1})^{2} + (u_{2} - v_{2})^{2}}\]

A function \(f: \mathrm{X}(\Omega) \rightarrow \mathrm{Y}\) is locally stable if it admits a factorisation \(f \approx f' \circ P\), where \(P: \mathrm{X}(\Omega) \rightarrow \mathrm{X}(\Omega')\) is a coarse-graining operation and \(f': \mathrm{X}(\Omega') \rightarrow \mathrm{Y}\) acts on the coarsened signal.
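A minimal sketch of such a coarse-graining \(P\), implemented here as \(2 \times 2\) average pooling (an illustrative choice of aggregation):

```python
import numpy as np

# Coarse-graining P: average-pool an image from an 8x8 grid Omega to a 4x4
# grid Omega' by aggregating 2x2 blocks of nearby points.
def P(x, k=2):
    H, W = x.shape
    return x.reshape(H // k, k, W // k, k).mean(axis=(1, 3))

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 8))

assert P(x).shape == (4, 4)
# P preserves the global mean, so a global-mean f' on the coarse grid
# agrees with the same functional evaluated on the fine grid.
assert np.isclose(P(x).mean(), x.mean())
```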

Non-Linearities, Invariance, and Equivariance

To achieve high expressivity, non-linear elements must be introduced: a linear \(G\)-invariant function \(f\) satisfies

\[f(x) = \frac{1}{\mu(G)} \int_{G} f(g(x)) \, \mathrm{d}\mu(g) = f \left( \frac{1}{\mu(G)} \int_{G} g(x) \, \mathrm{d}\mu(g) \right)\]

where \(\mu\) is the Haar measure on \(G\). Such a function \(f\) depends on \(x\) only through the group average of \(g(x)\); for images and the translation group, this means \(f\) can see only the average pixel colour.
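The averaging argument can be checked directly for cyclic translations of a one-dimensional signal (a minimal sketch):

```python
import numpy as np

# Averaging a signal over all cyclic translations (the group average) yields a
# constant signal whose value is mean(x), so a linear translation-invariant f
# can depend on x only through that mean.
rng = np.random.default_rng(4)
x = rng.standard_normal(8)

group_avg = np.mean([np.roll(x, k) for k in range(len(x))], axis=0)
assert np.allclose(group_avg, x.mean())
```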

Consider a linear equivariant function

\[B: \mathrm{X}(\Omega, C) \rightarrow \mathrm{X}(\Omega, C'), \quad B(g(x)) = g(B(x))\]

\(B\) can be composed with a non-linearity \(\rho: C' \rightarrow C''\) by defining

\[\mathrm{U}(x)(w) := (\rho \circ B)(x)(w) := \rho(B(x)(w))\]

A linear map is local if \((Bx)(u)\) depends only on the values of \(x(v)\) for \(v \in N_{u} = \{ v: d(u, v) \leq r \}\), where \(r\) is a small radius.

Combining multiple local equivariant maps yields

\[U_{J} \circ U_{J - 1} \circ \cdots \circ U_{1}\]

This gradually increases the receptive field of the network.

Global invariance is achieved by composing with group averaging.

Geometric Deep Learning Blueprint

Let \(\Omega\) and \(\Omega'\) be domains, and let \(G\) be a group of symmetries acting on \(\Omega\).

We define the following building blocks:

  • A linear \(G\)-equivariant layer \(B: \mathrm{X}(\Omega, C) \rightarrow \mathrm{X}(\Omega, C')\) satisfying \(B(g(x)) = g(B(x))\) for all \(g \in G\) and \(x \in \mathrm{X}(\Omega, C)\).
  • A local pooling (coarsening) operation \(P: \mathrm{X}(\Omega, C) \rightarrow \mathrm{X}(\Omega', C)\), where \(\Omega' \subset \Omega\).
  • A \(G\)-invariant layer (global pooling) \(A: \mathrm{X}(\Omega, C) \rightarrow \mathrm{Y}\) satisfying \(A(g(x)) = A(x)\) for all \(g \in G\) and \(x \in \mathrm{X}(\Omega, C)\).

Using these building blocks, \(G\)-invariant functions \(f: \mathrm{X}(\Omega, C) \rightarrow \mathrm{Y}\) can be constructed as

\[f = A \circ \rho_{J} \circ B_{J} \circ P_{J - 1} \circ \cdots \circ P_{1} \circ \rho_{1} \circ B_{1}\]

Different blocks may utilise distinct symmetry groups \(G\).
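The blueprint can be sketched end-to-end for cyclic translations on an \(8 \times 8\) grid. All maps below are illustrative stand-ins: a circular local filter for \(B\), a ReLU for \(\rho\), \(2 \times 2\) average pooling for \(P\), and the global mean for \(A\):

```python
import numpy as np

# f = A o rho_2 o B_2 o P_1 o rho_1 o B_1 with G = cyclic translations.
B = lambda x: (x + np.roll(x, 1, 0) + np.roll(x, 1, 1)) / 3.0   # local, G-equivariant
relu = lambda x: np.maximum(x, 0.0)                             # pointwise non-linearity
P = lambda x: x.reshape(4, 2, 4, 2).mean(axis=(1, 3))           # coarsening 8x8 -> 4x4
A = lambda x: x.mean()                                          # G-invariant global pooling

f = lambda x: A(relu(B(P(relu(B(x))))))

rng = np.random.default_rng(5)
x = rng.standard_normal((8, 8))
g = lambda x: np.roll(x, (2, 4), (0, 1))   # a shift by a multiple of the pooling stride

# The composed network is invariant to such shifts.
assert np.isclose(f(g(x)), f(x))
```

Invariance is exact here because the shift is a multiple of the pooling stride; for arbitrary shifts, pooling makes the pipeline only approximately invariant, which is one motivation for the deformation-stability view above.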