Two Paradigms for Solving PDEs

Variational Formulation (FEM) vs Physics-Informed Neural Networks (PINNs)

The Fundamental Question

How do we solve partial differential equations numerically? For decades, the answer has been the variational formulation leading to finite element methods—a mathematically rigorous approach with proven convergence guarantees. Recently, physics-informed neural networks have emerged, bypassing variational principles entirely and working directly with the strong form. This comparison explores both approaches through the lens of Mandel's problem in poromechanics.

Variational Formulation

Classical Approach • Weak Form • FEM

1. Start with Strong Form (PDE)

Begin with the governing equations in their original form. For Mandel's problem (coupled poromechanics):

\[ \begin{aligned} \nabla \cdot \boldsymbol{\sigma} + \mathbf{f} &= 0 \\ \frac{\partial}{\partial t}\left(\frac{1}{M}p + \alpha\nabla \cdot \mathbf{u}\right) - \kappa\nabla^2 p &= 0 \end{aligned} \]
2. Derive Weak Form

Multiply by test functions and integrate by parts. This reduces smoothness requirements and naturally incorporates boundary conditions.

\[ \begin{aligned} &\text{Find } \mathbf{u} \in V, p \in Q \text{ such that } \forall \mathbf{w} \in V, q \in Q: \\ &\int_\Omega \boldsymbol{\sigma}(\mathbf{u}) : \nabla \mathbf{w} \, d\Omega = \int_{\partial\Omega} \mathbf{t} \cdot \mathbf{w} \, d\Gamma \\ &\int_\Omega \left[\frac{1}{M}\frac{\partial p}{\partial t}q + \kappa \nabla p \cdot \nabla q + \alpha \frac{\partial}{\partial t}(\nabla \cdot \mathbf{u})q\right] d\Omega = 0 \end{aligned} \]
Key: Integration by parts transfers derivatives from solution to test function
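
Concretely, for the equilibrium equation this transfer follows from the divergence theorem (body force omitted and \(\mathbf{t} = \boldsymbol{\sigma}\mathbf{n}\) the prescribed traction, as in the weak form above):

\[ \int_\Omega (\nabla \cdot \boldsymbol{\sigma}) \cdot \mathbf{w} \, d\Omega = -\int_\Omega \boldsymbol{\sigma} : \nabla \mathbf{w} \, d\Omega + \int_{\partial\Omega} (\boldsymbol{\sigma}\mathbf{n}) \cdot \mathbf{w} \, d\Gamma \]

Only first derivatives of \(\mathbf{u}\) and \(\mathbf{w}\) remain, which is exactly the reduced smoothness requirement noted above.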
3. Choose Finite Element Space

Select finite-dimensional subspaces with prescribed basis functions (shape functions). Common choices: linear, quadratic, or higher-order polynomials.

\[ \begin{aligned} V_h &\subset V, \quad \dim(V_h) = N \\ \mathbf{u}_h(x) &= \sum_{i=1}^{N} \mathbf{u}_i \phi_i(x) \\ p_h(x) &= \sum_{j=1}^{M} p_j \psi_j(x) \end{aligned} \]

where \(\phi_i\) and \(\psi_j\) are pre-defined polynomial basis functions with compact support.
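
As a minimal illustration of what a pre-defined basis with compact support means, consider a hypothetical 1D mesh with piecewise-linear "hat" functions; the basis functions are fixed and only the coefficients \(u_i\) are unknown:

    import numpy as np

    # Hypothetical uniform 1D mesh with 5 nodes; phi_i equals 1 at node i
    # and decays linearly to 0 at the neighboring nodes.
    nodes = np.linspace(0.0, 1.0, 5)

    def phi(i, x):
        """Piecewise-linear shape function associated with node i."""
        h = nodes[1] - nodes[0]
        return np.clip(1.0 - np.abs(x - nodes[i]) / h, 0.0, None)

    # FE approximation: u_h(x) = sum_i u_i * phi_i(x), coefficients u_i unknown.
    coeffs = np.array([0.0, 0.3, 0.8, 0.5, 0.0])   # example nodal values
    x = np.linspace(0.0, 1.0, 101)
    u_h = sum(coeffs[i] * phi(i, x) for i in range(len(nodes)))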

4. Discretize: Galerkin Method

Substitute finite element approximations into weak form. This converts the infinite-dimensional problem into a finite system.

\[ \begin{aligned} a(\mathbf{u}_h, \mathbf{w}_h) &= L(\mathbf{w}_h) \quad \forall \mathbf{w}_h \in V_h \\ \Rightarrow \mathbf{K}\mathbf{u} &= \mathbf{f} \end{aligned} \]

where the stiffness matrix is \(K_{ij} = \int_\Omega \nabla \phi_j : \mathbb{C} : \nabla \phi_i \, d\Omega\).
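
Spelling out where the matrix system comes from: substitute \(\mathbf{u}_h = \sum_j \mathbf{u}_j \phi_j\) into the bilinear form and test against each basis function \(\mathbf{w}_h = \phi_i\), which turns the abstract statement into \(N\) algebraic equations:

\[ \sum_{j=1}^{N} a(\phi_j, \phi_i)\, \mathbf{u}_j = L(\phi_i), \quad i = 1, \dots, N, \qquad \text{i.e. } K_{ij} = a(\phi_j, \phi_i), \quad f_i = L(\phi_i). \]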

5. Assembly and Integration

Compute integrals element-by-element using numerical quadrature (Gaussian integration), then assemble global system.

    for each element K:
        compute K_local via quadrature
        for i, j in element nodes:
            K_global[i, j] += K_local[i, j]
Requires mesh generation and element connectivity
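
A runnable sketch of that loop for a 1D diffusion stiffness matrix with linear elements (hypothetical uniform mesh; for linear elements the element integral is evaluated in closed form rather than by quadrature):

    import numpy as np

    n_nodes = 11                                  # hypothetical uniform 1D mesh
    h = 1.0 / (n_nodes - 1)
    elements = [(i, i + 1) for i in range(n_nodes - 1)]

    K_global = np.zeros((n_nodes, n_nodes))
    for (a, b) in elements:
        # Local stiffness: integral of phi_i' * phi_j' over the element.
        K_local = (1.0 / h) * np.array([[ 1.0, -1.0],
                                        [-1.0,  1.0]])
        # Scatter local contributions into the global matrix via connectivity.
        for i_loc, i_glob in enumerate((a, b)):
            for j_loc, j_glob in enumerate((a, b)):
                K_global[i_glob, j_glob] += K_local[i_loc, j_loc]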
6. Solve Linear/Nonlinear System

Apply boundary conditions and solve using direct methods (LU decomposition) or iterative solvers (CG, GMRES, multigrid).

\[ \begin{aligned} \mathbf{K}\mathbf{u} &= \mathbf{f} \\ \text{or nonlinear: } \mathbf{R}(\mathbf{u}) &= 0 \text{ (Newton-Raphson)} \end{aligned} \]
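
For instance, with SciPy the assembled system could be solved either way (a sketch assuming K_global from the assembly step and a right-hand-side vector f with boundary conditions already imposed):

    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    K = sp.csr_matrix(K_global)        # sparse stiffness matrix
    u_direct = spla.spsolve(K, f)      # direct solve (sparse LU)
    u_iter, info = spla.cg(K, f)       # iterative solve (conjugate gradient)
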
7. Error Estimation & Refinement

Compute a posteriori error estimates and adaptively refine mesh where error is large.

\[ \|u - u_h\|_{H^1} \leq Ch^k |u|_{H^{k+1}} \]
Guaranteed convergence rates with mesh refinement!
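
This estimate is directly checkable in practice: solve on two meshes, compare against a reference solution, and read off the observed rate (the error values below are purely illustrative):

    import numpy as np

    # Hypothetical H1 errors measured on two meshes of size h_coarse and h_fine.
    h_coarse, h_fine = 0.10, 0.05
    err_coarse, err_fine = 3.2e-2, 1.6e-2

    # Observed rate k from ||u - u_h|| ~ C * h^k.
    rate = np.log(err_coarse / err_fine) / np.log(h_coarse / h_fine)
    print(f"observed rate ~= {rate:.2f}")   # ~1 for linear elements in the H1 norm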

Physics-Informed Neural Network

Modern Approach • Strong Form • Deep Learning

1. Start with Strong Form (PDE)

Same starting point! Begin directly with the differential equations. No need to derive a weak form.

\[ \begin{aligned} \nabla \cdot \boldsymbol{\sigma} + \mathbf{f} &= 0 \\ \frac{\partial}{\partial t}\left(\frac{1}{M}p + \alpha\nabla \cdot \mathbf{u}\right) - \kappa\nabla^2 p &= 0 \end{aligned} \]
2. Skip Weak Form Entirely!

PINNs work directly with the strong form. No integration by parts, no test functions, no variational principles needed.

This is the key difference: bypass all the variational machinery

Instead, we'll enforce the PDEs at specific collocation points using automatic differentiation.

3. Define Neural Network Ansatz

Represent the solution as a neural network. The network parameters (weights and biases) are the unknowns to optimize.

\[ \begin{aligned} \hat{\mathbf{u}}(x,y,t) &= \mathcal{N}_u(x,y,t; \mathbf{W}_u, \mathbf{b}_u) \\ \hat{p}(x,y,t) &= \mathcal{N}_p(x,y,t; \mathbf{W}_p, \mathbf{b}_p) \end{aligned} \]

Each network: \(\mathbf{o}^{(l)} = \varphi(\mathbf{W}^{(l)}\mathbf{o}^{(l-1)} + \mathbf{b}^{(l)})\)
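
A minimal TensorFlow/Keras sketch of such an ansatz; the layer widths and tanh activation are illustrative choices, not prescribed by the method:

    import tensorflow as tf

    def make_net(n_outputs):
        """Fully connected network mapping (x, y, t) to the field value(s)."""
        inputs = tf.keras.Input(shape=(3,))               # inputs: x, y, t
        h = tf.keras.layers.Dense(64, activation="tanh")(inputs)
        h = tf.keras.layers.Dense(64, activation="tanh")(h)
        outputs = tf.keras.layers.Dense(n_outputs)(h)     # linear output layer
        return tf.keras.Model(inputs, outputs)

    net_u = make_net(2)   # displacement components u_x, u_y
    net_p = make_net(1)   # pore pressure p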

4. Sample Collocation Points

No mesh needed! Randomly sample points in the domain using Latin hypercube or uniform sampling.

    N_interior = 15000   # Interior points
    N_boundary = 500     # Per boundary
    N_initial  = 200     # Initial condition

    X_interior = latin_hypercube_sample(N_interior)
    X_boundary = sample_boundaries(N_boundary)
Meshless approach: no connectivity, no element assembly
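
One way to realize a latin_hypercube_sample-style routine is SciPy's quasi-Monte Carlo module (a sketch; the domain bounds below are placeholders):

    from scipy.stats import qmc

    # Sample (x, y, t) interior collocation points.
    sampler = qmc.LatinHypercube(d=3)
    unit_points = sampler.random(n=15000)             # points in the unit cube
    lower, upper = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]   # placeholder domain bounds
    X_interior = qmc.scale(unit_points, lower, upper)
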
5. Formulate Loss Function

The objective directly enforces PDE residuals at collocation points, plus boundary/initial conditions.

\[ \begin{aligned} \mathcal{L} &= \underbrace{\frac{1}{N_f}\sum_{i=1}^{N_f} \|\Pi(\hat{\mathbf{u}}, \hat{p})\|^2}_{\text{PDE residuals}} \\ &\quad + \underbrace{\frac{1}{N_b}\sum_{j=1}^{N_b} \|BC_j\|^2}_{\text{Boundary conditions}} \\ &\quad + \underbrace{\frac{1}{N_0}\sum_{k=1}^{N_0} \|IC_k\|^2}_{\text{Initial conditions}} \end{aligned} \]

where \(\Pi\) represents the PDE operators (computed via automatic differentiation).
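
Assembled in code, the composite objective might look like the following sketch, where pde_residuals, bc_residuals, and ic_residuals are placeholder functions returning the pointwise residuals for the specific problem:

    import tensorflow as tf

    def total_loss(X_interior, X_boundary, X_initial):
        # Mean squared PDE residuals at interior collocation points.
        loss_pde = tf.reduce_mean(tf.square(pde_residuals(X_interior)))
        # Mean squared mismatch of boundary and initial conditions.
        loss_bc = tf.reduce_mean(tf.square(bc_residuals(X_boundary)))
        loss_ic = tf.reduce_mean(tf.square(ic_residuals(X_initial)))
        return loss_pde + loss_bc + loss_ic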

6. Automatic Differentiation

Compute derivatives of neural network outputs with respect to inputs automatically using the chain rule through the computational graph.

\[ \frac{\partial \hat{p}}{\partial x} = \frac{\partial \mathcal{N}_p}{\partial x}, \quad \frac{\partial^2 \hat{p}}{\partial x^2} = \frac{\partial^2 \mathcal{N}_p}{\partial x^2} \]
    import tensorflow as tf

    with tf.GradientTape() as tape:
        tape.watch(X)                            # record operations on the input coordinates
        p_pred = neural_net(X)
    dp_dx = tape.gradient(p_pred, X)[:, 0]       # derivative of p with respect to x
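
Second derivatives, needed for the \(\nabla^2 p\) term, come from nesting tapes (continuing with the same assumed neural_net and input tensor X):

    with tf.GradientTape() as tape2:
        tape2.watch(X)
        with tf.GradientTape() as tape1:
            tape1.watch(X)
            p_pred = neural_net(X)
        dp_dx = tape1.gradient(p_pred, X)[:, 0:1]   # dp/dx at every collocation point
    d2p = tape2.gradient(dp_dx, X)                  # gradients of dp/dx w.r.t. (x, y, t)
    d2p_dx2 = d2p[:, 0]                             # second derivative in x
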
7. Optimize via Gradient Descent

Minimize loss function using Adam, L-BFGS, or other optimizers. Update network weights iteratively.

\[ \mathbf{W}^{k+1} = \mathbf{W}^k - \eta \nabla_{\mathbf{W}}\mathcal{L}(\mathbf{W}^k) \]
No convergence guarantees—success depends on optimization landscape and hyperparameters
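
A bare-bones Adam loop implementing this update rule (a sketch reusing the hypothetical net_u, net_p, total_loss, and collocation sets from the previous steps; the step count and learning rate are illustrative):

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
    trainable_vars = net_u.trainable_variables + net_p.trainable_variables

    for step in range(20000):
        with tf.GradientTape() as tape:
            loss = total_loss(X_interior, X_boundary, X_initial)
        grads = tape.gradient(loss, trainable_vars)       # dL/dW via backpropagation
        optimizer.apply_gradients(zip(grads, trainable_vars))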

Key Differences Summary

| Aspect | Variational Formulation (FEM) | Physics-Informed Neural Networks |
|---|---|---|
| Starting Point | Strong form → weak form (variational) | Strong form directly (no weak form) |
| Mathematical Tool | Integration by parts, test functions | Automatic differentiation |
| Basis Functions | Pre-defined (polynomials, FE shape functions) | Learned (neural network discovers representation) |
| Discretization | Mesh required (elements, nodes, connectivity) | Meshless (random collocation points) |
| Integration | Numerical quadrature (Gaussian integration) | Point evaluation (no integration needed) |
| System Type | Linear/nonlinear algebraic system: \(\mathbf{K}\mathbf{u} = \mathbf{f}\) | Non-convex optimization problem: \(\min \mathcal{L}(\mathbf{W})\) |
| Solver | Direct (LU) or iterative (CG, GMRES) | Gradient descent (Adam, L-BFGS) |
| Convergence Theory | ✓ Well-established (Céa's lemma, error estimates) | ✗ Largely unknown (empirical, no guarantees) |
| Error Estimates | ✓ A priori & a posteriori available | ✗ No computable bounds |
| Convergence Rate | ✓ Known: \(\mathcal{O}(h^k)\) for degree \(k\) | ✗ Unknown (depends on architecture, optimizer) |
| Hyperparameters | Mesh size \(h\), polynomial degree \(k\) | Layers, neurons, learning rate, activation, batch size, ... |
| Complexity for User | Moderate (requires mesh generation expertise) | High (requires ML expertise, hyperparameter tuning) |
| Computational Cost | Predictable (depends on DOF count) | Variable (depends on convergence, architecture) |
| Maturity | ✓ 50+ years of development and theory | ~5 years of active research |
| Trust for Engineering | ✓ High (certified, proven) | ✗ Low (research stage, no guarantees) |

The Bottom Line

FEM provides a mathematically rigorous path: variational formulation → discretization → guaranteed convergence. You know exactly what to expect and can prove your solution is correct.

PINNs offer conceptual elegance: skip the weak form, work directly with PDEs, and let the network learn the solution. But you're navigating in the dark—no convergence theory, no error bounds, just empirical observation and hope that gradient descent finds something reasonable.

For Mandel's problem specifically, the non-monotonic pressure response and disparate magnitude scales challenge PINNs significantly, requiring careful choice of activation functions (truncated sine) and optimizers (L-BFGS) with no theoretical guidance on why these choices work.