Two Paradigms for Solving PDEs

Variational Formulation (FEM) vs Physics-Informed Neural Networks (PINNs)

The Fundamental Question

How do we solve partial differential equations numerically? For decades, the answer has been the variational formulation leading to finite element methods—a mathematically rigorous approach with proven convergence guarantees. Recently, physics-informed neural networks have emerged, bypassing variational principles entirely and working directly with the strong form. This comparison explores both approaches through the lens of Mandel's problem in poromechanics.

Variational Formulation

Classical Approach • Weak Form • FEM

1. Start with Strong Form (PDE)

Begin with the governing equations in their original form. For Mandel's problem (coupled poromechanics):

\[ \begin{aligned} \nabla \cdot \boldsymbol{\sigma} + \mathbf{f} &= 0 \\ \frac{\partial}{\partial t}\left(\frac{1}{M}p + \alpha\nabla \cdot \mathbf{u}\right) - \kappa\nabla^2 p &= 0 \end{aligned} \]
2. Derive Weak Form

Multiply by test functions and integrate by parts. This reduces smoothness requirements and naturally incorporates boundary conditions.

\[ \begin{aligned} &\text{Find } \mathbf{u} \in V, p \in Q \text{ such that } \forall \mathbf{w} \in V, q \in Q: \\ &\int_\Omega \boldsymbol{\sigma}(\mathbf{u}) : \nabla \mathbf{w} \, d\Omega = \int_{\partial\Omega} \mathbf{t} \cdot \mathbf{w} \, d\Gamma \\ &\int_\Omega \left[\frac{1}{M}\frac{\partial p}{\partial t}q + \kappa \nabla p \cdot \nabla q + \alpha \frac{\partial}{\partial t}(\nabla \cdot \mathbf{u})q\right] d\Omega = 0 \end{aligned} \]
Key: Integration by parts transfers derivatives from solution to test function
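
Concretely, for the equilibrium equation this transfer follows from the divergence theorem (body force omitted and \(\mathbf{t} = \boldsymbol{\sigma}\mathbf{n}\) the prescribed traction, as in the weak form above):

\[ \int_\Omega (\nabla \cdot \boldsymbol{\sigma}) \cdot \mathbf{w} \, d\Omega = -\int_\Omega \boldsymbol{\sigma} : \nabla \mathbf{w} \, d\Omega + \int_{\partial\Omega} (\boldsymbol{\sigma}\mathbf{n}) \cdot \mathbf{w} \, d\Gamma \]

Only first derivatives of \(\mathbf{u}\) and \(\mathbf{w}\) remain, which is exactly the reduced smoothness requirement noted above.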
3. Choose Finite Element Space

Select finite-dimensional subspaces with prescribed basis functions (shape functions). Common choices: linear, quadratic, or higher-order polynomials.

\[ \begin{aligned} V_h &\subset V, \quad \dim(V_h) = N \\ \mathbf{u}_h(x) &= \sum_{i=1}^{N} \mathbf{u}_i \phi_i(x) \\ p_h(x) &= \sum_{j=1}^{M} p_j \psi_j(x) \end{aligned} \]

where \(\phi_i\) and \(\psi_j\) are pre-defined polynomial basis functions with compact support.
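
As a minimal illustration of what a pre-defined basis with compact support means, consider a hypothetical 1D mesh with piecewise-linear "hat" functions; the basis functions are fixed and only the coefficients \(u_i\) are unknown:

    import numpy as np

    # Hypothetical uniform 1D mesh with 5 nodes; phi_i equals 1 at node i
    # and decays linearly to 0 at the neighboring nodes.
    nodes = np.linspace(0.0, 1.0, 5)

    def phi(i, x):
        """Piecewise-linear shape function associated with node i."""
        h = nodes[1] - nodes[0]
        return np.clip(1.0 - np.abs(x - nodes[i]) / h, 0.0, None)

    # FE approximation: u_h(x) = sum_i u_i * phi_i(x), coefficients u_i unknown.
    coeffs = np.array([0.0, 0.3, 0.8, 0.5, 0.0])   # example nodal values
    x = np.linspace(0.0, 1.0, 101)
    u_h = sum(coeffs[i] * phi(i, x) for i in range(len(nodes)))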

4. Discretize: Galerkin Method

Substitute finite element approximations into weak form. This converts the infinite-dimensional problem into a finite system.

\[ \begin{aligned} a(\mathbf{u}_h, \mathbf{w}_h) &= L(\mathbf{w}_h) \quad \forall \mathbf{w}_h \in V_h \\ \Rightarrow \mathbf{K}\mathbf{u} &= \mathbf{f} \end{aligned} \]

where the stiffness matrix is \(K_{ij} = \int_\Omega \nabla \phi_j : \mathbb{C} : \nabla \phi_i \, d\Omega\).
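
Spelling out where the matrix system comes from: substitute \(\mathbf{u}_h = \sum_j \mathbf{u}_j \phi_j\) into the bilinear form and test against each basis function \(\mathbf{w}_h = \phi_i\), which turns the abstract statement into \(N\) algebraic equations:

\[ \sum_{j=1}^{N} a(\phi_j, \phi_i)\, \mathbf{u}_j = L(\phi_i), \quad i = 1, \dots, N, \qquad \text{i.e. } K_{ij} = a(\phi_j, \phi_i), \quad f_i = L(\phi_i). \]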

5. Assembly and Integration

Compute integrals element-by-element using numerical quadrature (Gaussian integration), then assemble global system.

    for each element K:
        compute K_local via quadrature
        for i, j in element nodes:
            K_global[i, j] += K_local[i, j]
Requires mesh generation and element connectivity
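
A runnable sketch of that loop for a 1D diffusion stiffness matrix with linear elements (hypothetical uniform mesh; for linear elements the element integral is evaluated in closed form rather than by quadrature):

    import numpy as np

    n_nodes = 11                                  # hypothetical uniform 1D mesh
    h = 1.0 / (n_nodes - 1)
    elements = [(i, i + 1) for i in range(n_nodes - 1)]

    K_global = np.zeros((n_nodes, n_nodes))
    for (a, b) in elements:
        # Local stiffness: integral of phi_i' * phi_j' over the element.
        K_local = (1.0 / h) * np.array([[ 1.0, -1.0],
                                        [-1.0,  1.0]])
        # Scatter local contributions into the global matrix via connectivity.
        for i_loc, i_glob in enumerate((a, b)):
            for j_loc, j_glob in enumerate((a, b)):
                K_global[i_glob, j_glob] += K_local[i_loc, j_loc]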
6. Solve Linear/Nonlinear System

Apply boundary conditions and solve using direct methods (LU decomposition) or iterative solvers (CG, GMRES, multigrid).

\[ \begin{aligned} \mathbf{K}\mathbf{u} &= \mathbf{f} \\ \text{or nonlinear: } \mathbf{R}(\mathbf{u}) &= 0 \text{ (Newton-Raphson)} \end{aligned} \]
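
For instance, with SciPy the assembled system could be solved either way (a sketch assuming K_global from the assembly step and a right-hand-side vector f with boundary conditions already imposed):

    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    K = sp.csr_matrix(K_global)        # sparse stiffness matrix
    u_direct = spla.spsolve(K, f)      # direct solve (sparse LU)
    u_iter, info = spla.cg(K, f)       # iterative solve (conjugate gradient)
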
7. Error Estimation & Refinement

Compute a posteriori error estimates and adaptively refine mesh where error is large.

\[ \|u - u_h\|_{H^1} \leq Ch^k |u|_{H^{k+1}} \]
Guaranteed convergence rates with mesh refinement!
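
This estimate is directly checkable in practice: solve on two meshes, compare against a reference solution, and read off the observed rate (the error values below are purely illustrative):

    import numpy as np

    # Hypothetical H1 errors measured on two meshes of size h_coarse and h_fine.
    h_coarse, h_fine = 0.10, 0.05
    err_coarse, err_fine = 3.2e-2, 1.6e-2

    # Observed rate k from ||u - u_h|| ~ C * h^k.
    rate = np.log(err_coarse / err_fine) / np.log(h_coarse / h_fine)
    print(f"observed rate ~= {rate:.2f}")   # ~1 for linear elements in the H1 norm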

Physics-Informed Neural Network

Modern Approach • Strong Form • Deep Learning

1. Start with Strong Form (PDE)

Same starting point! Begin directly with the differential equations. No need to derive a weak form.

\[ \begin{aligned} \nabla \cdot \boldsymbol{\sigma} + \mathbf{f} &= 0 \\ \frac{\partial}{\partial t}\left(\frac{1}{M}p + \alpha\nabla \cdot \mathbf{u}\right) - \kappa\nabla^2 p &= 0 \end{aligned} \]
2. Skip Weak Form Entirely!

PINNs work directly with the strong form. No integration by parts, no test functions, no variational principles needed.

This is the key difference: bypass all the variational machinery

Instead, we'll enforce the PDEs at specific collocation points using automatic differentiation.

3. Define Neural Network Ansatz

Represent the solution as a neural network. The network parameters (weights and biases) are the unknowns to optimize.

\[ \begin{aligned} \hat{\mathbf{u}}(x,y,t) &= \mathcal{N}_u(x,y,t; \mathbf{W}_u, \mathbf{b}_u) \\ \hat{p}(x,y,t) &= \mathcal{N}_p(x,y,t; \mathbf{W}_p, \mathbf{b}_p) \end{aligned} \]

Each network: \(\mathbf{o}^{(l)} = \varphi(\mathbf{W}^{(l)}\mathbf{o}^{(l-1)} + \mathbf{b}^{(l)})\)
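
A minimal TensorFlow/Keras sketch of such an ansatz; the layer widths and tanh activation are illustrative choices, not prescribed by the method:

    import tensorflow as tf

    def make_net(n_outputs):
        """Fully connected network mapping (x, y, t) to the field value(s)."""
        inputs = tf.keras.Input(shape=(3,))               # inputs: x, y, t
        h = tf.keras.layers.Dense(64, activation="tanh")(inputs)
        h = tf.keras.layers.Dense(64, activation="tanh")(h)
        outputs = tf.keras.layers.Dense(n_outputs)(h)     # linear output layer
        return tf.keras.Model(inputs, outputs)

    net_u = make_net(2)   # displacement components u_x, u_y
    net_p = make_net(1)   # pore pressure p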

4. Sample Collocation Points

No mesh needed! Randomly sample points in the domain using Latin hypercube or uniform sampling.

    N_interior = 15000   # Interior points
    N_boundary = 500     # Per boundary
    N_initial  = 200     # Initial condition

    X_interior = latin_hypercube_sample(N_interior)
    X_boundary = sample_boundaries(N_boundary)
Meshless approach: no connectivity, no element assembly
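
One way to realize a latin_hypercube_sample-style routine is SciPy's quasi-Monte Carlo module (a sketch; the domain bounds below are placeholders):

    from scipy.stats import qmc

    # Sample (x, y, t) interior collocation points.
    sampler = qmc.LatinHypercube(d=3)
    unit_points = sampler.random(n=15000)             # points in the unit cube
    lower, upper = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]   # placeholder domain bounds
    X_interior = qmc.scale(unit_points, lower, upper)
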
5. Formulate Loss Function

The objective directly enforces PDE residuals at collocation points, plus boundary/initial conditions.

\[ \begin{aligned} \mathcal{L} &= \underbrace{\frac{1}{N_f}\sum_{i=1}^{N_f} \|\Pi(\hat{\mathbf{u}}, \hat{p})\|^2}_{\text{PDE residuals}} \\ &\quad + \underbrace{\frac{1}{N_b}\sum_{j=1}^{N_b} \|BC_j\|^2}_{\text{Boundary conditions}} \\ &\quad + \underbrace{\frac{1}{N_0}\sum_{k=1}^{N_0} \|IC_k\|^2}_{\text{Initial conditions}} \end{aligned} \]

where \(\Pi\) represents the PDE operators (computed via automatic differentiation).
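
Assembled in code, the composite objective might look like the following sketch, where pde_residuals, bc_residuals, and ic_residuals are placeholder functions returning the pointwise residuals for the specific problem:

    import tensorflow as tf

    def total_loss(X_interior, X_boundary, X_initial):
        # Mean squared PDE residuals at interior collocation points.
        loss_pde = tf.reduce_mean(tf.square(pde_residuals(X_interior)))
        # Mean squared mismatch of boundary and initial conditions.
        loss_bc = tf.reduce_mean(tf.square(bc_residuals(X_boundary)))
        loss_ic = tf.reduce_mean(tf.square(ic_residuals(X_initial)))
        return loss_pde + loss_bc + loss_ic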

6. Automatic Differentiation

Compute derivatives of neural network outputs with respect to inputs automatically using the chain rule through the computational graph.

\[ \frac{\partial \hat{p}}{\partial x} = \frac{\partial \mathcal{N}_p}{\partial x}, \quad \frac{\partial^2 \hat{p}}{\partial x^2} = \frac{\partial^2 \mathcal{N}_p}{\partial x^2} \]
    import tensorflow as tf

    with tf.GradientTape() as tape:
        tape.watch(X)                            # record operations on the input coordinates
        p_pred = neural_net(X)
    dp_dx = tape.gradient(p_pred, X)[:, 0]       # derivative of p with respect to x
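
Second derivatives, needed for the \(\nabla^2 p\) term, come from nesting tapes (continuing with the same assumed neural_net and input tensor X):

    with tf.GradientTape() as tape2:
        tape2.watch(X)
        with tf.GradientTape() as tape1:
            tape1.watch(X)
            p_pred = neural_net(X)
        dp_dx = tape1.gradient(p_pred, X)[:, 0:1]   # dp/dx at every collocation point
    d2p = tape2.gradient(dp_dx, X)                  # gradients of dp/dx w.r.t. (x, y, t)
    d2p_dx2 = d2p[:, 0]                             # second derivative in x
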
7. Optimize via Gradient Descent

Minimize loss function using Adam, L-BFGS, or other optimizers. Update network weights iteratively.

\[ \mathbf{W}^{k+1} = \mathbf{W}^k - \eta \nabla_{\mathbf{W}}\mathcal{L}(\mathbf{W}^k) \]
No convergence guarantees—success depends on optimization landscape and hyperparameters
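
A bare-bones Adam loop implementing this update rule (a sketch reusing the hypothetical net_u, net_p, total_loss, and collocation sets from the previous steps; the step count and learning rate are illustrative):

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
    trainable_vars = net_u.trainable_variables + net_p.trainable_variables

    for step in range(20000):
        with tf.GradientTape() as tape:
            loss = total_loss(X_interior, X_boundary, X_initial)
        grads = tape.gradient(loss, trainable_vars)       # dL/dW via backpropagation
        optimizer.apply_gradients(zip(grads, trainable_vars))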

Key Differences Summary

| Aspect | Variational Formulation (FEM) | Physics-Informed Neural Networks |
|---|---|---|
| Starting Point | Strong form → weak form (variational) | Strong form directly (no weak form) |
| Mathematical Tool | Integration by parts, test functions | Automatic differentiation |
| Basis Functions | Pre-defined (polynomials, FE shape functions) | Learned (neural network discovers representation) |
| Discretization | Mesh required (elements, nodes, connectivity) | Meshless (random collocation points) |
| Integration | Numerical quadrature (Gaussian integration) | Point evaluation (no integration needed) |
| System Type | Linear/nonlinear algebraic system: \(\mathbf{K}\mathbf{u} = \mathbf{f}\) | Non-convex optimization problem: \(\min \mathcal{L}(\mathbf{W})\) |
| Solver | Direct (LU) or iterative (CG, GMRES) | Gradient descent (Adam, L-BFGS) |
| Convergence Theory | ✓ Well-established (Céa's lemma, error estimates) | ✗ Largely unknown (empirical, no guarantees) |
| Error Estimates | ✓ A priori & a posteriori available | ✗ No computable bounds |
| Convergence Rate | ✓ Known: \(\mathcal{O}(h^k)\) for degree \(k\) | ✗ Unknown (depends on architecture, optimizer) |
| Hyperparameters | Mesh size \(h\), polynomial degree \(k\) | Layers, neurons, learning rate, activation, batch size, ... |
| Complexity for User | Moderate (requires mesh generation expertise) | High (requires ML expertise, hyperparameter tuning) |
| Computational Cost | Predictable (depends on DOF count) | Variable (depends on convergence, architecture) |
| Maturity | ✓ 50+ years of development and theory | ~5 years of active research |
| Trust for Engineering | ✓ High (certified, proven) | ✗ Low (research stage, no guarantees) |

The Bottom Line

FEM provides a mathematically rigorous path: variational formulation → discretization → guaranteed convergence. You know exactly what to expect and can prove your solution is correct.

PINNs offer conceptual elegance: skip the weak form, work directly with PDEs, and let the network learn the solution. But you're navigating in the dark—no convergence theory, no error bounds, just empirical observation and hope that gradient descent finds something reasonable.

For Mandel's problem specifically, the non-monotonic pressure response and disparate magnitude scales challenge PINNs significantly, requiring careful choice of activation functions (truncated sine) and optimizers (L-BFGS) with no theoretical guidance on why these choices work.