Select Publications

A. Kratsios, V. Debarnot, and I. Dokmanić: Small Transformers Compute Universal Metric Embeddings, JMLR - Journal of Machine Learning Research, 2023.

S. Hou, P. Kassraie, A. Kratsios, A. Krause, and J. Rothfuss: Instance-dependent generalization bounds via optimal transport, JMLR - Journal of Machine Learning Research, 2023.

A. Kratsios and L. Papon: Universal Approximation Theorems for Differentiable Geometric Deep Learning, JMLR - Journal of Machine Learning Research, 2022.

Acciaio, B., Kratsios, A., and Pammer, G., Designing Universal Causal Deep Learning Models: The Geometric (Hyper) Transformer, Mathematical Finance - Special Issue on Machine Learning in Finance

A. Kratsios and C. Hyndman: NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation, JMLR - Journal of Machine Learning Research, 2021.

Expertise

Focus: Mathematical and statistical foundations of (geometric deep and operator) learning. Applications of Interest: Deep learning for optimal control, game theory, PDEs, and finance.

Our AI Theory Seminar

I organize and host a weekly AI theory seminar, powered by the Vector Institute, exploring the most recent results in AI theory. Our seminar covers topics ranging from statistical and approximation-theoretic guarantees to the theoretical viability of deep learning solutions to problems in stochastic analysis and game theory.

Hosted on our YouTube channel @DeepLearninarSeminarSeries

Publications and Preprints

2025

  • Saqur, R., Kratsios, A., Krach†, F., Limmer†, Y., Tian, J. J., Willes, J., Horvath, B., and Rudzicz, F., Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models, International Conference on Learning Representations (2025)
  • Sáez de Ocáriz Borde, H., Kratsios, A., Law, M. T., Dong, X., and Bronstein, M., Neural Spacetimes for DAG Representation Learning, International Conference on Learning Representations (2025)
  • Kratsios, A., Sáez de Ocáriz Borde, H., Furuya, T., and Law, M. T., Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts, Transactions on Machine Learning Research (2025)
  • Kratsios, A., and Furuya, T., Is In-Context Universality Enough? MLPs are Also Universal In-Context, ArXiV (2025)
  • Sung†, K., Kratsios, A., and Forman, N., Guiding Two-Layer Neural Networks toward Lipschitzness via Gradient Descent Learning Rate Constraints, ArXiV (2025)
  • Li, Y., Sáez de Ocáriz Borde, H., Kratsios, A., and McNicholas, P., Keep it Light! Simplifying Image Clustering via Text-Free Adapters, ArXiV (2025)

2024

  • Kratsios, A., and Sáez de Ocáriz Borde, H., Neural Snowflakes: Universal Latent Graph Inference via Trainable Latent Geometries, NeurIPS (2024)
  • Cheng†, T. S., Lucchi, A., Kratsios, A., and Belius, D., A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression, NeurIPS (2024)
  • Kolesov, A., Mokrov, P., Udovichenko, I., Gazdieva, M., Pammer, G., Kratsios, A., Korotin, A., and Burnaev, E., Energy-Guided Continuous Entropic Barycenter Estimation for General Costs, NeurIPS (2024), Award: Spotlight
  • Cheng†, T. S., Lucchi, A., Kratsios, A., and Belius, D., Characterizing Overfitting in Kernel Ridgeless Regression through the Eigenspectrum, International Conference on Learning Representations (2024)
  • Benitez†, J. A. L., Furuya, T., Faucher, F., Kratsios, A., Tricoche, X., and de Hoop, M. V., Out-of-distributional Risk Bounds for Neural Operators with Applications to the Helmholtz Equation. Journal of Computational Physics, 113168
  • Kratsios, A., Hong†, R., and Sáez de Ocáriz Borde, H., Capacity Bounds for Hyperbolic Neural Network Representations of Latent Tree Structures. Neural Networks, 106420
  • Alvarez, G. A., Ekren, I., Kratsios, A., and Yang, X., Neural Operators Can Play Dynamic Stackelberg Games, ArXiV: 2411.09644
  • Sáez de Ocáriz Borde, H., Lukoianov, A., Kratsios, A., Law, M. T., Dong, X., and Bronstein, M., Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning, ArXiV: 2408.13885
  • Persiianov, M., Asadulaev, A., Andreev, N., Starodubcev, N., Baranchuk, D., Kratsios, A., Burnaev, E. and Korotin, A., Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization, ArXiV: 2410.02628
  • Hong†, R., and Kratsios, A., Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity, ArXiV: 2409.12335
  • Arabpour†, R., Armstrong, J., Galimberti, L., Kratsios, A., and Livieri, G., Low-dimensional Approximations of the Conditional Law of Volterra Processes: A Non-positive Curvature Approach, ArXiV: 2405.20094
  • Limmer†, Y., Kratsios, A., Yang†, X., Saqur, R., and Horvath, B., Reality Only Happens Once: Single-Path Generalization Bounds for Transformers, ArXiV: 2405.16563
  • Kratsios, A., Furuya, T., Lara Benitez†, J.A., Lassas, M., and de Hoop, M., Mixture of Experts Soften the Curse of Dimensionality in Operator Learning, ArXiV: 2404.09101
  • Kratsios, A., Neuman, A. M., and Pammer, G., Tighter Generalization Bounds on Digital Computers via Discrete Optimal Transport, ArXiV: 2402.05576

2023

  • Kassraie, P., Hou, S., Kratsios, A., Krause, A., and Rothfuss, J., Instance-Dependent Generalization Bounds via Optimal Transport. Journal of Machine Learning Research, 24, 349
  • Cheng, T. S., Kratsios, A., Lucchi, A., Dokmanić, I., and Belius, D., A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression. Advances in Neural Information Processing Systems, 36, 4767-4798
  • Kratsios, A., Debarnot, V., and Dokmanić, I., Small Transformers Compute Universal Metric Embeddings. Journal of Machine Learning Research, 24(170), 1-48
  • Acciaio, B., Kratsios, A., and Pammer, G., Designing Universal Causal Deep Learning Models: The Geometric (Hyper) Transformer, Mathematical Finance - Special Issue on Machine Learning in Finance
  • Kratsios, A., Universal Regular Conditional Distributions via Probabilistic Transformers. Constructive Approximation, 57(3), 1145-1212
  • Herrera, C., Krach, F., Kratsios, A., Ruyssen, P., and Teichmann, J., Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices. Transactions on Machine Learning Research
  • Horvath, B., Kratsios, A., Limmer†, Y., and Yang†, X., Deep Kalman Filters Can Filter. ArXiV: 2310.19603
  • Kratsios, A., Liu, C., Lassas, M., de Hoop, M. V., and Dokmanić, I., An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning, ArXiV: 2304.12231
  • Yang†, X., Kratsios, A., Krach, F., Grasselli, M., and Lucchi, A., Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing, ArXiV: 2309.04557

2022

  • Papon†, L., and Kratsios, A., Universal Approximation Theorems for Differentiable Geometric Deep Learning. Journal of Machine Learning Research, 23(196), 1-73
  • Kratsios, A., Zamanlooy†, B., Dokmanić, I., and Liu†, T., Universal Approximation under Constraints Is Possible with Transformers. In ICLR, International Conference on Learning Representations, Award: Spotlight
  • Zamanlooy†, B., and Kratsios, A., Do ReLU Networks Have an Edge When Approximating Compactly-Supported Functions? Transactions on Machine Learning Research
  • Zamanlooy†, B., and Kratsios, A., Learning Sub-patterns in Piecewise Continuous Functions. Neurocomputing, 480, 192-211
  • Galimberti, L., Kratsios, A., and Livieri, G., Designing Universal Causal Deep Learning Models: The Case of Infinite-Dimensional Dynamical Systems from Stochastic Analysis. ArXiV: 2210.13300

2021

  • Hyndman, C., and Kratsios, A., NEU: A Meta-algorithm for Universal UAP-invariant Feature Representation. Journal of Machine Learning Research, 22(92), 1-51
  • Casgrain, P., and Kratsios, A., Optimizing Optimizers: Regret-optimal Gradient Descent Algorithms. In Conference on Learning Theory (pp. 883-926). PMLR
  • Kratsios, A., Lower-estimates on the Hochschild (Co

Who we are

Past and present group members.

Deep Learning Seminar 2025 - Talks

  • Feb 7 - Eric Moulines (Learning Theory) - Rates of convergence for density estimation with generative adversarial networks
  • Feb 14 - Saber Jafarpour (Deep Learning Theory) - Robust Implicit Networks via Non-Euclidean Constraints
  • Feb 28 - Haotian Jiang (Deep Learning Theory) - Approximation Rate of the Transformer Architecture for Sequence Modeling
  • March 7 - Cristopher Salvi (Geometric Deep Learning) - Scaling Limits of GNNs
  • March 28 - Florian Rossmannek - TBD
  • April 11 - Mary Letey - (Learning Theory) - Asymptotic theory of in-context learning by linear attention
  • April 18 - Michael Choi - (Information Geometry and Optimization) - Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression
  • April 25 - Weigutian Ou - (Approximation Theory) - Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression
  • May 2 - Haizhao Yang - (Learning Theory) - Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces
  • May 23 - Giovanni Ballarin - (Approximation Theory) - Memory of Recurrent Networks: Do We Compute It Right?
  • May 23 - Antonio Lara - (Operator Theory) - TBD