Leveraging the Invariance Principle for Out-of-Distribution Generalization


Karthikeyan Shanmugam


IBM Research AI
T.J. Watson Center, NY.


Monday, 28 June 2021, 09:00 to 10:00


A fundamental obstacle to deploying supervised learning models in real-life applications is out-of-distribution (OOD) generalization. Models trained with standard Empirical Risk Minimization (ERM) on multiple training data sources tend to fit spurious features whose correlation with the label does not hold in unseen test environments. ERM’s sole focus on optimizing average risk contributes to this problem. The invariance principle, in Pearlian causal models, has long been used to infer causal relationships from interventional data.

Invariant Risk Minimization (IRM) is a recent paradigm that proposes to leverage the invariance principle in an optimization framework for OOD problems. This paradigm views the different training distributions and the unseen test distribution as intervened versions of a common but unknown causal model. IRM seeks a transformation of the data such that the classifier trained on top of it is invariant across training domains, in addition to optimizing risk. Because the resulting bilevel optimization is challenging, a previous proposal was limited to linear classifiers. We propose a novel game-theoretic learning paradigm, called Ensemble Invariant Risk Minimization (EIRM Game), whose Nash equilibria are provably equivalent to invariant solutions for a very general class of non-linear classifiers and transformations. For least squares regression under unobserved confounding, we use a modified game to provide the first convergence guarantees to approximate invariant solutions known for this problem in any setting (this part may be discussed if time permits).

Bio: Karthikeyan Shanmugam has been a Research Staff Member in the Trustworthy AI Department of the IBM Research AI group in NY since 2017. Previously, he was a Herman Goldstine Postdoctoral Fellow in the Mathematical Sciences Division at IBM Research, NY. He obtained his Ph.D. in Electrical and Computer Engineering from UT Austin in 2016, his MS degree in Electrical Engineering from USC in 2012, and his B.Tech and M.Tech degrees in Electrical Engineering from IIT Madras in 2010.
His research interests broadly lie in statistical machine learning (ML), optimization, graph algorithms, and information theory. In ML, his focus is on causal inference, online learning, transfer learning, and explainable ML. He has won several awards at IBM for his contributions to explainable AI and causal inference, including the Corporate Technical Award in 2021, the highest technical award at IBM. His work has appeared regularly in top AI/ML venues such as NeurIPS, ICML, AISTATS, and ICLR.