State-of-the-Art Digest

Following the tradition from previous years, we interviewed a cohort of distinguished and prolific academic and industrial experts in an attempt to summarise the highlights of the past year and predict what is in store for 2024. The year 2023 was so rich in results that we had to break this post into two parts. This is Part II, focusing on applications; see also Part I for theory & new architectures.

The post is written and edited by Michael Galkin and Michael Bronstein with significant contributions from Dominique Beaini, Nathan Benaich, Joey Bose, Johannes Brandstetter, Bruno Correia, Ahmed Elhag, Kexin Huang, Chaitanya Joshi, Leon Klein, N M Anoop Krishnan, Chen Lin, Andreas Loukas, Santiago Miret, Luca Naef, Liudmila Prokhorenkova, Emanuele Rossi, Hannes Stärk, Alex Tong, Anton Tsitsulin, Petar Veličković, Minkai Xu, and Zhaocheng Zhu.

Geometric ML methods and applications filled the covers of high-profile journals in 2023 (Figure sources: the papers by Wang et al., Viñas et al., Deng et al., Weiss et al., Lagemann et al., Duan et al., and Lam et al.)
  1. Structural Biology (Molecules & Proteins)
     a. A Structural Biologist's Perspective
     b. Industrial Perspective
     c. Systems Biology
  2. Materials Science (Crystals)
  3. Molecular Dynamics & ML Potentials
  4. Geometric Generative Models (Manifolds)
  5. BIG Graphs, Scalability: When GNNs are too expensive
  6. Algorithmic Reasoning & Alignment
  7. Knowledge Graphs: Inductive Reasoning is Solved?
  8. Temporal Graph Learning
  9. LLMs + Graphs for Scientific Discovery
  10. Cool GNN Applications
  11. Geometric Wall Street Bulletin 💸

The legend we will be using throughout the text: 🔥 hot topics 💡 year's highlight 🏋️ challenges ➡️ current/next developments 🔮 predictions/speculations 💰 financial transactions

Structural Biology (Molecules & Proteins)

Dominique Beaini (Valence), Joey Bose (Mila & Dreamfold), Michael Bronstein (Oxford), Bruno Correia (EPFL), Michael Galkin (Intel), Kexin Huang (Stanford), Chaitanya Joshi (Cambridge), Andreas Loukas (Genentech), Luca Naef (VantAI), Hannes Stärk (MIT), Minkai Xu (Stanford)

Structural biology was definitely at the forefront of Geometric Deep Learning in 2023.

Following the 2020 discovery of halicin as a potential new antibiotic, two new antibiotics were discovered in 2023 with the help of GNNs! The first is abaucin (by McMaster and MIT), which targets a stubborn pathogen resistant to many drugs. Second, MIT and Harvard researchers discovered a new structural class of antibiotics where the screening process was supported by ChemProp, a suite of GNNs for molecular property prediction. We also observe a convergence of ML and experimental techniques ("lab-in-the-loop") in the recent work on autonomous molecular discovery (a trend we will also see in Materials Design in the following sections).

Flow Matching has been one of the biggest generative ML trends of 2023, allowing for faster sampling and deterministic sampling trajectories compared to diffusion models. The most prominent examples of Flow Matching models we have seen in the biological applications are FoldFlow (Bose, Akhound-Sadegh, et al.) for protein backbone generation, FlowSite (Stärk et al.) for protein binding site design, and EquiFM (Song, Gong, et al.) for molecule generation.

Conditional probability paths learned by different versions of FoldFlow, visualizing the rotation trajectory of a single residue by the action of SO(3) on its homogeneous space 𝕊². Figure source: Bose, Akhound-Sadegh, et al.

Efficient Flow Matching on complex geometries with necessary equivariances became possible thanks to a handful of theory papers including Riemannian Flow Matching (Chen and Lipman), Minibatch Optimal Transport (Tong et al), and Simulation-Free Schrödinger bridges (Tong, Malkin, Fatras, et al). A great resource to learn Flow Matching with code examples and notebooks is the TorchCFM repo on GitHub as well as talks by Yaron Lipman, Joey Bose, Hannes Stärk, and Alex Tong.
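To make the "deterministic sampling trajectories" point concrete, here is a minimal sketch of conditional flow matching in plain Euclidean space. It is a toy illustration only: it ignores the manifold and equivariance machinery that FoldFlow, FlowSite, and EquiFM rely on, it assumes straight-line paths between a source sample and a data sample, and all function names (`cfm_training_pair`, `cfm_loss`, `euler_sample`, `toy_velocity`) are hypothetical rather than taken from TorchCFM or any paper.

```python
import random


def cfm_training_pair(x0, x1, t):
    """Point x_t on the straight line from x0 to x1, and its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]  # constant velocity along the line
    return x_t, u_t


def cfm_loss(velocity_model, x0_batch, x1_batch, rng):
    """Monte Carlo estimate of the regression loss |v(x_t, t) - u_t|^2."""
    total = 0.0
    for x0, x1 in zip(x0_batch, x1_batch):
        t = rng.random()  # t ~ Uniform[0, 1)
        x_t, u_t = cfm_training_pair(x0, x1, t)
        v = velocity_model(x_t, t)
        total += sum((vi - ui) ** 2 for vi, ui in zip(v, u_t))
    return total / len(x0_batch)


def euler_sample(velocity_model, x0, n_steps=10):
    """Deterministic sampling: integrate dx/dt = v(x, t) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_model(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check (assumption: the "dataset" is a single point, so the optimal
# velocity field is known in closed form and no training is needed).
TARGET = [1.0, 2.0]


def toy_velocity(x, t):
    return [(c - xi) / (1.0 - t) for c, xi in zip(TARGET, x)]


rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(16)]
loss = cfm_loss(toy_velocity, noise, [TARGET] * 16, rng)
sample = euler_sample(toy_velocity, noise[0])
```

In a real model `velocity_model` would be a neural network trained by minimising `cfm_loss`. Sampling is just an ODE solve with no noise injected along the way, which is where the deterministic trajectories come from.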

Diffusion models nevertheless continue to be the main workhorse of generative modeling in structural biology. In 2023, we saw several landmark works: FrameDiff (Yim, Trippe, De Bortoli, Mathieu, et al) for protein backbone generation, EvoDiff (Alamdari et al) for generating protein sequences with discrete diffusion, AbDiffuser (Martinkus et al) for full-atom antibody design with frame averaging and discrete diffusion (and with successful wet lab experiments), DiffMaSIF (Sverrison, Akdel, et al) and DiffDock-PP (Ketata, Laue, Mammadov, Stärk, et al) for protein-protein docking, DiffPack (Zhang, Zhang, et al) for side-chain packing, and the all-atom version of RFDiffusion (Krishna, Wang, Ahern, et al) from the Baker Lab. Among latent diffusion models (like Stable Diffusion in image generation), GeoLDM (Xu et al) was the first for 3D molecule conformations, followed by OmniProt for protein sequence-structure generation.

FrameDiff: parameterization of the backbone frame with rotation, translation, and torsion angle for the oxygen atom. Figure Source: Yim, Trippe, De Bortoli, Mathieu, et al

Finally, Google DeepMind and Isomorphic Labs announced AlphaFold 2.3. The latest iteration significantly improves upon the baselines in 3 tasks: docking benchmarks (almost 2× better than DiffDock on the new PoseBusters benchmark), protein-nucleic acid interactions, and antibody-antigen prediction.

Chaitanya Joshi (Cambridge)

💡There have been two emerging trends for biomolecular modeling and design that I am very excited about in 2023:

1️⃣ Going from protein structure prediction to conformational ensemble generation. There were several interesting approaches to the problem, including AlphaFold with MSA clustering, idpGAN, Distributional Graphormer (a diffusion model), and AlphaFold Meets Flow Matching for Generating Protein Ensembles.

2️⃣ Modelling of biomolecular complexes and design of biomolecular interactions among proteins + X: RFdiffusion all-atom and Ligand MPNN, both from the Baker Lab, are representative examples of the trend towards designing interactions. The new in-development AlphaFold report claims that a unified structure prediction model can outperform or match specialised models across solo protein and protein complex structure prediction as well as protein-ligand and protein-nucleic acid co-folding.

"However, for all the exciting methodology development in biomolecular modelling and design, perhaps the biggest lesson for the ML community this year should be to focus more on meaningful in-silico evaluation and, if possible, experimental validation." — Chaitanya Joshi (Cambridge)

1️⃣ In early 2023, Guolin Ke's team at DP Technology released two excellent re-evaluation papers highlighting how we may have been largely overestimating the performance of prominent geometric deep learning-based methods for molecular conformation generation and docking w.r.t. traditional baselines.

2️⃣ PoseCheck and PoseBusters shed further light on the failure modes of current molecular generation and docking methods. Critically, generated molecules and their 3D poses are often 'nonphysical' and contain steric clashes, hydrogen placement issues, and high strain energies.

3️⃣ Very few papers attempt any experimental validation of new ML ideas. Perhaps collaborating with a wet lab is challenging for those focussed on new methodology development, but I hope that us ML-ers, as a community, will at least be a lot more cautious about the in-silico evaluation metrics we are constantly pushing as we create new models.
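To give a flavour of what a physical plausibility check looks like, here is a deliberately simplified steric clash detector. It is not how PoseCheck or PoseBusters are implemented: the single 2.0 Å cutoff is an illustrative assumption (real tools typically use element-specific van der Waals radii and many additional checks), and the function name `find_steric_clashes` is hypothetical.

```python
import math


def find_steric_clashes(coords, bonded_pairs=(), min_distance=2.0):
    """Return index pairs (i, j) of non-bonded atoms closer than min_distance.

    coords        -- list of (x, y, z) positions in angstroms
    bonded_pairs  -- pairs of atom indices that are covalently bonded
                     (these are allowed to be close and are skipped)
    min_distance  -- illustrative cutoff in angstroms (an assumption)
    """
    bonded = {frozenset(pair) for pair in bonded_pairs}
    clashes = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if frozenset((i, j)) in bonded:
                continue
            if math.dist(coords[i], coords[j]) < min_distance:
                clashes.append((i, j))
    return clashes


# Four atoms: 0 and 1 are bonded, atom 2 sits far too close to both of them.
pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.6, 0.1, 0.0), (5.0, 0.0, 0.0)]
clashes = find_steric_clashes(pose, bonded_pairs=[(0, 1)])
```

Even a crude filter like this can be run over every generated pose before reporting a docking or generation metric, which is the kind of sanity check the benchmarks above argue for.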

Hannes Stärk (MIT)

💡I am reading quite some hype here about Flow Matching, stochastic interpolants, and Rectified Flows (I will call them "Bridge Matching," or "BM"). I do not think there is much value in just replacing diffusion models with BM in all the existing applications. For pure generative modeling, the main BM advantage is simplicity.

I think we should instead be excited about BM for the new capabilities it unlocks. For example, training bridges between arbitrary distributions in a simulation-free manner (what are the best applications for this? I basically only saw retrosynthesis so far.) or solving OT problems as in DSBM that does so for fluid flow downscaling. Maybe a lot of tools emerged in 2023 (also let us mention BM with multiple marginals), and in 2024, the community will make good use of them?

Joey Bose (Mila & Dreamfold)

💡 This year we have really seen the rise of geometric generative models from theory to practice. A few standouts for me include Riemannian Flow Matching (in general, any paper by Ricky Chen and Yaron Lipman on these topics is a must-read) and FrameDiff from Yim et al., which introduced a lot of the important machinery for protein backbone generation. Of course, standing on the shoulders of both RFM and FrameDiff, we built FoldFlow, a cooler flow-matching approach to protein generative models.

"Looking ahead, I foresee a lot more flow matching-based approaches coming into use. They are better for proteins and longer sequences and can start from any source distribution." — Joey Bose (Mila & Dreamfold)

🔮 Moreover, I suspect we will soon see multi-modal generative models in this space, such as discrete + continuous models and also conditional models in the same vein as text-conditioned diffusion models for images. Perhaps, we might even see latent generative models here given that they scale so well!

Minkai Xu (Stanford)

"This year, the community has further pushed forward the geometric generative models for 3D molecular generation in many perspectives." — Minkai Xu (Stanford)

Flow matching: Ricky and Yaron proposed the Flow Matching method as an alternative to the widely used diffusion models, and EquiFM (Song et al and Klein et al) realize the variant for 3D molecule generation by parameterizing the flow dynamics with equivariant GNNs. In the meantime, FrameFlow and FoldFlow construct FM models for protein generation.

🔮 Moving forward, as in the vision and text domains, people are beginning to explore generation in a lower-dimensional latent space instead of the complex original data space (latent generative models). GeoLDM (Xu et al) proposed the first latent diffusion model (like Stable Diffusion in CV) for 3D molecule generation, while Fu et al use a similar modeling formulation for large protein generation.

A Structural Biologist's Perspective

Bruno Correia (EPFL)

"Current generative models still create "garbage" outputs that violate many of the physical and chemical properties that molecules are known to have. The advantage of current generative models is, of course, their speed which affords them the possibility of generating many samples, which then brings to front and center the ability to filter the best generated samples, which in the case of protein design has benefited immensely from the transformative development of AlphaFold2." — Bruno Correia (EPFL)

➡️ The next challenge to the community will perhaps be how to infuse generative models with meaningful physical and chemical priors to enhance sampling performance and generalization. Interestingly, we have not seen the same remarkable advances (experimentally validated) in applications to small molecule design, which we hope to see during 2024.

➡️ The rise of multimodal models. Generally in biological-related tasks data sparsity is a given and as such strategies to extract the most signal out of the data are essential. One way to try to overcome such limitations is to improve the expressiveness of the data representations and maybe this way obtain more performant neural networks. Likely in the short term, we will be able to explore architectures that encompass several types of representations of the objects of interest and harness the best predictions for the evermore complex tasks we are facing as progressively more of the basic problems get solved. This notion of multimodality is of course intimately related to the overall aim of having models with stronger priors, that in a generative context, honour fundamental constraints of the objects of interest.

➡️ The models that know everything. As the power of machine learning models improves we clearly tend to request a more multi-objective optimization when it comes to attempting to solve real life problems. Taking as an example small molecule generation, thinking from a biochemical perspective the drug design problem starts by having a target to which a small molecule binds, therefore one of the first and most important constraints is that the generative process ought to be conditioned to the protein pocket. However, such a constraint may not be enough to create real small molecules as many of such chemicals are simply impossible or very hard to synthesize, and, therefore, a model that has notions of chemical synthesizability and can integrate such constraints in the search space would be much more useful.

➡️ From chemotype to phenotype. On the grounds of data representation, atomic graph structures together with vector embeddings have reached remarkable results, particularly in the search for new antibiotics. Making accurate predictions of which chemical structures have antimicrobial activity, broadly speaking, is an exercise of phenotype prediction from chemical structure. Due to the simplicity of the approaches used and the impressive results obtained, one would expect that more sophisticated data representations on the molecule end and perhaps together also with richer phenotype assignment could give critical contributions to such an important problem in drug development.

Industrial perspective

Luca Naef (VantAI)

🔥What are the biggest advancements in the field you noticed in 2023?

1️⃣ Increasing multi-modality & modularity, as shown by the emergence of initial co-folding methods for both proteins & small molecules, diffusion and non-diffusion-based, that extend the AF2 success: DiffusionProteinLigand in the last days of 2022, and RFDiffusion, AlphaFold2 and Umol by the end of 2023. We are also seeing models that have sequence & structure co-trained (SAProt, ProstT5), and sequence, structure & surface co-trained (ProteinINR). There is a general revival of surface-based methods after a quieter 2021 and 2022: DiffMaSIF, SurfDock, and ShapeProt.

2️⃣ Datasets and benchmarks. Datasets, especially synthetic/computationally derived: ATLAS and the MDDB for protein dynamics. MISATO, SPICE, Splinter for protein-ligand complexes, QM1B for molecular properties. PINDER: large protein-protein docking dataset with matched apo/predicted pairs and benchmark suite with retrained docking models. CryoET data portal for CryoET. And a whole host of welcome benchmarks: PINDER, PoseBusters, and PoseCheck, with a focus on more rigorous and practically relevant settings.

3️⃣ Creative pre-training strategies to get around the sparsity of diverse protein-ligand complexes. Van-der-mers training (DockGen) & sidechain training strategies in RF-AA and pre-training on ligand-only complexes in CCD in RF-AA. Multi-task pre-training Unimol and others.

🏋️ What are the open challenges that researchers might overlook?

1️⃣ Generalization. DockGen showed that current state-of-the-art protein-ligand docking models completely lose predictability when asked to generalise towards novel protein domains. We see a similar phenomenon in the AlphaFold-latest report, where performance on novel proteins & ligands drops heavily to below biophysics-based baselines (which have access to holo structures), despite very generous definitions of novel protein & ligand. This indicates that existing approaches might still largely rely on memorization, an observation that has been extensively argued over the years.

2️⃣ The curse of (simple) baselines. This has been a recurring topic over the years, and 2023 has again shown what industry practitioners have long known: in many practical problems such as molecular generation, property prediction, docking, and conformer prediction, simple baselines or classical approaches often still outperform ML-based approaches in practice. This has been documented increasingly in 2023 by Tripp et al., Yu et al., Zhou et al.

🔮 Predictions for 2024!

"In 2024, data sparsity will remain top of mind and we will see a lot of smart ways to use models to generate synthetic training data. Self-distillation in AlphaFold2 served as a big inspiration, Confidence Bootstrapping in DockGen, leveraging the insight that we now have sufficiently powerful models that can score poses but not always generate them, first realised in 2022." — Luca Naef (VantAI)

2️⃣ We will see more biological/chemical assays purpose-built for ML or only making sense in a machine learning context (i.e., might not lead to biological insight by themselves but be primarily useful for training models). An example from 2023 is the large-scale protein folding experiments by Tsuboyama et al. This move might be driven by techbio startups, where we have seen the first foundation models built on such ML-purpose-built assays for structural biology with e.g. ATOM-1.

Andreas Loukas (Prescient Design, part of Genentech)

🔥 What are the biggest advancements in the field you noticed in 2023?

"In 2023, we started to see some of the challenges of equivariant generation and representation for proteins to be resolved through diffusion models." — Andreas Loukas (Prescient Design)

1️⃣ We also noticed a shift towards approaches that model and generate molecular systems at higher fidelity. For instance, the most recent models adopt a fully end-to-end approach by generating backbone, sequence and side-chains jointly (AbDiffuser, dyMEAN) or at least solve the problem in two steps but with a partially joint model (Chroma); as compared to backbone generation followed by inverse folding as in RFDiffusion and FrameDiff. Other attempts to improve the modelling fidelity can be found in the latest updates to co-folding tools like AlphaFold2 and RFDiffusion which render them sensitive to non-protein components (ligands, prosthetic groups, cofactors); as well as in papers that attempt to account for conformational dynamics (see discussion above). In my view, this line of work is essential because the binding behaviour of molecular systems can be very sensitive to how atoms are placed, move, and interact.

2️⃣ In 2023, many works also attempted to get a handle on binding affinity by learning to predict the effect of mutations of a known crystal by pre-training on large corpora, such as computationally predicted mutations (graphinity), and on side-tasks, such as rotamer density estimation. The obtained results are encouraging as they can significantly outperform semi-empirical baselines like Rosetta and FoldX. However, there is still significant work to be done to render these models reliable for binding affinity prediction.

3️⃣ I have further observed a growing recognition of protein Language Models (pLMs) and specifically ESM as valuable tools, even among those who primarily favour geometric deep learning. These embeddings are used to help docking models, allow the construction of simple yet competitive predictive models for binding affinity prediction (Li et al 2023), and can generally offer an efficient method to create residue representations for GNNs that are informed by the extensive proteome data without the need for extensive pretraining (Jamasb et al 2023). However, I do maintain a concern regarding the use of pLMs: it is unclear whether their effectiveness is due to data leakage or genuine generalisation. This is particularly pertinent when evaluating models on tasks like amino-acid recovery in inverse folding and conditional CDR design, where distinguishing between these two factors is crucial.

🏋️ What are the open challenges that researchers might overlook?

1️⃣ Working with energetically relaxed crystal structures (and, even worse, folded structures) can significantly affect the performance of downstream predictive models. This is especially true for the prediction of protein-protein interactions (PPIs). In my experience, the performance of PPI predictors severely deteriorates when they are given a relaxed structure as opposed to the bound (holo) crystallised structure.

2️⃣ Though successful in silico antibody design has the capacity to revolutionise drug design, general protein models are not (yet?) as good at folding, docking or generating antibodies as antibody-specific models are. This is perhaps due to the low conformational variability of the antibody fold and the distinct binding mode between antibodies and antigens (loop-mediated interactions that can involve a non-negligible entropic component). Perhaps for the same reasons, the de novo design of antibody binders (that I define as 0-shot generation of an antibody that binds to a previously unseen epitope) remains an open problem. Currently, experimentally confirmed cases of de novo binders involve mostly stable proteins, like alpha-helical bundles, that are common in the PDB and harbour interfaces that differ substantially from epitope-paratope interactions.

3️⃣ We are still lacking a general-purpose proxy for binding free energy. The main issue here is the lack of high-quality data of sufficient size and diversity (esp. co-crystal structures). We should therefore be cognizant of the limitations of any such learned proxy for any model evaluation: though predicted binding scores that fall outside the distribution of known binders are a clear signal that something is off, we should avoid the typical pitfall of trying to demonstrate the superiority of our model in an empirical evaluation by showing how it leads to even higher scores.

Dominique Beaini (Valence Labs, part of Recursion)

"I'm excited to see a very large community being built around the problem of drug discovery, and I feel we are on the brink of a new revolution in the speed and efficiency of discovering drugs." — Dominique Beaini (Valence Labs)

What work got me excited in 2023?

I am confident that machine learning will allow us to tackle rare diseases quickly, stop the next COVID-X pandemic before it can spread, and live longer and healthier. But there's a lot of work to be done and there are a lot of challenges ahead, some bumps in the road, and some canyons on the way. Speaking of communities, you can visit the Valence Portal to keep up-to-date with the 🔥 new in ML for drug discovery.

What are the hard questions for 2024?

⚛️ A new generation of quantum mechanics. Machine learning force-fields, often based on equivariant and invariant GNNs, have been promising us a treasure. The treasure of the precision of density functional theory, but thousands of times faster and at the scale of entire proteins. Although some steps were made in this direction with Allegro and MACE-MP, current models do not generalize well to unseen settings and very large molecules, and they are still too slow to be applicable on the timescale that is needed 🐢. For the generalization, I believe that bigger and more diverse datasets are the most important stepping stones. For the computation time, I believe we will see models that are less enforcing of the equivariance, such as FAENet. But efficient sampling methods will play a bigger role: spatial-sampling such as using DiffDock to get more interesting starting points and time-sampling such as TimeWarp to avoid simulating every frame. I'm really excited by the big STEBS 👣 awaiting us in 2024: Spatio-temporal equivariant Boltzmann samplers.

🕸️ Everything is connected. Biology is inherently multimodal 🙋🐁 🧫🧬🧪. One cannot simply decouple the molecule from the rest of the biological system. Of course, that's how ML for drug discovery was done in the past: simply build a model of the molecular graph and fit it to experimental data. But we have reached a critical point 🛑, no matter how many trillions of parameters are in the GNN model, how much data is used to train it, or how many experts are mixed together. It is time to bring biology into the mix, and the most straightforward way is with multi-modal models. One method is to condition the output of the GNNs on the target protein sequences, as in MocFormer. Another is to use microscopy images or transcriptomics to better inform the model of the biological signature of molecules, as in TranSiGen. Yet another is to use LLMs to embed contextual information about the tasks, as in TwinBooster. Or even better, combine all of these together 🤯, but this could take years. The main issue for the broader community seems to be the availability of large amounts of quality and standardized data, but fortunately, this is not an issue for Valence.

🔬 Relating biological knowledge and observables. Humans have been trying to map biology for a long time, building relational maps for genes 🧬, protein-protein interactions 🔄, metabolic pathways 🔀, etc. I invite you to read this review of knowledge graphs for drug discovery. But all this knowledge often sits unused and ignored by the ML community. I feel that this is an area where GNNs for knowledge graphs could prove very useful, especially in 2024, and it could provide another modality for the 🕸️ point above. Considering that human knowledge is incomplete, we can instead recover relational maps from foundational models. This is the route taken by Phenom1 when trying to recall known genetic relationships. However, having to deal with various knowledge databases is an extremely complex task that we can't expect most ML scientists to be able to tackle alone. But with the help of artificial assistants like LOWE, this can be done in a matter of seconds.

🏆 Benchmarks, benchmarks, benchmarks. I can't repeat the word benchmark enough. Alas, benchmarks will stay the unloved kid on the ML block 🫥. But if the word benchmark is uncool, its cousin competition is way cooler 😎! Just as the OGB-LSC competition and Open Catalyst challenge played a major role for the GNN community, it is now time for a new series of competitions 🥇. We even got the TGB (Temporal Graph Benchmark) recently. If you were at NeurIPS, then you probably heard of Polaris coming up early 2024 ✨. Polaris is a consortium of multiple pharma and academic groups trying to improve the quality of available molecular benchmarks to better represent real drug discovery. Perhaps we'll even see a benchmark suitable for molecular graph generation instead of optimizing QED and cLogP, but I wouldn't hold my breath, I have been waiting for years. What kind of new, crazy competition will light up the GDL community this year 🤔?

Systems Biology

Kexin Huang (Stanford)

Biology is an interconnected, multi-scale, and multi-modal system. Effective modeling of this system can not only unravel fundamental biological questions but also significantly impact therapeutic discovery. The most natural data format for encapsulating this system is a relational database or a heterogeneous graph. This graph stores data from decades of wet lab experiments across various biological modalities, scaling up to billions of data points.

"In 2023, we witnessed a range of innovative applications using GNNs on these biological system graphs. These applications have unlocked new biomedical capabilities and answered critical biological queries." — Kexin Huang (Stanford)

1️⃣ One particularly exciting field is perturbative biology. Understanding the outcomes of perturbations can lead to advancements in cell reprogramming, target discovery, and synthetic lethality, among others. In 2023, GEARS applied GNNs to gene perturbational relational graphs to predict outcomes of genetic perturbations that have not been observed before.

2️⃣ Another cool application concerns protein representation. While current protein representations are fixed and static, we recognize that the same protein can exhibit different functions in varying cellular contexts. PINNACLE uses GNNs on protein interaction networks to contextualize protein embeddings. This approach has been shown to enhance 3D structure-based protein representations and outperform existing context-free models in identifying therapeutic targets.

PINNACLE has protein-, cell type-, and tissue-level attention mechanisms that enable the algorithm to generate contextualized representations of proteins, cell types, and tissues in a single unified embedding space. Source: Li et al

3️⃣ GNNs have also shown a vital role in diagnosing rare diseases. SHEPHERD uses a GNN over a massive knowledge graph to encode extensive biological knowledge into the ML model and has been shown to facilitate causal gene discovery, identify 'patients-like-me' with similar genes or diseases, and provide interpretable insights into novel disease manifestations.

➡️ Moving beyond predictions, understanding the underlying mechanisms of biological phenomena is crucial. Graph XAI applied to system graphs is a natural fit for identifying mechanistic pathways. TxGNN, for example, grounds drug-disease relation predictions in the biological system graph, generating multi-hop interpretable paths. These paths rationalize the potential of a drug in treating a specific disease. TxGNN designed visualizations for these interpretations and conducted user studies, proving their decision-making effectiveness for clinicians and biomedical scientists.

A web-based graphical user interface to support clinicians and scientists in exploring and analyzing the predictions and explanations generated by TxGNN. The 'Control Panel' allows users to select the disease of interest and view the top-ranked TxGNN predictions for the query disease. The 'edge threshold' module enables users to modify the sparsity of the explanation and thereby control the density of the multi-hop paths displayed. The 'Drug Embedding' panel allows users to compare the position of a selected drug relative to the entire repurposing candidate library. The 'Path Explanation' panel displays the biological relations that have been identified as crucial for TxGNN's predictions regarding therapeutic use. Source: Huang, Chandar, et al

➡️ Foundation models in biology have predominantly been unimodal (focused on proteins, molecules, diseases, etc.), primarily due to the scarcity of paired data. Bridging across modalities to answer multi-modal queries is an exciting frontier. For example, BioBridge leverages biological knowledge graphs to learn transformations across unimodal foundation models, enabling multi-modal behaviors.

🔮 GNNs applied to system graphs have the potential to (1) encode vast biomedical knowledge, (2) bridge biological modalities, (3) provide mechanistic insights, and (4) contextualize biological entities. We anticipate even more groundbreaking applications of GNN in biology in 2024, addressing some of the most pressing questions in the field.

Predictions from the 2023 post

(1) performance improvements of diffusion models such as faster sampling and more efficient solvers; ✅ yes, with flow matching

(2) more powerful conditional protein generation models; ❌ Chroma and RFDiffusion are still on top

(3) more successful applications of Generative Flow Networks to molecules and proteins ❌ yet to be seen

Materials Science (Crystals)

Michael Galkin (Intel) and Santiago Miret (Intel)

In 2023, for a short period, all the science news was about LK-99, a supposed room-temperature superconductor created by a Korean team (spoiler: it has not worked as of now).

This highlights the huge potential ML has in materials science, where perhaps the biggest progress of the year has happened: we can now say that materials science and materials discovery are first-class citizens in the Geometric DL landscape.

💡 Geometric DL applied to materials science and discovery saw significant advances across new modelling methods, the creation of new benchmarks and datasets, automated design with generative methods, and the identification of new research questions based on those advances.

1️⃣ Applications of geometric models as evaluation tools in automated discovery workflows. The Open MatSci ML Toolkit consolidated all open-source crystal structure datasets, leading to 1.5 million data points for ground-state structure calculations that are now easily available for model development. The authors' initial results suggest that merging datasets improves performance if done attentively.

2️⃣ MatBench Discovery is another good example of this integration of geometric models as an evaluation tool for crystal stability, which tests models' predictions of the energy above hull for various crystal structures. The energy above hull is the most reliable approximation of crystal structure stability and also represents an improvement in metrics compared to formation energy or raw energy prediction which have practical limitations as stability metrics.

Universal potentials are more reliable classifiers because they exit the red triangle earliest. These lines show the rolling MAE on the WBM test set as the energy to the convex hull of the MP training set is varied, lower is better. The red-highlighted 'triangle of peril' shows where the models are most likely to misclassify structures. As long as a model's rolling MAE remains inside the triangle, its mean error is larger than the distance to the convex hull. If the model's error for a given prediction happens to point towards the stability threshold at 0 eV from the hull (the plot's center), its average error will change the stability classification of a material from true positive/negative to false negative/positive. The width of the 'rolling window' box indicates the width over which hull distance prediction errors were averaged. Source: Riebesell et al

3️⃣ In terms of new geometric models for crystal structure prediction, Crystal Hamiltonian Graph neural network (CHGNet, Deng et al) is a new GNN trained on static and relaxation trajectories of Materials Project that shows quite competitive performance compared to prior methods. The development of CHGNet suggests that finding better training objectives will be as (if not more) important than the development of new methods as the intersection of materials science and geometric deep learning continues to grow.
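To make the energy-above-hull metric from the MatBench Discovery item concrete, here is a minimal sketch for a toy binary A–B system. The compositions and formation energies are invented for illustration, and the helper names (`lower_hull`, `energy_above_hull`) are hypothetical; real workflows handle multi-component phase diagrams, which this sketch does not attempt.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fraction of B, formation energy per atom in eV)


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of (composition, energy) points via a monotone chain."""
    best = {}
    for x, e in points:  # keep only the lowest energy per composition
        best[x] = min(e, best.get(x, e))
    hull: List[Point] = []
    for p in sorted(best.items()):
        # Drop the last hull point while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(x: float, hull: List[Point]) -> float:
    """Hull energy at composition x, by linear interpolation between hull vertices."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("composition outside the hull range")


def energy_above_hull(x: float, energy: float, hull: List[Point]) -> float:
    """Positive: above the hull (unstable). Zero or negative: on or below it."""
    return energy - hull_energy(x, hull)


# Made-up reference phases playing the role of the training-set hull.
reference = [(0.0, 0.0), (0.25, -0.10), (0.5, -0.40), (0.75, -0.15), (1.0, 0.0)]
hull = lower_hull(reference)
print(hull)
print(energy_above_hull(0.25, -0.10, hull))  # a reference phase that sits above the hull
print(energy_above_hull(0.75, -0.25, hull))  # a candidate predicted below the hull
```

A model's predicted energy is classified as stable or unstable by the sign of this quantity, which is why small errors near 0 eV from the hull flip the classification.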

🔥 The other proof points of the further integration of Geometric DL and materials discovery are several massive works by big labs focused on crystal structure discovery with generative methods:

1️⃣ Google DeepMind released GNoME (Graph Networks for Materials Exploration by Merchant et al) as a successful example of an active learning pipeline for discovering new materials, and UniMat as an ab initio crystal generation model. Similar to the protein world, we see more examples of automated labs for materials science ("lab-in-the-loop") such as the A-Lab from UC Berkeley.

The active learning loop of GNoME. Source: Merchant et al.

2️⃣ Microsoft Research released MatterGen, a generative model for unconditional and property-guided materials design, and Distributional Graphormer, a generative model trained to recover the equilibrium energy distribution of a molecule/protein/crystal.

Unconditional and conditional generation of MatterGen. Source: Zeni, Pinsler, Zügner, Fowler, Horton, et al.

3️⃣ Meta AI and CMU released the Open Catalyst Demo where you can play around with relaxations (DFT approximations) of 11.5k catalyst materials on 86 adsorbates in 100 different configurations each (making it up to 100M combinations). The demo is powered by SOTA geometric models GemNet-OC and Equiformer-V2.
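As a quick sanity check on the "up to 100M combinations" figure, the three counts quoted above multiply out as follows (treating "11.5k" as exactly 11,500, which is an assumption):

```python
n_catalysts = 11_500   # "11.5k catalyst materials", assumed to be exactly 11,500
n_adsorbates = 86
n_configurations = 100

total = n_catalysts * n_adsorbates * n_configurations
print(f"{total:,}")  # 98,900,000, i.e. just under 100M
```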

Santiago Miret (Intel)

While those works represent large-scale deployments of generative methods, there is also new work on using reinforcement learning (Govindarajan et al., Lacombe et al.) and GFlowNets (Mistal et al., Nguyen et al.) with geometric DL for crystal structure discovery as highlighted in the AI for Accelerated Materials Design (AI4Mat) workshop at NeurIPS'23. AI4Mat-2023 itself saw rapid expansion in participation with a 2× increase in the number of submitted and accepted papers and almost tripling in the number of attendees.

💡 Geometric DL and GNNs continue to be a major part of AI4Mat's research content as we saw increased application of methods not only for property prediction but also for improving chemical synthesis and material characterization. One such promising example highlighted in the AI4Mat-2023 workshop is KREED (Cheng, Lo, et al), which uses equivariant diffusion to predict 3D structures of molecules based on incomplete information that can be obtained from real laboratory machines.

"Given the importance of structural data in material characterization, the discussions at AI4Mat highlighted the opportunities for Geometric DL to enter the space of real-world materials modelling in addition to their continued successes in simulations including ML-based potentials." — Santiago Miret (Intel)

🔮 In 2024, I expect to see multiple developments:

1️⃣ More discovery architectures and workflows that directly integrate geometric models like M3GNet, CHGNet, MACE.

2️⃣ Geometric models might also see increased competition from text-based representations and LLMs as new methods are being proposed that directly generate CIF files.

3️⃣ More deployment of geometric models and GNNs into real-world experimental data, likely in materials characterization such as KREED, which will likely run into regimes with less data compared to simulation-based modeling.

Molecular Dynamics & ML Potentials

Michael Galkin (Intel), Leon Klein (FU Berlin), N M Anoop Krishnan (IIT Delhi), Santiago Miret (Intel)

One of the pronounced trends of 2023 is the move towards foundation models for ML potentials that work on a variety of compounds, from small molecules to periodic crystals.

Examples include JMP (Shoghi et al) from FAIR and CMU, DPA-2 (Zhang, Liu, et al) from a large collaboration of Chinese institutions, and MACE-MP-0 (Batatia et al) from a collaboration led by Cambridge. Practically, those are geometric GNNs pre-trained in the multi-task mode to predict the energy (or forces) of a certain atomic structure. Another notable mention goes to Equiformer V2 (Liao et al), a strong equivariant transformer that holds SOTA in many tasks including the recent OpenCatalyst 2023 Challenge and OpenDAC (Direct Air Capture) challenge.

A foundation model for materials modelling. Trained only on Materials Project data which consists primarily of inorganic crystals and is skewed heavily towards oxides, MACE-MP-0 is capable of molecular dynamics simulation across a wide variety of chemistries in the solid, liquid and gaseous phases. Source: Batatia et al

⚛️ A common use case for ML potentials is molecular dynamics (MD), which aims to simulate a structure over time spans ranging from nanoseconds (10⁻⁹ s) to seconds. The main problem is that the fundamental timestep in classical methods is a femtosecond (10⁻¹⁵ s), that is, you'd need at least 1 million steps to simulate a single nanosecond, and that's expensive. Modern ML-based methods for MD aim to speed it up by applying coarse-graining and other approximation tricks that accelerate simulations by large margins (30–1000x). Fu, Xie, et al (TMLR'23) apply coarse-graining to atomic structures and run a GNN over smaller graphs to predict the next-step position. Experimentally, the method brings 1,000–10,000x speedups compared to classical methods. Timewarp (Klein, Foong, Fjelde, Mlodozeniec, et al, NeurIPS'23) can simulate large timesteps (10⁵ to 10⁶ femtoseconds) in a single forward pass by using a conditional normalizing flow model that approximates a distribution of next-step positions. A trained model is used with MCMC sampling and delivers ~33x speedups.
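The step counts behind that argument are easy to reproduce. The sketch below works in integer femtoseconds to avoid floating-point noise; `md_steps` is a hypothetical helper, and the large-step figure only counts proposals, ignoring that some of them are rejected during MCMC sampling.

```python
FS_PER_NS = 10**6    # femtoseconds in a nanosecond
FS_PER_S = 10**15    # femtoseconds in a second


def md_steps(sim_time_fs: int, timestep_fs: int = 1) -> int:
    """Number of integration steps needed to cover sim_time_fs (ceiling division)."""
    return -(-sim_time_fs // timestep_fs)


print(md_steps(FS_PER_NS))          # 1000000 classical 1 fs steps per nanosecond
print(md_steps(FS_PER_S))           # 1000000000000000 steps for a full second
print(md_steps(FS_PER_NS, 10**5))   # 10 proposals per nanosecond at a 10^5 fs step
```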

(a) Initial state x(t) (Left) and accepted proposal state x(t+τ) (Right) sampled with Timewarp for the dipeptide HT (unseen during training). (b) TICA projections of simulation trajectories, showing transitions between metastable states, for a short MD simulation (Left) and Timewarp MCMC (Right), both run for 30 minutes of wall-clock time. Timewarp MCMC achieves a speed-up factor of ≈ 33 over MD in terms of effective sample size per second. Source: Klein, Foong, Fjelde, Mlodozeniec, et al
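The "accepted proposal state" in the caption refers to a Metropolis-Hastings style accept/reject step wrapped around the learned proposal. The sketch below shows only that generic mechanism on a toy 1D double-well energy, with a simple Gaussian standing in for the conditional normalizing flow. The energy, the proposal, and all names are assumptions for illustration; this is not Timewarp's actual model or sampler.

```python
import math
import random


def energy(x: float) -> float:
    """Toy double-well potential with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def log_q(x_to: float, x_from: float, sigma: float) -> float:
    """Log-density of the stand-in proposal N(0.5 * x_from, sigma^2) at x_to."""
    z = (x_to - 0.5 * x_from) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def mh_step(x: float, rng: random.Random, kT: float = 1.0, sigma: float = 1.0):
    """One Metropolis-Hastings step targeting the Boltzmann density exp(-energy / kT)."""
    x_new = rng.gauss(0.5 * x, sigma)            # draw from the proposal q(. | x)
    log_alpha = (
        -(energy(x_new) - energy(x)) / kT        # Boltzmann ratio p(x_new) / p(x)
        + log_q(x, x_new, sigma)                 # reverse proposal q(x | x_new)
        - log_q(x_new, x, sigma)                 # forward proposal q(x_new | x)
    )
    accept = rng.random() < math.exp(min(0.0, log_alpha))
    return (x_new if accept else x), accept


def run_chain(n_steps: int, seed: int = 0):
    """Run the chain from x = 1 and return the samples and the acceptance rate."""
    rng = random.Random(seed)
    x, samples, n_accepted = 1.0, [], 0
    for _ in range(n_steps):
        x, accepted = mh_step(x, rng)
        samples.append(x)
        n_accepted += accepted
    return samples, n_accepted / n_steps


samples, rate = run_chain(2000)
print(f"acceptance rate: {rate:.2f}")
```

The accept/reject correction is what keeps the samples distributed according to the target even when the proposal is imperfect; a better proposal mainly raises the acceptance rate.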

Santiago Miret (Intel)

💡As the deployment of geometric models has seen greater success in property modelling, researchers have pushed the state-of-the-art by testing these models in real-world molecular dynamics simulations. The first work to highlight issues with training models on energy and forces alone was Forces Are Not Enough published in TMLR in early 2023. Nevertheless, advances in neighborhood-based methods such as Allegro led to the successful deployment of large-scale simulations using geometric deep learning models, including a nomination for the Gordon Bell Prize.

"Much work still remains in ensuring successful, generalised deployment of machine learning potentials across a variety of physical and chemical phenomena." — Santiago Miret (Intel)

➡️ EGraffBench highlights some new challenges, such as generalisation across temperatures and materials phase changes (i.e. solid-to-liquid change), and proposes new metrics for evaluating the performance of machine learning potentials in real MD simulations. The AI4Mat-2023 workshop also showcased the development of new ML potentials for specialised use cases, such as solid electrolytes for batteries.

Leon Klein (FU Berlin)

💡 A notable constraint in the application of generative models to sample from the equilibrium Boltzmann distribution was the requirement for retraining with each new system, thereby limiting potential advantages over traditional MD simulations. However, recent advancements have seen the emergence of transferable models across various domains. Our contribution, Timewarp, presents a transferable model capable of proposing large time steps for MD simulations focused on all-atom small peptide systems. Similarly, Fu et al. capture the time-coarsened dynamics of coarse-grained polymers, while Charron et al. excel in learning a transferable force field for coarse-grained proteins.

"Consequently, this year has demonstrated the feasibility of transferable generative models for MD simulations, showcasing their potential to speed up such simulations." — Leon Klein (FU Berlin)

🔮 In 2024, I expect that more tailored GNNs are used to improve accuracy for the transferable models, with a potential focus on encoding more information about the system. For example, Timewarp, while lacking rotational symmetry in its model, employs data augmentation. Alternatively, rotational symmetry could be incorporated using the recently proposed SE(3) Equivariant Augmented Coupling Flows. Similarly, Charron et al. use a SchNet instead of a more complex GNN.
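For readers unfamiliar with the data augmentation route to rotational symmetry, a minimal sketch is below: every training structure is rotated by a fresh random rotation so that the model sees all orientations. The quaternion-based sampler and the function names are illustrative assumptions, not the Timewarp training code.

```python
import math
import random


def random_rotation(rng: random.Random):
    """Uniformly random 3x3 rotation matrix, built from a random unit quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(coords, rot):
    """Apply the rotation matrix to every 3D point."""
    return [[sum(rot[i][j] * p[j] for j in range(3)) for i in range(3)] for p in coords]


def augment(coords, rng: random.Random):
    """Return a randomly rotated copy of the structure."""
    return rotate(coords, random_rotation(rng))


rng = random.Random(0)
atoms = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.3]]
rotated = augment(atoms, rng)
# Interatomic distances are unchanged by the augmentation.
print(math.dist(atoms[0], atoms[2]), math.dist(rotated[0], rotated[2]))
```

An equivariant architecture gets the same symmetry by construction instead of learning it from rotated copies.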

N M Anoop Krishnan (IIT Delhi)

"One of the most exciting developments for the year in the realm of ML potentials is the development of "universal" interatomic potentials that can span almost all the elements of the periodic table." — N M Anoop Krishnan (IIT Delhi)

💡 Following M3GNet in 2022, this year witnessed the development of three such models based on CHGNet (Deng et al), NequIP (Merchant et al), and MACE (Batatia et al). These models have been used to demonstrate several challenging tasks including materials discovery (Merchant et al) and a diverse set of MD simulations (Batatia et al) such as phase transition, amorphization, chemical reaction, 2D materials modeling, dissolution, defects, and combustion, to name a few. These approaches provide promising results towards the universality of these potentials, thereby allowing one to solve challenging problems including the discovery of crystals from their corresponding amorphous structure (Aykol et al), a long-standing open problem in materials.

🏋️ While these potentials do provide a handle to attack some outstanding problems, the challenges remain in understanding the scenarios where these potentials can fail.

1️⃣ Testing these potentials to their limit is an important step towards understanding both their capabilities and their limitations. Interesting challenges include modeling extreme environments such as high pressure and radiation conditions, simulating complex multicomponent systems such as glasses or high-entropy alloys, and simulating different phases of systems such as water or silica.

2️⃣ While some of these models have been termed as "foundation" models, emergent behavior associated with FMs has not been demonstrated by them. Most of these models simply show extrapolation capability to potentially unseen regions in the phase space or to novel compositions. Developing truly foundational models in terms of emergent properties would be an interesting challenge.

3️⃣ A third aspect that has been paid less attention to is the ability of these models to simulate at scale. While Allegro has demonstrated some capability in terms of the length scales these potentials can achieve, simulating at larger time and length scales with stability while respecting the "universality" remains an open challenge for these potentials.

🔮 What to expect in 2024?

1️⃣ Benchmarking suite: While there exist several benchmarking studies on MD simulations, it is expected that 2024 will witness more formalized efforts in this direction both in terms of datasets and tasks. A standard set of tasks that can automatically evaluate potentials and place them on leaderboards will enable easy ranking of potentials targeted for downstream tasks on different materials such as metals, polymers, or oxides.

2️⃣ Model and dataset development: Further efforts will be made to make ML potentials more compact and efficient in terms of their architectures. Moreover, 2024 will also witness large-scale dataset development that will provide ab initio data for training these potentials.

3️⃣ Differentiable MD/AIMD: Further, it is expected that the developments in differentiable simulations will become a major area of fusing experiments and ab initio simulations towards automated development of interatomic potentials for targeted applications. This year may also see advances in differentiable AIMD with machine learned functionals that may allow economical simulations to scale beyond what it has been able to achieve thus far.

Predictions from the 2023 post

We expect to see a lot more focus on computational efficiency and scalability of GNNs. Current GNN-based force-fields are obtaining remarkable accuracy, but are still 2–3 orders of magnitude slower than classical force-fields and are typically only deployed on a few hundred atoms.

✅ Allegro for the Gordon Bell Prize, Large-scale screening with GNoME

🔮What to expect in 2024:

1️⃣ More deployment of ML potentials into large-scale MD simulations that showcase new research opportunities and challenges and provide a better idea of what benefits ML potentials provide compared to traditional potentials.

2️⃣ New datasets that outline previously unexplored challenges for ML potentials, such as new materials systems and new physical phenomena for those materials such as phase changes at various temperatures and pressures.

3️⃣ Exploration of multi-scale problems that might draw inspiration from classical techniques.

Geometric Generative Models (Manifolds)

Joey Bose (Mila & Dreamfold) and Alex Tong (Mila & Dreamfold)

While generative ML continued to dominate the field in 2023, the popularization of geometric generative models that incorporate geometric priors was an interesting trend of the year.

Joey Bose (Mila & Dreamfold)

"This year we saw the burgeoning subfield of geometric generative models really take a commanding step forward. With the success of diffusion models and flow matching in images we saw more fundamental contributions to enable Generative AI for geometric data types." — Joey Bose (Mila & Dreamfold)

While diffusion models for manifolds existed, this year we really saw them being scaled up with Scaling Riemannian Diffusion Models by Lou et al. and functional approaches in Manifold Diffusion Fields by Elhag et al.

(Left) Visual depiction of a training iteration for a field on the bunny manifold M. (Right) Visual depiction of the sampling process for a field on the bunny manifold. Figure source: Elhag et al.

For normalizing flow-based methods, Riemannian Flow Matching by Chen and Lipman stands at the top of the sea of papers as the most general framework for flow matching (FM).

In general, a large theme of geometric generative models involves handling symmetries. Equivariant approaches shone this year, from SE(3) models including EDGI (Brehmer, Bose et al.) and SE(3) augmented coupling flows (Midgley et al.), to cool theoretical work on Geometric neural diffusion processes (Mathieu et al.) and important physics-based applications with the paper by Abbot et al.

Alex Tong (Mila & Dreamfold)

"In 2023 we saw advancement both in terms of modelling and the rise of a new application — Protein backbone design. Much work is still needed to understand the properties of the SE(3)₀ type of product manifold, where it is still unclear how to best combine modalities" — Alex Tong (Mila & Dreamfold)

2023 saw new models such as RFDiffusion, FrameDiff, and FoldFlow, which operate over the SE(3)₀ manifold of protein backbones. This presents a new challenge for geometric generative models, on which I think we will see significant progress in the coming year.

On the modelling side, generative modelling with flow and bridge matching models in Euclidean domains led to a quick succession of Riemannian and equivariant extensions, with Riemannian Flow Matching by Chen and Lipman and Equivariant flow matching (Klein et al., Song et al.) on molecule generation tasks.
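To give a feel for what changes when moving from Euclidean to Riemannian flow matching, here is a small sketch of the interpolation paths involved. In the Euclidean case the commonly used conditional path is a straight line with constant velocity x1 − x0; on a sphere, the straight line is replaced by a great-circle geodesic. This is a simplified reading of the general idea, with hypothetical function names, and it does not reproduce the training objective or any details of the cited papers.

```python
import math


def euclidean_path(x0, x1, t):
    """Point on the straight line from x0 to x1 at time t, and the constant velocity."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t


def sphere_geodesic(x0, x1, t):
    """Point at time t on the great-circle arc from x0 to x1 (unit vectors, not antipodal)."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    theta = math.acos(dot)           # geodesic distance between the endpoints
    if theta < 1e-12:                # endpoints coincide, nothing to interpolate
        return list(x0)
    s = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


print(euclidean_path([0.0, 0.0], [2.0, 4.0], 0.25))
midpoint = sphere_geodesic([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.5)
print(midpoint, math.sqrt(sum(c * c for c in midpoint)))  # stays on the unit sphere
```

Linear interpolation between two points on the sphere would cut through its interior, which is the basic reason manifold-aware paths are needed.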

🔮 What to expect in 2024:

1️⃣ More exploration into modelling the SE(3)₀ manifold following successes in protein backbone design.

2️⃣ Further investigation and theory of how to train generative models on multimodal and product manifolds.

3️⃣ Domain-specific models exploiting features of more specific manifold and equivariant structures.

BIG Graphs, Scalability: When GNNs are too expensive

Anton Tsitsulin (Google)

This year has been fruitful for large graph fans.

"Learning on Very Large Graphs has always been a challenge due to the unstructured sparsity not being supported by modern accelerators, losing in the hardware lottery. Tensor Processing Units — you can think about them as very fast GPUs with tons (multi-terabyte) of HBM memory — were the rescue of 2023." — Anton Tsitsulin (Google)

In a KDD paper (Mayer et al.), we showed that TPUs can solve large-scale node embedding problems more efficiently than GPU and CPU systems at a fraction of the cost. Many industrial applications of graph machine learning are fully unsupervised; there, it is hard to evaluate embedding quality. We wrote a paper (Tsitsulin et al.) that performs unsupervised embedding analysis at scale.

Scale of TpuGraphs compared to other graph property prediction datasets. Source: Phothilimthana et al.

➡️ This year, TPUs helped graph machine learning, so it was time to give back. We released a new TpuGraphs dataset (Phothilimthana et al.) and ran a Kaggle competition "Google - Fast or Slow? Predict AI Model Runtime" on it that showed how to improve learning models running on TPUs with graph machine learning. It had 792 competitors, 616 teams, and 10,507 entries. The dataset provides 25x more graphs than the largest graph property prediction dataset (with comparable graph sizes), and 770x larger graphs on average compared to existing performance prediction datasets on machine learning programs. This dataset is so large that a new algorithm for doing graph-level predictions on large-scale graphs had to be developed by Cao et al.

➡️ Large-scale graph clustering has seen significant contributions this year. A new approximation algorithm (Cohen-Addad et al.) was proposed for correlation clustering improving the approximation factor from 1.994 to the whopping 1.73. TeraHAC (Dhulipala et al) is a major improvement over last year's ParHAC (that we covered in the 2023 post) — an approximate (1+𝝐) hierarchical agglomerative clustering algorithm for trillion-edge graphs. The largest graph used in the experiments is a massive Web-Query graph with 31B nodes and 8.6 trillion edges 👀. Notable mentions also go to the fastest (to date) algorithm for Euclidean minimum spanning tree (Jayaram et al) and a new near-linear time algorithm for approximating the Chamfer distance between point sets (Bakshi et al.).
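For context on the Chamfer distance result, the exact quantity is simple to compute by brute force, and that brute force is quadratic in the number of points, which is what a near-linear approximation improves on. The sketch below uses one common definition (sum over points of A of the distance to the nearest point of B, unsquared); other variants square the distances or symmetrize, so treat the exact form as an assumption.

```python
import math


def chamfer(a_points, b_points) -> float:
    """Exact Chamfer distance from A to B by brute force, O(|A| * |B|) distance evaluations."""
    return sum(min(math.dist(a, b) for b in b_points) for a in a_points)


def symmetric_chamfer(a_points, b_points) -> float:
    """A symmetrized variant: Chamfer(A, B) + Chamfer(B, A)."""
    return chamfer(a_points, b_points) + chamfer(b_points, a_points)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0)]
print(chamfer(A, B))            # 1 + sqrt(2)
print(chamfer(B, A))            # 1.0, the measure is not symmetric
print(symmetric_chamfer(A, B))
```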

🔮 What to expect in 2024:

1️⃣ Algorithmic advances will help scale other popular graph algorithms.

2️⃣ Novel hardware usage will help scale up different graph models.

Predictions from the 2023 post

(1) further reduction in compute costs and inference time for very large graphs ✅ We observed order-of-magnitude speedups in clustering and node embedding.

(2) Perhaps models for OGB LSC graphs could run on commodity machines instead of huge clusters? ❌ solid no

Algorithmic Reasoning & Alignment

Petar Veličković (Google DeepMind) and Liudmila Prokhorenkova (Yandex Research)

Algorithmic reasoning, a class of ML techniques able to execute algorithmic computation, has continued to make stable progress during 2023.

Petar Veličković (Google DeepMind)

"2023 has been a year of steady progress for neural algorithmic reasoning models — it indeed remains one of the areas where GNN development gets most creative — probably because it has to be." — Petar Veličković (Google DeepMind)

Aside from the already discussed asynchronous algorithmic alignment work, there are three results we achieved this year that I am personally proudest of:

1️⃣ DAR showed that pre-trained multi-task neural algorithmic reasoners can be scalably deployed to downstream graph problems — even if they are 180,000x larger than the synthetic training distribution of the NAR. What's more, we set the state-of-the-art in modelling mouse brain vessels 🐁🧠🩸. NAR is not a victim of the bitter lesson! 📈

2️⃣ Hint-ReLIC 🗿 was our response to the rich body of research in no-hint models. We move away from the issue-ridden hint autoregression and instead model hint invariants using causal reasoning. We obtain a potent hint-based NAR, which still holds state-of-the-art on broad patches of CLRS-30! "Hints can take you a long way, if used in the right way."

3️⃣ Last but not least, we took the plunge and made the first in-depth analysis of the latent space representations of trained NAR models. What we found was not only immensely beautiful to look at 🌺 but it also taught us a great deal about how these models work.

Left: Trajectory-wise PCA of eight clusters of reweighted graphs showing that they all contain a single dominant direction. Different clusters have different colors. Middle: Many embedding clusters with dominant directions overlaid in red. Right: Step-wise PCA of random graphs with the dominant cluster directions overlaid in red. Source: Mirjanić, Pascanu, Veličković

Beyond growing our vibrant community, I find it important to state that many of NAR's foundational ideas are at the crux of important LLM methodologies; to name just one example, hint following is directly related to chain-of-thought prompting.

💡 What I am most happy about is that in 2023, this link is getting explicit recognition, and ideas from NAR are now directly or indirectly influencing the most potent AI systems in use today. Indeed, NAR is listed as a key motivation for studying length generalisation, and more broadly generalisation on the unseen (ICML'23 Best Paper Award). CLRS-30, the flagship NAR benchmark, is directly used to evaluate capabilities of LLMs in neural architecture search and general AI research. And, as a final cherry on top, CLRS-30 is recognised as one of only seven reasoning evaluations used by Gemini, a frontier large language model from Google DeepMind. I am hopeful that this is a beacon of things to come in 2024, and that we will see even more ideas from NAR break into the design of frontier scalable AI models.

Liudmila Prokhorenkova (Yandex Research)

Throughout the year, substantial progress has been achieved on the path towards endowing models with various algorithmic inductive biases: the use of dual problems (Numeroso et al), contrastive learning techniques (Bevilacqua et al; Rodionov et al), augmentation of models with data structures (Jürß et al; Jain et al), and in-depth examination of computational models (Engelmayer et al). Another important direction is evaluating existing models in terms of scalability and data diversity (Minder et al).

"In 2024 it would be great to see more comprehensive analysis and understanding of neural reasoners: which operations they learn, how sensitive they are to different shifts in data distributions, what types of mistakes they tend to make and why." — Liudmila Prokhorenkova (Yandex Research)

Gaining such insights may contribute to the development of even more robust and scalable models. Furthermore, robust neural reasoners have the potential to positively impact combinatorial optimization models.

Predictions from the 2023 post

(1) Algorithmic reasoning tasks are likely to scale to graphs of thousands of nodes and practical applications like in code analysis or databases ✅ yes, DAR scales to the OGB vessel size

(2) even more algorithms in the benchmark ✅ yes, SALSA-CLRS

(3) most unlikely — there will appear a model capable of solving quickselect ❌ still unsolved ;(
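For readers who have not met it, quickselect is the classical algorithm for finding the k-th smallest element without fully sorting the input. Below is a plain-Python reference sketch of the textbook idea with a random pivot and a three-way partition. It only illustrates the task a neural reasoner would have to imitate; the pivot rule, input encoding, and hints used in the CLRS-30 benchmark are not reproduced here.

```python
import random


def quickselect(items, k, rng=random):
    """Return the k-th smallest element (0-indexed) of items."""
    items = list(items)  # work on a copy
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        pivot = items[rng.randint(lo, hi)]
        # Three-way partition of items[lo..hi] into < pivot, == pivot, > pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        # Now items[lo..lt-1] < pivot, items[lt..gt] == pivot, items[gt+1..hi] > pivot.
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot


print(quickselect([7, 2, 9, 4, 1], 2))  # 4, the median of the five values
```

Unlike sorting, only one side of each partition is explored further, and that data-dependent control flow is part of what makes the task hard to learn.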

Knowledge Graphs: Inductive Reasoning is Solved?

Michael Galkin (Intel) and Zhaocheng Zhu (Mila & Google)

Since its inception in 2011, the grand challenge of KG representation learning has been truly inductive reasoning, where a single model is able to run inference (e.g., missing link prediction) on any graph without input features and without learning hard-coded entity/relation embedding matrices. GraIL (ICML'20) and Neural Bellman-Ford Nets (NeurIPS'21) were instrumental in extending inference to unseen entities, but generalization to both new entities and relation types at inference time remained an unsolved challenge due to the main question: what can be learned and transferred when the whole entity/relation vocabulary can change?

🔮 Our prediction for 2023 (an inductive model fully transferable to different KGs with new sets of entities and relations, e.g., training on Wikidata, and running inference on DBpedia or Freebase) came true in several works:

  • Gao et al introduced the concept of double equivariance that forces the neural net to be equivariant to permutations of both node IDs and relation IDs. The proposed ISDEA++ model employs a DSS-GNN-like aggregation of a relation-induced subgraph and a subgraph induced by all other relation types.
  • ULTRA introduced by Galkin et al learns the invariance of relation interactions (captured by a graph of relations) and transfers to absolutely any multi-relational graph. ULTRA achieves SOTA results on dozens of transductive and inductive datasets even in the zero-shot inference setup. Besides, it enables a foundation model-like approach for KG reasoning with generic pre-training, zero-shot inference, and task-specific fine-tuning.
Three main steps taken by ULTRA: (1) building a relation graph; (2) running conditional message passing over the relation graph to get relative relation representations; (3) using those representations for inductive link predictor GNN on the entity level. Source: Galkin et al
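Step (1) in the caption, building a relation graph, can be sketched in a few lines. The version below connects two relations whenever they share an entity, and labels the edge by whether that entity appears as head or tail of each relation (`h2h`, `t2t`, `h2t`, `t2h`). These four interaction types are my reading of the ULTRA paper and should be treated as an assumption, as should the function name; inverse relations and self-interactions are left out for brevity.

```python
from collections import defaultdict
from itertools import product


def build_relation_graph(triples):
    """Return typed edges between relations, derived from (head, relation, tail) triples."""
    heads = defaultdict(set)  # relation -> entities seen as its head
    tails = defaultdict(set)  # relation -> entities seen as its tail
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    relations = sorted(set(heads) | set(tails))
    edges = set()
    for r1, r2 in product(relations, repeat=2):
        if r1 == r2:
            continue
        if heads[r1] & heads[r2]:
            edges.add((r1, "h2h", r2))
        if tails[r1] & tails[r2]:
            edges.add((r1, "t2t", r2))
        if heads[r1] & tails[r2]:
            edges.add((r1, "h2t", r2))
        if tails[r1] & heads[r2]:
            edges.add((r1, "t2h", r2))
    return edges


triples = [
    ("alice", "authored", "paper1"),
    ("paper1", "cites", "paper2"),
    ("bob", "authored", "paper2"),
]
for edge in sorted(build_relation_graph(triples)):
    print(edge)
```

Note that the output mentions no entity names at all, so renaming every entity leaves the relation graph unchanged. That independence from the entity vocabulary is what makes this kind of structure a candidate for transfer across graphs.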

Learn more about inductive reasoning in the recent blog post:

As the grand challenge seems to be solved now, is there anything left for KG research, or should we call it a day, throw a party, and move on?

Michael Galkin (Intel)

"Indeed, with the grand challenge solved, it feels a bit like an existential crisis — everything important is invented, Graph ML enabled things that looked impossible just 5 years ago. Perhaps, KG community should re-invent itself and focus on practical problems that can be tackled with graph foundation models. Otherwise, the subfield would disappear from research radars like Semantic Web" — Michael Galkin (Intel)

Transductive and shallow KG embeddings are dead, and nobody in 2024 should work on them: it is time to retire them for good. ULTRA-like foundation models can now work without training on any graph, which is a sweet spot for many closed enterprise KGs.

➡️ The last uncharted territory is inductive reasoning beyond simple link prediction (complex database-like logical queries) and I think it will also be solved in 2024. Adding temporal aspects, LLM node features, or scaling GNNs for larger graphs is a question of time and presents more of an engineering task than a research question.

Zhaocheng Zhu (Mila & Google)

"With the rise of LLMs and numerous prompt-based reasoning techniques, it looks like KG reasoning is coming to an end. Texts are more expressive and flexible than KGs, and meanwhile they are more available in quantity. However, I don't think the reasoning techniques that the KG community developed are in vain." — Zhaocheng Zhu (Mila & Google)

➡️ We see that many LLM reasoning methods coincide with well-known ideas on KGs. For instance, the difference between direct prompting and chain-of-thought (CoT) shares much spirit with embedding methods and path-based methods on KGs, where the latter ones parameterize smaller steps and thereby generalize better to new combinations of steps. In fact, topics like inductive and multi-step generalization were explored on KGs several years earlier than on LLMs.

When we develop new techniques for LLMs, it is essential to take a glance at similar goals and solutions on KGs. In brief, while the modality of KGs may fade at some point, the insights we learned from KG reasoning will continue to illuminate in the era of LLMs.

Temporal Graph Learning

There will be a separate overview post on temporal graph learning, stay tuned!

LLMs + Graphs for Scientific Discovery

Michael Galkin (Intel)

💡LLMs were everywhere in 2023 and it's hard to miss the 🐘 in the room.

"We have seen a flurry of approaches trying to marry graphs with LLMs. The subfield is emerging and making its tiny baby steps which are important to acknowledge." — Michael Galkin (Intel)

We have seen a flurry of approaches trying to marry graphs with LLMs, sometimes literally verbalizing the edges in a text prompt. Straightforward prompting with an edge index does not really work for running graph algorithms with language models, so the crux is in the "text linearization" and proper prompting. Among the notable mentions, you might be interested in GraphText by Zhao et al, which devises a graph syntax tree prompt constructed from features and labels in the ego-subgraph of a target node; GraphText works for node classification. In Talk Like a Graph by Fatemi et al, the authors study graph linearization strategies and how they impact LLM performance on basic tasks like edge existence, node count, or cycle check.
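To make "text linearization" concrete, here are two toy ways of verbalizing the same graph into a prompt: one sentence per edge, and one sentence per node listing its neighbors. Both templates and function names are made up for illustration and are not the encodings studied in the papers above; the point is only that the same graph can be serialized in several ways, and the choice is part of the prompt design.

```python
def edge_list_prompt(edges, question: str) -> str:
    """Verbalize the graph one edge at a time."""
    nodes = sorted({n for edge in edges for n in edge})
    lines = ["G is an undirected graph with nodes " + ", ".join(str(n) for n in nodes) + "."]
    lines += [f"Node {u} is connected to node {v}." for u, v in edges]
    lines.append("Question: " + question)
    return "\n".join(lines)


def neighbor_list_prompt(edges, question: str) -> str:
    """Verbalize the graph one node at a time, listing each node's neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    lines = [
        f"Node {n} has neighbors: " + ", ".join(str(m) for m in sorted(neighbors[n])) + "."
        for n in sorted(neighbors)
    ]
    lines.append("Question: " + question)
    return "\n".join(lines)


edges = [(0, 1), (1, 2)]
question = "Is there an edge between node 0 and node 2?"
print(edge_list_prompt(edges, question))
print()
print(neighbor_list_prompt(edges, question))
```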

Standard GNNs (left) and GraphText (right). GraphText encodes the graph information into text sequences and uses LLM to perform inference. The graph-syntax tree contains both node attributes (e.g. feature and label) and relationships (e.g. center-node, 1st-hop, and 2nd-hop). Source: Zhao et al

➡️ Despite the early stage, there already exist three recent surveys (Li et al, Jin et al, Sun et al) covering dozens of prompting approaches for graphs. Generally, it is yet to be seen whether LLMs are an appropriate hammer 🔨 for a specific graph nail given all the limitations of autoregressive decoding, small context sizes, and the permutation-invariant nature of graph tasks. If you are broadly interested in LLM reasoning, check out our recent blog post covering the main areas and progress made in 2023.

➡️ LLMs in applied scientific tasks exhibit more promising, sometimes quite unexpected results: ChemCrow 🐦‍⬛ by Bran, Cox, et al is an LLM agent powered with tools that can perform tasks in organic chemistry, synthesis, and material design right in natural language (without fancy equivariant GNNs). For example, with a query "Find and synthesize a thiourea organocatalyst which accelerates a Diels-Alder reaction" ChemCrow devises a sequence of actions starting from a basic SMILES string and ending up with instructions to a synthesis platform.

Similarly, Gruver et al fine-tuned LLaMA-2 to generate 3D crystal structures as plain text files with lattice parameters, atomic composition, and 3D coordinates. The approach is surprisingly competitive with SOTA geometric diffusion models like CDVAE.
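To give a feel for what "a crystal as a plain text file" can look like, here is a small round-trip sketch: lattice lengths on the first line, lattice angles on the second, then one element symbol and one line of fractional coordinates per atom. The layout, rounding, and function names are assumptions made for illustration and should not be read as the exact format used by Gruver et al.

```python
def crystal_to_text(lengths, angles, atoms, decimals=2):
    """Serialize a crystal into a plain-text string (hypothetical layout).

    lengths: (a, b, c) lattice lengths
    angles:  (alpha, beta, gamma) lattice angles in degrees
    atoms:   list of (element_symbol, (x, y, z)) with fractional coordinates
    """
    lines = [
        " ".join(f"{x:.1f}" for x in lengths),
        " ".join(f"{a:.0f}" for a in angles),
    ]
    for element, frac in atoms:
        lines.append(element)
        lines.append(" ".join(f"{c:.{decimals}f}" for c in frac))
    return "\n".join(lines)


def text_to_crystal(text):
    """Parse the string produced by crystal_to_text back into Python objects."""
    rows = text.strip().split("\n")
    lengths = tuple(float(x) for x in rows[0].split())
    angles = tuple(float(x) for x in rows[1].split())
    # After the two lattice rows, atoms come as (symbol, coordinates) row pairs.
    atoms = [
        (rows[i], tuple(float(c) for c in rows[i + 1].split()))
        for i in range(2, len(rows), 2)
    ]
    return lengths, angles, atoms


if __name__ == "__main__":
    # Toy two-atom cubic cell; the values are illustrative only.
    text = crystal_to_text(
        (5.6, 5.6, 5.6), (90, 90, 90),
        [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
    )
    print(text)
    print(text_to_crystal(text))
```

A language model fine-tuned on strings like this only ever sees tokens; any symmetry awareness has to be learned from data rather than built into the architecture.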

Experimental validation. a) Example of the script run by a user to initiate ChemCrow. b) Query and synthesis of a thiourea organocatalyst. c) The IBM Research RoboRXN synthesis platform on which the experiments were executed (pictures reprinted courtesy of International Business Machines Corporation). d) Experimentally validated compounds. Source: Bran, Cox, et al

🔮 In 2024, scientific applications of LLMs are likely to expand both breadth-wise and depth-wise:

1️⃣ Reaching out to more AI4Science areas;

2️⃣ Integration with geometric foundation models (since multi-modality is the main LLM focus for the coming year);

3️⃣ Hot take: LLMs will solve the quickselect task in the CLRS-30 benchmark before GNNs do 🔥

Cool GNN Applications

Petar Veličković (Google DeepMind)

In my standard deck motivating the use of GNNs to a broader audience, I rely on a usual "arsenal" slide of impactful GNN applications over the years. With 2023 being significantly marked by LLM developments, I was wondering — can I meaningfully update this slide, but only using models released this year?

"It was the middle of the year back then, and already I was in for a nice surprise; I did not have enough space to list all the awesome things done with GNNs!" — Petar Veličković (Google DeepMind)

💡 While it might have gone comparatively under the radar, I confidently claim that 2023 was the most exciting year for cool GNN applications! The rise of LLMs just made it very clear where the limits of text-based autoregressive models are, and that for most scientific problems coming from Nature, their graph structure cannot be ignored.

Here's a handful of my personal favourite landmark results — all published in top-tier venues:

  • GraphCast provided us with a landmark model for medium-range global weather forecasting ⛈️ and with it, more accurate foreshadowing of extreme events such as hurricanes. A well-deserved cover of Science!
  • In an outstanding development in materials science, GNoME uses a GNN-based model to discover millions of novel crystal structures 💎. Published in Nature.
  • We've been treated to not just one, but two new breakthroughs in antibiotic discovery 💊 using message passing neural networks — the latter being published in Nature!
  • GNNs can smell 👃 by observing the molecular structure emitting an odour — a result that may well revolutionise many industries, including perfumes! Published in Science.
  • On the cover of Nature Machine Intelligence, HYFA 🍄 shows how to use hypergraph factorisation to make significant progress in gene expression imputation 🧬!
  • Last but not least, particle physics ⚛️ remains a natural stronghold of GNN applications. In this year's Nature Reviews Physics, we have been treated to a fascinating survey elucidating the myriad ways in which graph neural networks are deployed for various data analysis tasks at the Large Hadron Collider ⚡.

⚽ My own humble contribution to the space of GNN applications this year was TacticAI, the first full AI system giving useful tactical suggestions to (association) football coaches, developed in partnership with our collaborators at Liverpool FC 🔴. TacticAI is capable of predictive modelling ("what will happen in this tactical scenario?"), retrieving similar tactics, and conditional generative modelling ("how to modify player positions to make a particular outcome happen?"). In my opinion, the most satisfying part of this very fun collaboration was our user study with some of LFC's top coaching staff, directly illustrating that the outputs of our model will be of use to coaches in their work 🏃.

A "bird's eye" overview of TacticAI. (A), how corner kick situations are converted to a graph representation. Each player is treated as a node in a graph, with node, edge and graph features extracted as detailed in the main text. Then, a graph neural network operates over this graph by performing message passing; each node's representation is updated using the messages sent to it from its neighbouring nodes. (B), how TacticAI processes a given corner kick. To ensure that TacticAI's answers are robust in the face of horizontal or vertical reflections, all possible combinations of reflections are applied to the input corner, and these four views are then fed to the core TacticAI model, where they are able to interact with each other to compute the final player representations — each "internal blue arrow" corresponds to a single message passing layer from (A). Once player representations are computed, they can be used to predict the corner's receiver, whether a shot has been taken, as well as assistive adjustments to player positions and velocities, which increase or decrease the probability of a shot being taken. Source: Wang, Veličković, Hennes et al.
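The "four views" mentioned in the caption can be sketched as follows. This is only an illustration of how the reflected inputs might be built, assuming each player is a tuple (x, y, vx, vy) in a coordinate frame centred on the pitch so that a reflection is a sign flip; the function name and the data layout are hypothetical, and the part where the views interact inside the model is not shown.

```python
from itertools import product


def four_reflected_views(players):
    """Return the four combinations of horizontal/vertical reflections.

    players: list of (x, y, vx, vy) tuples, with the pitch centre at the
    origin (an assumption of this sketch). Velocities flip together with
    the positions along the reflected axis.
    """
    views = []
    for flip_x, flip_y in product((False, True), repeat=2):
        sx = -1.0 if flip_x else 1.0
        sy = -1.0 if flip_y else 1.0
        views.append([(sx * x, sy * y, sx * vx, sy * vy)
                      for x, y, vx, vy in players])
    return views


if __name__ == "__main__":
    corner = [(10.0, -5.0, 1.0, 2.0), (30.0, 12.0, -0.5, 0.5)]
    for view in four_reflected_views(corner):
        print(view)
```

Feeding all four views to the model, instead of picking one at random as in data augmentation, is what lets the predictions stay consistent under reflections of the input.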

This is what I'm all about — AI systems that significantly augment human abilities. I can only hope that, in my home country, Partizan catches on to these methods before Red Star does! 😅

🔮 What will we see in 2024? Probably more of the same, just accelerated! ⏩

Geometric Wall Street Bulletin 💸

Nathan Benaich (AirStreet Capital), Michael Bronstein (Oxford), and Luca Naef (VantAI)

2023 started with BioNTech (mostly known to the broad public for developing mRNA SARS-CoV-2 vaccines) announcing the acquisition of InstaDeep, a decade-old British company focused on AI-powered drug discovery, design and development. In May 2023, Recursion acquired two startups, Cyclica and Valence, "to bolster chemistry and generative AI capabilities". Valence's ML team is well known for multiple works in geometric and graph ML and for hosting the Graphs & Geometry and Molecular Modeling & Drug Discovery seminars on YouTube.

💰Isomorphic Labs started 2024 by announcing small molecule-focused collaborations with Eli Lilly and Novartis with upfront payments of $45M and $37.5M, respectively, with the potential worth of $3 billion.

💰VantAI partnered with Blueprint Medicines on innovative proximity modulating therapeutics, including molecular glue and hetero-bifunctional candidates. The deal's potential worth is $1.25 billion.

💰CHARM Therapeutics raised more funding from NVIDIA and from Bristol Myers Squibb, bringing the initial funding round to a total of $70M. The company has developed DragonFold, its proprietary algorithm for protein-ligand co-folding.

💊 Monte Rosa announced a successful Phase 1 study of MRT-2359 (orally bioavailable investigational molecular glue degrader) against MYC-driven tumors like lung cancer and neuroendocrine cancer. Monte Rosa is known to use geometric deep learning for proteins (MaSIF).

Nathan Benaich (AirStreet Capital, author of the State of AI Report)

"I have long been optimistic about the potential of AI-first approaches to design problems in medicine, biotech, and materials science. Graph-based models had a great year in techbio in 2023." — Nathan Benaich (AirStreet Capital)

RFdiffusion combines diffusion techniques with GNNs to design new protein structures. It is trained to denoise blurry or corrupted structures from the Protein Data Bank, while tapping into RoseTTAFold's prediction capabilities. DeepMind have continued to develop AlphaFold and build on top of it. Their AlphaMissense uses weak labels, language modeling, and AlphaFold to predict the pathogenicity of 71 million human variants. This is an important achievement, as most amino acid changes from genetic variation have unknown effects.

Beyond proteins, graph-based models have been improving our understanding of genetics. Stanford's GEARS system integrates deep learning with a gene interaction knowledge graph to predict gene expression changes from combinatorial perturbations. By leveraging prior data on single and double perturbations, GEARS can predict outcomes for thousands of gene pairs.

GEARS can predict new biologically meaningful phenotypes. (a) Workflow for predicting all pairwise combinatorial perturbation outcomes of a set of genes. (b) Low-dimensional representation of postperturbation gene expression for 102 one-gene perturbations and 128 two-gene perturbations used to train GEARS. A random selection is labeled. (c) GEARS predicts postperturbation gene expression for all 5,151 pairwise combinations of the 102 single genes seen experimentally perturbed. Predicted postperturbation phenotypes (non-black symbols) are often different from phenotypes seen experimentally (black symbols). Colors indicate Leiden clusters labeled using marker gene expression. Source: Roohani et al
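As a quick sanity check on the caption, the 5,151 figure is simply the number of unordered pairs that can be formed from 102 genes, i.e. 102 × 101 / 2. The sketch below enumerates them; the gene identifiers are placeholders and the helper name is our own, not part of GEARS.

```python
from itertools import combinations
from math import comb


def pairwise_perturbations(genes):
    """Enumerate every unordered two-gene combination."""
    return list(combinations(genes, 2))


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 102 single-gene perturbations.
    genes = [f"gene_{i}" for i in range(102)]
    pairs = pairwise_perturbations(genes)
    print(len(pairs), comb(102, 2))
```

Only 128 of these pairs were perturbed experimentally according to the caption, which is why predicting the rest from single and double perturbation data is valuable.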

🔮 In 2024, I put hope in two different developments.

1️⃣ We have seen the first two CRISPR-Cas9 therapies approved in the US and the UK. These genome editors were discovered through sequencing and random experimentation. I am excited about the use of AI models to design and create bespoke editors on demand.

2️⃣ We have started to see multimodality come to the AI bio world — combining DNA, RNA, protein, cellular, and imaging data to give us a more holistic understanding of biology.

Companies to watch in 2024

  • Profluent — LLMs for protein design
  • Inceptive.bio — founded by one of the authors of the Transformers paper.
  • Enveda Biosciences
  • Orbital Materials
  • Kumo.AI
  • VantAI — we are biased (Michael Bronstein is Vant's Chief Scientist and Luca Naef is a founder and CTO), but this is a cool company focused on the rational design of molecular glues using a combination of ML and proprietary experimental technology, which we believe to be the right combination for success.
  • Future House — a new Silicon Valley-based non-profit company in the AI4Science space funded by ex-Google CEO Eric Schmidt. Head of Science is Andrew White, known for his work on LLMs for chemistry. The self-described mission of the company is a "moonshot to build an AI scientist."

For additional articles about geometric and graph deep learning, see Michael Galkin's and Michael Bronstein's Medium posts and follow the two Michaels (Galkin and Bronstein) on Twitter.