Abstract
Advances in machine learning (ML) over the past decade have resulted in a proliferation of algorithmic applications for encoding, characterizing, and acting on complex data that may contain numerous multidimensional features. Recently, the emergence of deep-learning models trained across large datasets has created a new paradigm for ML in the form of Foundation Models (FMs). FMs are models trained on large and broad datasets with an extensive number of parameters. Once built, these extremely powerful, flexible models can be utilized in less resource-intensive ways to build a variety of downstream applications that can integrate previously disparate, multimodal data. These applications can be developed rapidly and with a much lower demand for ML expertise. Additionally, the necessary infrastructure and models themselves are already established within agencies such as NASA and ESA. At NASA, this work extends across several divisions of the Science Mission Directorate. Examples include the NASA Goddard and INDUS Large Language Models and the Prithvi Geospatial Foundation Model. Furthermore, ESA initiatives to bring FMs to Earth observation have led to the development of TerraMind. In February 2025, a workshop was held by NASA Ames Research Center and the SETI Institute to explore the potential of FMs in astrobiological research and identify the steps necessary to build and utilize such a model or models. Here, we share the findings and recommendations of that workshop and describe clear near-term and future opportunities in the development of an FM (or FMs) for astrobiology applications. These applications would include a biosignature or life characterization task, a mission development and operations task, and a natural language task for integrating and supporting astrobiology research needs.