The Evaluator and Agent modules are built around close collaboration with experimental teams.

Evaluator: AI for Experimental Design

ODBO figure

The goal of this direction is to transform experimental design by replacing trial and error with adaptive optimization that needs only few-shot, minimal data. Approaches such as Bayesian optimization (BO), active learning, and reinforcement learning allow complex, high-dimensional experimental spaces to be explored efficiently. In our research, we build models with uncertainty-aware algorithms, outlier management, low-dimensional representations, and domain-specific priors to ensure practical performance in noisy, constrained, real-world chemistry experiments. By forming feedback loops between models and experiments, we aim to give experimentalists a general tool for closed-loop discovery in chemistry, biology, and materials science.
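To make the uncertainty-aware, adaptive loop concrete, here is a minimal Bayesian optimization sketch in pure Python. It assumes a 1-D search space discretized on [0, 1], a zero-mean Gaussian-process surrogate with an RBF kernel (length scale 0.2, chosen arbitrarily), and expected improvement (EI) as the acquisition function; the toy objective stands in for a measured experimental response. All function names are hypothetical, and this is an illustration rather than the group's own code.

```python
import math
import random


def rbf(a, b, length=0.2, var=1.0):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular Cholesky factor L with K = L L^T."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-4):
    """Fit a zero-mean GP; the small noise term also acts as numerical jitter."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = K^{-1} y
    return L, alpha


def gp_predict(X, L, alpha, x_star):
    """Posterior mean and standard deviation of the GP at x_star."""
    k_star = [rbf(x_star, a) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_star, x_star) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization: rewards both a high mean and high uncertainty."""
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(f, n_init=3, n_iter=15, grid_size=201, seed=0):
    """Maximize f on [0, 1], querying one grid point ("experiment") per round."""
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    X = rng.sample(grid, n_init)            # initial random experiments
    y = [f(x) for x in X]
    for _ in range(n_iter):
        L, alpha = gp_fit(X, y)             # refit surrogate on all data so far
        best = max(y)
        candidates = [x for x in grid if x not in X]
        scores = [expected_improvement(*gp_predict(X, L, alpha, x), best)
                  for x in candidates]
        x_next = candidates[scores.index(max(scores))]
        X.append(x_next)                    # run the next experiment
        y.append(f(x_next))
    i_best = y.index(max(y))
    return X[i_best], y[i_best]


if __name__ == "__main__":
    # Toy objective standing in for a measured response; optimum at x = 0.3.
    x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
    print(f"best x = {x_best:.3f}, f = {y_best:.4f}")
```

The `sigma * pdf` term in EI is what makes the loop uncertainty-aware: a point the surrogate knows little about can outrank one with a slightly higher predicted mean. A real campaign would more likely rely on a maintained GP/BO library than on a hand-written solver like this one.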

- ODBO (Outlier-detected Bayesian Optimization): a machine-learning protocol for protein directed evolution that integrates low-dimensional, function-value-based protein encoding and search-space prescreening with BO and outlier-detection surrogate modeling. It efficiently navigates large, noisy sequence spaces and recommends high-fitness variants at minimal experimental cost.
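The encoding and prescreening ideas in the ODBO entry can be illustrated with a simplified, hypothetical sketch. It is not the published ODBO implementation: here each residue at each site is encoded by the mean fitness of measured variants carrying it (an assumed form of function-value-based encoding), candidates are prescreened with a crude additive score, and a median/MAD modified z-score stands in for the outlier-detection step. The BO step itself is omitted, and the toy variants and fitness values are made up.

```python
from collections import defaultdict
from statistics import mean, median


def fit_site_encoding(variants, fitness):
    """Hypothetical function-value-based encoding: for every (site, residue)
    pair seen in the measured variants, store the mean fitness of the
    variants that carry that residue at that site."""
    values = defaultdict(list)
    for seq, f in zip(variants, fitness):
        for site, aa in enumerate(seq):
            values[(site, aa)].append(f)
    table = {key: mean(vals) for key, vals in values.items()}
    default = mean(fitness)  # assumption: unseen residues get the global mean
    return table, default


def encode(seq, table, default):
    """One number per site, so the feature length equals the sequence length
    instead of 20 x length as with one-hot encoding."""
    return [table.get((site, aa), default) for site, aa in enumerate(seq)]


def prescreen(candidates, table, default, keep_fraction=0.25):
    """Keep the top fraction of candidates ranked by their summed encoding,
    a crude additive fitness estimate used here only for illustration."""
    ranked = sorted(candidates, key=lambda s: sum(encode(s, table, default)),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]


def flag_outliers(values, threshold=3.5):
    """Modified z-score based on median and MAD; True marks a likely outlier."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


if __name__ == "__main__":
    # Made-up three-site variants and fitness values.
    variants = ["AAA", "AVA", "GAA", "GVA", "AAL"]
    fitness = [1.0, 2.0, 0.5, 1.5, 5.0]
    table, default = fit_site_encoding(variants, fitness)
    print(encode("GVL", table, default))  # [1.0, 1.75, 5.0]
    print(prescreen(["GAA", "AVL", "GVL", "WAA"], table, default, 0.5))  # ['AVL', 'GVL']
    print(flag_outliers([1.0, 1.1, 0.9, 1.0, 1.05, 9.0]))  # [False, False, False, False, False, True]
```

The appeal of this kind of encoding is dimensionality: a sequence of length n becomes n numbers rather than a 20n-dimensional one-hot vector, which matters when only a handful of variants have been measured.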

Problem-Driven Fine-Tuning and Benchmark Construction for Scientific LLM Agents

In this direction, we plan to construct domain-specific datasets by combining outputs from upstream simulation, emulation, and prediction layers with curated data mined from the literature. We are also interested in developing scientist-centric evaluation metrics for scientific LLMs (e.g., NatureLM). This targeted approach aims to align fine-tuning with real scientific tasks, so that LLM agents learn domain-relevant reasoning, symbolic logic, and experimental workflows grounded in chemistry. Both efforts will involve close collaboration with computer science groups to bridge foundation models and scientific discovery.
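One way to picture a scientist-centric metric is a benchmark item that stores a numeric reference value (from simulation or from literature mining) and is scored with a relative tolerance, with accuracy reported separately for each data source. The sketch below assumes that form; the `BenchmarkItem` fields, the 5% tolerance, the number-extraction regex, and the example questions and answers are all hypothetical and not taken from any existing benchmark.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical pattern: first signed decimal number, optional exponent.
NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


@dataclass
class BenchmarkItem:
    question: str
    reference: float       # ground-truth value (simulated or literature-mined)
    source: str            # hypothetical provenance label, e.g. "simulation"
    rel_tol: float = 0.05  # accept answers within 5% of the reference


def parse_number(answer):
    """Return the first number found in a free-text answer, or None."""
    match = NUMBER.search(answer)
    return float(match.group()) if match else None


def is_correct(item, answer):
    """Numeric answer within the item's relative tolerance of the reference."""
    value = parse_number(answer)
    if value is None:
        return False
    return abs(value - item.reference) <= item.rel_tol * abs(item.reference)


def score_by_source(items, answers):
    """Fraction of correct answers, reported separately per data source."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item, answer in zip(items, answers):
        totals[item.source] += 1
        hits[item.source] += is_correct(item, answer)
    return {src: hits[src] / totals[src] for src in totals}


if __name__ == "__main__":
    # Made-up items and model answers for illustration only.
    items = [
        BenchmarkItem("Predicted band gap of material X in eV?", 1.1, "simulation"),
        BenchmarkItem("Reported yield of reaction Y in %?", 80.0, "literature"),
        BenchmarkItem("Reported pKa of compound Z?", 2.0, "literature"),
    ]
    answers = ["About 1.12 eV", "The yield is 65%", "2.05"]
    print(score_by_source(items, answers))  # {'simulation': 1.0, 'literature': 0.5}
```

A relative tolerance lets a chemically reasonable answer such as 1.12 eV count as correct against a 1.1 eV reference, where exact string matching would not; note that it reduces to exact equality when the reference is zero.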

- The impact of large language models on scientific discovery: a preliminary study using GPT-4: benchmarks LLM4Sci performance across a variety of scientific tasks.