[Optional] Offline-mini tests for PhD applicants
To help prospective PhD applicants determine whether our group is a good fit for you, we list a few optional mini tests that cover several skills we are interested in. We don't require applicants to submit these mini-test results for application consideration, but it is a great plus if you submit some results. Some of the tests are challenging, and I know that. We will also value the effort of candidates who at least attempt the challenging ones.
Note 1: Even though we have suggestions for students from different backgrounds, candidates are free to choose what they would like to show. For all the mini tests, we don't care whether you use AI or any other coding tools; we will assess only the final submissions.
Note 2: We will expand the list of problems from time to time. This page also serves as study material for undergraduate students, offering a different perspective on the relationships between chemistry, physics, math, and AI.
1. Suggested for students from Physics/Chemistry/Materials science:
(1.1) [Vibe coding] Please work with any AI coding agent (e.g., Cursor, Claude Code, Codex, GitHub Copilot, Gemini, etc.) to help you write a crawler that extracts papers on a specific topic (picked by the applicant) accepted at a recent CS conference (again, picked by the applicant).
(1.2) [Basic deep learning/Feedforward NN] Please implement a simple 3-layer feedforward neural network from scratch (without using any deep learning libraries such as PyTorch, TensorFlow, JAX, etc.) to fit (regression or classification are both fine) a simple dataset picked by the applicant (e.g., MNIST, CIFAR10, etc.). Hint: you should be able to do this using NumPy.
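To make "from scratch with NumPy" concrete, here is a minimal sketch of such a network trained with hand-written backpropagation. The XOR dataset is just a toy stand-in for whichever dataset the applicant picks:

```python
import numpy as np

# Minimal NumPy-only 3-layer network trained with backprop on a toy XOR
# problem (a stand-in for whichever dataset the applicant picks).
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 8)), np.zeros(8)
W3, b3 = rng.normal(0, 1, (8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(5000):
    # Forward pass: two tanh hidden layers, sigmoid output.
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    p = sigmoid(h2 @ W3 + b3)
    # Backward pass for the binary cross-entropy loss.
    g3 = (p - y) / len(X)                 # dL/d(pre-sigmoid logits)
    g2 = (g3 @ W3.T) * (1 - h2**2)        # backprop through tanh
    g1 = (g2 @ W2.T) * (1 - h1**2)
    W3 -= lr * h2.T @ g3; b3 -= lr * g3.sum(0)
    W2 -= lr * h1.T @ g2; b2 -= lr * g2.sum(0)
    W1 -= lr * X.T @ g1;  b1 -= lr * g1.sum(0)

# Re-run the forward pass with the final weights.
p = sigmoid(np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2) @ W3 + b3)
loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
preds = (p > 0.5).astype(float)
```

A full submission would add minibatching, a real dataset loader, and a train/test split.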
(1.3) [Basic quantum chemistry/traditional electronic structure] Please implement a simple Hartree-Fock method from scratch in Python (without using any quantum chemistry libraries such as PySCF, Psi4, etc.) to calculate the ground-state energy of a simple molecule picked by the applicant (e.g., H2, HeH+, LiH, etc.).
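To illustrate the self-consistent-field structure without molecular integrals, here is a hedged toy: restricted HF on the two-site Hubbard dimer (a lattice model, not a molecule). A real submission would add a basis set, the overlap matrix, and electron-repulsion integrals, but the build-Fock / diagonalize / rebuild-density loop is the same:

```python
import numpy as np

# Toy restricted Hartree-Fock SCF loop on the two-site Hubbard dimer
# (hopping t, on-site repulsion U, 2 electrons).
t, U = 1.0, 2.0
h = np.array([[0.0, -t], [-t, 0.0]])   # one-electron (hopping) part

n = np.array([1.5, 0.5])               # deliberately asymmetric initial density
E_old = np.inf
for it in range(200):
    # Mean-field (Fock) matrix for one spin channel: each spin-up electron
    # feels U times the spin-down density n_i/2 on its site.
    F = h + U * np.diag(n / 2.0)
    eps, C = np.linalg.eigh(F)
    c = C[:, 0]                        # doubly occupy the lowest orbital
    n = 0.5 * n + 0.5 * (2.0 * c**2)   # density mixing for stable convergence
    # Total energy: kinetic energy of both electrons + mean-field repulsion.
    E = 2.0 * c @ h @ c + U * np.sum((n / 2.0) ** 2)
    if abs(E - E_old) < 1e-10:
        break
    E_old = E
# For this symmetric dimer the SCF converges to E = -2t + U/2.
```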
(1.4) [Basic quantum chemistry/traditional electronic structure] Please implement a simple Full CI (FCI) method from scratch in Python (without using any quantum chemistry libraries such as PySCF, Psi4, etc.) to calculate the ground-state energy of a simple molecule picked by the applicant (e.g., H2, HeH+, LiH, etc.).
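The essence of FCI is exact diagonalization of the Hamiltonian in the complete determinant basis. A minimal sketch on the two-site Hubbard dimer toy (again a model, not a molecule; the molecular version builds the same matrix from one- and two-electron integrals):

```python
import numpy as np

# Toy full CI: exact diagonalization of the two-site Hubbard dimer
# (hopping t, repulsion U) in its complete Sz = 0 determinant basis.
# Basis ordering: (up-site, down-site) = (0,0), (0,1), (1,0), (1,1);
# (i, i) states have a doubly occupied site and cost U.
t, U = 1.0, 2.0
H = np.array([
    [ U, -t, -t,  0.],
    [-t,  0.,  0., -t],
    [-t,  0.,  0., -t],
    [ 0., -t, -t,  U],
])
E0 = np.linalg.eigvalsh(H)[0]
# Analytic ground-state energy: (U - sqrt(U^2 + 16 t^2)) / 2.
```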
(1.5) [(Not that basic) quantum chemistry/traditional electronic structure/particle physics] Please derive the Coupled Cluster (CC) equations for CCD and CCSD, draw the corresponding Feynman diagrams, and implement CCD in Python to calculate the ground-state energy of a simple molecule picked by the applicant (e.g., H2, HeH+, LiH, etc.).
(1.6) [Basic computational physics/computational math] Please implement a simple Metropolis-Hastings algorithm from scratch in Python (without using any ML libraries such as PyTorch, TensorFlow, JAX, etc.) to sample from a 2D Gaussian distribution.
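A minimal sketch of what is expected, assuming a correlated 2D Gaussian target and an isotropic random-walk proposal (both are free choices for the applicant):

```python
import numpy as np

# Minimal Metropolis-Hastings sampler targeting a correlated 2D Gaussian.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
cov_inv = np.linalg.inv(cov)

def log_density(x):
    # Unnormalized log-density of N(0, cov); the normalizer cancels in the ratio.
    return -0.5 * x @ cov_inv @ x

x = np.zeros(2)
samples = []
for step in range(50000):
    proposal = x + rng.normal(0, 0.8, size=2)   # Gaussian random-walk proposal
    # Accept with probability min(1, pi(proposal) / pi(x)).
    if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
        x = proposal
    samples.append(x.copy())

samples = np.array(samples[5000:])              # drop burn-in
```

Reporting the acceptance rate and an autocorrelation estimate would strengthen the submission.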
(1.7) [Basic computational physics/stat mech] Please implement a simple Path Integral Monte Carlo (PIMC) algorithm from scratch in Python to simulate a 1D quantum harmonic oscillator (V(x) = x^2/2) and an anharmonic oscillator (V(x) = x^2/2 + \lambda*x^4) at finite temperature.
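A hedged sketch for the harmonic case only, assuming the primitive discretized action, single-bead Metropolis moves, and the virial energy estimator (which for V = x^2/2 reduces to E = ⟨x²⟩); the anharmonic case is left to the applicant:

```python
import numpy as np

# Minimal PIMC for the 1D harmonic oscillator V(x) = x^2/2 at inverse
# temperature beta (hbar = m = 1): a ring polymer of P beads sampled with
# single-bead Metropolis moves under the primitive action.
rng = np.random.default_rng(1)
beta, P = 10.0, 32
tau = beta / P                      # imaginary-time step

V = lambda x: 0.5 * x**2
path = np.zeros(P)                  # beads x_0 ... x_{P-1}, periodic in j

def local_action(j, xj):
    # Terms of the primitive action that involve bead j only.
    left, right = path[(j - 1) % P], path[(j + 1) % P]
    spring = ((xj - left) ** 2 + (right - xj) ** 2) / (2.0 * tau)
    return spring + tau * V(xj)

x2_samples = []
for sweep in range(6000):
    for j in range(P):
        new = path[j] + rng.normal(0, 0.5)
        dS = local_action(j, new) - local_action(j, path[j])
        if np.log(rng.uniform()) < -dS:     # Metropolis accept/reject
            path[j] = new
    if sweep > 1000:                        # after equilibration
        x2_samples.append(np.mean(path ** 2))

# Virial estimator for V = x^2/2: E = <x V'(x)>/2 + <V(x)> = <x^2>.
E = np.mean(x2_samples)                     # exact value ~ 0.5 at beta = 10
```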
(1.8) [Basic quantum computing/QC4QC] Please describe the Jordan-Wigner and Bravyi-Kitaev transforms from a theoretical perspective, implement a simple Hamiltonian for a molecule picked by the applicant (e.g., H2, HeH+, LiH, etc.), and compute the ground-state energy using the two transforms. Hint: use QC software such as OpenFermion, Qiskit, PennyLane, TensorCircuit, etc.
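As a theory warm-up, the Jordan-Wigner transform stores the occupation of spin-orbital j directly on qubit j, with a string of Pauli-Z operators restoring the fermionic sign. A sketch of one common convention (|0⟩ = empty, |1⟩ = occupied):

```latex
a_j^\dagger = \left( \prod_{k<j} Z_k \right) \frac{X_j - i Y_j}{2},
\qquad
a_j = \left( \prod_{k<j} Z_k \right) \frac{X_j + i Y_j}{2}.
```

The Bravyi-Kitaev transform instead stores partial occupation sums, shortening the Pauli strings from O(n) to O(log n) at the cost of a less direct encoding; a good answer would explain this trade-off.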
(1.9) [Basic computational materials science/software usage] Please use pymatgen to generate the crystal structure of a material picked by the applicant (e.g., Si, GaAs, etc.), and use DFT software (e.g., VASP, Quantum Espresso, etc.) to calculate its band structure and density of states. Please also provide some analysis of the results, discuss the symmetry and point group, and explain why the chosen DFT setup is appropriate.
(1.10) [Basic computational chemistry/traditional electronic structure] Please describe the following methods: (1) MP2, (2) CASSCF, (3) DFT (please list at least one functional on each rung of the Jacob's ladder), (4) EOM-CCSD, (5) TDDFT, (6) QM/MM, and give some examples of when to use them. Then use a Python computational chemistry package (PySCF, Psi4, etc.) to compute the single-bond dissociation curve of H2O, and explain why each method behaves the way it does.
(1.11) [Basic quantum computing/quantum optimization algorithms] Please describe the different quantum optimization algorithms from a theoretical perspective, then implement QAOA (Quantum Approximate Optimization Algorithm) for the Max-Cut problem using any QC software, such as OpenFermion, Qiskit, PennyLane, TensorCircuit, etc.
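For intuition before reaching for an SDK, depth-1 QAOA on a tiny graph can be simulated directly as a statevector. A sketch on a triangle graph (3 qubits, 8 amplitudes), with the two angles found by a crude grid search instead of a proper optimizer:

```python
import numpy as np
from itertools import product

# Depth-1 QAOA for Max-Cut on a triangle, simulated as an 8-dim statevector.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3

# Cut value of each computational-basis bitstring z.
cuts = np.array([sum(z[i] != z[j] for i, j in edges)
                 for z in product([0, 1], repeat=n)], dtype=float)

def qaoa_expectation(gamma, beta):
    state = np.full(2 ** n, 1 / np.sqrt(2 ** n), dtype=complex)  # |+>^n
    state = np.exp(-1j * gamma * cuts) * state     # cost layer (diagonal)
    # Mixer layer: e^{-i beta X} applied to every qubit.
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    U = rx
    for _ in range(n - 1):
        U = np.kron(U, rx)
    state = U @ state
    return float(np.sum(np.abs(state) ** 2 * cuts))  # <C> from probabilities

# Grid search over the two angles (a real solution would use an optimizer).
best = max(qaoa_expectation(g, b)
           for g in np.linspace(0, np.pi, 40)
           for b in np.linspace(0, np.pi, 40))
# Uniform random guessing gives <C> = 1.5; the max cut of a triangle is 2.
```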
For (1.1), please submit (1) your prompts to the agents (ideally stating your prompt-engineering strategy), (2) the code generated by the agents, (3) your own modifications to that code, (4) the final working code, (5) a README that explains how to run the crawler, and (6) a summary of your experience and insights.
For (1.2-1.7), a hint is that you should be able to do everything with NumPy. A hint for (1.3-1.5) and (1.8): you can also check your code against the H atom, since you know its analytical solution. Please submit (1) your code as a GitHub repo, (2) test-run results summarized in markdown or a Jupyter notebook, and (3) a summary of your experience and insights.
For (1.8-1.11), please submit (1) your code as a GitHub repo, (2) the full outputs from the software or a Jupyter notebook, and (3) a summary of how you think AI can help in these directions.
2. Suggested for students from Computer science/Applied math/Statistics:
Note: The following tests do not reflect whether we believe these techniques are the most advanced ones in AI4S.
(2.1) [Diffusion model] Please implement a simple Denoising Diffusion Probabilistic Model (DDPM) in PyTorch. You could refer to the original paper, Denoising Diffusion Probabilistic Models by Ho et al. (2020). You could train your model on a simple dataset picked by the applicant (e.g., MNIST, CIFAR10, etc.) and generate some samples.
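The noising side of a DDPM needs no network at all, which makes it a good first sanity check. A NumPy sketch of the closed-form forward process q(x_t | x_0) under the linear beta schedule from the paper (the trained network, which predicts the added noise, is not shown):

```python
import numpy as np

# DDPM forward (noising) process in closed form: with a linear beta
# schedule, q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    # Jump straight to timestep t without simulating intermediate steps.
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = rng.normal(0, 1, size=10000)          # toy unit-variance "data"
xT = q_sample(x0, T - 1, rng.normal(0, 1, size=x0.shape))
# By t = T the signal is essentially destroyed: x_T is close to pure N(0, 1).
```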
(2.2) [Representation learning & molecular graphs] Please implement a simple Graph Convolutional Network (GCN) in PyTorch to predict a molecular property (e.g., solubility, toxicity, etc.) using a public dataset (e.g., QM9, MoleculeNet, etc.). You could refer to the paper Graph Neural Networks for Molecules by Wang et al. (2022).
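The core operation is small enough to sketch in NumPy: one Kipf-Welling-style graph-convolution layer on a toy 4-atom chain, followed by mean pooling into a molecule-level vector. A real submission would stack such layers in PyTorch and train the weights:

```python
import numpy as np

# One graph-convolution layer on a toy 4-node path graph (e.g., a chain
# of bonded atoms), plus mean pooling into a graph-level embedding.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix
X = rng.normal(size=(4, 5))                 # node (atom) features
W = rng.normal(size=(5, 3))                 # learnable weight matrix

# Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(4)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

H = np.maximum(0.0, A_norm @ X @ W)         # ReLU(A_norm X W)
graph_embedding = H.mean(axis=0)            # molecule-level readout
```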
(2.3) [Reinforcement learning] Please implement a simple Deep Q-Network (DQN) in PyTorch to solve a classic control problem (e.g., CartPole, MountainCar, etc.) using OpenAI Gym. You could refer to the original paper Playing Atari with Deep Reinforcement Learning by Mnih et al. (2013).
(2.4) [Transformer] Please implement a simple Transformer in PyTorch to perform a sequence-to-sequence task (e.g., machine translation, text summarization, etc.) using a public dataset (e.g., WMT, IWSLT, etc.). You could refer to the original paper Attention Is All You Need by Vaswani et al. (2017).
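The heart of the architecture is scaled dot-product attention, sketched here in NumPy; a full solution wraps this in multi-head projections, residual connections, masking, and positional encodings:

```python
import numpy as np

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out, w = attention(Q, K, V)   # out: one value mixture per query
```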
(2.5) [Bayesian statistics] Please implement a simple Bayesian Neural Network from scratch in Python (without using any ML libraries such as PyTorch, TensorFlow, JAX, etc.) to fit a simple dataset picked by the applicant (e.g., Boston housing, diabetes, etc.).
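As a stepping stone, here is the exactly solvable special case behind most BNN methods: Bayesian linear regression with a Gaussian prior and known noise, on synthetic data. The full task would put a prior on every network weight and approximate the posterior (e.g., with MCMC or variational inference):

```python
import numpy as np

# Conjugate Bayesian linear regression: Gaussian prior N(0, tau2 I),
# Gaussian noise with known variance sigma2, closed-form posterior.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])                   # synthetic ground truth
X = rng.normal(size=(200, 2))
y = X @ w_true + rng.normal(0, 0.1, size=200)

sigma2, tau2 = 0.1**2, 10.0                      # noise var, prior var
# Posterior: N(mu, Sigma) with Sigma = (X^T X / sigma2 + I / tau2)^{-1}.
Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
mu = Sigma @ X.T @ y / sigma2

# Posterior predictive samples for a new input: uncertainty for free.
x_new = np.array([1.0, 1.0])
w_samples = rng.multivariate_normal(mu, Sigma, size=1000)
y_pred = w_samples @ x_new
```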
(2.6) [Optimization] Please implement simple Stochastic Gradient Descent (SGD) from scratch in Python (without using any ML libraries such as PyTorch, TensorFlow, JAX, etc.) to fit a simple dataset picked by the applicant (e.g., MNIST, CIFAR10, etc.). Please also provide some insights/analysis of "gradient-based" and "gradient-free" optimization schemes, and compare the popular optimizers in DL in terms of their theoretical benefits, complexity, and limitations.
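A minimal sketch of the "gradient-based" side on synthetic least-squares data (a stand-in for the applicant's dataset): shuffle, take minibatches, step against the minibatch gradient.

```python
import numpy as np

# Minibatch SGD on least-squares linear regression.
rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])              # synthetic ground truth
X = rng.normal(size=(1000, 3))
y = X @ w_true + rng.normal(0, 0.05, size=1000)

w = np.zeros(3)
lr, batch = 0.05, 32
for epoch in range(30):
    idx = rng.permutation(len(X))                # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        # Gradient of the minibatch mean-squared error w.r.t. w.
        grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad

mse = np.mean((X @ w - y) ** 2)
```

Comparing this against a momentum or Adam variant (a few extra lines each) is a natural way to ground the requested optimizer discussion.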
(2.7) [Causal inference] Please fine-tune/train a small causal language model and describe your architecture choices, training strategies, and evaluation metrics. You could use a small dataset picked by the applicant (e.g., CausalBank, etc.). You could refer to the paper Causal Inference with Large Language Model: A Survey by Ma (2024).
(2.8) [Fine-tuning] Please discuss the advantages and disadvantages of different fine-tuning techniques, and then fine-tune a small pre-trained language model (e.g., GPT-2, DistilBERT, etc.) on a specific downstream task (e.g., text, image, etc.) using any public dataset.
(2.9) [Statistical learning theory] Please read the book Statistical Learning Theory by Nagler and summarize the key concepts and theorems in a markdown file or Jupyter notebook. You could also provide some examples to illustrate the concepts and theorems. (Hint: the basic ones include the bias-variance trade-off, loss functions (types and characteristics), regularization (types and why we need it), priors & posteriors, algorithmic complexity analysis, risk bounds, etc.)
(2.10) [Computational complexity] Please read the book Computational Complexity: A Modern Approach by Arora and Barak and summarize the key concepts and theorems in a markdown file or Jupyter notebook. You could also provide some examples to illustrate the concepts and theorems. (Hint: the basic ones include P vs NP, NP-completeness, reductions, space complexity, hierarchy theorems, etc.)
For (2.1-2.8), please submit (1) your code as a GitHub repo, (2) test-run results summarized in markdown or a Jupyter notebook, and (3) a summary of your experience and insights.