Keynotes
On the “Rough Use” of Machine Learning Techniques: a Story about Unrealistic Prediction, by Chih-Jen Lin, Distinguished Professor, National Taiwan University, and Affiliated Professor, MBZUAI
Tentative Program
We are excited to see you in Singapore this coming week! Please see the program schedule below.
As part of the workshop, we hope to work with you in a “position paper” session after the talks, to amplify the work that everyone is doing.
09:00–09:15 — Opening (Edward Raff, Odd Erik Gundersen)
Welcome, workshop goals, potential special issue, and position paper plan.
09:15–10:00 — Keynote + Q&A (45 min)
Keynote talk followed by questions.
10:00–10:30 — Session 1: Problem framing & reference baselines (2 papers, 15 min each)
Session Chair: Waqas Ahmed
Automated Reproducibility Has a Problem Statement Problem
Thijs Snelleman, Peter Lundestad Lawrence, Holger H. Hoos, Odd Erik Gundersen
open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison
Marianna Nezhurina, Jörg K.H. Franke, Taishi Nakamura, Timur Carstensen, Niccolò Ajroldi, Ville Komulainen, David Salinas, Jenia Jitsev
10:30–11:00 — Coffee Break (AAAI-defined)
11:00–12:00 — Session 2: Reproducibility in agents, code, benchmarks, and images (4 papers, 15 min each)
Session Chair: Peter Lundestad Lawrence
AI Copilots for Reproducibility in Science: A Case Study
Adrien Bibal, Steven Minton, Deborah Khider, Yolanda Gil
AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents
Bhanu Prakash Vangala, Ali Adibifar, Tanu Malik, Ashish Gehani
Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents
María Sanz-Gómez, Víctor Mayoral-Vilches, Francesco Balassone, Luis Javier Navarrete Lozano, Cristobal Ricardo Jesús Veas Chavez, Maite del Mundo de Torres
Learning to be Reproducible: Custom Loss Design for Robust Neural Networks
Waqas Ahmed, Sheeba Samuel, Kevin Coakley, Birgitta Koenig-Ries, Odd Erik Gundersen
12:00–13:00 — Lunch
13:00–14:15 — Session 3: Images, stability & simpler methods in applied settings (5 papers, 15 min each)
Session Chair: Thijs Snelleman
Exploration of Reproducible Generated Image Detection
Yihang Duan
Image Tiling for High-Resolution Reasoning: Balancing Local Detail with Global Context
Anatole Jacquin de Margerie, Alexis Roger
Measuring Stability Beyond Accuracy in Small Open-Source Medical Large Language Models for Pediatric Endocrinology
Vanessa D’Amario, Randy Daniel, Alessandro Zanetti, Dhruv Edamadaka, Nitya Alaparthy, Joshua Tarkoff
Simpler Methods Work Better for L1 Penalized Logistic Models and Large Datasets
Edward Raff, James Holt
Forest vs Tree: The $(N, K)$ Trade-off in Reproducible ML Evaluation
Deepak Pandita
14:15–14:30 — Short Break
14:30–15:30 — Workshop Part 1: Towards a position paper (Odd Erik Gundersen, Thijs Snelleman, Edward Raff)
Goal-setting, scope, key claims, outline, and breakout assignments.
15:30–16:00 — Coffee Break
16:00–17:00 — Workshop Part 2: Towards a position paper (Odd Erik Gundersen, Thijs Snelleman, Edward Raff)
Report-backs, synthesis, drafting plan, and next steps.
17:00–17:15 — Closing remarks & forward plan (Edward Raff, Odd Erik Gundersen)
Special issue next steps + position paper writing timeline and responsibilities.