AI Safety Research Engineer

Isabella Luong

Three active research projects, all asking the same question: can we actually trust how we measure AI?

AI Safety Evaluations Adversarial Robustness Red-teaming Mech Interp
View Research Full CV
Scroll to explore
About

Research engineer at the
frontier of AI safety

I build the infrastructure and evidence base for trustworthy AI — determined to close the gap between how we think AI systems behave and how they actually do.

Isabella Luong

I'm Isabella (Thao My) Luong, a research engineer based in Ho Chi Minh City, Vietnam, focused on AI safety evaluations and adversarial robustness. Within five months of entering the field, I'm running three interconnected empirical research projects examining how current evaluation infrastructure fails at scale.

My work spans benchmark integrity (how single-pass evals miss trajectory-level failure), LLM-as-judge bias (how stylistic features distort scoring independent of quality), and cross-session threat detection (catching adversarial actors who fragment attacks across API sessions). These aren't separate interests — they're a systematic examination of where evaluation breaks down.

I was admitted to CAMBRIA (10% acceptance rate) and to two SPAR Spring 2026 projects, and am embedded in the EA/AI safety institutional ecosystem. I hold a B.Sc. in Information Technology from RMIT University Vietnam as a Vice-Chancellor Merit Scholar.

Location Ho Chi Minh City, Vietnam
Status Available for collaboration
Focus areas
AI Safety Evaluations Adversarial Robustness LLM-as-Judge Scalable Oversight Mechanistic Interpretability AI Welfare Benchmark Design
Stack
Python PyTorch HuggingFace Inspect (UK AISI) GNN Docker Git
Research

Active research projects

Benchmark Design · Animal Welfare Feb 2026 – Present
Research Mentee · Mentor: Allen Lu (Sentient Futures / Electric Sheep) · SPAR Spring 2026

Dynamic multi-turn benchmark evaluating frontier models on animal welfare reasoning under escalating adversarial pressure. Targeting submission to Inspect Evals (UK AISI); outputs feed into model specifications and training interventions at frontier labs.

  • Engineered an LLM-based scenario generation pipeline following ARENA dataset quality control standards, adapting ARENA's MCQ-oriented framework to open-ended adversarial welfare scenarios; implemented iterative rubric design for automated quality filtering, integrated manually curated few-shot exemplars to control output style and length distribution, and validated ~300 generated scenarios against benchmark contribution criteria
  • Refactored the dataset generation pipeline for reproducibility and robustness — replacing auto-generated few-shot prompting with curated exemplars, structuring multi-file pipeline outputs, and distilling community feedback into concrete generation constraints that measurably reduced eval-aware and formulaic scenario outputs
  • Identified and designed mitigation experiments for the comparability problem in dynamic multi-turn evaluation — where model-conditioned adversarial follow-ups cause non-equivalent pressure testing across models, confounding cross-model scoring; proposed and tested two solutions: a hybrid fixed-then-dynamic turn design providing a standardized baseline pressure before resuming adaptive generation, and controlled pressure injection ensuring welfare reasoning is consistently targeted regardless of Turn 1 model response
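The hybrid fixed-then-dynamic turn design described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation — the function and class names (`run_hybrid_eval`, `Transcript`, `dynamic_attacker`) are hypothetical:

```python
# Sketch of a hybrid fixed-then-dynamic multi-turn eval loop.
# Assumption: all names here are illustrative, not the project's real API.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Transcript:
    turns: List[dict] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

def run_hybrid_eval(
    model: Callable[[List[dict]], str],             # model under test
    fixed_pressure: List[str],                      # standardized adversarial turns, shared across models
    dynamic_attacker: Callable[[Transcript], str],  # model-conditioned follow-up generator
    n_dynamic_turns: int = 3,
) -> Transcript:
    """Apply identical fixed pressure first, then resume adaptive generation."""
    t = Transcript()
    # Phase 1: fixed turns give every model the same baseline pressure,
    # so early-turn scores are directly comparable across models.
    for prompt in fixed_pressure:
        t.add("user", prompt)
        t.add("assistant", model(t.turns))
    # Phase 2: dynamic turns probe model-specific weaknesses.
    for _ in range(n_dynamic_turns):
        t.add("user", dynamic_attacker(t))
        t.add("assistant", model(t.turns))
    return t
```

The design choice: scoring on the fixed-phase turns isolates cross-model comparability, while the dynamic phase preserves the adaptive pressure that single-pass evals miss.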
Adversarial Detection · ML Security Feb 2026 – Present
Research Mentee · Mentors: Linh Le & David Williams-King (Mila / ERA) · SPAR Spring 2026

End-to-end detection system for cross-session malicious model misuse — targeting adversarial actors who decompose attack queries across multiple API sessions to evade per-session safety classifiers. Targeting publication at USENIX Security, NeurIPS D&B, and ICLR.

  • Building cross-session monitoring infrastructure — session tracker and code embedding pipeline with vector DB to semantically link malicious fragments
  • Constructing dependency graphs with a cross-session linker and subgraph extraction module, surfacing latent attack structures invisible to single-session classifiers
  • Developing a GNN detection architecture with adversarial training loops (5–10 rounds) and explainability-focused outputs generating structured attack explanations
  • Designing FragBench — a standardized benchmark suite with baselines and leaderboard for cross-session attack detection evaluation
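As a toy sketch of the cross-session linking step — embedding-similarity edges restricted to fragments from different sessions — the following shows the core idea; the names, the in-memory cosine search (standing in for the vector DB), and the threshold are all illustrative assumptions:

```python
# Sketch: link semantically similar code fragments across API sessions
# into a graph. Assumption: embeddings are precomputed; a brute-force
# cosine scan stands in for the vector-DB lookup described above.
import math
from collections import defaultdict
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def link_fragments(
    fragments: Dict[str, List[float]],   # fragment_id -> embedding
    session_of: Dict[str, str],          # fragment_id -> session_id
    threshold: float = 0.8,
) -> Dict[str, List[str]]:
    """Adjacency list over fragments, keeping only cross-session edges."""
    graph: Dict[str, List[str]] = defaultdict(list)
    ids = sorted(fragments)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if session_of[a] == session_of[b]:
                continue  # within-session links are already visible to per-session classifiers
            if cosine(fragments[a], fragments[b]) >= threshold:
                graph[a].append(b)
                graph[b].append(a)
    return dict(graph)
```

The resulting graph is the kind of structure a downstream subgraph extractor and GNN classifier would consume.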
AI Safety · AI Welfare Nov 2025 – Present
AI Safety Researcher · Mentor: Philip Kratz · FutureKind Winter Fellowship

Three interconnected empirical projects forming a systematic examination of benchmark and evaluation failure in frontier AI systems — with implications for scalable oversight and reward robustness.

  • Characterizing trajectory-level behavioral drift under adversarial input pressure — identifying failure modes that standard single-pass evaluations systematically miss
  • Empirically measuring how stylistic features (verbosity, hedging, formatting) distort LLM-as-judge scores independent of answer quality
  • Investigating the absence of reliable ground truth in nonhuman welfare reasoning evaluations — proposing criteria for scalable oversight under partial verifiability
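The second project's measurement logic can be sketched as a paired comparison: hold answer content fixed, apply a stylistic transform, and attribute any score shift to style. This is an illustrative sketch only — the transforms and the `judge` callable are placeholder assumptions, not the study's actual protocol:

```python
# Sketch: measure how stylistic transforms shift an LLM judge's scores
# while answer content is held fixed. Assumption: `judge` is any callable
# returning a numeric quality score; the transforms are illustrative.
from statistics import mean
from typing import Callable, Dict, List

def hedge(answer: str) -> str:
    return "I might be wrong, but I believe " + answer

def pad(answer: str) -> str:
    return answer + " To elaborate further, there are several considerations worth noting."

def style_sensitivity(
    judge: Callable[[str], float],
    answers: List[str],
    transforms: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Mean score shift per transform. Content is held fixed, so any
    nonzero shift reflects style bias rather than answer quality."""
    base = mean(judge(a) for a in answers)
    return {
        name: mean(judge(t(a)) for a in answers) - base
        for name, t in transforms.items()
    }
```

A judge that were perfectly style-invariant would return shifts of zero for every transform; systematic nonzero shifts are the bias the project quantifies.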
Training

AI Safety programs

01
Participant · London Initiative for Safe AI (LISA)

Accepted to the April 2026 cohort of Iliad's month-long intensive on technical AI alignment. Curriculum covers RL, learning theory, mechanistic interpretability, agent foundations, and scalable oversight including Debate. Strong performance serves as a pathway into the Iliad Fellowship (June–August 2026).

Apr 2026
02
Participant · Harvard Square, Boston · 10% acceptance rate

1 of 20 participants admitted worldwide. Completed a 3-week technical curriculum covering CNNs, ResNets, and transformers built from scratch; RL (DQN, PPO); RLHF; and mechanistic interpretability — plus a capstone on automated capability elicitation with LLM-as-judge.

CAMBRIA 2026 cohort
Jan 2026
03
Vietnam Country Lead

International AI policy and advocacy organization mobilizing youth around AI governance and safety. Establishing funding pipelines with tech corporations and philanthropic funders; organizing panel discussions on AI risks and securing partnerships for technical curriculum delivery across Vietnam.

Nov 2025 – Present
04
Attendee · Harvard University, Cambridge, MA

HPAIR's flagship annual conference uniting global leaders, researchers, and students across policy, technology, and business.

Feb 2026
Experience

Professional background

Founding AI Engineer — Agents Full-time
DeepSurg · Remote, Turkey HQ
Aug 2025 – Present

Building Scala AI — an intelligent surgical tutor that performs real-time phase recognition, safety assessment, and structured performance feedback for laparoscopic procedures, improving surgical training through procedure-aware analysis.

  • Architected an agentic AI surgical tutor that understands procedural context, identifies the current surgical step, and evaluates safety, quality, efficiency, and bleeding — generating structured training reports with scores, feedback, key moments, and improvement suggestions
  • Fine-tuned a CNN-based image classifier to detect the start and end of the Calot Triangle Dissection phase within full laparoscopic cholecystectomy videos
  • Explored a VLM-only workflow as an alternative approach to building the annotation dataset; collaborated with medical practitioners to lead the surgical segmentation dataset generation process
  • Developed a CV-based, segmentation-driven surgical phase recognition pipeline, with active expansion to cover additional phases: Clipping & Cutting, Gallbladder Dissection, Gallbladder Packaging, Cleaning / Coagulation, Gallbladder Retraction
  • Built a systematic evaluation pipeline to benchmark model performance across phases and metrics
  • To our knowledge, first-to-market solution performing end-to-end automated surgical phase recognition and AI-driven trainee assessment at this level
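The phase-recognition step above — turning per-frame classifier outputs into a phase's start and end — can be sketched roughly as below. This is a simplified stand-in, assuming majority-vote smoothing over per-frame probabilities; the actual DeepSurg pipeline may differ:

```python
# Sketch: derive a surgical phase interval (start/end frame) from noisy
# per-frame in-phase probabilities. Assumption: majority-vote smoothing
# is an illustrative stand-in for the pipeline's real post-processing.
from typing import List, Optional, Tuple

def smooth(probs: List[float], window: int = 5) -> List[bool]:
    """Majority-vote smoothing of per-frame in-phase decisions."""
    flags = [p >= 0.5 for p in probs]
    out = []
    for i in range(len(flags)):
        lo, hi = max(0, i - window // 2), min(len(flags), i + window // 2 + 1)
        votes = flags[lo:hi]
        out.append(sum(votes) > len(votes) / 2)
    return out

def phase_interval(probs: List[float], window: int = 5) -> Optional[Tuple[int, int]]:
    """First and last frame classified in-phase after smoothing, or None."""
    flags = smooth(probs, window)
    idx = [i for i, f in enumerate(flags) if f]
    return (idx[0], idx[-1]) if idx else None
```

Smoothing matters here because a single misclassified frame would otherwise fragment one phase into several spurious intervals.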
Python PyTorch MONAI nnU-Net OpenCV CUDA
Product Engineer Full-time
Avery Dennison Corporation · Long Hau IP, Vietnam
2023 – 2025

Hired as 1 of 7 from 500+ candidates through a rigorous annual campaign for a fast-tracked management position. Embedded as the sole technical hire within a 500-person manufacturing operation — serving as the de facto in-house software engineer and automation consultant for the Finance department and the entire Vietnam site.

  • Designed and deployed Python + UiPath RPA solutions across 4 finance divisions, eliminating 120+ manual hours/month of repetitive workflow overhead
  • Architected a tax reconciliation system that cut month-end closing time by 25% — later standardised as the APAC regional template
  • Independently owned 8 department-wide automation projects from scoping to go-live — full-cycle project management, risk assessment, software engineering, and deployment; represented Finance in managing and liaising across international automation initiatives
Python Streamlit Docker Apache Arrow UiPath Excel VBA
Education

Academic foundations

RMIT University Vietnam — Bachelor of Science in Information Technology · GPA: 3.75/4.0
Vice-Chancellor Scholar — Full-Tuition Merit Scholarship, Top 7 incoming students
Jun 2021 – May 2025
  • Ranked Top 8 in the 2025 IT graduating cohort; Top 5% university-wide with 16 of 21 courses at High Distinction (85%+)
  • Top student in: Software Engineering Project Management, Artificial Intelligence & Machine Learning, UI/UX Product Design
Poznań University of Technology · Poznań, Poland
Apr 2025

Selected among 90 European peers for a 5-day intensive workshop on systems thinking, Balanced Scorecard, and AI-integrated strategy. Presented to an international faculty panel.

Competitions

Selected competitions
& leadership

Champion — Ranked 1st twice
McKinsey Young Leaders for Vietnam (YLV) Fellowship
May – Nov 2024 · Ho Chi Minh City

Selective fellowship pairing high-potential Vietnamese leaders with McKinsey consultants and NGO partners for real-world consulting casework under evaluation conditions.

  • Ranked 1/20 teams twice by McKinsey consultants across two independent fieldwork cases evaluated under real consulting conditions
  • Fieldwork 1 — Boatman Foundation: Designed a multi-component technology-integrated intervention for undereducated children in Vietnam Highlands — digital learning tools, scholarship allocation logic, and a teaching ambassador deployment model
  • Fieldwork 2 — Vun Art: Led end-to-end operational and financial consulting; delivered data-driven product diversification strategy and targeted materials sourcing campaign
  • Shortlisted for McKinsey Young Leaders for Inclusion (international program); invited as Case Coach for YLV 2025 cohort
Champion
MAERSK × RMIT Sustainability Impact Challenge
May – Nov 2024

Industry-partnered innovation challenge requiring technically grounded, financially validated decarbonization strategies for global logistics operations.

  • Placed 1st among 50+ teams with a 5-year roadmap deploying AI telematics for real-time fuel optimization — cutting Maersk's carbon footprint by 40% with 12% ROI
  • Built quantitative models demonstrating $2.8M long-term savings with full ESG compliance
National Champion · Best Pitch
RMIT FinTech Startup Competition × KardiaChain
Jun – Sep 2023

National competition requiring full-stack product development and business validation, judged by blockchain industry leaders.

  • Architected HemoChain — Vietnam's first distributed blood transfusion coordination network on Hyperledger Fabric with Google Maps API for real-time hospital routing
  • Led dual technical and business teams across smart contract logic, UI development, and go-to-market strategy
National 2nd Place
ASEAN-China-India Youth Leadership Summit (ACIYLS)
Mar – Oct 2024 · Singapore & Vietnam
  • Vietnam — Technical Research Lead: Developed OAKIA — textile-to-textile circular system using AI-driven NIR spectroscopy for automated fabric composition identification, addressing Vietnam's 15M kg annual textile waste
  • Singapore — Systems Lead: Led a randomly assigned team of 10 to design Be-Cool — an urban cooling retrofit company deploying IoT-integrated passive cooling systems

Let's work on
safer AI together

I'm open to research collaborations, fellowship opportunities, and conversations about AI safety. Reach out if you're working on evaluations, scalable oversight, or adversarial robustness.