About
From papers to production
Currently: M.S. AI at UT Austin · Research Engineering at Burns & McDonnell
I translate AI research into production systems at enterprise scale—bridging the gap between what papers discover and what products need.
From multi-agent orchestration platforms to NL-to-SQL agents serving 15,000 employees, I apply insights from interpretability, evaluation, and reinforcement learning to build systems that actually work outside a notebook.
Currently pursuing an M.S. in AI at UT Austin while leading research engineering at Burns & McDonnell.
3+
Publications
IEEE, NAACL, arXiv
15K+
Users Impacted
Enterprise Platforms
4
Platforms Built
Research → Production
Experience & Education
Where I've been
2025 – 2027 (expected)
M.S. Artificial Intelligence
University of Texas at Austin
Graduate studies in AI with focus on agent systems, interpretability, and reasoning.
May 2024 – Present
Research Engineer
Burns & McDonnell
Drove adoption of AI-native engineering practices across Technology Solutions Group. Architected three production platforms—Axiom, a document processing SDK, and Experience IQ—applying research from agent orchestration, RLVR, and LLM evaluation to serve 15,000+ employees. Presenting at Google Cloud Next 2026.
2026
Speaker — Google Cloud Next
Experience IQ: Enterprise NL-to-SQL Agent.
December 2025
Speaker — Google Dev Days
AI Applications.
November 2024
Guest Lecturer — CS473
Purdue University
Retrieval Augmented Generation.
May 2024
Speaker — BMcD Innovation Roundtable
Burns & McDonnell
Introduction to Agentic Systems.
March 2024
Speaker — BMcD Innovation Roundtable
Burns & McDonnell
Applications of Large Language Models.
January 2024 – Present
Independent Researcher
University of Virginia (Aidong Zhang Group)
Published at IEEE Big Data 2024. Fine-tuned language models with PEFT and prompt tuning to extract Social Determinants of Health from MIMIC-IV.
January 2023 – April 2024
Data Engineer
1898 & Co.
Reduced manual data integration time by ~70% for utility-sector clients. Built ETL pipelines, a Flask REST API automating data migration from SharePoint to Oracle APEX, and PowerBI dashboards visualizing operational KPIs for 10–15 project managers.
2018 – 2022
B.S. Computer Science · Minor: Psychology
Purdue University
3 publications during undergrad. TA for CS390 Deep Learning, CS252 Systems Programming, ECE264 Advanced C. Research across NLP, education technology, and formal language theory.
Selected Work
What I've built
Axiom
Multi-Agent Engineering Platform
Orchestration platform supporting any Vertex AI model and coding harness (OpenCode, Claude Code) with full lineage tracking. Features a context engineering layer with MCPs, Skills, and A2A protocols—admins define guidance structures while end users interact through a brief agent → planner agent → specialized agent teams pipeline. Generates documents, images, data files, and deployable web applications.
6+
Agent Teams
100+
Beta Users
10+
Models Supported
Experience IQ
Enterprise NL-to-SQL Agent
LLM agent translating natural language into SQL over a 110-table Spanner database via intent routing, dynamic schema retrieval, and an SME-curated few-shot curriculum inspired by the skill library in NVIDIA’s Voyager. Systematic evaluation cycles categorized failure modes by root cause, improving query quality from 72% to 80% and reliability from 75% to 94%. Includes an LLM-as-a-Judge evaluator and a Looker LLMOps dashboard for real-time drift monitoring.
15K
Users
94%
Reliability
−85%
Search Time
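To illustrate the dynamic schema retrieval and few-shot prompting described above, here is a minimal Python sketch. It assumes plain keyword overlap for ranking tables. The catalog, `retrieve_tables`, and `build_prompt` are hypothetical names for illustration and are not taken from Experience IQ, which may rank schemas quite differently (for example with embeddings).

```python
import re

# Hypothetical table catalog: name -> short description (illustrative only).
CATALOG = {
    "projects": "project name client start date budget",
    "employees": "employee name office title skills",
    "timesheets": "employee hours project week",
}

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_tables(question: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Rank tables by word overlap with the question and return the top k."""
    q = tokenize(question)
    scored = [
        (len(q & (tokenize(name) | tokenize(desc))), name)
        for name, desc in catalog.items()
    ]
    # Highest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop tables with no overlap so irrelevant schemas never reach the prompt.
    return [name for score, name in scored if score > 0][:k]

def build_prompt(question: str, tables: list[str], examples: list[tuple[str, str]]) -> str:
    """Assemble a prompt from retrieved tables and curated (question, SQL) pairs."""
    shots = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in examples)
    return f"Tables: {', '.join(tables)}\n{shots}\nQ: {question}\nSQL:"

question = "How many hours did each employee log per project?"
tables = retrieve_tables(question, CATALOG)
prompt = build_prompt(question, tables, [("How many projects exist?", "SELECT COUNT(*) FROM projects")])
```

Retrieving only the most relevant tables keeps the prompt small even when the full database has many more tables than fit comfortably in context.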
Document Processing SDK
Intelligent Extraction Platform
Microservice for certificate-of-insurance-to-subcontract comparisons with rule-specific preprocessing and structured prompts routed to Gemini. Applied RLVR’s strict verifier paradigm to design deterministic verification functions sourced from SMEs. Core logic extracted into a reusable SDK—users provide a prompt, JSON schema, and optional verification function; the SDK handles the rest.
85–90%
Accuracy
1.5–3K
Users
−90%
Review Time
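The SDK contract described above (a prompt, a JSON schema, and an optional verification function) could look roughly like the Python sketch below. This is an assumption-laden illustration, not the SDK's actual interface: `extract`, `SCHEMA`, `limit_meets_minimum`, and the `call_model` parameter are hypothetical, and the schema check only covers required keys rather than full JSON Schema validation.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical schema for a certificate of insurance (illustrative fields only).
SCHEMA = {
    "type": "object",
    "required": ["insurer", "general_liability_limit"],
}

def limit_meets_minimum(data: dict[str, Any]) -> bool:
    """Deterministic verifier: hypothetical rule that the limit is at least $1M."""
    return data["general_liability_limit"] >= 1_000_000

def extract(
    prompt: str,
    schema: dict[str, Any],
    call_model: Callable[[str], str],
    verify: Optional[Callable[[dict[str, Any]], bool]] = None,
) -> dict[str, Any]:
    """Call the model, parse its JSON output, and apply strict checks.

    `call_model` stands in for whatever sends the prompt to the LLM and
    returns its raw text response; it is an assumption of this sketch.
    """
    data = json.loads(call_model(prompt))
    # Minimal structural check: every required key must be present.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Strict verifier: the result either passes the deterministic rule or is rejected.
    if verify is not None and not verify(data):
        raise ValueError("verification failed")
    return data
```

A caller would pass its own model client as `call_model`; in tests, a stub that returns a fixed JSON string is enough to exercise the verifier without any network access.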
Research
Published work
Context-specific feature augmentation for improving social determinants of health extraction
L. Gong, A. Shor, A. Zhang, and K. Jha
Augments EHR discharge summaries with biomedical literature context to improve SDoH extraction. Introduces an adaptive feature infusion strategy combining information from different sources, significantly outperforming baselines on the MIMIC-SDoH dataset.
Clustering entity relationship diagrams: Enhancing feedback quality and grading consistency in large database courses
S. Thadani, A. Shor, S. Ahn, L. Gong, A. Alawini, and H. Benotman
A tool for clustering Entity Relationship Diagrams using object detection, OCR, and clustering to group similar student submissions. Identifies common approaches and mistakes, improving feedback and simplifying grading at scale.
A holistic framework for analyzing the COVID-19 vaccine debate
M. L. Pacheco, T. Islam, M. Mahajan, A. Shor, M. Yin, L. Ungar, and D. Goldwasser
Proposes a framework connecting stance analysis, reason analysis, and moral sentiment analysis to combat misinformation. Analyzed temporal trends in 2.7M COVID-19 tweets and validated BERT sentiment classifiers for a vaccine debate framework.
Neural operator: Is data all you need to model the world?
H. Viswanath, M. A. Rahman, A. Vyas, A. Shor, et al.
A comprehensive survey of neural operator architectures for learning mappings between function spaces, exploring whether data-driven approaches can replace traditional physics-based modeling.
Skills
What I work with
A diverse skillset spanning theoretical foundations and practical applications, with a focus on designing scalable AI solutions.
AI & NLP
Platforms & Infrastructure
Frameworks & Languages
Currently
What I'm thinking about
Problems and ideas I'm actively exploring outside of day-to-day work.
Context Engineering
How do we give agents the right information at the right time? Exploring MCP, A2A, and skills architectures for reliable multi-agent systems.
LLM Evaluation at Scale
Systematic approaches to catching agent drift and measuring reliability. From trace analysis and failure mode categorization to LLM-as-a-Judge evaluators.
Research-Informed Engineering
Bridging the gap between what papers discover and what products need. Applying frameworks like RLVR to build deterministic verification in production systems.
AI in High-Stakes Domains
Healthcare, engineering, education — domains where AI mistakes have real consequences. How do we build systems that earn trust?
Contact
Let's connect
Open to research collaborations, speaking opportunities, and conversations about turning research into production systems.
© 2026 Andrey Shor
