Why 80% Accuracy Fails in AI Agents: Inside Amazon AGI Labs
At NeurIPS 2025, we sat down with Deniz Birlikci, Member of Technical Staff at Amazon AGI Labs, to discuss the cutting edge of web agents. Deniz dives deep into the challenges of reinforcement learning (RL) in unreliable environments with sparse rewards, explaining why "mocking" a website to 80% fidelity isn't enough to train reliable models.
He also introduces Amazon’s newly released Nova Act model, explains why agents should be viewed as "stacks" rather than just models, and paints a picture of a future where agents act as the API for the entire web.
A huge thank you to Lambda for sponsoring the SAIL booth at NeurIPS 2025 and making these interviews possible.
In this video:
00:00 - Introduction & Deniz’s role at Amazon AGI Labs
00:32 - The state of RL research at NeurIPS 2025
01:20 - Unreliable environments & the challenge of sparse rewards
02:49 - Why 80% website fidelity creates "noise" and failure
04:51 - Launching Amazon Nova Act: Building for enterprise & developers
06:00 - The "Agent Stack": Why the model is only one piece of the puzzle
07:02 - The Future: Agents as Connective Tissue (APIs) vs. Assistants
08:52 - The startup culture inside Amazon AGI SF Labs
#NeurIPS2025 #ArtificialIntelligence #WebAgents #ReinforcementLearning #AmazonAGI #NovaAct #MachineLearning #Lambda #SAIL
Posted December 30, 2025