Here’s an analogy: Freeways didn’t exist in the U.S. until after 1956, when President Dwight D. Eisenhower’s administration envisioned them, yet fast, powerful cars from Porsche, BMW, Jaguar, Ferrari and others had been around for decades.
You could say AI is at that same pivot point: While models are becoming more capable, performant and sophisticated, the critical infrastructure they need to bring about true, real-world innovation has yet to be fully built out.
“All we have done is create some very good engines for a car, and we are getting super excited, as if we have this fully functional highway system in place,” Arun Chandrasekaran, Gartner distinguished VP analyst, told VentureBeat.
This is leading to a plateau, of sorts, in model capabilities, as seen in OpenAI’s GPT-5: While an important step forward, it features only faint glimmers of truly agentic AI.
“It is a very capable model, it is a very versatile model, it has made some very good progress in specific domains,” said Chandrasekaran. “But my view is it’s more of an incremental progress, rather than a radical progress or a radical improvement, given all of the high expectations OpenAI has set in the past.”
GPT-5 improves in three key areas
To be clear, OpenAI has made strides with GPT-5, according to Gartner, including in coding tasks and multi-modal capabilities. Chandrasekaran pointed out that OpenAI has pivoted to make GPT-5 “very good” at coding, clearly sensing gen AI’s enormous opportunity in enterprise software engineering and taking aim at competitor Anthropic’s leadership in that area.
Meanwhile, GPT-5’s progress in modalities beyond text, particularly in speech and images, provides new integration opportunities for enterprises, Chandrasekaran noted. GPT-5 also advances AI agent and orchestration design, if subtly, thanks to improved tool use; the model can call third-party APIs and tools and perform parallel tool calling (handle multiple tasks simultaneously). However, this means enterprise systems must have the capacity to handle concurrent API requests in a single session, Chandrasekaran pointed out.
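To make the concurrency point concrete, here is a minimal Python sketch of how an application might service several tool calls that a model emits in a single turn. Everything in it is an assumption for illustration: the tool names (`lookup_inventory`, `get_shipping_quote`), the shape of the `calls` list and the `max_concurrency` cap are hypothetical, and no real model API is invoked.

```python
import asyncio

# Hypothetical tools; real ones would call third-party APIs.
async def lookup_inventory(sku: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"sku": sku, "in_stock": True}

async def get_shipping_quote(zip_code: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"zip": zip_code, "usd": 7.5}

TOOLS = {
    "lookup_inventory": lookup_inventory,
    "get_shipping_quote": get_shipping_quote,
}

async def run_tool_calls(calls: list[dict], max_concurrency: int = 4) -> list[dict]:
    """Run all tool calls from one model turn concurrently, with a cap."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def run_one(call: dict) -> dict:
        # The semaphore keeps a burst of parallel calls from
        # overwhelming the backend systems behind the tools.
        async with limiter:
            result = await TOOLS[call["name"]](**call["arguments"])
        return {"id": call["id"], "result": result}

    # gather() returns results in the same order as the input calls.
    return await asyncio.gather(*(run_one(c) for c in calls))

# Assumed shape of two parallel tool calls from a single model response.
calls = [
    {"id": "call_1", "name": "lookup_inventory", "arguments": {"sku": "A-100"}},
    {"id": "call_2", "name": "get_shipping_quote", "arguments": {"zip_code": "94105"}},
]
results = asyncio.run(run_tool_calls(calls))
print(results)
```

The cap is the design choice that matters here: parallel tool calling shifts load onto the enterprise systems behind the tools, so the caller decides how much concurrency those systems can absorb.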
Multistep planning in GPT-5 allows more business logic to reside within the model itself, reducing the need for external workflow engines, and its larger context windows (8K for free users, 32K for Plus at $20 per month and 128K for Pro at $200 per month) can “reshape enterprise AI architecture patterns,” he said. This means that applications that previously relied on complex retrieval-augmented generation (RAG) pipelines to work around context limits can now pass much larger datasets directly to the models and simplify some workflows. But this doesn’t mean RAG is irrelevant; “retrieving only the most relevant data is still faster and more cost-effective than always sending massive inputs,” Chandrasekaran pointed out.
Gartner sees a shift to a hybrid approach with less stringent retrieval, with devs using GPT-5 to handle “larger, messier contexts” while improving efficiency.
On the cost front, GPT-5 “significantly” reduces API usage fees; top-level costs are $1.25 per 1 million input tokens and $10 per 1 million output tokens, making it comparable to models like Gemini 2.5, but seriously undercutting Claude Opus. However, GPT-5’s input/output price ratio is higher than earlier models, which AI leaders should take into account when considering GPT-5 for high-token-usage scenarios, Chandrasekaran advised.
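A quick back-of-the-envelope calculation shows how those list prices play out. The sketch below uses only the two prices quoted above; the token counts are made-up illustrations, not measurements, and the function name is hypothetical.

```python
# Top-level list prices quoted in the article (USD per 1 million tokens).
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted list prices."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000

# Illustrative token counts (assumptions): sending a large context
# directly versus sending only a few retrieved passages.
long_context = estimate_cost_usd(input_tokens=100_000, output_tokens=5_000)
retrieved_only = estimate_cost_usd(input_tokens=4_000, output_tokens=5_000)

print(f"long context:   ${long_context:.3f}")
print(f"retrieved only: ${retrieved_only:.3f}")
print(f"output/input price multiple: {OUTPUT_USD_PER_MILLION / INPUT_USD_PER_MILLION:.0f}x")
```

At these prices an output token costs eight times as much as an input token, so output-heavy workloads are where the bill grows fastest, while trimming inputs through retrieval still lowers per-request cost.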
Bye-bye previous GPT versions (sorta)
GPT-5 is designed to eventually replace GPT-4o and the o-series (these were initially sunset, then some were reintroduced by OpenAI after users objected). Three model sizes (pro, mini, nano) will allow architects to tier services based on cost and latency needs; simple queries can be handled by smaller models and complex tasks by the full model, Gartner notes. However, differences in output formats, memory and function-calling behaviors may require code review and adjustment, and because GPT-5 may render some previous workarounds obsolete, devs should audit their prompt templates and system instructions.
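One way to picture that kind of tiering is a small router that picks a model size per request. The sketch below is an assumption-laden illustration: the tier labels ("nano", "mini", "full") are placeholders rather than API model identifiers, and the word-count thresholds and request flags are hypothetical heuristics that a real system would tune against its own traffic.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    needs_tools: bool = False        # request requires tool or API calls
    latency_sensitive: bool = False  # caller wants the fastest response

def pick_tier(req: Request) -> str:
    """Choose a model tier from rough, hypothetical complexity signals."""
    words = len(req.prompt.split())
    # Complex work (tool use or long prompts) goes to the full model.
    if req.needs_tools or words > 200:
        return "full"
    # Mid-sized prompts go to the middle tier unless latency matters most.
    if words > 30 and not req.latency_sensitive:
        return "mini"
    # Short or latency-critical queries go to the smallest model.
    return "nano"

print(pick_tier(Request("What are your opening hours?")))
print(pick_tier(Request("Summarize this policy: " + "clause " * 60)))
print(pick_tier(Request("Book the cheapest flight", needs_tools=True)))
```

Keeping the routing rule in one function also gives teams a single place to revisit when they audit prompts and function-calling behavior during migration.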
By eventually sunsetting previous versions, “I think what OpenAI is trying to do is abstract that level of complexity away from the user,” said Chandrasekaran. “Often we’re not the best people to make those decisions, and sometimes we may even make erroneous decisions, I