Data doesn’t simply land where enterprise analytics or AI needs it; it has to be prepared and routed through data pipelines. That is the work of data engineering, which has long been a tedious and thankless task for enterprises.
Google Cloud is addressing the tedium of data preparation with new AI agents covering the entire data lifecycle. The Data Engineering Agent in BigQuery automates complex pipeline creation using natural language commands. A Data Science Agent turns notebooks into smart workspaces capable of autonomously conducting machine learning workflows. Additionally, the improved Conversational Analytics Agent now features a Code Interpreter that lets business users run advanced Python analytics.
“When I think about who is doing data engineering today, it’s not just engineers. Data analysts, data scientists, every data persona complains about how hard it is to find data, how hard it is to wrangle data, how hard it is to get access to high-quality data,” Yasmeen Ahmad, managing director, data cloud at Google Cloud, told VentureBeat. “Most of the workflows that we hear about from our users are 80% mired in those toilsome jobs around data wrangling, data engineering, and getting to good-quality data they can work with.”
Targeting the data preparation bottleneck
Google built the Data Engineering Agent in BigQuery to create complex data pipelines using natural language prompts. Users describe multi-step workflows, and the agent manages the technical implementation, including data ingestion, transformation, and quality checks.
The agent writes complex SQL and Python scripts, handles anomaly detection, schedules pipelines, and troubleshoots failures: tasks that traditionally required significant engineering expertise and ongoing maintenance.
The agent decomposes natural language requests into multiple steps: connecting to data sources, creating table structures, loading data, identifying primary keys for joins, reasoning over data quality issues, and applying cleaning functions.
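Google has not published the agent’s internals, but the kind of manual work it is meant to absorb is familiar. The sketch below is a minimal, hand-written version of such a pipeline using the BigQuery Python client; the bucket, dataset, table, and column names are invented for illustration.

```python
# A rough sketch of the hand-written pipeline work the agent is meant to absorb.
# All bucket, dataset, table, and column names here are invented for illustration.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Ingest: load raw CSV files from Cloud Storage into a staging table.
load_job = client.load_table_from_uri(
    "gs://example-bucket/orders/*.csv",        # hypothetical source files
    "example_project.staging.orders_raw",      # hypothetical staging table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,                        # infer the schema from the files
    ),
)
load_job.result()  # wait for ingestion to finish

# 2. Transform and clean: deduplicate on an assumed primary key and apply a
#    simple quality filter before publishing a curated table.
client.query("""
CREATE OR REPLACE TABLE example_project.curated.orders AS
SELECT * EXCEPT(row_num) FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS row_num
  FROM example_project.staging.orders_raw
  WHERE order_total >= 0            -- basic data-quality rule
)
WHERE row_num = 1
""").result()

# 3. Quality check: fail loudly if the curated table came out empty.
rows = list(client.query(
    "SELECT COUNT(*) AS n FROM example_project.curated.orders"
).result())
assert rows[0].n > 0, "Pipeline produced an empty table"
```

Each of these steps would also need scheduling, monitoring, and ongoing maintenance, which is the workload the natural-language agent is pitched at removing.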
“Ordinarily, that entire workflow would have been writing a lot of complex code for a data engineer and building this complex pipeline and then managing and iterating that code over time,” Ahmad explained. “Now, with the data engineering agent, it can create new pipelines from natural language. It can modify existing pipelines. It can troubleshoot issues.”
How enterprise data teams will work with the data agents
Data engineers tend to be hands-on, and the tools commonly used to build data pipelines (streaming, orchestration, data quality, and transformation) remain relevant alongside the new Data Engineering Agent.
“Engineers still are aware of those underlying tools because what we see from how data people operate is, yes, they love the agent, and they actually see this agent as an expert partner and collaborator,” Ahmad said. “But often our engineers actually want to see the code; they actually want to visually see the pipelines that have been created by these agents.”
While the agent can work autonomously, data engineers retain visibility into its actions. Data professionals will often review the code the agent writes and suggest further adjustments or customizations to the pipeline, as in the illustrative example below.
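Google has not shown what the agent’s generated artifacts look like, so the following is purely illustrative: a transform of the sort an agent might produce, alongside the kind of tightening a reviewing engineer could request. Every table, column, and rule is invented.

```python
# Illustrative only: a transform an agent might generate, plus a reviewer's tweak.
# Table names, column names, and rules are invented, not actual agent output.
GENERATED_SQL = """
CREATE OR REPLACE TABLE example_project.curated.customers AS
SELECT
  customer_id,
  LOWER(TRIM(email)) AS email,                  -- agent-applied cleaning function
  SAFE_CAST(signup_date AS DATE) AS signup_date
FROM example_project.staging.customers_raw
WHERE email IS NOT NULL
"""

# Reviewer's adjustment: the NULL filter alone is too loose for this team,
# so the engineer asks the agent to also drop malformed addresses.
REVISED_FILTER = (
    "WHERE email IS NOT NULL "
    "AND REGEXP_CONTAINS(email, r'^[^@]+@[^@]+$')"
)
```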
Building a data agent ecosystem with an API foundation
Several vendors are building agentic AI workflows in the data space, including startups like Altimate AI and large vendors like Databricks, Snowflake, and Microsoft.
Google’s approach differs by developing its agentic AI services for data with the Gemini Data Agents API, allowing developers to embed Google’s natural language processing and code interpretation capabilities into their applications. This marks a shift from closed, first-party tools to an extensible platform approach.
“Behind the scenes for all of these agents, they’re actually being built as a set of APIs,” Ahmad said. “With those API services, we increasingly intend to make those APIs available to our partners.”
The umbrella service will expose both foundational APIs and higher-level agent APIs. Google is running lighthouse preview programs in which partners, including notebook providers and ISVs building data pipeline tools, embed these APIs into their own interfaces.
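The article does not describe the API surface itself, so the endpoint, payload shape, and field names below are placeholders. The sketch only illustrates the integration pattern Ahmad describes, in which a partner’s application forwards a user’s natural-language request to an agent service and renders whatever the agent returns.

```python
# Hypothetical integration sketch: the endpoint and every field name below are
# placeholders, not a documented Gemini Data Agents API surface.
import requests
import google.auth
import google.auth.transport.requests

# Authenticate with whatever Google Cloud credentials the host app already uses.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Placeholder endpoint standing in for a data-agent API the partner embeds.
AGENT_ENDPOINT = "https://example.googleapis.com/v1/dataAgents:ask"  # hypothetical

response = requests.post(
    AGENT_ENDPOINT,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={
        "project": project_id,
        # Natural-language request collected by the partner's own UI.
        "prompt": "Build a daily pipeline that loads orders from GCS and "
                  "flags rows with negative totals",
    },
    timeout=60,
)
response.raise_for_status()

# The partner app would render the returned plan, SQL, or pipeline definition
# in its own interface; the response schema here is assumed, not documented.
print(response.json())
```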
What it means for enterprise data teams
For enterprises aiming to lead in AI-driven data operations, this announcement accelerates the move toward autonomous data workflows, potentially offering significant competitive advantages in time-to-insight and resource efficiency. Organizations should evaluate current data team capacity and consider pilot programs for pipeline automation.
For enterprises planning later AI adoption, the integration of these capabilities into existing Google Cloud services changes the landscape, making advanced data agents a standard feature rather than a premium add-on. This shift could raise baseline expectations for data platform capabilities industry-wide.
Organizations must balance efficiency gains against the need for oversight and control. Google’s transparency approach may offer a middle ground, but data leaders should develop governance frameworks for autonomous agent operations before widespread deployment.
The emphasis on API availability suggests custom agent development will become a competitive differentiator. Enterprises should explore leveraging these foundational services to build domain-specific agents that address their unique business processes and data challenges.