☁️Google Cloud AI
June 29, 2026
Product Updates

Synthesize the big picture and analyze trends with BigQuery's AI.AGG function

Overview

Google has announced a preview of BigQuery's AI.AGG() function, which allows users to analyze millions of rows of unstructured and multimodal data using natural-language instructions in a single SQL query. This new capability enables teams to summarize, categorize, and extract insights from logs, documents, reviews, and images without manual review, significantly reducing time and labor costs.

Key Takeaways

  • AI.AGG() lets you write one line of SQL to summarize or synthesize information across millions of rows of unstructured or multimodal data using natural-language prompts
  • The function can analyze system logs to identify hidden inefficiencies, latency spikes, and unusual patterns that don't trigger fatal errors
  • AI.AGG() can extract categories and insights from unstructured text and image data, such as product descriptions and customer reviews
  • Careful prompt engineering, such as explicitly permitting the model to say 'everything is fine', helps prevent hallucinated errors and improves accuracy
  • The function integrates with BigQuery's other managed AI functions for complex, intelligent data analysis at scale

Understanding AI.AGG() and its core capabilities

AI.AGG() is a new BigQuery function designed to handle analysis at scale in a fundamentally different way than row-level AI operations.

  • Works with unstructured data such as logs, documents, product descriptions, and images
  • Aggregates information across millions of rows in a single SQL statement
  • Accepts natural-language instructions to guide summarization and synthesis tasks
  • Enables questions like 'What are the top three feature requests among negative product reviews?' or 'What errors are users seeing most frequently?'

While BigQuery already offers AI functions for analyzing individual rows of data, scaling unstructured data analysis requires a different architectural approach. AI.AGG() fills this gap by allowing analysts and engineers to ask complex questions about large datasets without manual review. This is particularly valuable for organizations operating at scale, where reviewing thousands or millions of logs, reviews, or documents by hand is impractical.

Analyzing system logs to uncover hidden inefficiencies

One of the most practical use cases for AI.AGG() is analyzing system logs at scale to identify problems that don't trigger fatal errors.

  • Log messages, warnings, and stack traces contain valuable information but are time-consuming to review manually
  • AI.AGG() can group and prioritize logs, helping teams decide which issues to investigate first
  • The function surfaces hidden inefficiencies, latency spikes, repeated retries, and unusual patterns in seemingly normal logs
  • BigQuery's own engineering team used AI.AGG() during development to identify edge cases related to input handling

Distributed systems like Apache Spark clusters often encounter issues such as memory thrashing, clock drift, or broadcast bottlenecks without throwing a FATAL error. These problems can degrade performance silently, making them difficult to diagnose through traditional log analysis. By using AI.AGG() on a public dataset of Apache Spark INFO logs, analysts can quickly summarize normal operations while identifying anomalies.

The function achieves this through careful prompt construction. For example, explicitly allowing the model to report 'normal operation' discourages it from hallucinating errors, while naming the specific performance issues to hunt for keeps it focused. A sample query groups logs by component, counts entries, and generates a two-sentence analysis: the first sentence describes normal operation, and the second identifies hidden inefficiencies. The results surface diagnostic insights that would otherwise require hours of manual review.
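The query shape described above (group by component, count entries, aggregate with an instruction) can be sketched as a small Python helper that assembles the SQL text. This is a minimal sketch under stated assumptions: the table name and the `component` and `content` column names are hypothetical, and the `AI.AGG(column, instruction)` argument order is an assumption about the preview function, so check the BigQuery reference before running the generated SQL.

```python
def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, escaping backslashes and quotes for BigQuery."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_log_summary_query(table: str, instruction: str) -> str:
    """Return SQL text that counts and summarizes log lines per component."""
    return (
        "SELECT\n"
        "  component,\n"  # hypothetical column: which subsystem emitted the log
        "  COUNT(*) AS log_count,\n"
        # ASSUMPTION: AI.AGG takes the text column first, then the instruction.
        f"  AI.AGG(content, {sql_string_literal(instruction)}) AS analysis\n"
        f"FROM `{table}`\n"
        "GROUP BY component\n"
        "ORDER BY log_count DESC"
    )


query = build_log_summary_query(
    "my_project.my_dataset.spark_logs",  # hypothetical table name
    "Provide a 2-sentence summary. First, describe normal operation. "
    "Second, identify hidden inefficiencies.",
)
print(query)
```

Building the query as a string keeps the instruction in one place, which makes it easier to iterate on the prompt without touching the rest of the SQL.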

Extracting categories from unstructured text and image data

Beyond logs, AI.AGG() enables powerful categorization and extraction tasks on mixed data types including text and images.

  • Works with product catalogs, reviews, and multimodal datasets containing descriptions and images
  • Can automatically categorize products, extract features, and identify patterns across unstructured content
  • Reduces the initial hurdle of manual labeling and classification tasks
  • Scales to handle diverse data types in a single analysis

The fictional cymbal_pets dataset demonstrates this flexibility. This pet supply shop catalog includes product names, descriptions, and images, a realistic mix of structured metadata and unstructured content. Rather than manually reviewing hundreds or thousands of products to categorize them, analysts can use AI.AGG() to automatically extract categories, identify common features, and organize products based on natural-language analysis. This approach can substantially reduce the time required for data preparation and quality assurance tasks.

Prompt engineering best practices for AI.AGG()

Effective use of AI.AGG() requires thoughtful prompt design to guide the model and prevent common pitfalls.

  • Explicitly permit the model to acknowledge when everything is operating normally, preventing false-positive error detection
  • Give specific instructions for the types of inefficiencies or patterns to hunt for
  • Request structured output formats, such as two-sentence summaries with distinct sections
  • Use domain-specific language and context to improve accuracy and relevance

The difference between a well-crafted and a poorly crafted prompt in AI.AGG() can significantly affect result quality. When analyzing logs, for example, a prompt that doesn't explicitly allow the model to confirm normal operation may lead to hallucinated errors and false alarms. By contrast, a prompt that says 'provide a 2-sentence summary: first, describe normal operation; second, identify hidden inefficiencies' creates clear boundaries and expectations. This structured approach helps the model focus on legitimate anomalies while acknowledging baseline expected behavior.
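The practices above can be captured in a small prompt builder. The sketch below is illustrative only: the function name and the exact wording are assumptions, not text from the original announcement. It fixes the two-sentence structure, explicitly permits an 'everything is fine' answer, and names the issues to hunt for.

```python
from __future__ import annotations


def build_log_prompt(issues: list[str]) -> str:
    """Build a structured instruction for summarizing a group of log lines."""
    if not issues:
        raise ValueError("name at least one issue to look for")
    hunted = ", ".join(issues)
    return (
        "Provide a 2-sentence summary. "
        # Explicitly allow a 'nothing wrong' answer to discourage invented errors.
        "First, describe normal operation; if everything is fine, say so. "
        # Name concrete issues so the model knows what counts as an anomaly.
        f"Second, identify hidden inefficiencies such as {hunted}."
    )


prompt = build_log_prompt(
    ["memory thrashing", "clock drift", "broadcast bottlenecks"]
)
print(prompt)
```

Keeping the issue list as a parameter lets the same template be reused for different components by swapping in domain-specific terms.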

Integration with BigQuery's broader AI ecosystem

AI.AGG() does not operate in isolation but complements BigQuery's existing managed AI functions.

  • Can be combined with other BigQuery AI functions for multi-stage, intelligent data pipelines
  • Enables complex workflows that mix row-level analysis with aggregate-level synthesis
  • Supports common loading methods including the UI, CLI, and client libraries
  • Maintains BigQuery's scalability and performance characteristics for enterprise-scale analysis

AI.AGG() represents an important addition to BigQuery's managed AI toolkit. While existing functions excel at analyzing individual rows, AI.AGG() handles the aggregation and synthesis layer, enabling workflows where raw data is first processed at the row level, then synthesized and analyzed across millions of records. This layered approach allows organizations to build sophisticated data pipelines that combine the strengths of different AI techniques without managing separate ML infrastructure or external services.
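One way to picture the layered approach is a query with a row-level stage in a common table expression, followed by an aggregate stage. The Python sketch below only assembles SQL text. The table and column names are hypothetical, the plain `star_rating <= 2` filter stands in for whatever row-level step (possibly a row-level AI function) a real pipeline would use, and the `AI.AGG(column, instruction)` argument order is an assumption.

```python
def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, escaping backslashes and quotes for BigQuery."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_two_stage_query(
    table: str, text_column: str, row_filter_sql: str, instruction: str
) -> str:
    """Return SQL text: stage 1 filters rows, stage 2 synthesizes across them."""
    return (
        # Stage 1: row-level processing, here a simple caller-supplied filter.
        "WITH filtered AS (\n"
        f"  SELECT {text_column}\n"
        f"  FROM `{table}`\n"
        f"  WHERE {row_filter_sql}\n"
        ")\n"
        # Stage 2: aggregate-level synthesis (ASSUMED AI.AGG argument order).
        f"SELECT AI.AGG({text_column}, {sql_string_literal(instruction)}) AS synthesis\n"
        "FROM filtered"
    )


query = build_two_stage_query(
    table="my_project.my_dataset.product_reviews",  # hypothetical table
    text_column="review_text",                      # hypothetical column
    row_filter_sql="star_rating <= 2",              # stand-in row-level step
    instruction="What are the top three feature requests?",
)
print(query)
```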

Practical use cases and business impact

AI.AGG() unlocks several high-impact use cases across customer support, product development, and operations.

  • Identifying top feature requests from negative product reviews to inform product roadmaps
  • Diagnosing root causes of the most frequent errors without manually reviewing thousands of logs
  • Detecting specific failure scenarios in automated systems, such as chatbots or customer service agents
  • Reducing investigation time from hours or days to minutes

Organizations can immediately apply AI.AGG() to pressing business problems. Product teams can mine customer review data to extract actionable feature requests, prioritizing improvements based on real user feedback. Operations teams can shift from reactive firefighting to proactive problem identification by continuously analyzing logs for emerging patterns. Support teams can identify gaps in automated agents and prioritize improvements where they have the greatest impact on customer satisfaction. The common thread is that AI.AGG() transforms labor-intensive manual analysis into fast, scalable, data-driven decision-making.

Getting started with AI.AGG()

The preview availability of AI.AGG() means teams can begin experimenting with this capability now.

  • Load sample data into BigQuery using UI, CLI, or client libraries
  • Write SQL queries that use AI.AGG() with natural-language instructions
  • Test on real datasets like system logs, product catalogs, or review data
  • Iterate on prompts to refine accuracy and relevance of results

Teams interested in AI.AGG() can start with public datasets or their own data. Because the interface is SQL-native, no special ML expertise is required: analysts familiar with standard BigQuery queries can begin working with it right away. Starting with a clear use case, such as log analysis or review categorization, helps teams understand the function's strengths and limitations while building confidence in the results.

Frequently Asked Questions

How does AI.AGG() differ from existing BigQuery AI functions?

Existing BigQuery AI functions analyze individual rows of data, while AI.AGG() synthesizes and summarizes information across millions of rows in a single query using natural-language instructions. This makes it particularly suited for unstructured data like logs, documents, and images where aggregate-level insights are needed.

What types of data can AI.AGG() analyze?

AI.AGG() can analyze unstructured and multimodal data including system logs, product descriptions, customer reviews, images, and documents. It works by aggregating this data according to a natural-language instruction, making it flexible for a wide range of data types.

Why is prompt engineering important for AI.AGG()?

Careful prompt design guides the model to provide accurate, relevant results and prevents hallucinations. For example, explicitly telling the model it is acceptable to report 'everything is fine' prevents false-positive error detection, while specific instructions help the model focus on the insights you actually need.

Can AI.AGG() be combined with other BigQuery functions?

Yes, AI.AGG() integrates with BigQuery's broader AI ecosystem, allowing organizations to build multi-stage data pipelines that combine row-level analysis with aggregate-level synthesis for complex, intelligent analytics.

What are some immediate use cases for AI.AGG()?

Practical applications include extracting feature requests from product reviews, identifying the most common errors in system logs, detecting failure patterns in automated agents, and categorizing unstructured product or customer data at scale.

AI.AGG() brings the power of large language models to BigQuery's SQL environment, enabling organizations to unlock insights from unstructured data at scale without manual review or external ML infrastructure.

Originally published by Google Cloud AI