1. Introduction & High-Level Overview

1.1. Document Purpose

This document provides a detailed technical explanation of the end-to-end application workflow for the Maven AI agent. It outlines the sequence of operations, from receiving a user's initial prompt to delivering a final, contextualized response. The focus is on the agent's internal decision-making processes, its interaction with various tools, and its state management strategy.

1.2. Core Architecture

Maven AI is built upon a sophisticated, modern technology stack designed for creating responsive, stateful, and intelligent conversational applications. The key components of its architecture are:

  • Vercel AI SDK (RSC): The foundation of the application is Vercel's AI SDK, specifically leveraging its support for React Server Components (RSC). This allows for a powerful architecture where UI components can be streamed from the server, enabling dynamic, real-time updates as the AI agent processes information and generates responses.

  • Google Gemini Models: The agent's intelligence is powered by Google's family of Gemini models (e.g., gemini-2.0-flash, gemini-2.0-flash-lite). These models are used for natural language understanding, function calling (tool use), and generating human-like text responses.

  • Tool-Based Agent Architecture: The agent follows a modular, tool-based design. Instead of being a monolithic system, its capabilities are divided into distinct, special-purpose "tools" (e.g., searchProduct, getProductDetails). This makes the system extensible, maintainable, and allows the primary agent—the Orchestrator—to delegate complex tasks to the appropriate specialist function.

1.3. High-Level Workflow

The overall flow begins when a user submits a request. This request is first processed by the central Orchestrator, which analyzes the user's intent. Based on this analysis, it either generates a direct textual response or delegates the task to a specialized tool. The selected tool then executes its own sub-workflow, potentially calling other sub-tools or external APIs, before returning its result. Finally, the AI generates a concluding insight based on the tool's output, and the entire interaction is saved and rendered in the UI.

[Top-Level-Agent-Workflow-Diagram]

Workflow Diagram

2. The Core Orchestrator (action.tsx)

2.1. Overview

The orchestrator function, located in action.tsx, serves as the central nervous system and primary entry point for the Maven AI agent. It is responsible for receiving all user inputs, interpreting their intent, managing the conversational state, and deciding the appropriate next step. It acts as a router, determining whether to handle a request with a direct language-based response or to invoke a specialized tool for more complex tasks.

2.2. Request Lifecycle

The lifecycle of a request through the orchestrator follows a precise, state-driven sequence:

  1. Initiation & Payload Processing: The workflow begins when a user action triggers the orchestrator. The input is received as a PayloadData object, which can contain various forms of user input (textInput, attachProduct, etc.). This payload is immediately standardized into a unified user message format using the toUnifiedUserMessage mutator.

  2. State Hydration: The orchestrator retrieves the current conversation's state using getMutableAIState<typeof AI>(). This provides access to the existing chat history (messages) and other metadata stored in the AIState object.

  3. State Update: The new, unified user message is appended to the message history within the mutable AI state. This ensures that the agent has the full context of the conversation for its next decision.

  4. The Decision Engine (streamUI): This is the core of the orchestrator's logic. The streamUI function from the Vercel AI SDK is invoked with the following configuration:

    • Model: google("gemini-2.0-flash-exp").
    • Messages: The complete, updated conversation history.
    • System Prompt: A specialized prompt, ORCHESTRATOR_SYSTEM_INSTRUCTION, which instructs the model on how to behave as an orchestrator and how to use the available tools.
    • Tools: The set of available functions that the model can choose to call.

    Based on the system prompt and the conversation history, the Gemini model decides between two primary paths:

    • Text Generation: If the user's query is conversational or does not require a specific tool, the model generates a text response directly.
    • Tool Call: If the query maps to a defined capability (e.g., "search for laptops"), the model outputs a structured tool call request with the appropriate parameters.
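The lifecycle above can be sketched in plain TypeScript. The PayloadData fields and the toUnifiedUserMessage behavior are assumptions based only on the names mentioned in this document, and the model's choice (made by Gemini inside streamUI in the real application) is modeled here as an injected decide function:

```typescript
// Hypothetical input payload, based on the field names mentioned above.
interface PayloadData {
  textInput?: string;
  attachProduct?: { title: string; link: string };
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Step 1: standardize the payload into a unified user message (a sketch of
// what a toUnifiedUserMessage mutator might do).
function toUnifiedUserMessage(payload: PayloadData): Message {
  const parts: string[] = [];
  if (payload.textInput) parts.push(payload.textInput);
  if (payload.attachProduct) {
    parts.push(`[attached product: ${payload.attachProduct.title}]`);
  }
  return { role: "user", content: parts.join("\n") };
}

// Step 4: the decision engine's two paths, modeled as a discriminated union.
type ModelDecision =
  | { kind: "text"; content: string }
  | { kind: "tool-call"; toolName: string; args: Record<string, unknown> };

// Steps 2-3: append the unified message to the mutable history, then decide.
function runTurn(
  history: Message[],
  payload: PayloadData,
  decide: (history: Message[]) => ModelDecision,
): ModelDecision {
  history.push(toUnifiedUserMessage(payload)); // state update
  return decide(history); // decision engine (streamUI + Gemini in reality)
}
```

With a stub decide function, a payload such as { textInput: "search for laptops" } appends one user message to the history and yields a tool-call decision for searchProduct.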

2.3. Tool Integration

The orchestrator is explicitly provided with a suite of tools it can delegate tasks to. These tools are defined in the tools property of the streamUI call and represent the agent's primary capabilities:

  • recommendator: For providing product recommendations.
  • searchProduct: For searching for products from various sources.
  • getProductDetails: For retrieving detailed information about a specific product.
  • productsComparison: For comparing two or more products.
  • inquireUser: For asking the user clarifying questions when the input is ambiguous.

When the LLM decides to use a tool, streamUI automatically invokes the corresponding function with the arguments provided by the model.
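The dispatch that streamUI performs automatically can be sketched as a plain registry. The tool names below come from this document; the handler bodies are simplified stand-ins (the real tools are streamUI tool definitions with zod parameter schemas and generator functions):

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// Simplified registry mirroring the five tools listed above.
const tools: Record<string, ToolHandler> = {
  recommendator: async (a) => `recommendations for ${String(a.intent)}`,
  searchProduct: async (a) => `search results for ${String(a.query)}`,
  getProductDetails: async (a) => `details for ${String(a.link ?? a.callId)}`,
  productsComparison: async (a) =>
    `comparison of ${(a.callIds as string[]).length} products`,
  inquireUser: async (a) => `inquiry: ${String(a.reason)}`,
};

// streamUI performs this lookup-and-invoke step itself; it is shown
// explicitly here to make the delegation concrete.
async function dispatchToolCall(
  toolName: string,
  args: Record<string, unknown>,
): Promise<string> {
  const handler = tools[toolName];
  if (!handler) throw new Error(`Unknown tool: ${toolName}`);
  return handler(args);
}
```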

2.4. Finalization & State Persistence

The agent's lifecycle is managed by a set of event handlers within the createAI configuration:

  • Response Rendering: The output of streamUI is a React Node (display) that is streamed to the client, allowing the user to see UI elements like loading indicators and tool outputs in real-time.
  • Related Queries: After a primary response is generated (either text or from a tool), a subsequent LLM call is made using streamObject to generate a set of contextually relevant follow-up questions (RelatedQuery). This helps guide the user and enhance the conversational flow.
  • State Persistence (onSetAIState): Once the entire user-assistant turn is complete (done is true), the final, updated AIState is persisted to the database via the saveAIState function.
  • UI State Hydration (onGetUIState): When a user revisits a conversation, this function is called to retrieve the saved AIState from the database and reconstruct the corresponding UI using the mapUIState mapper.

3. Tool: Search Product

3.1. Purpose

The toolSearchProduct is a multi-stage, adaptive tool designed to find products based on a user's query. Its primary strength lies in its ability to follow different execution paths based on the specified data source (reffSource). It incorporates a "sub-tool chain" for query pre-processing to ensure high-quality search results and employs a combination of external APIs and web scraping for data retrieval.

3.2. Sub-Tool Chain: Query Pre-processing

Before any searching occurs, the user's query is passed through a two-step pre-processing chain to enhance its quality and validate its legitimacy. This ensures that subsequent API calls and data extraction tasks start from a validated, well-formed query.
Before any searching occurs, the user's query is passed through a two-step pre-processing chain to enhance its quality and validate its legitimacy. This ensures that subsequent API calls and data extraction tasks start from a validated, well-formed query.

  1. Query Validation (queryValidator):

    • Objective: To determine if the user's query refers to a real, verifiable electronic product.
    • Mechanism: This sub-tool uses generateObject with a Gemini model that has useSearchGrounding: true. It leverages the model's ability to perform real-time web searches to check the query against authoritative sources like manufacturer websites and major tech publications.
    • Output: It returns a structured object (QueryValidationData) containing a validation score, a boolean is_valid flag, and detailed reasoning. If the query is deemed invalid, the entire tool workflow can be gracefully halted, preventing wasted resources.
  2. Query Enhancement (queryEnhancer):

    • Objective: To transform vague or incomplete queries into specific, highly searchable terms.
    • Mechanism: Following successful validation, this sub-tool also uses generateObject with a search-grounded model. It analyzes the query and adds critical details. For example, "MacBook" might be enhanced to "MacBook Air M3 (15-inch, 2024)".
    • Output: It produces a QueryEnhancedData object containing the new, enhanced query string, which is then used for all subsequent search operations.
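The chain can be sketched with both generateObject calls injected as stand-ins; the field names below are modeled on the QueryValidationData and QueryEnhancedData shapes described above and may differ from the real schemas:

```typescript
interface QueryValidationData {
  is_valid: boolean;
  score: number;
  reasoning: string;
}

interface QueryEnhancedData {
  enhancedQuery: string;
}

// The two sub-tools are injected so the chain itself stays testable; in the
// real tool each one is a search-grounded generateObject call.
async function preprocessQuery(
  query: string,
  queryValidator: (q: string) => Promise<QueryValidationData>,
  queryEnhancer: (q: string) => Promise<QueryEnhancedData>,
): Promise<{ ok: true; query: string } | { ok: false; reason: string }> {
  const validation = await queryValidator(query);
  if (!validation.is_valid) {
    // Halt gracefully: no search APIs are called for invalid queries.
    return { ok: false, reason: validation.reasoning };
  }
  const enhanced = await queryEnhancer(query);
  return { ok: true, query: enhanced.enhancedQuery };
}
```

Per the example above, a valid "MacBook" query passes validation and exits the chain as something like "MacBook Air M3 (15-inch, 2024)".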

3.3. Data Sourcing Logic: The Core Bifurcation

After pre-processing, the tool's execution path diverges based on the reffSource provided in the requestOption.

Path A: Insight & Shopee Sources (reffSource !== 'tokopedia')

This path is executed for global insight searches or searches targeted at specific marketplaces like Shopee. It relies on aggregating data from multiple search APIs.

  1. Initiation: The workflow invokes the searchProductInsight sub-tool.
  2. Parallel Search: searchProductInsight makes concurrent calls to the Serper API for four different types of search results: web, shopping, images, and videos. This provides a comprehensive, multi-faceted view of the product online.
  3. Data Aggregation & Structuring: The results from all Serper searches are aggregated into a single payload. This payload is then passed to a Gemini model via generateObject with a strict schema (dataSourceInsightSchema). The model's task is to analyze this rich dataset and extract a clean, structured DataSourceInsight object, which contains key product attributes like title, estimated price, and image/video URLs.
  4. UI Streaming & Finalization: The resulting structured data is immediately rendered in the UI via the InsightProductCard component. A final call to streamText generates a human-readable summary of the findings, which is displayed as the assistant's message.
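The concurrent fan-out in step 2 can be sketched as follows; runSearch is an injected stand-in for the actual Serper API calls:

```typescript
type SearchKind = "web" | "shopping" | "images" | "videos";

// Fire all four searches concurrently and key the results by kind, giving
// the downstream extraction model one aggregated payload to analyze.
async function searchProductInsight(
  query: string,
  runSearch: (kind: SearchKind, q: string) => Promise<unknown>,
): Promise<Record<SearchKind, unknown>> {
  const kinds: SearchKind[] = ["web", "shopping", "images", "videos"];
  const results = await Promise.all(kinds.map((k) => runSearch(k, query)));
  return Object.fromEntries(
    kinds.map((k, i) => [k, results[i]]),
  ) as Record<SearchKind, unknown>;
}
```

Running the four searches with Promise.all means the total latency is that of the slowest search, not the sum of all four.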

Path B: Tokopedia Source (Scraping Workflow)

This path is executed when the data source is specified as 'tokopedia'. It relies on direct web scraping and subsequent data extraction.

  1. Scraping: The tool calls the handleScrapingWithCache utility, which wraps the Firecrawl API. Firecrawl is used to scrape the full page content (in Markdown format) and take a screenshot of the target Tokopedia URL. The caching layer prevents redundant scraping of the same URL.
  2. Data Extraction: The raw Markdown content from the scrape is passed to a Gemini model via streamObject with the productsSchema. The model's prompt instructs it to act as an expert data extractor, parsing the messy Markdown to identify and list out all products found on the page in a structured format.
  3. Real-time UI Updates: As the model streams the extracted product objects, the UI is updated in real-time using the StreamProductSearch component. This provides immediate feedback to the user as products are found.
  4. Insight Generation: Once all products are extracted, a final call to streamText generates a narrative summary of the search results.

3.4. Finalization and State Management

Regardless of the path taken, the tool concludes with a standardized finalization sequence:

  1. Tool Result Formatting: The complete output, including arguments and the final structured data, is packaged into an ExtendedToolResult object.
  2. State Mutation (mutateTool): This critical utility function adds the tool's execution result and the final assistant summary to the conversation's message history in the mutable AI state.
  3. Database Persistence: The final structured data object is saved to the database using createObjectEntry, and the tool result message is saved via createToolDataEntry. This ensures that the data can be referenced in subsequent turns (e.g., by toolGetProductDetails).
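A minimal sketch of steps 1 and 2, assuming simplified shapes for ExtendedToolResult and the message history (the real types carry more fields):

```typescript
interface ExtendedToolResult<A, D> {
  success: boolean;
  name: string;
  args: A;
  data: D;
}

interface Message {
  role: "tool" | "assistant";
  content: string;
}

// Sketch of a mutateTool-style helper: append the serialized tool result
// and the assistant's closing summary to the mutable message history, so
// later turns can reference what the tool produced.
function mutateTool<A, D>(
  messages: Message[],
  result: ExtendedToolResult<A, D>,
  assistantSummary: string,
): void {
  messages.push({ role: "tool", content: JSON.stringify(result) });
  messages.push({ role: "assistant", content: assistantSummary });
}
```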

3.5. Diagram

[Search-Product-Workflow-Diagram]

Workflow Diagram

4. Tool: Get Product Details

4.1. Purpose

The toolGetProductDetails is responsible for gathering, processing, and presenting in-depth information about a single, specific product. It acts as a data aggregator and enrichment engine, building upon the initial discovery work done by toolSearchProduct or by starting fresh from a user-provided URL. Its workflow is highly adaptive, leveraging different data retrieval and processing strategies based on the source of the product information.

4.2. Invocation Paths & Data Sourcing

The tool's behavior is primarily determined by its input arguments, leading to two distinct operational paths:

  • Path A: Invocation by callId (Insight/Global Source):

    • Trigger: This path is used when the user clicks a product card that was generated by toolSearchProduct using an "insight" source. The key parameter is the callId, which is a unique identifier from the previous search result.
    • Workflow:
      1. Data Retrieval: The tool first calls getObjectEntry<SearchProductResult>(prevCallId) to fetch the cached, structured product data from the database. This avoids redundant searching and provides immediate access to the product's title, images, videos, etc.
      2. Data Enrichment (2-Step LLM Chain): The tool then initiates a two-step enrichment process to build a comprehensive profile:
        • Step 1 (Researcher): It calls streamText with a powerful model (gemini-2.0-flash) and the PRODUCT_RESEARCHER_INSIGHT_SYSTEM_PROMPT. This model, with search grounding enabled, performs a broad web search to generate a detailed, narrative-style markdown document about the product, covering specifications, features, and reviews.
        • Step 2 (Extractor): The generated markdown from Step 1 is then passed to a second, more economical model (gemini-2.0-flash-lite) via streamObject. Guided by the PRODUCT_EXTRACTOR_INSIGHT_SYSTEM_PROMPT, this model's sole job is to parse the markdown and extract a structured JSON object containing the detailed product specifications. This two-step process ensures high-quality, structured data extraction from unstructured, search-augmented text.
  • Path B: Invocation by link (Tokopedia/Scraping Source):

    • Trigger: This path is used when the user provides a direct URL to a product page, typically from a source like Tokopedia.
    • Workflow:
      1. Scraping: Similar to toolSearchProduct, it uses handleScrapingWithCache to invoke the Firecrawl API, retrieving the page's content as markdown and a screenshot.
      2. Data Extraction: The scraped markdown is passed to a Gemini model via streamObject, guided by the PRODUCT_DETAILS_EXTRACTOR system prompt. The model parses the raw Markdown content to extract a structured JSON object of the product's details.
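The two-step enrichment chain from Path A can be sketched with the two model calls injected as stand-ins: research for the search-grounded streamText call, and extract for the streamObject call that parses its output. The ProductSpecs shape is an assumption:

```typescript
interface ProductSpecs {
  name: string;
  specs: Record<string, string>;
}

// Step 1 produces free-form markdown; Step 2 turns it into structured data.
async function enrichProduct(
  query: string,
  research: (q: string) => Promise<string>,
  extract: (markdown: string) => Promise<ProductSpecs>,
): Promise<ProductSpecs> {
  const markdown = await research(query); // Step 1: broad, narrative research
  return extract(markdown);               // Step 2: structured extraction
}
```

Splitting research from extraction lets the expensive grounded model produce free-form text while the cheaper model only has to parse it, which tends to be more reliable than asking a single model to search the web and emit strict JSON in one pass.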

4.3. Optional Researcher Module

For any invocation path, the tool can be enhanced with an optional, high-level researcher capability:

  • Mechanism: If the requestOption.onRequest.search flag is set to true, the tool makes an upfront call to externalTavilySearch.
  • Purpose: The Tavily API performs a comprehensive web search to answer the user's core query (query). The resulting answer provides additional, high-level context that is fed into the main data extraction LLM calls, enriching the final output with more diverse information.

4.4. Finalization and Insight Generation

Both paths converge into a common finalization sequence:

  1. UI Streaming: Throughout the process, the UI is continuously updated. Components like StreamProductDetails and StreamProductDetailsInsight render data as it becomes available from the streaming LLM calls (streamObject and streamText).
  2. Insight Generation: After the primary data object has been constructed, a final call to streamText is made. This LLM call takes the structured product data, the user's original message, and any data from the Tavily researcher module. Its goal is to generate a final, human-readable insight or summary that directly addresses the user's intent.
  3. State Management and Persistence: The process concludes with the standard mutateTool and createToolDataEntry calls to update the AIState and save the detailed product data to the database, making it available for future actions like toolProductComparison.

4.5. Diagram

[Get-Details-Workflow-Diagram]

Workflow Diagram

5. Tool: Product Comparison

5.1. Purpose

The toolProductComparison is a powerful synthesis tool designed to generate a detailed, side-by-side comparison of two products. It leverages previously gathered information to provide users with both objective, data-driven comparisons and subjective, AI-generated insights to aid in their decision-making process.

5.2. Workflow

The tool operates using a sequential, multi-step LLM chain that transforms raw data into a polished, actionable comparison.

  1. Initiation by callIds:

    • Trigger: The tool is invoked by the Orchestrator when a user expresses a desire to compare products. It receives an array of compare objects, where each object contains a callId.
    • Mechanism: Each callId is a unique pointer to a detailed product profile that was generated and saved to the database by a previous toolGetProductDetails execution.
  2. Parallel Data Aggregation:

    • Mechanism: The tool uses Promise.all to execute multiple getObjectEntry calls concurrently. This efficiently fetches the complete, structured ProductDetails data for every requested product from the database.
    • Error Handling: If any callId is invalid or its corresponding data cannot be found, the tool's execution is gracefully halted, and an informative error message is displayed to the user.
  3. Step 1: Structured Comparison Generation (The Extractor):

    • Objective: To transform the raw data of multiple, separate products into a single, unified, and structured comparison table.
    • Mechanism: The retrieved JSON data for all products are aggregated into a single, large payload. This payload is then passed to a Gemini model (gemini-2.0-flash) via the streamObject function. The process is guided by the strict rules and desired output format defined in the COMPARISON_EXTRACTOR_SYSTEM_INSTRUCTION.
    • Output & UI Streaming: The LLM analyzes the combined dataset and generates a new, structured JSON object that logically organizes the products' specifications for a direct, feature-by-feature comparison. As this comparison object is generated, it is streamed to the client and rendered progressively by the StreamProductComparison component, allowing the user to see the comparison table being built in real-time.
  4. Step 2: Qualitative Insight Generation (The Synthesizer):

    • Objective: To provide a human-like analysis and summary of the objective data, tailored to the user's needs.
    • Mechanism: The structured comparison JSON generated by the Extractor in the previous step is immediately fed into a second LLM call. This call uses the streamText function and is governed by the COMPARISON_INSIGHT_SYSTEM_INSTRUCTION. The prompt includes both the comparison data and the user's original query (userIntent).
    • Output: The model generates a narrative summary that goes beyond the raw data. It highlights key trade-offs, points out the pros and cons of each product, and often provides a concluding recommendation based on the user's stated intent (e.g., "If you prioritize battery life, Product A is the better choice, but if gaming performance is more important, Product B is superior."). This text is streamed to the UI as the assistant's final message.
  5. Finalization and State Persistence:

    • The complete tool output, including the structured comparison object and the qualitative insight, is saved to the database under a new unique callId using createObjectEntry.
    • The standard mutateTool and createToolDataEntry functions are invoked to append the results to the AIState, ensuring the conversation history accurately reflects the comparison task.

5.3. Diagram

[Comparison-Tool-Diagram]

Workflow Diagram

6. Tool: Inquire User

6.1. Purpose

The toolInquireUser is a conversational tool designed to make the agent more interactive and robust. Its primary function is to handle ambiguity by proactively asking the user for clarification when a request is incomplete or unclear. Instead of failing or making a poor assumption, the agent uses this tool to pause its current task and gather the specific information it needs to proceed, leading to a more natural and effective user experience.

6.2. Workflow

The tool's workflow is unique in that its final output is not data for further processing, but an interactive UI component designed to solicit a direct response from the user.

  1. Invocation by the Orchestrator:

    • Trigger: The Orchestrator's LLM invokes toolInquireUser when it analyzes the conversation history and determines that it cannot fulfill the user's request without more information. For example, if a user asks to "compare the laptops," the Orchestrator will use this tool to ask, "Which specific laptops would you like me to compare?"
    • Payload: The LLM passes a payload to the tool that typically includes the reason for the inquiry (e.g., "The user did not specify which products to compare").
  2. Structured Inquiry Generation:

    • Objective: To create a well-formed question with predefined options for the user.
    • Mechanism: The tool uses the streamObject function with a Gemini model (gemini-2.0-flash-lite). The operation is guided by the INQUIRY_CRAFTER system prompt and the strict inquireUserSchema.
    • Output: The LLM generates a structured JSON object containing all the necessary elements for an interactive form:
      • A clear, concise question to present to the user.
      • An array of potential options that can be rendered as interactive buttons.
      • A flag indicating if a free-text input field should be included for open-ended responses.
  3. Interactive UI Rendering:

    • Mechanism: The generated inquiry object is passed as a prop to the UserInquiry React component.
    • Output: This component renders the inquiry as a user-facing form within the chat interface. The user can then respond by clicking one of the predefined option buttons or by filling out the text field.
  4. Continuation of Conversation:

    • The user's response to the inquiry form is captured and sent back to the Orchestrator as a new PayloadData object in the subsequent turn.
    • Now equipped with the necessary information, the Orchestrator can re-evaluate the user's original request and successfully invoke the appropriate tool (e.g., toolProductComparison) with the clarified parameters.
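A sketch of the inquiry object and a minimal render check. The field names (question, options, allowTextInput) are assumptions modeled on the elements listed in step 2, not the actual inquireUserSchema:

```typescript
// The shape the INQUIRY_CRAFTER step is asked to produce.
interface Inquiry {
  question: string;
  options: string[];
  allowTextInput: boolean;
}

// A UserInquiry-style form needs at least a non-empty question and, unless
// free-text input is allowed, at least two options to be a meaningful choice.
function isRenderableInquiry(inquiry: Inquiry): boolean {
  if (inquiry.question.trim().length === 0) return false;
  if (inquiry.allowTextInput) return true;
  return inquiry.options.length >= 2;
}
```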

6.3. Diagram

[Inquiry-Tool-Diagram]

Workflow Diagram

7. Tool: Recommendator

7.1. Purpose

The toolRecommendator acts as a proactive product discovery engine. It is designed to handle open-ended user requests where a specific product is not mentioned. Instead of requiring the user to know what to search for, this tool takes a high-level intent (e.g., "a good camera for travel") and uses its own intelligence to find and suggest suitable products.

7.2. Workflow

Similar to other tools, the recommendator uses a two-step LLM chain to first generate structured data and then synthesize a qualitative insight.

  1. Initiation:

    • Trigger: The Orchestrator invokes this tool when the user's query expresses a need or a goal rather than a specific product search.
    • Payload: The tool receives the user's intent and a desired scope (e.g., budget, key features) to guide the recommendation process.
  2. Step 1: Recommendation Generation (The Extractor):

    • Objective: To identify and structure a list of relevant product recommendations based on the user's intent.
    • Mechanism: The tool uses the streamObject function with a search-grounded Gemini model (gemini-2.0-flash-lite). The model is guided by the RECOMMENDATOR_EXTRACTOR system prompt and the recommendationSchema. Critically, the useSearchGrounding: true flag enables the model to perform its own real-time web searches to find products that match the user's intent and scope.
    • Output & UI Streaming: The LLM returns a structured JSON object containing a list of recommended products. Each recommendation includes the product name and a brief rationale. This list is streamed to the UI and rendered by the StreamRecommendationAction component, allowing the user to see the recommendations appear one by one.
  3. Step 2: Insight Generation (The Synthesizer):

    • Objective: To provide a compelling narrative that explains the recommendations and helps the user understand their options.
    • Mechanism: The structured list of recommendations generated in the previous step is passed to a second LLM call using streamText. This call is governed by the RECOMMENDATOR_INSIGHT system prompt.
    • Output: The model generates a final, human-readable summary that elaborates on why these specific products were chosen, how they fit the user's intent, and what their key strengths are. This narrative provides crucial context that goes beyond a simple list of products.
  4. Finalization and State Persistence:

    • The tool's complete output, including the structured list and the narrative insight, is saved to the database under a new callId using createToolDataEntry.
    • The standard mutateTool function is called to update the AIState with the results of the recommendation task, preserving a complete record of the agent's actions.
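The two-step chain can be sketched with the two LLM calls injected as stand-ins (recommend for the search-grounded streamObject call, synthesize for the streamText insight call); the Recommendation and scope shapes are assumptions:

```typescript
interface Recommendation {
  product: string;
  rationale: string;
}

// Step 1 produces the structured list; Step 2 narrates it against the
// user's original intent.
async function runRecommendator(
  intent: string,
  scope: { budget?: number },
  recommend: (intent: string, scope: { budget?: number }) => Promise<Recommendation[]>,
  synthesize: (recs: Recommendation[], intent: string) => Promise<string>,
): Promise<{ recommendations: Recommendation[]; insight: string }> {
  const recommendations = await recommend(intent, scope); // structured list
  const insight = await synthesize(recommendations, intent); // narrative
  return { recommendations, insight };
}
```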

7.3. Diagram

[Recommendator-Tool-Diagram]

Workflow Diagram