Build a Multi-Tool AI Agent with Nebius and Llama 3.3 for Real-Time Search, Retrieval and Computation

DATE: 6/27/2025 · STATUS: LIVE

Imagine building a ChatNebius-powered AI that folds Wikipedia lookups, custom document retrieval, and a math engine into one impressive powerhouse ready for anything…


A new tutorial outlines how to build an advanced AI assistant that draws on Nebius’ ChatNebius, NebiusEmbeddings, and NebiusRetriever tools. The agent runs on the Llama-3.3-70B-Instruct-fast model to deliver detailed replies that can tap into live Wikipedia entries, pull in context from custom document sets, and perform secure arithmetic. The write-up shows how to combine structured prompt templates with LangChain’s composable chains so that the result supports scientific investigations, technical questions, and routine calculations in one integrated system.

Developers start by installing the libraries the project needs. A pip command adds langchain-nebius, langchain-core, langchain-community, and the Wikipedia Python client, which gives the agent both the Nebius integrations and an external knowledge source for current content. Performing this step in a virtual environment avoids dependency conflicts, and any version pins should track the current langchain-nebius release.
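
The dependency list translates to a one-line install; version pins are omitted here since they should follow whatever the current langchain-nebius release requires:

```
pip install -q langchain-nebius langchain-core langchain-community wikipedia
```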

Next, the code imports several standard modules, including os for environment handling, getpass for secure key entry, datetime for timestamping, and typing utilities for code clarity. It initializes a Wikipedia API client, letting the agent fetch articles, summaries, and links from the free encyclopedia as part of its answer generation process.
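A sketch of that setup, assuming the langchain-nebius package reads its credentials from a NEBIUS_API_KEY environment variable:

```python
import os
from getpass import getpass
from datetime import datetime
from typing import Any, Dict, List

import wikipedia  # lightweight client for articles, summaries, and links

# Read the Nebius API key securely if it isn't already in the environment.
if not os.environ.get("NEBIUS_API_KEY"):
    os.environ["NEBIUS_API_KEY"] = getpass("Enter your Nebius API key: ")
```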

The main logic sits in a custom class named AdvancedNebiusAgent, which unifies reasoning workflows, retrieval steps, and tool hooks. The class constructor loads the Llama-3.3-70B-Instruct-fast model through ChatNebius and builds a semantic retriever that indexes a small library of documents. That library covers various topics, such as machine learning, quantum computing, and blockchain. A prompt template then weaves together base instructions, retrieved snippets, tool outputs, and the current date to guide the LLM on each user request.
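A condensed sketch of that constructor follows. The class names come from the article; the model string, keyword arguments, and abbreviated document set are assumptions based on the package's documented usage:

```python
from langchain_core.documents import Document
from langchain_nebius import ChatNebius, NebiusEmbeddings, NebiusRetriever


class AdvancedNebiusAgent:
    """Ties together the Llama 3.3 LLM, semantic retrieval, and tool hooks."""

    def __init__(self) -> None:
        # Fast-serving Llama 3.3 70B variant hosted on Nebius AI Studio
        self.llm = ChatNebius(model="meta-llama/Llama-3.3-70B-Instruct-fast")
        self.embeddings = NebiusEmbeddings()

        # Small curated library; the tutorial indexes a longer list of topics
        self.knowledge_base = [
            Document(page_content="Machine learning is a subset of AI ..."),
            Document(page_content="Quantum computing uses qubits ..."),
            Document(page_content="Blockchain is a distributed ledger ..."),
        ]
        self.retriever = NebiusRetriever(
            embeddings=self.embeddings, docs=self.knowledge_base, k=3
        )
```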

In configuring the retriever, embedded vectors for each document allow rapid similarity search over the local knowledge base. The agent can look up relevant passages in real time and feed those back into the prompt. This approach gives the LLM a memory of curated material rather than relying only on its internal parameters. It bridges the gap between static model weights and dynamic information needs for domain-specific research tasks.
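In use, the retriever behaves like any LangChain retriever; a minimal lookup might read as follows, assuming an agent instance as sketched above:

```python
# Embed the query, then similarity-search the local knowledge base.
docs = agent.retriever.invoke("How do qubits differ from classical bits?")
context = "\n\n".join(doc.page_content for doc in docs)
```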

The prompt template defines placeholders for retrieved context, tool responses, and user queries. At runtime, the agent fills in those slots to craft a single instruction sequence. By combining these sections, the LLM receives all necessary background and responds with focused, multi-step reasoning. The design follows a modular pattern so that developers can swap in new document sets, change the instruction structure, or adjust the sequence of tool calls.
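A minimal version of that template might look like this; the placeholder names ({context}, {tool_results}, {query}, {current_date}) are illustrative, though the tutorial's template follows the same slot-filling pattern:

```python
from langchain_core.prompts import ChatPromptTemplate

AGENT_PROMPT = ChatPromptTemplate.from_template(
    """You are an advanced AI assistant. Today's date: {current_date}

Knowledge-base context:
{context}

Tool output (if any):
{tool_results}

User question: {query}

Respond with focused, step-by-step reasoning."""
)
```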

Two built-in tools expand the agent’s reach. A wikipedia_search tool issues HTTP queries to the encyclopedia, parses the returned JSON, and extracts headline summaries. It returns key facts and source links. A calculate tool performs arithmetic in a sandboxed environment. It handles addition, subtraction, multiplication, division, and safe functions, blocking risky operations. Both tools register with LangChain’s tool registry so the agent can invoke them by prefix in user messages.
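Hedged sketches of both tools, using the wikipedia package for lookups and a restricted eval for arithmetic; the whitelist below is illustrative rather than the tutorial's exact list:

```python
import math
import wikipedia


def wikipedia_search(query: str, sentences: int = 3) -> str:
    """Fetch a short summary and source link for the best-matching article."""
    try:
        page = wikipedia.page(query, auto_suggest=True)
        summary = wikipedia.summary(query, sentences=sentences, auto_suggest=True)
        return f"{summary}\nSource: {page.url}"
    except Exception as exc:  # disambiguation, missing page, or network error
        return f"Wikipedia lookup failed: {exc}"


def calculate(expression: str) -> str:
    """Evaluate arithmetic in a restricted namespace, blocking risky operations."""
    allowed = {"abs": abs, "round": round, "min": min, "max": max,
               "pow": pow, "sqrt": math.sqrt, "pi": math.pi, "e": math.e}
    if any(token in expression for token in ("__", "import", "open", "exec")):
        return "Blocked: unsafe expression."
    try:
        result = eval(expression, {"__builtins__": {}}, allowed)
        return f"{expression} = {result}"
    except Exception as exc:
        return f"Math error: {exc}"
```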

The process_query method orchestrates the full flow. It first sends the user input to the retriever to collect top-k relevant documents. Next, it routes the combined query and context into a LangChain chain that checks for tool prefixes. If a tool is requested, the chain runs the external function, feeds output back into the prompt, and then calls the LLM. The final result merges knowledge base snippets, live data, and computed values into one cohesive answer.
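Building on the sketches above, the method's flow reduces to something like the following; the prefix handling and slot names are assumptions consistent with the article's description:

```python
from langchain_core.output_parsers import StrOutputParser


def process_query(self, user_input: str) -> str:
    """Illustrative flow: tool prefixes first, then retrieval, then the LLM."""
    tool_results = ""
    if user_input.startswith("wiki:"):
        tool_results = wikipedia_search(user_input[5:].strip())
    elif user_input.startswith("calc:"):
        tool_results = calculate(user_input[5:].strip())

    # Top-k similarity search over the embedded knowledge base
    docs = self.retriever.invoke(user_input)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Composable LangChain chain: prompt -> LLM -> plain string
    chain = AGENT_PROMPT | self.llm | StrOutputParser()
    return chain.invoke({
        "current_date": datetime.now().strftime("%Y-%m-%d"),
        "context": context,
        "tool_results": tool_results or "none",
        "query": user_input,
    })
```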

An optional interactive loop provides a command-line shell for real-time exploration. Users can type free-form questions or include the wiki: or calc: prefix to trigger external lookups or arithmetic operations. The agent recognizes those markers, runs the appropriate tool, and loops back with the enriched prompt. That setup lets users experiment with retrieval, live data, and math functions in a single session without restarting the program.
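A minimal sketch of such a shell, assuming the process_query method above:

```python
def interactive_mode(agent: "AdvancedNebiusAgent") -> None:
    """Command-line shell; type 'quit' to exit."""
    print("Ask anything, or use the wiki: / calc: prefixes to trigger a tool.")
    while True:
        user_input = input("\nYou: ").strip()
        if not user_input:
            continue
        if user_input.lower() in {"quit", "exit"}:
            break
        print(f"\nAgent: {agent.process_query(user_input)}")
```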

The tutorial then runs several example queries to highlight each capability. Initially, it asks about the latest trends in artificial intelligence, quantum algorithm research, and climate forecasting, drawing on the embedded document collection. Each response shows how the retriever narrows in on relevant passages. It then handles a question on environmental modeling, illustrating how multiple passages merge into a synthesized overview.

In a follow-up demonstration, the agent handles a space exploration query by invoking wikipedia_search. It fetches a brief history of satellite launches and returns key dates. Finally, a scenario on solar panel output uses calc to estimate power generation given sunlight hours, panel area, and efficiency. The tutorial prints each intermediate step, showing how the sandboxed math tool rejects unsafe input and confirms the final result before printing a polished summary.
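Putting the pieces together, the demo section boils down to a few calls like these; the queries and solar figures are illustrative, with the last assuming roughly 1 kW/m² of irradiance, so 6 h × 20 m² × 0.18 ≈ 21.6 kWh:

```python
agent = AdvancedNebiusAgent()
print(agent.process_query("What are the latest trends in artificial intelligence?"))
print(agent.process_query("wiki: history of satellite launches"))
print(agent.process_query("calc: 6 * 20 * 0.18"))  # hours * area (m^2) * efficiency
```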

Taken together, these sections show how to fuse LLM-driven inference with structured retrieval and external tool use in one package. LangChain provides the scaffolding for chaining prompts and tools, and Nebius supplies the high-performance model, embeddings, and retriever backend. By following the modular design laid out in this tutorial, developers can adapt the blueprint to new domains, swap in custom document stores, or add other safe tools. The result is a capable, context-aware assistant for research, learning, or interactive exploration.

Keep building
