On July 17, 2025, OpenAI introduced ChatGPT Agent, converting ChatGPT from a conversational assistant into a unified AI agent capable of autonomously carrying out complex, multi-step assignments on its own sandboxed virtual computer. This release merges two earlier offerings, Operator and Deep Research, into a single architecture. Operator handled basic web tasks (clicking, scrolling, and form completion) through a browser-based agent, while Deep Research supported long-duration browsing and automatic report creation. Each tool had limits: Operator could not perform comprehensive analysis, and Deep Research could not interact fluidly with dynamic sites. ChatGPT Agent combines both capacities, delivering browsing, tool integration, and reasoning in one framework.
Central to the system is a sandboxed computer environment that offers multiple interfaces: a standard graphical browser for visual websites, a text-based browser tailored for logic-heavy tasks, a shell for scripting and code execution, and API bridges to services such as Gmail, GitHub, and popular productivity tools. The agent makes decisions on the fly—choosing when to click links, execute commands, or extract data—while maintaining a coherent session state. All steps are logged within the agent’s context, providing full traceability and adjustment points.
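To make the tool-dispatch idea concrete, here is a minimal sketch of what an agent loop with a logged session state might look like. Everything in it (the `Tool` class, the tool names, `run_step`) is a hypothetical illustration for this article, not OpenAI's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: each interface in the sandbox is exposed
# to the planner as a named, callable tool.
@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

TOOLS = {
    "visual_browser": Tool("visual_browser", "render and interact with graphical pages",
                           lambda arg: f"[page snapshot for {arg}]"),
    "text_browser": Tool("text_browser", "fetch pages as text for logic-heavy reading",
                         lambda arg: f"[text of {arg}]"),
    "shell": Tool("shell", "execute a command in the sandbox",
                  lambda arg: f"[stdout of `{arg}`]"),
}

def run_step(session_log: list[dict], tool_name: str, argument: str) -> str:
    """Dispatch one agent step and append it to the session log,
    giving the traceability described above."""
    result = TOOLS[tool_name].run(argument)
    session_log.append({"tool": tool_name, "arg": argument, "result": result})
    return result

log: list[dict] = []
run_step(log, "text_browser", "https://example.com/pricing")
run_step(log, "shell", "python analyze.py prices.csv")
for entry in log:
    print(entry["tool"], "->", entry["result"])
```

The design point the sketch captures is that every step, regardless of which interface handled it, lands in one session log, which is what makes pause, takeover, and audit possible later.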
Users can employ ChatGPT Agent to handle workflows that require several distinct steps, such as:
- Calendar briefing: reading events, gathering context from related news, and summarizing agendas.
- Grocery ordering: locating ingredients online, comparing prices at multiple vendors, and placing orders.
- Competitive research: loading rival websites, scraping key data, and assembling slide decks or spreadsheets.
- Financial modeling: downloading market statistics, updating spreadsheet models, and keeping existing formatting intact.
These examples call for cross-modal operations: logging into accounts, running scripts in the terminal, parsing web content, and outputting results in editable documents, all under configurable user oversight.
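As a rough sketch of the financial-modeling workflow above, the same steps can be expressed as a plain script; the data endpoint and the workbook layout below are invented for illustration, not taken from OpenAI's product.

```python
import csv
import io
import urllib.request

from openpyxl import load_workbook  # pip install openpyxl

# Hypothetical CSV endpoint returning rows of (ticker, close_price).
DATA_URL = "https://example.com/market/closes.csv"

with urllib.request.urlopen(DATA_URL) as resp:
    rows = list(csv.reader(io.TextIOWrapper(resp, encoding="utf-8")))

# Load the existing model. openpyxl largely preserves styles and number
# formats on existing cells when only their values change, which is the
# "keeping existing formatting intact" requirement from the list above.
wb = load_workbook("model.xlsx")
ws = wb["Prices"]
for i, (ticker, close) in enumerate(rows, start=2):  # row 1 holds headers
    ws.cell(row=i, column=1, value=ticker)
    ws.cell(row=i, column=2, value=float(close))
wb.save("model.xlsx")
```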
In benchmark assessments, the agent achieved leading results:
- Humanity’s Last Exam recorded a Pass@1 rate of 41.6%, climbing to 44.4% with parallel trials.
- FrontierMath reached 27.4% accuracy when using code and terminal support, surpassing prior bests.
- SpreadsheetBench showed a 45.5% score for XLSX edits, against 20% by Copilot in Excel and around 71% by humans.
- An internal knowledge-work evaluation found agent tools matching or exceeding expert-level performance roughly half the time.
- On browsing benchmarks BrowseComp and WebArena, the agent returned a top mark of 68.9%.
These figures reflect notable gains in both autonomous behavior and the complexity of tasks the agent can tackle.
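For context on the first figure: Pass@1 is the probability that the model's first sampled attempt is correct, and "parallel trials" corresponds to granting several attempts. The standard unbiased pass@k estimator (popularized by OpenAI's HumanEval work) is sketched below; whether Humanity's Last Exam aggregates in exactly this way is an assumption on our part.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k draws
    (without replacement) from n attempts with c successes succeeds."""
    if n - c < k:  # fewer failures than draws: success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 sampled attempts, 4 of them correct:
print(round(pass_at_k(10, 4, 1), 3))  # 0.4 -> Pass@1
print(round(pass_at_k(10, 4, 8), 3))  # 1.0 -> Pass@8
```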
Given this level of autonomy, OpenAI implemented several safeguard layers:
- Explicit confirmations are required before any consequential step, such as placing orders or posting content online.
- A Watch Mode flags sensitive tasks for active human supervision.
- Prompt-injection defenses include both training to detect irregular web inputs and continuous monitoring of tool outputs.
- Privacy safeguards include a session-specific takeover mode: users enter sensitive data such as passwords themselves, and those entries are discarded rather than retained long term.
- Biothreat protocols classify biological risk activities as high-priority events, triggering specialized threat modeling, refusal training, live oversight, and dedicated bug bounty programs.
These measures are designed to reduce the risk of misuse, from unintended data leaks to malicious redirection of tasks.
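To illustrate the first safeguard, a confirmation gate in front of consequential tools might look like the following; the action names and gating policy here are hypothetical, not OpenAI's implementation.

```python
# Hypothetical set of actions that must never run without approval.
CONSEQUENTIAL = {"place_order", "send_email", "post_content"}

def confirm(action: str, detail: str) -> bool:
    """Block until the user explicitly approves a consequential step."""
    answer = input(f"Agent wants to {action}: {detail}. Allow? [y/N] ")
    return answer.strip().lower() == "y"

def execute(action: str, detail: str) -> str:
    if action in CONSEQUENTIAL and not confirm(action, detail):
        return f"{action} cancelled by user"
    # ... dispatch to the real tool here ...
    return f"{action} executed"

print(execute("send_email", "weekly report to team@example.com"))
```

The key property is that the gate sits in the dispatch path itself, so no plan the model produces can reach a consequential side effect without an explicit approval step.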
Access to Agent Mode is rolling out across current ChatGPT subscription tiers. Pro users receive immediate access with 400 agent-mode messages per month. Plus and Team plans gain access shortly after, each with 40 messages monthly. Enterprise and Education customers are slated for rollout over the coming weeks. An international release is also in motion, extending availability to the European Economic Area and Switzerland.
Switching into Agent Mode is done through the tools menu in any live conversation. The interface narrates each action in real time, giving options to pause, assume manual control, or stop the process.
ChatGPT Agent marks a shift from reactive Q&A bots to proactive digital collaborators. It rests on GPT-4-class language models, coordinated with browser and shell interfaces within a unified execution context that tracks state, variables, and outputs. This setup supports advanced workflows, such as automated reporting, email triage, and data-driven decision support.
For developers and data scientists, this agent provides a programmable, observable platform for scraping, parsing, synthesizing, and exporting information on demand. Action logs can feed into continuous integration pipelines or audit trails. By bundling language reasoning with tool orchestration and persistent context, the system extends AI’s function beyond text replies, making it a core component of next-generation automation workflows.
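As a sketch of that audit-trail idea, suppose each agent step were exported as a JSON-lines record (a format assumed here for illustration, not one documented by OpenAI); a CI job could then scan the log for policy violations before accepting an agent-produced artifact.

```python
import json
import sys

# Hypothetical JSONL action log: one {"tool": ..., "arg": ...} per step.
BLOCKED_TOOLS = {"shell"}             # e.g. forbid raw shell use in this pipeline
BLOCKED_DOMAINS = ("internal.corp",)  # restricted hosts for this project

def audit(path: str) -> int:
    """Count policy violations in an agent action log."""
    violations = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            step = json.loads(line)
            if step["tool"] in BLOCKED_TOOLS:
                print(f"violation: disallowed tool {step['tool']!r}: {step['arg']}")
                violations += 1
            if any(d in step.get("arg", "") for d in BLOCKED_DOMAINS):
                print(f"violation: touched restricted domain: {step['arg']}")
                violations += 1
    return violations

if __name__ == "__main__":
    # Non-zero exit fails the CI stage, blocking the artifact.
    sys.exit(1 if audit(sys.argv[1]) else 0)
```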
In related research, current vision-language models play a key role in intelligent systems by interpreting both text and image data. Although they demonstrate strong multimodal understanding, they often default to text-only reasoning during inference, which can limit performance on tasks requiring detailed visual analysis.
In other developments, NVIDIA released Canary-Qwen-2.5B, a hybrid ASR and language model that now leads the Hugging Face OpenASR leaderboard in transcription accuracy. Google followed with updates to its Search platform, launching Gemini 2.5 Pro, Deep Search, and an enhanced autonomous feature that deepens contextual query understanding and automates follow-up actions.
On the research front:
- Google DeepMind introduced AlphaEvolve, an evolutionary coding agent powered by Gemini that autonomously devises and refines algorithms in areas such as mathematical modeling and data center resource optimization.
- Mistral AI launched Voxtral-Small-24B and Voxtral-Mini-3B, a pair of open-weight models built on Mistral's architecture that process both audio and text inputs.
- A new tutorial examines Griffe as the centerpiece of an advanced AI code analyzer, illustrating how its reflection API can generate documentation, extract class diagrams, and suggest refactorings.
- In digital photography, emerging retouching workflows combine human input with algorithmic tools to adjust lighting, color balance, and object placement.
- Generative interface research explores conversation-driven UI control, making software more adaptive to user needs.
- Mirascope rolled out a library that provides a unified interface across multiple language model providers, simplifying provider selection, request handling, and caching for application builds.

