Yandex introduced ARGUS (AutoRegressive Generative User Sequential modeling), a transformer-based framework for recommender systems that scales up to one billion parameters. The release marks a significant step for the company and places it alongside Google, Netflix, and Meta among the small number of organizations that have pushed recommender transformers out of laboratory experiments and into large-scale production.
Recommender systems have faced three persistent limits: short-term memory, constrained scalability, and weak adaptation to shifts in user behavior. Many production models reduce a person’s history to a narrow window of recent clicks or purchases, discarding months or years of earlier activity. That truncation produces a shallow representation of intent, one that misses durable habits, slow shifts in taste, and recurring seasonal patterns. Catalogs with billions of items tighten the trade-off from both sides: shrinking the history window lowers relevance, while personalizing across massive item sets drives up compute and latency. The visible effects are familiar: recommendations grow stale, engagement falls, and platforms miss chances for users to discover new or unexpected items.
Only a handful of firms have taken recommender transformers beyond prototypes. Google, Netflix, and Meta have invested heavily and reported gains from systems such as YouTubeDNN, PinnerFormer, and Meta’s Generative Recommenders. ARGUS brings Yandex into that group by operating at the billion-parameter scale in live services. The model treats a user’s activity as a continuous sequence, modeling entire behavioral timelines instead of short snippets. That long-horizon view makes it possible to detect both obvious links and subtle correlations across time. The model can pick up on gradual intent shifts and periodic behaviors with higher fidelity than limited-window approaches. For example, rather than reacting only to a single recent purchase, ARGUS can promote a preferred brand of tennis balls as summer nears, without the user having to repeat the same signals year after year.
The framework introduces several technical advances:
- Dual-objective pre-training: ARGUS splits autoregressive learning into two coordinated subtasks, next-item prediction and feedback prediction. Next-item prediction trains the model to reproduce the sequence of interactions the system has seen historically. Feedback prediction teaches the model to forecast user responses such as clicks, likes, or purchases. Training on both objectives helps the model imitate prior system behavior while also learning the underlying signals that reflect genuine user preference (a minimal training-step sketch follows this list).
- Scalable transformer encoders: Yandex reports models built at scales from 3.2M to 1B parameters, with performance improving consistently across sizes. At the billion-parameter point, the uplift in pairwise accuracy reached 2.66%, providing empirical evidence of a scaling law for recommender transformers. The gains suggest that larger encoders can extract richer patterns from long user histories than smaller architectures or shallow sequence models.
- Extended context modeling: ARGUS processes user histories of up to 8,192 interactions in a single forward pass. That capacity lets the system reason over months of behavior rather than only the last handful of events. Longer contexts reduce the need to compress, truncate, or randomly sample past interactions. The result is more stable personalization for users whose interests evolve slowly or repeat over time.
- Efficient fine-tuning: The design uses a two-tower structure that separates user and item representations. Item embeddings can be computed offline and cached, enabling scalable deployment across large catalogs. This approach cuts inference cost compared with earlier target-aware methods or impression-level online models, which required heavier on-the-fly computation for each candidate and impression (see the serving sketch below).
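
To make the dual-objective pre-training concrete, here is a minimal sketch of how such a training step might look, assuming a causal transformer over item-ID embeddings with two heads sharing its hidden states. The class name, head layout, feedback classes, and equal loss weighting are illustrative assumptions, not details from Yandex’s implementation; the `max_len=8192` default simply mirrors the extended context length reported above.

```python
import torch
import torch.nn as nn

# Illustrative sketch of ARGUS-style dual-objective pre-training
# (an assumed implementation of the general technique, not Yandex's code).
# A causal transformer reads a user's interaction history; two heads share
# its hidden states:
#   1. next-item prediction  - reproduce the historically logged sequence
#   2. feedback prediction   - forecast the user's response (e.g. click/like/skip)

class DualObjectiveModel(nn.Module):
    def __init__(self, num_items, d_model=256, n_heads=8, n_layers=4,
                 num_feedback_classes=3, max_len=8192):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_item_head = nn.Linear(d_model, num_items)            # objective 1
        self.feedback_head = nn.Linear(d_model, num_feedback_classes)  # objective 2

    def forward(self, item_ids):
        seq_len = item_ids.size(1)
        pos = torch.arange(seq_len, device=item_ids.device)
        h = self.item_emb(item_ids) + self.pos_emb(pos)
        # Causal mask keeps the model autoregressive: no peeking ahead.
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len).to(item_ids.device)
        h = self.encoder(h, mask=causal)
        return self.next_item_head(h), self.feedback_head(h)

def training_step(model, item_ids, feedback, optimizer):
    """One pre-training step combining both objectives in a single loss."""
    next_item_logits, feedback_logits = model(item_ids)
    # Predict item t+1 from positions 0..t; predict feedback at each position.
    loss_item = nn.functional.cross_entropy(
        next_item_logits[:, :-1].reshape(-1, next_item_logits.size(-1)),
        item_ids[:, 1:].reshape(-1))
    loss_feedback = nn.functional.cross_entropy(
        feedback_logits.reshape(-1, feedback_logits.size(-1)),
        feedback.reshape(-1))
    loss = loss_item + loss_feedback   # equal weighting is an assumption here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point of the shared encoder is that both heads regularize each other: next-item prediction forces the model to internalize how the existing system sequenced recommendations, while the feedback head grounds those sequences in what users actually responded to.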
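The serving economics of the two-tower fine-tuning are easiest to see in code. Below is a deliberately trivial sketch of the pattern; the tower internals, the embedding dimension, and scoring by dot product are assumptions about the general two-tower technique rather than details of ARGUS itself.

```python
import numpy as np

# Illustrative two-tower serving sketch (assumed pattern, not Yandex's code).
# The item tower runs offline over the whole catalog and its outputs are cached;
# at request time only the user tower runs, and each candidate is scored with
# a dot product against a cached vector.

rng = np.random.default_rng(0)

def item_tower(item_features: np.ndarray) -> np.ndarray:
    """Stand-in item encoder: any network mapping features to d-dim vectors."""
    W = rng.standard_normal((item_features.shape[1], 64))
    return item_features @ W

def user_tower(history_vectors: np.ndarray) -> np.ndarray:
    """Stand-in user encoder: here just mean-pooling cached item vectors."""
    return history_vectors.mean(axis=0)

# ---- offline: embed and cache the full catalog once ----
catalog_features = rng.random((100_000, 32))
item_cache = item_tower(catalog_features)       # shape (100_000, 64), cached

# ---- online: one user-tower pass, then dot products against the cache ----
history = item_cache[[3, 17, 42, 99]]           # the user's recent interactions
user_vec = user_tower(history)                  # shape (64,)
scores = item_cache @ user_vec                  # score every candidate at once
top_k = np.argsort(scores)[-10:][::-1]          # indices of the 10 best items
print(top_k)
```

The contrast with target-aware scoring is the key design point: a target-aware model must run a forward pass for every (user, candidate) pair, while here the per-candidate cost collapses to a single dot product against a precomputed vector.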
ARGUS is already live on Yandex’s music service, where it serves millions of listeners. In controlled production A/B experiments, the system produced a +2.26% lift in total listening time (TLT) and a +6.37% increase in the likelihood of a like. Yandex describes those numbers as the largest quality improvements recorded in the platform’s history for any deep learning–based recommender model.
Yandex researchers say their next priorities include adapting ARGUS for real-time recommendation scenarios, refining feature engineering for pairwise ranking tasks, and applying the framework to high-cardinality settings such as large-scale e-commerce catalogs and video platforms. Work on latency reduction, memory-efficient training, and deployment tooling is underway to make the system practical for domains where item counts and request rates are extreme.
The company’s results add to a growing body of evidence that transformer-based sequence models can scale effectively for personalization tasks. The demonstrated improvements in both offline and online metrics point to a development path for recommenders that echoes the scaling progress seen in natural language processing. Yandex’s work with ARGUS positions the firm among a few organizations that are shaping the next wave of recommendation technology. By publishing technical details and evaluation results, the team aims to raise the bar for personalization across its products and accelerate progress in the recommendation research community.