
Liquid AI open-sources LFM2 to double on-device decode speed and triple training efficiency

DATE: 7/15/2025

LFM2 cuts on-device AI latency, triples training throughput, and keeps data local on hardware ranging from phones to satellites.

Liquid AI introduced LFM2, a lineup of second-generation Liquid Foundation Models built for on-device intelligence. This release marks a major move in edge machine learning by speeding up inference and trimming compute overhead while keeping output quality competitive.

Tests show LFM2 cuts both decode and prefill times in half compared to Qwen3 on CPUs, boosting real-time responsiveness for chat and retrieval tasks. Training throughput is also three times faster than the original LFM series, reducing the cost of developing large general-purpose models.

These improvements unlock sub-10 millisecond latencies, fully offline operation, and strict data privacy on constrained hardware. Smartphones, laptops, vehicles, robotics platforms, wearables, and satellites can now run powerful AI routines without relying on cloud connectivity or exposing sensitive data.

At its core sits a hybrid framework that merges short-range convolution modules with grouped query attention layers. This approach builds on Liquid AI’s earlier Liquid Time-constant Networks work, which blends continuous-time recurrent units with dynamic gating functions.

LFM2 employs a 16-block structure: 10 double-gated convolution blocks handle local feature extraction, and 6 grouped query attention blocks capture long-range dependencies. That balance allows the model to adapt quickly to both recent inputs and broader context.
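
To make that split concrete, here is a minimal PyTorch-style sketch of a 16-block hybrid stack in this spirit; the block internals, hidden size, and the ordering of convolution versus attention blocks are illustrative assumptions, not Liquid AI's published implementation.

```python
# Illustrative sketch of a hybrid conv/attention stack in the spirit of LFM2.
# Block internals, ordering, and dimensions are assumptions for clarity,
# not Liquid AI's published implementation.
import torch
import torch.nn as nn


class DoubleGatedConvBlock(nn.Module):
    """Short-range block: a depthwise causal conv wrapped by two input-dependent gates."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_gate = nn.Linear(dim, dim)
        self.out_gate = nn.Linear(dim, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size - 1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        gated = x * torch.sigmoid(self.in_gate(x))
        y = self.conv(gated.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)  # causal trim
        return x + y * torch.sigmoid(self.out_gate(x))


class GQABlock(nn.Module):
    """Long-range block: grouped-query attention approximated here with multi-head attention."""
    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


def build_hybrid_stack(dim: int = 512) -> nn.Sequential:
    # 10 convolution blocks for local features, 6 attention blocks for global context.
    blocks = [DoubleGatedConvBlock(dim) for _ in range(10)] + [GQABlock(dim) for _ in range(6)]
    return nn.Sequential(*blocks)


tokens = torch.randn(1, 128, 512)          # (batch, seq_len, hidden_dim)
print(build_hybrid_stack()(tokens).shape)  # torch.Size([1, 128, 512])
```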

A standout feature is the Linear Input-Varying operator, which generates weights dynamically from the input itself. By bringing convolution, recurrence, and attention under one input-aware mechanism, the design reduces parameter bloat and ensures each layer responds directly to incoming data.
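
A toy version of that idea is sketched below: a small generator network produces a low-rank, per-token weight matrix from the input itself. The low-rank factorization and shapes are assumptions made for brevity, not the actual LIV operator.

```python
# Toy illustration of a linear input-varying (LIV) style operator: the layer's
# weights are generated from the current input rather than stored as fixed
# parameters. The low-rank generation scheme below is an assumption for brevity.
import torch
import torch.nn as nn


class InputVaryingLinear(nn.Module):
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        # Small generator mapping each token to a low-rank factorization of its weight matrix.
        self.make_u = nn.Linear(dim, dim * rank)
        self.make_v = nn.Linear(dim, dim * rank)
        self.rank = rank
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        b, s, d = x.shape
        u = self.make_u(x).view(b, s, d, self.rank)   # per-token left factor
        v = self.make_v(x).view(b, s, self.rank, d)   # per-token right factor
        w = torch.matmul(u, v) / self.rank            # per-token (d, d) weight matrix
        return torch.einsum("bsij,bsj->bsi", w, x)    # apply each token's own weights


layer = InputVaryingLinear(dim=64)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```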

Model selection used STAR, Liquid AI’s neural architecture search engine. STAR was extended to evaluate language modeling via over fifty internal tests, covering areas such as factual recall, chained reasoning, low-resource language comprehension, instruction compliance, and auxiliary tool usage.

Three sizes of LFM2 are available: 350 million, 700 million, and 1.2 billion parameters. Pretraining spanned ten trillion tokens drawn from a curated mix of 75 percent English, 20 percent other languages, and 5 percent code sourced from web and licensed repositories.

Throughout the 10T-token run, each LFM2 variant was distilled from an LFM1-7B teacher via a cross-entropy loss against the teacher's outputs. The context window was also extended to 32,000 tokens, enabling richer document-level reasoning.
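
A minimal sketch of that kind of distillation objective is shown below, with the student's logits trained against a frozen teacher's next-token distribution; the temperature and the mixing weight with the ordinary language-modeling loss are illustrative assumptions.

```python
# Minimal sketch of cross-entropy knowledge distillation against a frozen teacher,
# in the spirit of training LFM2 under an LFM1-7B teacher. The temperature and the
# 50/50 mix with the ordinary next-token loss are illustrative assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, target_ids, alpha=0.5, temperature=1.0):
    # Soft targets: the student matches the teacher's next-token distribution.
    soft = F.cross_entropy(
        student_logits.flatten(0, 1) / temperature,
        F.softmax(teacher_logits.flatten(0, 1) / temperature, dim=-1),
    )
    # Hard targets: ordinary language-modeling loss on the ground-truth tokens.
    hard = F.cross_entropy(student_logits.flatten(0, 1), target_ids.flatten())
    return alpha * soft + (1 - alpha) * hard


student_logits = torch.randn(2, 16, 32000)   # (batch, seq, vocab)
teacher_logits = torch.randn(2, 16, 32000)
target_ids = torch.randint(0, 32000, (2, 16))
print(distillation_loss(student_logits, teacher_logits, target_ids))
```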

Against comparable systems, LFM2-1.2B performs on par with Qwen3-1.7B, a model with roughly 47 percent more parameters. The 700 million model surpasses Gemma 3 1B IT, and the 350 million checkpoint holds its own against Qwen3-0.6B and Llama 3.2 1B Instruct, showcasing a favorable size-to-performance ratio.

In multi-turn dialogue tests, LFM2 shines. Evaluations on the WildChat dataset, judged with an LLM-as-a-judge setup, rank LFM2-1.2B ahead of Llama 3.2 1B Instruct and Gemma 3 1B IT in user preference, while performing on par with the larger Qwen3-1.7B.

These models have been integrated into ExecuTorch for PyTorch and the open-source llama.cpp runtime. Benchmarks on devices like the Samsung Galaxy S24 Ultra and AMD Ryzen CPUs place LFM2 near the Pareto frontier for both prefill and decode latency.
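
Assuming a quantized GGUF export of an LFM2 checkpoint is available locally, running it through the llama-cpp-python bindings could look roughly like this; the file name, context size, and sampling settings are placeholders rather than official values.

```python
# Hypothetical local inference through llama-cpp-python, assuming an LFM2 checkpoint
# has been converted to GGUF. The file name, context size, and sampling settings
# are placeholders, not official values.
from llama_cpp import Llama

llm = Llama(
    model_path="./lfm2-1.2b-q4_k_m.gguf",  # placeholder path to a quantized GGUF export
    n_ctx=4096,                            # context window to allocate on-device
    n_threads=8,                           # CPU threads; tune per device
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the battery log in two sentences."}],
    max_tokens=128,
    temperature=0.3,
)
print(out["choices"][0]["message"]["content"])
```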

Further kernel optimization ports these CPU gains to GPUs and NPUs, making LFM2 flexible across heterogeneous edge platforms. That broad compatibility meets diverse on-device requirements for local AI execution.

As on-device inference gains traction, the divide between cloud and edge intelligence narrows. LFM2’s millisecond latencies, offline resilience, and data sovereignty appeal to consumer electronics, industrial automation, smart home appliances, financial services, e-commerce, and educational tools.

This release signals a shift in enterprise strategy. Firms moving away from public cloud LLMs can deploy LFM2 on premises for rapid, cost-effective, private AI, paving the way for new classes of intelligent devices and applications.

In a separate announcement, Amazon rolled out Kiro, an IDE that embeds autonomous agents directly into the developer workspace. Users can assign bots to handle builds, tests, and deployments, accelerating delivery cycles and cutting manual overhead.

A team from MetaStone-AI and the University of Science and Technology of China unveiled MetaStone-S1, a generative model featuring a reflective feedback loop. By rerouting interim outputs back through its core generator, the system achieves performance parity with OpenAI's o3-mini.

Google now offers gemini-embedding-001 through the Gemini API and AI Studio. This multilingual text encoder supports semantic search and classification in over 100 languages. Concurrently, MLflow’s open-source platform integrates with the OpenAI Agents SDK, automatically tracking agent calls, logging metrics, storing artifacts, and capturing environment configurations.

Researchers examining inference compute in large language models have proposed Fractional Reasoning, a training-free framework that selects key tokens on the fly. In medical AI, experts highlight that many benchmarks rely on static vignettes that miss real clinical complexity. Meanwhile, large multimodal networks continue to fuse text, images, and other inputs for tasks like visual question answering and factual retrieval in unified models.

Developers working on cross-modal workflows can leverage Google DeepMind’s GenAI Processors, a Python library for orchestrating generative pipelines involving audio, vision, and text. In computational chemistry, teams tackling Density Functional Theory costs are adopting neural surrogate solvers to reduce runtime while retaining accuracy in materials discovery.

Lastly, Moonshot AI’s Kimi K2, released in July 2025, presents a Mixture-of-Experts architecture with one trillion parameters and 32 billion active weights per token. This dynamic expert selection delivers high-capacity reasoning at lower overall inference expense, and the model is available under an open-source license for researchers.
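
For readers unfamiliar with sparse expert models, the toy routing sketch below shows why active compute stays far below total capacity: each token is sent to only its top-k experts. The expert count, k, and layer sizes are illustrative and bear no relation to Kimi K2's actual configuration.

```python
# Toy top-k Mixture-of-Experts routing: each token activates only k experts, so
# compute per token is a small fraction of total capacity. Expert count, k, and
# sizes are illustrative, not Kimi K2's real configuration.
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                                # (tokens, n_experts)
        weights, idx = torch.topk(scores.softmax(-1), self.k)  # keep the k best experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


print(TinyMoE()(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```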

Keep building
