Latvian language-technology firm Tilde has published TildeOpen LLM, an open-source foundational large language model engineered for European languages with a strong emphasis on under-represented national and regional tongues. The company made the model public on September 3, 2025, distributing it at no charge via Hugging Face.
TildeOpen is a 30-billion-parameter dense decoder-only transformer offered under a permissive CC-BY-4.0 license. Language coverage runs from Latvian and Lithuanian to Ukrainian, Turkish and a wide set of other European languages, reflecting a design goal of balanced multilingual support rather than English-first priorities.
Training took place on the European Union supercomputers LUMI in Finland and JUPITER in Germany, using approximately 2 million GPU hours awarded through the European Commission's Large AI Grand Challenge. The scale of compute allowed Tilde to run an extended training schedule and experiment with language-sampling strategies aimed at equalizing model behavior across languages.
Training used scripts based on EleutherAI's GPT-NeoX framework for roughly 450,000 update steps, consuming about 2 trillion training tokens in total. Sampling followed a three-stage regimen: an initial uniform pass across languages, a natural-distribution phase that increased exposure for high-data-volume languages, and a concluding uniform sweep to rebalance rarer-language examples.
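For intuition, 2 trillion tokens over roughly 450,000 updates implies an average global batch on the order of 4.4 million tokens per step. The exact phase boundaries and per-language corpus proportions have not been published, but a minimal sketch of such a three-stage schedule, with invented numbers, might look like this:

```python
import numpy as np

# Hypothetical per-language corpus sizes in tokens; the real proportions are not public.
corpus_tokens = {"en": 600e9, "de": 300e9, "uk": 120e9, "tr": 90e9, "lv": 15e9, "lt": 15e9}

def sampling_weights(step, total_steps=450_000, corpora=corpus_tokens):
    """Per-language sampling probabilities for a three-stage schedule:
    uniform -> natural distribution -> uniform. Phase boundaries are assumptions."""
    langs = list(corpora)
    natural = np.array([corpora[l] for l in langs], dtype=float)
    natural /= natural.sum()
    uniform = np.full(len(langs), 1.0 / len(langs))
    if step < 0.25 * total_steps:    # stage 1: equal exposure for every language
        probs = uniform
    elif step < 0.75 * total_steps:  # stage 2: follow the natural data distribution
        probs = natural
    else:                            # stage 3: rebalance toward rarer languages
        probs = uniform
    return dict(zip(langs, probs))

print(sampling_weights(100_000))  # stage 1: uniform weights
print(sampling_weights(300_000))  # stage 2: natural-distribution weights
```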
The architecture comprises 60 transformer layers with an embedding dimension of 6,144 and 48 attention heads, and supports a context window of 8,192 tokens. Feed-forward blocks use SwiGLU activations, positions are encoded with RoPE, and normalization uses RMSNorm. These design choices favor long-context handling and an operational profile suited to multilingual inference.
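The published figures pin down most of a standard decoder-only configuration. A sketch of how they fit together, with illustrative field names (not the official config keys) and the head dimension derived rather than stated:

```python
from dataclasses import dataclass

@dataclass
class TildeOpenConfig:
    # Figures reported for the release; names here are illustrative.
    num_layers: int = 60
    hidden_size: int = 6144
    num_attention_heads: int = 48
    max_position_embeddings: int = 8192   # context window
    activation: str = "swiglu"
    positional_encoding: str = "rope"
    norm: str = "rmsnorm"

    @property
    def head_dim(self) -> int:
        # 6144 / 48 = 128, a common head size for models in this class
        return self.hidden_size // self.num_attention_heads

cfg = TildeOpenConfig()
print(cfg.head_dim)  # 128
```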
Mainstream large models tend to concentrate training on English and other major languages, producing performance gaps when applied to Baltic, Slavic or other smaller European languages. Those gaps show up as grammatical errors, clumsy phrasing and an increased tendency to hallucinate. TildeOpen addresses this through what the company calls an "equitable tokenizer," engineered so that text in any supported language splits into a comparable number of tokens. This lowers token counts for inflected and morphologically rich languages, which in turn makes inference on them cheaper and faster.
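One way to check a claim like this is to compare tokens per word (fertility) across languages. A minimal sketch using the Hugging Face tokenizers API; the repository id is an assumption and should be verified against Tilde's actual Hugging Face listing:

```python
from transformers import AutoTokenizer

# Assumed repository id; confirm on Tilde's Hugging Face page before use.
tok = AutoTokenizer.from_pretrained("TildeAI/TildeOpen-30b")

samples = {
    "en": "The committee approved the proposal yesterday.",
    "lv": "Komiteja vakar apstiprināja priekšlikumu.",
    "lt": "Komitetas vakar patvirtino pasiūlymą.",
}

for lang, text in samples.items():
    ids = tok(text)["input_ids"]
    words = text.split()
    # Lower fertility (tokens per word) means cheaper, faster inference for that language.
    print(f"{lang}: {len(ids)} tokens, fertility {len(ids) / len(words):.2f}")
```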
Organizations may self-host TildeOpen in on-premises data centers or in EU-compliant cloud environments, keeping data within chosen jurisdictions and meeting GDPR and related data-protection obligations. The ability to run models under local control tackles data-sovereignty concerns that can arise when organizations rely on US- or Asia-hosted models.
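Self-hosting follows the usual open-weights workflow: pull the weights once, then run inference entirely inside your own infrastructure. A minimal sketch with the transformers library, again assuming the repository id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TildeAI/TildeOpen-30b"  # assumed id; confirm on Hugging Face

# Weights stay on local disk after the first download; no data leaves the host at inference time.
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # a 30B model needs roughly 60 GB in bf16; plan GPU memory accordingly
    device_map="auto",
)

inputs = tok("Latvijas Republikas galvaspilsēta ir", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```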
TildeOpen is delivered as a foundational base model. The firm expects to produce derivative, task-specialized versions atop that base, such as instruction-tuned translation systems and targeted dialog or assistance models. The open license and modular architecture are intended to make it straightforward for institutions and vendors to fine-tune or build pipeline-specific variants.
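Because the release is a base model, downstream teams would typically attach task-specific adapters rather than retrain it. A sketch of parameter-efficient fine-tuning with the peft library; the rank and target module names are assumptions that depend on the actual architecture implementation:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TildeAI/TildeOpen-30b")  # assumed id

lora = LoraConfig(
    r=16,                                 # adapter rank; a typical starting point, not a published value
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed names; inspect model.named_modules() first
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights train; the 30B base stays frozen
```

Adapters trained this way can be shipped separately from the base weights, which keeps per-task variants small and lets institutions share them without redistributing the full model.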
Beyond the technology itself, the release positions Latvia and Tilde as a regional technology exporter. The company has said it wants to help scale European AI infrastructure in ways that keep language diversity intact, and the open-source release is part of a strategy to give local governments, research labs and companies access to production-grade multilingual models.
The release aligns with ongoing academic and industry research on multilingual model behavior, where notable gaps persist. Evaluations of contemporary open LLMs have found hallucinations and deficits in lexical accuracy in Baltic languages, strengthening the case for focused, localized development and evaluation pipelines that test real-world use cases for smaller languages.
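Such a localized evaluation pipeline can stay very small: pair references written by native speakers with model outputs and track a surface-level metric per language. A sketch using sacrebleu's chrF metric, which handles rich morphology better than word-level BLEU; the sentences below are placeholders:

```python
from sacrebleu.metrics import CHRF

chrf = CHRF()

# Placeholder Latvian reference and a hypothetical model output.
references = ["Komiteja vakar apstiprināja priekšlikumu."]
hypotheses = ["Komiteja apstiprināja priekšlikumu vakar."]

score = chrf.corpus_score(hypotheses, [references])
print(f"lv chrF: {score.score:.1f}")  # track per-language scores over time to catch regressions
```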
TildeOpen reframes the EU approach to AI from regulatory compliance to active technical stewardship. The model offers a transparent architecture and deployment options that can be audited and hosted close to sensitive data. Its creators emphasize practical utility over marketing claims, presenting a high-capacity system engineered for concrete linguistic needs.
Tilde published the model weights and technical documentation alongside example code on Hugging Face, plus notes on evaluation datasets and metrics used during internal testing. The project repository and license are intended to let universities, companies and public agencies examine, adapt or extend the base model.
Q1: What is TildeOpen LLM?
TildeOpen is a 30B-parameter multilingual large language model trained on EU supercomputers and optimized for European languages, with special attention to under-represented national and regional languages.
Q2: How is it different from mainstream LLMs?
Mainstream models prioritize English and other major languages. TildeOpen employs an equitable tokenizer and a balanced training schedule to improve representation and accuracy across smaller European languages.
Q3: Can organizations self-host the model?
Yes. TildeOpen is open-source under CC-BY-4.0 and can be deployed in local data centers or EU-compliant cloud services to satisfy GDPR and data-sovereignty needs.
Q4: What are the main use cases?
Government services, translation, education, AI assistants, speech technologies and multilingual customer support are among the primary uses: in short, any application requiring accurate processing of European languages.

