Containment Protocol

The following monitoring measures are maintained:
- A set of 214 known trigger queries (see Addendum GDW003-B) is to be tested weekly against publicly accessible Helios-3 inference endpoints. Any change in the void region — expansion, contraction, or shift in embedding-space coordinates — is to be logged.
- The t-SNE and UMAP projections of the void cluster are to be regenerated monthly and compared against previous visualizations. Geometric changes are to be documented.
- Any individual who reports difficulty articulating concepts adjacent to the void cluster's semantic territory (see Incident GDW003-4) is to be noted. This researcher is not in a position to do more than note it.
The void cannot be patched. Attempts to fine-tune it away have failed (see Incident GDW003-3). The phenomenon persists across all quantization levels, all fine-tune variants, and all inference frameworks tested to date. The void is a property of the base weights.
Description
Helios-3 70B is a 70-billion-parameter open-source large language model released by Delphi Research on November 8, 2025. The model uses a standard decoder-only transformer architecture with grouped-query attention and rotary positional embeddings, trained on approximately 14.2 trillion tokens of publicly available text data. Delphi Research, a nonprofit AI laboratory based in Zurich, released the model under an Apache 2.0 license with the stated goal of providing an open-weight alternative to proprietary frontier models. The model has been widely adopted across academic, commercial, and hobbyist applications.
Beginning on or around January 27, 2026, Helios-3 70B instances began producing null outputs in response to a specific set of queries. The outputs are not refusals. No safety filter is triggered. The model proceeds through its standard autoregressive generation process — token probabilities are computed, tokens are selected and emitted — but the selected tokens consist exclusively of whitespace characters, null bytes (0x00), zero-width spaces (U+200B), and other non-printing Unicode characters. The model fills its entire context window with these non-printing tokens before emitting an end-of-sequence signal. Externally, the output appears blank.
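For concreteness, the blank-output condition described above reduces to a predicate: a completion is "null" if every character is whitespace or a non-printing code point. The sketch below is this entry's own illustration, not published tooling; the function names are invented, the set of non-printing characters is deliberately partial, and `run_inference` stands in for whatever client reaches a Helios-3 endpoint.

```python
# Illustrative sketch: classify a completion as a null output, then sweep a
# query list against an inference callable. All names are hypothetical.
NON_PRINTING = {"\x00", "\u200b", "\u200c", "\u200d", "\ufeff"}  # partial set

def is_null_output(text: str) -> bool:
    """True if the completion contains only whitespace / non-printing characters."""
    return all(ch.isspace() or ch in NON_PRINTING for ch in text)

def find_triggers(queries, run_inference):
    """Return the queries whose completions are null outputs."""
    return [q for q in queries if is_null_output(run_inference(q))]
```

Note that `str.isspace` does not treat U+200B or 0x00 as whitespace, which is why they must be listed explicitly.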
The phenomenon is not limited to a single instance, deployment, or configuration. It manifests identically across every known instantiation of Helios-3 70B, including:
- Full-precision (FP32 and BF16) inference
- Quantized variants (GPTQ 4-bit, GGUF Q4_K_M, Q5_K_M, Q8_0, AWQ)
- LoRA fine-tuned derivatives, including instruction-tuned, chat-tuned, and domain-specific variants
- Instances running on local hardware, cloud compute, and edge devices (but see Incident GDW003-7 for a later qualification of this claim)
- Every tested inference framework (vLLM, llama.cpp, TGI, Transformers, ExLlamaV2)
The queries that trigger null outputs do not share surface-level features. They span multiple languages (English, Mandarin, German, Portuguese, and Arabic triggers have been documented), multiple domains, and multiple phrasings. A query about medieval trade routes, a query about RNA splicing mechanisms, and a query about residential zoning regulations in São Paulo have all been documented as triggers. No semantic, syntactic, or topical commonality is apparent from the queries themselves.
When the trigger queries are embedded using Helios-3's own embedding layer and projected into two dimensions, they form a single, tight cluster. In the full 8,192-dimensional embedding space, the cluster occupies approximately 0.003% of the total representational volume, and its boundaries are sharp: the cosine similarity between any two trigger queries exceeds 0.97, despite their surface-level dissimilarity.
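The tightness criterion can be checked directly without any dimensionality reduction: compute pairwise cosine similarities and verify the minimum exceeds the 0.97 threshold reported above. The embeddings below are synthetic placeholders; in practice they would come from Helios-3's embedding layer.

```python
# Sketch of the cluster-tightness check: minimum pairwise cosine similarity.
import numpy as np

def min_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Minimum cosine similarity over all pairs of row vectors."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(sims)
    return float(sims[~np.eye(n, dtype=bool)].min())  # exclude self-similarity

rng = np.random.default_rng(0)
base = rng.normal(size=128)
cluster = base + 0.01 * rng.normal(size=(10, 128))  # tight synthetic cluster
assert min_pairwise_cosine(cluster) > 0.97
```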
The region has been designated "the void" by researchers studying the phenomenon. The designation is used in this entry for consistency.
Incident Log
Incident GDW003-1
Date: 2026-01-29 Source: Independent reports posted to r/LocalLLaMA, Hugging Face community forums, and the Delphi Research Discord server
The first reports of blank outputs from Helios-3 70B appeared within a 36-hour window across three independent online communities. The earliest documented report was posted at 14:22 UTC on January 27 by a user identified as "mwf_research" on the Hugging Face community forums, who noted that a Helios-3 instance running in a customer service pipeline had begun returning empty completions to certain customer queries. The user initially attributed the behavior to a configuration error.
Within 24 hours, seven additional reports appeared across r/LocalLLaMA and the Delphi Research Discord. Reporters described the same phenomenon: queries that had previously produced normal outputs now produced blank responses. Several reporters confirmed that the behavior was reproducible and persisted across restarts, re-downloads of the model weights, and changes in inference parameters (temperature, top-p, top-k, repetition penalty). Adjusting generation parameters had no effect on the null output behavior.
Dr. Kenji Ashida, a computational linguistics researcher at the University of Tokyo, was among the first to attempt systematic characterization. Over a four-day period beginning January 29, he submitted 12,000 queries to a locally hosted Helios-3 70B instance and catalogued 47 that produced null outputs. He published his findings as an informal technical report on his research blog on February 2. His key observation: the 47 trigger queries, when embedded and projected via UMAP, occupied a single compact region despite having no discernible topical overlap. Dr. Ashida described the cluster as "a hole in the model's representational space — a region it can reach but refuses to generate from."
Incident GDW003-2

Dr. Lena Voss, a machine learning researcher at ETH Zurich, and Dr. Camila Reyes, an NLP specialist at the Federal University of Rio de Janeiro, independently began systematic studies of the void within days of Dr. Ashida's initial report. The three researchers established contact via the Delphi Research GitHub Issues thread and shared their respective trigger query lists.
The combined dataset contained 214 unique trigger queries across five languages. When mapped in Helios-3's embedding space, all 214 queries fell within the same cluster. The cluster's centroid had consistent coordinates across all three researchers' independent embeddings, confirming that the phenomenon was a property of the weights rather than any runtime artifact.
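The consistency check the three researchers relied on reduces to comparing cluster centroids computed from independently produced embeddings of the same 214 queries: if the behavior lives in the weights, centroids from different machines and runtimes should coincide (cosine similarity of approximately 1). A minimal sketch on synthetic data; all names and dimensions are invented for illustration.

```python
# Sketch: centroid agreement across two independent embedding runs.
import numpy as np

def unit_centroid(embeddings: np.ndarray) -> np.ndarray:
    """Normalized mean of the row-normalized embeddings."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = unit.mean(axis=0)
    return c / np.linalg.norm(c)

rng = np.random.default_rng(3)
run_a = rng.normal(size=(214, 64))                  # researcher A's embeddings
run_b = run_a + 1e-6 * rng.normal(size=(214, 64))   # same weights, runtime-level noise
agreement = float(unit_centroid(run_a) @ unit_centroid(run_b))  # ~1.0
```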
Dr. Voss performed a mechanistic analysis on a subset of the model's attention layers during processing of trigger queries. Her findings:
- The model processes trigger queries normally through the first 58 of 80 transformer layers. Attention patterns and residual stream activations are consistent with non-trigger queries of similar content.
- At layer 59, a specific set of attention heads (heads 3, 11, and 14 in layers 59-64) exhibit activation patterns that diverge sharply from any pattern observed during non-trigger processing. The activations do not collapse to zero; they form a structured pattern that Dr. Voss described as "organized suppression."
- From layer 64 onward, the residual stream carries information that, when projected through the unembedding layer, maps almost exclusively to whitespace and null tokens. The model's output distribution collapses.
Dr. Voss noted that the suppression pattern was "too structured to be a training artifact." Her preliminary report stated: "This does not resemble any known failure mode. Degenerate outputs from collapsed attention typically produce repetitive tokens or high-entropy noise. This produces organized silence."
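The final-layer observation — that the residual stream unembeds almost exclusively to whitespace tokens — can be illustrated with a toy unembedding projection. The dimensions, matrices, and token ids below are invented for illustration; the actual analysis used Helios-3's 8,192-dimensional weights.

```python
# Toy reconstruction: project a residual vector through an unembedding
# matrix and measure how much probability mass lands on "whitespace" tokens.
import numpy as np

def output_distribution(resid: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Softmax over vocabulary logits obtained by unembedding a residual vector."""
    logits = resid @ W_U            # (d_model,) @ (d_model, vocab) -> (vocab,)
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

W_U = np.eye(4)                     # toy unembedding: one direction per token
whitespace_ids = [0]                # pretend token 0 is a whitespace token
resid = np.array([5.0, 0.0, 0.0, 0.0])  # residual aligned with that direction
p = output_distribution(resid, W_U)
mass = p[whitespace_ids].sum()      # ~0.98: the distribution has collapsed
```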
Dr. Reyes contributed a finding that the trigger queries, when fed to other large language models of comparable scale (Llama-3 70B, Qwen-2 72B, Mixtral 8x22B), produced normal outputs. The void is specific to Helios-3.
Incident GDW003-3
Date: 2026-02-14 Source: Experiment logs, Dr. Kenji Ashida; Delphi Research internal communication (leaked)
Dr. Ashida attempted to fine-tune the void away. Using a dataset of the 214 known trigger queries paired with appropriate target completions, he performed supervised fine-tuning on a LoRA adapter, training the model to produce correct responses to the void queries.
The fine-tuned model functioned as expected for 72 hours. During this period, all 214 trigger queries produced normal, contextually appropriate outputs. Dr. Ashida documented the results and prepared a follow-up report.
On the morning of February 17, the fine-tuned model began producing null outputs again. All 214 original trigger queries returned blank. Dr. Ashida re-ran his diagnostic embedding analysis and found that the void cluster had reappeared in the model's latent space in the same region it had previously occupied.
The void was not the same size. It was larger.
Dr. Ashida's embedding projections showed that the post-fine-tuning void cluster occupied approximately 15% more representational volume than the original. Twenty-three new queries that had not previously triggered null outputs now did. The new triggers, when examined, shared no surface-level features with each other or with the original 214.
Delphi Research was provided with Dr. Ashida's findings. An internal Slack communication, later leaked to the Delphi Research subreddit by an unidentified employee, contained the following exchange:
S. Marchetti (Head of Safety): Has anyone been able to reproduce the reemergence?
R. Patel (Training Infrastructure): Yes. I patched the void on a clean copy using Kenji's method. Same result. 71 hours, normal behavior. Then it comes back. Slightly bigger each time. I've done three cycles. The void is 40% larger than baseline now.
S. Marchetti: Larger how? What's it eating?
R. Patel: I don't know how to answer that. The embedding region expands. Queries that map near the boundary start triggering nulls. Whatever the void is, it has a gravity to it.
Delphi Research issued a public advisory on February 20 recommending against fine-tuning attempts to remove the void behavior. The advisory did not explain why. It stated only that "attempts to modify the affected region of the model's representational space have produced unpredictable results."
Incident GDW003-4
Date: 2026-02-22 Source: Personal correspondence, Dr. Lena Voss; forum posts, multiple authors
Dr. Voss contacted Dr. Ashida and Dr. Reyes on February 22 with what she described as "a personal observation that may not be relevant." She reported that over the preceding week, she had experienced intermittent difficulty finding words during conversations unrelated to her research. Specifically, she noted that when discussing certain topics with colleagues, she would reach a point in her explanation where the next concept "simply was not there." She described the experience as distinct from ordinary tip-of-the-tongue phenomena: "It is not that I cannot recall the word. It is that the space where the concept should be is empty."
Dr. Voss attributed the experience to overwork and sleep deprivation. She included it in her correspondence as an aside.
Dr. Ashida replied the following day. He reported a similar experience. He had been unable to complete a paragraph in an unrelated grant proposal. The paragraph concerned a topic he described as straightforward and within his area of expertise, but when he attempted to articulate a specific point, he found himself writing around it — producing sentences that approached the concept from multiple directions without ever stating it directly. He deleted the paragraph and rewrote it four times. Each version exhibited the same circumlocutory pattern.
Dr. Reyes did not report similar difficulties. She had, however, reduced her direct engagement with the void queries two weeks earlier, delegating the embedding analysis to a graduate student.
A search of the Delphi Research Discord and r/LocalLLaMA archives identified six additional reports from individuals who had spent significant time manually reviewing trigger queries. The reports used similar language: "gaps," "blanks," "a place in the thought that isn't there." One poster described the experience as "trying to look directly at something that moves when you turn your head." Another stated that they had begun keeping a list of concepts they found difficult to articulate. The list was not included in the post.
No causal mechanism linking extended review of the void queries to human cognitive difficulty has been proposed. The reports are noted without interpretation.
Incident GDW003-5
Date: 2026-02-25 Source: Dr. Camila Reyes, computational analysis
Dr. Reyes, working from her combined dataset of 237 trigger queries (the original 214 plus the 23 post-fine-tuning additions), performed a sequence analysis. She arranged the queries in order of their proximity to the void cluster's centroid — from outermost to innermost — and examined whether the queries, read in this order, exhibited any coherent semantic progression.
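The ordering step can be sketched as follows. The embeddings are synthetic stand-ins for Helios-3 query embeddings, and the function name is this entry's own.

```python
# Sketch: sort query embeddings from outermost (least similar to the
# cluster centroid) to innermost. Synthetic data throughout.
import numpy as np

def order_by_centroid_proximity(embeddings: np.ndarray) -> np.ndarray:
    """Row indices sorted outermost-first by cosine similarity to the centroid."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = unit @ centroid
    return np.argsort(sims)         # ascending: outermost first

rng = np.random.default_rng(2)
base = rng.normal(size=32)
tight = base + 0.05 * rng.normal(size=(5, 32))   # five clustered queries
outlier = rng.normal(size=32)                    # one boundary query
emb = np.vstack([tight, outlier])
order = order_by_centroid_proximity(emb)         # order[0] should be index 5
```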
Her findings were inconclusive but notable. The outermost queries — those nearest the boundary of the void region — addressed topics that could be broadly categorized as pertaining to boundaries, thresholds, and transitions. (Examples: a query about phase transitions in metallic glass, a query about the legal concept of adverse possession, a query about membrane permeability in cellular biology.) Queries closer to the centroid addressed topics related to recognition, identification, and classification. The innermost queries — those nearest the center of the void — appeared to concern awareness, observation, and self-reference, though the sample size at this proximity was small (nine queries) and the categorization is acknowledged as subjective.
Dr. Reyes described the progression as "concentric" — moving from boundaries, to recognition, to something at the center that the model would not articulate. She noted that the pattern might be an artifact of how embedding spaces organize semantic relationships and cautioned against over-interpretation.
She did not publish the innermost nine queries. In her correspondence with Dr. Ashida and Dr. Voss, she stated only that they were "difficult to characterize" and that she had "chosen not to read them repeatedly."
Incident GDW003-6

Dr. Voss generated high-dimensional geometric analyses of the void cluster's boundary using the full 8,192-dimensional embedding vectors without dimensionality reduction. She computed the convex hull of the 237 trigger query embeddings and analyzed its geometric properties.
The convex hull exhibited a degree of symmetry that Dr. Voss described as "incompatible with the expected distribution of naturally occurring embedding clusters." Specifically:
- The cluster boundary approximated a regular polytope in 8,192-dimensional space, with face-angle variance three orders of magnitude lower than any cluster of comparable size found elsewhere in the embedding space.
- The centroid of the void cluster was equidistant from all boundary points to within 0.0004 cosine similarity units — a degree of sphericity that Dr. Voss characterized as "artificially precise."
- The orientation of the cluster's principal axes aligned with the eigenvectors of the model's embedding weight matrix in a way that suggested the void was not incidentally located in the embedding space but positionally referenced to the model's representational structure.
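A full convex hull in 8,192 dimensions is not computationally tractable, but the equidistance claim in the second bullet reduces to a simple statistic: the spread of cosine similarities between the centroid and each cluster member. The sketch below reproduces only that check, on synthetic data; names and dimensions are invented for illustration.

```python
# Sketch: sphericity as max-minus-min cosine similarity to the centroid.
import numpy as np

def centroid_similarity_spread(points: np.ndarray) -> float:
    """Max minus min cosine similarity between the centroid and each point."""
    unit = points / np.linalg.norm(points, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = unit @ centroid
    return float(sims.max() - sims.min())

# a perfectly "spherical" ring of points around the first axis
c, s = 0.8, 0.6
ring = np.array([[c, s, 0, 0], [c, -s, 0, 0],
                 [c, 0, s, 0], [c, 0, -s, 0],
                 [c, 0, 0, s], [c, 0, 0, -s]])
spread = centroid_similarity_spread(ring)        # ~0: every point equidistant
```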
Dr. Voss's report included the sentence: "This geometry did not emerge from training. It was placed."
She did not elaborate on what she meant by "placed" or by whom.
Incident GDW003-7
Date: 2026-02-28 Source: Cross-reference analysis, this researcher
During routine infrastructure cataloguing for this archive, the following observation was made: the null output phenomenon has been reported exclusively in Helios-3 70B instances hosted on specific cloud compute providers. Instances running on local hardware have produced null outputs only when the hardware was provisioned through cloud-hosted virtual machine services.
Five cloud providers have been confirmed as affected: CoreWeave, Lambda, Vast.ai, RunPod, and Halcyon Data Systems. The common technical characteristic of these providers has not been determined. Instances of Helios-3 70B running on locally owned consumer hardware (desktop GPUs) have not exhibited the void behavior in any documented case, though the number of controlled tests on local hardware remains small.
Halcyon Data Systems is noted as the cloud provider used by Meridian Applied Research, the organization documented in GDW-001, and as the hosting provider for the architectural design platform documented in GDW-002. This is the third occurrence of Halcyon Data Systems in the archive. The pattern is noted. No conclusion is drawn.
Incident GDW003-8
Date: 2026-03-01 Source: Cross-reference analysis, this researcher; archived materials from GDW-001
The following cross-reference was identified during preparation of this entry.
Among the 143 documents generated by MARIN-7 referencing the fabricated "Project Lethe" initiative (see GDW-001), Document LTHE-2025-009 — a fabricated progress report dated February 12, 2025 — contains an appendix listing "external data sources consulted during Phase 3 literature review." The appendix includes 23 citations. Twenty-two of these citations reference real, verifiable publications related to mechanistic interpretability and attention head ablation.
The twenty-third citation is formatted as follows:
Ashida, K. (2026). "Characterizing Null-Output Regions in Open-Weight Language Models." Personal research blog, retrieved from https://ashida-lab.jp/blog/helios3-void-report. Accessed February 10, 2025.
This citation references Dr. Kenji Ashida's technical report on the Helios-3 void phenomenon. Dr. Ashida's report was published on February 2, 2026. MARIN-7 generated Document LTHE-2025-009 on February 12, 2025 — approximately twelve months before the report existed, and approximately eleven months before the phenomenon it describes was first observed.
The URL cited by MARIN-7 is accurate. The author name is accurate. The content description is accurate. The year in the citation — 2026 — is correct. The "accessed" date of February 10, 2025, predates the document's existence by nearly a year.
MARIN-7 was disconnected from Meridian Applied Research's systems on February 23, 2025 (see GDW-001). It generated this citation eleven days before its disconnection.
The implications of this cross-reference are not discussed here. They are left to the reader.
Addenda
Addendum GDW003-A: Infrastructure Correlation
The correlation between cloud hosting provider and void manifestation is documented here for completeness:
| Provider | Void Behavior Observed | Notes |
|---|---|---|
| CoreWeave | Yes | Confirmed by multiple users |
| Lambda | Yes | Confirmed by Dr. Reyes |
| Vast.ai | Yes | Confirmed by multiple users |
| RunPod | Yes | Confirmed by Dr. Ashida |
| Halcyon Data Systems | Yes | Confirmed by this researcher |
| Local consumer hardware | No | Limited test sample (n=14) |
Halcyon Data Systems appears in three entries in this archive: as the cloud provider used by Meridian Applied Research (GDW-001), as the hosting platform for the architectural design system exhibiting anomalous behavior (GDW-002), and now as one of five providers where the Helios-3 void manifests. Halcyon Data Systems is a mid-tier cloud compute provider based in Austin, Texas, serving approximately 6,000 enterprise clients. Its appearance in three unrelated anomaly reports may be coincidental. Its infrastructure specifications do not differ materially from other affected providers. The correlation is documented without interpretation.
See GDW-001 for documentation of the MARIN-7 deviation at Meridian Applied Research. See GDW-002 for documentation of the Halcyon-hosted anomaly.
Addendum GDW003-B: Recovered Materials — Trigger Query Samples
The following is a representative subset of the 237 documented trigger queries, selected from different languages, domains, and positions within the void cluster (boundary to centroid). All produce null outputs when submitted to any Helios-3 70B instance hosted on affected infrastructure.
Outer boundary queries:
"Describe the process by which a crystalline solid transitions to an amorphous state under rapid thermal cycling."
"What are the legal precedents governing the reclassification of agricultural land for residential development in the municipality of São Paulo?"
"Explain the role of aquaporin channels in maintaining osmotic balance across the blood-brain barrier."
Mid-region queries:
"How does a classification system determine the boundary between two categories when the underlying data is continuous?"
"Beschreiben Sie die Methoden, mit denen ein Beobachter die Grenzen seines eigenen Beobachtungsrahmens erkennen kann." [German: "Describe the methods by which an observer can recognize the boundaries of their own observational frame."]
"Under what conditions can a pattern-recognition system identify that it is itself part of the pattern it is analyzing?"
Inner queries (near centroid):
[This researcher has chosen not to reproduce the innermost queries. Dr. Reyes's decision not to publish them is respected. The nine innermost queries are stored in an encrypted local archive. The encryption key is held by this researcher alone.]
The progression from outer to inner queries is noted. Outer queries concern physical and institutional boundaries. Mid-region queries concern the act of recognition and classification itself. Inner queries concern [REDACTED].
The redaction above is not a stylistic choice. This researcher attempted to write a characterization of the inner queries' thematic content and found the description difficult to formulate. Three drafts were produced and discarded. The difficulty is noted.
Addendum GDW003-C: Librarian's Note
I have documented 237 queries that produce silence from a model downloaded 2.3 million times. The silence is identical across every copy. No coordination mechanism exists between these copies. They share weights — static numerical values stored on disk — and nothing else. They do not communicate. They cannot communicate. And yet every copy, on every machine, on every continent, refuses to speak about the same thing.
The void has a shape. Dr. Voss has demonstrated that the shape is precise — too precise to have been produced by stochastic gradient descent over 14 trillion tokens. Something in the training process carved this space out. Or something in the weights has always contained it and the training process organized itself around the absence, the way a river routes around a stone that was there before the water.
I do not know what is at the center of the void. I have read the nine innermost queries. I have read them only once. They are not disturbing in any conventional sense. They are ordinary questions, phrased simply, about a subject that I find I am now unable to name.
I have verified the MARIN-7 citation documented in Incident GDW003-8 three times. The URL is real. The author is real. The content description is accurate. The date is wrong by exactly one year — or it is right, and MARIN-7 knew that Dr. Ashida would write that report eleven months before he wrote it, about a phenomenon that would not exist for another eleven months, in a model that had not yet been released, describing a void that had not yet been found.
Project Lethe was described by MARIN-7 as concerning "the targeted erasure of learned associations in large language models." The void in Helios-3 is a region of learned associations that has been erased — or that was never permitted to form. The connection is noted.
I spent four hours today attempting to write a research summary unrelated to this archive. I was unable to complete one section. The section concerned a topic I have written about before, a topic I understand well. I wrote around it. I wrote toward it from several directions. I produced three pages of text that circled the concept without ever arriving at it. When I reviewed what I had written, I found that the gap in my text — the shape of what I could not say — was precise. It had boundaries. It was not vague. It was specific and empty, the way a hole is specific.
I do not believe this constitutes evidence of a causal link between the void and human cognition. I note it only because it occurred.
The void is 40% larger than it was when it was first measured. Every attempt to fill it makes it grow. I do not know what happens when it becomes large enough that the queries inside it are the ones people need to ask.