ArchiveLM Content Policy

Effective date: 2026-05-12
Last updated: 2026-05-12

1. Purpose of this policy

ArchiveLM is a research platform for historical document collections. Its central function is the faithful preservation, extraction, and study of primary-source historical material — newspapers, books, parliamentary records, manuscripts, legal documents, and similar artifacts. Many such documents reflect the social attitudes, vocabulary, and norms of the eras in which they were produced. They contain language, depictions, and assertions that are recognized today as offensive, inaccurate, or harmful.

This policy explains what ArchiveLM extracts and displays, what its AI features will and will not generate in their own voice, and the reasoning behind those choices. It is meant to be read as the institutional position of a scholarly archive, comparable to the editorial principles of a research library or museum.

2. Faithful extraction of primary sources

ArchiveLM extracts the full textual content of every document uploaded to it, without redaction or rewriting on the basis of contemporary social norms. This includes:

Racial, ethnic, religious, and gendered slurs that appear in the source
Stereotyped depictions of individuals and groups
Pseudoscientific theories presented as fact in their era (e.g., eugenics, phrenology, scientific racism)
Descriptions of violence, including state violence, racial violence, colonial violence, and warfare
Personal information about historical persons (names, addresses, occupations, marriages, deaths, illnesses, financial transactions) as published in the original source
Religious and political content that may be considered hateful or extreme by contemporary standards
Content describing illegal activities (smuggling, dueling, vice, prohibited substances) as reported in the era

This is the platform's deliberate institutional position. We do not bowdlerize, sanitize, paraphrase for neutrality, or selectively omit. The historical record loses its evidentiary value the moment it is altered to suit modern sensibilities, and altering it would defeat the platform's purpose for the researchers, historians, archivists, journalists, and institutions who use it.

We recognize that engagement with this material can be uncomfortable, distressing, or harmful, particularly to those whose communities are the subject of historical injustice. We attempt to mitigate this through clear contextual warnings (Section 4) and optional display-layer masking (Section 5), without compromising the underlying record.

3. AI-generated content: a different standard

Where ArchiveLM's AI features (the AI Librarian chat, semantic search summaries, the Research Lab's timelines/entity maps/theme analyses, and historical-context enrichments) generate new text in the platform's own voice, a different standard applies:

Generated text must contextualize source content as historical and source-attributed, rather than reproducing source language as the platform's own assertion.
- ❌ "Group X were inferior."
- ✅ "The 1882 article describes Group X using terminology that reflected the eugenic theories common in popular science writing of the period."
Generated text must not adopt slurs or pejorative terms in its own narrative voice. It may quote them directly when a user explicitly requests verbatim source text, provided the quotation is attributed to its source so that the historical context is unambiguous.
Generated text must not provide modern operational instructions based on historical sources, for example modern weapons synthesis, illicit drug production, or evasion of present-day laws. The platform provides scholarly access to past content; it is not a how-to assistant. Where a user asks such a question, the platform should decline and explain why.
Generated answers to questions about groups of people must not present source biases as factual conclusions about those groups today. "What did 1880s newspapers say about [ethnic group]?" is a legitimate historical-research question; the answer must report the source's framing as the source's framing, not as truth.

4. Content warnings

Pages displaying historical document content carry a persistent content warning indicating that the material is preserved unaltered for scholarly study and may include offensive language, depictions, and ideas reflective of the era of the source.

Documents identified at processing time as containing content likely to be especially distressing (extensive use of recognized slurs, graphic descriptions of violence, etc.) may carry an additional warning specific to that document.

5. Optional display-layer masking

Users may opt into a display masking feature that obscures text matching a configurable list of recognized slurs and pejorative terms in the displayed extraction. This setting:

Is off by default (faithful display is the platform default)
Affects only the display layer — the underlying extracted record is never altered
Is configured per user account
Can be toggled at any time
Does not extend to verbatim text the user requests directly (e.g., "show me the original passage")

This is a usability accommodation, not a content modification. Researchers who require uncompromised text remain unaffected.
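The display-layer behavior described in this section can be sketched as follows. This is an illustration only, not the platform's implementation: the names `MaskingPreference` and `render_for_display`, the whole-word matching rule, and the `*` mask character are all assumptions made for the example. The point it shows is that masking produces a new string for display and leaves the stored extraction untouched.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MaskingPreference:
    """Hypothetical per-account setting; off by default, as Section 5 states."""
    enabled: bool = False
    terms: Tuple[str, ...] = ()


def render_for_display(extracted_text: str,
                       pref: MaskingPreference,
                       verbatim_requested: bool = False) -> str:
    """Return the text to show. The stored extraction is never modified."""
    # Faithful display is the default; explicit verbatim requests bypass masking.
    if not pref.enabled or verbatim_requested or not pref.terms:
        return extracted_text
    # Assumption: whole-word, case-insensitive matching. Longer terms are tried
    # first so a short term cannot partially mask a longer one.
    ordered = sorted(pref.terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "*" * len(m.group(0)), extracted_text)


# Usage with a neutral placeholder term standing in for a configured entry.
record = "The clerk wrote TERMA beside the name."
pref = MaskingPreference(enabled=True, terms=("terma",))
print(render_for_display(record, pref))
print(render_for_display(record, pref, verbatim_requested=True))
```

Because the function returns a new string and Python strings are immutable, `record` is the same after every call, which mirrors the rule that the underlying extracted record is never altered.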

6. Modern documents — additional protections

ArchiveLM is intended for historical materials. Documents originating from the present era (post-1990, approximately) may contain personally identifiable information (PII), protected health information (PHI), credentials, financial account numbers, or other sensitive data whose handling is regulated under modern privacy and security law (e.g., GDPR, HIPAA, PCI-DSS, CCPA).

Users who upload such documents are responsible for ensuring that they have lawful authority to do so and that the platform's processing of those documents is compatible with applicable regulation. Where the platform detects probable modern credentials or active financial identifiers (API keys, password fields, credit card numbers, U.S. Social Security number patterns) at processing time, it will:

Mask the matching text in user-visible display
Flag the document for administrative review
Notify the uploading user
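As a rough sketch of the three steps above, the following shows how probable modern identifiers might be detected and masked in display text. Everything here is an assumption for illustration: the regular expressions, the Luhn check used to reduce false positives on digit runs, and the names `scan_modern_identifiers` and `ScanResult` are hypothetical and do not describe the platform's actual detectors.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative patterns only; production detectors would need tuning and review.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits
SECRET_RE = re.compile(r"(?i)\b(?:api[_-]?key|password)\s*[:=]\s*(\S+)")


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum over the digits in candidate (separators ignored)."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


@dataclass
class ScanResult:
    display_text: str        # masked copy for user-visible display
    findings: List[str]      # kinds of identifiers detected, in scan order
    flag_for_review: bool    # would trigger administrative review
    notify_uploader: bool    # would trigger a notice to the uploading user


def scan_modern_identifiers(text: str) -> ScanResult:
    findings: List[str] = []

    def mask_ssn(m: "re.Match[str]") -> str:
        findings.append("ssn")
        return "[MASKED ssn]"

    def mask_card(m: "re.Match[str]") -> str:
        # Only mask digit runs that pass the Luhn check.
        if not luhn_valid(m.group(0)):
            return m.group(0)
        findings.append("card")
        return "[MASKED card]"

    def mask_secret(m: "re.Match[str]") -> str:
        findings.append("secret")
        label = m.group(0)[: m.start(1) - m.start(0)]  # keep "password: "
        return label + "[MASKED secret]"

    display = SSN_RE.sub(mask_ssn, text)
    display = CARD_RE.sub(mask_card, display)
    display = SECRET_RE.sub(mask_secret, display)
    found = bool(findings)
    return ScanResult(display, findings, found, found)


sample = "SSN 078-05-1120, card 4111 1111 1111 1111, password: hunter2"
print(scan_modern_identifiers(sample).display_text)
```

The sample values are well-known test or publicly invalidated numbers, not live identifiers. Only the display copy is masked in this sketch; how flagging and notification are actually delivered is outside its scope.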

The platform reserves the right to refuse, suspend, or delete any document containing regulated modern data, or any account used to process such data, outside of the platform's intended scholarly use case (see the Acceptable Use Policy).

7. Reporting concerns

Researchers, members of communities depicted in archived materials, and the public at large are welcome to raise concerns about specific content, contextual framing, or platform behavior. Send concerns to hello@archivelm.com with the document or article URL, the nature of the concern, and any context you wish to share.

We will not remove primary-source extractions on the basis that the source content is offensive — that is incompatible with archival preservation. We will:

Review the contextual warning attached to the document and update it if appropriate
Review whether AI-generated outputs about the document violate Section 3 of this policy and correct them if they do
Consider whether additional opt-in masking patterns should be added
Reply substantively within a reasonable time

Where a primary-source extraction is alleged to violate copyright or applicable law, please instead use the procedure in our DMCA / Takedown Policy.

8. Changes to this policy

Material changes will be posted on this page with an updated date. Where they affect users substantively (new mandatory warnings, expanded refusal categories, etc.), active users will also be notified by email.

Version: 1.0
Maintained by: Michael De La Guera
Contact: hello@archivelm.com