A complete digitization and research platform for university special collections — from OCR and semantic search to AI-generated research tools, researcher workspaces, and branded public access portals.
University special collections face a compound problem: unique, irreplaceable holdings with broad research value, chronic understaffing, digitization budgets that can't scale with collection size, and researcher expectations that have been set by Google's instant search. Most collections have a digitization strategy that ends at the scan — producing a folder of TIFF files and a spreadsheet inventory that researchers must navigate manually. The value of the collection is locked behind physical access appointments and finding aid literacy that new researchers don't have.
Special collections are frequently the primary justification for institutional investment in rare materials. A digitized, semantically searchable collection with AI research tools demonstrates ongoing research value to administrators, produces citation-rich scholarship, and supports grant applications to NEH, IMLS, and Mellon. Researcher access via a public portal extends the collection's reach beyond the physical campus and generates the usage statistics that support future funding arguments.
Multi-pipeline routing handles the heterogeneous document types typical of special collections — newspapers, correspondence, pamphlets, institutional records, and bound volumes each routed to the appropriate OCR pipeline automatically
Research Lab provides AI-generated research tools over selected collections: summaries, chronological timelines, entity maps (people, places, organizations), key theme analysis, and structured data tables
Researcher Workspace allows authenticated users to save articles, add inline notes, organize into collections, and export citations for direct use in scholarship
Branded public institutional portal at /portal/[slug] with custom name and branding — publish selected holdings to anonymous public researchers without requiring account creation
ALTO/XML v4 export for integration with IIIF viewers, DPLA, Europeana, and institutional repository systems like DSpace, Fedora, or Islandora
Self-healing verification (patent pending) ensures extraction quality meets citation-grade standards expected in academic publishing
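As a rough illustration of the multi-pipeline routing idea, the sketch below dispatches each document to a handler chosen by its type. Everything here is hypothetical: the `Document` class, the pipeline function names, and the assumption that a document type label is already known are illustrative only and do not describe ArchiveLM's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Document:
    doc_id: str
    doc_type: str  # hypothetical label, e.g. "newspaper" or "correspondence"


# Hypothetical pipeline handlers; each just reports which pipeline it represents.
def newspaper_pipeline(doc: Document) -> str:
    return f"{doc.doc_id} -> newspaper pipeline"


def correspondence_pipeline(doc: Document) -> str:
    return f"{doc.doc_id} -> correspondence pipeline"


def bound_volume_pipeline(doc: Document) -> str:
    return f"{doc.doc_id} -> bound volume pipeline"


def general_pipeline(doc: Document) -> str:
    return f"{doc.doc_id} -> general pipeline"


# Dispatch table mapping a document type to its handler (assumed mapping).
PIPELINES: Dict[str, Callable[[Document], str]] = {
    "newspaper": newspaper_pipeline,
    "correspondence": correspondence_pipeline,
    "bound_volume": bound_volume_pipeline,
}


def route(doc: Document) -> str:
    """Send a document to the handler for its type, falling back to a general one."""
    handler = PIPELINES.get(doc.doc_type, general_pipeline)
    return handler(doc)


if __name__ == "__main__":
    print(route(Document("student-paper-1962", "newspaper")))
    print(route(Document("pamphlet-0042", "pamphlet")))
```

In this sketch, unrecognized types such as "pamphlet" fall through to the general handler rather than failing, which is one reasonable design choice for mixed collections.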
University centennial project digitizing 60 years of student publications for a public-access digital exhibition
Digital humanities center processing a donated collection of 19th-century correspondence for a faculty research project on social networks
Special collections department making its colonial-era pamphlet collection searchable for the first time, with Research Lab analysis generating a thematic finding aid
Library consortium running a multi-institution newspaper digitization program with a shared public portal and unified search across collections
University special collections typically operate on the Institution tier ($499/month, unlimited pages) with the public portal feature; smaller departments or pilot projects start on the Professional tier ($149/month).
ArchiveLM is in private beta. We review each request and typically respond within 1–3 business days.
Approved accounts receive hands-on onboarding support to validate results on your own documents.
ALTO/XML v4 export is the primary integration format — it's the standard accepted by most institutional repository systems, IIIF viewers, and aggregators like DPLA and Europeana. JSON export provides full structured data for custom integrations. API access is available on the Institution tier for programmatic workflow integration.
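For teams planning a custom integration, the following minimal sketch shows how an ALTO v4 file could be read with Python's standard library. It relies only on the public ALTO v4 schema (the `TextLine` and `String` elements and the `CONTENT` and `WC` attributes). The sample document, function names, and the 0.9 threshold are made up for illustration and are not taken from an actual ArchiveLM export.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the ALTO v4 schema.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hand-written sample for illustration; a real export would be much larger.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1">
            <String CONTENT="Student" WC="0.98"/>
            <SP/>
            <String CONTENT="Senate" WC="0.95"/>
          </TextLine>
          <TextLine ID="l2">
            <String CONTENT="meets" WC="0.97"/>
            <SP/>
            <String CONTENT="Friday" WC="0.62"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def alto_lines(xml_text: str) -> list[str]:
    """Return the text of each TextLine, joining its String elements with spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines


def low_confidence_words(xml_text: str, threshold: float) -> list[str]:
    """Return words whose WC (word confidence) attribute is below the threshold."""
    root = ET.fromstring(xml_text)
    flagged = []
    for s in root.iterfind(".//alto:String", ALTO_NS):
        wc = s.get("WC")
        if wc is not None and float(wc) < threshold:
            flagged.append(s.get("CONTENT", ""))
    return flagged


if __name__ == "__main__":
    print(alto_lines(SAMPLE_ALTO))
    print(low_confidence_words(SAMPLE_ALTO, 0.9))
```

A reader like this could feed line text into a repository's full-text index, or flag low-confidence words for manual review before ingest.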
Yes. The public portal at /portal/[your-slug] provides anonymous search and article access without account creation. Researchers who need to save items and organize their research can create a free account. Authentication and access control remain with the institution.
ArchiveLM produces outputs consistent with NEH and IMLS digitization standards: ALTO/XML v4 for text encoding, searchable PDF for access copies, and structured metadata in JSON format. The platform's self-healing verification documentation provides accuracy metrics that can be included in grant reports. Specific compliance questions should be reviewed against the current grant program requirements.
The Research Lab generates five types of AI-powered research outputs over a selected corpus or collection: (1) Summary Report — a structured narrative overview of the collection's main themes and coverage; (2) Timeline — chronologically ordered events and developments extracted from the documents; (3) Entity Map — all named people, organizations, and places extracted and cross-referenced; (4) Key Themes — thematic analysis identifying recurring concepts and their frequency; (5) Data Table — structured tabular extraction of specific fields across multiple documents. All outputs cite the source documents they drew from.
Layout-aware OCR that reads historical broadsheets as they were typeset — column by column, ad by ad — and makes every article semantically searchable.
Purpose-built pipeline for Hansard and legislative records — extracts speaker-attributed debates, committee proceedings, and legislative journals into a fully searchable, citable corpus.
OCR and semantic search platform built on Spanish-language Latin American primary sources — colonial-era typography, 19th-century broadsheets, and archaic orthography handled natively.