Air-Gapped AI Deployment: The Complete Guide for Secure Enterprises

Most guides to "on-premise LLM deployment" assume your servers still talk to the internet. They walk through GPU selection, inference engines, and scaling strategies (all useful), but none of it matters if your compliance framework requires true network isolation. Defense contractors working in SCIFs, healthcare systems processing patient records under HIPAA, and financial institutions handling insider information all need something different: AI that works with zero external connectivity.

This guide covers what an air-gapped AI deployment actually requires, from hardware and software architecture to compliance mapping, update strategies, and vendor evaluation. If your data cannot leave the building, this is the reference you need.

What Air-Gapped AI Actually Means (and Why "On-Premise" Is Not Enough)

The term "on-premise" has become a catch-all that obscures critical security differences. In practice, there are four distinct deployment models, and the gap between them is wider than most vendor marketing suggests:

  • Cloud SaaS: Your data travels to the vendor's infrastructure over the public internet. Processing happens on shared or dedicated cloud servers. This is the fastest to deploy and the least controlled.
  • Private Cloud / VPC: A dedicated instance runs in your cloud account (AWS, Azure, GCP) or the vendor's cloud as a single-tenant environment. Your data stays within a defined network boundary, but the servers still have internet connectivity for updates, licensing, and telemetry.
  • On-Premise (connected): Hardware sits in your data center. You control physical access. But the servers maintain outbound internet connections for software updates, model downloads, license verification, and often vendor support tunnels. Most "on-premise" guides describe this model.
  • Air-Gapped (fully disconnected): No internet. No outbound connections. No DNS resolution. No NTP sync to external servers. Every component of the AI stack runs locally, and updates arrive on physical media. This is the model required for classified environments, ITAR-controlled data, and the strictest compliance postures.

The distinction matters because compliance auditors, procurement officers, and security teams evaluate these models differently. An "on-premise" deployment that phones home for license checks or sends anonymized telemetry to the vendor may still violate ITAR controls or fail a CMMC assessment. When your regulatory framework says "no data egress," it means none.

Key distinction: Air-gapped does not just mean "no cloud." It means no network path exists between the AI system and any external network. Even a firewall rule allowing outbound HTTPS to a licensing server disqualifies the deployment from true air-gap status in most compliance frameworks.

Three Deployment Models Compared: VPC, On-Premise, and Air-Gapped

Choosing between deployment models involves trade-offs across security, operational overhead, compliance coverage, and cost. The right answer depends on what data you process and which regulatory frameworks govern your organization.

| Dimension | Private Cloud / VPC | On-Premise (Connected) | Air-Gapped (Offline) |
| --- | --- | --- | --- |
| Network requirement | Internet via PrivateLink or VPN | Outbound internet for updates | None. Fully disconnected. |
| Data egress risk | Low (within your cloud account) | Minimal (outbound only for updates) | Zero. No network path exists. |
| Update mechanism | Automated via secure channel | Pull from vendor over HTTPS | Encrypted physical media |
| Compliance coverage | SOC 2, HIPAA, GDPR | SOC 2, HIPAA, GDPR, partial CMMC | ITAR, CMMC L3, FedRAMP High, IL4/IL5, CJIS |
| Time to deploy | 1-2 weeks | 2-6 weeks | 4-12 weeks |
| IT staff required | Cloud ops team | On-site sysadmin + GPU expertise | Cleared personnel with physical access |
| Estimated hardware cost | $0 (cloud compute billing) | $8,000-$25,000 | $8,000-$25,000 + secure facility |

For organizations evaluating these options in detail, our self-hosted AI deployment page covers each model with technical specifications, supported operating systems, and delivery methods.

The Compliance Case: Which Regulations Require Air-Gapping

Not every organization needs a fully disconnected deployment. But several regulatory frameworks either mandate or strongly favor air-gapped infrastructure for AI workloads that process controlled or sensitive data.

ITAR (International Traffic in Arms Regulations)

ITAR controls the export of defense-related articles and services. Technical data subject to ITAR cannot be stored on or transmitted through systems accessible to foreign nationals or foreign-hosted infrastructure. For defense contractors using AI to analyze technical manuals, engineering drawings, or maintenance records, an air-gapped deployment ensures that controlled data never traverses a network path that could reach non-U.S. infrastructure. Cloud deployments, even on AWS GovCloud, require careful review of who has administrative access.

CMMC 2.0 (Cybersecurity Maturity Model Certification)

CMMC Level 3 requires implementation of controls from NIST SP 800-171 plus additional enhanced security requirements for Controlled Unclassified Information (CUI). While CMMC does not explicitly require air-gapping, the controls around media protection (MP), system and communications protection (SC), and access control (AC) are significantly easier to satisfy when the AI system has no external network connectivity. Many defense primes treat air-gapping as the default for CUI-processing AI tools.

FedRAMP High / DoD IL4-IL5

FedRAMP High baseline includes 421 controls, many of which address network boundary protection, data flow enforcement, and continuous monitoring. DoD Impact Level 4 (IL4) covers CUI, and IL5 covers CUI and mission-critical data. For AI systems processing documents at these impact levels, air-gapped deployment eliminates entire categories of control requirements related to network security monitoring and boundary defense, because there is no boundary to defend.

HIPAA

HIPAA does not require air-gapping. Many healthcare organizations process PHI with cloud-hosted AI tools under a BAA. However, large health systems, research hospitals, and organizations with particularly sensitive datasets (psychiatric records, substance abuse treatment, genomic data) often choose air-gapped deployment for maximum control. For a detailed analysis of deployment models and BAA requirements, see our HIPAA-compliant AI document processing guide.

CJIS (Criminal Justice Information Services)

The CJIS Security Policy governs access to criminal justice data, including National Crime Information Center (NCIC) records, fingerprint data, and case files. AI systems processing CJIS data must operate within CJIS-compliant enclaves with strict access controls. Air-gapped deployment is not explicitly mandated, but it is the most straightforward path to satisfying the network security, encryption, and access control requirements.

Practical note: Even when a regulation does not explicitly require air-gapping, the audit burden of proving network security for a connected system can exceed the operational burden of maintaining a disconnected one. Many organizations choose air-gapping not because it is required, but because it simplifies their compliance story.

Architecture of an Air-Gapped Document AI Stack

A functional air-gapped AI system is more than an LLM running on a GPU. Document intelligence requires a full pipeline: ingestion, parsing, OCR, chunking, embedding, vector storage, retrieval, and inference. Every component must run locally with no external dependencies.

The Complete Offline Stack

Here is a reference architecture for an air-gapped document AI deployment. Every component runs on local hardware with zero internet connectivity:

  • LLM Inference: Llama 3 (8B or 70B) served via vLLM or llama.cpp. Open-source models with no licensing callbacks. Supports chat, summarization, extraction, and question-answering over retrieved context.
  • Embedding Generation: Local Voyage or E5 models generate vector representations of document chunks. These models run on the same GPU or a dedicated embedding server, depending on throughput requirements.
  • Vector Database: Self-hosted Qdrant or Milvus stores and indexes embeddings for semantic search. Both support single-node deployment suitable for air-gapped environments without requiring distributed cluster management.
  • Document Parsing and OCR: Apache Tika handles format detection and text extraction for PDFs, Word documents, spreadsheets, and images. Tesseract OCR processes scanned documents and image-based PDFs. Both are open-source with no external dependencies.
  • Orchestration Layer: A containerized application (Docker or Podman) coordinates the pipeline: receives documents, routes them through parsing and OCR, chunks the text, generates embeddings, stores vectors, and serves queries through the LLM with retrieved context.

How Data Flows (Everything Stays Inside)

The critical property of this architecture is that data never leaves the security boundary at any stage:

  1. Document upload: User uploads a PDF, Word document, or image through the local web interface or API.
  2. Parsing: Tika extracts text. Tesseract handles scanned pages. All processing happens in local memory.
  3. Chunking: The extracted text is split into semantically meaningful segments (typically 512-1024 tokens).
  4. Embedding: Each chunk is converted to a vector using the local embedding model. No API calls.
  5. Storage: Vectors are indexed in the local Qdrant/Milvus instance. Original documents are stored on encrypted local disk.
  6. Query: When a user asks a question, the query is embedded locally, relevant chunks are retrieved from the vector database, and the LLM generates a response grounded in the retrieved context.
  7. Response: The answer is returned to the user with source citations. Nothing is logged externally.

At no point does any data (the original document, extracted text, embeddings, queries, or responses) leave the local system.
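
The flow above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the `embed` function is a deterministic hash-based stand-in for a real local embedding model, and the in-memory `VectorStore` stands in for a self-hosted Qdrant or Milvus instance. What it demonstrates is the property that matters: every step runs offline with no API calls.

```python
import hashlib
import math

def chunk(text: str, size: int = 512) -> list[str]:
    # Split extracted text into fixed-size word segments. Real pipelines
    # chunk on semantic boundaries, typically 512-1024 tokens per chunk.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, dims: int = 64) -> list[float]:
    # Stand-in for a local embedding model: a normalized bag-of-words
    # hash vector. Deterministic and fully offline, like the real thing.
    vec = [0.0] * dims
    for word in text.lower().split():
        h = int(hashlib.sha256(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    # Stand-in for the self-hosted vector database (Qdrant/Milvus).
    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self.rows.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Cosine similarity over unit vectors reduces to a dot product.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: -sum(a * b for a, b in zip(q, r[0])))
        return [text for _, text in ranked[:k]]

# Ingest: parse -> chunk -> embed -> store (steps 2-5)
store = VectorStore()
for segment in chunk("The maintenance interval for the hydraulic pump is 500 hours."):
    store.add(segment)

# Query: embed locally, retrieve context, hand it to the local LLM (step 6)
context = store.search("hydraulic pump maintenance interval")
```

In the real stack, `context` would be injected into the LLM prompt served by vLLM or llama.cpp, and the response returned with citations back to the stored source documents.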

Hardware Requirements

| Configuration | GPU | RAM | Storage | Model Supported | Typical Use Case |
| --- | --- | --- | --- | --- | --- |
| Standard | NVIDIA A10G (24GB VRAM) | 64GB | 1TB NVMe | Llama 3 8B | Departmental document search, contract review, summarization |
| Analyst | NVIDIA A100 (80GB) or 2x A6000 | 128GB | 2TB NVMe | Llama 3 70B | Enterprise reasoning, multi-document analysis, complex extraction |

Both configurations support Ubuntu 22.04 LTS and Red Hat Enterprise Linux (RHEL). The entire stack runs in Docker containers (or Podman for environments that prohibit the Docker daemon). For detailed hardware specifications and pricing, see our hardware requirements breakdown.
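
A container orchestration file for this stack might look like the sketch below. It is illustrative only: image tags are examples, the `local/doc-ai` orchestration image is a hypothetical placeholder, and in a genuine air-gapped deployment every image is loaded from the delivery media with `docker load` rather than pulled from a registry.

```yaml
# Illustrative compose file for the air-gapped document AI stack.
services:
  llm:
    image: vllm/vllm-openai:latest        # serves Llama 3 over an OpenAI-style local API
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - /opt/models:/models               # weights arrive on encrypted physical media
    command: ["--model", "/models/llama-3-8b"]
  vectors:
    image: qdrant/qdrant:latest           # self-hosted vector database, single node
    volumes:
      - /opt/qdrant:/qdrant/storage
  tika:
    image: apache/tika:latest             # document parsing; Tesseract handles OCR
  app:
    image: local/doc-ai:latest            # hypothetical orchestration layer
    depends_on: [llm, vectors, tika]
    ports:
      - "8080:8080"                       # local web UI, reachable only inside the enclave
```

Note that no service publishes anything beyond the local interface, and nothing in the file references an external registry at runtime.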

Use Cases: Who Needs Air-Gapped AI

Air-gapped deployment is not for everyone. The operational overhead is real: physical media updates, manual patching, and limited vendor support. But for organizations handling the following data types, it is the only option that fully satisfies their security and compliance requirements.

Defense Contractors and SCIFs

A defense contractor analyzing ITAR-controlled technical data for a weapons system cannot send that data to any external service, period. In a SCIF (Sensitive Compartmented Information Facility), the AI system must run on hardware inside the facility with no network connectivity to the outside. Use cases include analyzing engineering specifications, searching maintenance manuals, extracting data from technical orders, and summarizing test reports. The system arrives pre-loaded on encrypted drives, is installed by cleared personnel, and receives updates through the same physical channel.

Healthcare and Life Sciences

While HIPAA allows cloud processing with a BAA, some healthcare organizations choose air-gapping for their most sensitive workloads: psychiatric records, substance abuse treatment data, genomic research, and clinical trial documents. A hospital system running an air-gapped AI can process patient records, generate clinical note summaries, and extract data from lab reports without any data leaving the hospital network. This eliminates entire categories of breach risk and simplifies the compliance narrative for auditors.

Financial Institutions

Investment banks processing M&A due diligence documents, insider information, and trading strategies face strict information barriers. An air-gapped AI system in the deal room can analyze thousands of documents, surface risk factors, and generate summaries without any data crossing the Chinese wall. Similarly, regulatory compliance teams use offline AI to review years of communications for potential violations, a task where data leakage could trigger SEC enforcement actions.

Critical Infrastructure

Energy companies, water utilities, and transportation operators increasingly use AI to analyze operational documents, maintenance logs, and incident reports. These organizations often maintain strict OT/IT segmentation (operational technology separated from information technology networks). An air-gapped AI deployment that lives on the OT side can process operational documents without creating a network bridge that violates segmentation requirements.

Legal Discovery and Privileged Documents

Law firms handling large-scale discovery or privileged attorney-client communications need AI that processes documents without exposing them to any third party. An air-gapped system can ingest a million-page production set, build a searchable index, and answer questions about the documents, all within the firm's security perimeter. For more on how private AI serves legal teams, see our guide to private AI adoption in the enterprise.

Keeping an Air-Gapped System Current: Updates, Patching, and Model Refresh

The biggest operational challenge with air-gapped deployments is not the initial installation; it is keeping the system current. Without internet connectivity, every update requires deliberate, physical intervention. Organizations that plan for this upfront avoid the common failure mode of deploying a system and letting it slowly become outdated and insecure.

The Physical Media Update Cycle

Updates are packaged on encrypted drives (typically AES-256 encrypted USB drives or portable NVMe drives) and delivered through secure courier or hand-carry by cleared personnel. Each update package includes:

  • Model weights: New or updated LLM and embedding model files
  • Software patches: Container image updates for the application stack
  • Security fixes: OS-level CVE patches, library updates, and dependency upgrades
  • Configuration updates: Tuned parameters, prompt templates, or new document processing rules

Each package is cryptographically signed by the vendor. The receiving system verifies the signature and computes checksums before applying any updates. If verification fails, the update is rejected and the incident is logged for investigation.
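
The checksum-verification step can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the `MANIFEST` file format (one `<sha256>  <filename>` line per entry) is hypothetical, and the vendor signature check that must precede it (e.g. verifying a detached GPG signature on the manifest) is omitted here.

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    # Stream the file in 1 MiB blocks so multi-gigabyte model weights
    # do not need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def verify_package(package_dir: Path) -> bool:
    # Hypothetical MANIFEST format: "<sha256>  <filename>" per line,
    # produced by the vendor and carried on the same encrypted media.
    manifest = package_dir / "MANIFEST"
    for line in manifest.read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        target = package_dir / name
        if not target.exists() or sha256(target) != expected:
            # Any mismatch rejects the whole update; log for investigation.
            return False
    return True
```

In practice the signature on the manifest is checked first, so that a tampered manifest cannot vouch for tampered files.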

Model Versioning Without Internet

Air-gapped environments cannot download model updates from Hugging Face or pull container images from Docker Hub. Instead, model versions are tracked in a local registry. Best practices include:

  • Side-by-side deployment: Install the new model alongside the existing one. Run validation tests against a known document set before switching production traffic.
  • Rollback capability: Keep the previous model version available for immediate rollback if the new version introduces regressions.
  • Validation suite: Maintain a set of test documents with expected outputs. Run automated comparisons after each model update to catch accuracy changes.
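
The validation-suite comparison can be automated with a short script. This is a sketch under assumptions: `ask_model` is a placeholder for whatever inference call your stack exposes, the golden questions and the 90% promotion threshold are illustrative, and real suites use richer scoring than substring matching.

```python
from typing import Callable

# Known documents and expected answers, maintained alongside the deployment.
GOLDEN_SET = [
    ("What is the pump maintenance interval?", "500 hours"),
    ("Which standard governs CUI handling controls?", "NIST SP 800-171"),
]

def validate(ask_model: Callable[[str], str], threshold: float = 0.9) -> bool:
    # Run the suite against a candidate model; return True if it may be
    # promoted to production. The previous model stays installed for
    # immediate rollback if this returns False or regressions surface later.
    passed = sum(1 for q, expected in GOLDEN_SET if expected in ask_model(q))
    return passed / len(GOLDEN_SET) >= threshold
```

Running the same suite against the current and candidate models side by side turns "does the new model regress?" into a pass/fail answer rather than a judgment call made under update-day pressure.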

Recommended Update Cadence

  • Security patches: Monthly, or as critical CVEs are published
  • Model refresh: Quarterly, aligned with open-source model release cycles
  • Full platform updates: Semi-annually, including major version upgrades and new capabilities

Common mistake: Deploying an air-gapped system and neglecting updates. Model quality improves rapidly in the open-source ecosystem. A system running a model from 12 months ago is leaving significant capability on the table. Build the update cycle into your operations plan from day one, including budget for cleared courier services and technician time.

Evaluating Air-Gapped AI Vendors: 10 Questions to Ask

Not every vendor claiming "on-premise" support can actually deliver a fully air-gapped deployment. These questions separate vendors with genuine offline capability from those offering connected on-premise with a marketing overlay.

  1. How is the software delivered? Look for encrypted physical media delivery (USB dongle, portable drives). If the vendor says "download from our portal," the product was not designed for air-gapped environments.
  2. Does the software phone home? Any licensing check, telemetry ping, or automatic update mechanism that requires internet connectivity disqualifies the product from air-gapped use. Ask for a network traffic analysis or architecture diagram showing zero outbound connections.
  3. Which LLM models are included? The vendor should specify exactly which models ship with the product (e.g., Llama 3 8B, Llama 3 70B, Mistral). Ask about model licensing terms for offline use.
  4. What is the complete software stack? You need the full list: inference engine, embedding model, vector database, OCR pipeline, document parser, and orchestration layer. Each component must run locally.
  5. What hardware is required? The vendor should provide specific GPU, RAM, and storage requirements for each supported model size. Vague answers like "a modern server" indicate the product has not been validated for air-gapped deployment.
  6. How are updates delivered? Expect a defined process: encrypted media, cryptographic signing, checksum verification, and rollback capability. Ask about update frequency and what each update includes.
  7. What compliance certifications apply? Look for SOC 2 Type II, and ask specifically about ITAR, CMMC, or FedRAMP applicability. Certifications for the vendor's cloud product do not automatically extend to their air-gapped offering.
  8. How is support provided? In a disconnected environment, remote support tunnels do not work. Ask about on-site support procedures, documentation quality, and support hours for cleared facilities.
  9. Can cleared personnel install the system? For SCIF and classified deployments, your cleared staff may need to perform the installation. The vendor should provide detailed installation guides that your team can execute independently.
  10. What is the pricing model? Air-gapped deployments typically use site licensing (annual or perpetual) rather than per-user or per-token pricing. Clarify what the license covers: software only, software plus updates, or software plus updates plus on-site support.

Zedly's self-hosted AI platform supports all three deployment models: private cloud (VPC), on-premise (Docker/Helm), and fully air-gapped (offline delivery via physical media). The air-gapped configuration includes local Llama 3 inference, local embedding models, self-hosted vector storage, and Apache Tika/Tesseract for document processing, with zero internet dependency.

Getting Started: From Evaluation to Deployment

Most organizations follow a phased approach to air-gapped AI deployment:

  1. Pilot on connected infrastructure. Start with a VPC or connected on-premise deployment to validate the platform against your document types and use cases. This proves value before investing in the air-gapped configuration.
  2. Validate compliance requirements. Work with your compliance team to confirm which regulatory framework applies and whether true air-gapping is required or recommended. Document the decision for auditors.
  3. Procure hardware. Order GPU servers that meet the vendor's specifications. Factor in 8-12 week lead times for enterprise GPU hardware.
  4. Plan the update cycle. Before deployment, establish the physical media delivery process, designate cleared personnel for updates, and budget for ongoing maintenance.
  5. Deploy and test. Install the air-gapped system, run the validation suite against known documents, and confirm that no network connectivity exists. Have your security team verify the air gap independently.
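
One piece of the independent air-gap verification in step 5 can be scripted. The sketch below is a minimal probe, not a complete audit: it attempts DNS resolution and an outbound TCP connect and expects both to fail on a truly disconnected host. The probe hostnames are illustrative; a real verification also reviews firewall rules, switch configurations, and physical cabling.

```python
import socket

def egress_blocked(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    # On an air-gapped host, both name resolution and outbound TCP
    # connects must fail. Returns True when no egress path succeeded.
    try:
        addr = socket.getaddrinfo(host, port)[0][4]
    except socket.gaierror:
        return True  # no DNS resolution: expected on an air-gapped system
    try:
        with socket.create_connection(addr[:2], timeout=timeout):
            return False  # a connection succeeded: the air gap is broken
    except OSError:
        return True  # resolved but unreachable: still no egress

# Probe a few well-known endpoints; every probe must report blocked.
probes = ["example.com", "huggingface.co", "registry-1.docker.io"]
results = {host: egress_blocked(host) for host in probes}
```

A single `False` in `results` is grounds to halt deployment until the network path is found and removed.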

The organizations that succeed with air-gapped AI treat it as an operational commitment, not a one-time installation. The technology is mature, the open-source model ecosystem is strong, and the compliance benefits are clear. What separates successful deployments from abandoned ones is the planning that happens before the hardware arrives. For a broader look at the commercial landscape and a buyer's checklist covering stack components, vendor red flags, and compliance mapping, see our self-hosted document AI buyer's guide.

Deploying AI for government or defense?

Zedly AI supports air-gapped, VPC, and managed cloud deployment. See use cases for Government & Defense AI, or compare deployment options: Air-Gapped / On-Premise · VPC · Secure Cloud.

Comparing enterprise AI platforms?

See a detailed breakdown of deployment, compliance, pricing, and document features.

Ready to get started?

Private-by-design document analysis with strict retention controls.