Private AI for confidential files — what to look for
Many 'free' AI tools may use your data for training by default. Here's how to keep your confidential documents private when using AI.
The default state for most AI tools: your inputs may be used for training, support, or analytics unless you explicitly opt out. For confidential documents — client files, internal strategy, PHI, attorney-client privileged material — that default is unacceptable.
Here's what to actually look for, and how SeekFiles AI's privacy model compares.
The privacy spectrum, ranked
Worst: free-tier general chatbots
- ChatGPT free, Claude free, Gemini free: by default, inputs may be used for training. Opting out requires a per-account setting, and you should periodically verify that the opt-out is still applied.
- Risk: the document you upload could end up in a future model's training data. The vendor is unlikely to confirm it, and you cannot prove it did not happen.
Middling: paid consumer tiers
- ChatGPT Plus, Claude Pro, Gemini Advanced: a training opt-out is usually available, but defaults and caveats vary by vendor and change over time. Always read the current data-use page.
- Risk: still consumer-grade contracts. Limited legal recourse if data is misused.
Better: business / team tiers
- ChatGPT Business, Claude Team, Gemini for Workspace: contractual no-training clauses. Standard for business use.
- Risk: still multi-tenant; data lives on the vendor's infrastructure. Your privacy depends on their security.
Best: dedicated business/enterprise + on-prem options
- Vendors with explicit no-training contracts + audit rights + per-tenant deployment options.
- Risk: mainly operational — vendor stability, key management, breach response.
Outlier: self-hosted local models
- Run Llama / Mistral / DeepSeek locally on your own hardware. No data leaves your machine.
- Risk: model quality lags frontier providers by ~6–12 months; setup is real work; you own the infrastructure problem.
What SeekFiles AI offers
- Contractual no-training clause on the Pro / Business tiers. Your files never train any model.
- Per-tenant data isolation. Your files are scoped to your account; no cross-tenant retrieval.
- Encrypted at rest for all stored files.
- Configurable storage location for enterprise customers (use your own S3 / R2 / Spaces bucket).
- Audit logs showing every access to your files.
- No third-party retraining. OpenAI's API (for embeddings + chat) is configured with no-training enabled.
What we don't offer:
- Self-hosted / on-prem deployment (this would require a separate enterprise engagement).
- Customer-managed encryption keys (we manage the keys for encryption at rest today).
- A HIPAA Business Associate Agreement out of the box (a BAA is available on the Business tier for healthcare customers).
Questions to ask any AI vendor
- Is my data used for training? Where is that documented?
- Can I see an audit log of every access to my data?
- Where is my data stored geographically?
- Who at the vendor can access my data? Is there a process?
- What's your breach notification policy?
- Do you sign DPAs / BAAs? In what tier?
- Can I delete my data permanently? What's the retention?
- Is the embedding / LLM provider subject to the same restrictions?
The last question matters: if the vendor's underlying provider (e.g. OpenAI) trains on inputs, the vendor's own "no training" promise does not protect your data once it reaches that provider.
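One way to keep these questions from getting lost in email threads is to record the answers per vendor and flag the gaps. The sketch below is a minimal illustration only: the `VendorReview` class and the short question keys are hypothetical names invented for this example, not part of any product.

```python
from dataclasses import dataclass, field

# The eight questions from the list above, keyed by short hypothetical names.
QUESTIONS = {
    "training_use": "Is my data used for training? Where is that documented?",
    "audit_log": "Can I see an audit log of every access to my data?",
    "data_location": "Where is my data stored geographically?",
    "staff_access": "Who at the vendor can access my data? Is there a process?",
    "breach_policy": "What's your breach notification policy?",
    "dpa_baa": "Do you sign DPAs / BAAs? In what tier?",
    "deletion": "Can I delete my data permanently? What's the retention?",
    "subprocessor": "Is the embedding / LLM provider subject to the same restrictions?",
}


@dataclass
class VendorReview:
    """Hypothetical record of one vendor's answers to the questions above."""
    vendor: str
    # Maps a question key to the vendor's documented answer (ideally with a source).
    answers: dict = field(default_factory=dict)

    def unanswered(self) -> list:
        """Return the keys of questions with no recorded, non-blank answer."""
        return [key for key in QUESTIONS if not self.answers.get(key, "").strip()]

    def is_complete(self) -> bool:
        """True only when every question has a non-blank answer on file."""
        return not self.unanswered()
```

A review counts as complete only when every question has a written answer, so a vendor that stays silent on its embedding / LLM provider shows up as a gap rather than a pass.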
What you can do today
- Audit your team's AI tool usage. What free tools is everyone uploading client docs to?
- Set a policy. "Confidential = Business-tier-or-above AI only" or "No client docs in any AI without GC approval."
- Test the worst case. Pretend a competitor got everything you have fed into AI tools. What would they learn? If the answer would hurt you, you need stronger privacy controls.
- Read the actual ToS, not the marketing. Privacy claims are often softer than they sound.
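A policy like "Confidential = Business-tier-or-above AI only" is easier to enforce when it is written down as a rule a script can check. Here is a rough sketch, assuming the hosted tiers from the privacy spectrum can be ordered from weakest to strongest. The `Tier` enum, the `MIN_TIER` table, and the "public" and "internal" classifications are hypothetical examples; only the "confidential" rule comes from the policy wording above. Self-hosted models sit outside this ordering and would need their own rule.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hosted tiers from the privacy spectrum, ordered weakest to strongest."""
    FREE = 0
    PAID_CONSUMER = 1
    BUSINESS = 2
    ENTERPRISE = 3


# Hypothetical policy table: the minimum tier allowed per document classification.
# Only the "confidential" row mirrors the example policy in the text.
MIN_TIER = {
    "public": Tier.FREE,
    "internal": Tier.PAID_CONSUMER,
    "confidential": Tier.BUSINESS,
}


def may_upload(classification: str, tool_tier: Tier) -> bool:
    """Return True if policy allows this class of document in a tool of this tier.

    Unknown classifications are denied, so a missing label fails closed.
    """
    required = MIN_TIER.get(classification)
    if required is None:
        return False
    return tool_tier >= required
```

For example, `may_upload("confidential", Tier.FREE)` returns `False`, while `may_upload("confidential", Tier.BUSINESS)` returns `True`. Denying unknown classifications is a deliberate choice here: an unlabeled document is treated as not approved for any tool.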
The honest truth
No hosted AI service is perfectly private. Every model provider has some internal access for support, abuse detection, and (sometimes) training. The real question is how strong the contractual and technical controls are.
For most confidential business workflows, a paid Business-tier AI with a contract is appropriate. For the highest-confidentiality work (M&A, criminal defence, trade secrets), self-hosted is still the gold standard. For everything in between, due diligence on the vendor matters more than the model brand.