FAQ — ChunkIQ

ChunkIQ is an enterprise-grade unstructured data pipeline that automatically ingests documents from Microsoft 365 sources (SharePoint, Teams, OneNote, OneDrive), extracts text from all major file formats, chunks and embeds the content, and indexes it into Azure AI Search for powerful semantic search and RAG applications.

Getting started is simple:
1) Sign up and create a workspace.
2) Configure your Azure tenant credentials (App Registration, Storage Account, AI Search, and OpenAI).
3) Add a connector pointing to your SharePoint site, Teams channel, OneNote notebook, or OneDrive.
4) Run your first pipeline job.
Documents will be ingested, extracted, chunked, embedded, and indexed, typically within minutes.

ChunkIQ currently supports four Microsoft 365 sources: SharePoint document libraries, Microsoft Teams channel files, OneNote notebooks (including embedded attachments), and OneDrive (personal and shared drives). Additional sources are on the roadmap.

All data stays entirely within your own Azure tenant. ChunkIQ stores ingested files in your Azure Data Lake Storage Gen2 account, and indexes are created in your Azure AI Search instance. No data is sent to or stored on ChunkIQ servers — the platform only orchestrates the pipeline within your infrastructure.

Yes. ChunkIQ is designed with enterprise security in mind. All data remains in your Azure tenant with encryption at rest and in transit. Authentication uses Azure App Registrations with least-privilege permissions. The platform supports role-based access control for workspace members, and all API communication is over HTTPS.

ChunkIQ requires the following Azure services in your tenant: Azure Data Lake Storage Gen2 (for document storage), Azure AI Search (for indexing and search), Azure OpenAI Service (for generating embeddings), and an Azure App Registration (for authenticating with Microsoft Graph and Azure resources).
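As a rough illustration of the four prerequisites listed above, the sketch below bundles them into one settings object and reports anything left blank. The class name, field names, and example URLs are hypothetical and are not ChunkIQ's actual configuration schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical settings bundle; field names are illustrative only."""
    storage_account_url: str  # Azure Data Lake Storage Gen2 (document storage)
    search_endpoint: str      # Azure AI Search (indexing and search)
    openai_endpoint: str      # Azure OpenAI Service (embeddings)
    app_client_id: str        # Azure App Registration (client ID)
    app_tenant_id: str        # Azure App Registration (tenant ID)


def missing_settings(config: TenantConfig) -> list[str]:
    """Return the names of settings that are empty or only whitespace."""
    return [f.name for f in fields(config) if not getattr(config, f.name).strip()]


if __name__ == "__main__":
    config = TenantConfig(
        storage_account_url="https://example.dfs.core.windows.net",
        search_endpoint="https://example.search.windows.net",
        openai_endpoint="",  # left blank on purpose
        app_client_id="00000000-0000-0000-0000-000000000000",
        app_tenant_id="11111111-1111-1111-1111-111111111111",
    )
    print(missing_settings(config))
```

A check like this can run before the first pipeline job so that a missing service is reported up front rather than partway through ingestion.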

ChunkIQ supports all major document formats natively — no external OCR service required. Supported formats include: PDF, Word (.docx, .doc), Excel (.xlsx, .xls, .xlsm), PowerPoint (.pptx, .ppt), OneNote pages, CSV, JSON, plain text, and Markdown. Each format has a dedicated extractor built into the pipeline.

After text is extracted from documents, it is split into semantic chunks with configurable size and overlap settings. Each chunk retains full provenance metadata (source, file path, page number, etc.). Chunks are then embedded using Azure OpenAI to produce 1,536-dimension vectors, which are indexed in Azure AI Search alongside a BM25 full-text index for hybrid retrieval.
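To make the size and overlap settings concrete, here is a minimal sketch of a fixed-size sliding-window chunker that attaches provenance metadata to every chunk. It is a simplification: ChunkIQ describes its chunks as semantic, whereas this example splits purely by character count, and the `Chunk` fields and `chunk_text` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str        # e.g. "sharepoint"
    file_path: str     # path of the originating file
    page_number: int   # page the text was extracted from
    start_offset: int  # character offset of the chunk within the page text


def chunk_text(text: str, *, size: int, overlap: int,
               source: str, file_path: str, page_number: int) -> list[Chunk]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(text), step):
        chunks.append(Chunk(text[start:start + size], source, file_path,
                            page_number, start))
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", size=4, overlap=2,
                            source="sharepoint", file_path="/docs/a.txt",
                            page_number=1):
        print(chunk.start_offset, chunk.text)
```

Larger overlap values repeat more context between neighbouring chunks, which tends to help retrieval at chunk boundaries at the cost of more vectors to embed and store.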

Yes. ChunkIQ uses content hashing to detect changes. When you re-run a pipeline, only new or modified files are re-processed. Unchanged documents are skipped, which keeps processing costs low and your index up to date without unnecessary reprocessing.
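The idea behind hash-based change detection can be sketched in a few lines. This example assumes SHA-256 and an in-memory dictionary of previously seen hashes; the source does not say which hash function or state store ChunkIQ uses, and `select_changed` is a hypothetical helper.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def select_changed(files: dict[str, bytes],
                   known_hashes: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the paths that are new or modified, plus the updated hash table."""
    to_process: list[str] = []
    updated = dict(known_hashes)  # copy so the caller's table is not mutated
    for path, data in files.items():
        digest = content_hash(data)
        if known_hashes.get(path) != digest:
            to_process.append(path)
            updated[path] = digest
    return to_process, updated


if __name__ == "__main__":
    first_run = {"a.docx": b"version 1", "b.pdf": b"report"}
    changed, state = select_changed(first_run, {})
    print(changed)  # both files are new on the first run

    second_run = {"a.docx": b"version 2", "b.pdf": b"report"}
    changed, state = select_changed(second_run, state)
    print(changed)  # only the modified file is selected
```

Because the comparison uses file contents rather than timestamps, a file that is re-saved without changes is still skipped.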

Yes. ChunkIQ automatically detects and extracts files embedded within Word, Excel, and PowerPoint documents. These attachments are processed alongside their parent document with full lineage tracking, so you can always trace a chunk back to its original source.
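For readers curious how embedded files can be located, the sketch below relies on the fact that modern Office files (.docx, .xlsx, .pptx) are ZIP packages in which embedded objects are conventionally stored under an `embeddings/` folder. This is an illustration of that packaging convention, not a description of ChunkIQ's extractor; `list_embedded_parts` and the record layout are hypothetical, and legacy binary formats such as .doc are not ZIP packages and are not handled here.

```python
import io
import zipfile

# Conventional locations of embedded objects inside Office Open XML packages.
EMBEDDING_DIRS = ("word/embeddings/", "xl/embeddings/", "ppt/embeddings/")


def list_embedded_parts(document_bytes: bytes, parent_path: str) -> list[dict]:
    """List embedded parts in an OOXML package, each tagged with its parent file."""
    records: list[dict] = []
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as package:
        for name in package.namelist():
            if name.startswith(EMBEDDING_DIRS) and not name.endswith("/"):
                records.append({
                    "parent": parent_path,  # lineage: which document contained it
                    "part": name,           # location inside the package
                    "size": package.getinfo(name).file_size,
                })
    return records


if __name__ == "__main__":
    # Build a tiny stand-in package in memory instead of reading a real file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/embeddings/Microsoft_Excel_Sheet1.xlsx", b"fake")
    print(list_embedded_parts(buffer.getvalue(), "/docs/report.docx"))
```

Keeping the parent path on every record is what allows a chunk from an embedded spreadsheet to be traced back to the document that contained it.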

ChunkIQ offers flexible plans to match your needs: a Free tier for evaluation with limited document processing, a Starter plan for small teams and pilot projects, and an Enterprise plan with unlimited processing, priority support, and dedicated onboarding. Visit our Pricing page for detailed plan comparisons.

Absolutely. You can sign up for a free workspace to explore the platform and run test pipelines with limited document volumes. You can also book a demo with our team for a personalized walkthrough of the platform tailored to your use case.

You can reach us through multiple channels: submit a support ticket from your dashboard for technical issues, use the Contact Us form for general inquiries, or check this FAQ for quick answers. Enterprise plan customers also have access to priority support with faster response times.

Yes. Workspace admins can invite team members via email. Invited users receive a link to join the workspace. You can assign admin or member roles — admins can manage connectors, run jobs, and configure settings, while members have read-only access to the dashboard and job history.
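The two roles described above can be pictured as a simple permission table. The action names below are hypothetical labels for the capabilities mentioned in the answer, and the assumption that admins can also view the dashboard and job history is ours.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    MEMBER = "member"


# Read-only actions available to every workspace member (hypothetical names).
READ_ONLY = {"view_dashboard", "view_job_history"}

PERMISSIONS = {
    # Assumption: admins hold the read-only actions as well.
    Role.ADMIN: READ_ONLY | {"manage_connectors", "run_jobs", "configure_settings"},
    Role.MEMBER: READ_ONLY,
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS[role]


if __name__ == "__main__":
    print(is_allowed(Role.ADMIN, "run_jobs"))
    print(is_allowed(Role.MEMBER, "run_jobs"))
```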
