The complete processing engine: document extraction, semantic chunking, vector embedding, Azure AI Search indexing, and hybrid semantic search, all fully automated on Azure Functions.
Each stage is independent, observable, and runs on Azure Functions; trigger the full pipeline on demand or on a schedule.
The connector authenticates to your Microsoft 365 tenant and enumerates all files across the configured source platforms. Files are downloaded to Azure Data Lake Storage Gen2 with full provenance metadata and content-hash deduplication to skip unchanged files.
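The dedup check can be sketched as follows. This is an illustrative stand-in, not the shipped implementation: the in-memory dict represents the provenance metadata store in ADLS Gen2, and the path names are hypothetical.

```python
import hashlib

# Sketch of content-hash deduplication. The dict stands in for the real
# metadata store; in production the hash would live alongside the file's
# provenance metadata in ADLS Gen2.
def needs_processing(path: str, content: bytes, hash_store: dict) -> bool:
    """Return True if the file is new or its bytes changed since the last run."""
    digest = hashlib.sha256(content).hexdigest()
    if hash_store.get(path) == digest:
        return False           # unchanged -> skip download/re-extraction
    hash_store[path] = digest  # remember the hash for the next run
    return True
```

Because the hash is computed over file content rather than timestamps, renamed or re-saved-but-identical files are also skipped.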
A format-aware dispatcher routes each file to the appropriate extractor. No external OCR or Document Intelligence service is required; all parsing runs natively, keeping costs minimal and extraction fully portable.
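A minimal sketch of the dispatch pattern, assuming an extension-to-extractor map. The extractor functions here are placeholders; the real parsers are not shown.

```python
from pathlib import Path
from typing import Callable

# Placeholder extractors -- stand-ins for the pipeline's native parsers.
def extract_pdf(path: str) -> str: return f"pdf text from {path}"
def extract_docx(path: str) -> str: return f"docx text from {path}"
def extract_plain(path: str) -> str: return f"plain text from {path}"

# Format-aware routing table: file extension -> extractor.
EXTRACTORS: dict[str, Callable[[str], str]] = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".txt": extract_plain,
    ".md": extract_plain,
}

def dispatch(path: str) -> str:
    ext = Path(path).suffix.lower()
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        raise ValueError(f"unsupported format: {ext}")
    return extractor(path)
```

New formats are supported by registering one more entry in the table, with no changes to the dispatch logic.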
Extracted text is split into semantic chunks using a hybrid chunking strategy that respects paragraph boundaries while keeping chunk sizes within the token budget. Each chunk is then embedded using Azure OpenAI's text-embedding-3-small model, producing a 1,536-dimensional vector per chunk.
Chunks are upserted to Azure AI Search with their embedding vectors and all metadata fields. The index supports hybrid BM25 + vector search, with the two result sets fused via Reciprocal Rank Fusion and semantic re-ranking applied on top, delivering best-in-class retrieval accuracy for RAG pipelines and search applications.
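Azure AI Search performs the fusion server-side; the sketch below only illustrates the Reciprocal Rank Fusion formula itself, where each document scores the sum of 1/(k + rank) over the ranked lists it appears in (k = 60 is a commonly used constant, assumed here).

```python
# Illustration of Reciprocal Rank Fusion over two ranked result lists.
# In the actual pipeline this fusion happens inside Azure AI Search.
def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both BM25 and vector search accumulates score from both lists, so it outranks documents that only one retriever favoured.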
Azure Functions runtime with full observability, error handling, and incremental processing, designed to run reliably on large tenants.
Each pipeline stage is a separate Azure Function. Stages can be triggered individually or run end-to-end on a timer trigger or HTTP call from the Laravel portal.
Content hashing on ingest and delta query tokens on OneDrive/SharePoint ensure only changed content is re-processed, keeping run times short on large tenants.
The Laravel dashboard shows live status of each source platform, chunks indexed, active sources, and last pipeline run time, all in one place.
Files are processed in parallel across Azure Functions workers. Large batches of documents are extracted concurrently to minimise total pipeline run time.
Each file is processed in an isolated try/except block. A malformed document triggers a logged warning and the pipeline moves on to the next file; a single bad file never halts the run.
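The isolation pattern looks roughly like this; `extract` and the file list are illustrative placeholders.

```python
import logging

logger = logging.getLogger("pipeline")

# Sketch of per-file fault isolation: a failing extractor is logged and
# skipped, and the loop continues with the remaining files.
def process_all(files: list[str], extract) -> tuple[list[str], list[str]]:
    done: list[str] = []
    skipped: list[str] = []
    for path in files:
        try:
            extract(path)
            done.append(path)
        except Exception as exc:  # failure is contained to this one file
            logger.warning("skipping %s: %s", path, exc)
            skipped.append(path)
    return done, skipped
```

The skipped list can then be surfaced in the dashboard, so failures are visible without ever blocking the healthy files.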
All compute and storage run within your Azure subscription. Managed identity authentication is used throughout; no API keys are stored in code or config files.
Typical run times on a mid-size Microsoft 365 tenant.
File enumeration and download to ADLS Gen2. Time scales with number of new/changed files, not total tenant size.
Parallel extraction across all file types. PDF and PowerPoint files are typically the most time-intensive formats to parse.
Chunking is near-instant. Embedding time is proportional to the number of new chunks; batched calls to Azure OpenAI keep latency low.
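The batching step can be sketched as below, with `embed_batch` standing in for the Azure OpenAI embeddings call; the batch size is an illustrative choice.

```python
# Sketch of batched embedding: one API round trip per batch of chunks
# instead of one per chunk.
def embed_all(chunks: list[str], embed_batch, batch_size: int = 16) -> list[list[float]]:
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        vectors.extend(embed_batch(chunks[i:i + batch_size]))
    return vectors
```

For N chunks this issues ceil(N / batch_size) calls, which is where most of the embedding-stage latency saving comes from.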
Batched upsert to Azure AI Search. The index is updated incrementally, so live search continues to work throughout the upsert.
Deploy ChunkIQ Processor to your Azure subscription and run the complete pipeline from raw ADLS files to a live hybrid search index today.