Microsoft 365 Connector

OneDrive Extractor

Connect to personal and shared OneDrive drives across your organisation. Recursively crawl every folder, extract every supported document, and index it for AI-powered search.

Status: Live
Folder depth: ∞
Authentication: Secure Auth
Native extraction: 100%

Personal drives, shared drives: all covered

ChunkIQ accesses every OneDrive drive type securely, with no manual configuration per user.

Personal Drive

Each user's personal OneDrive for Business drive. Files stored directly in My Files, including nested folders of any depth, are fully crawled and extracted.

✓ Live

Shared Drives

Shared drives and document libraries shared with the user. ChunkIQ enumerates all accessible drives and includes them in the extraction run.

✓ Live

Nested Folder Structures

Traverses folders recursively regardless of nesting depth. Captures the full folder path in metadata so results can be filtered by directory in search.

✓ Live

Shared With Me

Files shared with the authenticated user from other drives. ChunkIQ resolves the remote item references and includes them in the extraction queue.

✓ Live
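
How a "shared with me" entry might be resolved to its real location can be sketched as follows. In Microsoft Graph, such entries are driveItems carrying a `remoteItem` facet that names the source drive and item. The function names here are illustrative and not taken from ChunkIQ.

```python
from typing import Optional, Tuple


def resolve_remote_item(item: dict) -> Optional[Tuple[str, str]]:
    """Return (drive_id, item_id) for the real location of a listed item.

    Items from a "shared with me" listing carry a `remoteItem` facet that
    points at the source drive. Items without that facet are treated as
    local, so their own `parentReference` is used instead.
    """
    target = item.get("remoteItem") or item
    drive_id = (target.get("parentReference") or {}).get("driveId")
    item_id = target.get("id")
    if not drive_id or not item_id:
        return None  # not enough information to fetch the file
    return drive_id, item_id


def item_url(drive_id: str, item_id: str) -> str:
    """Graph v1.0 URL for fetching the resolved item."""
    return f"https://graph.microsoft.com/v1.0/drives/{drive_id}/items/{item_id}"
```

Each resolved pair can then be added to the extraction queue like any other file.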

Complete OneDrive coverage out of the box

Recursive Folder Crawl

ChunkIQ traverses folder hierarchies of unlimited depth using efficient delta queries, capturing every file regardless of where it's stored.
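
A minimal sketch of a depth-independent folder walk, assuming a hypothetical `list_children` callback that returns Graph-style driveItems for a folder id. It illustrates the traversal idea only and does not show ChunkIQ's actual crawler.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical callback: given a folder id, return its child items as dicts
# shaped like Graph driveItems ("id", "name", and a "folder" or "file" facet).
ListChildren = Callable[[str], List[dict]]


def crawl(root_id: str, list_children: ListChildren) -> Iterator[Tuple[str, dict]]:
    """Yield (folder_path, file_item) for every file below root_id.

    An explicit stack is used instead of recursion, so very deep folder
    trees do not hit Python's recursion limit.
    """
    stack = [(root_id, "")]
    while stack:
        folder_id, path = stack.pop()
        for child in list_children(folder_id):
            if "folder" in child:
                stack.append((child["id"], f"{path}/{child['name']}"))
            elif "file" in child:
                yield (path or "/", child)
```

The yielded folder path is the value that would later be stored as metadata for directory filtering.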

Attachment Extraction

Files embedded inside Word, Excel, and PowerPoint documents are extracted and indexed separately, each with lineage metadata back to the parent document.
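
One way to locate embedded files, sketched under the assumption that they sit in the usual `embeddings/` folders of the Office Open XML ZIP package. The folder list and the `list_embedded` helper are illustrative, and the tuple's first element stands in for the lineage link back to the parent document.

```python
import io
import zipfile
from typing import List, Tuple

# Office Open XML files are ZIP packages; embedded objects are usually stored
# under these folders. Treat the list as an assumption, not a full spec.
EMBED_DIRS = ("word/embeddings/", "xl/embeddings/", "ppt/embeddings/")


def list_embedded(parent_name: str, data: bytes) -> List[Tuple[str, str, bytes]]:
    """Return (parent_name, part_name, payload) for each embedded part."""
    found = []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for part in package.namelist():
            if part.startswith(EMBED_DIRS) and not part.endswith("/"):
                found.append((parent_name, part, package.read(part)))
    return found
```

Some embedded objects are stored as OLE `.bin` containers and would need a further unpacking step that this sketch leaves out.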

Delta Sync

Uses OneDrive delta tokens to track changes since the last run. Only new, modified, or deleted items are processed, making large drives efficient to keep fresh.
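
The delta loop can be sketched as below. It relies on the documented shape of Graph delta responses: pages chained by `@odata.nextLink`, a final `@odata.deltaLink`, and a `deleted` facet on removed items. The `fetch` callback is a hypothetical authenticated GET helper.

```python
from typing import Callable, List, Tuple

# Hypothetical callback that performs an authenticated GET and returns the
# parsed JSON body of one delta page.
FetchPage = Callable[[str], dict]


def run_delta(start_url: str, fetch: FetchPage) -> Tuple[List[dict], List[str], str]:
    """Follow a delta query to its end.

    Returns (changed_items, deleted_ids, delta_link). The delta link should be
    stored and used as start_url on the next run.
    """
    changed: List[dict] = []
    deleted: List[str] = []
    url = start_url
    while True:
        page = fetch(url)
        for item in page.get("value", []):
            if "deleted" in item:
                deleted.append(item["id"])
            else:
                changed.append(item)  # new or modified
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]
        else:
            return changed, deleted, page.get("@odata.deltaLink", "")
```

On the first run the start URL would be the drive's plain delta endpoint. Later runs start from the stored delta link, so only changes come back.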

Drive-Aware Metadata

Every chunk records the drive ID, drive type, owner, folder path, file name, file size, and last modified date for precise filtering and attribution.
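
As an illustration, the listed attributes could be modelled like this. The field names and the `in_folder` helper are assumptions, since the page names the attributes but not the schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkMetadata:
    # Field names are illustrative; the source lists the attributes but not
    # the actual schema.
    drive_id: str
    drive_type: str
    owner: str
    folder_path: str
    file_name: str
    file_size: int
    last_modified: str  # ISO 8601 timestamp


def in_folder(chunks: List[ChunkMetadata], folder: str) -> List[ChunkMetadata]:
    """Keep chunks whose file sits in `folder` or any folder below it."""
    prefix = folder.rstrip("/") + "/"
    return [c for c in chunks if (c.folder_path.rstrip("/") + "/").startswith(prefix)]
```

Comparing on a trailing slash keeps `/Finance` from matching a sibling such as `/FinanceOld`.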

All Supported File Types

Processes .pdf, .docx, .xlsx, .pptx, .xlsm, .csv, .json, .txt, and .md files found anywhere in the drive structure.
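
A small sketch of how files might be screened against this list before extraction. The `is_supported` helper is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import PurePosixPath

# Extensions copied from the list above.
SUPPORTED = {".pdf", ".docx", ".xlsx", ".pptx", ".xlsm", ".csv", ".json", ".txt", ".md"}


def is_supported(file_name: str) -> bool:
    """True if the file extension is in the supported list (case-insensitive)."""
    return PurePosixPath(file_name).suffix.lower() in SUPPORTED
```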

Tenant-Isolated

All data is written to Azure Data Lake Storage Gen2 within your own subscription. Authentication uses a managed identity, and there are no external data transfers.

From OneDrive to searchable index in 4 steps

Step 01

Authenticate

Authenticates via Microsoft Entra ID (formerly Azure AD) with the Files.Read.All permission to access all OneDrive drives in the tenant, including personal and shared drives.
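
The page does not say which OAuth flow is used. As one possibility, an app-only client-credentials token request against the Microsoft identity platform could be built like this. The `build_token_request` helper is hypothetical, and the request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request.

    With app-only auth, Files.Read.All is granted to the app registration by
    an administrator and picked up through the `.default` scope.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    return Request(url, data=body, method="POST")
```

Passing the request to `urllib.request.urlopen` would return a JSON body whose `access_token` is then sent as a bearer token on Graph calls.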

Step 02

Enumerate Drives & Files

Lists all drives for each user, then recursively enumerates folders and files. Delta tokens are stored for efficient subsequent runs.
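
A hedged sketch of the drive enumeration step, using the Graph `/users/{id}/drives` and `/drives/{id}/root/delta` endpoints. The `fetch` callback is a hypothetical authenticated GET helper, and how the user list is obtained is left out.

```python
from typing import Callable, Dict, Iterable, Iterator

GRAPH = "https://graph.microsoft.com/v1.0"
FetchJson = Callable[[str], dict]  # hypothetical authenticated GET helper


def paged(url: str, fetch: FetchJson) -> Iterator[dict]:
    """Yield every item of a Graph collection, following @odata.nextLink."""
    while url:
        page = fetch(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink", "")


def enumerate_drives(user_ids: Iterable[str], fetch: FetchJson) -> Dict[str, str]:
    """Map each drive id to the delta URL that starts its file enumeration."""
    start_urls: Dict[str, str] = {}
    for user_id in user_ids:
        for drive in paged(f"{GRAPH}/users/{user_id}/drives", fetch):
            start_urls[drive["id"]] = f"{GRAPH}/drives/{drive['id']}/root/delta"
    return start_urls
```

Keying the result by drive id also gives a natural place to store each drive's delta link after a run.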

Step 03

Extract & Chunk

Dedicated extractors process each file type. Text is cleaned, split into semantic chunks, and tagged with full drive/folder provenance metadata.
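
The chunking algorithm is not described, so the following uses greedy paragraph packing as a simple stand-in for semantic chunking. It shows whitespace cleanup, splitting, and provenance tagging. The `chunk_text` helper and the `max_chars` parameter are illustrative.

```python
from typing import Dict, List


def chunk_text(text: str, provenance: Dict[str, str], max_chars: int = 800) -> List[dict]:
    """Greedy paragraph packing: a simple stand-in for semantic chunking."""
    # Collapse runs of whitespace inside each paragraph.
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    chunks: List[dict] = []
    current = ""
    for para in filter(None, paragraphs):  # skip empty paragraphs
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append({"text": current, **provenance, "chunk_index": len(chunks)})
            current = para
        else:
            current = candidate
    if current:
        chunks.append({"text": current, **provenance, "chunk_index": len(chunks)})
    return chunks
```

A single paragraph longer than `max_chars` is kept whole here; a production chunker would split it further.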

Step 04

Embed & Index

Chunks are vectorised with Azure OpenAI and pushed to Azure AI Search for hybrid BM25 + vector + semantic retrieval.
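
Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses to merge keyword and vector rankings in a hybrid query. The sketch below shows that fusion step on plain lists of document ids. It is a local illustration, not a call into the service, and the constant `k = 60` is a commonly used default rather than a value stated on this page.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a BM25 ranking `["a", "b", "c"]` with a vector ranking `["b", "d", "a"]` puts `b` first, because it ranks highly in both lists.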

Built on Microsoft Graph + Azure

OneDrive API: Graph /me/drive and /drives endpoints
Auth Scope: Files.Read.All
Change Tracking: Graph Delta Query + delta tokens
Storage: Azure Data Lake Storage Gen2
Document Extraction: Native format parsers
Chunking: Hybrid chunker
Search: Azure AI Search (Hybrid + Semantic)

Index your entire OneDrive estate

Personal drives, shared drives, nested folders: all automatically extracted and ready for AI-powered search.
