📁
Microsoft 365 Connector

SharePoint Extractor

Connect to any SharePoint site, crawl document libraries, and extract every file — PDFs, Word docs, Excel sheets, PowerPoints, and more — directly into the ChunkIQ pipeline.

7+ file formats
Sites & libraries
Secure Auth authentication
100% native extraction

Everything from every SharePoint library

Automatic discovery, recursive crawling, and format-specific extraction — all without leaving your Azure tenant.

🔌

Microsoft 365 Integration

Authenticates via Azure AD app registration using client credentials. Discovers all document libraries across every site collection automatically.
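
As a rough illustration of the client credentials flow described above, the sketch below builds the token request an app registration would send to the Microsoft identity platform. It is not ChunkIQ's code: the function name and the placeholder tenant, client ID, and secret are hypothetical, and targeting the Microsoft Graph `.default` scope is an assumption. No network call is made.

```python
from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint (tenant-specific).
TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for a client-credentials token request.

    Hypothetical helper for illustration; it only builds the request.
    """
    url = TOKEN_URL.format(tenant=tenant_id)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the application permissions already granted
        # to the app registration (assumed here to target Microsoft Graph).
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body


# Placeholder values, not real credentials.
url, body = build_token_request("contoso-tenant-id", "my-app-id", "my-secret")
print(url)
```

The body would be sent as an `application/x-www-form-urlencoded` POST, and the returned access token used as a bearer token on later requests.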

📂

Recursive Library Crawling

Traverses nested folder structures of any depth. Captures full file paths, modification dates, and author metadata for every item.
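
To make the traversal concrete, here is a minimal sketch of a depth-first crawl over a nested folder structure. The in-memory `library` dictionary stands in for a real SharePoint library, and `FileItem` and `crawl` are hypothetical names chosen for this example, not part of the product.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FileItem:
    path: str       # full path from the library root
    modified: str   # last-modified date
    author: str


def crawl(folder: dict, prefix: str = "") -> Iterator[FileItem]:
    """Depth-first walk yielding one FileItem per file, at any nesting depth."""
    path = f"{prefix}/{folder['name']}"
    for f in folder.get("files", []):
        yield FileItem(f"{path}/{f['name']}", f["modified"], f["author"])
    for sub in folder.get("folders", []):
        yield from crawl(sub, path)


# Illustrative data only.
library = {
    "name": "Documents",
    "files": [{"name": "plan.docx", "modified": "2024-01-05", "author": "Ana"}],
    "folders": [{
        "name": "Finance",
        "files": [],
        "folders": [{
            "name": "2024",
            "files": [{"name": "budget.xlsx", "modified": "2024-02-01", "author": "Raj"}],
        }],
    }],
}

items = list(crawl(library))
for item in items:
    print(item.path, item.modified, item.author)
```

Recursion keeps the sketch short; for extremely deep hierarchies an explicit stack would avoid Python's recursion limit.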

📎

Embedded Attachment Extraction

Detects and extracts files embedded inside Word, Excel, and PowerPoint documents. Each attachment is processed and indexed with lineage back to its parent file.

🔄

Incremental Sync

Content hashing ensures only new or modified files are reprocessed on subsequent runs. Keeps the index fresh without repeating work on unchanged content.
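
One way to picture hash-based incremental sync is the sketch below, which compares a digest of each file's bytes against the digest stored from the previous run. SHA-256 and the in-memory `seen` dictionary are assumptions made for the example; the actual hash function and state store are not specified here.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def files_to_process(files: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """Return paths whose content is new or changed, updating `seen` in place."""
    changed = []
    for path, data in files.items():
        digest = content_hash(data)
        if seen.get(path) != digest:
            changed.append(path)
            seen[path] = digest
    return changed


seen_hashes: dict[str, str] = {}
first_run = files_to_process({"a.txt": b"hello", "b.txt": b"world"}, seen_hashes)
second_run = files_to_process({"a.txt": b"hello", "b.txt": b"world!"}, seen_hashes)
print(first_run)   # both files are new
print(second_run)  # only the modified file
```

On the second run only `b.txt` is returned, because its bytes changed while `a.txt` hashes to the same digest as before.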

🗺️

Rich Provenance Metadata

Every chunk is tagged with site URL, library name, folder path, file name, page number, chunk index, content type, and last-modified timestamp.
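
The provenance fields listed above could be modeled as a simple record like the one sketched here. The class name, field names, types, and sample values are assumptions for illustration; only the list of fields comes from the description.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ChunkProvenance:
    """Hypothetical record mirroring the eight provenance fields."""
    site_url: str
    library_name: str
    folder_path: str
    file_name: str
    page_number: int
    chunk_index: int
    content_type: str
    last_modified: str  # ISO 8601 timestamp


# Illustrative values only.
record = ChunkProvenance(
    site_url="https://contoso.sharepoint.com/sites/finance",
    library_name="Documents",
    folder_path="/Reports/2024",
    file_name="q1-summary.pdf",
    page_number=3,
    chunk_index=0,
    content_type="application/pdf",
    last_modified="2024-04-02T09:30:00Z",
)

metadata = asdict(record)  # plain dict, ready to attach to an index document
print(metadata["file_name"], metadata["page_number"])
```

Keeping the record immutable (`frozen=True`) is a design choice of the sketch: provenance describes where a chunk came from and should not change after extraction.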

🔒

Stays in Your Tenant

Files are ingested directly to Azure Data Lake Storage Gen2 within your own subscription. No data leaves your Azure environment at any point.

Every Office format, natively extracted

Native extraction — no Azure Document Intelligence or OCR service required.

📄 .pdf
📝 .docx
📊 .xlsx
📽️ .pptx
🧮 .xlsm
📊 .xls
📋 .csv (csv module)
📃 .txt / .md (UTF-8 decode)
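
For the two simplest formats in the list, extraction by file extension might look roughly like the sketch below, using Python's standard `csv` module and a UTF-8 decode. The `extract_text` function and the `" | "` cell separator are hypothetical choices; the parsers for the Office and PDF formats are not shown because they are not described here.

```python
import csv
import io


def extract_text(file_name: str, data: bytes) -> str:
    """Return plain text for .csv, .txt, and .md files (illustrative only)."""
    ext = file_name.rsplit(".", 1)[-1].lower()
    if ext == "csv":
        # newline="" lets the csv module handle line endings itself.
        rows = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
        return "\n".join(" | ".join(row) for row in rows)
    if ext in ("txt", "md"):
        return data.decode("utf-8")
    raise ValueError(f"no extractor sketched for .{ext}")


csv_text = extract_text("table.csv", b'a,b\r\n1,"x,y"\r\n')
md_text = extract_text("notes.md", "# Título".encode("utf-8"))
print(csv_text)
print(md_text)
```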

From SharePoint to searchable index in 4 steps

Step 01
🔑

Authenticate

ChunkIQ authenticates to your Microsoft 365 tenant via an Azure AD app registration with the required SharePoint and Files.Read permissions.

Step 02
🔍

Discover & Crawl

ChunkIQ enumerates all site collections, document libraries, and folder hierarchies. Files are downloaded to Azure Data Lake Storage Gen2.

Step 03
📄

Extract & Chunk

Dedicated extractors parse each file format, clean the text, and split it into semantic chunks with token-based length control.
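
A simplified picture of token-based length control is sketched below: paragraphs are packed into chunks until a token budget would be exceeded, and oversized paragraphs are cut into windows. Whitespace-separated words stand in for real tokenizer tokens, and the paragraph-packing strategy is an assumption, not the product's actual chunker.

```python
def chunk_text(text: str, max_tokens: int = 50) -> list[str]:
    """Pack paragraphs into chunks of at most `max_tokens` whitespace tokens."""
    chunks: list[str] = []
    current: list[str] = []  # tokens in the chunk being built
    for paragraph in text.split("\n\n"):
        tokens = paragraph.split()
        # A paragraph that does not fit closes the current chunk first.
        if current and len(current) + len(tokens) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        # Oversized paragraphs are cut into max_tokens-sized windows.
        while len(tokens) > max_tokens:
            chunks.append(" ".join(tokens[:max_tokens]))
            tokens = tokens[max_tokens:]
        current.extend(tokens)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine ten eleven"
chunks = chunk_text(sample, max_tokens=4)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Preferring paragraph boundaries keeps related sentences together, while the hard cut guarantees that no chunk exceeds the budget.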

Step 04

Embed & Index

Each chunk is embedded with Azure OpenAI and pushed to Azure AI Search for hybrid BM25 + vector + semantic retrieval.
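
Hybrid retrieval has to merge a keyword ranking and a vector ranking into one list. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this purpose; the sketch below shows the general idea on toy document IDs. The function name, the constant `k = 60`, and the tie-breaking rule are assumptions of the example, and the service performs this step internally rather than in client code.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["a", "b", "c"]    # toy keyword results
vector_ranking = ["b", "c", "a"]  # toy vector results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Document `b` comes out on top because it ranks near the top of both lists, even though it is first in only one of them.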

Built on Microsoft Graph + Azure

Ingest: Microsoft 365 Connectors
Auth: Azure AD (Client Credentials)
Storage: Azure Data Lake Storage Gen2
Document Extraction: Native format parsers
Chunking: Hybrid chunker

Start extracting your SharePoint content

Connect your Microsoft 365 tenant and have your SharePoint documents extracted, chunked, and indexed in minutes.
