Microsoft 365 Connector

OneNote Extractor

Extract every notebook, section, and page from OneNote, converting rich HTML content to clean, searchable text and processing any embedded Office attachments alongside it.

Page format: HTML
Text parser: BS4
Authentication: Secure Auth
Native extraction: 100%

Full notebook hierarchy, fully extracted

ChunkIQ traverses the complete OneNote structure, from top-level notebooks all the way to individual pages and their attachments.

Notebooks: All notebooks in the user's account
Section Groups: Nested section group folders
Sections: Individual sections within notebooks
Pages: HTML-rendered page content
Attachments: Office files embedded in pages

OneNote content, made searchable

OneNote exposes page content as HTML, and ChunkIQ parses that HTML into clean text while preserving headings, lists, and table structure.

HTML-to-Text Parsing

OneNote pages are exported as HTML. ChunkIQ strips tags, decodes entities, and extracts clean, readable text with structural context preserved.
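
A minimal sketch of the tag-stripping and entity-decoding step described above. ChunkIQ's actual parser is not shown on this page; the stat tile names BS4, but to stay dependency-free this sketch swaps in Python's standard-library `html.parser`. The class and function names (`PageTextExtractor`, `html_to_text`) and the Markdown-style markers used to keep headings, list items, and table cells recognisable are assumptions made for illustration.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADING_TAGS | {"p", "div", "br", "li", "tr", "table", "ul", "ol"}


class PageTextExtractor(HTMLParser):
    """Collects visible text and marks headings, list items, and table cells."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # greater than zero while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in HEADING_TAGS:
            # h2 becomes "## ", h3 becomes "### ", and so on
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in ("td", "th"):
            self.parts.append(" | ")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return readable text: one block per line, whitespace collapsed."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Each heading, paragraph, list item, and table row ends up on its own line, so a later chunking step can split on line boundaries without cutting through a structural element.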

Embedded Attachment Extraction

OneNote pages often contain attached .docx, .xlsx, or .pdf files. ChunkIQ downloads and processes these attachments, linking them back to the parent page.
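
One way to locate attachments and link them to their parent page, sketched under assumptions: in the HTML that Microsoft Graph returns for a page, file attachments appear as `<object>` elements carrying `data-attachment` (file name), `data` (resource URL), and `type` attributes. The names `AttachmentRef` and `find_attachments` are hypothetical, and the download step itself is left out.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


@dataclass
class AttachmentRef:
    filename: str
    resource_url: str
    media_type: str
    page_id: str  # keeps the link back to the parent page


class _ObjectCollector(HTMLParser):
    """Collects <object> elements that describe file attachments."""

    def __init__(self, page_id: str):
        super().__init__()
        self.page_id = page_id
        self.found: List[AttachmentRef] = []

    def handle_starttag(self, tag, attrs):
        if tag != "object":
            return
        attr = dict(attrs)
        # Only objects with both a file name and a resource URL count.
        if attr.get("data-attachment") and attr.get("data"):
            self.found.append(
                AttachmentRef(
                    filename=attr["data-attachment"],
                    resource_url=attr["data"],
                    media_type=attr.get("type") or "",
                    page_id=self.page_id,
                )
            )


def find_attachments(page_id: str, page_html: str) -> List[AttachmentRef]:
    """Return every attachment referenced in one page's HTML."""
    collector = _ObjectCollector(page_id)
    collector.feed(page_html)
    collector.close()
    return collector.found
```

Each `AttachmentRef` can then be handed to whichever extractor matches its file type, with `page_id` carried along so the resulting chunks stay tied to the page they came from.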

Page-Level Metadata

Every chunk is tagged with notebook name, section group, section name, page title, creation date, and last-modified timestamp for precise filtering.
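
To illustrate how those tags might support filtering, here is a small sketch. The field list mirrors the sentence above; the record layout and the names `ChunkMetadata`, `tag_chunk`, and `filter_chunks` are assumptions, not ChunkIQ's actual schema.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChunkMetadata:
    notebook: str
    section_group: Optional[str]  # None when the section sits directly in the notebook
    section: str
    page_title: str
    created: str        # ISO 8601 timestamp
    last_modified: str  # ISO 8601 timestamp


def tag_chunk(text: str, meta: ChunkMetadata) -> dict:
    """Attach the page-level metadata to one chunk of text."""
    return {"text": text, **asdict(meta)}


def filter_chunks(chunks: List[dict], **criteria) -> List[dict]:
    """Keep chunks whose metadata equals every given criterion."""
    return [c for c in chunks if all(c.get(k) == v for k, v in criteria.items())]
```

For example, `filter_chunks(chunks, notebook="Work", section="Inbox")` narrows a result set to a single section before any text matching happens.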

Microsoft 365 Integration

Uses the OneNote-specific Microsoft Graph endpoints (/me/onenote/notebooks) with Notes.Read permissions for complete, authorised access.

Incremental Sync

Pages are identified by their unique Graph ID. On subsequent runs, only pages modified since the last extraction are re-processed.
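
The selection logic could look roughly like this. It is a sketch that assumes each page record is a dict with the Graph `id` and a UTC `lastModifiedDateTime` string ending in `Z`, and that the time of the previous run is stored by the caller; `parse_graph_time` and `pages_to_process` are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, List, Optional

# Matches UTC timestamps such as 2024-05-01T09:30:00Z or 2024-05-01T09:30:00.123Z
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_graph_time(value: str) -> datetime:
    """Parse a UTC ISO 8601 timestamp, keeping up to microsecond precision."""
    match = _TIMESTAMP.match(value)
    if not match:
        raise ValueError(f"unsupported timestamp: {value!r}")
    # Pad or truncate the fractional part to exactly six digits.
    micros = int((match.group(2) or "").ljust(6, "0")[:6])
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def pages_to_process(pages: Iterable[dict], last_run: Optional[datetime]) -> List[dict]:
    """Return pages modified after last_run; every page when there was no earlier run."""
    if last_run is None:
        return list(pages)
    return [p for p in pages if parse_graph_time(p["lastModifiedDateTime"]) > last_run]
```

Because the Graph ID stays stable across runs, a re-processed page can overwrite its earlier chunks rather than create duplicates.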

Works With Class & Personal Notebooks

Extracts both personal notebooks and shared notebooks from Microsoft Teams. All content stays within your Azure tenant throughout.

From OneNote page to searchable chunk

Step 01
Authenticate

Authenticates via Azure AD with the Notes.Read scope for the signed-in user's notebooks, or Notes.Read.All for admin access that enumerates notebooks across the tenant.

Step 02
Traverse Hierarchy

Enumerates all notebooks → section groups → sections → pages recursively. Downloads each page's HTML content and any file attachments.
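
The recursive walk can be sketched as follows. The paths follow the Microsoft Graph OneNote endpoints under `/me/onenote`; everything else is an assumption for illustration. In particular, `get_json` is a hypothetical callable supplied by the caller that performs an authenticated GET and returns the decoded JSON body, accepting either a relative path or the absolute `@odata.nextLink` URL used for paging.

```python
from typing import Callable, Iterator, List, Tuple

GetJson = Callable[[str], dict]  # hypothetical: authenticated GET returning parsed JSON


def list_all(get_json: GetJson, path: str) -> Iterator[dict]:
    """Yield every item of a Graph collection, following @odata.nextLink paging."""
    while path:
        body = get_json(path)
        yield from body.get("value", [])
        path = body.get("@odata.nextLink")


def walk_sections(get_json: GetJson, parent_path: str, trail: List[str]):
    """Yield (section-group trail, section) below a notebook or section group."""
    for section in list_all(get_json, f"{parent_path}/sections"):
        yield trail, section
    # Section groups can nest, so recurse with the extended trail.
    for group in list_all(get_json, f"{parent_path}/sectionGroups"):
        child_path = f"/me/onenote/sectionGroups/{group['id']}"
        yield from walk_sections(get_json, child_path, trail + [group["displayName"]])


def walk_pages(get_json: GetJson) -> Iterator[Tuple[str, List[str], str, dict]]:
    """Yield (notebook name, section-group trail, section name, page) for every page."""
    for notebook in list_all(get_json, "/me/onenote/notebooks"):
        root = f"/me/onenote/notebooks/{notebook['id']}"
        for trail, section in walk_sections(get_json, root, []):
            pages_path = f"/me/onenote/sections/{section['id']}/pages"
            for page in list_all(get_json, pages_path):
                yield notebook["displayName"], trail, section["displayName"], page
```

Passing the fetch function in keeps the traversal testable against canned responses; the HTML for a given page would then be fetched from `/me/onenote/pages/{id}/content`.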

Step 03
Parse & Chunk

HTML is parsed into clean text. Embedded attachments are processed by dedicated extractors. All output is chunked and tagged.
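
The page does not say how chunk boundaries are chosen. As one plausible approach, the sketch below uses fixed-size word windows with overlap; the function name `chunk_text` and the default sizes are assumptions, not ChunkIQ's actual settings.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of at most max_words words that share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary appears in full in at least one chunk, at the cost of some duplicated text in the index.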

Step 04
Embed & Index

Each chunk is embedded with Azure OpenAI and upserted to Azure AI Search, ready for hybrid semantic search.
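
A sketch of how the upsert payload might be assembled, under stated assumptions. The `value` array and the `@search.action: mergeOrUpload` action follow the Azure AI Search REST format for indexing documents; the index field names (`id`, `page_id`, `content`, `content_vector`) and both function names are hypothetical. The embedding vectors are taken as input, since the Azure OpenAI call is not shown here.

```python
import base64
from typing import Dict, List


def make_document_key(page_id: str, chunk_index: int) -> str:
    """Build a stable key from the page's Graph ID and the chunk position.

    URL-safe Base64 keeps the key within letters, digits, '-', '_' and '=',
    since raw Graph IDs can contain characters such as '!'.
    """
    raw = f"{page_id}:{chunk_index}".encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def build_upsert_batch(page_id: str, chunks: List[str], vectors: List[List[float]]) -> Dict:
    """Return a request body that upserts one document per chunk."""
    if len(chunks) != len(vectors):
        raise ValueError("each chunk needs exactly one embedding vector")
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",  # insert, or update if the key exists
                "id": make_document_key(page_id, index),
                "page_id": page_id,
                "content": text,
                "content_vector": vector,
            }
            for index, (text, vector) in enumerate(zip(chunks, vectors))
        ]
    }
```

Deriving the key from the page ID and chunk position means a re-extracted page overwrites its earlier chunks instead of adding new ones.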

Built for OneNote at scale

OneNote API: Graph /me/onenote/notebooks
Auth Scope: Notes.Read, Notes.Read.All
Page Parser: Native HTML parser
Attachment Extraction: Native format parsers
Storage: Azure Data Lake Storage Gen2
Search: Azure AI Search (Hybrid + Semantic)

Make your OneNote knowledge searchable

Every notebook, every section, every page: automatically extracted and indexed for AI-powered retrieval.
