📓

Microsoft 365 Connector

OneNote Extractor

Extract every notebook, section, and page from OneNote — converting rich HTML content to clean searchable text and processing any embedded Office attachments alongside it.


HTML

Page format

BS4

Text parser

Secure Auth

Authentication

100%

Native extraction

Structure

Full notebook hierarchy, fully extracted

ChunkIQ traverses the complete OneNote structure — from top-level notebooks all the way to individual pages and their attachments.

📚

Notebooks

All notebooks in the user's account

📑

Section Groups

Nested section group folders

📋

Sections

Individual sections within notebooks

📄

Pages

HTML-rendered page content

📎

Attachments

Office files embedded in pages

Capabilities

OneNote content, made searchable

OneNote stores pages as HTML — ChunkIQ parses that HTML into clean text while preserving headings, lists, and table structure.

🌐

HTML-to-Text Parsing

OneNote pages are exported as HTML. ChunkIQ strips tags, decodes entities, and extracts clean, readable text with structural context preserved.
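As a rough illustration of the parsing step, the sketch below turns page HTML into plain text while keeping headings, list items, and table rows recognisable. The page names BS4 as the parser; this sketch swaps in Python's standard-library html.parser to stay dependency-free, and the names PageTextParser and html_to_text, along with the "#", "-", and "|" markers, are assumptions made for the example rather than ChunkIQ's actual output format.

```python
from html.parser import HTMLParser

# Tags whose boundaries become line breaks in the output (an assumption).
BLOCK_TAGS = {"p", "div", "li", "tr", "br", "table", "ul", "ol",
              "h1", "h2", "h3", "h4", "h5", "h6"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style", "title"}


class PageTextParser(HTMLParser):
    """Collects visible text and marks headings, list items, and table cells."""

    def __init__(self):
        # convert_charrefs=True means entities such as &amp; arrive decoded.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
            if tag == "li":
                self.parts.append("- ")
            elif tag in HEADING_TAGS:
                self.parts.append("#" * int(tag[1]) + " ")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in ("td", "th"):
            self.parts.append(" | ")  # keep cells on one line per row

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(page_html):
    """Return the page's visible text, one non-empty line per block."""
    parser = PageTextParser()
    parser.feed(page_html)
    parser.close()
    raw = "".join(parser.parts)
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)
```

Calling html_to_text on a page body yields one line per heading, paragraph, list item, or table row, which keeps the structural context available to the later chunking step.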

📎

Embedded Attachment Extraction

OneNote pages often contain attached .docx, .xlsx, or .pdf files. ChunkIQ downloads and processes these attachments, linking them back to the parent page.
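One way to locate attachments is to scan the page HTML for object elements. The sketch below assumes the page HTML returned by Microsoft Graph marks file attachments with data-attachment (file name), data (resource URL), and type (MIME type) attributes; AttachmentRef and find_attachments are hypothetical names, and downloading the file itself is left out.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class AttachmentRef:
    filename: str
    resource_url: str
    mime_type: str
    parent_page_id: str  # links the attachment back to its page


class _ObjectTagParser(HTMLParser):
    """Collects <object> tags that carry attachment attributes."""

    def __init__(self, page_id):
        super().__init__()
        self.page_id = page_id
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "object":
            return
        attributes = dict(attrs)
        name = attributes.get("data-attachment")
        url = attributes.get("data")
        if name and url:
            self.found.append(
                AttachmentRef(name, url, attributes.get("type") or "", self.page_id)
            )


def find_attachments(page_id, page_html):
    """Return an AttachmentRef for every attachment object in the page HTML."""
    parser = _ObjectTagParser(page_id)
    parser.feed(page_html)
    parser.close()
    return parser.found
```

Each AttachmentRef keeps the parent page ID, so text extracted from the file can be tagged with the same page metadata as the page's own chunks.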

🗺️

Page-Level Metadata

Every chunk is tagged with notebook name, section group, section name, page title, creation date, and last-modified timestamp for precise filtering.
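To make the tagging concrete, here is a minimal sketch of a chunk record carrying the fields listed above. The field names, the chunk_index key, and the helpers tag_chunks and filter_by are illustrative assumptions, not ChunkIQ's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PageMetadata:
    notebook: str
    section_group: str  # empty when the section sits directly under the notebook
    section: str
    page_title: str
    created: str        # ISO 8601 timestamp, kept as a string
    last_modified: str  # ISO 8601 timestamp, kept as a string


def tag_chunks(chunks, meta):
    """Attach the page's metadata to every chunk of text from that page."""
    base = asdict(meta)
    return [{**base, "chunk_index": i, "text": text} for i, text in enumerate(chunks)]


def filter_by(records, **criteria):
    """Keep only the records whose metadata matches every given field."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]
```

For example, filter_by(records, notebook="Work", section="Inbox") narrows a result set to a single section.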

🔌

Microsoft 365 Integration

Uses the OneNote-specific Microsoft Graph endpoints (/me/onenote/notebooks) with Notes.Read permissions for complete, authorised access.
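Graph list endpoints return results in pages, with an @odata.nextLink URL pointing at the next page. The sketch below shows one way to collect a full listing; fetch_json is a hypothetical callable that performs an authenticated GET and returns the decoded JSON body, so the HTTP client and token handling are left open.

```python
GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def list_all(fetch_json, path):
    """Collect every item from a Graph list endpoint, following nextLink pages."""
    url = GRAPH_ROOT + path
    items = []
    while url:
        payload = fetch_json(url)
        items.extend(payload.get("value", []))
        # Absent on the last page, which ends the loop.
        url = payload.get("@odata.nextLink")
    return items
```

With a real client, list_all(fetch_json, "/me/onenote/notebooks") would return every notebook visible to the signed-in user.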

🔄

Incremental Sync

Pages are identified by their unique Graph ID. On subsequent runs, only pages modified since the last extraction are re-processed.
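The selection logic can be sketched as a timestamp comparison. This example assumes each page record carries the id and lastModifiedDateTime properties from Graph and that the time of the previous run is stored as a timezone-aware datetime; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_graph_time(value):
    """Parse an ISO 8601 timestamp such as 2024-01-02T09:00:00Z."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pages_to_reprocess(pages, last_run):
    """Return the IDs of pages modified after the previous extraction."""
    return [
        page["id"]
        for page in pages
        if parse_graph_time(page["lastModifiedDateTime"]) > last_run
    ]
```

Pages created after the last run also carry a newer timestamp, so they are picked up by the same comparison.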

🔒

Works With Class & Personal Notebooks

Extracts both personal notebooks and shared notebooks from Microsoft Teams. All content stays within your Azure tenant throughout.


How it works

From OneNote page to searchable chunk

Step 01

🔑

Authenticate

Authenticates via Azure AD. Notes.Read covers the signed-in user's notebooks; Notes.Read.All (for admin access) is needed to enumerate notebooks across the tenant.

Step 02

📚

Traverse Hierarchy

Enumerates all notebooks → section groups → sections → pages recursively. Downloads each page's HTML content and any file attachments.
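The recursive walk can be sketched as a pair of generators. Here list_children is a hypothetical callable that takes a Graph path such as /me/onenote/notebooks and returns the list of items under it; downloading page HTML and attachments is omitted.

```python
def walk_sections(list_children, parent_kind, parent_id, trail):
    """Yield (trail, section) for each section under a notebook or section group."""
    base = f"/me/onenote/{parent_kind}/{parent_id}"
    for section in list_children(base + "/sections"):
        yield trail, section
    # Section groups can nest, so recurse into each one.
    for group in list_children(base + "/sectionGroups"):
        yield from walk_sections(
            list_children, "sectionGroups", group["id"], trail + [group["displayName"]]
        )


def walk_pages(list_children):
    """Yield the full path of names for every page the caller can see."""
    for notebook in list_children("/me/onenote/notebooks"):
        start = [notebook["displayName"]]
        for trail, section in walk_sections(
            list_children, "notebooks", notebook["id"], start
        ):
            pages_path = "/me/onenote/sections/" + section["id"] + "/pages"
            for page in list_children(pages_path):
                yield trail + [section["displayName"], page["title"]]
```

Each yielded path, for example Work, Projects, Alpha, Kickoff, doubles as the notebook, section group, section, and page title metadata for that page's chunks.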

Step 03

📄

Parse & Chunk

HTML is parsed into clean text. Embedded attachments are processed by dedicated extractors. All output is chunked and tagged.
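The page does not state how chunks are sized, so the sketch below is only one plausible approach: fixed-size character windows with a small overlap, ending each chunk at whitespace where possible. The max_chars and overlap defaults are arbitrary assumptions.

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into overlapping chunks of at most max_chars characters."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to end at the last space in the window. Searching only
            # past start + overlap guarantees the next window moves forward.
            cut = text.rfind(" ", start + overlap + 1, end)
            if cut != -1:
                end = cut
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, at the cost of some duplicated text in the index.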

Step 04

⚡

Embed & Index

Each chunk is embedded with Azure OpenAI and upserted to Azure AI Search, ready for hybrid semantic search.
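As a sketch of the upsert step, the function below builds a request body in the shape used by the Azure AI Search "index documents" REST operation, where mergeOrUpload updates an existing document or creates it. The field names id, content, and contentVector are assumptions about the index schema, and embed is a hypothetical callable standing in for the Azure OpenAI embedding call. Document keys are base64url-encoded because Graph page IDs can contain characters that search keys do not allow.

```python
import base64


def search_key(page_id, chunk_index):
    """Build a key containing only characters allowed in a search document key."""
    raw = f"{page_id}:{chunk_index}".encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def build_upsert_batch(records, embed):
    """Turn chunk records into a mergeOrUpload batch body."""
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                "id": search_key(record["page_id"], record["chunk_index"]),
                "content": record["text"],
                "contentVector": embed(record["text"]),
            }
            for record in records
        ]
    }
```

Because the key is derived from the page ID and chunk index, re-running extraction on a modified page overwrites its earlier chunks instead of duplicating them.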