Extract every notebook, section, and page from OneNote, converting rich HTML content to clean, searchable text and processing any embedded Office attachments alongside it.
ChunkIQ traverses the complete OneNote structure, from top-level notebooks all the way down to individual pages and their attachments.
All notebooks in the user's account
Nested section group folders
Individual sections within notebooks
HTML-rendered page content
Office files embedded in pages
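As an illustration of the hierarchy listed above, here is a minimal sketch of a depth-first walk over notebooks, nested section groups, sections, and pages. It works on plain in-memory dictionaries, not on live Graph responses. The `PageRef` type, the function names, and the dictionary keys (`sections`, `sectionGroups`, `pages`) are assumptions made for this example and are not ChunkIQ's actual data model.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class PageRef:
    """Hypothetical record describing where a page sits in the hierarchy."""
    notebook: str
    section_groups: Tuple[str, ...]
    section: str
    title: str


def walk_container(container: dict, notebook: str,
                   groups: Tuple[str, ...] = ()) -> Iterator[PageRef]:
    """Yield every page under a notebook or section group, depth first."""
    # Sections directly inside this container come first.
    for section in container.get("sections", []):
        for page in section.get("pages", []):
            yield PageRef(notebook, groups, section["name"], page["title"])
    # Then recurse into nested section groups, extending the group path.
    for group in container.get("sectionGroups", []):
        yield from walk_container(group, notebook, groups + (group["name"],))


def walk_notebooks(notebooks: List[dict]) -> Iterator[PageRef]:
    """Walk every notebook in the list."""
    for notebook in notebooks:
        yield from walk_container(notebook, notebook["name"])


# Made-up sample data: one notebook with one section and one nested group.
sample = [{
    "name": "Engineering",
    "sections": [{"name": "Standups", "pages": [{"title": "Monday"}]}],
    "sectionGroups": [{
        "name": "Archive",
        "sections": [{"name": "2023", "pages": [{"title": "Retro"}]}],
        "sectionGroups": [],
    }],
}]

for ref in walk_notebooks(sample):
    print(ref)
```

Carrying the section-group path along as a tuple means each page already knows its full location when it is yielded, which is the kind of context a later tagging step would need.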
OneNote stores pages as HTML. ChunkIQ parses that HTML into clean text while preserving headings, lists, and table structure.
OneNote pages are exported as HTML. ChunkIQ strips tags, decodes entities, and extracts clean, readable text with structural context preserved.
OneNote pages often contain attached .docx, .xlsx, or .pdf files. ChunkIQ downloads and processes these attachments, linking them back to the parent page.
Every chunk is tagged with notebook name, section group, section name, page title, creation date, and last-modified timestamp for precise filtering.
Uses the OneNote-specific Microsoft Graph endpoints (such as /me/onenote/notebooks) with the Notes.Read permission for complete, authorised access.
Pages are identified by their unique Graph ID. On subsequent runs, only pages modified since the last extraction are re-processed.
Extracts both personal notebooks and shared notebooks from Microsoft Teams. All content stays within your Azure tenant throughout.
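To make the HTML extraction described above more concrete, here is a minimal sketch of turning page HTML into text with Python's standard-library `html.parser`. It strips tags, decodes entities, and keeps a rough trace of headings, list items, and table cells. The class name, the function name, and the choice of `#`, `-`, and `|` as structure markers are assumptions made for illustration. This is not ChunkIQ's actual parser.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Hypothetical extractor that keeps light structural markers."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "table", "ul", "ol"}
    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.HEADING_TAGS:
            # Mark heading level with leading '#' characters.
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in ("td", "th"):
            self.parts.append(" | ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS or tag in self.HEADING_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return tag-free text with one non-empty line per block."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    lines = (" ".join(line.split()) for line in raw_lines)
    return "\n".join(line for line in lines if line)


print(html_to_text(
    "<h1>Plan</h1><p>Q&amp;A notes</p><ul><li>one</li><li>two</li></ul>"
))
```

A production parser would likely need more care with nested lists, merged table cells, and OneNote-specific markup. The sketch only shows the general shape of the approach.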
Authenticates via Azure AD with Notes.Read (or Notes.Read.All for admin access) to enumerate notebooks across the tenant.
Enumerates all notebooks → section groups → sections → pages recursively. Downloads each page's HTML content and any file attachments.
HTML is parsed into clean text. Embedded attachments are processed by dedicated extractors. All output is chunked and tagged.
Each chunk is embedded with Azure OpenAI and upserted to Azure AI Search, ready for hybrid semantic search.
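On later runs, the steps above only need to cover pages that have changed. Here is a minimal sketch of that selection, assuming each page record carries an `id` and a `lastModifiedDateTime` string in ISO 8601 form, and that the previous run stored one timestamp per page ID. The `last_seen` mapping and both function names are hypothetical. How ChunkIQ actually tracks state is not described here.

```python
from datetime import datetime
from typing import Dict, List


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # Older Python versions do not accept 'Z' in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pages_to_process(pages: List[dict], last_seen: Dict[str, str]) -> List[dict]:
    """Return pages that are new or modified since the previous run.

    last_seen maps page ID -> last-modified timestamp recorded earlier
    (a hypothetical state store for this example).
    """
    changed = []
    for page in pages:
        previous = last_seen.get(page["id"])
        current = parse_timestamp(page["lastModifiedDateTime"])
        if previous is None or current > parse_timestamp(previous):
            changed.append(page)
    return changed


# Made-up sample data: 'a' was edited, 'b' is unchanged, 'c' is new.
pages = [
    {"id": "a", "lastModifiedDateTime": "2024-05-02T09:00:00Z"},
    {"id": "b", "lastModifiedDateTime": "2024-04-01T09:00:00Z"},
    {"id": "c", "lastModifiedDateTime": "2024-05-03T12:30:00Z"},
]
last_seen = {
    "a": "2024-05-01T00:00:00Z",
    "b": "2024-04-01T09:00:00Z",
}

print([page["id"] for page in pages_to_process(pages, last_seen)])
```

Keying the state by page ID rather than by a single global watermark means a page that appears for the first time is always picked up, whatever its timestamp.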
Every notebook, every section, every page: automatically extracted and indexed for AI-powered retrieval.