microsoft/markitdown
This is an external open-source GitHub repository imported into the WOCSOL Marketplace for discovery. The original repository owner is the primary creator.
Repository Details
Repository Description
Python tool for converting files and office documents to Markdown.
Python tool for converting files and office documents to Markdown.
# MarkItDown [](https://pypi.org/project/markitdown/)  [](https://github.com/microsoft/autogen) > [!IMPORTANT] > MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest `convert_*` function needed for your use case (e.g., `convert_stream()`, or `convert_local()`). See the [Security Considerations](#security-considerations) section of the documentation for more information. MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to [textract](https://github.com/deanmalmgren/textract), but with a focus on preserving important document structure and content as Markdown (including: headings, lists, tables, links, etc.) While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools -- and may not be the best option for high-fidelity document conversions for human consumption. MarkItDown currently supports the conversion from: - PDF - PowerPoint - Word - Excel - Images (EXIF metadata and OCR) - Audio (EXIF metadata and speech transcription) - HTML - Text-based formats (CSV, JSON, XML) - ZIP files (iterates over contents) - Youtube URLs - EPubs - ... and more! ## Why Markdown? Markdown is extremely close to plain text, with minimal markup or formatting, but still provides a way to represent important document structure. Mainstream LLMs, s
Related Repositories
Product Discussion
Ask questions or discuss this product. New comments are reviewed before publishing.
Loading comments...