Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
OpenDataLoader PDF | langchain-opendataloader-pdf | ✅ | ❌ | ❌ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
OpenDataLoaderPDFLoader | ✅ | ❌ |
The OpenDataLoaderPDFLoader component parses PDFs into structured Document objects.
Requirements
- Python >= 3.9
- Java 11 or newer available on the system PATH
- opendataloader-pdf >= 1.1.1
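The first two requirements can be checked from Python before installing anything. The following is a minimal sketch that uses only the standard library; the helper names python_ok and java_on_path are hypothetical and are not part of the package. It only confirms that a java executable is discoverable on PATH and does not verify that the version is 11 or newer.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


def java_on_path():
    """Return True when a `java` executable is found on the system PATH.

    This checks presence only; it does not inspect the Java version.
    """
    return shutil.which("java") is not None


if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    print("java on PATH:", java_on_path())
```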
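Installation
Install the langchain-opendataloader-pdf package into your Python environment, for example with pip install -U langchain-opendataloader-pdf.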
Quick start
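A minimal sketch of loading documents, assuming the import module is named langchain_opendataloader_pdf (inferred from the package name, not confirmed here) and that the constructor accepts the keyword arguments listed in the Parameters table. The wrapper function load_pdf_documents is a hypothetical helper added for illustration.

```python
def load_pdf_documents(paths, output_format="text"):
    """Parse the given PDF paths and return a list of Document objects."""
    # Imported lazily so this sketch can be defined without the package installed.
    # Assumption: the module name mirrors the package name.
    from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader

    loader = OpenDataLoaderPDFLoader(
        file_path=list(paths),   # one or more PDF files or directories
        format=output_format,    # e.g. "json", "html", "markdown", "text"
        quiet=True,              # suppress CLI logging output
    )
    # lazy_load() yields documents one at a time; list() collects them all.
    return list(loader.lazy_load())
```

Calling load_pdf_documents(["example.pdf"]), where example.pdf is a placeholder file name, should return Document objects whose page_content and metadata attributes can then be inspected. This requires the package and Java to be installed.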
Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
file_path | List[str] | ✅ Yes | — | One or more PDF file paths or directories to process. |
format | str | No | None | Output format (e.g. "json", "html", "markdown", "text"). |
quiet | bool | No | False | Suppresses CLI logging output when True. |
content_safety_off | Optional[List[str]] | No | None | List of content safety filters to disable (e.g. "all", "hidden-text", "off-page", "tiny", "hidden-ocg"). |
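To illustrate how the required and optional parameters in the table fit together, here is a small sketch that assembles keyword arguments with the documented defaults. The function build_loader_kwargs is a hypothetical helper, not part of the library. It accepts a single path string for convenience and wraps it in a list, since file_path is documented as List[str].

```python
from typing import Any, Dict, List, Optional, Sequence, Union


def build_loader_kwargs(
    file_path: Union[str, Sequence[str]],
    format: Optional[str] = None,
    quiet: bool = False,
    content_safety_off: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments that mirror the Parameters table."""
    # file_path is the only required parameter; normalize it to a list.
    paths = [file_path] if isinstance(file_path, str) else list(file_path)
    if not paths:
        raise ValueError("file_path must contain at least one path")

    kwargs: Dict[str, Any] = {"file_path": paths, "quiet": quiet}
    # Optional parameters are passed only when set, so the loader's
    # own defaults (None) apply otherwise.
    if format is not None:
        kwargs["format"] = format
    if content_safety_off is not None:
        kwargs["content_safety_off"] = list(content_safety_off)
    return kwargs
```

The resulting dictionary could then be unpacked into the loader constructor, for example OpenDataLoaderPDFLoader(**build_loader_kwargs("docs/", format="markdown")), assuming the constructor accepts exactly these keyword names.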