Sign up at datalab.to/auth/sign_up. New accounts include $5 in free credits, enough to process hundreds of pages. Then grab your API key from the API Keys dashboard.
Want to try before writing code? Upload a document to the Forge Playground to see results instantly — no API key required.
The SDK provides a simple interface to convert documents to Markdown, HTML, JSON, or chunks.
    from datalab_sdk import DatalabClient

    client = DatalabClient()  # Uses DATALAB_API_KEY env var

    # Convert PDF to markdown
    result = client.convert("document.pdf")
    print(result.markdown)

    # Save output and images
    result.save_output("output/")
Common mistakes:
Forgetting to set the DATALAB_API_KEY environment variable
Using file_url with a private/authenticated URL (must be publicly accessible)
Not polling for results — the initial response only contains a request_id, not the actual output
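The first and last mistakes above can be guarded against with two small helpers. The following is a minimal sketch, not part of the SDK: `require_api_key` and `poll_until_complete` are hypothetical names, and the `"status"` field with the value `"complete"` is an assumption about the response shape that this section does not confirm. The status lookup is passed in as a function so the sketch does not depend on any particular endpoint.

```python
import os
import time
from typing import Callable, Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or fail early with a clear message if it is missing."""
    key = env.get("DATALAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DATALAB_API_KEY is not set")
    return key


def poll_until_complete(
    fetch_status: Callable[[str], dict],
    request_id: str,
    max_polls: int = 300,
    delay_seconds: float = 2.0,
) -> dict:
    """Call fetch_status(request_id) until the job reports completion.

    Assumption: the response is a dict whose "status" field becomes
    "complete" once the output is ready. Adjust to the real response shape.
    """
    for _ in range(max_polls):
        response = fetch_status(request_id)
        if response.get("status") == "complete":
            return response
        time.sleep(delay_seconds)
    raise TimeoutError(
        f"request {request_id} not complete after {max_polls} polls"
    )
```

In practice, `fetch_status` would wrap an authenticated request that looks up the job by its `request_id`. It is left abstract here because the lookup endpoint is not shown in this section.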
    # Convert a single document
    datalab convert document.pdf --format markdown

    # Convert with options
    datalab convert document.pdf --mode accurate --paginate

    # Convert a directory
    datalab convert ./documents/ --output_dir ./output/
Pipelines chain processors (convert, extract, segment) into a single reusable call. Create them in Forge or via the SDK:
    from datalab_sdk import DatalabClient

    client = DatalabClient()

    # Run an existing pipeline
    execution = client.run_pipeline(
        "pl_abc123",  # Your pipeline ID
        file_path="document.pdf"
    )

    # Poll until complete
    execution = client.get_pipeline_execution(
        execution.execution_id,
        max_polls=300
    )

    # Get extraction results (step index 1 = extract step)
    result = client.get_step_result(execution.execution_id, step_index=1)
    print(result)
See Pipelines for creating, versioning, and running pipelines.