Overview
The form filling API lets you programmatically fill forms in PDFs and images. It works with both:
Native PDF forms - Forms with actual form fields
Image-based forms - Scanned forms or images with visual form layouts
The API matches your field data to form fields and returns a filled PDF or image.
Basic Usage
from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()
options = FormFillingOptions(
    field_data={
        "full_name": {"value": "John Doe", "description": "Full legal name"},
        "date_of_birth": {"value": "1990-01-15", "description": "Date of birth"},
        "address": {"value": "123 Main St, City, ST 12345", "description": "Mailing address"},
    }
)
result = client.fill("form.pdf", options=options)
result.save_output("filled_form.pdf")
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| field_data | dict | Required | Field names mapped to values and descriptions |
| context | str | None | Additional context to help match fields |
| confidence_threshold | float | 0.5 | Minimum confidence for field matching (0.0-1.0) |
| max_pages | int | None | Maximum pages to process |
| page_range | str | None | Specific pages to process (e.g., "0-2"). For spreadsheets, filters by sheet index. |
| skip_cache | bool | False | Skip cached results |
Each field in field_data is a dictionary with:
field_data = {
    "field_key": {
        "value": "The value to fill",
        "description": "Description to help match the field"
    }
}
The description helps the API match your field key to the actual form field, especially when field names in the PDF don’t match your data structure.
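When your data already lives in a flat dictionary, the nested structure can be built with a small helper. The sketch below is illustrative only: `build_field_data` is a hypothetical function, not part of the SDK, and its fallback of deriving a description from the key name is an assumption about what makes a usable hint.

```python
def build_field_data(values, descriptions=None):
    """Wrap flat key -> value pairs in the {"value", "description"} shape.

    Hypothetical helper, not part of datalab_sdk. When no description is
    supplied for a key, it falls back to the key with underscores replaced
    by spaces (an assumption, not documented behavior).
    """
    descriptions = descriptions or {}
    field_data = {}
    for key, value in values.items():
        field_data[key] = {
            # The examples in this guide pass every value as a string.
            "value": str(value),
            "description": descriptions.get(key, key.replace("_", " ")),
        }
    return field_data


record = {"full_name": "John Doe", "date_of_birth": "1990-01-15"}
field_data = build_field_data(record, {"full_name": "Full legal name"})
print(field_data)
```

The resulting dictionary can be passed as `field_data` to `FormFillingOptions`. Explicit descriptions that mirror the labels printed on the form are likely to be better hints than the key-derived fallback.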
Example with Multiple Field Types
options = FormFillingOptions(
    field_data={
        # Text fields
        "name": {"value": "Jane Smith", "description": "Full name"},
        "email": {"value": "jane@example.com", "description": "Email address"},
        # Date fields
        "date": {"value": "2024-01-15", "description": "Today's date"},
        # Numeric fields
        "amount": {"value": "1500.00", "description": "Total amount"},
        # Checkbox (use descriptive value)
        "agree_terms": {"value": "Yes", "description": "Agreement checkbox"},
        # Signature (text is rendered)
        "signature": {"value": "Jane Smith", "description": "Signature field"},
    },
    context="This is an employment application form"
)
Using Context
The context parameter provides additional information to improve field matching:
options = FormFillingOptions(
    field_data={
        "ssn": {"value": "123-45-6789", "description": "Social Security Number"},
        "employer": {"value": "Acme Corp", "description": "Current employer name"},
    },
    context="W-4 tax withholding form for new employee onboarding"
)
Confidence Threshold
Adjust confidence_threshold to control field matching strictness:
options = FormFillingOptions(
    field_data={ ... },
    confidence_threshold=0.7,  # Higher = more strict matching
)
Lower values (0.3-0.5): More fields are matched, but some matches may be incorrect
Higher values (0.7-0.9): Fewer fields are matched, but the matches are more accurate
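The trade-off can be seen with a toy model. The sketch below does not reproduce the service's matching logic, which this guide does not describe. The scores are invented, and treating a score equal to the threshold as a match is an assumption.

```python
# Invented match scores for illustration only; real scores are computed by
# the service and do not appear in the examples in this guide.
hypothetical_scores = {
    "full_name": 0.92,
    "date_of_birth": 0.74,
    "employer": 0.55,
    "signature": 0.38,
}


def fields_above(scores, threshold):
    """Return the field keys whose score meets or exceeds the threshold."""
    return sorted(key for key, score in scores.items() if score >= threshold)


# Raising the cutoff shrinks the set of fields that would be filled.
for threshold in (0.3, 0.5, 0.7, 0.9):
    print(threshold, fields_above(hypothetical_scores, threshold))
```

With these made-up numbers, a threshold of 0.3 keeps all four fields while 0.9 keeps only `full_name`, which mirrors the coverage versus accuracy trade-off described above.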
result = client.fill("form.pdf", options=options)
# Check results
print(result.success)           # True if filling succeeded
print(result.status)            # "complete" when done
print(result.output_format)     # "pdf" or "png"
print(result.fields_filled)     # List of successfully filled fields
print(result.fields_not_found)  # List of fields that couldn't be matched
print(result.page_count)        # Number of pages processed
print(result.cost_breakdown)    # Cost details
Result Fields
| Field | Type | Description |
| --- | --- | --- |
| success | bool | Whether form filling succeeded |
| status | str | Processing status |
| output_format | str | Output type: "pdf" or "png" |
| output_base64 | str | Base64-encoded filled form |
| fields_filled | list | Field names that were successfully filled |
| fields_not_found | list | Field names that couldn’t be matched |
| page_count | int | Number of pages processed |
| runtime | float | Processing time in seconds |
| cost_breakdown | dict | Cost details |
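These attributes can be combined into a short report for logging. The following is a sketch rather than SDK functionality: `summarize` is a hypothetical helper, the stand-in object carries invented values so the code runs without an API call, and the fill rate assumes every submitted key ends up in either `fields_filled` or `fields_not_found`.

```python
from types import SimpleNamespace


def summarize(result):
    """Build a one-line summary from the documented result attributes."""
    filled = len(result.fields_filled)
    missing = len(result.fields_not_found)
    total = filled + missing
    # Guard against division by zero when no fields were reported at all.
    rate = filled / total if total else 0.0
    return (
        f"{result.status}: {filled}/{total} fields filled "
        f"({rate:.0%}) across {result.page_count} page(s)"
    )


# Stand-in for a real result object, with invented values.
fake_result = SimpleNamespace(
    status="complete",
    fields_filled=["name", "email", "date"],
    fields_not_found=["signature"],
    page_count=2,
)
print(summarize(fake_result))
```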
# Save to file
result.save_output("filled_form.pdf")
# Or access the raw base64 data
import base64
pdf_bytes = base64.b64decode(result.output_base64)
with open("filled.pdf", "wb") as f:
    f.write(pdf_bytes)
The API also works with image-based forms (PNG, JPG, etc.):
result = client.fill("scanned_form.png", options=options)
result.save_output("filled_form.png")  # Returns filled image
For images, the output is a PNG with the field values rendered onto the image.
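Because the output type depends on the input, it can help to derive the output filename from `result.output_format` instead of hard-coding an extension. A minimal sketch, assuming `output_format` holds "pdf" or "png" as listed in the result fields. `output_path_for` is a hypothetical helper, and the stand-in objects replace real results so the code runs offline.

```python
from pathlib import Path
from types import SimpleNamespace


def output_path_for(input_path, result, suffix="_filled"):
    """Derive an output filename whose extension matches result.output_format."""
    source = Path(input_path)
    return source.with_name(f"{source.stem}{suffix}.{result.output_format}")


# Stand-in results carrying the two documented output_format values.
pdf_result = SimpleNamespace(output_format="pdf")
png_result = SimpleNamespace(output_format="png")

print(output_path_for("forms/w4.pdf", pdf_result))
print(output_path_for("scans/intake.jpg", png_result))
```

With a real result, the path could be passed along as `result.save_output(str(output_path_for("scanned_form.png", result)))`. Converting to `str` is a precaution, since this guide only shows `save_output` called with string paths.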
From URL
Fill a form from a URL:
result = client.fill(
    file_url="https://example.com/form.pdf",
    options=options
)
Async Usage
import asyncio
from datalab_sdk import AsyncDatalabClient, FormFillingOptions

async def fill_form():
    async with AsyncDatalabClient() as client:
        options = FormFillingOptions(
            field_data={
                "name": {"value": "John Doe", "description": "Full name"},
            }
        )
        result = await client.fill("form.pdf", options=options)
        result.save_output("filled.pdf")

asyncio.run(fill_form())
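One reason to use the async client is to fill several forms at once. The sketch below shows one possible pattern with `asyncio.gather` and a semaphore. It assumes that `client.fill` can safely be awaited concurrently on one client, which this guide does not confirm. `fill_many`, `FakeClient`, and the limit of three in-flight requests are all illustrative choices, not SDK features or documented limits.

```python
import asyncio


async def fill_many(client, paths, options, max_concurrent=3):
    """Fill several forms concurrently, capping the number of in-flight calls."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fill_one(path):
        async with semaphore:
            # Same call shape as the async example: fill(path, options=options).
            return await client.fill(path, options=options)

    # gather preserves input order, so results line up with paths.
    return await asyncio.gather(*(fill_one(path) for path in paths))


class FakeClient:
    """Stand-in for AsyncDatalabClient so the sketch runs offline."""

    async def fill(self, path, options=None):
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"filled:{path}"


results = asyncio.run(fill_many(FakeClient(), ["a.pdf", "b.pdf"], options=None))
print(results)
```

In real use, `FakeClient()` would be replaced by the client obtained from `async with AsyncDatalabClient() as client`, and each returned result could then be saved with `save_output`.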
Handling Unmatched Fields
Check which fields couldn’t be matched:
result = client.fill("form.pdf", options=options)
if result.fields_not_found:
    print("These fields couldn't be matched:")
    for field in result.fields_not_found:
        print(f" - {field}")
    # Consider adjusting descriptions or lowering confidence threshold
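To act on that list, it can be convenient to pull the unmatched entries out of the original `field_data` and revise their descriptions before trying again. This is a sketch under one assumption: the names in `fields_not_found` are the same keys used in `field_data`. `unmatched_subset` is a hypothetical helper, not part of the SDK.

```python
def unmatched_subset(field_data, fields_not_found):
    """Return only the field_data entries whose keys were not matched."""
    missing = set(fields_not_found)
    return {key: spec for key, spec in field_data.items() if key in missing}


field_data = {
    "name": {"value": "John Doe", "description": "Full name"},
    "dob": {"value": "1990-01-15", "description": "DOB"},
}

# Pretend the service reported "dob" as unmatched.
retry = unmatched_subset(field_data, ["dob"])
# Replace the terse description with wording closer to the form's label.
retry["dob"] = {**retry["dob"], "description": "Date of birth"}
print(retry)
```

The revised entries can then be merged back into the full `field_data` for another `fill` call on the original form, optionally with a lower `confidence_threshold`.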
from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()
options = FormFillingOptions(
    field_data={
        "first_name": {"value": "John", "description": "First name"},
        "last_name": {"value": "Doe", "description": "Last name"},
        "ssn": {"value": "123-45-6789", "description": "Social Security Number"},
        "address": {"value": "123 Main Street", "description": "Street address"},
        "city": {"value": "Springfield", "description": "City"},
        "state": {"value": "IL", "description": "State abbreviation"},
        "zip": {"value": "62701", "description": "ZIP code"},
        "filing_status": {"value": "Single", "description": "Filing status"},
        "signature": {"value": "John Doe", "description": "Taxpayer signature"},
        "date": {"value": "2024-04-15", "description": "Date signed"},
    },
    context="IRS W-4 Employee's Withholding Certificate"
)
result = client.fill("w4_form.pdf", options=options)
print(f"Filled {len(result.fields_filled)} fields")
print(f"Unmatched: {result.fields_not_found}")
result.save_output("w4_filled.pdf")
Next Steps
Form Filling Recipe: Detailed guide on form filling with field matching and templates.
File Management: Upload, list, and manage files in Datalab storage.
Conversion SDK: Convert documents to Markdown, HTML, JSON, or chunks.
Pipelines: Chain processors into versioned, reusable pipelines.