Extract data

1. Create a schema

A schema defines the columns you want to extract.

schema = client.schemas.create(
    name="Product Catalog",
    columns=[
        {"key": "product_name", "label": "Product Name", "type": "text", "required": True},
        {"key": "price", "label": "Price", "type": "number"},
        {"key": "description", "label": "Description", "type": "text"},
        {"key": "image_url", "label": "Image", "type": "image"},
    ],
)

print(schema["id"])

You can also generate a schema automatically from sample data:

schema = client.schemas.generate(
    sample_data="Product: iPhone 15 Pro, Price: $999, Storage: 128GB",
    name="Smartphones",
)
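
Column definitions like the ones passed to client.schemas.create above can be sanity-checked locally before any request is sent. The sketch below is a hypothetical helper, not part of the SDK: check_columns and KNOWN_TYPES are names introduced here, and the type list contains only the three types that appear on this page, so the service may well accept others.

```python
# Hypothetical local check; not part of the SDK.
# Only the types shown on this page; the service may accept more.
KNOWN_TYPES = {"text", "number", "image"}


def check_columns(columns):
    """Return a list of problems found in a list of column definitions."""
    problems = []
    seen_keys = set()
    for index, column in enumerate(columns):
        key = column.get("key")
        if not key:
            problems.append(f"column {index}: missing 'key'")
        elif key in seen_keys:
            problems.append(f"column {index}: duplicate key {key!r}")
        else:
            seen_keys.add(key)
        if column.get("type") not in KNOWN_TYPES:
            problems.append(f"column {index}: unrecognized type {column.get('type')!r}")
    return problems


columns = [
    {"key": "product_name", "label": "Product Name", "type": "text", "required": True},
    {"key": "price", "label": "Price", "type": "number"},
    {"key": "description", "label": "Description", "type": "text"},
    {"key": "image_url", "label": "Image", "type": "image"},
]

print(check_columns(columns))  # []
```

An empty list means no problems were found by this check; it says nothing about what the server will accept.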

2. Upload a file

Upload a PDF, spreadsheet, or image to extract data from.

uploaded = client.files.upload("/path/to/catalog.pdf")

print(uploaded["id"])  # "file_abc123"
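
Since the upload step expects a PDF, spreadsheet, or image, it can be useful to classify a path before uploading it. The following is a rough sketch: guess_kind and the suffix table are hypothetical, and the suffixes are assumed for illustration rather than taken from the service's list of supported formats.

```python
from pathlib import PurePath

# Assumed suffix groups for illustration; the service's real list may differ.
ASSUMED_KINDS = {
    ".pdf": "pdf",
    ".xlsx": "spreadsheet",
    ".csv": "spreadsheet",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def guess_kind(path):
    """Return 'pdf', 'spreadsheet', 'image', or None based on the file suffix."""
    return ASSUMED_KINDS.get(PurePath(path).suffix.lower())


print(guess_kind("/path/to/catalog.pdf"))  # pdf
print(guess_kind("/path/to/notes.txt"))    # None
```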

3. Create a table

Start the extraction by creating a table with your schema and uploaded file.

table = client.tables.create(
    schema_id=schema["id"],
    name="Q1 Product Catalog",
    file_ids=[uploaded["id"]],
)

print(table["id"])      # "tbl_..."
print(table["status"])  # "processing"

4. Wait for results

The SDK includes a polling helper that waits for extraction to complete, checking every interval_s seconds and giving up after timeout_s seconds.

completed = client.tables.wait_for_completion(
    table["id"],
    interval_s=3,
    timeout_s=300,
)

print(completed["status"])      # "completed"
print(completed["total_rows"])  # 42
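
To make the polling pattern concrete, here is a minimal, generic sketch of a poll-until-done loop. It is not the SDK's implementation: wait_until, fetch, and is_done are hypothetical names, and the status values are simulated. It treats only "completed" as done because that is the only terminal status shown on this page; any failure states the service may report are not handled.

```python
import time


def wait_until(fetch, is_done, interval_s=3.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch() every interval_s seconds until is_done(result) is true.

    Raises TimeoutError if another wait would pass the timeout_s deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"not done after {timeout_s} seconds")
        sleep(interval_s)


# Simulated status sequence standing in for repeated status lookups.
statuses = iter(["processing", "processing", "completed"])

result = wait_until(
    fetch=lambda: {"status": next(statuses)},
    is_done=lambda table: table["status"] == "completed",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result["status"])  # completed
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments.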

5. Retrieve rows

rows = client.tables.rows(completed["id"], limit=100)

for row in rows["data"]:
    print(row["data"]["product_name"], row["data"]["price"])

# Or as a pandas DataFrame
df = client.tables.results_as_dataframe(completed["id"])
print(df.head())
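
If pandas is not available, the same rows can be flattened with the standard library. This sketch assumes only the response shape used in the loop above (a top-level "data" list whose items each carry their values under "data"); rows_to_csv is a hypothetical helper and the sample values are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Flatten a rows response into CSV text, leaving missing values blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows["data"]:
        writer.writerow({name: row["data"].get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Made-up sample mirroring the shape of the rows response.
sample = {
    "data": [
        {"data": {"product_name": "Example Widget", "price": 9.5}},
        {"data": {"product_name": "Example Gadget"}},
    ]
}

print(rows_to_csv(sample, ["product_name", "price"]))
```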

Enhance with AI

Use AI to generate descriptions, translate content, and enrich your data.
