How extraction works
When you create a table, Lasso processes your source files through an AI pipeline:

- Document parsing — PDFs, spreadsheets, and images are parsed to extract raw text and visual elements.
- Schema mapping — The AI maps the raw content to your schema’s column definitions.
- Row generation — Each detected product or item becomes a row with values for each column.
- Validation — Extracted values are validated against column types (numbers, URLs, emails, etc.).
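
The validation step can be pictured as a per-column type check. The following is a minimal sketch, not Lasso's actual validator: the `ColumnType` names and the `isValidForType` helper are hypothetical, loosely based on the examples named above (numbers, URLs, emails), and the checks are deliberately loose.

```typescript
// Hypothetical column types, based on the examples named in the validation step.
type ColumnType = "number" | "url" | "email" | "text";

// Returns true when a raw extracted string is acceptable for the given column type.
function isValidForType(value: string, type: ColumnType): boolean {
  const trimmed = value.trim();
  switch (type) {
    case "number":
      // Reject empty strings, which Number() would otherwise coerce to 0.
      return trimmed !== "" && Number.isFinite(Number(trimmed));
    case "url":
      // Loose check: http(s) scheme, a non-empty host, optional path.
      return /^https?:\/\/[^\s/]+(\/\S*)?$/i.test(trimmed);
    case "email":
      // Loose check: one "@", no whitespace, and a dot in the domain part.
      return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(trimmed);
    case "text":
      return true;
  }
}
```

A real validator would likely also handle locale-specific number formats and stricter URL rules; this sketch only illustrates the idea of checking each value against its column type.
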
Source types
You can provide data to extract from in three ways:

| Source | Description |
|---|---|
| Files | Upload PDFs, spreadsheets, images, or documents via the Files API. |
| URLs | Pass publicly accessible URLs directly when creating a table. |
| Text | Provide raw text content for extraction. |
Table lifecycle
A table moves through these statuses:

- queued — The job is waiting to be picked up.
- processing — Extraction is actively running. The `progress` field (0-100) tracks completion.
- completed — All rows have been extracted and are ready to query.
- failed — Something went wrong. Check `error_message` for details.
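
The four statuses map naturally onto a string union. The sketch below uses only the field names mentioned in this section (`status`, `progress`, `error_message`). The `TableState` shape, the helper names, and the assumption that `completed` and `failed` are final states are illustrative, not taken from the Lasso SDK.

```typescript
// Status values as listed in this section.
type TableStatus = "queued" | "processing" | "completed" | "failed";

// Only the fields named in this section; a real table object likely has more.
interface TableState {
  status: TableStatus;
  progress?: number; // 0-100, meaningful while processing
  error_message?: string; // expected when status is "failed"
}

// Assumption: a table stops changing once it is completed or failed.
function isTerminal(status: TableStatus): boolean {
  return status === "completed" || status === "failed";
}

// One-line summary suitable for logs or a status badge.
function describeTable(table: TableState): string {
  switch (table.status) {
    case "queued":
      return "Waiting to be picked up";
    case "processing":
      return `Extracting (${table.progress ?? 0}%)`;
    case "completed":
      return "Ready to query";
    case "failed":
      return `Failed: ${table.error_message ?? "no details provided"}`;
  }
}
```
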
Polling vs webhooks
There are two ways to find out when extraction finishes:

- Polling — Use `client.tables.waitForCompletion()` (SDK) or poll `GET /v1/tables/{id}` until status is `completed` (or `failed`).
- Webhooks — Pass a `webhook_url` when creating the table. Lasso sends an HTTP POST when processing completes.
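
The polling option can be sketched as a loop around `GET /v1/tables/{id}`. This is a minimal sketch, not the SDK's `waitForCompletion()` implementation: `pollUntilDone`, `FetchTable`, the 2-second interval, and the attempt cap are all hypothetical. The HTTP call is supplied by the caller, so nothing is assumed about base URLs or authentication.

```typescript
// Minimal view of the table object: only fields named in this document.
interface PolledTable {
  status: "queued" | "processing" | "completed" | "failed";
  progress?: number;
  error_message?: string;
}

// Hypothetical: the caller supplies a function that performs
// GET /v1/tables/{id} and returns the parsed response body.
type FetchTable = (tableId: string) => Promise<PolledTable>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the table completes, fails, or the attempt budget runs out.
async function pollUntilDone(
  fetchTable: FetchTable,
  tableId: string,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<PolledTable> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const table = await fetchTable(tableId);
    if (table.status === "completed") return table;
    if (table.status === "failed") {
      throw new Error(table.error_message ?? "Extraction failed");
    }
    // Wait before the next poll, but not after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Table ${tableId} did not finish after ${maxAttempts} polls`);
}
```

Treating `failed` as an error keeps the caller from waiting on a job that will never complete, and the attempt cap bounds the total wait. Webhooks avoid the loop entirely, at the cost of exposing an endpoint that can receive the POST.
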
Working with rows
Once a table is completed, each extracted item is a row. Rows contain:

- data — A key-value object matching your schema columns.
- validation_status — Whether the row passed type validation.
- enhancement_status — Per-column status of any AI enhancements.
- is_edited — Whether the row was manually modified via the API.
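
As an illustration of working with these fields, the sketch below models a row and pulls out column values and edited rows. The field names come from the list above. The value types are assumptions: `validation_status` and `enhancement_status` are left as `unknown` because their exact representation is not specified here, and the helper names are hypothetical.

```typescript
// Field names from the list above; value types are assumptions.
interface ExtractedRow {
  data: Record<string, unknown>; // keyed by schema column name
  validation_status: unknown; // exact representation not specified here
  enhancement_status: unknown; // per-column; exact shape not specified here
  is_edited: boolean;
}

// Collects one column's values across rows, skipping rows where it is missing.
function columnValues(rows: ExtractedRow[], column: string): unknown[] {
  return rows
    .filter((row) => column in row.data)
    .map((row) => row.data[column]);
}

// Rows that were manually modified via the API.
function editedRows(rows: ExtractedRow[]): ExtractedRow[] {
  return rows.filter((row) => row.is_edited);
}
```
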

