Spreadsheets & data

Index CSV and Excel files, find them by meaning, and let your AI assistant explore the numbers.

Orkai can treat your spreadsheets as searchable datasets — not just flat files. Index a folder of CSV or Excel exports, then find the right file by asking (“where is quarterly sales by region?”) and let your AI assistant answer questions about the numbers without you opening every file by hand.

This is built for analysts, researchers, ops teams, and anyone who lives in spreadsheets — alongside developers who already use Orkai for code.

Getting started

  1. Keep the daemon running (Daemon).
  2. In your project folder, run orkai init — spreadsheet indexing is already enabled in .orkai.yaml (Projects & indexing).
  3. Run orkai index . (default all mode) or orkai index analytics . — CSV and Excel files become searchable datasets.

By default, Orkai looks for .csv, .xlsx, and .parquet files anywhere in the project. To limit indexing to specific folders, edit the paths list:

index:
  analytics:
    enabled: true
    paths: [data/, reports/]
    extensions: [.csv, .xlsx]

Set enabled: false if you prefer CSV files to be indexed as plain text instead. If a file cannot be read as a proper table (missing headers, empty sheet, and so on), Orkai tells you what went wrong — it does not fail silently.
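To make the kinds of problems mentioned above concrete, here is a minimal Python sketch that checks CSV text before treating it as a table. The table_problems helper and its messages are hypothetical illustrations, not Orkai's implementation. It covers CSV only; Excel sheets would need a separate reader.

```python
import csv
import io


def table_problems(text: str) -> list[str]:
    """Return reasons why CSV text cannot be read as a proper table (empty list if fine).

    Hypothetical helper illustrating the checks described above, not Orkai's code.
    """
    # Skip fully blank lines so they are not reported as short rows.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    # Every column needs a non-blank name so it can be referenced in queries.
    blanks = [i for i, name in enumerate(header, start=1) if not name.strip()]
    if blanks:
        problems.append(f"missing header name in column(s) {blanks}")
    if not data:
        problems.append("header row present but no data rows")
    # A row wider or narrower than the header cannot be mapped onto columns.
    for n, row in enumerate(data, start=1):
        if len(row) != len(header):
            problems.append(f"data row {n}: expected {len(header)} fields, got {len(row)}")
    return problems
```

Returning a list of messages, rather than raising on the first problem, lets every issue in a file be reported at once.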

Find the right dataset

Orkai remembers each dataset by what it contains — column names, types, row counts — so you can search by meaning rather than filename:

orkai search "customer churn by month"
orkai search "inventory levels warehouse"

In Cursor or another MCP-connected assistant, ask in plain language: “Search my Orkai project for the dataset with regional sales totals.” The assistant can list matching datasets and show you the column layout before running any queries.
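A column-based description like the one Orkai keeps for each dataset can be sketched as below. This assumes a simple summary string made of the filename, row count, and column names with guessed types; describe_csv and infer_type are hypothetical helpers, and Orkai's actual description format and type detection are not documented here.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Guess a column type from its non-blank string values (a simple heuristic)."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "empty"
    # Try the strictest type first; fall back to text if nothing parses.
    for name, cast in (("integer", int), ("number", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def describe_csv(name: str, text: str) -> str:
    """Build a column-based summary: filename, row count, column names and types."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if not rows:
        raise ValueError(f"{name}: file is empty")
    header, data = rows[0], rows[1:]
    columns = []
    for i, col in enumerate(header):
        # Treat missing trailing fields as blanks.
        values = [row[i] if i < len(row) else "" for row in data]
        columns.append(f"{col} ({infer_type(values)})")
    return f"{name}: {len(data)} rows; columns: " + ", ".join(columns)
```

A summary like this is what lets a query such as "regional sales totals" match a file whose columns are region and revenue, even when the filename gives no hint.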

Ask questions about the data

Once a dataset is found, your assistant can explore it — filter rows, group by category, sum columns, compare periods — and return answers as a table you can read or paste into a report.

Examples of what you might ask:

  • “What were total sales by region last quarter?”
  • “How many rows have a blank email field?”
  • “Show the top 10 products by revenue.”
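Questions like these come down to filter, grouping, and ranking queries. The sketch below uses Python's built-in sqlite3 as a stand-in engine (Orkai's own query engine and SQL dialect are not specified here), with made-up rows and hypothetical column names. The table is named dataset, matching how single-file queries refer to their table, and "last quarter" is hard-coded as 2024-Q4 for illustration.

```python
import sqlite3

# Made-up sample rows standing in for an indexed export.
ROWS = [
    ("EU", "Widget", "a@example.com", "2024-Q4", 120.0),
    ("EU", "Gadget", "", "2024-Q4", 80.0),
    ("US", "Widget", "c@example.com", "2024-Q4", 200.0),
    ("US", "Gizmo", None, "2024-Q3", 50.0),
]

QUESTIONS = {
    # "What were total sales by region last quarter?"
    "sales_by_region": """
        SELECT region, SUM(revenue) AS total
        FROM dataset WHERE quarter = '2024-Q4'
        GROUP BY region ORDER BY region""",
    # "How many rows have a blank email field?" (NULL or empty string)
    "blank_emails": """
        SELECT COUNT(*) FROM dataset
        WHERE email IS NULL OR TRIM(email) = ''""",
    # "Show the top 10 products by revenue."
    "top_products": """
        SELECT product, SUM(revenue) AS total
        FROM dataset GROUP BY product
        ORDER BY total DESC LIMIT 10""",
}


def run(question: str) -> list[tuple]:
    """Load the sample rows into an in-memory table and answer one question."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE dataset "
            "(region TEXT, product TEXT, email TEXT, quarter TEXT, revenue REAL)"
        )
        con.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?)", ROWS)
        return con.execute(QUESTIONS[question]).fetchall()
    finally:
        con.close()
```

Note that "blank" covers both empty strings and missing (NULL) values; a query that checks only one of them would undercount.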

You stay in conversation; the assistant handles the lookup. Large results are returned in manageable chunks so responses stay readable.
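Returning results in chunks amounts to paging through the rows. This minimal sketch assumes a page size of 50; Orkai's real chunk size and how a follow-up chunk is requested are not described here, and chunked is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[tuple], size: int = 50) -> Iterator[list[tuple]]:
    """Yield query results in fixed-size pages so each response stays short."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    # islice pulls at most `size` rows; an empty page means the rows are exhausted.
    while page := list(islice(it, size)):
        yield page
```

Because it consumes an iterator lazily, this works the same whether the rows come from a list or a streaming database cursor.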

Working with spreadsheet folders

Point paths at the directories you actually use (exports, reports, downloads) and use exclude for raw or scratch subfolders you do not want indexed:

index:
  analytics:
    enabled: true
    paths: [reports/, exports/]
    exclude: [reports/archive/]

Re-run orkai index . or orkai index analytics . after files change so Orkai picks up new exports. See Projects & indexing for --reset and other index options.
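The paths, exclude, and extensions settings together decide which files get indexed. The sketch below assumes folder-prefix matching for paths and exclude, a case-insensitive suffix match for extensions, and that an empty paths list means the whole project. is_indexed is hypothetical, and Orkai's actual matching rules may differ.

```python
from pathlib import PurePosixPath


def is_indexed(path: str, paths: list[str], exclude: list[str], extensions: list[str]) -> bool:
    """Decide whether a project-relative file would be picked up (assumed rules)."""
    p = PurePosixPath(path)

    def under(folder: str) -> bool:
        # Compare whole path components, so "reports/" does not match "reportsX/".
        return PurePosixPath(folder.rstrip("/")) in p.parents

    if p.suffix.lower() not in {e.lower() for e in extensions}:
        return False
    # Assumption: no paths configured means "anywhere in the project".
    if paths and not any(under(f) for f in paths):
        return False
    # Exclusions win over inclusions.
    return not any(under(f) for f in exclude)


cfg = dict(paths=["reports/", "exports/"], exclude=["reports/archive/"], extensions=[".csv", ".xlsx"])
```

With cfg mirroring the configuration above, reports/q1.csv is indexed while reports/archive/old.csv and data/raw.csv are not.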

Tables your assistant can build

You do not need a file on disk for every dataset. Your AI assistant can create a table from rows you provide — useful for tracking expenses, pipeline stages, or survey results — and update it over time (add rows, replace contents, remove matching rows).

Assistant-created tables stay in your project even when you re-index files from disk, so hand-maintained data is not wiped when you refresh a folder of exports.
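One way to picture why hand-maintained tables survive a refresh is to tag each table with where it came from. In this sketch, Project, Table, and their methods are hypothetical names, not Orkai's API: a re-index rebuilds only file-backed tables and keeps assistant-created ones, alongside the create, add rows, replace, and remove operations described above.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    rows: list[dict]
    source: str  # "file" for indexed exports, "assistant" for hand-maintained tables


@dataclass
class Project:
    tables: dict[str, Table] = field(default_factory=dict)

    def create(self, name: str, rows: list[dict]) -> None:
        self.tables[name] = Table(list(rows), source="assistant")

    def add_rows(self, name: str, rows: list[dict]) -> None:
        self.tables[name].rows.extend(rows)

    def replace(self, name: str, rows: list[dict]) -> None:
        self.tables[name].rows = list(rows)

    def remove_where(self, name: str, column: str, value: object) -> int:
        """Delete rows whose column equals value; return how many were removed."""
        table = self.tables[name]
        before = len(table.rows)
        table.rows = [r for r in table.rows if r.get(column) != value]
        return before - len(table.rows)

    def reindex(self, files: dict[str, list[dict]]) -> None:
        # Drop only file-backed tables, then reload them from disk.
        self.tables = {n: t for n, t in self.tables.items() if t.source == "assistant"}
        for name, rows in files.items():
            # On a name clash, the assistant table is kept.
            self.tables.setdefault(name, Table(list(rows), source="file"))
```

The key design choice is recording provenance per table, so a refresh can be destructive for file data without touching anything the assistant maintains.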

For large or unfamiliar folders, you can ask Orkai to write a short plain-language summary of each dataset when it is indexed — so search finds “monthly subscription revenue” even when the filename is export_final_v3.csv.

index:
  analytics:
    enabled: true
    paths: [data/]
    enrich_description: true

This uses the same chat provider as Review (orkai review setup). If chat is not configured, indexing still works; you simply get the standard column-based description instead.
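The fallback described above can be sketched as a function that prefers a chat-written summary and otherwise returns the column-based description. Here chat is a hypothetical callable standing in for the provider configured with orkai review setup. Keeping the column description alongside the summary, and falling back when the chat call fails, are assumptions rather than documented behavior.

```python
from typing import Callable, Optional


def dataset_description(column_summary: str,
                        chat: Optional[Callable[[str], str]] = None) -> str:
    """Prefer a chat-written summary; fall back to the column-based description."""
    if chat is None:
        # No chat provider configured: indexing still succeeds with the standard text.
        return column_summary
    try:
        prompt = "Summarize this dataset in one plain-language sentence:\n" + column_summary
        summary = chat(prompt).strip()
    except Exception:
        # Assumption: a failing provider should not block indexing either.
        return column_summary
    # Assumption: keep column names searchable next to the plain-language summary.
    return f"{summary}\n{column_summary}" if summary else column_summary
```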

Combine related spreadsheets

When revenue lives in one export and customer details in another, your assistant can join them in one query after both are indexed — no manual merge in Excel first.

  1. Search for each dataset and note the entity IDs.
  2. Run schema on each to see join columns (e.g. customer_id).
  3. Query with a datasets map that gives each file a short alias, then write SQL using those aliases.

analytics(action: "search", query: "subscription revenue")
analytics(action: "schema", id: "revenue-entity-id")
analytics(action: "schema", id: "customers-entity-id")
analytics(action: "query",
  datasets: { revenue: "revenue-entity-id", customers: "customers-entity-id" },
  sql: "SELECT c.industry, SUM(r.net_revenue_usd) AS revenue FROM revenue r JOIN customers c ON r.customer_id = c.customer_id GROUP BY 1")

Single-file questions do not need a datasets map: pass id, and refer to the table as dataset in your SQL.
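To see how aliases become table names, the sketch below loads each dataset into an in-memory SQLite database under its alias and runs the join from the example above unchanged. SQLite is a stand-in engine and query_with_aliases is a hypothetical helper; Orkai's engine and the exact shape of the datasets map may differ. The single-file case is the same call with one alias, dataset.

```python
import sqlite3

Dataset = tuple[list[str], list[tuple]]  # (column names, rows)


def query_with_aliases(datasets: dict[str, Dataset], sql: str) -> list[tuple]:
    """Create one table per alias in an in-memory database, then run the SQL.

    Aliases and column names are quoted as identifiers; this sketch is not
    hardened against hostile input.
    """
    con = sqlite3.connect(":memory:")
    try:
        for alias, (columns, rows) in datasets.items():
            cols = ", ".join(f'"{c}"' for c in columns)
            marks = ", ".join("?" for _ in columns)
            con.execute(f'CREATE TABLE "{alias}" ({cols})')
            con.executemany(f'INSERT INTO "{alias}" VALUES ({marks})', rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()


# Made-up rows standing in for the two indexed exports.
revenue = (["customer_id", "net_revenue_usd"], [("c1", 500.0), ("c2", 250.0), ("c1", 100.0)])
customers = (["customer_id", "industry"], [("c1", "Retail"), ("c2", "Healthcare")])

join_sql = (
    "SELECT c.industry, SUM(r.net_revenue_usd) AS revenue "
    "FROM revenue r JOIN customers c ON r.customer_id = c.customer_id GROUP BY 1"
)
by_industry = query_with_aliases({"revenue": revenue, "customers": customers}, join_sql)

# Single-file case: one alias named dataset.
row_count = query_with_aliases({"dataset": customers}, "SELECT COUNT(*) FROM dataset")
```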

Validate spreadsheet structure

Need to check that exports follow a fixed layout (required columns, naming rules, no empty headers)? Use Review in project mode on your CSV folder — same standards-based checking as code, pointed at data files.
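As an illustration of the kind of rules such a check can enforce, here is a minimal header check. The snake_case naming rule, the required-column set, and check_layout are assumptions for this example; how Review's standards for data files are actually written is not covered here.

```python
import csv
import io
import re

# Example naming rule (an assumption): lowercase snake_case column names.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


def check_layout(text: str, required: set[str]) -> list[str]:
    """Check a CSV header against example layout standards; return findings."""
    header = next(csv.reader(io.StringIO(text)), [])
    findings = []
    for i, name in enumerate(header, start=1):
        if not name.strip():
            findings.append(f"column {i}: empty header")
        elif not SNAKE_CASE.fullmatch(name):
            findings.append(f"column {i}: '{name}' is not snake_case")
    # Sorted so findings are stable from run to run.
    missing = sorted(required - set(header))
    if missing:
        findings.append("missing required columns: " + ", ".join(missing))
    return findings
```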