Automatic

CSV File Splitting

Handle massive spreadsheets with automatic row-by-row splitting for perfect search results

What is CSV File Splitting?

When you upload large CSV or Excel files, George AI automatically splits them into individual markdown files, one per row. This ensures that when you search, you find exactly the row you're looking for, not an entire massive file.

No configuration needed. Upload a 100,000-row product catalog, and George AI handles everything automatically.

  • Maximum Tested: 732K Rows
    Largest successfully processed file
  • Memory Usage: ~1 KB
    Constant, regardless of file size
  • Search Precision: 1 Row
    Each search result = 1 record

How It Works

  • Upload CSV
    Any size file
  • Auto-Split
    One file per row
  • Embed
    Vector search enabled
  • Search
    Find exact rows

Example: Product Catalog (10,000 Products)

Input:

SKU,Name,Price,Stock
P-001,Widget A,29.99,150
P-002,Widget B,39.99,75
...
P-10000,Widget Z,19.99,200

Output (File Structure):

products.md (summary)
parts/
  0/ (rows 1-100)
    1.md, 2.md, ..., 100.md
  1/ (rows 101-200)
  2/ (rows 201-300)
  ...

Bucketed Storage: Files are organized into folders of 100 for efficient access.

Benefits

Semantic Search Precision

Each row becomes one semantic chunk. When you search "red t-shirt size M", you get that exact product row, not a 50,000-row file.

Memory Efficiency

The streaming architecture processes files row by row with constant ~1 KB memory usage, regardless of file size, and handles 700K+ rows without performance degradation.
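To illustrate why row-by-row streaming keeps memory use independent of file size, here is a minimal Python sketch. It is not George AI's actual implementation; the function name `iter_rows` is hypothetical, and the sketch only shows the general technique of holding one row in memory at a time.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_rows(source: TextIO) -> Iterator[tuple[int, dict[str, str]]]:
    """Yield (row_number, row) pairs one at a time; row numbers start at 1.

    Hypothetical helper: only the current row is held in memory,
    so the file is never loaded as a whole.
    """
    reader = csv.DictReader(source)  # the first line is used as column names
    for row_number, row in enumerate(reader, start=1):
        yield row_number, row


# Usage with a small in-memory sample; a real file opened with
# open(path, newline="") would work the same way.
sample = io.StringIO(
    "SKU,Name,Price,Stock\n"
    "P-001,Widget A,29.99,150\n"
    "P-002,Widget B,39.99,75\n"
)
for number, row in iter_rows(sample):
    print(number, row["SKU"])
```

Because the generator never builds a list of all rows, memory use depends on the size of a single row rather than on the row count.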

Fast Pagination

Bucketed storage (100 files per directory) enables fast UI navigation. Metadata caching makes browsing split files instant.

Enrichment-Ready

Each row is a list item. Add enrichment fields to extract additional data (e.g., "Product Category" from description). Perfect for data cleaning.

Viewing Split Files

Markdown File Selector

After upload, navigate to the file in your library:

  1. Open the library containing your CSV file
  2. Click on the file name
  3. Use the Markdown File Selector dropdown to choose which row to view
  4. The dropdown lists "Summary", "Row 1", "Row 2", etc.

Pagination Controls

For files with many rows, pagination controls appear automatically:

  • Navigate between rows using previous/next buttons
  • Jump to specific row numbers
  • View summary file to see total row count and column names

Configuration

Automatic - No configuration needed!

CSV file splitting is enabled by default for all libraries. Just upload and go.

Advanced: Library Settings

For power users, the setting is controlled in library configuration:

Setting          Default Value   Description
splitByCsvRows   Enabled         Automatically split CSV/Excel files by rows

Technical Details

File Storage Structure
Storage Layout:
/storage/libraries/{libraryId}/files/{fileId}/
  main.md                  # Summary file
  parts/0/1.md             # Row 1
  parts/0/2.md             # Row 2
  ...
  parts/0/100.md           # Row 100
  parts/1/101.md           # Row 101 (new bucket)
  ...
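The bucket for a given row can be derived from the layout above with integer division. The following Python sketch is an assumption based only on the listed paths (row 100 in bucket 0, row 101 in bucket 1); `part_path` and `BUCKET_SIZE` are illustrative names, not part of George AI.

```python
from pathlib import PurePosixPath

BUCKET_SIZE = 100  # files per bucket directory, as shown in the layout above


def part_path(row_number: int) -> PurePosixPath:
    """Return the relative path of a row's markdown file (hypothetical helper)."""
    if row_number < 1:
        raise ValueError("row numbers start at 1")
    # Rows 1-100 map to bucket 0, rows 101-200 to bucket 1, and so on.
    bucket = (row_number - 1) // BUCKET_SIZE
    return PurePosixPath("parts") / str(bucket) / f"{row_number}.md"


print(part_path(1))    # parts/0/1.md
print(part_path(100))  # parts/0/100.md
print(part_path(101))  # parts/1/101.md
```

Subtracting 1 before dividing keeps row 100 in the first bucket, which matches the paths listed in the storage layout.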

Markdown Format per Row

Each row becomes a structured markdown file:

# Row 1
**SKU:** P-001
**Name:** Widget A
**Price:** 29.99
**Stock:** 150
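A row in this format can be produced with a few lines of Python. This is a sketch that reproduces the example above, assuming each column becomes one bold label followed by its value; `row_to_markdown` is a hypothetical name, and the real renderer may handle escaping or empty cells differently.

```python
def row_to_markdown(row_number: int, row: dict[str, str]) -> str:
    """Render one CSV row as a markdown document (illustrative sketch)."""
    lines = [f"# Row {row_number}"]
    # One "**Column:** value" line per column, in the original column order.
    lines.extend(f"**{column}:** {value}" for column, value in row.items())
    return "\n".join(lines) + "\n"


print(row_to_markdown(1, {
    "SKU": "P-001",
    "Name": "Widget A",
    "Price": "29.99",
    "Stock": "150",
}))
```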

Embedding Strategy
  • One Chunk per Row: Each markdown file = one semantic chunk
  • Batch Processing: Embeddings generated in parallel batches for speed
  • Part Number Tracking: Each embedding stores its row number for retrieval
  • Summary Embedding: Main file (column headers + stats) also embedded
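The batching and part-number tracking described above might look roughly like the following Python sketch. Everything here is an assumption for illustration: `embed_batch` is a stand-in that returns a dummy vector instead of calling a real embedding model, and the names `batches` and `embed_all`, the batch size, and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator

Part = tuple[int, str]  # (row number, markdown text of that row)


def batches(parts: Iterable[Part], size: int) -> Iterator[list[Part]]:
    """Group parts into lists of at most `size` items."""
    iterator = iter(parts)
    while batch := list(islice(iterator, size)):
        yield batch


def embed_batch(batch: list[Part]) -> list[dict]:
    """Hypothetical stand-in for an embedding model call.

    Each record keeps the part (row) number next to its vector so a
    search hit can be traced back to the row file.
    """
    return [{"part_number": n, "vector": [float(len(text))]} for n, text in batch]


def embed_all(parts: Iterable[Part], size: int = 2, workers: int = 4) -> list[dict]:
    """Embed batches in parallel threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_batch, batches(parts, size))
        return [record for batch in results for record in batch]


records = embed_all([(1, "# Row 1"), (2, "# Row 2"), (3, "# Row 3")])
print([r["part_number"] for r in records])  # [1, 2, 3]
```

Storing the part number in every record is what lets one search result map back to exactly one row file.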

Common Use Cases

Product Catalogs

Upload supplier product lists (50,000+ products). Search finds exact SKUs. Enrich to extract missing data (category, brand). Export to an e-commerce platform via automations.

Inventory Lists

Process warehouse inventory spreadsheets. Search by location, product code, or description. Track stock levels across multiple warehouses.

Customer & Contact Lists

Import CRM exports. Search by name, company, or email. Enrich with additional data from web APIs. Clean and deduplicate records.

Transaction & Order Logs

Process order history CSVs (100K+ transactions). Search by order number, customer, or date. Analyze patterns with AI enrichments.
