Files

Understanding the file lifecycle from upload to searchable content

What are Files?

Files are the core content in George AI. Each file belongs to a Library and goes through automated processing to extract text, generate embeddings, and make content searchable.

Files can be added manually via upload or automatically through Crawlers that collect documents from external sources like SharePoint, file shares, or email.

Manual Upload

Upload files directly through the web interface into a Library

Automated Crawling

Configure Crawlers to automatically collect files from external systems

File Lifecycle

Every file in George AI goes through a processing pipeline to make it searchable and usable for AI assistants:

  • Upload or Crawl

    File is added to the Library (manually uploaded or collected by a Crawler)

  • Validation

    File format and integrity are checked

  • Extraction

    Text and images are extracted from the document (supports PDF, Office docs, images with OCR, etc.)

  • Embedding

    Text is split into chunks and converted to vector embeddings for semantic search

  • Completed

    File is now searchable and available for AI assistants
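To make the Embedding step more concrete, here is a minimal sketch of splitting extracted text into overlapping chunks before each chunk is embedded. The function name, the character-based sizes, and the overlap are assumptions for illustration; the chunking strategy George AI actually uses is not described here.

```python
from typing import List


def split_into_chunks(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which is a common reason to chunk this way before semantic search.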

Processing Can Fail

Files can fail at Validation (unsupported format), Extraction (corrupted file), or Embedding (timeout). You can retry processing via the file menu.
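The stages and failure points above can be pictured as a simple sequential pipeline. The sketch below is an illustration only: `Stage`, `process_file`, and the handler functions are hypothetical names, not part of George AI's API. It shows a file advancing stage by stage and stopping at the first stage that fails.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Stage(Enum):
    VALIDATION = "validating"
    EXTRACTION = "extracting"
    EMBEDDING = "embedding"


# Hypothetical handler type: takes a file name and raises on failure.
Handler = Callable[[str], None]


def process_file(name: str, handlers: Dict[Stage, Handler]) -> Tuple[str, str]:
    """Run the stages in order and return (status, detail)."""
    for stage in Stage:  # Enum iteration follows definition order
        try:
            handlers[stage](name)
        except Exception as exc:
            return ("failed", f"{stage.value}: {exc}")
    return ("completed", "")


def reject_unsupported(name: str) -> None:
    """Toy validation step: accept only a few extensions."""
    if not name.lower().endswith((".pdf", ".docx", ".txt")):
        raise ValueError("unsupported format")


def noop(name: str) -> None:
    """Placeholder for extraction and embedding in this sketch."""


handlers = {
    Stage.VALIDATION: reject_unsupported,
    Stage.EXTRACTION: noop,
    Stage.EMBEDDING: noop,
}
```

Retrying from the file menu corresponds to calling `process_file` again once the cause of the failure has been addressed.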

File Processing Status

Files have three status indicators that track their progress:

| Status Type | Values | Description |
|---|---|---|
| Processing Status | none, pending, validating, extracting, embedding, completed, failed | Overall processing state through the entire pipeline |
| Extraction Status | none, pending, running, completed, failed | Text and image extraction stage |
| Embedding Status | none, pending, running, completed, failed | Vector embedding generation stage |

Status Badges in the UI

  • Extraction 2025-01-15
  • Embedding 2025-01-15
  • Unsupported Format
  • Legacy File

These badges appear in the file list and indicate processing completion times or errors.
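As an illustration of how the three status indicators and the dated badges relate, the sketch below models them in a small data class. The class, its field names, and the completion-date fields are assumptions made for this example; only the status values and the badge text format come from this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

PROCESSING_VALUES = {
    "none", "pending", "validating", "extracting",
    "embedding", "completed", "failed",
}
STAGE_VALUES = {"none", "pending", "running", "completed", "failed"}


@dataclass
class FileStatus:
    processing: str = "none"
    extraction: str = "none"
    embedding: str = "none"
    extraction_finished: Optional[date] = None  # hypothetical field
    embedding_finished: Optional[date] = None   # hypothetical field

    def __post_init__(self) -> None:
        # Reject values that are not listed for each status type.
        if self.processing not in PROCESSING_VALUES:
            raise ValueError(f"unknown processing status: {self.processing}")
        for value in (self.extraction, self.embedding):
            if value not in STAGE_VALUES:
                raise ValueError(f"unknown stage status: {value}")

    def badges(self) -> List[str]:
        """Dated badges for stages that have completed."""
        labels = []
        if self.extraction == "completed" and self.extraction_finished:
            labels.append(f"Extraction {self.extraction_finished.isoformat()}")
        if self.embedding == "completed" and self.embedding_finished:
            labels.append(f"Embedding {self.embedding_finished.isoformat()}")
        return labels
```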

File Metadata

Each file stores metadata that can be used for filtering, sorting, and enrichment:

| Property | Description | Source |
|---|---|---|
| name | File name with extension | From upload or crawler |
| mimeType | File type (e.g., application/pdf, image/png) | Detected automatically |
| size | File size in bytes | Actual file size |
| originUri | Original location (file path, SharePoint URL, etc.) | From upload or crawler |
| originModificationDate | When the file was last modified at its source | From file system or crawler |
| uploadedAt | When the file was added to George AI | Set at creation time |
| createdAt | When the file record was created in the database | Set at creation time |
| archivedAt | When the file was archived (if applicable) | Set when file is archived |
| taskCount | Number of processing tasks associated with this file | Counted from processing queue |
| chunksCount | Number of vector embedding chunks generated | From embedding process |

Using Metadata in Lists

You can create List fields with sourceType: file_property to display file metadata (name, size, modified date, source) without AI processing.
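To show how file metadata can drive filtering and sorting without any AI processing, here is a small sketch. The `FileRecord` class and the helper function are hypothetical; the field names mirror the property names listed above (hence the camelCase), and only a subset of the properties is included.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FileRecord:
    # Field names follow the metadata property names, not Python naming style.
    name: str
    mimeType: str
    size: int
    originUri: str
    originModificationDate: Optional[datetime] = None
    archivedAt: Optional[datetime] = None
    chunksCount: int = 0


def active_pdfs_by_size(files: List[FileRecord]) -> List[FileRecord]:
    """Keep non-archived PDFs and sort them largest first."""
    pdfs = [
        f for f in files
        if f.mimeType == "application/pdf" and f.archivedAt is None
    ]
    return sorted(pdfs, key=lambda f: f.size, reverse=True)
```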

File Actions

You can perform several actions on files through the file menu:

Reprocess (Re-extract)

Triggers a new extraction task to re-extract text and images from the file

Use when:

  • Extraction failed or timed out
  • Library extraction settings changed (e.g., updated OCR prompt)
  • File content was updated at the source

Re-embed

Triggers a new embedding task to regenerate vector embeddings

Use when:

  • Embedding failed or timed out
  • Library embedding model changed
  • Extraction was re-run with new content
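The "Use when" lists for Reprocess and Re-embed can be summarised as a small decision helper. This is a sketch under assumptions: the function name, its parameters, and the choice to check extraction first (since new extraction output has to be embedded again anyway) are illustrative and not George AI behaviour.

```python
from typing import Optional


def suggest_action(
    extraction_status: str,
    embedding_status: str,
    extraction_settings_changed: bool = False,
    source_updated: bool = False,
    embedding_model_changed: bool = False,
) -> Optional[str]:
    """Return "reprocess", "re-embed", or None when no action is needed."""
    # Extraction problems or changed inputs call for a fresh extraction.
    if extraction_status == "failed" or extraction_settings_changed or source_updated:
        return "reprocess"
    # Otherwise only the embeddings need to be regenerated.
    if embedding_status == "failed" or embedding_model_changed:
        return "re-embed"
    return None
```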

View Info

Shows detailed file metadata and processing information

Displays:

  • File size and format
  • Processing status
  • Number of chunks generated
  • Number of processing tasks
  • Crawler source (if applicable)
  • Origin modification date

View Extraction

Shows the extracted markdown content from the file

Use for:

  • Verifying extraction quality
  • Debugging enrichment issues
  • Understanding what content AI assistants see

Supported File Types

George AI supports a wide range of file formats for automatic text extraction:

Documents

  • PDF (.pdf)
  • Word (.docx, .doc)
  • PowerPoint (.pptx, .ppt)
  • Excel (.xlsx, .xls)
  • Text (.txt, .md, .csv)
  • HTML (.html, .htm)

Images (with OCR)

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • TIFF (.tiff, .tif)
  • BMP (.bmp)
  • GIF (.gif)

Videos

  • MP4 (.mp4)
  • WebM (.webm)
  • AVI (.avi)
  • MOV (.mov)
  • MKV (.mkv)

Videos are processed with audio transcription and visual content extraction

Archives

  • ZIP (.zip)
  • 7-Zip (.7z)
  • TAR (.tar, .tar.gz)

Archives are extracted and files inside are processed individually
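The format lists above can be expressed as a lookup from file extension to category. The sketch below is a simplification for illustration: George AI detects the MIME type automatically, whereas this hypothetical `categorize` helper only looks at the file name.

```python
from typing import Optional

CATEGORIES = {
    "document": (
        ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls",
        ".txt", ".md", ".csv", ".html", ".htm",
    ),
    "image": (".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".gif"),
    "video": (".mp4", ".webm", ".avi", ".mov", ".mkv"),
    "archive": (".zip", ".7z", ".tar", ".tar.gz"),
}


def categorize(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if the format is not listed."""
    lowered = filename.lower()
    for category, extensions in CATEGORIES.items():
        if lowered.endswith(extensions):  # str.endswith accepts a tuple
            return category
    return None
```

A result of `None` corresponds to the "Unsupported Format" case described in the next section.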

Unsupported Formats

If a file format is not supported, the file is marked with an "Unsupported Format" badge and no extraction is performed. The file metadata is still stored and searchable.

Need Additional Format Support?

For paying customers, we can add support for any file format. Alternatively, you can build your own automation workflows to transform files and ingest them into George AI. Contact us to discuss your specific requirements.

Google Drive Upload (Optional)

If you store documents in Google Drive, you can upload them directly with a file picker that includes search, batch selection, and automatic conversion of Google Docs to PDF.

Features & Capabilities

Search Across All Files

Search your entire Google Drive by file name with instant results.

Batch Selection

Select multiple files at once with checkboxes.

Folder Navigation

Browse your Drive with folder navigation and breadcrumbs.

Auto PDF Conversion

Google Docs, Sheets, and Slides are automatically converted to PDF.

How to Upload from Google Drive

  1. Open Library → Navigate to the Library where you want to upload files
  2. Click "Upload from Google Drive" → The button is in the Files section
  3. Sign in to Google (first time only) → Grant George AI read-only access to your Drive
  4. Browse or Search → Navigate folders or use search to find files
  5. Select Files → Check the boxes next to the files you want to upload
  6. Click "Upload" → Selected files are downloaded and processed automatically

View Modes

Switch between list view (detailed) and grid view (visual icons) using the toggle buttons.

Automatic PDF Conversion

When uploading Google Workspace files, George AI automatically exports them as PDF to ensure compatibility:

| Google File Type | Converted To | Result |
|---|---|---|
| Google Docs | PDF | Text, formatting, images preserved |
| Google Sheets | PDF | Tables, charts, formatting preserved |
| Google Slides | PDF | Slides, images, layouts preserved |
| Other files (PDF, JPG, etc.) | No conversion | Downloaded as-is in original format |

Automatic Detection: The system automatically detects file type and chooses the right conversion method; no configuration needed.

Metadata Preservation

Files uploaded from Google Drive preserve important metadata:

  • File name: Original Google Drive file name (with .pdf extension added if converted)
  • Modified date: Last modification date from Google Drive
  • Origin URI: Link back to the original file in Google Drive (e.g., https://drive.google.com/file/d/...)
  • File size: Actual file size from Google Drive
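The conversion and naming rules above can be sketched as a single decision per file. The MIME types below are the ones Google Drive uses for Docs, Sheets, and Slides; the `plan_download` function and its return shape are hypothetical and only illustrate the logic, not George AI's implementation.

```python
from typing import Optional, Tuple

# Google Workspace MIME types that are exported as PDF.
EXPORT_AS_PDF = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}


def plan_download(name: str, mime_type: str) -> Tuple[str, Optional[str]]:
    """Return (stored file name, export MIME type or None for an as-is download)."""
    if mime_type in EXPORT_AS_PDF:
        # Converted files get a .pdf extension unless the name already has one.
        stored = name if name.lower().endswith(".pdf") else f"{name}.pdf"
        return stored, "application/pdf"
    return name, None
```

For example, a Google Doc named `Q3 Plan` would be stored as `Q3 Plan.pdf`, while `scan.jpg` keeps its name and is downloaded unchanged.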

Traceability

The Origin URI allows you to trace back to the source file in Google Drive for verification or updates.

Pagination & Large Drives

The Google Drive picker handles large drives efficiently:

  • 50 files per page: Fast loading even with thousands of files
  • Forward/backward navigation: Browse through pages with previous/next buttons
  • Search resets pagination: Searching automatically returns to first page of results
  • Folder navigation resets pagination: Entering a folder starts from page 1
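The pagination rules above amount to a small piece of state that resets whenever the search query or folder changes. The sketch below is illustrative: `PickerState`, its fields, and the offset-based `page_slice` are assumptions. The Google Drive API itself pages with opaque page tokens rather than offsets, so a real picker would track tokens instead.

```python
from dataclasses import dataclass
from typing import List, Sequence

PAGE_SIZE = 50  # files per page, as stated above


@dataclass
class PickerState:
    folder_id: str = "root"  # hypothetical identifier
    query: str = ""
    page: int = 1

    def search(self, query: str) -> None:
        self.query = query
        self.page = 1  # searching returns to the first page

    def open_folder(self, folder_id: str) -> None:
        self.folder_id = folder_id
        self.page = 1  # entering a folder starts from page 1

    def next_page(self) -> None:
        self.page += 1

    def previous_page(self) -> None:
        self.page = max(1, self.page - 1)


def page_slice(items: Sequence[str], page: int) -> List[str]:
    """Return the items shown on a 1-based page number."""
    start = (page - 1) * PAGE_SIZE
    return list(items[start:start + PAGE_SIZE])
```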

One-Time Authentication

You only need to grant George AI access to Google Drive once per browser. The access token is stored in your browser's local storage and expires after 1 hour; sign in again when it has expired.
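The one-hour token lifetime can be checked with a simple timestamp comparison. This is a sketch under assumptions: the helper name and the idea of storing the issue time alongside the token are illustrative, not a description of how George AI stores or validates tokens.

```python
import time
from typing import Optional

TOKEN_LIFETIME_SECONDS = 3600  # 1 hour, as stated above


def token_is_valid(issued_at: float, now: Optional[float] = None) -> bool:
    """Return True while the stored token is less than one hour old."""
    current = time.time() if now is None else now
    return 0 <= current - issued_at < TOKEN_LIFETIME_SECONDS
```

When the check returns `False`, the picker would prompt for sign-in again before listing files.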
