Files
Understanding the file lifecycle from upload to searchable content
What are Files?
Files are the core content in George AI. Each file belongs to a Library and goes through automated processing to extract text, generate embeddings, and make content searchable.
Files can be added manually via upload or automatically through Crawlers that collect documents from external sources like SharePoint, file shares, or email.
Manual Upload
Upload files directly through the web interface into a Library
Automated Crawling
Configure Crawlers to automatically collect files from external systems
File Lifecycle
Every file in George AI goes through a processing pipeline (validation, text and image extraction, and embedding generation) that makes it searchable and usable by AI assistants.
Processing Can Fail
Files can fail at Validation (unsupported format), Extraction (corrupted file), or Embedding (timeout). You can retry processing via the file menu.
File Processing Status
Files have three status indicators that track their progress:
| Status Type | Values | Description |
|---|---|---|
| Processing Status | none, pending, validating, extracting, embedding, completed, failed | Overall processing state through the entire pipeline |
| Extraction Status | none, pending, running, completed, failed | Text and image extraction stage |
| Embedding Status | none, pending, running, completed, failed | Vector embedding generation stage |
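The Processing Status values can be read as an ordered pipeline. The Python sketch below models that reading as a small state machine. It is an illustration under stated assumptions, not George AI's implementation. The names ProcessingStatus and can_transition are hypothetical. The transition rules are inferred from the table and the "Processing Can Fail" note: a file moves one stage forward at a time, it can fail only during validation, extraction, or embedding, and a retry returns it to pending.

```python
from enum import Enum


class ProcessingStatus(Enum):
    # Values taken from the Processing Status row of the table above.
    NONE = "none"
    PENDING = "pending"
    VALIDATING = "validating"
    EXTRACTING = "extracting"
    EMBEDDING = "embedding"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed happy-path order of the pipeline stages.
PIPELINE = [
    ProcessingStatus.NONE,
    ProcessingStatus.PENDING,
    ProcessingStatus.VALIDATING,
    ProcessingStatus.EXTRACTING,
    ProcessingStatus.EMBEDDING,
    ProcessingStatus.COMPLETED,
]

# Stages in which work is running and can therefore fail (assumption).
ACTIVE = {
    ProcessingStatus.VALIDATING,
    ProcessingStatus.EXTRACTING,
    ProcessingStatus.EMBEDDING,
}


def can_transition(current: ProcessingStatus, target: ProcessingStatus) -> bool:
    """Return True if moving from `current` to `target` fits the assumed pipeline."""
    if target is ProcessingStatus.FAILED:
        return current in ACTIVE
    if current in (ProcessingStatus.FAILED, ProcessingStatus.COMPLETED):
        # Retrying or reprocessing from the file menu re-queues the file (assumption).
        return target is ProcessingStatus.PENDING
    if current in PIPELINE and target in PIPELINE:
        # Otherwise only a single step forward is allowed.
        return PIPELINE.index(target) == PIPELINE.index(current) + 1
    return False
```

For example, `can_transition(ProcessingStatus.EMBEDDING, ProcessingStatus.FAILED)` is true because embedding can time out. Skipping straight from pending to embedding is rejected.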
Status Badges in the UI
These badges appear in the file list and indicate processing completion times or errors.
File Metadata
Each file stores metadata that can be used for filtering, sorting, and enrichment:
| Property | Description | Source |
|---|---|---|
| name | File name with extension | From upload or crawler |
| mimeType | File type (e.g., application/pdf, image/png) | Detected automatically |
| size | File size in bytes | Actual file size |
| originUri | Original location (file path, SharePoint URL, etc.) | From upload or crawler |
| originModificationDate | When the file was last modified at its source | From file system or crawler |
| uploadedAt | When the file was added to George AI | Set at creation time |
| createdAt | When the file record was created in the database | Set at creation time |
| archivedAt | When the file was archived (if applicable) | Set when file is archived |
| taskCount | Number of processing tasks associated with this file | Counted from processing queue |
| chunksCount | Number of vector embedding chunks generated | From embedding process |
Using Metadata in Lists
You can create List fields with sourceType: file_property to display file metadata (name, size, modified date, source) without AI processing.
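As a hypothetical sketch, a file record with the properties above, and the lookup that a file_property List field could perform, might look like this in Python. FileRecord, FILE_PROPERTIES, and resolve_file_property are illustrative names, not part of George AI's API. The sketch only shows that such a field reads a stored value instead of calling a model.

```python
from datetime import datetime
from typing import Optional, TypedDict


class FileRecord(TypedDict):
    # Hypothetical shape of a file's metadata; keys mirror the table above.
    name: str
    mimeType: str
    size: int  # bytes
    originUri: Optional[str]
    originModificationDate: Optional[datetime]
    uploadedAt: datetime
    createdAt: datetime
    archivedAt: Optional[datetime]
    taskCount: int
    chunksCount: int


# Property names a file_property field could reference (derived from FileRecord).
FILE_PROPERTIES = frozenset(FileRecord.__annotations__)


def resolve_file_property(file: FileRecord, property_name: str) -> object:
    """Return a stored metadata value directly; no AI processing is involved."""
    if property_name not in FILE_PROPERTIES:
        raise KeyError(f"unknown file property: {property_name}")
    return file[property_name]
```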
File Actions
You can perform several actions on files through the file menu:
Reprocess (Re-extract)
Triggers a new extraction task to re-extract text and images from the file
Use when:
- Extraction failed or timed out
- Library extraction settings changed (e.g., updated OCR prompt)
- File content was updated at the source
Re-embed
Triggers a new embedding task to regenerate vector embeddings
Use when:
- Embedding failed or timed out
- Library embedding model changed
- Extraction was re-run with new content
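The "Use when" lists above can be condensed into a small decision helper. This is a hypothetical sketch: FileState and plan_actions are illustrative names. It also assumes that Re-embed should follow every Reprocess, based on the "Extraction was re-run with new content" bullet. George AI may already trigger that step automatically.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileState:
    # Hypothetical flags describing why a file might need attention.
    extraction_failed: bool = False
    extraction_settings_changed: bool = False  # e.g. updated OCR prompt
    source_updated: bool = False  # file content changed at the source
    embedding_failed: bool = False
    embedding_model_changed: bool = False


def plan_actions(state: FileState) -> List[str]:
    """Return the file-menu actions to run, in order, per the 'Use when' lists."""
    actions: List[str] = []
    if (
        state.extraction_failed
        or state.extraction_settings_changed
        or state.source_updated
    ):
        actions.append("reprocess")
    # New extracted content needs fresh embeddings too (assumption, see above).
    if actions or state.embedding_failed or state.embedding_model_changed:
        actions.append("re-embed")
    return actions
```

A changed embedding model alone yields only `["re-embed"]`. An updated source file yields `["reprocess", "re-embed"]`.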
View Info
Shows detailed file metadata and processing information
Displays:
- File size and format
- Processing status
- Number of chunks generated
- Number of processing tasks
- Crawler source (if applicable)
- Origin modification date
View Extraction
Shows the extracted markdown content from the file
Use for:
- Verifying extraction quality
- Debugging enrichment issues
- Understanding what content AI assistants see
Supported File Types
George AI supports a wide range of file formats for automatic text extraction:
Documents
- PDF (.pdf)
- Word (.docx, .doc)
- PowerPoint (.pptx, .ppt)
- Excel (.xlsx, .xls)
- Text (.txt, .md, .csv)
- HTML (.html, .htm)
Images (with OCR)
- JPEG (.jpg, .jpeg)
- PNG (.png)
- TIFF (.tiff, .tif)
- BMP (.bmp)
- GIF (.gif)
Videos
- MP4 (.mp4)
- WebM (.webm)
- AVI (.avi)
- MOV (.mov)
- MKV (.mkv)
Videos are processed with audio transcription and visual content extraction
Archives
- ZIP (.zip)
- 7-Zip (.7z)
- TAR (.tar, .tar.gz)
Archives are unpacked and the files inside are processed individually
Unsupported Formats
If a file format is not supported, the file is marked with an "Unsupported Format" badge and no extraction is performed. The file metadata is still stored and searchable.
Need Additional Format Support?
For paying customers, we can add support for any file format. Alternatively, you can build your own automation workflows that convert files into a supported format and ingest them into George AI. Contact us to discuss your specific requirements.
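To check before an upload whether files fall into one of the groups above, a simple extension lookup is enough for a first pass. This sketch is an approximation. George AI detects mimeType automatically, so its own decision may differ from an extension match, and classify is an illustrative name. Compound suffixes such as .tar.gz are checked before the last suffix.

```python
from pathlib import PurePath

# Extension groups copied from the lists above.
SUPPORTED = {
    "document": {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls",
                 ".txt", ".md", ".csv", ".html", ".htm"},
    "image": {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".gif"},
    "video": {".mp4", ".webm", ".avi", ".mov", ".mkv"},
    "archive": {".zip", ".7z", ".tar", ".tar.gz"},
}


def classify(filename: str) -> str:
    """Return the format group for a file name, or 'unsupported'."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if not suffixes:
        return "unsupported"
    # Try the last two suffixes joined first (e.g. '.tar.gz'), then the last one.
    for ext in ("".join(suffixes[-2:]), suffixes[-1]):
        for group, extensions in SUPPORTED.items():
            if ext in extensions:
                return group
    return "unsupported"
```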
Google Drive Upload (Optional)
If you store documents in Google Drive, you can upload them directly using a file picker with search, batch selection, and automatic conversion of Google Docs, Sheets, and Slides to PDF.
Search Across All Files
Search your entire Google Drive by file name with instant results.
Batch Selection
Select multiple files at once with checkboxes.
Folder Navigation
Browse your Drive with folder navigation and breadcrumbs.
Auto PDF Conversion
Google Docs, Sheets, and Slides automatically converted to PDF.
- Open Library → Navigate to the Library where you want to upload files
- Click the "Upload from Google Drive" button in the Files section
- Sign in to Google (first time only) → Grant George AI read-only access to your Drive
- Browse or Search → Navigate folders or use search to find files
- Select Files → Check the boxes next to files you want to upload
- Click "Upload" → Selected files are downloaded and processed automatically
View Modes
Switch between list view (detailed) and grid view (visual icons) using the toggle buttons.
When uploading Google Workspace files, George AI automatically exports them as PDF to ensure compatibility:
| Google File Type | Converted To | Result |
|---|---|---|
| Google Docs | PDF | Text, formatting, images preserved |
| Google Sheets | PDF | Tables, charts, formatting preserved |
| Google Slides | PDF | Slides, images, layouts preserved |
| Other files (PDF, JPG, etc.) | No conversion | Downloaded as-is in original format |
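The conversion rule in the table can be expressed as a small planning function. In the Google Drive API v3, native Workspace files have MIME types such as application/vnd.google-apps.document. They can only be downloaded through files.export (here with application/pdf as the target), while other files are downloaded as-is with files.get and alt=media. The sketch below only decides which path applies and what the resulting file name is. plan_download and DownloadPlan are hypothetical names, and the sketch neither calls the Drive API nor describes George AI's actual integration.

```python
from typing import NamedTuple

# MIME types Google Drive uses for native Workspace files.
GOOGLE_WORKSPACE_TYPES = {
    "application/vnd.google-apps.document": "Google Docs",
    "application/vnd.google-apps.spreadsheet": "Google Sheets",
    "application/vnd.google-apps.presentation": "Google Slides",
}

PDF_MIME = "application/pdf"


class DownloadPlan(NamedTuple):
    export_as_pdf: bool
    target_mime_type: str
    file_name: str


def plan_download(name: str, mime_type: str) -> DownloadPlan:
    """Decide how a picked Drive file is fetched, following the table above."""
    if mime_type in GOOGLE_WORKSPACE_TYPES:
        # Workspace files are exported to PDF and '.pdf' is appended to the name.
        return DownloadPlan(True, PDF_MIME, f"{name}.pdf")
    # Everything else is downloaded unchanged in its original format.
    return DownloadPlan(False, mime_type, name)
```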
Files uploaded from Google Drive preserve important metadata:
- File name: Original Google Drive file name (with .pdf extension added if converted)
- Modified date: Last modification date from Google Drive
- Origin URI: Link back to the original file in Google Drive (e.g., https://drive.google.com/file/d/...)
- File size: Actual file size from Google Drive
Traceability
The Origin URI allows you to trace back to the source file in Google Drive for verification or updates.
The Google Drive picker handles large drives efficiently:
- 50 files per page: Fast loading even with thousands of files
- Forward/backward navigation: Browse through pages with previous/next buttons
- Search resets pagination: Searching automatically returns to the first page of results
- Folder navigation resets pagination: Entering a folder starts from page 1
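The pagination rules above can be summarized as a few state transitions. This is a hypothetical sketch with illustrative names (PickerState, search, open_folder, next_page, previous_page, page_slice). It uses 1-based page numbers for clarity. The Google Drive API's files.list, by contrast, pages through results with pageSize and opaque pageToken values, so a real picker would keep tokens rather than compute offsets.

```python
from dataclasses import dataclass, replace
from typing import Optional

PAGE_SIZE = 50  # files per page, as stated above


@dataclass(frozen=True)
class PickerState:
    folder_id: Optional[str] = None  # hypothetical; None means the Drive root
    query: str = ""
    page: int = 1  # 1-based page number


def search(state: PickerState, query: str) -> PickerState:
    # A new search always starts again at the first page.
    return replace(state, query=query, page=1)


def open_folder(state: PickerState, folder_id: str) -> PickerState:
    # Entering a folder also starts from page 1.
    return replace(state, folder_id=folder_id, page=1)


def next_page(state: PickerState, total_items: int) -> PickerState:
    last_page = max(1, -(-total_items // PAGE_SIZE))  # ceiling division
    return replace(state, page=min(state.page + 1, last_page))


def previous_page(state: PickerState) -> PickerState:
    return replace(state, page=max(1, state.page - 1))


def page_slice(state: PickerState) -> slice:
    """Index range of the items shown on the current page."""
    start = (state.page - 1) * PAGE_SIZE
    return slice(start, start + PAGE_SIZE)
```

With 120 files, next_page stops at page 3, and calling search or open_folder from any page brings the picker back to page 1.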
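One-Time Authentication
You only need to sign in to Google and grant read-only access the first time you use the Google Drive picker.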