codewithkyrian / chromadb-php
A PHP client for the Chroma Open Source Embedding Database
pkg:composer/codewithkyrian/chromadb-php
Requires
- php: ^8.1
- php-http/discovery: ^1.20
- psr/http-client: ^1.0
- psr/http-factory: ^1.1
Requires (Dev)
- guzzlehttp/guzzle: ^7.10
- mockery/mockery: ^1.6
- pestphp/pest: ^2.19
- symfony/process: ^6.4 || ^7.3 || ^8.0
- symfony/var-dumper: ^6.3
This package is auto-updated.
Last update: 2025-12-06 20:25:02 UTC
README
A framework-agnostic PHP library for interacting with the Chroma vector database.
Note: This package is framework-agnostic. If you use Laravel, check out chromadb-laravel for a tailored experience.
Introduction
Chroma is an open-source vector database designed to be fast, scalable, and reliable. ChromaDB PHP allows you to interact with Chroma servers seamlessly. It provides a fluent, type-safe API for managing collections, documents, and embeddings, making it easy to build LLM-powered applications in PHP.
Requirements
- PHP 8.1 or higher
- ChromaDB 1.0 or higher
Installation
composer require codewithkyrian/chromadb-php
Configuration & Setup
Running ChromaDB
You need a running ChromaDB instance.
Docker (Recommended):
docker run -p 8000:8000 chromadb/chroma
Chroma CLI:
chroma run --path /path/to/data
Connectivity
Connect to your Chroma server. The default connection is http://localhost:8000.
use Codewithkyrian\ChromaDB\ChromaDB;

// Basic Connection
$client = ChromaDB::local()->connect();

// Custom Host/Port
$client = ChromaDB::local()
    ->withHost('http://your-server-ip')
    ->withPort(8000)
    ->withTenant('my-tenant')
    ->withDatabase('production_db')
    ->connect();

// Chroma Cloud / Authentication
$client = ChromaDB::cloud('your-api-key')
    ->withTenant('tenant-id')
    ->connect();
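If the host and port differ between environments, one option is to resolve them before building the client. The sketch below is only an illustration: the helper name `resolveChromaConfig` and the `CHROMA_HOST` / `CHROMA_PORT` variable names are assumptions and not part of the library. Only the default of http://localhost:8000 comes from the text above.

```php
<?php

/**
 * Hypothetical helper (not part of chromadb-php): resolve connection
 * settings from an environment-style array, falling back to the
 * documented default of http://localhost:8000.
 *
 * @param array<string, string> $env
 * @return array{host: string, port: int}
 */
function resolveChromaConfig(array $env): array
{
    $host = $env['CHROMA_HOST'] ?? 'http://localhost'; // assumed variable name
    $port = (int) ($env['CHROMA_PORT'] ?? 8000);       // assumed variable name

    if ($port < 1 || $port > 65535) {
        throw new InvalidArgumentException("Invalid port: {$port}");
    }

    // Strip a trailing slash so host and port can be joined cleanly.
    return ['host' => rtrim($host, '/'), 'port' => $port];
}

$config = resolveChromaConfig(['CHROMA_PORT' => '9000']);
echo $config['host'] . ':' . $config['port'] . PHP_EOL; // http://localhost:9000
```

The resolved values could then be passed to `withHost()` and `withPort()` on the client builder.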
Embedding Functions
ChromaDB uses embedding functions to convert text into vectors. You can define which function a collection uses upon creation.
An embedding function is linked to a collection and used whenever you call add, update, upsert or query. If you add documents without embeddings, the function generates them automatically. If you query using text, it converts your query text into a vector for the search.
The library provides lightweight wrappers around popular embedding providers for ease of use:
- OpenAIEmbeddingFunction
- JinaEmbeddingFunction
- HuggingFaceEmbeddingServerFunction
- OllamaEmbeddingFunction
- MistralAIEmbeddingFunction
Example:
use Codewithkyrian\ChromaDB\Embeddings\OpenAIEmbeddingFunction;

$ef = new OpenAIEmbeddingFunction('your-openai-api-key');

$collection = $client->createCollection(
    name: 'knowledge-base',
    embeddingFunction: $ef
);
Custom Functions
You can create your own embedding function by implementing Codewithkyrian\ChromaDB\Embeddings\EmbeddingFunction.
use Codewithkyrian\ChromaDB\Embeddings\EmbeddingFunction;

$ef = new class implements EmbeddingFunction {
    public function generate(array $texts): array
    {
        // Call your model API here and return float[][]
        return [[0.1, 0.2, ...], ...];
    }
};
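For local experiments without a model API, a deterministic stand-in can be handy. The following is a minimal sketch, not a real embedding model: the class name `HashEmbeddingFunction` is hypothetical, and the vectors it produces carry no semantic meaning. To keep the snippet runnable on its own it does not declare `implements EmbeddingFunction`; with the library installed you would add that clause.

```php
<?php

/**
 * Toy embedding function for local experiments only: it hashes each
 * text into a fixed-size vector, so similar texts do NOT end up close
 * together. The generate() signature mirrors the interface shown above.
 */
final class HashEmbeddingFunction
{
    public function __construct(private int $dimensions = 8)
    {
    }

    /**
     * @param string[] $texts
     * @return float[][] one vector per input text
     */
    public function generate(array $texts): array
    {
        return array_map(function (string $text): array {
            $bytes = hash('sha256', $text, true); // 32 raw bytes
            $vector = [];
            for ($i = 0; $i < $this->dimensions; $i++) {
                // Map each byte (0..255) to a float in [0, 1].
                $vector[] = ord($bytes[$i % 32]) / 255.0;
            }
            return $vector;
        }, $texts);
    }
}

$ef = new HashEmbeddingFunction(4);
$vectors = $ef->generate(['hello', 'world']);
echo count($vectors) . ' vectors of ' . count($vectors[0]) . " dimensions\n";
```

Because the output depends only on the input text, the same document always maps to the same vector, which makes tests repeatable.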
Collections
Collections are where you store and categorize your embeddings and documents. All operations are performed on a specific collection.
// Create (throws if exists)
$collection = $client->createCollection('my-collection', $ef);

// Get (throws if missing)
$collection = $client->getCollection('my-collection');

// Get or Create
$collection = $client->getOrCreateCollection('my-collection', $ef);

// Fork (creates a copy of an existing collection)
// Note: Forking is only supported for Chroma Cloud, not local Chroma instances
$forkedCollection = $client->forkCollection('my-collection', 'my-collection-fork', $ef);

// Delete
$client->deleteCollection('my-collection');
Adding Data
You can add items to a collection using the structured Record class or raw arrays. Both methods represent the same data:
- IDs (Required): Unique string identifier.
- Embeddings: Vector representation (float array).
- Documents: Raw text content.
- Metadatas: Key-value pairs for filtering.
Using Arrays
You can pass parallel arrays of IDs, embeddings, metadatas, etc., where the entries at the same index describe the same item. This is useful for bulk operations.
$collection->add(
    ids: ['id1', 'id2'],
    documents: ['This is a document about PHP.', 'ChromaDB is great for AI.'],
    embeddings: [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]],
    metadatas: [
        ['category' => 'development', 'author' => 'Kyrian'],
        ['category' => 'ai', 'is_published' => true]
    ]
);
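Since parallel arrays are matched up by position, a length mismatch silently misaligns data. A small pre-flight check can catch that before a bulk call. This is a sketch under assumptions: `assertParallel` is a hypothetical helper, not something the library provides.

```php
<?php

/**
 * Hypothetical pre-flight check (not part of chromadb-php): IDs must be
 * unique and every provided parallel array must have one entry per ID.
 *
 * @param string[] $ids
 * @param array<string, array|null> $columns e.g. ['documents' => [...], 'metadatas' => null]
 */
function assertParallel(array $ids, array $columns): void
{
    if (count($ids) !== count(array_unique($ids))) {
        throw new InvalidArgumentException('IDs must be unique.');
    }

    foreach ($columns as $name => $values) {
        // A null column means "not provided" and is skipped.
        if ($values !== null && count($values) !== count($ids)) {
            throw new InvalidArgumentException(
                sprintf('%s has %d entries, expected %d.', $name, count($values), count($ids))
            );
        }
    }
}

// Passes: two IDs, two documents, embeddings omitted.
assertParallel(['id1', 'id2'], ['documents' => ['a', 'b'], 'embeddings' => null]);

try {
    assertParallel(['id1', 'id2'], ['documents' => ['only one']]);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage() . PHP_EOL; // documents has 1 entries, expected 2.
}
```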
Using Records (Fluent API)
The Record class provides a fluent interface for building items. It mirrors the array structure but in an object-oriented way.
use Codewithkyrian\ChromaDB\Types\Record;

$collection->add([
    // Fluent Factory style
    Record::make('id4')
        ->withDocument('This is a document about PHP.')
        ->withEmbedding([0.1, 0.2, 0.3])
        ->withMetadata(['category' => 'development', 'author' => 'Kyrian']),

    // Constructor style
    new Record(
        id: 'id7',
        document: 'ChromaDB is great for AI.',
        embedding: [0.9, 0.8, 0.7],
        metadata: ['category' => 'ai', 'is_published' => true]
    ),
]);
If you provide documents but omit embeddings, Chroma uses the collection's embedding function to generate them. You can also supply the embeddings yourself, which is useful if you have an external embedding function or want to control the embedding process manually. If you provide only embeddings and no documents, it is assumed that you store the documents elsewhere and associate them with the embeddings through the IDs or other metadata.
If the supplied embeddings are not the same dimension as the embeddings already indexed in the collection, an exception will be raised.
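A dimension mismatch inside a single batch can be detected on the client before the request is sent. The helper below is a hypothetical sketch, not part of the library; it only checks that a batch is internally consistent and cannot know the dimension already indexed in the collection.

```php
<?php

/**
 * Hypothetical client-side check (not part of chromadb-php): return the
 * dimension shared by all embeddings in a batch, or throw if they differ.
 *
 * @param float[][] $embeddings
 */
function embeddingDimension(array $embeddings): int
{
    if ($embeddings === []) {
        throw new InvalidArgumentException('No embeddings given.');
    }

    // Collect the distinct vector lengths found in the batch.
    $dimensions = array_unique(array_map('count', $embeddings));
    if (count($dimensions) !== 1) {
        throw new InvalidArgumentException(
            'Mixed embedding dimensions: ' . implode(', ', $dimensions)
        );
    }

    return reset($dimensions);
}

echo embeddingDimension([[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]) . PHP_EOL; // 3
```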
Retrieval (get and peek)
Retrieve specific items by ID or filtered metadata without generating embeddings.
Get
Fetch specific items.
use Codewithkyrian\ChromaDB\Types\Includes;

// Fetch by ID
$item = $collection->get(ids: ['id1']);

// Fetch filtered items (Metadata Filter)
$items = $collection->get(
    where: ['category' => 'php'],
    include: [Includes::Documents, Includes::Metadatas]
);

// Fetch items as Record objects
$records = $items->asRecords();
Peek
Preview the first n items in the collection.
$preview = $collection->peek(limit: 5);
Specifying Return Data (include)
Both get and query allow you to specify what data to return using the include parameter.
use Codewithkyrian\ChromaDB\Types\Includes;

$collection->get(
    ids: ['id1'],
    include: [
        Includes::Documents,  // Return the document text
        Includes::Metadatas,  // Return the metadata
        Includes::Embeddings  // Return the vector
    ]
);
Note:
Includes::Distances is only available when querying, not when using get().
Querying (Vector Search)
Querying is about finding items semantically similar to your input. Chroma performs a vector search to find the nearest neighbors. ChromaDB-PHP also provides a powerful, fluent query builder for filtering by metadata and document content.
Query by Text
Provide text strings. Chroma embeds them using the collection's Embedding Function and finds the nearest neighbors.
$results = $collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5 // Return top 5 matches
);

// Get results as ScoredRecord objects
// Returns ScoredRecord[][] (one array of results per query text)
$records = $results->asRecords();
Query by Embeddings
Provide raw vectors. Useful if you compute embeddings externally.
$results = $collection->query(
    queryEmbeddings: [[0.1, 0.2, ...]],
    nResults: 5
);
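To build intuition for what a nearest-neighbor query returns, the idea can be sketched as a brute-force search over an in-memory map. This is a conceptual illustration only: the Chroma server uses its own index and whatever distance metric the collection is configured with, and the choice of squared Euclidean distance here is an assumption made for the example. The function names are hypothetical.

```php
<?php

/** Squared Euclidean distance between two equal-length vectors. */
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }
    return $sum;
}

/**
 * Brute-force nearest-neighbor search over an id => vector map.
 *
 * @param array<string, float[]> $items
 * @param float[] $query
 * @return array<string, float> id => distance, closest first
 */
function nearest(array $items, array $query, int $nResults): array
{
    $distances = [];
    foreach ($items as $id => $vector) {
        $distances[$id] = squaredDistance($vector, $query);
    }

    asort($distances); // ascending: a smaller distance is a closer match

    return array_slice($distances, 0, $nResults, true);
}

$items = [
    'id1' => [0.1, 0.2, 0.3],
    'id2' => [0.9, 0.8, 0.7],
];

print_r(array_keys(nearest($items, [0.1, 0.2, 0.25], 1))); // the single closest ID: id1
```

A real vector database avoids comparing the query against every stored vector, but the shape of the result (IDs ordered by distance, cut off at nResults) is the same idea.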
Specifying Return Data (include)
By default, queries return IDs, Embeddings, Metadatas, and Distances. You can customize this using the Includes enum to optimize performance.
use Codewithkyrian\ChromaDB\Types\Includes;

$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    include: [
        Includes::Documents, // Return the actual text content
        Includes::Distances  // Return the distance of each match from the query
    ]
);
Metadata Filtering (where)
You can filter search results based on metadata of the items. The library provides a fluent Builder for safety, but also supports raw arrays.
Supported Comparisons
Each comparison is shown first with the fluent builder, then as the equivalent raw array.

// Equals
Where::field('category')->eq('news');
['category' => ['$eq' => 'news']];

// Not Equals
Where::field('status')->ne('archived');
['status' => ['$ne' => 'archived']];

// Greater Than
Where::field('views')->gt(100);
['views' => ['$gt' => 100]];

// Less Than
Where::field('rating')->lt(5);
['rating' => ['$lt' => 5]];

// Greater Than or Equal To
Where::field('views')->gte(100);
['views' => ['$gte' => 100]];

// Less Than or Equal To
Where::field('rating')->lte(5);
['rating' => ['$lte' => 5]];

// List inclusion
Where::field('tag')->in(['php', 'laravel']);
['tag' => ['$in' => ['php', 'laravel']]];

// List exclusion
Where::field('tag')->nin(['php', 'laravel']);
['tag' => ['$nin' => ['php', 'laravel']]];

// Logical AND
Where::all(
    Where::field('category')->eq('code'),
    Where::field('language')->eq('php')
);
['$and' => [
    ['category' => ['$eq' => 'code']],
    ['language' => ['$eq' => 'php']]
]];

// Logical OR
Where::any(
    Where::field('category')->eq('code'),
    Where::field('language')->eq('php')
);
['$or' => [
    ['category' => ['$eq' => 'code']],
    ['language' => ['$eq' => 'php']]
]];
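To reason about what a raw array filter selects, it can help to evaluate it locally against a single metadata array. The sketch below is an illustration of the operator semantics only; the server performs the real filtering. The function names are hypothetical, and the treatment of missing fields (a missing field is compared as null) is an assumption, not documented behavior.

```php
<?php

/** Evaluate a raw `where` array against one item's metadata (illustration only). */
function matchesWhere(array $metadata, array $where): bool
{
    $test = fn (array $w): bool => matchesWhere($metadata, $w);

    foreach ($where as $key => $condition) {
        $ok = match ($key) {
            '$and' => count(array_filter($condition, $test)) === count($condition),
            '$or' => count(array_filter($condition, $test)) > 0,
            // A bare value such as ['category' => 'php'] is treated as $eq.
            default => compareField(
                $metadata[$key] ?? null,
                is_array($condition) ? $condition : ['$eq' => $condition]
            ),
        };
        if (!$ok) {
            return false;
        }
    }

    return true;
}

/** Apply a single comparison such as ['$gt' => 100] to a metadata value. */
function compareField(mixed $actual, array $condition): bool
{
    $operator = array_key_first($condition);
    $expected = $condition[$operator];

    return match ($operator) {
        '$eq' => $actual === $expected,
        '$ne' => $actual !== $expected,
        '$gt' => $actual !== null && $actual > $expected,
        '$gte' => $actual !== null && $actual >= $expected,
        '$lt' => $actual !== null && $actual < $expected,
        '$lte' => $actual !== null && $actual <= $expected,
        '$in' => in_array($actual, $expected, true),
        '$nin' => !in_array($actual, $expected, true),
        default => throw new InvalidArgumentException("Unsupported operator: {$operator}"),
    };
}

$metadata = ['category' => 'code', 'views' => 150];

var_dump(matchesWhere($metadata, ['$and' => [
    ['category' => ['$eq' => 'code']],
    ['views' => ['$gt' => 100]],
]])); // bool(true)
```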
Usage
// Single condition (builder)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    where: Where::field('category')->eq('code')
);

// Single condition (array)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    where: ['category' => ['$eq' => 'code']]
);

// Combined conditions (builder)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    where: Where::all(
        Where::field('category')->eq('code'),
        Where::field('language')->eq('php')
    )
);

// Combined conditions (array)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    where: ['$and' => [
        ['category' => ['$eq' => 'code']],
        ['language' => ['$eq' => 'php']]
    ]]
);
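When filter criteria arrive as plain key-value pairs (for example from request parameters), the raw array syntax used in this section can be generated programmatically. A minimal sketch, assuming only equality conditions are needed; `whereEquals` is a hypothetical helper, not a library function.

```php
<?php

/**
 * Hypothetical helper: turn ['field' => value, ...] into the raw array
 * filter syntax, joining several conditions with $and.
 *
 * @param array<string, scalar> $equals
 */
function whereEquals(array $equals): array
{
    if ($equals === []) {
        throw new InvalidArgumentException('At least one condition is required.');
    }

    $conditions = [];
    foreach ($equals as $field => $value) {
        $conditions[] = [$field => ['$eq' => $value]];
    }

    // A single condition needs no logical wrapper.
    return count($conditions) === 1 ? $conditions[0] : ['$and' => $conditions];
}

echo json_encode(whereEquals(['category' => 'code', 'language' => 'php'])) . PHP_EOL;
// {"$and":[{"category":{"$eq":"code"}},{"language":{"$eq":"php"}}]}
```

The returned array has the same shape as the array examples in this section, so it could be passed as the `where` argument.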
Full Text Search (whereDocument)
Used to filter based on the text content of the document itself. This supports substring matching and Regex. You can also use the fluent builder or array syntax.
Supported Comparisons
Each comparison is shown first with the fluent builder, then as the equivalent raw array.

// Substring (Contains)
Where::document()->contains('search term');
['$contains' => 'search term'];

// Substring (Not Contains)
Where::document()->notContains('spam');
['$not_contains' => 'spam'];

// Regex Matching
Where::document()->matches('^PHP 8\.[0-9]+');
['$regex' => '^PHP 8\.[0-9]+'];

// Regex Not Matching
Where::document()->notMatches('deprecated');
['$not_regex' => 'deprecated'];

// Logical OR
Where::any(
    Where::document()->contains('php'),
    Where::document()->contains('laravel')
);
['$or' => [
    ['document' => ['$contains' => 'php']],
    ['document' => ['$contains' => 'laravel']]
]];

// Logical AND
Where::all(
    Where::document()->contains('php'),
    Where::document()->contains('laravel')
);
['$and' => [
    ['document' => ['$contains' => 'php']],
    ['document' => ['$contains' => 'laravel']]
]];
Usage
// Single condition (builder)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    whereDocument: Where::document()->contains('php')
);

// Single condition (array)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    whereDocument: ['$contains' => 'php']
);

// Combined conditions (builder)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    whereDocument: Where::any(
        Where::document()->contains('php'),
        Where::document()->contains('laravel')
    )
);

// Combined conditions (array)
$collection->query(
    queryTexts: ['How do I use PHP with Chroma?'],
    nResults: 5,
    whereDocument: ['$or' => [
        ['$contains' => 'php'],
        ['$contains' => 'laravel']
    ]]
);
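A document filter can be tried out locally against sample text before it is used in a query. The sketch below emulates the array form used in the usage examples of this section. It is an approximation under assumptions: the function names are hypothetical, matching here is case-sensitive because PHP's `str_contains` is, and the server's regex dialect may differ from PHP's PCRE.

```php
<?php

/** Wrap a bare pattern in PCRE delimiters, escaping the delimiter itself. */
function toPcre(string $pattern): string
{
    return '~' . str_replace('~', '\~', $pattern) . '~';
}

/** Evaluate a raw `whereDocument` array against one document (illustration only). */
function matchesDocument(string $document, array $filter): bool
{
    $operator = array_key_first($filter);
    $operand = $filter[$operator];
    $test = fn (array $f): bool => matchesDocument($document, $f);

    return match ($operator) {
        '$contains' => str_contains($document, $operand),
        '$not_contains' => !str_contains($document, $operand),
        '$regex' => preg_match(toPcre($operand), $document) === 1,
        '$not_regex' => preg_match(toPcre($operand), $document) !== 1,
        '$and' => count(array_filter($operand, $test)) === count($operand),
        '$or' => count(array_filter($operand, $test)) > 0,
        default => throw new InvalidArgumentException("Unsupported operator: {$operator}"),
    };
}

var_dump(matchesDocument('PHP 8.3 released', ['$regex' => '^PHP 8\.[0-9]+'])); // bool(true)

var_dump(matchesDocument('I love laravel', ['$or' => [
    ['$contains' => 'php'],
    ['$contains' => 'laravel'],
]])); // bool(true)
```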
Updating Data
Use update to modify existing items (fails if the ID is missing) or upsert to update-or-create. Just like adding, you can pass either an array of records or parallel arrays of IDs, documents, and metadatas.
// Update using Records
$collection->update([
    Record::make('id1')->withMetadata(['updated' => true])
]);

// Upsert using Arrays
$collection->upsert(
    ids: ['id_new'],
    documents: ['New document content'],
    metadatas: [['created' => 'now']]
);
Deleting Data
Delete by IDs or by filter.
// Delete specific items
$collection->delete(['id1', 'id2']);

// Delete all items matching a filter
$collection->delete(where: Where::field('category')->eq('outdated'));

// Delete all items matching a document content filter
$collection->delete(whereDocument: Where::document()->contains('outdated'));
Examples
- basic-usage - Simple example demonstrating basic operations: connecting, adding documents, and querying
- document-chunking-cloud - Document chunking, embedding, and storage in Chroma Cloud with semantic search
Testing
Run the test suite using Pest.
composer test
License
MIT License. See LICENSE for more information.