Vector Databases are not the only solution

Yingjun Wu

In this terrific article, Yingjun Wu advises against investing in vector databases, particularly for anyone looking to enter the field as of mid-2023.

He outlines several reasons for this, focusing on the technology, applications, and market landscape of vector databases. Here are the key points:

Vector databases are designed to store and process unstructured data (such as images, audio, and text) by converting it into feature vectors using machine learning algorithms.
These databases use data indexing techniques (like inverted indexing and vector quantization) for efficient similarity searches and to reduce storage and computational requirements.
Existing OLAP databases with columnar storage (like ClickHouse, Apache Pinot, and Apache Druid) already demonstrate impressive data compression rates and can integrate vector search functionalities.
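
To make the idea of similarity search and indexing more concrete, here is a minimal sketch in plain Python. It is not taken from Wu's article: the toy vectors, the hand-picked centroids, and all function names are assumptions made for illustration. It first ranks stored vectors by cosine similarity with an exhaustive scan, then shows a simplified inverted index that groups vectors under their closest centroid so a query only scans one group. Real systems learn the centroids (for example with k-means) and usually probe several groups.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exhaustive search: score every stored vector, return the k best keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


def nearest_centroid(vec, centroids):
    """Position of the centroid most similar to vec."""
    return max(range(len(centroids)),
               key=lambda i: cosine_similarity(vec, centroids[i]))


def build_inverted_index(vectors, centroids):
    """Map each centroid position to the keys of the vectors assigned to it."""
    index = {i: [] for i in range(len(centroids))}
    for key, vec in vectors.items():
        index[nearest_centroid(vec, centroids)].append(key)
    return index


def search_inverted_index(query, vectors, centroids, index, k=2):
    """Approximate search: scan only the group under the query's closest centroid."""
    group = index[nearest_centroid(query, centroids)]
    candidates = {key: vectors[key] for key in group}
    return top_k(query, candidates, k)


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}
# Hand-picked centroids, standing in for ones a clustering step would learn.
CENTROIDS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
INDEX = build_inverted_index(VECTORS, CENTROIDS)
```

Calling top_k([1.0, 0.0, 0.0], VECTORS) scores all four vectors, while search_inverted_index([0.0, 1.0, 0.1], VECTORS, CENTROIDS, INDEX) scores only the two stored under the second centroid. That trade of a little accuracy for much less work is what makes indexed search efficient at scale.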

The rise of vector databases is linked to the need for managing the vast amounts of data used by large-scale generative AI models.
They enable accurate similarity searches and support multimodal data processing, crucial for AI applications.

Market Saturation and Advice Against Investment

Wu advises against new investments in vector databases due to market saturation, as many products already exist in this space.
For companies with heavy workloads requiring advanced vector search, specialized vector databases are recommended. However, for most other use cases, existing commercial databases or open-source databases like PostgreSQL (with the pgvector extension) are sufficient.

Many commercial databases are enhancing their capabilities by incorporating vector search functionalities.
PostgreSQL and other open-source databases like OpenSearch, ClickHouse, and Cassandra have implemented vector search features, reducing the need for specialized vector databases.
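
As a rough illustration of what vector search inside an existing database can look like, the sketch below builds the SQL that a PostgreSQL instance with the pgvector extension would accept. The table name items, the column name embedding, the 3-dimensional vector size, and the Python helper names are all hypothetical. The %s placeholder assumes a psycopg-style driver, and no database connection is opened here. The <-> operator is pgvector's Euclidean (L2) distance; pgvector also offers <=> for cosine distance.

```python
# One-time setup statements for a hypothetical "items" table.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
"""


def to_vector_literal(values):
    """Format a Python sequence as pgvector's text form, e.g. '[1.0,2.5,0.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def nearest_neighbors_sql(table, column, k):
    """Build a k-nearest-neighbour query ordered by L2 distance.

    The table and column names are interpolated directly, so they must come
    from trusted code, never from user input. The query vector itself is
    passed separately as a bound parameter.
    """
    return f"SELECT id FROM {table} ORDER BY {column} <-> %s::vector LIMIT {int(k)};"


query_sql = nearest_neighbors_sql("items", "embedding", 5)
query_param = to_vector_literal([1, 2.5, 0])
```

With a driver such as psycopg, the pair would be executed along the lines of cursor.execute(query_sql, (query_param,)). The application keeps using ordinary SQL, tables, and transactions, which is the practical appeal of adding vector capabilities to an existing database.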

The vector database market is already crowded with established players, making it challenging for new entrants.
Wu suggests focusing on enhancing existing databases with vector capabilities rather than investing in new vector database projects.

