VAST Database
The VAST DataBase is an all-flash transactional and analytical system designed for the AI era.
It simplifies data engineering by eliminating separate tiers of data management and by combining the transactional properties of a relational database with the schema-driven analytics of a data warehouse.
The DataBase removes the usual trade-off between row-based OLTP databases and column-based analytical (OLAP) systems, offering the scale and affordability of a data lake together with the capabilities of a data lakehouse.
The key concepts and insights are as follows:
Semantic Layer
The VAST database is a critical component of the VAST Data Platform because it provides a semantic, or contextual, layer.
It lets semantic understanding be applied to data in both transactional and analytical contexts, enabling real-time data processing and analysis.
Design Principles
The VAST database is built on three key design principles: embracing standards (e.g., SQL, Apache Arrow), simplifying data management by eliminating complex tiers, and enabling deployment flexibility across public and private clouds.
Transactional and Analytical Capabilities
VAST challenges the traditional separation between transactional databases (OLTP) and analytical databases (OLAP).
Typically, OLTP systems are optimised for fast, small transactions, while OLAP systems are designed for large-scale data analysis.
VAST aims to build a unified system that handles both types of workload efficiently, a significant departure from the norm, in which separate systems serve transactional and analytical needs.
As a result, the VAST database acts as a transactional system for real-time data ingestion and cataloguing while also providing analytical capabilities for quickly understanding and querying the data it holds.
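Much of the difference between the two workload types comes down to data layout. The toy sketch below holds the same hypothetical `Order` records in a row-oriented list, which suits fetching one whole record, and in a column-oriented dict of lists, which suits aggregating one field across many records. It only illustrates the trade-off the VAST database is designed to remove; it is not VAST's storage format or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical record used only for this illustration."""
    order_id: int
    customer: str
    amount: float


# Row-oriented layout: each record is kept together, which suits OLTP-style
# work such as inserting or fetching a single order.
rows: list[Order] = [
    Order(1, "alice", 30.0),
    Order(2, "bob", 12.5),
    Order(3, "alice", 7.5),
]


def get_order(order_id: int) -> Order | None:
    """Point lookup: return the whole record for one order, or None."""
    for row in rows:
        if row.order_id == order_id:
            return row
    return None


def to_columns(records: list[Order]) -> dict[str, list]:
    """Column-oriented layout: one list per field, which suits OLAP-style scans."""
    return {
        "order_id": [r.order_id for r in records],
        "customer": [r.customer for r in records],
        "amount": [r.amount for r in records],
    }


def total_amount(columns: dict[str, list]) -> float:
    """Aggregate that only needs to read the 'amount' column."""
    return sum(columns["amount"])


columns = to_columns(rows)
print(get_order(2))           # Order(order_id=2, customer='bob', amount=12.5)
print(total_amount(columns))  # 50.0
```

Serving both access patterns from one copy of the data, without a separate OLAP system, is the tension the VAST database targets.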
Simplifying Data Engineering
The VAST database aims to simplify data engineering by removing the constraints of managing semi-structured data objects in data science workflows.
Overcoming Limitations of Traditional Systems
Traditional database management systems often consist of separate systems for online transactional processing (OLTP) and online analytical processing (OLAP).
The VAST database addresses the limitations of these systems by providing a scalable solution that combines both transactional and analytical capabilities.
Disaggregated and Shared Everything (DASE) Architecture
The VAST database is built on top of the DASE architecture, which abstracts the cluster architecture and creates a data-centre-scale computer.
All logic runs in stateless Linux containers, and data is presented in parallel across a high-speed, low-latency data centre fabric. This removes the need for coordination between machines in the read/write path, a common bottleneck in traditional systems.
This design supports high transactional throughput and allows both capacity and performance to scale.
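A minimal sketch of the stateless idea, assuming a single in-process store stands in for the shared media: the hypothetical `StatelessServer` objects keep no data of their own, so any request can be routed to any of them, and they never message one another. The only coordination is an atomic compare-and-set on the shared store, modelled here with a lock; how VAST actually implements atomic updates on shared storage is not described in this section.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


class SharedStore:
    """Stand-in for shared storage that every server can reach (hypothetical).

    A lock-guarded dict models the shared media; compare_and_set is the only
    coordination primitive the servers use.
    """

    def __init__(self) -> None:
        self._data: dict[str, int] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> int | None:
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key: str, expected: int | None, new: int) -> bool:
        """Atomically set key to new only if it still holds expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


class StatelessServer:
    """Keeps no data of its own, so any instance can serve any request."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def increment(self, key: str) -> int:
        # Optimistic retry loop: read, compute, then try to publish atomically.
        # Servers never talk to each other; conflicts are resolved in the store.
        while True:
            current = self.store.get(key)
            new = (current or 0) + 1
            if self.store.compare_and_set(key, current, new):
                return new


store = SharedStore()
servers = [StatelessServer(f"server-{i}", store) for i in range(3)]

# Spread 100 concurrent requests across the servers in round-robin order.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(lambda i: servers[i % len(servers)].increment("hits"), range(100)))

print(store.get("hits"))  # 100
```

Because no server holds state, adding or removing a server changes only how requests are spread, not the result.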
Write Buffer and Data Transformation: The VAST database uses a write buffer in storage-class memory (SCM) to absorb incoming data, which gives the system time to manipulate the data before storing it on low-cost flash. During this process, data is transformed from row-oriented database records into a columnar format optimised for analytics.
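The buffering and transposition step can be sketched as a toy model under stated assumptions: the hypothetical `WriteBuffer` class accepts each row as soon as it is appended to an in-memory list (standing in for storage-class memory), and once `capacity` rows have accumulated it transposes them into one list per column and appends that columnar chunk to `flash` (standing in for low-cost flash). Real buffer sizes, flush policy and on-flash format are not given here.

```python
from __future__ import annotations


class WriteBuffer:
    """Toy model of a write buffer that absorbs rows and flushes columns.

    Hypothetical: `pending` stands in for storage-class memory and `flash`
    for low-cost flash; `capacity` is an illustrative flush threshold.
    """

    def __init__(self, schema: list[str], capacity: int) -> None:
        self.schema = schema
        self.capacity = capacity
        self.pending: list[tuple] = []          # row-form records awaiting flush
        self.flash: list[dict[str, list]] = []  # flushed columnar chunks

    def insert(self, row: tuple) -> None:
        """Accept a row as soon as it is appended to the buffer."""
        if len(row) != len(self.schema):
            raise ValueError("row does not match schema")
        self.pending.append(row)
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        """Transpose buffered rows into one list per column and persist them."""
        if not self.pending:
            return
        # zip(*rows) turns a list of rows into a sequence of column tuples.
        chunk = {name: list(col) for name, col in zip(self.schema, zip(*self.pending))}
        self.flash.append(chunk)
        self.pending.clear()


buf = WriteBuffer(["ts", "sensor", "reading"], capacity=3)
for i in range(7):
    buf.insert((i, f"s{i % 2}", i * 1.5))

print(len(buf.flash), len(buf.pending))  # 2 1
print(buf.flash[0]["reading"])           # [0.0, 1.5, 3.0]
```

Writes stay cheap because they only append to the buffer; the cost of reorganising into columns is paid once per flush rather than once per row.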
Columnar Object and Metadata Store: The VAST database introduces a new style of columnar object that is significantly smaller than a standard Parquet row group. These columnar objects are organised by a metadata store that enables efficient querying and data access.
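One reason smaller columnar objects can help is that per-object metadata can rule out more data before any of it is read. The sketch below assumes min/max statistics per chunk (a common technique, often called zone maps, used here as an illustration rather than a description of VAST's metadata store) and compares how many values a range query must scan with a large and a small chunk size. The chunk sizes are row counts picked for the example, not VAST's or Parquet's actual sizes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnChunk:
    values: list[int]
    min_value: int  # per-chunk metadata, kept apart from the values
    max_value: int


def build_chunks(column: list[int], chunk_size: int) -> list[ColumnChunk]:
    """Split a column into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(column), chunk_size):
        part = column[start:start + chunk_size]
        chunks.append(ColumnChunk(part, min(part), max(part)))
    return chunks


def scan_range(chunks: list[ColumnChunk], low: int, high: int) -> tuple[list[int], int]:
    """Return values in [low, high] and how many values had to be scanned."""
    matches: list[int] = []
    scanned = 0
    for chunk in chunks:
        # Metadata check: skip chunks whose [min, max] cannot overlap the range.
        if chunk.max_value < low or chunk.min_value > high:
            continue
        scanned += len(chunk.values)
        matches.extend(v for v in chunk.values if low <= v <= high)
    return matches, scanned


timestamps = list(range(10_000))  # an ingest-ordered column, e.g. timestamps
for size in (4096, 64):
    hits, scanned = scan_range(build_chunks(timestamps, size), 5000, 5099)
    print(size, len(hits), scanned)
# 4096 100 4096
# 64 100 128
```

On this ingest-ordered column the 64-row chunks let the query touch 128 values instead of 4,096 for the same 100 matches; on randomly ordered data, min/max pruning skips far less.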
SQL and Query Engine Integration: The VAST database supports native SQL querying and integrates with popular query engines like Apache Spark, Trino, and Dremio through push-down plugins.
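Push-down means the query engine hands part of the query, typically the column projection and filter predicates, to the storage layer so that only the needed results travel back. The sketch below models that with a hypothetical `ScanRequest` and `storage_scan` function over an in-memory table and counts how many values each approach ships; it does not use the actual Spark, Trino or Dremio connector interfaces.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical columnar table held by the storage layer.
TABLE: dict[str, list] = {
    "id": list(range(1, 9)),
    "region": ["eu", "us", "eu", "apac", "us", "eu", "us", "apac"],
    "revenue": [120, 80, 95, 60, 200, 40, 150, 75],
}


@dataclass
class ScanRequest:
    columns: list[str]                   # projection the engine needs back
    filter_column: str                   # column the predicate reads
    predicate: Callable[[object], bool]  # evaluated next to the data


def storage_scan(table: dict[str, list], request: ScanRequest) -> dict[str, list]:
    """With push-down: filter and project in storage, return only the results."""
    keep = [i for i, v in enumerate(table[request.filter_column]) if request.predicate(v)]
    return {c: [table[c][i] for i in keep] for c in request.columns}


def naive_scan(table: dict[str, list]) -> dict[str, list]:
    """Without push-down: ship every column of every row to the engine."""
    return {c: list(vals) for c, vals in table.items()}


def values_shipped(result: dict[str, list]) -> int:
    return sum(len(vals) for vals in result.values())


req = ScanRequest(columns=["id", "revenue"], filter_column="region",
                  predicate=lambda r: r == "eu")
pushed = storage_scan(TABLE, req)
print(pushed)                             # {'id': [1, 3, 6], 'revenue': [120, 95, 40]}
print(values_shipped(pushed))             # 6
print(values_shipped(naive_scan(TABLE)))  # 24
```

The saving grows with table width and predicate selectivity, which is why engines push filters and projections down whenever the storage layer can evaluate them.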
Scalability and Performance: The VAST database is designed to scale linearly in terms of performance by adding CPUs, GPUs, and SSDs to the system. It can achieve high transactional throughput and sustained streaming performance for queries.
Similarity-Based Data Reduction: The VAST database employs a novel data reduction technique called similarity, which acts as a modern form of global compression. It identifies similar blocks of data across the entire data store and compresses them together, resulting in significant data reduction.
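The section does not describe how VAST detects similarity or encodes similar blocks, so the sketch below swaps in two well-known stand-ins, both assumptions: Jaccard similarity over overlapping byte shingles to decide which earlier block a new block resembles, and zlib's preset-dictionary feature (`zdict`) to compress each block against that reference. The block size, shingle length and threshold are illustrative values, and the helper names are hypothetical.

```python
from __future__ import annotations

import random
import zlib

BLOCK = 1024      # illustrative block size in bytes (assumption)
SHINGLE = 8       # length of the byte substrings compared between blocks
THRESHOLD = 0.5   # minimum Jaccard similarity to share a reference


def shingles(block: bytes) -> set[bytes]:
    """All overlapping SHINGLE-byte substrings of a block."""
    return {block[i:i + SHINGLE] for i in range(len(block) - SHINGLE + 1)}


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    return len(a & b) / len(a | b)


def similarity_reduce(blocks: list[bytes]) -> list[tuple[int | None, bytes]]:
    """Compress each block against the first earlier reference it resembles.

    Returns (reference_index, payload) pairs; reference_index is None for
    blocks that became references and were compressed on their own.
    """
    refs: list[tuple[int, set[bytes]]] = []
    encoded: list[tuple[int | None, bytes]] = []
    for idx, block in enumerate(blocks):
        sig = shingles(block)
        match = next((r for r, rsig in refs if jaccard(sig, rsig) >= THRESHOLD), None)
        if match is None:
            refs.append((idx, sig))
            encoded.append((None, zlib.compress(block, 9)))
        else:
            # The reference block primes the compressor as a preset dictionary,
            # so content shared with it is encoded as short back-references.
            comp = zlib.compressobj(level=9, zdict=blocks[match])
            encoded.append((match, comp.compress(block) + comp.flush()))
    return encoded


def decode(encoded: list[tuple[int | None, bytes]], idx: int) -> bytes:
    ref, payload = encoded[idx]
    if ref is None:
        return zlib.decompress(payload)
    decomp = zlib.decompressobj(zdict=decode(encoded, ref))
    return decomp.decompress(payload) + decomp.flush()


# Synthetic data: two unrelated random blocks, each followed by three
# near-copies with a few bytes flipped.
rng = random.Random(0)
blocks: list[bytes] = []
for _ in range(2):
    base = bytes(rng.randrange(256) for _ in range(BLOCK))
    blocks.append(base)
    for _ in range(3):
        variant = bytearray(base)
        for _ in range(4):
            variant[rng.randrange(BLOCK)] ^= 0xFF
        blocks.append(bytes(variant))

encoded = similarity_reduce(blocks)
independent = sum(len(zlib.compress(b, 9)) for b in blocks)
reduced = sum(len(payload) for _, payload in encoded)
print(f"independent: {independent} bytes, similarity-grouped: {reduced} bytes")
```

On these synthetic near-duplicates the grouped encoding comes out far smaller than compressing each block independently, even though random bytes barely compress on their own; on blocks with nothing in common it gains nothing. A production system would find candidate references through compact similarity hashes rather than by comparing full shingle sets pairwise as this sketch does.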
Cost Efficiency: By leveraging low-cost flash storage and efficient data reduction techniques, the VAST database offers a cost-effective solution compared to other flash-based data platforms.
Comprehensive Data Science Support: The VAST database provides a unified system that supports both structured data (databases and data warehouses) and unstructured data (file and object formats) commonly used in deep learning pipelines.
In summary, the VAST database introduces a transformative approach to database management by combining transactional and analytical capabilities, leveraging the DASE architecture, and employing innovative data reduction techniques.
It simplifies data engineering, enables real-time processing and analysis, and provides a scalable and cost-effective solution for modern data science and deep learning workloads.