
VAST DataBase

The VAST DataBase is an all-flash transactional and analytical system designed for the AI era.

It simplifies data engineering by eliminating tiers of data management and combining the transactional properties of a relational database with the schema of a data warehouse.

The DataBase breaks the tradeoffs between row-based OLTP databases and column-based analytical (OLAP) databases, offering the scale and affordability of a data lake and the capabilities of a data lakehouse.

The key concepts and insights are as follows:

Semantic Layer

The VAST database serves as a critical component of the VAST data platform by providing a semantic or contextual layer.

It enables the application of semantic understanding to the data in both transactional and analytical contexts, allowing for real-time data processing and analysis.

What is the semantic layer?

The semantic layer is a critical component in tabular databases that acts as an intermediary between end-users and the underlying database structure.

It provides a simplified and user-friendly view of the data, enabling users to interact with it in a more meaningful and intuitive manner.

The semantic layer presents complex database structures as familiar, business-friendly tables, columns, and relationships, abstracting away the underlying complexity.

Definition and Function: The semantic layer serves several key functions:

  1. Data Interpretation: It acts as a translator, converting technical database terminology into business-friendly terms. This allows users to understand and analyse data without needing deep technical expertise.

  2. Relationship Definition: The semantic layer defines relationships between tables, making it easier for users to navigate and explore related data.

  3. Security and Governance: It controls access to data based on user roles and permissions, ensuring data privacy and security.
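To make these three functions concrete, here is a minimal Python sketch of a toy semantic layer. Everything in it is hypothetical and assumed for illustration: the cryptic physical names (`t_cust`, `c_nm`, and so on), the `SemanticLayer` class, and its `query` method are not part of any VAST API. It maps business-friendly field names to physical columns, joins tables through a declared relationship, and checks role permissions before returning any data.

```python
from dataclasses import dataclass

# Hypothetical physical data with cryptic, DBA-style table and column names.
PHYSICAL_ROWS = {
    "t_cust": [
        {"c_id": 1, "c_nm": "Acme Ltd", "c_rgn": "EMEA"},
        {"c_id": 2, "c_nm": "Globex", "c_rgn": "APAC"},
    ],
    "t_ord": [
        {"o_id": 10, "o_cust": 1, "o_amt": 250.0},
        {"o_id": 11, "o_cust": 1, "o_amt": 100.0},
        {"o_id": 12, "o_cust": 2, "o_amt": 75.0},
    ],
}


@dataclass
class SemanticLayer:
    fields: dict         # business name -> (physical table, physical column)
    relationships: dict  # (table_a, table_b), sorted by name -> (key_a, key_b)
    permissions: dict    # role -> set of business names the role may read

    def query(self, role, names):
        """Return rows keyed by business names, enforcing role permissions."""
        allowed = self.permissions.get(role, set())
        denied = [n for n in names if n not in allowed]
        if denied:
            raise PermissionError(f"role {role!r} may not read {denied}")

        tables = sorted({self.fields[n][0] for n in names})
        if len(tables) == 1:
            joined = [{tables[0]: row} for row in PHYSICAL_ROWS[tables[0]]]
        elif len(tables) == 2:
            # Join the two tables using the declared relationship.
            left, right = tables
            lkey, rkey = self.relationships[(left, right)]
            joined = [
                {left: l_row, right: r_row}
                for l_row in PHYSICAL_ROWS[left]
                for r_row in PHYSICAL_ROWS[right]
                if l_row[lkey] == r_row[rkey]
            ]
        else:
            raise NotImplementedError("this sketch supports at most one join")

        # Translate physical columns back into business-friendly names.
        return [
            {n: row[self.fields[n][0]][self.fields[n][1]] for n in names}
            for row in joined
        ]


layer = SemanticLayer(
    fields={
        "Customer Name": ("t_cust", "c_nm"),
        "Region": ("t_cust", "c_rgn"),
        "Order Amount": ("t_ord", "o_amt"),
    },
    relationships={("t_cust", "t_ord"): ("c_id", "o_cust")},
    permissions={
        "analyst": {"Customer Name", "Region", "Order Amount"},
        "intern": {"Region"},
    },
)

rows = layer.query("analyst", ["Customer Name", "Order Amount"])
```

The analyst never sees `t_ord.o_amt` or the join key; a user with the `intern` role who asks for "Order Amount" gets a `PermissionError` instead of data.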

Interplay with Tabular Databases

Tabular databases organise data into rows and columns, similar to a spreadsheet. While this layout is efficient for storage and retrieval, working directly with tabular databases can be challenging for end-users because of complex schemas and numerous tables.

The semantic layer enhances tabular databases by:

  1. Simplifying the data view, making it more accessible to non-technical users.

  2. Enabling the creation of calculated fields, aggregates, and hierarchical structures for more sophisticated analysis.

  3. Providing a centralised repository for data definitions and business rules, ensuring consistency and accuracy.
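The following sketch, again purely illustrative, shows a calculated field defined once in a central repository and rolled up along a two-level hierarchy. `SALES`, `CALCULATED_FIELDS`, `GEO_HIERARCHY`, and `aggregate` are hypothetical names, not part of any product.

```python
from collections import defaultdict

# Hypothetical sales facts; in practice these would come from the underlying tables.
SALES = [
    {"region": "EMEA", "country": "UK", "qty": 3, "unit_price": 10.0},
    {"region": "EMEA", "country": "DE", "qty": 1, "unit_price": 20.0},
    {"region": "APAC", "country": "JP", "qty": 2, "unit_price": 15.0},
]

# Central repository of definitions: every report uses the same formula.
CALCULATED_FIELDS = {
    "revenue": lambda row: row["qty"] * row["unit_price"],
}

# A geographic hierarchy, from the coarsest level to the finest.
GEO_HIERARCHY = ["region", "country"]


def aggregate(rows, measure, level):
    """Sum a calculated measure, rolled up to one level of the hierarchy."""
    depth = GEO_HIERARCHY.index(level) + 1
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[lvl] for lvl in GEO_HIERARCHY[:depth])
        totals[key] += CALCULATED_FIELDS[measure](row)
    return dict(totals)


by_region = aggregate(SALES, "revenue", "region")
by_country = aggregate(SALES, "revenue", "country")
```

Because "revenue" is defined in one place, the region-level and country-level totals are guaranteed to use the same formula.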

Benefits: Implementing a semantic layer brings several key benefits:

  1. Improved Data Accessibility: Users can access and interact with data independently, without relying on IT or database professionals.

  2. Enhanced Data Security: The semantic layer enforces access control, auditing, and monitoring to safeguard sensitive data.

  3. Increased Efficiency: By centralising data definitions and rules, the semantic layer promotes consistency, reduces errors, and streamlines data management.

Challenges: Implementing a semantic layer also presents some challenges:

  1. Technical Issues: Ensuring optimal performance, scalability, and seamless integration with existing systems can be complex.

  2. Data Integrity: Maintaining accuracy and consistency between the semantic layer and the underlying database requires robust synchronisation and validation mechanisms.

  3. Change Management: Adopting a semantic layer may require significant planning, training, and support to overcome resistance and ensure a smooth transition.
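One simple form of the validation mentioned under Data Integrity is to check that every business field still points at a column that exists. This is a hypothetical sketch: the catalogue dictionaries and `find_broken_fields` are assumptions for illustration; a real deployment would read the live schema from the database catalogue and run such checks whenever either side changes.

```python
# Hypothetical physical catalogue: table -> set of columns currently in the database.
PHYSICAL_SCHEMA = {
    "t_cust": {"c_id", "c_nm", "c_rgn"},
    "t_ord": {"o_id", "o_cust", "o_amt"},
}

# Hypothetical semantic definitions: business field -> (table, column).
SEMANTIC_FIELDS = {
    "Customer Name": ("t_cust", "c_nm"),
    "Order Amount": ("t_ord", "o_amt"),
    "Discount": ("t_ord", "o_disc"),  # column was dropped or renamed upstream
}


def find_broken_fields(semantic, physical):
    """Return (business field, table.column) pairs that no longer resolve."""
    broken = []
    for name, (table, column) in sorted(semantic.items()):
        if column not in physical.get(table, set()):
            broken.append((name, f"{table}.{column}"))
    return broken


problems = find_broken_fields(SEMANTIC_FIELDS, PHYSICAL_SCHEMA)
```

Here `problems` flags "Discount", so the stale definition can be fixed before users see errors or silently missing data.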

Future Trends: As technology advances, the semantic layer is expected to evolve:

  1. Natural Language Processing and Machine Learning: These technologies will enhance query understanding and enable more intuitive user interactions.

  2. Intelligent Insights: By analysing user behaviour, the semantic layer will proactively provide relevant recommendations and insights.

  3. Advanced Analytics: The semantic layer will facilitate the application of machine learning and predictive modelling, unlocking new opportunities for data-driven decision-making.

In conclusion, the semantic layer is a powerful tool for data management and analysis in tabular databases.

By providing a user-friendly interface, enhancing security, and enabling advanced analytics, it helps organisations make fuller use of their data.

Design Principles

The VAST database is built on three key design principles: embracing standards (e.g., SQL, Apache Arrow), simplifying data management by eliminating complex tiers, and enabling deployment flexibility across public and private clouds.

Transactional and Analytical Capabilities

VAST challenges the traditional separation between transactional databases (OLTP) and analytical databases (OLAP).

Typically, OLTP systems are optimised for fast, small transactions, while OLAP systems are designed for large-scale data analysis.
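A large part of this difference is physical data layout. The sketch below, with hypothetical data, stores the same records row-wise and column-wise in plain Python structures; it only illustrates access patterns, not real memory layout or performance. A point lookup naturally reads one row, while an aggregate over one attribute naturally reads one column.

```python
# The same three records stored two ways (hypothetical data).
ROWS = [  # row-oriented: each record kept together, as in many OLTP engines
    (1, "alice", 30.0),
    (2, "bob", 45.5),
    (3, "carol", 12.25),
]

COLUMNS = {  # column-oriented: each attribute kept together, as in many OLAP engines
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "amount": [30.0, 45.5, 12.25],
}


def lookup_row(rows, record_id):
    """OLTP-style point read: one record, all attributes, touched together."""
    for row in rows:
        if row[0] == record_id:
            return row
    return None


def lookup_columnar(cols, record_id):
    """The same point read on columns must gather one value from every column."""
    i = cols["id"].index(record_id)
    return tuple(cols[name][i] for name in ("id", "name", "amount"))


def total_amount_columnar(cols):
    """OLAP-style scan: only the single column the query needs is read."""
    return sum(cols["amount"])


def total_amount_rows(rows):
    """The same scan on rows walks every record, including unused attributes."""
    return sum(row[2] for row in rows)
```

Both layouts give identical answers; what differs is how much unrelated data each access pattern has to touch.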

VAST's aim is to create a unified system that can efficiently handle both types of workloads. This convergence is a significant shift from the norm, where separate systems handle transactional and analytical needs.

The result is that the VAST database combines transactional and analytical capabilities into a single system. It acts as a transactional system for real-time data ingestion and cataloguing while also providing analytical capabilities for quick understanding and querying of the data within the system.

Simplifying Data Engineering

The VAST database aims to simplify data engineering by eliminating constraints associated with managing semi-structured data objects in data science contexts.

Overcoming Limitations of Traditional Systems

Traditional database management systems often consist of separate systems for online transactional processing (OLTP) and online analytical processing (OLAP).

The VAST database addresses the limitations of these systems by providing a scalable solution that combines both transactional and analytical capabilities.

Disaggregated and Shared Everything (DASE) Architecture

The VAST database is built on top of the DASE architecture, which abstracts the cluster architecture and creates a data centre-scale computer.

All logic runs in stateless Linux containers, with data presented in parallel across a high-speed, low-latency data centre fabric. This architecture eliminates the need for machine coordination in the read/write path, which is a common bottleneck in traditional systems.

This architecture supports highly transactional services and allows both capacity and performance to scale.
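A toy model of the stateless, shared-everything idea: compute nodes hold no data of their own, so any node can serve any request, and a new node can join without any data being rebalanced. `SharedStore` and `StatelessNode` are hypothetical names, and the sketch deliberately ignores networking, failures, and concurrent writes; it is not how DASE is implemented.

```python
import itertools


class SharedStore:
    """Stand-in for the shared storage layer that every compute node can reach."""

    def __init__(self):
        self.data = {}


class StatelessNode:
    """A compute node that keeps no data of its own between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # a reference to shared storage, not a local copy

    def write(self, key, value):
        self.store.data[key] = value

    def read(self, key):
        return self.store.data.get(key)


store = SharedStore()
nodes = [StatelessNode(f"cnode-{i}", store) for i in range(3)]
router = itertools.cycle(nodes)  # round-robin: any node can take any request

writer = next(router)
writer.write("k1", "v1")
reader = next(router)            # a different node, which never saw the write
value = reader.read("k1")

late = StatelessNode("cnode-3", store)  # added later, with nothing to copy over
late_value = late.read("k1")
```

Because state lives only in the shared store, adding or removing compute nodes changes available processing power, not where the data is.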

Write Buffer and Data Transformation: The VAST database uses a write buffer in storage class memory to absorb incoming data and provide time for data manipulation before storing it in low-cost flash storage. During this process, data is transformed from row-oriented database records into a columnar format optimised for analytics.
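The buffer-then-transform flow can be sketched as follows. This is a hypothetical, in-memory model: `WriteBuffer`, its `flush_threshold`, and the flush-on-count policy are assumptions, with `pending` standing in for the storage-class-memory buffer and `chunks` for columnar data on flash. A real system would also serve reads from the buffer before a flush and persist both tiers.

```python
class WriteBuffer:
    """Absorbs row-form inserts, then flushes them as column chunks (hypothetical)."""

    def __init__(self, columns, flush_threshold=4):
        self.columns = columns
        self.flush_threshold = flush_threshold
        self.pending = []  # stands in for the fast write buffer
        self.chunks = []   # stands in for columnar data on cheaper flash

    def insert(self, record):
        # Fast path: append the row and acknowledge; no columnar work yet.
        self.pending.append(record)
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # Transpose the buffered rows into one list per column.
        chunk = {c: [r[c] for r in self.pending] for c in self.columns}
        self.chunks.append(chunk)
        self.pending = []


buf = WriteBuffer(["id", "temp"], flush_threshold=3)
for i, t in enumerate([20.5, 21.0, 19.5, 22.0]):
    buf.insert({"id": i, "temp": t})
```

After four inserts, the first three rows have been rewritten as one columnar chunk and the fourth is still waiting in the buffer.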

Columnar Object and Metadata Store: The VAST database introduces a new style of columnar object that is significantly smaller than standard Parquet row groups. These columnar objects are organised using a metadata store that enables efficient querying and data access.
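Small columnar objects pay off when each one carries metadata that lets a query skip it. The sketch below uses per-chunk min/max statistics (the well-known zone-map technique) as an assumed stand-in for whatever the VAST metadata store actually records; `ColumnChunk`, `build_chunks`, and `count_greater_than` are hypothetical names. It counts how many values a filter must scan at two chunk sizes.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    values: list
    min_value: float  # kept for predicates such as "<", unused in this demo
    max_value: float


def build_chunks(values, chunk_size):
    """Split one column into chunks and record min/max metadata for each."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(ColumnChunk(part, min(part), max(part)))
    return chunks


def count_greater_than(chunks, threshold):
    """Count values > threshold, skipping chunks the metadata rules out."""
    count, values_scanned = 0, 0
    for chunk in chunks:
        if chunk.max_value <= threshold:
            continue  # no value in this chunk can match, so it is never read
        values_scanned += len(chunk.values)
        count += sum(1 for v in chunk.values if v > threshold)
    return count, values_scanned


# Hypothetical, roughly time-ordered sensor readings.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 90, 95, 97, 99]
fine = count_greater_than(build_chunks(readings, chunk_size=4), 50)
coarse = count_greater_than(build_chunks(readings, chunk_size=12), 50)
```

With four-value chunks the filter scans only the last chunk; with a single twelve-value chunk the min/max metadata cannot exclude anything, so every value is scanned.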

SQL and Query Engine Integration: The VAST database supports native SQL querying and integrates with popular query engines like Apache Spark, Trino, and Dremio through push-down plugins.
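Push-down means the query engine hands filters and column lists to the storage layer instead of fetching whole tables and filtering them itself. The sketch below is hypothetical and does not use the real Spark, Trino, or Dremio connector APIs; `storage_scan` and its equality-only `predicate` argument are assumptions chosen to keep the example small.

```python
# Hypothetical table held by the storage layer.
TABLE = {
    "id": [1, 2, 3, 4],
    "country": ["UK", "DE", "UK", "FR"],
    "amount": [10.0, 20.0, 30.0, 40.0],
}


def storage_scan(table, columns, predicate=None):
    """Storage-side scan returning only the requested columns and matching rows.

    `predicate` is a (column, value) equality filter: a deliberately tiny
    stand-in for the filter expressions a real connector would push down.
    """
    n = len(next(iter(table.values())))
    keep = range(n)
    if predicate is not None:
        col, val = predicate
        keep = [i for i in keep if table[col][i] == val]
    return [{c: table[c][i] for c in columns} for i in keep]


# Without push-down: the engine pulls every column of every row, then filters.
pulled = storage_scan(TABLE, list(TABLE))
no_pushdown = [{"amount": r["amount"]} for r in pulled if r["country"] == "UK"]

# With push-down: the engine sends the filter and column list to storage.
pushed = storage_scan(TABLE, ["amount"], predicate=("country", "UK"))
```

Both paths return the same answer, but without push-down the engine receives all 4 rows and 3 columns (12 values) in order to produce 2.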

Scalability and Performance: The VAST database is designed to scale linearly in terms of performance by adding CPUs, GPUs, and SSDs to the system. It can achieve high transactional throughput and sustained streaming performance for queries.

Similarity-Based Data Reduction: The VAST database employs a novel data reduction technique called similarity, which acts as a modern form of global compression. It identifies similar blocks of data across the entire data store and compresses them together, resulting in significant data reduction.
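The general idea of compressing similar blocks together can be illustrated with standard tools. This sketch is not VAST's algorithm: it substitutes k-byte substring fingerprints with Jaccard similarity for similarity detection, and a zlib preset dictionary (`zdict`) for compressing a block against its closest reference. `features`, `reduce_blocks`, `restore`, and the 0.5 threshold are hypothetical.

```python
import random
import zlib


def features(block, k=8):
    """Crude similarity fingerprint: the set of all k-byte substrings."""
    return {block[i:i + k] for i in range(len(block) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reduce_blocks(blocks, threshold=0.5):
    """Store each block as a new reference, or compressed against a similar one."""
    references = []  # (raw block, feature set)
    stored = []      # ("ref" | "delta", reference index, compressed bytes)
    for block in blocks:
        feats = features(block)
        scores = [jaccard(feats, ref_feats) for _, ref_feats in references]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            # Use the similar reference block as a preset dictionary for DEFLATE.
            comp = zlib.compressobj(zdict=references[best][0])
            stored.append(("delta", best, comp.compress(block) + comp.flush()))
        else:
            references.append((block, feats))
            stored.append(("ref", len(references) - 1, zlib.compress(block)))
    return references, stored


def restore(references, entry):
    """Reverse reduce_blocks for one stored entry."""
    kind, ref_index, data = entry
    if kind == "ref":
        return zlib.decompress(data)
    decomp = zlib.decompressobj(zdict=references[ref_index][0])
    return decomp.decompress(data) + decomp.flush()


# Demo data: a random base block, a near-copy of it, and an unrelated block.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(4096))
variant = bytearray(base)
for pos in (100, 2000, 3000):
    variant[pos] ^= 0xFF  # flip three bytes
variant = bytes(variant)
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

references, stored = reduce_blocks([base, variant, unrelated])
```

Random data is nearly incompressible on its own, yet because the variant differs from the base in only three bytes, compressing it against the base yields a small fraction of the bytes that compressing it alone would.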

Cost Efficiency: By leveraging low-cost flash storage and efficient data reduction techniques, the VAST database offers a cost-effective solution compared to other flash-based data platforms.
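The cost argument can be stated as simple arithmetic: data reduction divides the price of raw flash by the reduction ratio. This is a general formula, not a VAST pricing figure.

```latex
C_{\text{usable}} = \frac{C_{\text{raw}}}{R},
\qquad
R = \frac{\text{logical data stored}}{\text{physical flash consumed}}
```

For example, at a 3:1 reduction ratio ($R = 3$), each usable terabyte costs one third of the raw per-terabyte flash price.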

Comprehensive Data Science Support: The VAST database provides a unified system that supports both structured data (databases and data warehouses) and unstructured data (file and object formats) commonly used in deep learning pipelines.

In summary, the VAST database introduces a transformative approach to database management by combining transactional and analytical capabilities, leveraging the DASE architecture, and employing innovative data reduction techniques.

It simplifies data engineering, enables real-time processing and analysis, and provides a scalable and cost-effective solution for modern data science and deep learning workloads.
