# VAST DataBase

The VAST DataBase is an all-flash transactional and analytical system designed for the AI era.

It simplifies data engineering by <mark style="color:yellow;">eliminating tiers of data management and combining the transactional properties of a relational database with the schema of a data warehouse</mark>.

The DataBase breaks the trade-offs between row-based OLTP databases and column-oriented analytical systems, offering the scale and affordability of a data lake and the capabilities of a data lakehouse.

The key concepts and insights are as follows:

#### <mark style="color:blue;">Semantic Layer</mark>

The VAST database serves as a critical component of the VAST Data Platform by providing a semantic, or contextual, layer.

It enables the application of semantic understanding to the data in both transactional and analytical contexts, allowing for real-time data processing and analysis.

<details>

<summary><mark style="color:green;">What is the semantic layer?</mark></summary>

The semantic layer is a critical component in tabular databases that *<mark style="color:yellow;">**acts as an intermediary between end-users and the underlying database structure.**</mark>*

It provides a simplified and user-friendly view of the data, enabling users to interact with it in a more meaningful and intuitive manner.

The semantic layer translates complex database structures into familiar concepts such as tables, columns, and relationships, *<mark style="color:yellow;">**abstracting away the underlying complexity**</mark>*.

<mark style="color:purple;">**Definition and Function: The semantic layer serves several key functions**</mark>

<mark style="color:green;">**Data Interpretation:**</mark> It acts as a translator, converting technical database terminology into business-friendly terms. This allows users to understand and analyse data without needing deep technical expertise.

<mark style="color:green;">**Relationship Definition:**</mark> The semantic layer defines relationships between tables, making it easier for users to navigate and explore related data.

<mark style="color:green;">**Security and Governance:**</mark> It controls access to data based on user roles and permissions, ensuring data privacy and security.

<mark style="color:purple;">**Interplay with Tabular Databases**</mark>

Tabular databases organise data into rows and columns, similar to a spreadsheet. While efficient for storage and retrieval, working directly with tabular databases can be challenging for end-users due to complex schemas and numerous tables.

The semantic layer enhances tabular databases by:

1. Simplifying the data view, making it more accessible to non-technical users.
2. Enabling the creation of calculated fields, aggregates, and hierarchical structures for more sophisticated analysis.
3. Providing a centralised repository for data definitions and business rules, ensuring consistency and accuracy.
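
As a rough illustration of the points above, the sketch below maps business-friendly names onto a physical table and generates SQL from them. It is a minimal example under stated assumptions, not how any particular product implements its semantic layer: the table `ord_tx`, its columns, and the `SEMANTIC_MODEL` dictionary are all hypothetical, and SQLite stands in for the underlying database.

```python
import sqlite3

# Hypothetical semantic model: business-friendly names mapped to physical
# columns and expressions. Every name here is illustrative only.
SEMANTIC_MODEL = {
    "table": "ord_tx",
    "dimensions": {"Region": "rgn_cd"},
    "measures": {"Total Revenue": "SUM(qty * unit_prc)"},  # a calculated aggregate
}

def build_query(measure: str, dimension: str) -> str:
    """Translate a business-level request into SQL against the physical table."""
    dim_col = SEMANTIC_MODEL["dimensions"][dimension]
    measure_expr = SEMANTIC_MODEL["measures"][measure]
    return (
        f'SELECT {dim_col} AS "{dimension}", {measure_expr} AS "{measure}" '
        f'FROM {SEMANTIC_MODEL["table"]} GROUP BY {dim_col} ORDER BY {dim_col}'
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ord_tx (rgn_cd TEXT, qty INTEGER, unit_prc REAL)")
conn.executemany(
    "INSERT INTO ord_tx VALUES (?, ?, ?)",
    [("EU", 2, 10.0), ("EU", 1, 5.0), ("US", 3, 4.0)],
)

# The user asks for "Total Revenue" by "Region" and never sees rgn_cd or unit_prc.
rows = conn.execute(build_query("Total Revenue", "Region")).fetchall()
print(rows)
```

Keeping the definitions in one dictionary is what gives the "centralised repository" property: changing the revenue formula in one place changes it for every query built through the layer.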

<mark style="color:purple;">**Benefits: Implementing a semantic layer brings several key benefits**</mark>

1. Improved Data Accessibility: Users can access and interact with data independently, without relying on IT or database professionals.
2. Enhanced Data Security: The semantic layer enforces access control, auditing, and monitoring to safeguard sensitive data.
3. Increased Efficiency: By centralising data definitions and rules, the semantic layer promotes consistency, reduces errors, and streamlines data management.

<mark style="color:purple;">**Challenges: Implementing a semantic layer also presents some challenges**</mark>

1. Technical Issues: Ensuring optimal performance, scalability, and seamless integration with existing systems can be complex.
2. Data Integrity: Maintaining accuracy and consistency between the semantic layer and the underlying database requires robust synchronisation and validation mechanisms.
3. Change Management: Adopting a semantic layer may require significant planning, training, and support to overcome resistance and ensure a smooth transition.

<mark style="color:purple;">**Future Trends: As technology advances, the semantic layer is expected to evolve**</mark>

1. Natural Language Processing and Machine Learning: These technologies will enhance query understanding and enable more intuitive user interactions.
2. Intelligent Insights: By analysing user behaviour, the semantic layer will proactively provide relevant recommendations and insights.
3. Advanced Analytics: The semantic layer will facilitate the application of machine learning and predictive modelling, unlocking new opportunities for data-driven decision-making.

In conclusion, the semantic layer is a powerful tool that revolutionises data management and analysis in tabular databases.

By providing a user-friendly interface, enhancing security, and enabling advanced analytics, it empowers organisations to harness the full potential of their data and drive innovation in a data-driven future.

</details>

#### <mark style="color:blue;">Design Principles</mark>

The VAST database is built on three key design principles: <mark style="color:blue;">embracing standards</mark> (e.g., SQL, Apache Arrow), <mark style="color:blue;">simplifying data management</mark> by eliminating complex tiers, and <mark style="color:blue;">enabling deployment flexibility</mark> across public and private clouds.

#### <mark style="color:blue;">Transactional and Analytical Capabilities</mark>

VAST challenges the traditional <mark style="color:yellow;">separation between transactional databases (OLTP) and analytical databases (OLAP).</mark>

Typically, OLTP systems are optimised for fast, small transactions, while OLAP systems are designed for large-scale data analysis.

VAST aims to create a <mark style="color:yellow;">unified system</mark> that can efficiently handle both types of workloads. This convergence is a significant shift from the norm, where separate systems handle transactional and analytical needs.

The result is that the VAST database <mark style="color:yellow;">combines transactional and analytical capabilities into a single system</mark>. It acts as a transactional system for real-time data ingestion and cataloguing, while also providing analytical capabilities for quickly understanding and querying the data within the same system.
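
The workflow this describes, small atomic writes followed immediately by analytical queries over the same table, can be sketched as follows. This is only an illustration of the access pattern: SQLite is used as a convenient stand-in and is not a converged OLTP/OLAP engine, and the `events` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor TEXT, reading REAL)")

# Transactional side: a small atomic write. The connection context manager
# commits on success and rolls back if an exception is raised.
with conn:
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("a", 1.0), ("a", 3.0), ("b", 10.0)],
    )

# A failed transaction leaves the table unchanged.
try:
    with conn:
        conn.execute("INSERT INTO events VALUES ('b', 20.0)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# Analytical side: an aggregate over the same, freshly written table,
# with no export or copy into a separate warehouse.
summary = conn.execute(
    "SELECT sensor, COUNT(*), AVG(reading) FROM events "
    "GROUP BY sensor ORDER BY sensor"
).fetchall()
print(summary)
```

In a split architecture the second half of this script would run against a different system, after an ETL step had copied the data across.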

#### <mark style="color:blue;">Simplifying Data Engineering</mark>

The VAST database aims to simplify data engineering by removing the constraints that come with managing semi-structured data objects in data science workflows.

#### <mark style="color:blue;">Overcoming Limitations of Traditional Systems</mark>

Traditional data architectures often rely on separate systems for online transaction processing (OLTP) and online analytical processing (OLAP).

The VAST database addresses the limitations of these systems by providing a scalable solution that combines both transactional and analytical capabilities.

### <mark style="color:purple;">Disaggregated, Shared-Everything (DASE) Architecture</mark>

The VAST database is built on top of the <mark style="color:blue;">DASE architecture</mark>, which abstracts the cluster architecture and creates a data centre-scale computer.

All logic runs in stateless Linux containers, with data presented in parallel across a high-speed, low-latency data centre fabric. This removes the need for coordination between machines in the read/write path, which is a common bottleneck in traditional systems.

As a result, the system can sustain high transaction rates and scale both capacity and performance.

<mark style="color:green;">**Write Buffer and Data Transformation:**</mark> The VAST database uses a write buffer in storage class memory to absorb incoming data and provide time for data manipulation before storing it in low-cost flash storage. During this process, data is transformed from standard database record form into a columnar data format optimised for analytics.
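
The buffer-then-transform idea can be sketched in a few lines. This is a simplified illustration, not VAST's implementation: the `WriteBuffer` class, its `flush_threshold` parameter, and the in-memory lists standing in for storage class memory and flash are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class WriteBuffer:
    """Absorbs row records, then flushes them as column-oriented chunks."""
    flush_threshold: int  # hypothetical tuning knob: rows per flush
    rows: list = field(default_factory=list)            # stand-in for the fast buffer
    flushed_chunks: list = field(default_factory=list)  # stand-in for flash storage

    def write(self, record: dict) -> None:
        # In this sketch a write is complete once it is in the buffer.
        self.rows.append(record)
        if len(self.rows) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.rows:
            return
        # Pivot rows into columns: one contiguous list per field.
        columns = {key: [row[key] for row in self.rows] for key in self.rows[0]}
        self.flushed_chunks.append(columns)
        self.rows = []

buf = WriteBuffer(flush_threshold=2)
buf.write({"id": 1, "temp": 20.5})
buf.write({"id": 2, "temp": 21.0})  # reaches the threshold, triggers a flush
buf.write({"id": 3, "temp": 19.8})  # stays in the buffer for now

print(buf.flushed_chunks)
print(buf.rows)
```

The point of the buffer is that the row-to-column pivot happens off the write path: writers see row-style inserts, while readers of flushed data see columns.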

<mark style="color:green;">**Columnar Object and Metadata Store:**</mark> The VAST database introduces a new style of columnar object that is significantly smaller than a standard Apache Parquet row group. These columnar objects are organised using a metadata store that enables efficient querying and data access.
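
One common way a metadata store speeds up queries over columnar chunks is by keeping per-chunk statistics so that irrelevant chunks are skipped without being read. The sketch below assumes min/max statistics on a `ts` column; the source does not say what the VAST metadata store actually records, so treat the structure and all names here as hypothetical.

```python
# Three small columnar chunks, each holding a slice of two columns.
chunks = [
    {"ts": [1, 2, 3], "value": [10, 11, 12]},
    {"ts": [4, 5, 6], "value": [13, 14, 15]},
    {"ts": [7, 8, 9], "value": [16, 17, 18]},
]

# Assumed metadata: per-chunk min/max of the ts column.
metadata = [
    {"chunk": i, "min_ts": min(c["ts"]), "max_ts": max(c["ts"])}
    for i, c in enumerate(chunks)
]

def query_values(ts_low: int, ts_high: int):
    """Return values with ts in [ts_low, ts_high] and the number of chunks read."""
    chunks_read = 0
    result = []
    for meta in metadata:
        if meta["max_ts"] < ts_low or meta["min_ts"] > ts_high:
            continue  # pruned using metadata alone, chunk data never touched
        chunks_read += 1
        chunk = chunks[meta["chunk"]]
        result.extend(
            v for t, v in zip(chunk["ts"], chunk["value"]) if ts_low <= t <= ts_high
        )
    return result, chunks_read

values, chunks_read = query_values(5, 6)
print(values, chunks_read)
```

Under this assumption, smaller chunks mean tighter min/max ranges, so a selective filter reads less data that it will then discard.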

<mark style="color:green;">**SQL and Query Engine Integration:**</mark> The VAST database supports native SQL querying and integrates with popular query engines like Apache Spark, Trino, and Dremio through push-down plugins.

<mark style="color:green;">**Scalability and Performance:**</mark> The VAST database is designed to scale linearly in terms of performance by adding CPUs, GPUs, and SSDs to the system. It can achieve high transactional throughput and sustained streaming performance for queries.

<mark style="color:green;">**Similarity-Based Data Reduction:**</mark> The VAST database employs a novel data reduction technique called similarity, which acts as a modern form of global compression. It identifies similar blocks of data across the entire data store and compresses them together, resulting in significant data reduction.
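
The general idea of compressing a block against a similar one can be sketched as follows. This is not VAST's algorithm, whose details are not given here: the example swaps in Jaccard similarity over byte shingles to detect similar blocks and zlib's preset-dictionary feature to compress a block relative to a reference. The function names and the 8-byte shingle size are illustrative choices.

```python
import random
import zlib

def shingles(block: bytes, size: int = 8) -> set:
    """All overlapping byte windows of the block."""
    return {block[i:i + size] for i in range(len(block) - size + 1)}

def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two blocks' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def compress_against(block: bytes, reference: bytes) -> bytes:
    """Compress a block using a similar reference block as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(block) + comp.flush()

def decompress_against(payload: bytes, reference: bytes) -> bytes:
    """Reverse compress_against; the same reference block must be supplied."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(payload) + decomp.flush()

# Deterministic pseudo-random data: incompressible on its own.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(2048))
variant = base[:1000] + b"CHANGED!" + base[1008:]  # differs in 8 bytes
unrelated = bytes(rng.getrandbits(8) for _ in range(2048))

print(similarity(base, variant), similarity(base, unrelated))

standalone = zlib.compress(variant, 9)    # no reference available
delta = compress_against(variant, base)   # compressed relative to a similar block
print(len(standalone), len(delta))
```

Unlike exact deduplication, this approach still saves space when two blocks are nearly, but not exactly, identical. A production system would need a far cheaper similarity test than pairwise Jaccard, for example a compact fingerprint per block.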

<mark style="color:green;">**Cost Efficiency:**</mark> By leveraging low-cost flash storage and efficient data reduction techniques, the VAST database offers a cost-effective solution compared to other flash-based data platforms.

<mark style="color:green;">**Comprehensive Data Science Support:**</mark> The VAST database provides a unified system that supports both structured data (databases and data warehouses) and unstructured data (file and object formats) commonly used in deep learning pipelines.

In summary, the VAST database introduces a transformative approach to database management by combining transactional and analytical capabilities, leveraging the DASE architecture, and employing innovative data reduction techniques.

It simplifies data engineering, enables real-time processing and analysis, and provides a scalable and cost-effective solution for modern data science and deep learning workloads.
