> For the complete documentation index, see [llms.txt](https://training.continuumlabs.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://training.continuumlabs.ai/infrastructure/vast-data-platform/vast-data-engine.md).

# Vast Data Engine

The **VAST Data Engine** is a crucial component of the VAST Data Platform, serving as the logic layer that 'breathes life' into data and enables the platform to function as a distributed AI computer.

It is designed to simplify data processing and democratise AI infrastructure across the enterprise.

The Data Engine is built upon the company's core architectural principles of adhering to standards, simplifying complexity, and giving customers control over their data and infrastructure.

### Key Points of Difference

**Continuous and Recursive Computing:** The Data Engine is designed for real-time, event-driven processing. As data flows into the system, it triggers functions and correlations, creating a continuous loop of learning and discovery.

**AI-Focused Data Format:** The Data Engine introduces a new data format specifically designed for deep learning, combining structured and unstructured data with versioning support to create lineage for model preparation and production.

**Global Orchestration:** The Data Engine can execute and orchestrate workloads across a global collection of resources, working in tandem with the VAST Data Space for distributed computing.

### Integration with Other VAST Components

#### VAST Data Store

The Data Store is the foundation for unstructured data, providing a universal storage system that eliminates trade-offs between performance and capacity. The Data Engine leverages the Data Store to efficiently process and store unstructured data.

#### VAST Database

The Database is a transactional and analytical database, serving as the semantic layer for the Data Store. The Data Engine uses the Database to catalogue and query structured data derived from unstructured sources.

#### VAST Data Space

The Data Space enables the creation of a global namespace, allowing data to be accessed and processed across edge, core, and cloud environments. The Data Engine works with the Data Space to enable distributed AI computing.

### Key Features and Capabilities

**VAST Streams:** The Data Engine introduces a streaming interface that combines a streaming engine with high-performance tabular data stores, enabling real-time data ingestion, processing, and querying.

**Notification System:** The Data Engine includes a notification system that allows for real-time monitoring and triggering of events based on data updates, function execution, and other system activities.

**Serverless Computing:** The Data Engine provides a serverless computing environment with a Python-based SDK, enabling the auto-scaling of execution nodes and the deployment of compute resources across the VAST Data Space.

**VAST Dataset:** The Data Engine introduces the concept of a VAST Dataset, which combines code and data into a materialized view for training purposes, eliminating the need for specific file creation.

**Built-in Functions:** The Data Engine will offer a series of built-in functions for common tasks such as data indexing, metadata scraping, PII detection, and ransomware detection.

By integrating the VAST Data Engine with the existing components of the VAST Data Platform, organisations can create powerful, real-time AI pipelines that leverage both structured and unstructured data.

The Data Engine's event-driven architecture, global orchestration capabilities, and seamless integration with the Data Store, Database, and Data Space enable the creation of a distributed AI computer that can process data at exabyte scale.
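
The event-driven pattern described above, where arriving data triggers a function whose output can trigger further functions, can be illustrated with a small in-process sketch. This is not the VAST SDK: `Event`, `EventBus`, the event names `object_created` and `metadata_extracted`, and both handlers are hypothetical names chosen only for illustration, and a plain Python list stands in for a catalogue table.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical notification describing a change to some data."""
    kind: str                      # e.g. "object_created"
    path: str                      # where the data lives
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], List[Event]]


class EventBus:
    """Toy event bus: handlers subscribe to an event kind and may emit follow-ups."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, kind: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event kind."""
        def register(fn: Handler) -> Handler:
            self._handlers.setdefault(kind, []).append(fn)
            return fn
        return register

    def publish(self, event: Event) -> List[str]:
        """Deliver an event plus any follow-up events; return the kinds processed."""
        processed: List[str] = []
        queue = deque([event])
        while queue:
            current = queue.popleft()
            processed.append(current.kind)
            for handler in self._handlers.get(current.kind, []):
                # Events returned by a handler are queued, so one arrival
                # can cascade through several processing steps.
                queue.extend(handler(current) or [])
        return processed


bus = EventBus()
catalog: List[dict] = []           # stand-in for a structured catalogue table


@bus.on("object_created")
def extract_metadata(event: Event) -> List[Event]:
    """Derive a structured row from an unstructured object."""
    row = {"path": event.path, "size": len(event.payload.get("body", ""))}
    return [Event("metadata_extracted", event.path, row)]


@bus.on("metadata_extracted")
def catalogue_row(event: Event) -> List[Event]:
    """Store the derived row; emits no further events."""
    catalog.append(event.payload)
    return []


if __name__ == "__main__":
    kinds = bus.publish(Event("object_created", "/bucket/a.txt", {"body": "hello"}))
    print(kinds)
    print(catalog)
```

Returning follow-up events from handlers, rather than having handlers call each other directly, keeps each step independent. That is the property an event-driven, serverless design relies on when it scales functions separately.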
The VAST Data Engine represents a significant step forward in the company's vision of creating a "thinking machine" that can continuously learn and discover insights from vast amounts of data. By simplifying the data processing experience and providing a unified platform for AI workloads, VAST Data aims to democratise AI infrastructure and enable organisations to unlock the full potential of their data assets.