What is Vector?
Vector, the industry’s fastest analytics database, handles continuous updates without a performance penalty. Vector achieves extreme performance with full ACID compliance on commodity hardware, with little or no database tuning, and can be deployed on-premises or on AWS, Azure, and Google Cloud.
Vector Analytics Database Benefits
Outperforms alternatives by 7.9x – Enterprise Strategy Group
Reduce I/O, optimize data compression and deliver better cache performance to save time and money
REAL Real-time Insights
Continuously keep analytics datasets up to date without affecting downstream query performance
Runs on Windows and Linux, on-premises, hybrid and multi-clouds, including Google Cloud, AWS, and Azure
99.9% availability and support for thousands of active users
Compliant and Secure
Share data safely across stakeholders with encryption at rest and in transit, and dynamic data masking
Vector is available on Microsoft Windows for single-server deployment. The Enterprise Edition provides production-level support, and the Evaluation Edition delivers capabilities over 30-, 60-, and 90-day periods.
Vector scales vertically on SMP systems running on popular Linux distributions that offer reliability and strong security.
Vector can be deployed as containers and microservices on the latest compute nodes of Google Kubernetes Engine (GKE), Google Cloud Dataproc, and Google Cloud Storage (GCS). It is tightly integrated with Looker, with planned integrations for DataFusion, Pub/Sub and Kubeflow. Vector can also run as a Google Cloud Platform Virtual Machine Image.
Vector supports single node and clustered configurations. Third-party benchmarks demonstrate that Vector significantly outperforms Microsoft SQL Server, Cloudera Impala, Amazon Redshift and Snowflake databases on AWS. Vector can run as an Amazon Machine Image on AWS and supports Amazon Elastic Kubernetes Service (EKS). Vector also supports BYOL for private and hybrid cloud deployment.
You can deploy Vector as a Microsoft Azure VM Image and it supports Azure Kubernetes Service (AKS).
Vector for Hadoop scales Vector beyond a single node to support thousands of users and petabytes of data. Vector uses YARN for workload management across hundreds or thousands of nodes. HDFS stores Vector data at greater than 10x compression. Unlike SQL on Hadoop, Vector accommodates differential inserts, updates, and deletes to run multiple operational workloads.
Vector Analytics Database Features
More Efficient Query Processing
- Vectorized query execution exploits Single Instruction, Multiple Data (SIMD) support in x86 CPUs
- Query result caching avoids rerunning queries when there are no changes
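As an illustration of the result-caching idea above, here is a minimal Python sketch. It is not Vector's implementation: the `ResultCache` class, the `table_version` counter, and the `run_query` callable are all hypothetical names, and the sketch assumes that every change to the underlying data bumps a version number.

```python
class ResultCache:
    """Hypothetical cache: reuse a result while the data version is unchanged."""

    def __init__(self):
        self._entries = {}

    def get_or_run(self, query, table_version, run_query):
        # The key combines the query text with the data version, so any
        # change to the data (a new version) forces the query to run again.
        key = (query, table_version)
        if key not in self._entries:
            self._entries[key] = run_query()
        return self._entries[key]
```

Calling `get_or_run` twice with the same query and version executes `run_query` once; a new version triggers a fresh execution.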
Maximizing CPU Cache for Execution
- Uses private CPU cores and caches as execution memory – 100x faster than RAM
- Delivers significantly greater throughput without limitations of in-memory approaches
Other CPU Optimizations
Supports hardware-accelerated string-based operations, benefiting selections on strings using wildcard matching, aggregations on string-based values, and joins or sorts using string keys
- Reduces I/O to relevant columns
- Opportunity for better data compression
- Built-in storage indexes maximize efficiency
- Multiple options to maximize compression: Run Length Encoding (RLE), Patched Frame of Reference (PFOR), Delta encoding on top of PFOR, Dictionary encoding, and LZ4 for different string values
- 4-6x compression ratios common for real-world data
Maintain real-time access to data as it is compressed on disk, resulting in I/O and CPU savings and shorter execution time
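To make the encodings named above more concrete, the following Python sketch shows textbook versions of run-length, frame-of-reference, and delta encoding on small non-empty lists. It is an illustration only and says nothing about Vector's on-disk format. In particular, the frame-of-reference functions here are the plain variant, without the "patching" of outlier values that PFOR adds, and all function names are made up for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def for_encode(values):
    """Plain frame of reference: keep the minimum as a base plus small offsets."""
    base = min(values)
    return base, [v - base for v in values]


def for_decode(base, offsets):
    """Rebuild the original values by adding the base to each offset."""
    return [base + o for o in offsets]


def delta_encode(values):
    """Keep the first value, then the difference between each pair of neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

Each scheme suits a different data shape: run-length encoding pays off on long runs of repeated values, frame of reference on values clustered in a narrow range, and delta encoding on sorted or slowly changing sequences.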
Easy Integration and Migration
- Use DataFlow for fast data loading and DataConnect with over 200 connectors and templates to easily source data at scale.
- Loads structured and semi-structured JSON data, including event-based messages and streaming data without coding.
- Move a database to a cloud or remote datacenter in one step using the integrated “clonedb” function
- Automatic min-max indices enable block skipping on reads
- Eliminates need for explicit data partitioning strategy
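The block-skipping idea behind min-max indices can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Vector's storage engine: a single column is split into fixed-size blocks, each block records its minimum and maximum, and `build_blocks` and `scan_range` are hypothetical helper names.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return blocks


def scan_range(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # Skip the block if its value range cannot overlap the predicate.
        if block["max"] < low or block["min"] > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block["rows"] if low <= v <= high)
    return matches, blocks_read
```

With 100 sequential values in blocks of 10, a range predicate of 42 to 47 touches only one of the ten blocks. The benefit depends on how well the data is clustered: if every block spans the full value range, nothing can be skipped.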
Flexible adaptive parallel execution algorithms to maximize concurrency while enabling load prioritization
- Handles thousands of users, nodes, and petabytes of data
- Exploits redundancy in HDFS to provide system-wide data protection
- Role-based security
- Re-keying encryption
- Dynamic data masking
- Column-level de-identification
- Queue-based workload management to dynamically adjust queues based on resource availability and quotas
- Easy-to-use administrative console and SQL editor
Native Spark Integration
- Spark-powered direct query access
- Direct connection to Spark functionality via DataFrames
- Fast streaming for machine learning and artificial intelligence
Extensive SQL Support
- Standard ANSI SQL enabling the use of existing SQL without rewrite
- Advanced analytics, including cubing, grouping, and window functions
Mature Query Optimizer
- Mature and proven cost-based query planner
- Optimal use of all available resources, including node, memory, cache, and CPU