How MemSQL Works
MemSQL is a distributed, relational database that handles mixed transactional and real-time analytical workloads at scale. It is accessible through standard SQL drivers and syntax, and works with a broad ecosystem of drivers and applications.
MemSQL has a two-tiered architecture that provides high throughput. It is a distributed system that scales horizontally on commodity hardware, and it is compatible with other technologies in the modern data processing ecosystem (e.g. orchestration platforms, developer IDEs, and BI tools). It features an in-memory rowstore and an on-disk columnstore. It also features MemSQL Pipelines, a tool that efficiently streams data into the MemSQL rowstore and columnstore.
The sections below describe MemSQL in more detail.
MemSQL has a two-tiered, clustered architecture. Each instance of MemSQL is called a “node” and runs identical software. The only difference is the role the nodes are configured to play.
Aggregators are the interface to database clients and applications. Aggregators run SQL queries across the cluster and aggregate results.
Leaves store and process data.
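The division of labor between the two roles can be illustrated with a toy scatter-gather sketch. It is not MemSQL code: the `Leaf` and `Aggregator` classes and their methods are hypothetical names chosen for illustration, and the "query" is a simple count grouped by one column.

```python
from collections import Counter


class Leaf:
    """Hypothetical leaf node: stores a slice of the rows and computes partial results."""

    def __init__(self, rows):
        self.rows = list(rows)

    def partial_count_by(self, column):
        # Each leaf aggregates only the rows it stores.
        return Counter(row[column] for row in self.rows)


class Aggregator:
    """Hypothetical aggregator node: fans a query out to the leaves and merges the results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def count_by(self, column):
        total = Counter()
        for leaf in self.leaves:
            # Merge the partial counts returned by each leaf.
            total.update(leaf.partial_count_by(column))
        return dict(total)


leaves = [
    Leaf([{"city": "Oslo"}, {"city": "Lima"}]),
    Leaf([{"city": "Oslo"}, {"city": "Pune"}, {"city": "Oslo"}]),
]
aggregator = Aggregator(leaves)
result = aggregator.count_by("city")
```

The client only ever talks to the aggregator; the leaves never see the full data set, only their own slice.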
MemSQL is designed to enable high throughput on concurrent workloads. A distributed query optimizer evenly divides the processing workload to maximize the efficiency of CPU usage. Queries are compiled to machine code and cached to expedite subsequent executions. Rather than cache the results of the query, MemSQL caches a compiled query plan to provide the most efficient execution path. The compiled query plan does not pre-specify values for the parameters, which allows MemSQL to substitute the values upon request, enabling subsequent queries of the same structure to run quickly, even with different parameter values. Moreover, due to MemSQL’s use of MVCC and lock-free data structures, data remains highly accessible, even amidst a high volume of concurrent reads and writes.
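The idea of caching a plan by query structure rather than by result can be sketched as follows. This is a simplified illustration, not MemSQL's implementation: the `parameterize` function, the `PlanCache` class, and the regular expression used to strip literals are all assumptions, and the "compiled plan" is just a Python closure standing in for generated machine code.

```python
import re

# Assumption for this sketch: literals are single-quoted strings or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Split a query into its shape (literals replaced by '?') and its parameter values."""
    params = []

    def replace(match):
        params.append(match.group(0))
        return "?"

    shape = _LITERAL.sub(replace, sql)
    return shape, params


class PlanCache:
    """Hypothetical cache that maps a query shape to a compiled plan."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def _compile(self, shape):
        # Stand-in for expensive code generation; counted so reuse is observable.
        self.compilations += 1
        return lambda params: (shape, tuple(params))

    def execute(self, sql):
        shape, params = parameterize(sql)
        plan = self.plans.get(shape)
        if plan is None:
            plan = self._compile(shape)
            self.plans[shape] = plan
        # The cached plan is reused; only the parameter values differ per call.
        return plan(params)


cache = PlanCache()
first = cache.execute("SELECT * FROM users WHERE id = 42")
second = cache.execute("SELECT * FROM users WHERE id = 7")
```

Both calls share one cache entry because they differ only in the literal value, so the compilation cost is paid once.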
MemSQL is a highly scalable distributed system. The cluster can be scaled out at any time to provide increased storage capacity and processing power. Sharding is done automatically and the cluster re-balances data and workload distribution. Data is highly available and nodes can go down with negligible effect on performance.
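A minimal sketch of automatic sharding follows, assuming rows are routed by hashing a shard key into a fixed number of partitions that are then spread across the nodes. The partition count, the CRC32 hash, the round-robin placement, and all function names are assumptions made for illustration; the text above does not specify how MemSQL hashes or rebalances.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical fixed partition count


def partition_for(shard_key):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(shard_key).encode("utf-8")) % NUM_PARTITIONS


def assign_partitions(nodes):
    """Spread the partitions round-robin across the given nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(shard_key, assignment):
    """Look up which node owns the partition that a shard key hashes to."""
    return assignment[partition_for(shard_key)]


# Scaling out: the key-to-partition mapping is unchanged, only placement moves.
before = assign_partitions(["leaf-1", "leaf-2"])
after = assign_partitions(["leaf-1", "leaf-2", "leaf-3", "leaf-4"])
```

Because a key always hashes to the same partition, adding nodes only changes where partitions live, not which partition a row belongs to. A production rebalancer would also try to minimize how many partitions move, which this round-robin placement does not attempt.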
In addition to being fast, consistent, and scalable, MemSQL is also durable. Transactions are committed to disk as logs, and the logs are periodically compressed into snapshots of the entire database. If any node goes down, it can recover on restart from these logs and snapshots.
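The interplay of logs and snapshots can be sketched with a toy key-value store. This is an illustration under stated assumptions, not MemSQL's storage format: the `DurableStore` class is hypothetical, and in-memory Python objects stand in for the on-disk log and snapshot files.

```python
import json


class DurableStore:
    """Toy key-value store that recovers from a snapshot plus a log of later writes."""

    def __init__(self):
        self.data = {}
        self.log = []         # stand-in for the on-disk transaction log
        self.snapshot = "{}"  # stand-in for the on-disk snapshot

    def commit(self, key, value):
        # Record the write in the log before applying it in memory.
        self.log.append(json.dumps({"key": key, "value": value}))
        self.data[key] = value

    def take_snapshot(self):
        # Capture the full state; earlier log entries are now redundant.
        self.snapshot = json.dumps(self.data)
        self.log.clear()

    def recover(self):
        """Rebuild in-memory state from the snapshot, then replay the log."""
        state = json.loads(self.snapshot)
        for entry in self.log:
            record = json.loads(entry)
            state[record["key"]] = record["value"]
        self.data = state
        return state


store = DurableStore()
store.commit("a", 1)
store.commit("b", 2)
store.take_snapshot()
store.commit("a", 3)
store.data = {}  # simulate losing in-memory state in a crash
recovered = store.recover()
```

The snapshot bounds how much log has to be replayed, which is why taking snapshots periodically keeps restart time manageable.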
MemSQL is an ODBC-compatible database. It is wire protocol compatible with MySQL so that applications that use a MySQL driver can connect to and use MemSQL transparently. MemSQL supports a subset of the MySQL syntax, plus extensions to support advanced features not in MySQL such as Distributed SQL, Geospatial, JSON, and Window Functions.
In-Memory and On-Disk Storage
MemSQL supports storing and processing data using a completely in-memory rowstore or a disk-backed columnstore. The in-memory rowstore delivers the best performance for transactional workloads. The columnstore provides cost-effective storage of large amounts of historical data for real-time analytics. Combining the rowstore and columnstore engines allows real-time and historical data to be merged in a single query.
Tight Spark Integration for Real-Time Data Streaming
MemSQL has tight Apache Spark integration, giving MemSQL users a simple way to create and manage real-time data pipelines. Users can leverage the MemSQL Spark Connector to stream real-time data into MemSQL.