Cloud-Native Database Selection: SQL vs NoSQL vs NewSQL

Database selection has always been consequential, but the stakes have never been higher. In a cloud-native architecture where applications are decomposed into services, each with its own data store, the number of database decisions an enterprise makes has multiplied. The polyglot persistence model — selecting the best database for each workload — is sound in principle but creates an operational landscape that demands strategic governance.

The database market in early 2021 is remarkably diverse. Traditional relational databases like PostgreSQL and MySQL continue to mature. Cloud-native managed services like Amazon Aurora, Azure Cosmos DB, and Google Cloud Spanner offer new operational models. Document databases like MongoDB have added transaction support. Time-series databases like TimescaleDB and InfluxDB serve specialised workloads. And the NewSQL category — databases that promise relational semantics with horizontal scalability — has matured from academic curiosity to enterprise consideration, with CockroachDB and YugabyteDB leading the field.

For the CTO, this abundance creates a selection challenge that is both technical and strategic. The wrong choice constrains application design, creates operational burden, and may require costly migration. The right choice enables the application to scale with the business while remaining operationally manageable.

Understanding the Trade-offs

The CAP theorem, articulated by Eric Brewer over two decades ago, remains the foundational framework for understanding distributed database trade-offs. In a distributed system, you can have at most two of three properties: consistency, availability, and partition tolerance. Since network partitions are inevitable in distributed systems, the practical choice is between consistency and availability during partition events.
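The consistency-versus-availability choice during a partition can be made concrete with a toy model. The sketch below is illustrative only (two in-process dictionaries standing in for replicas, with hypothetical "CP" and "AP" modes); real systems are vastly more nuanced, but the behavioural difference during a partition is the same.

```python
# Toy model of a two-replica store during a network partition: a CP store
# rejects writes it cannot replicate, while an AP store accepts them locally
# and lets the replicas diverge. Names and modes here are illustrative.
class Store:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.a, self.b = {}, {}     # two replicas
        self.partitioned = False    # can the replicas talk to each other?

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                return False        # refuse the write: consistency over availability
            self.a[key] = value     # accept locally: availability over consistency
            return True
        self.a[key] = self.b[key] = value  # healthy network: replicate normally
        return True

cp, ap = Store("CP"), Store("AP")
cp.partitioned = ap.partitioned = True

cp_ok = cp.write("x", 1)   # rejected: the CP store is unavailable for writes
ap_ok = ap.write("x", 1)   # accepted, but the AP replicas now disagree
```

When the partition heals, the AP store must reconcile the divergent replicas — which is exactly the application-level burden discussed in the NoSQL sections below.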

Traditional SQL databases prioritise consistency. When you write a row to PostgreSQL and immediately read it back, you get the value you wrote. Transactions provide ACID guarantees that simplify application logic. The relational model, with its schema enforcement and join capabilities, provides a powerful abstraction for complex data relationships. These properties make SQL databases the default choice for workloads where data correctness is paramount — financial transactions, inventory management, and any domain where inconsistency has business consequences.
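The value of ACID transactions is easiest to see in code. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL purely to stay self-contained; the atomic commit-or-rollback behaviour it demonstrates is the property the text describes, and the account schema is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL to keep the example runnable;
# the atomicity being demonstrated is the same ACID property.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: either both rows change or neither does."""
    try:
        with conn:  # transaction scope: commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

ok1 = transfer(conn, "alice", "bob", 30)   # succeeds: both rows updated
ok2 = transfer(conn, "alice", "bob", 500)  # fails: the debit is rolled back
```

The second transfer leaves the balances exactly as the first left them — the application never sees a half-applied state, which is precisely what simplifies application logic in consistency-critical domains.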

Traditional SQL databases scale vertically: adding more CPU, memory, and storage to a single node eventually hits physical and economic limits. Read replicas provide horizontal read scaling but do not address write scalability. Sharding — distributing data across multiple database instances based on a partition key — provides write scalability but sacrifices cross-shard joins and transactions, effectively giving up many of the properties that made the relational model attractive.
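The mechanics of sharding can be sketched in a few lines. Shard names and the hash scheme below are hypothetical; the point is that the partition key deterministically routes each row to one instance, and anything spanning two shards becomes the application's problem.

```python
import hashlib

# Hypothetical shard topology: four database instances for a users table.
SHARDS = ["users-shard-0", "users-shard-1", "users-shard-2", "users-shard-3"]

def shard_for(key: str) -> str:
    """Route a row to a shard by hashing its partition key.

    A stable hash (md5 here) is used rather than Python's hash(), which is
    randomised per process and would route the same key differently on restart.
    """
    digest = hashlib.md5(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]

# Every query for a given user lands on the same shard...
home = shard_for("user-1042")

# ...but a join between users on different shards requires application-level
# scatter-gather, since no single instance holds both rows — the trade-off
# the text describes.
```

Note also what this scheme gives up: resharding (changing the number of shards) moves most keys, which is why production systems typically use consistent hashing or directory-based routing instead of a bare modulo.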

NoSQL databases made the opposite trade-off. By relaxing consistency guarantees and abandoning the relational model, they achieved horizontal scalability. Document databases like MongoDB store data as flexible JSON-like documents, eliminating the impedance mismatch between application objects and database rows. Key-value stores like DynamoDB and Redis provide extreme performance for simple access patterns. Wide-column stores like Cassandra offer write-optimised scalability across geographic regions. Graph databases like Neo4j provide efficient traversal of relationship-rich data.

The limitation of NoSQL databases is the application complexity they create. Without ACID transactions, applications must handle consistency themselves — implementing saga patterns, compensating transactions, and eventual consistency handling. Without joins, data that would be normalised in a relational model must be denormalised, creating update anomalies that the application must manage. For simple access patterns, this trade-off is acceptable. For complex business domains with intricate data relationships and strong consistency requirements, the application-level complexity can be substantial.
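The saga pattern mentioned above can be sketched minimally: execute a sequence of steps, and if one fails, run compensating actions for the completed steps in reverse order. The order-flow step names below are hypothetical; this is a shape sketch, not a production saga orchestrator (which must also persist progress and survive crashes).

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, run the
    compensations for the completed steps in reverse. This is the saga
    pattern's substitute for a multi-service ACID transaction."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for comp in reversed(completed):
                comp()  # undo earlier steps with compensating transactions
            return False
    return True

# Hypothetical order flow: reserve inventory, charge payment, schedule shipping.
log = []

def fail_shipping():
    raise RuntimeError("shipping unavailable")

steps = [
    (lambda: log.append("reserve"), lambda: log.append("unreserve")),
    (lambda: log.append("charge"),  lambda: log.append("refund")),
    (fail_shipping,                 lambda: log.append("cancel-shipping")),
]
ok = run_saga(steps)
# The failed shipping step triggers a refund and then an unreserve —
# business-level undo logic the database would otherwise provide for free.
```

Every compensating action here is code the team must write, test, and keep correct as the domain evolves — this is the application-level complexity the paragraph above refers to.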

NewSQL databases attempt to transcend this trade-off. CockroachDB, YugabyteDB, Google Cloud Spanner, and TiDB provide SQL interfaces, ACID transactions, and strong consistency while offering horizontal scalability through distributed consensus protocols. The engineering achievement is genuine — these databases demonstrate that it is possible to have relational semantics with distributed scalability. The question for enterprise adoption is whether the operational maturity, ecosystem support, and performance characteristics meet production requirements.

A Decision Framework for the Enterprise

The database selection decision should be driven by workload characteristics, not technology preferences. A structured evaluation framework considers four dimensions.

The data model dimension asks what the natural shape of the data is and how it will be queried. Highly relational data with complex joins and referential integrity requirements points to SQL databases. Document-oriented data with hierarchical structure and flexible schemas points to document databases. Data with intensive relationship traversal points to graph databases. Simple key-value access patterns point to key-value stores. Time-ordered data with append-heavy write patterns points to time-series databases.

The consistency dimension asks what happens when the data is wrong. For financial transactions, inventory counts, and any domain where inconsistency has direct business impact, strong consistency is non-negotiable. For content feeds, recommendation engines, and analytics, eventual consistency is often acceptable and enables better availability and performance. The answer is not always obvious — seemingly non-critical data can have surprising consistency requirements when examined closely.

The scalability dimension asks how the data volume and access patterns will grow. If the workload will remain within the capacity of a single high-specification node for the foreseeable future, a traditional SQL database on managed infrastructure (Amazon RDS, Azure Database, or Cloud SQL) provides excellent operational simplicity. If the workload requires horizontal scalability, the choice narrows to databases designed for distribution — which includes most NoSQL databases, the NewSQL category, and cloud-native offerings like Aurora and Spanner.

The operational dimension asks what the team can effectively operate. Every database technology carries an operational burden — backup and recovery, performance tuning, upgrade management, monitoring, and incident response. Managed database services reduce this burden significantly but constrain configuration flexibility. Self-managed databases provide full control but require dedicated expertise. The honest assessment of operational capability is as important as the technical evaluation.
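The four dimensions above can be condensed into a rough decision helper. The categories and rules below are an illustrative simplification of the framework, not a prescriptive tool — a real selection weighs far more detail than three booleans and a shape label.

```python
def suggest_category(shape, needs_strong_consistency, needs_horizontal_writes):
    """Map workload traits to a database category per the framework above.

    `shape` describes the natural shape of the data; the two flags capture
    the consistency and scalability dimensions. Illustrative only.
    """
    if shape == "relational":
        if needs_horizontal_writes:
            return "NewSQL"           # relational semantics + distributed scale
        return "traditional SQL"      # single-node capacity is sufficient
    if shape == "document":
        return "document store"
    if shape == "key-value":
        return "key-value store"
    if shape == "graph":
        return "graph database"
    if shape == "time-series":
        return "time-series database"
    # Unclear shape: default toward correctness when consistency matters.
    if needs_strong_consistency:
        return "traditional SQL"
    return "document store"
```

The operational dimension deliberately does not appear in the function: it is a property of the team, not the workload, and it filters the output (can we actually run this?) rather than shaping it.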

Cloud Provider Considerations

The cloud providers have fundamentally changed the database landscape by offering managed services that eliminate significant operational overhead. This convenience comes with strategic implications that CTOs must consider.

Amazon Web Services offers the broadest database portfolio — RDS for traditional SQL, DynamoDB for key-value and document workloads, ElastiCache for in-memory caching, Neptune for graph workloads, Timestream for time-series data, and Aurora as a cloud-native relational database with MySQL and PostgreSQL compatibility. The breadth is impressive, but each service has distinct operational characteristics, pricing models, and integration patterns.

Azure provides comparable managed database services with particularly strong integration for organisations already invested in the Microsoft ecosystem. Azure Cosmos DB is noteworthy for its multi-model approach, supporting document, key-value, column-family, and graph models through a single service with tunable consistency levels.

Google Cloud Platform’s Cloud Spanner represents the most ambitious offering — a globally distributed, strongly consistent relational database that provides horizontal scalability without sacrificing ACID semantics. Spanner’s pricing model, which charges based on node hours and storage rather than provisioned throughput, can be expensive for smaller workloads but becomes competitive at significant scale.

The strategic consideration is lock-in. Cloud-native database services offer superior operational convenience but create dependencies that make cloud migration expensive. PostgreSQL on RDS can be migrated to another provider or to self-managed infrastructure with moderate effort; migrating away from DynamoDB or Cosmos DB requires application-level changes. The CTO must balance operational convenience against strategic flexibility, and the right answer depends on the organisation’s multi-cloud strategy and the criticality of the workload.

Enterprise Governance for Polyglot Persistence

In a microservices architecture, the freedom for each service to choose its own database must be balanced with organisational governance. Without governance, an enterprise can find itself operating dozens of different database technologies, each requiring specialised expertise, tooling, and operational procedures.

The recommended approach is to establish a curated database portfolio — a defined set of database technologies that the organisation supports, with clear guidance on which workload patterns map to which database. This portfolio might include a relational database (PostgreSQL), a document database (MongoDB), a key-value/cache layer (Redis), and a search engine (Elasticsearch), with NewSQL databases like CockroachDB available for workloads requiring both relational semantics and horizontal scalability.
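A curated portfolio is ultimately a lookup table with an escape hatch. The sketch below encodes the example portfolio from the text (the pattern names are hypothetical labels); the useful part is the failure mode — an unlisted choice is routed to the evaluation process rather than silently adopted.

```python
# Hypothetical curated portfolio: workload pattern -> supported technology,
# mirroring the example portfolio described in the text.
PORTFOLIO = {
    "relational":       "PostgreSQL",
    "document":         "MongoDB",
    "cache/key-value":  "Redis",
    "full-text search": "Elasticsearch",
    "distributed SQL":  "CockroachDB",
}

def approved_database(pattern: str) -> str:
    """Return the supported technology for a workload pattern.

    Raises for anything outside the portfolio, so an unlisted choice goes
    through the structured evaluation process instead of being adopted ad hoc.
    """
    try:
        return PORTFOLIO[pattern]
    except KeyError:
        raise ValueError(
            f"'{pattern}' is not in the curated portfolio; "
            "submit it for structured evaluation") from None
```

In practice this table would live in platform documentation or service scaffolding templates rather than code, but the governance principle is the same: a small, supported set with a defined path for additions.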

New database technologies can be added to the portfolio through a structured evaluation process that assesses technical fitness, operational readiness, licensing implications, and team capability. This is not about preventing innovation — it is about ensuring that the organisation can effectively operate and support the databases it deploys.

Data lifecycle management, backup strategy, disaster recovery, and compliance requirements must be addressed consistently across the portfolio. The platform team should provide standardised tooling for these cross-cutting concerns, reducing the burden on individual service teams while ensuring organisational standards are met.

The database selection decision is ultimately a business decision masquerading as a technical one. The CTO who approaches it with a clear framework, an honest assessment of organisational capabilities, and a long-term perspective on operational sustainability will make choices that serve the enterprise well through years of growth and change.