Data storage is the backbone of all applications. Be it user data, configuration settings, or temporary cache information, data storage is the powerhouse that keeps everything running smoothly and significantly elevates the user experience.
Today, the sheer number of data storage options is a big step forward from the days when we simply stored data in files. However, such a wealth of choices can be both a blessing and a curse. So, how do you decide which type of data storage to use, and when?
Redis
Redis is your go-to key-value store when you need lightning-fast speed and your data can snugly fit into memory. It’s worth noting, though, that it doesn’t offer complex querying: you can only look data up by key.
While it does cheerfully support clustering and sharding, these features aren’t as transparent as we’d like. Replication is available too, but it runs on separate replica servers, and automatic failover needs extra machinery (Redis Sentinel, or the replicas built into Redis Cluster). Let’s stay optimistic and hope your Redis server keeps chugging along smoothly, since recovery isn’t its strong suit: the data lives in memory, so anything written after the last on-disk snapshot or append-only-file sync can be lost in a crash.
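To make "query only by key" concrete, here’s a minimal sketch of the access pattern: a plain Python dictionary with optional expiry, roughly how you’d use Redis as a cache. `ToyKeyValueStore` is a hypothetical stand-in, not Redis itself. Against a real server, redis-py’s `set(name, value, ex=...)` and `get(name)` play the same roles.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ToyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store (not Redis).

    Values can only be fetched by their exact key; there is no way to ask
    "which values match X?" without already knowing the keys.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # key -> (value, absolute expiry time, or None for "never expires")
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}
        self._clock = clock

    def set(self, key: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Lazily drop expired entries when they are accessed.
            del self._data[key]
            return None
        return value
```

The clock is injectable so expiry can be tested without sleeping. A cached session might be stored with `store.set("session:42", "alice", ttl_seconds=30)`, and it disappears on its own once the TTL passes.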
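Part of why sharding isn’t transparent is that in Redis Cluster every key maps to one of 16,384 hash slots via CRC16(key) mod 16384. Multi-key commands only work when all of their keys land in the same slot. You force that with a "hash tag": a `{...}` section of the key that gets hashed on its own. The sketch below computes the slot with Python’s `binascii.crc_hqx`, which implements the same CRC-16 (XMODEM) variant. The `key_slot` name and the example keys are made up for illustration.

```python
import binascii

NUM_SLOTS = 16384  # Redis Cluster always has 16,384 hash slots


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag; "{}" is ignored.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is CRC-16/XMODEM.
    return binascii.crc_hqx(data, 0) % NUM_SLOTS
```

With hash tags, `{user1000}.following` and `{user1000}.followers` share a slot, so a command touching both can run. Without the tags, the two keys would likely live on different shards.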
So, when should you consider using it? Simply put, when you’re in need of a straightforward key-value service.
PostgreSQL
PostgreSQL is an incredibly powerful database. It’s got full SQL support, but here’s the cool part: it also supports JSON columns. This makes it a dream for storing both structured and schema-less data. Like any other SQL database, it offers complex querying. And it’s not limited to your standard data types, oh no: it even supports arrays and geographical data (via the popular PostGIS extension)!
But let’s be fair, it’s not all sunshine and rainbows. Its main limitation? The amount of data that fits on a single server.
So, when might you want to give it a whirl? A good rule of thumb: when your largest table has fewer than 100 million (1E8) rows.
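To get a feel for what 1E8 rows means for a single server, here’s some back-of-the-envelope arithmetic. The average row sizes are assumptions, and the estimate ignores indexes and per-row overhead, so real tables will be larger.

```python
def raw_table_size_gb(rows: int, avg_row_bytes: int) -> float:
    """Raw payload size in GB (10^9 bytes).

    Simplifying assumption: indexes, per-row overhead, and free space are
    ignored, so the real on-disk size will be larger.
    """
    return rows * avg_row_bytes / 1e9


# Hypothetical average row sizes for a 1E8-row table.
for row_bytes in (200, 1_000, 5_000):
    size = raw_table_size_gb(100_000_000, row_bytes)
    print(f"1E8 rows x {row_bytes:>5} bytes/row = {size:,.0f} GB")
```

This prints 20, 100, and 500 GB. Even with modest rows, that amount of raw data before indexes is one way to see why the rule of thumb stops around there.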
MongoDB
MongoDB is a super cool schema-less database. It runs as a cluster (a replica set) with one master handling all the writes and a bunch of slaves for read queries; MongoDB calls them the primary and the secondaries. The best part? Replication from the master to the slaves is a breeze and completely transparent. The slaves are part of the cluster and can serve queries like a champ!
Now, while Mongo can be sharded, the sharding isn’t fully transparent: you have to pick a shard key. But hey, on the bright side, it supports operations on nested JSON fields, transformations, and aggregations (think GROUP BY and JOIN in SQL).
Now, I know what you’re thinking: “What about recovery from crashes?” Well, Mongo has got your back! If the master node crashes, no worries: the slaves hold an election and one of them steps up to be the new master. Until then, the cluster is read-only. And if a slave node crashes? Everything’s still peachy: as long as a majority of the nodes are up, the cluster stays fully available.
One of the best parts about Mongo is that, as long as you read from the master (the default), your data is always consistent. So if you send an insert, then an update, then a query, you’ll always get the updated data. How cool is that?
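Here’s what such an aggregation looks like. It is a pipeline of stages, each one a JSON-like document, that you’d pass to pymongo’s `collection.aggregate(pipeline)`. The `orders` fields are hypothetical. Since we can’t assume a running server, a plain-Python function computes the same result so the example runs on its own.

```python
from collections import defaultdict
from typing import Dict, List

# Pipeline you might pass to pymongo's collection.aggregate(pipeline).
# Field names ("status", "customer", "amount") are hypothetical.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def run_pipeline_locally(orders: List[dict]) -> List[Dict[str, object]]:
    """Plain-Python equivalent of the pipeline above (no MongoDB involved)."""
    totals: Dict[str, int] = defaultdict(int)
    for order in orders:
        if order["status"] == "shipped":                   # $match
            totals[order["customer"]] += order["amount"]   # $group + $sum
    results = [{"_id": customer, "total": total} for customer, total in totals.items()]
    # $sort by total, descending
    return sorted(results, key=lambda doc: doc["total"], reverse=True)
```

Notice that even this simple GROUP BY takes three nested documents, which gives you a taste of Mongo’s verbosity.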
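The failover rule can be sketched as follows, assuming a plain replica set in which every member votes. A new master can only be elected, and can only stay master, while a majority of the voting members can reach each other. The function names are hypothetical, and the model ignores the brief window during the election itself.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def cluster_mode(voting_members: int, healthy_members: int) -> str:
    """Steady-state availability of a replica set after an election settles."""
    if healthy_members >= majority(voting_members):
        # A master exists (the old one, or a newly elected one), so writes work.
        return "read-write"
    # No master can be elected; surviving slaves can still answer reads
    # if the client's read preference allows reading from them.
    return "read-only"
```

This is also why replica sets usually have an odd number of members. With four members, losing two leaves no majority, so the cluster drops to read-only, just as it would with three members after losing two.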
However, just a heads up: the query language, while expressive and powerful, can sometimes feel a bit verbose.
So, when should you consider using Mongo? It’s perfect when you’ve got medium-sized data that doesn’t fit on one server. A good rule of thumb: the largest table should have fewer than a billion (1E9) rows. It’s not often used for really large data (I mean, who wants to have so many servers and only one master?), but it’s perfect for that sweet middle ground.
Elasticsearch / OpenSearch
Let’s talk a bit about Elasticsearch and its fork, OpenSearch. These two started off as simple text search engines for rooting through logs, but don’t be fooled: they’ve grown a lot since then. Now they can handle not just text but numbers and dates as well. They run as a cluster, so they can store and sift through mounds of data, finding the most relevant results for your free-text search. Plus, they come equipped with a super powerful query and visualization engine (think Kibana for Elasticsearch or OpenSearch Dashboards for OpenSearch).
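"Most relevant results" comes from two ideas: an inverted index (a map from each term to the documents that contain it) and a relevance score. Both engines score with BM25 by default. The sketch below is a toy, single-process version with a textbook BM25 formula and the usual k1 = 1.2 and b = 0.75. The class and the naive tokenizer are hypothetical and much simpler than the real analyzers and scoring.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive analyzer: lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyInvertedIndex:
    """Hypothetical toy search index: inverted index plus textbook BM25 scoring."""

    def __init__(self, k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        # term -> {doc_id: how many times the term appears in that doc}
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)
        self.doc_len: Dict[str, int] = {}

    def add(self, doc_id: str, text: str) -> None:
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str, top_k: int = 10) -> List[Tuple[str, float]]:
        if not self.doc_len:
            return []
        n_docs = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n_docs
        scores: Dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            matching = self.postings.get(term, {})
            if not matching:
                continue
            # Rare terms get a higher inverse document frequency (IDF) weight.
            idf = math.log(1 + (n_docs - len(matching) + 0.5) / (len(matching) + 0.5))
            for doc_id, tf in matching.items():
                # Repeated terms help with diminishing returns (k1); long docs are penalized (b).
                length_norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avg_len)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + length_norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

Indexing a few log lines and searching for `timeout` ranks a short line that mentions it twice above a longer line that mentions it once. Lines without the term never appear in the results.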
But remember, while they’re powerful, they might not be the safest place for your valuable data. If you’ve got data you can’t afford to lose, make sure you’re storing it somewhere reliable too.
So, when should you consider using Elasticsearch / OpenSearch? They shine when your data is text-based and you need text-based queries. And if you don’t want to build your own reporting platform, even better: users can run their own queries via Kibana / OpenSearch Dashboards. How cool is that?
Cassandra
Cassandra is a star player when it comes to real clustering, supporting thousands of nodes in a cluster while being highly available.
No worries about node crashes here! Each piece of data is replicated to several nodes (set by the replication factor, often 3), so losing a node doesn’t lose data. The beauty of Cassandra is that all nodes are equal: there’s no master. The tabular data structure can even house JSON in one of the columns. Remember, though, that you need to define a primary key (a partition key plus optional clustering columns), and your WHERE clause should filter only on fields from that key; anything else needs a secondary index or ALLOW FILTERING. Plus, within each partition Cassandra keeps rows sorted by the clustering columns, so you always get sorted data back.
Cassandra writes go to an append-only commit log and an in-memory table that is flushed to disk later, which makes writes very fast and cheap. It even nudges you to design your tables around your queries, with each table serving one particular query. For instance, think of Twitter: you can keep a feed table partitioned by user, and every time a user tweets, the tweet is written to the feed of each of their followers. How neat is that?
Now, there is a small caveat: Cassandra is eventually consistent, meaning a query might return data that hasn’t been updated yet. Consistency is tunable per query, though, so you can trade some speed for stronger guarantees.
So when should you use Cassandra? If you’re dealing with a huge data load (Cassandra scales to thousands of servers) and you want fast, sorted access to your most recent data, Cassandra is your go-to. Think event management, social group data, and the like. It can even handle tables with more than a billion (1E9) records!
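In Cassandra terms, a primary key has two parts. The partition key decides which nodes hold a row, and the clustering columns sort rows inside that partition. Here’s a minimal sketch of that model in plain Python, under simplifying assumptions: one process instead of a cluster, a single clustering column, and no replication. Think of it as a table declared in CQL with `PRIMARY KEY ((sensor_id), ts)`. The class and method names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class ToyWideTable:
    """Hypothetical model of a table with PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self) -> None:
        # partition key -> (sorted clustering keys, rows in the same order)
        self._partitions: Dict[Any, Tuple[List[Any], List[dict]]] = defaultdict(lambda: ([], []))

    def insert(self, partition_key: Any, clustering_key: Any, row: dict) -> None:
        keys, rows = self._partitions[partition_key]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            rows[i] = row  # same primary key: the insert overwrites the row (upsert)
        else:
            keys.insert(i, clustering_key)
            rows.insert(i, row)

    def select(self, partition_key: Any, from_clustering_key: Optional[Any] = None) -> List[dict]:
        # The partition key is mandatory: it tells us which partition (and node) holds the data.
        partition = self._partitions.get(partition_key)
        if partition is None:
            return []
        keys, rows = partition
        start = 0 if from_clustering_key is None else bisect.bisect_left(keys, from_clustering_key)
        return rows[start:]  # already sorted by clustering key
```

A query like `select("sensor-1", from_clustering_key=2)` corresponds to `WHERE sensor_id = ? AND ts >= ?`. There is deliberately no way to ask for all rows with a given temperature, because that field isn’t part of the key.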
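The Twitter-style design is usually called "fan-out on write": you pay extra writes up front, one per follower, so that reading a feed becomes a single, already-sorted partition read. Here’s a minimal in-memory sketch. In Cassandra, each feed would be one partition of a feed table declared `WITH CLUSTERING ORDER BY (ts DESC)`. The class, method, and user names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

FeedItem = Tuple[int, str, str]  # (timestamp, author, text)


class ToyFeedStore:
    """Hypothetical fan-out-on-write feed: one feed partition per user."""

    def __init__(self) -> None:
        self.followers: Dict[str, Set[str]] = defaultdict(set)   # author -> follower ids
        self.feeds: Dict[str, List[FeedItem]] = defaultdict(list)  # user -> items, newest first

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post(self, author: str, timestamp: int, text: str) -> None:
        # Fan-out on write: one write into each follower's feed partition.
        for follower in self.followers[author]:
            feed = self.feeds[follower]
            feed.append((timestamp, author, text))
            feed.sort(key=lambda item: item[0], reverse=True)  # keep newest first

    def read_feed(self, user: str, limit: int = 10) -> List[FeedItem]:
        # Reading is cheap: the feed is already materialized and sorted.
        return self.feeds.get(user, [])[:limit]
```

The trade-off is that an author with millions of followers triggers millions of writes per post. Cassandra’s cheap writes are what make this pattern affordable.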
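The usual way to reason about tunable consistency is replica overlap. With replication factor RF, suppose a write waits for W replica acknowledgements and a read asks R replicas. The read is then guaranteed to see the latest write when R + W > RF, for example QUORUM writes and QUORUM reads with RF = 3. The toy model below has made-up names and deliberately pessimistic choices about which replicas answer. It shows both a stale read and a fresh one. Conflicts are resolved by timestamp (last write wins).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Versioned:
    value: str
    timestamp: int


def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, w: int, r: int) -> bool:
    # Any set of R replicas must overlap any set of W replicas.
    return r + w > replication_factor


class ToyReplicatedKey:
    """Hypothetical model of one key stored on `replication_factor` replicas."""

    def __init__(self, replication_factor: int = 3) -> None:
        self.replicas: List[Versioned] = [Versioned("initial", 0) for _ in range(replication_factor)]

    def write(self, value: str, timestamp: int, w: int) -> None:
        # Pessimistic: only the first `w` replicas have the write when we return.
        for i in range(w):
            self.replicas[i] = Versioned(value, timestamp)

    def read(self, r: int) -> str:
        # Pessimistic: ask the last `r` replicas, the ones least likely to be up to date.
        contacted = self.replicas[-r:]
        # Last write wins: return the newest value among the replicas contacted.
        return max(contacted, key=lambda v: v.timestamp).value
```

Writing with W = 1 and reading with R = 1 can return the old value. With quorum writes and reads, the two replica sets always share at least one up-to-date replica.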
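Pulling together the rules of thumb from the sections above, here’s a hypothetical helper that encodes them as a rough first guess. The function name and flags are illustrative, and the thresholds (1E8 and 1E9 rows) are this article’s rules of thumb, not hard limits of any product.

```python
def suggest_store(
    largest_table_rows: int,
    *,
    key_value_only: bool = False,
    fits_in_memory: bool = False,
    text_search: bool = False,
) -> str:
    """Hypothetical helper encoding this article's rules of thumb; not a substitute for benchmarking."""
    if key_value_only and fits_in_memory:
        return "Redis"
    if text_search:
        # Keep valuable data in a more reliable store as well.
        return "Elasticsearch / OpenSearch"
    if largest_table_rows < 100_000_000:      # under 1E8 rows: one server is enough
        return "PostgreSQL"
    if largest_table_rows < 1_000_000_000:    # under 1E9 rows: medium data across a cluster
        return "MongoDB"
    return "Cassandra"                        # 1E9+ rows: huge, many-node clusters
```

Real systems often combine several of these. For example, you might keep the source of truth in PostgreSQL or Cassandra and mirror it into Elasticsearch / OpenSearch for search.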