Sunday 27 April 2014

Tell me about NoSQL

NoSQL seems to be the buzzword of choice at the moment for people who want the flexibility to build and frequently alter their databases. But there are plenty of people who still aren’t quite sure what a NoSQL database is and why they should want to use one. So let’s take a brief look at NoSQL.

The term NoSQL first saw the light of day in 1998 when Carlo Strozzi used it as the name of his lightweight, open-source, relational database because it didn’t expose the standard SQL interface. But the term gained its modern usage in 2009 when it was used as a generic label for non-relational, distributed data stores. So, it refers to a whole family of databases, rather than a single type of database.

Developers like NoSQL databases because they can store and retrieve data without being locked into the tabular relationships used in relational databases. These databases make scaling easier and can offer superior performance. They can store large volumes of structured, semi-structured, and unstructured data. They suit agile sprints, quick iteration, and frequent code pushes. They fit well with object-oriented programming, which makes them easy to use and flexible. And they use an efficient scale-out architecture instead of an expensive monolithic architecture.

But, on the down side, many NoSQL databases lack full ACID (Atomicity, Consistency, Isolation, Durability) transaction support. Atomicity means that each transaction is ‘all or nothing’, i.e. if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. Consistency ensures that any transaction brings the database from one valid state to another. Isolation means that the concurrent execution of transactions results in the same system state that would be obtained if the transactions were executed sequentially. Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. And that’s the kind of reliability you want in a business-critical database.
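To make the ‘all or nothing’ idea concrete, here is a minimal sketch of atomicity using Python’s built-in sqlite3 module. It is a relational example, chosen only because it runs anywhere without extra software; the accounts table, the account names, and the helper functions make_bank, balances, and transfer are all hypothetical and don’t come from any product mentioned here.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction.

    Returns True if the transfer was committed and False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                # The debit above has already run; raising here undoes it.
                raise ValueError("unknown account or insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The second transfer fails part-way through, after the debit has been applied, and the rollback leaves both balances exactly as they were before it started. A data store without transaction support leaves that clean-up to the application.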

NoSQL databases are typically used in Big Data and real-time Web applications. The different NoSQL database technologies were developed because of the increase in the volume of data that people needed to store, the frequency with which the data is accessed, and increased performance and processing needs.

There are estimated to be over 150 open source databases available. And there are many different types of NoSQL database, including some that allow the use of SQL-like languages; these are sometimes referred to as ‘Not only SQL’ databases. Classifying NoSQL databases is quite a challenge, but they can be grouped, by the features they offer, into column, document, key-value, and graph types. Alternatively, they can be classified by data model into KV Cache, KV Store, KV Store – Eventually consistent, Data-structures server, KV Store – Ordered, Tuple Store, Object Database, Document Store, and Wide Columnar Store.
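As a rough illustration of how the four feature-based groups differ, the sketch below models each one with plain Python data structures. None of this is a real database API: the sample records and the helpers find_documents and neighbours are made up for the example, and real products add persistence, distribution, and indexing on top.

```python
import json

# Key-value: the store sees only an opaque value behind a single key.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document: the store understands the nested structure and can query fields.
documents = [
    {"_id": 42, "name": "Ada", "city": "London", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 43, "name": "Bob", "city": "Paris", "orders": []},
]

# Column: row key -> column family -> columns. Rows need not share columns.
columns = {
    "42": {"profile": {"name": "Ada", "city": "London"}, "stats": {"logins": 7}},
    "43": {"profile": {"name": "Bob"}},
}

# Graph: relationships are stored directly as labelled edges between nodes.
edges = [(42, "FOLLOWS", 43), (43, "FOLLOWS", 42), (42, "BOUGHT", "A1")]


def find_documents(docs, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def neighbours(edge_list, node, label):
    """Return the nodes reached from node along edges with the given label."""
    return [dst for src, lbl, dst in edge_list if src == node and lbl == label]


if __name__ == "__main__":
    print(json.loads(kv_store["user:42"])["name"])
    print(find_documents(documents, city="London"))
    print(columns["43"].get("stats", {}))
    print(neighbours(edges, 42, "FOLLOWS"))
```

The key-value store can only hand back the whole value for a key, whereas the document and graph models let the store itself answer questions about fields and relationships.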

The good news for DB2 users is that IBM has provided a new API that supports multiple calls and a NoSQL software solution stack that ships with DB2. It’s free with DB2 on distributed platforms and with DB2 Connect. DB2 also offers a second type of NoSQL-like database – the XML data store. This can store the growing volume of Web-based data.

Rocket Software has a way of using MongoDB (an example of a NoSQL database) on a mainframe. Rocket can provide access to any System z database using any MongoDB client driver. DB2 supports MongoDB.

IBM recently announced zDoop, Hadoop database software from Veristorm for Linux on System z mainframes, stating: “This will help clients to avoid staging and offloading mainframe data to maintain existing security and governance controls”.

Other NoSQL databases that you might want to look out for include Cassandra, Couchbase, Redis, and Riak.

Clearly, with the growth in Big Data, we’ll be hearing a lot more about NoSQL databases and how they can be integrated into mainframe technology. There are lots of them out there and they can be quite different from each other in terms of their features and data models used.
