Wednesday, 5 October 2011

New Web, New Databases: NoSQL

As we have written many times in this blog, the overall landscape of how the Internet and computers fit into our lives is changing, and this affects the way we live in communities.
Here are some of the key changes happening in internet technology today:
  • Growing agility: changes in devices, changes in software deployment made possible by the Cloud, and changes in community and knowledge sharing, all driven by software innovation, are creating more agile networks that both support and require more agile ways of working together and cooperating.
  • Demand for rapid scalability:  on the web a solution may go from 100 to 100,000,000 users in a matter of months.  
  • Constant, lightweight interactions: large emails are still around, but they are increasingly supplanted by a continual flow of two-way traffic through the Cloud.  Much of this flow consists of small transactions like tweets, check-ins, likes, or pluses.
Platforms like SharePoint, Google, Facebook and Twitter are fighting it out to win share in the new spaces that are opening up, and are inventing more new spaces as they go along.  As these spaces open up and networks form around APIs and mashups, new technologies are developing to support the needs of these data harvesters, as we like to think of them.

Data storage is one area where new technologies and business needs are driving a major transformation.

We recently attended a MongoDB conference in London, which opened our eyes more fully to the potential of NoSQL technology in this rapidly changing world. Many people see this technology as the future of Cloud computing.

The SQL Database

Until recently, data-driven web sites have used relational databases, more commonly called SQL databases, to store their data.  These rest on technology that has been around in roughly its current form since the 1970s. Relational databases have supported mainframe applications, client-server applications, and web applications for years. They have been a powerful tool supporting a lot of innovation. And there will continue to be a need for them for the foreseeable future.

Imagine a parking garage that took your car, took it apart into its pieces, stored each piece alongside the same kind of piece from other people's cars, and then had to put it all back together again when you wanted to leave. That is how an SQL database deals with your data.

But relational databases are grounded in tables and relationships. All data that comes in must be broken down into pieces and stored in tables. So the data you submit in a form may be split across more than one table, perhaps even dozens. Max Schireson, President of 10gen, the makers of MongoDB, describes a relational database as a parking garage that takes your car apart and sorts it with other people's car parts according to the part. So all the tires go in one area, the ignitions in another. If a parking garage were like a relational database, it would have to disassemble your car to park it and reassemble it to give it back to you.
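To make the garage analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample form are our own inventions for illustration; a real application would have its own schema.

```python
import sqlite3

# Hypothetical schema: one submitted order form is split across three tables,
# much like the car being stored as separate parts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              product TEXT, quantity INTEGER);
""")

# "Parking the car": the single form is taken apart and stored piece by piece.
form = {"name": "Ada", "items": [("tire", 4), ("ignition", 1)]}
customer_id = conn.execute(
    "INSERT INTO customers (name) VALUES (?)", (form["name"],)
).lastrowid
order_id = conn.execute(
    "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
).lastrowid
conn.executemany(
    "INSERT INTO order_items (order_id, product, quantity) VALUES (?, ?, ?)",
    [(order_id, product, qty) for product, qty in form["items"]],
)

# "Giving the car back": joins put the pieces together again.
rows = conn.execute("""
    SELECT c.name, i.product, i.quantity
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    WHERE o.id = ?
    ORDER BY i.rowid
""", (order_id,)).fetchall()

rebuilt = {"name": rows[0][0], "items": [(p, q) for _, p, q in rows]}
print(rebuilt)
```

Three inserts and a two-way join are needed to store and recover what began as a single form. With dozens of tables the same pattern simply gets longer.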

For many highly structured data tasks SQL databases are fine, but what happens when the data is not so clearly structured? Often what you see in SQL databases used for very unstructured data is a combination of columns with generic names like col1 and blob storage. With the Internet producing a mass of unstructured or inconsistently structured data, there has to be a better way.

And now No SQL

We purposely left out a definition of the term NoSQL because we wanted to wait for this moment to clarify that it is not ‘No’ SQL, but rather N O SQL, as in Not Only SQL. Which is the point. Not every application needs the particular flavor of discipline that SQL databases require.

The discipline of a relational database is appropriate when the following are true:
  1. All data should share a standard structure, a structure you will know and completely define before going live and that you will change as infrequently as possible, and mainly if not only by adding elements.
  2. Because of the overhead of storing, reading and writing in and out of the database, transactions should be as infrequent as possible.  
  3. You know how much storage you are going to need over a very long period of time so that you can build a cluster of the right size.
  4. Data will be designed to maintain structural and referential integrity and the relationships between elements will be precisely defined through use of unique keys.
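
As a rough illustration of condition 4, the sketch below uses Python's built-in sqlite3 module with a payroll-style schema that we have made up for the purpose. It shows the database itself refusing a record that points at a key that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to on each connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE payslips (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employees(id),
        amount_pence INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO employees (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO payslips (employee_id, amount_pence) VALUES (1, 250000)")

# Employee 99 does not exist, so the relationship cannot be satisfied.
try:
    conn.execute("INSERT INTO payslips (employee_id, amount_pence) VALUES (99, 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan payslip rejected:", rejected)
```

The guarantee lives in the data layer: no application code can write a payslip for a non-existent employee, whichever program does the writing.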

In standard Enterprise solutions that serve well-understood business needs like payroll, these conditions are all met.

But in our more Agile world, projects are more likely to face different conditions:
  1. Data may not have a fixed structure, or the structure may change, or a single structure won't be shared by all elements in the database.
  2. Many rapid small transactions are coming to the site all the time.
  3. You have no idea how much storage you will need in the medium term.

This is where NoSQL can offer a solution, as explained to us by the MongoDB team at their conference in London.

So How Does a NoSQL Database Work?

Mongo is a document database, which is a very different kind of animal from the table-based DB systems we’ve used for so long. Documents, which correspond in the NoSQL world to records, are stored in collections, which roughly equate to tables. Documents can be stored in collections without first being broken down into structured pieces. Documents inside a collection do not need to share a common structure or schema.
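
The idea can be sketched without a database at all. Below, a plain Python list stands in for a collection and dictionaries stand in for documents. This is not the MongoDB driver API; the `activity` data and the `find` helper are hypothetical and exist only to illustrate the data model.

```python
# A list of dicts standing in for a collection of documents.
# Each document carries its own structure; no two share the same fields.
activity = [
    {"user": "ana", "type": "tweet", "text": "Hello from London"},
    {"user": "ben", "type": "check-in", "venue": {"name": "Cafe", "city": "London"}},
    {"user": "cy", "type": "like", "target": "post/42", "tags": ["nosql", "cloud"]},
]


def find(collection, **criteria):
    """Return the documents whose top-level fields match all the criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Query by a field that every document happens to have.
print(find(activity, type="check-in"))

# The set of fields across the collection is simply the union of what was stored.
print(sorted({field for doc in activity for field in doc}))
```

Nested values such as the venue stay inside their document, so nothing has to be taken apart on the way in or joined back together on the way out.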

To accomplish this with a SQL database you need to either store documents as blobs (which is expensive) or use Remote Blob Storage (RBS) to save the files on a file system linked to a SQL database. But NoSQL systems give you this from the start.

Document structure is encoded in BSON, or Binary JSON, which can be readily extracted to a JSON object in JavaScript, or equivalent types in most other major languages, e.g. Java, Ruby, PHP, etc., for all of which there are Mongo drivers.
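
As a small illustration of that round trip, the sketch below uses Python's standard json module as a stand-in. It does not produce real BSON, which is a binary format with extra types such as dates and binary data and is normally handled by the driver. The sample document is our own invention.

```python
import json

# A nested document of the kind a document database stores whole.
document = {
    "title": "New Web, New Databases",
    "tags": ["nosql", "mongodb"],
    "author": {"name": "example", "posts": 12},
}

# Encode to bytes for storage or transport, then decode back again.
encoded = json.dumps(document).encode("utf-8")
decoded = json.loads(encoded)

# The decoded value is an ordinary dict; nesting and lists survive intact.
print(type(decoded).__name__, decoded["author"]["name"], decoded["tags"])
```

The application works with the native dictionaries and lists of its own language throughout, which is much of the appeal.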

The site is an excellent source of general information on the subject. This page should be your first port of call on all the related technology. To show you how mainstream this technology is becoming, it turns out that Microsoft's Cloud solution, Azure, offers a NoSQL solution. In the UK, The National Archives (TNA), the nation's collector of history, is developing on MongoDB and .NET. Oracle is also offering a solution set in this area. So this technology is not only for the Linux and Open-Source universe. We anticipate that it will become core to all Cloud offerings.

Impact of NoSQL Technology on the Future of Computing

There are a large number of links that you can use to learn about the technical details of MongoDB and other NoSQL databases. At Web 3.0, we are more concerned with the larger implications of NoSQL. In particular, it has massive implications for the way in which application development proceeds. Mongo offers app developers potential freedom from the more constrained conditions imposed by traditional IT operations. These have often been bottlenecks to development, demanding approval of large projects and control over how data is stored and hence, to a large extent, over the ways in which future application development can proceed.

Many organisations will be familiar with this bottleneck: the high cost of altering the underlying data structure to support some new feature whose benefits, while large, are not high enough to offset the huge cost of making changes.

With NoSQL, data structure is far more flexible. Documents within the same collection do not even have to have the same field structure.

This means that decisions on data structure move away from the data layer and into the application layer. Need a change to your data? Fine. Go and implement it from your code. With NoSQL, you make a change to the logic of your app and new records will simply have the new structure, without impacting the integrity of the overall data store.
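
Here is a minimal sketch of what that looks like in practice, again with a plain Python list standing in for a collection. The `profiles` data, the `twitter` field and both helper functions are hypothetical names chosen for illustration.

```python
# Hypothetical collection containing one record written by version 1 of an app.
profiles = [
    {"name": "Ana", "email": "ana@example.com"},
]


def save_profile_v2(collection, name, email, twitter=None):
    """Version 2 of the app adds an optional 'twitter' field to new records."""
    doc = {"name": name, "email": email}
    if twitter is not None:
        doc["twitter"] = twitter
    collection.append(doc)
    return doc


def display_handle(doc):
    """Reading code must tolerate old records that lack the new field."""
    return doc.get("twitter", "(none)")


# No migration of existing records: old and new shapes sit side by side.
save_profile_v2(profiles, "Ben", "ben@example.com", twitter="@ben")
handles = [display_handle(doc) for doc in profiles]
print(handles)
```

The trade-off is worth noting: the store no longer enforces a single shape, so the application takes on the job of handling every shape it has ever written.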

What this means is that the app development tail is pushed a lot further out. The cost of change and new development could be a lot lower. Time to market could be faster, meaning the business can respond more rapidly to changes in its market, and the cost of making those changes is lower, meaning the business can now draw benefits from changes that used to be priced out.

This kind of technology is precisely what the Cloud needs for the new generation of web applications: applications that are more agile, that are open to evolving and multiple forms of structure, or no structure at all, and that can easily scale up or down.
