This post was written for a data visualization workshop: https://blog.soda.camp/021/
Consider the humble Filing Cabinet. An instrument of human ingenuity, designed to help us organise information. How does it work? Consider what is involved in arranging documents inside of it, and the several ways you could guide the user into finding what he or she wants inside.
Databases have a long history, coming into full use in the Mainframe Era where models like CODASYL were used to efficiently locate bits of data strewn across tapes and early multi-megabyte hard disks … to the Golden Age in the late 90s where every organisation large and small started keeping records in multiple databases in a variety of database models … to our Cloud Present, where more and more data is stored more or less securely and openly on the Internet … and a likely Distributed Future where applications rely increasingly on availability of multiple types of local and remote and transactional (think blockchain) data sources - which in themselves are split between multiple owners and middleware, and so on.
So what is a Non-Relational Database, a.k.a. NoSQL? It basically means a database whose data is not organised into tables of rows and columns. These are supposed to be simpler to design and scale across many machines. They can be very fast and are well suited to some “Big Data” applications. Many of them are, it would seem, a perfect match for data visualisation, since their syntax and JSON support parallel those of tools such as D3.js. They are close cousins to Graph databases, such as neo4j, which power many Linked Data systems and make an excellent basis for a variety of applications.
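To make the difference concrete, here is a minimal Python sketch of the same record stored as flat table rows and as a single nested document. The sample data, the `people_rows` and `tag_rows` names, and the `to_document` helper are all made up for illustration; no real database is involved.

```python
import json

# Relational style: two flat tables linked by an id (hypothetical sample data).
people_rows = [(1, "Ada Lovelace")]
tag_rows = [(1, "mathematics"), (1, "computing")]


def to_document(person_id):
    """Assemble one self-contained document from the flat rows."""
    name = next(n for pid, n in people_rows if pid == person_id)
    tags = [t for pid, t in tag_rows if pid == person_id]
    # Document style: everything about the person lives in one nested record.
    return {"_id": person_id, "name": name, "tags": tags}


doc = to_document(1)
print(json.dumps(doc, indent=2))
```

The nested form is plain JSON, which is roughly the shape a document store keeps and a JavaScript visualisation library can consume without a join step.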
Popular non-relational databases include MongoDB and Apache Cassandra; they also form the underlying technology for platforms like ElasticSearch, and there are many, many more products. Newcomers like Quasar make compelling arguments for throwing out the old software. Compelling performance analyses, such as one by ArangoDB, make us think that our project might be dead in the water if we choose the “wrong” technology. Millions of websites and production applications run on these technologies, while opinion blogs proclaim things like “MongoDB is the Frankenstein monster of NoSQL databases… MongoDB is not designed, so much as undesigned” (LinkedIn) and “I Can’t Wait for NoSQL to Die” (Ted Dziuba).
So what’s a smart cat left to do?
Consider Google, who set clear strategic reasons for their database technologies, and have the armies of engineers to build the perfect product. If you are not Google, you might as well go with the flow, right? I would suggest keeping your options open. Keep an eye out for simple projects that let you reuse your existing database, rather than submitting to the continuing rush to new technology: try pgjson, an open source adapter for using your Postgres database as if it were non-relational in a Node.js app. Use cloud-based tools like mLab to quickly and cheaply test out a product without committing to it. Learn how the tools are used in the real world through good references and online courses.
After this sweeping overview, I showed a practical example of non-relational databases in action based on a recent project. I first quickly reminded our class of the purpose of APIs (like Transport.Opendata.ch and OpenData SMN), and how they make thousands of data sources more accessible for development.
The Portrait Id project from last year’s OpenGLAM #events has a Python/Flask backend (source code on GitHub) that talks to a MongoDB database running on mLab. After demoing the application, which is a simple Facebook clone with a twist, I pulled out mongoclient and Mongotron, showing how to connect to the same database and run some simple queries and filters, and how these are exposed in the Python application’s API. After the talk, one of the participants briefly discussed Facebook’s Graph API, so it seems the presentation gave at least some people ideas.
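The idea behind such simple queries and filters can be sketched without a database at all. The following is a plain-Python illustration, not the project's actual code and not the MongoDB driver: the `portraits` list, its field names, and the `find` helper are hypothetical, and only exact-match filters are supported.

```python
# Hypothetical sample documents; the real project's schema is not shown here.
portraits = [
    {"name": "Portrait A", "year": 1890, "tags": ["oil"]},
    {"name": "Portrait B", "year": 1920, "tags": ["photo"]},
    {"name": "Portrait C", "year": 1890, "tags": ["photo"]},
]


def find(collection, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]


matches = find(portraits, {"year": 1890})
names = [doc["name"] for doc in matches]
print(names)
```

With a driver such as PyMongo, a comparable exact-match filter is written as a dictionary and passed to a collection's `find` method, so the query itself looks much like the `{"year": 1890}` argument above.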