Sunday, February 10, 2013

A NoSQL Modeling Philosophy

There has been a lot of digital ink spilled about NoSQL, including by me. Lately, there has also been some confusion and complaining. I have a simple model-design philosophy that I hope will satisfy at least some NoSQL enthusiasts as well as critics. I have mostly been dealing with document-oriented NoSQL, but this might be applicable to other architectures as well. Here it is...


  • If the same information might be duplicated across several objects/documents (or whatever corresponds to relational rows), put that data in a separate collection/table, and store its document ID in each "main" object that needs it. This is essentially normalization. 
  • If the information is unique to that object, it belongs in that object, even if that means using arrays. Sub-structures might need sub-arrays and/or the IDs of other objects. 
What might such a model look like? My favorite example is a modern music collection that consists mostly of albums by specific bands (as opposed to various-artist albums). 

First, let's consider what a band/artist might have. The obvious one is a name. Another might be a location. Another obvious thing a band would probably have is albums. Now, bands might share a location, but they aren't likely to share an album. So, locations would be stored as separate objects pointed to by band objects. The albums of a band, on the other hand, would be stored as an array in that band object. 
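A minimal sketch of the two rules applied to bands, assuming plain Python dictionaries stand in for document collections. The collection and field names (`locations`, `bands`, `location_id`, `albums`), the helper `band_location`, and the sample data are all hypothetical, not the API of any particular database. The shared location is stored once and referenced by ID, while each band's albums are embedded as an array.

```python
# Hypothetical in-memory "collections": dicts keyed by document ID.
locations = {
    "loc1": {"_id": "loc1", "city": "Seattle", "country": "USA"},
}

bands = {
    "band1": {
        "_id": "band1",
        "name": "Band A",
        "location_id": "loc1",   # shared data: stored once, referenced by ID
        "albums": [              # unique to this band: embedded as an array
            {"name": "First Album", "year": 1991},
            {"name": "Second Album", "year": 1994},
        ],
    },
    "band2": {
        "_id": "band2",
        "name": "Band B",
        "location_id": "loc1",   # same location, no duplicated copy
        "albums": [{"name": "Debut", "year": 1993}],
    },
}


def band_location(band_id):
    """Follow a band's location ID into the separate locations collection."""
    return locations[bands[band_id]["location_id"]]


# Correcting the shared location once is visible from every band that points to it.
locations["loc1"]["city"] = "Seattle, WA"
print(band_location("band1")["city"])  # Seattle, WA
print(band_location("band2")["city"])  # Seattle, WA
```

The payoff of the ID reference is that the location is edited in exactly one place; the albums, which no other band shares, come back with the band document and need no second lookup.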

Now, albums have (mostly) unique names, and their release dates are fairly arbitrary, so there is little to gain from sharing either one. An album structure would therefore hold a name and a release year directly. Looking up albums by genre can often be useful, though, and genres are something albums by different bands have in common. So, if your API allows it, create genre objects and point to them from albums. Individual albums, then, have names, years, and genre IDs. 
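Continuing the same hypothetical dictionary-based sketch, genres get their own collection and each embedded album carries a list of genre IDs. The names `genres`, `genre_ids`, and `albums_by_genre`, along with the sample genres and bands, are illustrative assumptions. The lookup scans every band; a real document database would usually let you index the embedded genre field instead.

```python
# Genres are shared by albums of many bands, so they live in their own collection.
genres = {
    "g1": {"_id": "g1", "name": "Grunge"},
    "g2": {"_id": "g2", "name": "Alternative"},
}

# Each embedded album keeps its own name and year plus a list of genre IDs.
bands = {
    "band1": {"_id": "band1", "name": "Band A", "location_id": "loc1",
              "albums": [
                  {"name": "First Album", "year": 1991, "genre_ids": ["g1", "g2"]},
                  {"name": "Second Album", "year": 1994, "genre_ids": ["g2"]},
              ]},
    "band2": {"_id": "band2", "name": "Band B", "location_id": "loc2",
              "albums": [
                  {"name": "Debut", "year": 1993, "genre_ids": ["g1"]},
              ]},
}


def albums_by_genre(genre_name):
    """Return (band name, album name) pairs for every album tagged with the genre."""
    # Resolve the genre name to its ID(s) via the separate genres collection.
    ids = {gid for gid, g in genres.items() if g["name"] == genre_name}
    # Walk every band's embedded album array and keep albums that reference one of those IDs.
    return [
        (band["name"], album["name"])
        for band in bands.values()
        for album in band["albums"]
        if ids.intersection(album["genre_ids"])
    ]


print(albums_by_genre("Grunge"))
# [('Band A', 'First Album'), ('Band B', 'Debut')]
```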

So, now we have a band object structure with a name, a location ID, and an array of albums. We also have separate location and genre objects. Nothing stops you from using an array of location IDs instead, if a band really needs more than one location. 
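A minimal sketch of reading the finished model back, assuming the same kind of hypothetical in-memory collections and a band that uses an array of location IDs (`location_ids`). The helper `expand_band` is an illustrative name: it does in application code what a relational join would do, swapping each stored ID for the referenced data to produce a display-ready view.

```python
locations = {
    "loc1": {"city": "London"},
    "loc2": {"city": "Los Angeles"},
}
genres = {"g1": {"name": "Rock"}, "g2": {"name": "Blues"}}

# Hypothetical band that stores an array of location IDs instead of a single ID.
band = {
    "_id": "band3",
    "name": "Band C",
    "location_ids": ["loc1", "loc2"],
    "albums": [
        {"name": "Live", "year": 1975, "genre_ids": ["g1", "g2"]},
    ],
}


def expand_band(band):
    """Replace location and genre IDs with the referenced data, for display."""
    return {
        "name": band["name"],
        "locations": [locations[lid]["city"] for lid in band["location_ids"]],
        "albums": [
            {
                "name": a["name"],
                "year": a["year"],
                "genres": [genres[gid]["name"] for gid in a["genre_ids"]],
            }
            for a in band["albums"]
        ],
    }


print(expand_band(band))
# {'name': 'Band C', 'locations': ['London', 'Los Angeles'], 'albums': [{'name': 'Live', 'year': 1975, 'genres': ['Rock', 'Blues']}]}
```

Only the shared pieces (locations and genres) require extra lookups; everything unique to the band is already in the one document.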

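Now, if you have a lot of "various artist" albums, you would obviously want a different model: an album shared by several bands would otherwise be duplicated in each band's album array, which breaks the first rule. Actually, in that case, a graph-based database might be appropriate. 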
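One way to stay with documents for a while is to apply the first rule to albums themselves. The sketch below is a hypothetical illustration, not the approach recommended above: albums move to their own collection and each one lists the IDs of the bands on it (`band_ids`). The helper names and sample data are assumptions. Once queries keep hopping band → album → other bands, that traversal is the kind of work a graph database handles more naturally.

```python
# Albums get their own collection because one album can belong to many bands.
albums = {
    "alb1": {"_id": "alb1", "name": "Compilation", "year": 2000,
             "band_ids": ["band1", "band2", "band3"]},
    "alb2": {"_id": "alb2", "name": "First Album", "year": 1991,
             "band_ids": ["band1"]},
}
bands = {
    "band1": {"_id": "band1", "name": "Band A"},
    "band2": {"_id": "band2", "name": "Band B"},
    "band3": {"_id": "band3", "name": "Band C"},
}


def albums_for_band(band_id):
    """Names of all albums that list this band among their artists."""
    return [a["name"] for a in albums.values() if band_id in a["band_ids"]]


def artists_on(album_id):
    """Names of all bands referenced by one album."""
    return [bands[b]["name"] for b in albums[album_id]["band_ids"]]


print(albums_for_band("band1"))  # ['Compilation', 'First Album']
print(artists_on("alb1"))        # ['Band A', 'Band B', 'Band C']
```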

I hope this post clarifies NoSQL modeling. My e-mail address is euric.reiks@comcast.net ; let me know if you have any questions or other examples. If you have a Google e-mail address, you can leave comments on this page. Thanks for reading, and God bless you!