Architectures for Data Storage and Management

This is a summary of one section of my workshop on Data Architectures at the SSIR Data on Purpose workshop.

Data management and storage is a problem for organizations large and small.  In this post I’m going to lay out how I help them come up with a comprehensive strategy that meets their needs.

Core Questions

There are a few questions you need to ask yourself before coming up with a plan for storing and managing your data:

  • how do I make it easy to add, find, and use data?
  • what processes will help us organize and manage our data?
  • what tools can we use to support managing our data?
  • what level of structure and access is appropriate for my organization?

These questions are as much about social processes as about technological solutions.  You need to understand what does and doesn’t work already within your community before making a plan to move forward.

Goals

What criteria does a good solution need to meet? Here is an outline of how I approach this:

[Figure: goals]

  • organized: your data should be stored in a consistent structure (this often reflects the structure of your organization)
  • described: your data needs to be documented formally or informally (this can be anything from a sentence to formal metadata, and should include notes on how the data was created)
  • accessible: your data should be available for people to use (this could be on a shared file-server, an online portal, a data management system… and it should be easy to add to)
  • usable: your data should be stored in a language your organization speaks (this could be spreadsheets or databases, and should follow any format standards that exist in your area)
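
As one illustration of the "described" goal, here is a minimal sketch in Python that writes a short note next to a data file. The function name, the field names (title, description, how_created, contact), and the ".meta.json" naming are all my own assumptions for illustration, not a formal metadata standard; swap in whatever your organization will actually keep up to date.

```python
import json
from datetime import date
from pathlib import Path


def describe_dataset(data_path, title, description, how_created, contact):
    """Write a small JSON "sidecar" note next to a data file.

    The field names are illustrative assumptions, not a standard.
    """
    data_path = Path(data_path)
    metadata = {
        "file": data_path.name,
        "title": title,
        "description": description,
        "how_created": how_created,  # notes on where the data came from
        "contact": contact,          # who to ask about this file
        "last_described": date.today().isoformat(),
    }
    # For example: donations.csv -> donations.csv.meta.json
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Hypothetical file and details, purely for demonstration.
    note = describe_dataset(
        "donations.csv",
        title="Donations",
        description="One row per donation received.",
        how_created="Exported from our donor database.",
        contact="data@example.org",
    )
    print(f"Wrote {note}")
```

Even a note this small covers the informal end of "described": one sentence on what the file is, plus how it was created and who to ask about it.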

Techniques

So how do you think about the space of available solutions?  I tend to think about solutions in two ways (based on the goals above) – how organized & described they are, and how usable & accessible they are.  For instance, having standardized spreadsheets stored on individual staff members’ computers is very organized, but not very accessible at all!  Here’s a chart that tries to map some of the solutions against these two axes:

[Chart: solutions mapped by how organized & described they are and how usable & accessible they are]

This map can help you figure out where you are, and where you want to be.  It isn’t necessarily the case that you need to be in the top right of this chart (i.e. very organized and very accessible)… you need to figure out what is right for your organization.

There are lots of specific technologies that can help in this space.  I’m not in the business of endorsing specific packages, but here are some I see other folks using:

  • A shared internal file server (SharePoint) or external sharing service (Dropbox) can be helpful to get all your data in one place and expose it to everyone.
  • An online data portal can help you collect, organize, and share your data internally and externally.  Lots of cities around where I live use Socrata.  Many of the mid-sized organizations I have worked with use the open source CKAN project.
  • If you are focused on helping people access your data with software APIs and/or code, or need strong support for versioning your data, look for online platforms like GitHub.
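
To give a flavor of what accessing data with code can look like, here is a rough sketch in Python that searches a CKAN portal through its web API. The portal address (data.example.org) is a placeholder, and the function names are mine; it assumes the portal exposes CKAN's standard "package_search" action, so check your own portal's documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query, rows=5):
    """Build the address for a dataset search on a CKAN portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Pull dataset titles out of a decoded package_search reply."""
    if not response.get("success"):
        raise ValueError("the portal reported a failed request")
    datasets = response["result"]["results"]
    # Fall back to the short name if a dataset has no title.
    return [d.get("title") or d.get("name") for d in datasets]


def search_portal(portal_url, query):
    """Ask the portal for matching datasets (needs network access)."""
    with urlopen(build_search_url(portal_url, query)) as reply:
        return dataset_titles(json.load(reply))


if __name__ == "__main__":
    # Placeholder address: replace with a real CKAN portal to try it.
    print(build_search_url("https://data.example.org", "budget"))
```

The point is less the specific calls than the idea: once your data sits on a platform with an API, other people's software can find and use it without anyone emailing a spreadsheet around.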

Obviously the solutions that are right for you need to fit your data and topic – if you work with sensitive personal data, you need to be especially careful to understand where these online platforms store your data and how they might back it up.

Getting Started

I hope this is useful scaffolding for thinking about which architectures for data management and storage can help you.  This stuff can be boring, but it is critical infrastructure to get in place to support building a strong data culture within your organization!  Start with these questions:

  • what data language does our organization speak already?
  • how is our data organized right now?
  • what needs must any solution we use meet?