Thoughts On Designing Data Sculptures

Making Charts in 3D

‘The Humans of the Hackathon’ — created by Pratap, Richie & Sainath is a physical visualization of participation at the July hackathon conducted at Gramener, Inc.

Heatmap of injuries from fireworks between 2009 and 2014 in the US. Darker red represents parts of the body that had more injuries. Created by Judy Chang, Gary Burnett, and Andrew Mikofalvy.

Edible comparison of honey produced in the US in 2016 vs. 2017. Note the cracker on the left is covered in much more honey than the one on the right, and is thus more delicious data to consume! Created by Olivia Brode-Roger, Mitchel L Myers, Alicia Ouyang.

Take Advantage of Your Material

Pieces of an interactive physical exploration of water use data. Created by Lily Xie, Sarah Caso, and Tanaya Srini.

Edible data brownies used to represent air quality in various cities. The salt level increased with air pollution levels (using a taste-based perceptual scale derived from their in-kitchen experimentation). Created by Tina Quach, Margaret Tian, Tony Zeng, and Aina Martinez Zurita.

Support Deeper Investigation

A pile of Monopoly houses, used to represent the number of households in the US. Screenshot of a New York Times article.

Continuing the visual pun — Monopoly hotels used to represent households in a comparison by party affiliation. Screenshot of a New York Times article.

Data sculpture with hidden water underneath the table. Picking up each fork surprised you because it was connected to the heavy water load underneath. You can see the small black strings tying the fork to the water bucket beneath. Created by Sarah Von Ahn, Amy Vogel, and Theresa Machemer.

The second piece used colored water in 2-liter bottles to dig beyond the total volume of water and into the type of water.

Conclusion

Making Tools More Learner-Friendly

I often advise learners to be careful with what tools they choose to spend time learning.  Some powerful ones have steep learning curves, full of jargon and technical hurdles.  Others are simple and self-explanatory, but can’t do more than one thing.  I’ve been trying to find better ways to connect with tool builders and talk to them about how they need to build learner-centered tools.

Catherine D’Ignazio and I put these thoughts together into a talk for OpenVisConf this year. This is a super-dorky conference for data viz professionals… just the place to find more tool builders to talk to! We put together an argument that data visualization tools are informal learning spaces. Watch the video below:

Talking Data & Uncertainty with Patrick Ball

Recently, at the Responsible Visualization event put on by the Responsible Data Forum, I had a wonderful chance to sit down with the amazing Patrick Ball from the Human Rights Data Analysis Group and talk through how we help groups learn about working with incomplete data.

With my focus on capacity building, I’m trying to find fun ways for NGOs to learn about accuracy and data at a very basic level. Patrick argues that, in fact, you need rigorous statistical analysis to do this well, drawing on his background in human rights data. I pushed a bit, asking him if there was an 80/20 shortcut. His response was to draw a useful distinction between homogeneous and heterogeneous observability of data. For instance, there are many questions that don’t require quantitative rigor – case existence, case history, etc. This sparked a fun conversation about visual techniques for conveying uncertainty.

Watch the video to see the short conversation, or just catch the audio below.

Big Data’s Empowerment Problem

Catherine D’Ignazio and I just presented a paper titled “Approaches to Big Data Literacy” at the 2015 Bloomberg Data for Good Exchange. This is a write-up of the talk we gave to summarize the paper.

When we talk about data science for good, collaborating with organizations that work for the social good, we immediately enter a conversation about empowerment. How can data science help these organizations empower their constituencies and create change in the world? Catherine and I are educators, and strongly believe learning is about empowerment, so this area naturally appeals to us! That’s why we wrote this paper for the Bloomberg Data for Good Exchange.

Data Literacy

We’ve been thinking and working a lot on data literacy, and how to help folks build their capacity to work with information to create social change. We define “data literacy” as the ability to read, work with, analyze, and argue with data. So how do we help build data literacy in creative and fun ways? One example is the activity we do around text analysis. We introduce folks to a simple word-counting website and give them lyrics of popular musicians to analyze. Over the course of half an hour folks poke at the data, looking for stories comparing word usage between artists. Then they sketch a visual to share a story.
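To make the word-counting step concrete, here is a minimal Python sketch of what a simple word-counting tool does under the hood. It is not the website used in the activity: the lyric strings are invented placeholders, and the helper names (`word_counts`, `self_reference_share`, `shared_words`) and the list of first-person words are assumptions for illustration. It computes the two kinds of stories mentioned here, who talks about themselves the most and how much vocabulary two artists share.

```python
import re
from collections import Counter

# Hypothetical list of first-person words used to estimate who
# "talks about themselves the most"; adjust to taste.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def word_counts(text: str) -> Counter:
    """Lowercase the text, pull out the words, and count each one."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def self_reference_share(text: str) -> float:
    """Fraction of all words that are first-person words."""
    counts = word_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[w] for w in FIRST_PERSON) / total


def shared_words(text_a: str, text_b: str) -> set:
    """Words that appear in both texts (the overlap between two artists)."""
    return set(word_counts(text_a)) & set(word_counts(text_b))


# Placeholder "lyrics" invented for this sketch; swap in real lyrics to explore.
lyrics = {
    "Artist A": "I know my name and I love me",
    "Artist B": "we walk the road and the road walks on",
}

for artist, text in lyrics.items():
    top = word_counts(text).most_common(3)
    share = self_reference_share(text)
    print(f"{artist}: top words {top}, self-reference share {share:.2f}")

print("Shared words:", shared_words(lyrics["Artist A"], lyrics["Artist B"]))
```

A fuller version would have to decide how to treat contractions, which this simple regex keeps as single words.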

Photos of stories created by students showing the artist that talks about themselves the most, and the overlap in lyrics between Paul Simon and Kanye West.

Another example is my Data Murals work, where we help a community group find a story in their data, collaboratively design a visual to tell that story, and paint it as a community mural.

The Data Mural created by youth from Groundwork Somerville.

This stuff is fun, and makes learning to work with data accessible.  We focus on working with technical and non-technical audiences.  The technical folks have a lot to learn about how to use data to effect change, while the non-technical folks want to build their skills to use data in support of their mission.

Empowerment

However, this work has focused on small data sets… when we think about “big data literacy” we see some gaps in our definition and our work. Here are four problems related to empowerment that we see in big data, tied to our definition of data literacy:

  • lack of transparency: you can’t read the data if you don’t even know it exists
  • extractive collection: you can’t work with data if it isn’t available
  • technological complexity: you can’t analyze data unless you can overcome the technical challenges of big data
  • control of impact: you can’t argue for change with data unless you can effect that change

With these problems in mind, we decided we needed an expanded definition of “big data literacy”. This includes:

  • identifying when and where data is being collected
  • understanding the algorithmic manipulations
  • weighing the real and potential ethical impacts
Some extensions to our definition of data literacy, to support an idea of “Big Data Literacy”.

So how do we work on building this type of big data literacy? First off, we look to Paulo Freire for inspiration. We could go on for hours about his approach to building literacy in Brazil, but we want to focus on his “Popular Education”. That approach used literacy as a vehicle for both education and emancipation. The emancipation piece matters when you are doing data for good; it isn’t just about acquiring technical skills!

Ideas

We want to work with you on how to address this empowerment problem, and have a few ideas of our own that we want to try out.  The paper has seven of these sketched out, but here are three examples.

Idea #1: Participatory Algorithmic Simulations

We want to create examples of participatory simulations for how algorithms function.  Imagine a linear search being demonstrated by lining people up and going from left to right searching for someone named “Anita”.  This would build on the rich tradition of moving your body to mimic and understand how a system functions (called “body syntonicity“).  Participatory algorithmic simulations would focus on understanding algorithmic manipulations.
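The human simulation maps directly onto code. Below is a minimal Python sketch of the same linear search; the participant names other than “Anita” are hypothetical, and the counter of people asked is an addition that mirrors how many people the searcher has to talk to before stopping.

```python
from typing import Optional, Sequence, Tuple


def linear_search(line: Sequence[str], target: str) -> Tuple[Optional[int], int]:
    """Walk the line from left to right, asking each person their name.

    Returns (position of the first match or None, number of people asked).
    """
    asked = 0
    for position, name in enumerate(line):
        asked += 1  # one more person has been asked
        if name == target:
            return position, asked
    return None, asked  # reached the end of the line without a match


# A hypothetical line of participants, left to right.
participants = ["Marco", "Li", "Anita", "Sam"]

print(linear_search(participants, "Anita"))
print(linear_search(participants, "Zoe"))
```

When the name is missing, everyone in the line gets asked; that worst-case cost is the kind of thing a room full of people acting out the search can make tangible.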

Idea #2: Data Journals

Data can be seen as the traces of the interactions between you and the world around you. With this definition in mind, in our classes we ask students to keep a journal of every piece of data they create during a 24-hour period (see some examples). This activity targets identifying when and where data is being collected. We then facilitate a discussion about these journals, asking students which ones creep them out the most, which opens up a great chance to weigh the real and potential ethical implications.

Idea #3: Reverse Engineering Algorithms

We’ve seen a bunch of great work recently on reverse engineering algorithms, trying to understand why Amazon suggests certain products to you, or why you only see certain information on your Facebook. We think there are ways to bring this research to the personal level by designing experiments individuals can run to speculate about how these algorithms work. Building on Henry Jenkins’s idea of “Civic Imagination”, we could ask people to design how they would want the algorithms to work, and perhaps develop descriptive visual explanations of their own ideas.

Get Involved!

We think each of these three can help build big data literacy and try to address big data’s empowerment problem.  Read the paper for some other ideas.  Do you have other ideas or experiences we can learn from?  We’ll be working on some of these and look forward to collaborating!

Towards a Concept of “Popular Data”

I was recently invited to give a Skype keynote for the first hackathon hosted by the state of Minas Gerais in Brazil. The talk was a wonderful provocation to revisit the writing of another Brazilian I used to study – Paulo Freire and his vision of popular education. This led me to wonder… what would a model of “popular data” look like? Answering this requires agreeing that there is a problem, and that the problem merits a popular education approach. This post is an exploration, so I end by proposing a few grounding principles for a concept of “popular data”. Is this a useful concept?

The Problem

Governments large and small are speaking of open-data platforms and data-informed decision making. They share with us a vision of responding to citizen concerns more accurately and efficiently based on data. These governments are speaking the language of data, but most people don’t understand it. This is the core problem that I address with my Data Therapy project.


Can Popular Education Help?

If you don’t speak the language used by your government to make decisions, then you can’t participate in those decisions. This disempowers people, and popular education is an approach for rectifying disempowering situations. The city I live in, Somerville, MA, has a program called “ResiStat” that is intended to

bring data-driven discussions and decision-making to residents and promote civic engagement via the internet and regular community meetings

This data-centered effort can only engage those that already understand the charts, graphs, and terms they use.  Don’t get me wrong – they don’t deliver a dry academic lecture at their community meetings.  However, they do rapidly run through reams of data analysis with an expectation that most in the audience can handle the information-centered explanation.  This leaves out the many residents who don’t speak data at all.

What is Popular Education?

Philosophical definitions are always debated, but here are a few guiding principles most practitioners of popular education would adhere to:

  • participation from all parties
  • learner guided explorations
  • facilitation over teaching
  • accessibility to a diverse set of learners
  • focus on real problems in the community

If you consider this list a litmus test for governmental data programs, few (if any) would pass.  So how do we change this?

Popular Data?

Now that you’re (hopefully) on board with my problem statement, and the idea that popular education can help, let’s play out how. Popular data is my name for engaging, participatory approaches to data-driven presentation and decision-making. Not a great name, but from an academic point of view it puts my work in the right family tree, so I’ll use it for now. How do you structure data programs to practice popular data? Let’s run through each of the tenets listed above and look at some examples.

Participation from All Parties

Popular Data suggests a “big tent” approach; you should get everyone at the table. For instance, far too many open-data initiatives end at the release of the data. The smart ones realize the release is only scaffolding for larger efforts, and make a strong effort to bring non-profits, constituents, and the data makers to the table in order to encourage activity around the data. Sometimes this looks like a hackathon that makes sure to invite lots of segments of society (e.g., the White House hackathon). Sometimes this looks like a presentation of results back to the people the data is about (e.g., Somerville’s ResiStat meetings). There are lots of ways to involve both those in positions of power and those outside of them.

Learner Guided Explorations

Most data presentations are about as engaging as a conversation with your dentist! You kind of have to do it, but it’s booooring. Flipping the model invites your audience to find their own stories in the data. My Data Murals work does just that – our initial “story-finding” workshop shares a small portion of the data about a topic and then lets teams of participants find stories they want to tell.  Participants own these stories and advocate for them.  That is an empowerment story – our evaluations show people come away feeling more capable of finding stories in data, and are less intimidated by data in general.

Facilitation Over Teaching

In my Data Therapy workshops I use a number of activities for building visual literacy. All of these are ways to facilitate a discussion of data presentation, and build a shared language for describing data.  When data scientists introduce ideas they too often fall back on big words.  These words alienate those who haven’t studied data.  My first step is to use language a normal person would use.  Then I help the group construct their own language for describing data, which they fully understand.

Accessibility to a Diverse Set of Learners

I spent years designing interactive museum exhibits. Museums are the hardest setting I’ve ever designed for. At a museum you know nothing about your audience; your object has to support 30-second interactions with a single person, but also 1-hour interactions facilitated by a knowledgeable docent. This is hard. Really, really hard. Data presentations and activities need to be designed the same way. I address this by starting simple and building to complexity. In data presentations I break the audience into small groups and seed each with one person who does speak data, to help the others understand technical issues.

Focus on Real Problems in the Community

This one is easy! Make the data you are working with or presenting relevant to the communities you are working with. In the workshops I lead in the Boston area, I use the Somerville happiness survey as my silly example data set.  I wouldn’t do that for a group of public health wonks (I’d use something from the WHO).  People are naturally inclined to be engaged about the community they live in – no need to introduce data from some far off community they have no relation to.

Is this Useful?

Ok, so I’ve made my argument – I see every dataset as an opportunity for engagement. Engagement with the public, the people the data is about, the people who collected it, everyone. If you’re reading this, it’s up to you to use a Popular Data approach to seize the opportunity for engagement a dataset gives you. I find this framework useful for structuring my data presentations and workshops. Let me know what you think! Am I just naming something obvious? Am I being too academic?

crossposted to my Civic Media blog

The Case for Informal Visualization

Data visualization is all over the place. On the hype curve, we’re clearly up in the area of inflated expectations. If you listen to the reporting, you could be forgiven for thinking dataviz is going to bring world peace! I’m writing to beat the drum in favor of more informal presentations. You can tell better data stories, and engage your audience more, by creating less formal data presentations.

Some Examples

What do I mean by “informal visualization”? To start, toss out your computer, printer and graph paper. Pull out your crayons, big paper, tape, and your imagination.


Another example is the Data Mural pilots I’ve been doing with artist, facilitator (and my wife) Emily Bhargava.  We’re leading groups through finding a story in their data, creating a collaborative visual design for a mural, and then painting it! (read more on my Data Therapy blog and Emily’s Connection Lab blog).

Stuff Academics Say

I work at a university, so I have to mention some of the research in this area.  First up – there is a great paper out of the City University of London, called “Sketchy Rendering for Information Visualization“.  Basically, they get a computer to draw graphs as if they had been drawn by hand.  My main takeaway was that their “sketchy” graphs engaged people more than the more “official” looking ones with straight lines.

Secondly, the Data Stories podcast had a recent episode called “Data Sculpture” in which they spoke with people investigating physical data presentations. If you listen to it, be prepared for a lot of academic jargon – their audience is not the general public. My main takeaway from the paper referenced (“Evaluating the Efficiency of Physical Visualizations“) was that when people physically touched the 3D objects representing the data, they did a better job understanding the data.

It’s Arts & Crafts Time

Beyond these examples, and academic rationale, making informal visualizations is just flat out more fun. As with most things, I think there is a cultural issue involved here. Western culture has an inexplicable (to me) emphasis on professionalism and looking like an expert. When I’ve worked in Central America, South America, and India I’ve found professionals more welcoming to informal data presentations like those I show above. Perhaps this was due to resource constraints, but it almost always led to better sessions.

While doing my master’s in the Lifelong Kindergarten group here at the MIT Media Lab, I fully joined the tribe that talks about how making physical things is the best way to communicate your ideas. This “constructionism” approach has fueled all my work since then, and I see this call for informal visualization as a way to bring it to the dataviz world.

So what does this mean in practice? For me, I’ve taken to doing less on the projector and more on paper. I encourage community groups I work with in Data Therapy sessions to partner with local artists and schools. I push businesses and organizations to think harder about their audience and goals before jumping into making a data presentation. (PLUG: come to my “Fight the Bar Chart” meetup here in Boston to learn more about that)

If you want to look like a “sage on the stage”, by all means be as formal as you can. However, if you want to engage your audience around a data story, try having some arts and crafts time before your next data presentation.

Cross-posted to the MIT Center for Civic Media blog

Audience Literacy

Defining your audience is 90% of what it takes to create an effective data presentation. This is hard to do.  Sometimes there are multiple audiences you’re trying to talk to at the same time.  One of the key ideas that can help you define your audience is to think about their literacy.

I’m using the word in more than its typical “I can read well” sense. Here are some questions you should ask yourself:

  • How literate is my audience about the issue I’m presenting?
  • What pictures or graphs are most appropriate for their visual literacy level?
  • How literate is the audience about me and where I’m coming from as a presenter?

The answers to these questions should inform what data presentation technique you pick.  When talking about creative data presentation options, a common comment I hear is that some parts of the audience want to see the “real” data – where real means “numbers in a table.”  To that I say fine, all well and good.  Supply a handout or an appendix that includes the data in tabular form.  That lets you please the traditional numbers people, but doesn’t stop you from engaging the rest of us that get bored by long lists of numbers.

What You Should Do:

Flesh out the definition of your audience(s) by thinking about their literacy. Use different and/or multiple techniques based on their background and knowledge.  Remember that their literacy will increase as you present, so don’t be complexity-phobic.

“Physicalize” Your Data

There are lots of people excited about fancy-pants computer-generated data pictures right now, but I want to remind you that doing things in the physical world can often be more compelling.  Externalizing our ideas into real objects gives us something we can interact with and talk around with other people. Here’s a concrete example.

This photo shows a soda bottle filled up with just the amount of sugar in that drink.  This is a bit of a classic public health example; most people are surprised at the amount of sugar in a soda.  Representing this physically brings home the idea that when you drink the bottle, you’re consuming that amount of sugar.  A bar chart would be far less compelling, and you wouldn’t be able to relate to it.  This is a simple example, but the underlying concept is clear.

What You Should Do:

Consider whether your data can be brought off the page (or screen). We live in an interactive, three-dimensional world, so you should be creative about bringing your data presentation into it. Surprising your audience with a novel display can engage them long enough for you to tell the rest of your story.

Background Information:

Here’s my standard breakdown of this data presentation:

  • Who – group advocating for healthy eating decisions
  • Goal – inform the audience about the amount of sugar they consume when drinking a bottle (and possibly change their behavior)
  • Audience – general public
  • Data – the amount of sugar in a bottle of soda
  • Technique – “physicalize” the data
  • Tools – soda bottle and sugar

Tip of the Iceberg

I think many approaches to psychotherapy are about revealing what lies under the surface, so let’s carry on in that tradition… when you think about presenting your data, don’t ignore all the fields of study you are building on – the presentation is just the tip of the iceberg.

Cartography, graphic design, statistics, color theory – you will leverage pieces from all these domains to build your creative data presentations. Each of these is a discipline on its own, so don’t expect yourself to be an expert in them all. Just remember to appreciate all the topics that lie under the surface. Acknowledging them can be helpful when you’re frustrated, because it will remind you that there is a reason this stuff is hard!

Are You Complexity-Phobic?

Many people I work with tell me they’re worried about using something other than a bar chart to visually represent their data, primarily because they think their audience isn’t ready for it. They are, very reasonably, expressing their concerns about visual literacy (which I’ll discuss more at another time). I hope to break down this worry by presenting techniques to work around it. In this post I’ll start by pointing to a website from a company that does another kind of therapy – the online dating site OkCupid.

OkCupid, seeing their data as an asset, used to publish an insightful and entertaining blog called OkTrends. They were trying to come up with dating / relationship advice for people based on their warehouse of dating data. My goal in sharing this example isn’t to help you take more attractive pictures of yourself – but rather to talk about the way they share their complex data. These are very nerdy statistics people, but they present their data in an entertaining and informative way. After reading their blog for a while it became clear to me that they serve as a great example of some of the presentation strategies I like best. Here are two examples that showcase how they start with something simple and build to something complex.

In a post about lies people tell online they start off with a cartoon-based joke about pretending to be someone you’re not. Through their explanation they move to a complex, uncommon visualization showing how often men get contacted based on their age and income.

In another post, about what white people actually like, they start with a tag cloud of what people have said they are interested in.  Over the course of a single post they move to a complicated, multi-dimensional graph that correlates religious beliefs to writing proficiency.  Crazy.

What You Should Do:

Don’t worry about having an overly-complex data story.  Start with something simple and fun to get your audience interested, then they’ll be ready for your more complex data presentation once you get to it.