The Just Data Cube (at D4BL II)

We live in a world increasingly governed by algorithms. Sadly, these algorithms aren’t creating a more neutral world of equitable outcomes; they are baking in historical injustices long present in society at large (1). Not surprisingly, those who are currently denied power in society are the most at risk in this data-driven future. Their voices should be amplified and set the agenda. The Data for Black Lives organization works to bring together one community of folks working on this: data scientists, organizers, and more.

Over the weekend I was honored to be invited to create an experience for their second conference. After chatting through some ideas with them to understand their goals, we decided to focus on encouraging attendees to share their commitments and their demands for a more just data present (and future). I designed and created the Just Data Cube to do this in a fun way that would also make the otherwise-boring common space a little more interesting!

Attendees add commitments to themselves on the inside of the cube.

The 7-foot cube invited people to write their commitments to themselves on its inside walls. What actions did they want to take to create data-driven justice? Of course, it isn’t just up to us. Organizations, companies, and governments are crafting the data-driven surveillance society we live in. So on the outside they were invited to write their demands of those actors. What must the data-powerful do to create a more equitable approach to data?

One attendee’s commitment.

My hope was to create a space for people at the event to share how they approach working on justice in their work with data. Similarly, in the spirit of founder Yeshimabeit Milner’s open letter to Facebook, I wanted folks to share their demands of those in power who have all the data.

The Just Data Cube was more than simply a wall for graffiti. These commitments and demands inspired other participants who read them as the wall filled up. This qualitative dataset (once we type it up) will feed into other programs and processes D4BL has underway!

For me, the Cube is another attempt to find arts-based invitations to engage with complicated and impactful issues. From my point of view it was a success! People engaged with both the object and the process it invited:

  • they sought it out to sit under;
  • they socialized within it;
  • they contributed their thoughts to it;
  • they read the commitments and demands from other people.

The Cube livened up the otherwise dry conference space with a unique invitation to break the rules: it encouraged you to write on the walls! I talked with a lot of the people writing up comments, and got the overall impression that they were both delighted (to have a fun thing to do) and also challenged (to have the prompt be one that forced them to think).

Thanks to the D4BL organizers for giving me a chance to participate, and if you were there I hope you found the cube both fun and inspiring!

(1) To learn more about this see my older post about algorithms as mirrors, the Algorithmic Justice League, Algorithms of Oppression, Weapons of Math Destruction, or Automating Inequality.

Using Data for More than Operations

While at Stanford to talk about “ethical data” I had a chance to read through the latest issue of the Stanford Social Innovation Review within the walls where it is published. One particular article, Using Data for Action and Impact by Jim Fruchterman, caught my eye. Jim lays out an argument for using data to streamline operational efficiencies and monitoring and evaluation within non-profit organizations. This hit one of my pet peeves, so I’m motivated to write a short response arguing for a more expansive approach to thinking about non-profits’ use of data.

This idea that data is confined to operational efficiency creates a missed opportunity for organizations working in the social good sector. When giving talks and running workshops with non-profits I often argue for three potential uses of data: improving operations, spreading the message, and bringing people together. Jim, whose work at Benetech I respect greatly, misses an opportunity here to broaden the business case to include the latter two.

Data presents non-profits with an opportunity to engage the people they serve in an empowering and capacity-building way, reinforcing their efforts towards improving conditions on whatever issue they work on. Jim’s “data supply chain” presents the data as a product of the organization’s work, to be passed up the funding ladder for consumption at each level. This extractive model needs to be rethought (as Catherine D’Ignazio and I have argued). The data collected by non-profits can be used to bring the audiences they serve together to collaboratively improve their programs and outcomes. Think, for example, about the potential impacts for the Riders for Health organization he discusses if they brought drivers together to analyze the data about their routes and distances. I wonder about the potential impacts of empowering the drivers to analyze the data themselves and take ownership of the conclusions.

Skeptical that you could bring people with low data literacy together to analyze data and find a story in it? That is precisely a problem I’ve been working on with my Data Mural work. We have a process, scaffolded by many hands-on activities, that leads a collaborative group through analyzing some data to find a story they want to tell, designing a visual to tell that data-driven story, and painting it as a mural. We’ve worked with people around the world to do this. Picking it apart leaves us with a growing toolkit of activities being used by people around the world.

Still skeptical that you can bring people together around data in rural, uneducated settings? My colleague Anushka Shah recently shared with me the amazing work of Praxis India. They’ve brought people together in various settings to analyze data in sophisticated ways that make sense because they rely on physical mappings to represent the data.

Charting crop production and rainfall trends over time.
Yes, that looks like a radar chart to me too.

These examples illustrate that the social good non-profits can deliver with data is not constrained to operational efficiencies.  We need to highlight these types of examples to move away from a story about data and monitoring, to one about data and empowerment.  In particular, thought leaders like SSIR and Jim Fruchterman should push for a broader set of examples of how data can be used in line with the social good mission of non-profits around the world.

Cross-posted to the blog.

Talking Visualization Literacy at RDFViz

Just yesterday I was in a room of amazing friends, new and old, talking about what responsible data visualization might be. Organized by the Engine Room as part of their series of Responsible Data Forums (RDF), this #RDFViz event brought together 30 data scientists, community activists, designers, artists and visualization experts to tease apart a plan of action for creating norms for a responsible practice of data visualization.

Here’s a write-up of how the small group I led tackled that question as it applies to building visual literacy.

Building Literacy for Responsible Visualization

I’ve written a bunch about data literacy and the variety of ways I try to build it with community groups, but we received strict instructions to focus this conversation on visualization. That was hard! So we started off by making sure we understood the audiences we were talking about – people who make visualizations and people who see/read them. So many ways to think about this… so many questions we could address… we were lost for a bit about where to even start!

We decided to pick four guiding questions to propose to ourselves and all of you, and then answer them by sketching out quick suggestions for things that might help.

  • How can visual literacy for data be measured?
  • How can existing resources for data visualization reach the growing number of non-technical data visualization producers?
  • How can we teach readers to look at data visualization more critically?
  • How can we help data visualization producers to design more appropriately for their audiences?

A difficult set of questions, but our group of four dove into them unafraid!  Here’s a quick run-down on each.  For the record, I only worked on two of these, so I hope I do justice to the other two I didn’t directly dig into.

Measuring Visual Literacy


This is a tricky task, fraught with cultural assumptions.  We began by defining it down to the dominant visual form for representing data – namely classic charts and graphs.  This simplified the question a little, but of course buys into power dynamics and all that stuff that comes along with it.

Our idea was to create an interactive survey/game that asks people to read and reason about visualizations.  Of course this draws on a lot of existing research into visual- and data-literacy, but in that body of work we don’t have an agreed-upon set of questions to assess this.  So we came up with the following topics, and example questions as a thing to think about.

  1. Can you read it?  This topic tried to address the question of basic visual comprehension of classic charting.  The example question would show something like a bar chart and ask “What is the highest value?”.
  2. What would you do? This topic digs into making reasoned judgements about personal decisions based on information shown in a visual form.  The example question is a line chart showing vaccination rates over time going down and people getting measles going up; asking “Would you vaccinate your children?”.
  3. What can you tell? Another topic to address is making judgements about whether data shows a pattern or not.  The example question would show a statement like “Police kill women more than men – true or false?” and the answers could be “true”, “false” and “can’t tell”.
  4. What’s the message? More complex combinations of charts and graphs are often trying to deliver a message to the reader.  Here we could show a small infographic that documents corruption somewhere.  Then we’d ask “What is the message on this graphic?” with possible answers of “corruption is rampant”, “corruption happens” and “public funds are too high”.
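
To make the survey idea a little more concrete, here is a minimal sketch of how the four topics above might be stored and tallied. It is only an illustration, not something we built at the event: the `Question` class, the `score` function, and the bar names "A", "B", "C" are hypothetical, and the answer key is left unset wherever the right answer would depend on a chart we never drew.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Question:
    topic: str                      # one of the four survey topics
    prompt: str                     # question shown next to the visual
    choices: Tuple[str, ...]        # answer options offered to the reader
    expected: Optional[str] = None  # None = no single right answer is keyed


# Hypothetical question bank built from the example questions in the post.
QUESTIONS = [
    # Assumes an imaginary bar chart with bars A, B, C where B is tallest.
    Question("Can you read it?", "What is the highest value?",
             ("A", "B", "C"), expected="B"),
    # A personal judgement, so nothing is keyed as correct.
    Question("What would you do?", "Would you vaccinate your children?",
             ("yes", "no")),
    # The key depends on the data shown, so it is left unset here.
    Question("What can you tell?", "Police kill women more than men",
             ("true", "false", "can't tell")),
    Question("What's the message?", "What is the message on this graphic?",
             ("corruption is rampant", "corruption happens",
              "public funds are too high")),
]


def score(questions, answers):
    """Return {topic: (correct, gradable)} for one reader.

    `answers` maps each question prompt to the choice the reader picked.
    Questions without a keyed answer are listed but never graded.
    """
    result = {}
    for q in questions:
        correct, gradable = result.get(q.topic, (0, 0))
        if q.expected is not None:
            gradable += 1
            if answers.get(q.prompt) == q.expected:
                correct += 1
        result[q.topic] = (correct, gradable)
    return result
```

Keeping the tally per topic rather than as one overall number is a deliberate choice in this sketch: it would let a community see where reading charts is easy and where reasoning from them is harder.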

These are just four topics, and we know there are more.  We’re excited about this survey, and hope to find time and funds to review existing surveys that assess various types of literacies so we can build a good tool to help people measure these types of literacies in various communities!

Choosing the Right Visualization for Your Audience

We have a vast and growing array of visualization techniques available to us, but few guidelines on how to use them appropriately for different audiences. This is problematic, and a responsible version of data visualization should respect where an audience is coming from and their visual literacy. With that in mind, we propose to create a library of case studies where each one creates different visualizations from the same dataset, making the same argument, for different audiences.

For example, we sketched out ways to argue that police violence is endemic in the US, based on a theoretical dataset that captures all police-related killings. For a low visual literacy individual (maybe a 10-year-old kid) you could start by showing a face of one victim, and then zoom out to a grid of all the victims to show the scale of the problem while still humanizing it. For the medium literacy audience (those that watch the evening news each night on TV), you could show a line chart of killings by year. For a high literacy audience (reading the New York Times) you could do an interactive map that shows killings around the reader’s location as they compare to nation-wide trends.

You could imagine a library of many of these, which we think would help people think about what is appropriate for various audiences. I’m excited to give this to students in my Data Storytelling Studio course as an assignment!

Learning to Read A Data Visualization

Our idea here was to create a quick how-to guide that lists things you should ask when reading a data visualization. Imagine a listicle called “15 Things to Check in any Data Visualization”! The problem here is that people aren’t being introduced to the critical techniques for reading visualization, to identify when one is being irresponsible.

Some things that might be on this list include:

  • Is the data source identified?
  • Are the axes labelled correctly?
  • What is the level of aggregation?

This list could expose some of the common techniques for creating misleading visualizations.  Next steps?  We’d like to crowd source the completion of the list to make sure we don’t miss any important ideas.
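
As a rough illustration of how such a list could be put to work, here is a small sketch that runs the three example questions above against a description of a chart. Everything about it is an assumption for illustration: the `review_visualization` function and the `source`, `x_label`, `y_label`, and `aggregation` keys are hypothetical, and the code can only notice that something is missing, not whether what is there is correct.

```python
def review_visualization(meta):
    """Return the checklist questions this visualization leaves unanswered.

    `meta` is a hypothetical dict describing a chart. Each check only tests
    whether the information is present; judging whether a label or source
    is actually correct still needs a human reader.
    """
    checks = [
        ("Is the data source identified?",
         bool(meta.get("source"))),
        ("Are the axes labelled correctly?",
         all(meta.get(key) for key in ("x_label", "y_label"))),
        ("What is the level of aggregation?",
         bool(meta.get("aggregation"))),
    ]
    # Keep only the questions whose check did not pass.
    return [question for question, passed in checks if not passed]
```

A crowdsourced version of the full list could grow the `checks` table one question at a time without changing anything else.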

Helping Non-Experts Learn to Make Data Visualizations

This is a huge problem. The hype around data visualization continues to grow, and more and more tools are being created to help non-experts make them. Unfortunately, the materials we use to help these newcomers get into the field haven’t kept pace with the huge rise in interest!

We proposed to address this by better defining what these new audiences need to know.  They include:

  • human rights organizations
  • community groups
  • social movements

And more!  A brief brainstorm resulted in this list of things they are trying to learn:

  • how to select the right data to visualize?
  • what types of charts are best suited to understand what types of data?
  • what cultural assumptions are reflected in what types of dataviz?
  • how do design decisions (e.g. color) affect how readers will understand your data visualization?

This is just a preliminary list of course.

Rounding it Up

Problem solved!

Just kidding… we have a lot of work to do if we want to build a responsible approach to literacies about data visualization. These four suggestions from our small working group at the RDFViz event are just that – suggestions. However, the space to approach this from a responsible point of view, and the conversations and disagreements it allowed, were invaluable!


Many thanks to the organizers and funders, including our facilitator Mushon Zer-Aviv, our organizers at the Engine Room, our hosts at ThoughtWorks, Data & Society and Data-Pop Alliance, and our sponsors at Open Society Foundations and Tableau Foundation.  This is cross-posted to the MIT Center for Civic Media website.

Civic Visualization: Student Sketches

I just wrapped up teaching a 3-week, 5-session module for MIT undergraduates on Data Scraping and Civic Visualizations (read more posts about it). As their final project I asked students to use some Boston-centric data to sketch a civic visualization. Here’s a quick overview of their final projects, which I think are a wonderful example of the diversity of things folks can produce in a short amount of time. Remember, these are sketches the students produced as their final projects… consider them prototypes and works-in-progress. I think you’ll agree they did amazing work in such a short amount of time!

1.5 Million Hubway Trips


Ben Eysenbach and Yunjie Li dove into the Hubway bicycle sharing data release.  They wanted to understand how people perceive biking and help planners and bike riders make smart decisions to support the urban biking system. Ben and Yunjie found that Google bicycle time estimates are significantly off for female riders, and built some novel warped maps to show distances as-the-bike-rides across the city.  See more on their project website.

The Democratic Debate on Tumblr


Alyssa Smith, Claire Zhang, and Karliegh Moore collected and analyzed Tumblr posts about the first 2015 Democratic presidential debate. They wanted to help campaigns understand how to use Tumblr as a social media platform, and delve into how tags are used as comments vs. classification. Alyssa, Claire and Karliegh found Bernie Sanders, Hillary Clinton, and Donald Trump were the most discussed, with a heavy negative light on Trump.

Crime and Income Rates in Boston


Arinze Okeke, Benjamin Reynolds and Christopher Rogers explored data sets about crime and income in Boston from the city’s open data portal and the US Census.  They wanted to motivate people to think harder about income disparity and inform political debate to change policies to lower crime rates.  Arinze, Ben and Chris created a novel map and data sculpture to use as a discussion piece in a real-world setting, stacking pennies to represent income rates on top of a printed heatmap of crime data.

Should Our Children Be in Jail?


Andres Nater, Janelle Wellons and Lily Westort dug into data about children in actual prisons. They wanted to argue to people that juveniles are being placed in prisons at an alarming rate in many states in the US. Andres, Janelle and Lily created an infographic that told a strong story about the impact of the cradle-to-prison pipeline.