UN Data Forum: Data and Algorithm (Live Blog)

This is a liveblog written by Rahul Bhargava at the 2017 UN World Data Forum.  This serves as a summary of what the speakers spoke about, not an exact recording.  With that in mind, any errors or omissions are likely my fault, not the speakers'.

Capturing the 21st Century through Data and Algorithm

Dan Runde shares some guiding questions for the panel: Why do we measure stuff?  Do we have the tools to measure the right things? How do we handle changes in technology and methodology?  What about private data? What’s trustable?

Ola and Hans Rosling – President and Co-Founder of Gap Minder

Ola runs the educational non-profit Gap Minder. He begins with a live audience poll to check some facts. They have been asking these fact-based questions across the world. In different places people respond differently. For instance, on average women have far more schooling than respondents in Sweden, the US, and at a TED event think.  Respondents in South Africa were actually closest to the real data. They call this the “ignorance project”.  They bring in Hans Rosling.

Hans explains that just being famous wasn’t enough to change people’s beliefs. It turns out the big CEOs know the world best. Those that deal with big money have stronger instincts for learning how the world really is. This was shocking. There is no way to communicate the SDGs if we don’t measure the impacts of our communication.  Most women have access to contraceptives.  Most children receive the basic vaccines. The data that statistical bureaus produce is meant to drive investment and GDP growth, not just political decision making. We have to broaden who is intended to use this data. Media is a bad way to change people’s world view; they have to be taught it in school.

Pali Lehohla – Statistics South Africa

Minister Lehohla is the Statistician General for Statistics South Africa. He will connect migration, death, and longevity in South Africa.  He shares an interactive map of migration across the provinces.  He shows paths such as the Indians who worked at sugar plantations in the south east, moving to Gauteng. The white population makes money in Gauteng, and then moves to the Western Cape to enjoy their money.  These connect to the death rates in each of these provinces; for instance they are lower in Gauteng.  Death is exported from there.  Death rates are a function of how society is organized.

Minister Lehohla walks through a Gap Minder chart of South African life expectancy. In 2008 or 2009 life expectancy in South Africa rose very quickly, though income per person was flat.  In Gauteng and the Western Cape people live longer. You must avoid Free State because you’ll die younger.

Switching to child mortality, Minister Lehohla argues for geographic breakdowns of data to understand it better. In this animation, a lot of the data disappears after 2004.  This is because the municipalities changed, so the data can’t be compared well.  These political decisions cause statistical problems.

Talking about complexity is the task of statisticians. You have to project value-add.  Putting it in a narrative and explaining it is the task of the chief statistician of a country.  We have to organize ourselves in a way that helps us measure the SDGs.

Emmanuel Letouze – DataPop Alliance

Manu is the director of the DataPop Alliance.  Manu will talk about statistical measurement and societal development in the age of data abundance and algorithmic analysis. There are a number of rationales for measuring things. We think that measuring something means we care about it, and can have an effect on it. Is better data really the problem?

Manu doesn’t really measure his two children directly.  Even when you care deeply about something, it doesn’t mean you measure it. This is an important caveat in the theory of measurement. GDP was invented in the 1930s as a measurement of production.  This is a good example of something you measure because you want to change it. There are negative consequences to this of course. This was invented in an industrial, data-poor era. In the age of algorithm this makes little sense. For instance, GDP doesn’t capture the consumption of free data.

Now we know we need to measure other things. With data like hundreds of millions of credit card transactions you can identify groups of people who behave similarly (i.e. tribes). Manu believes in open algorithms to get around the worry of leaked data.  The OPAL architecture is an attempt to send open algorithms to operate on private sector data.

The outcomes and processes of measurement have to be more meaningful in this day and age.

Anne Jellema – World Wide Web Foundation

Anne is the director of the World Wide Web Foundation.  Gap Minder’s Ignorance Project shows just how disconnected people are from official data.  This can lead to apathy, distrust, resentment. For instance, people vastly overestimate the number of refugees that have entered their countries. It can lead to denial, like in South Africa in relation to the AIDS epidemic.  For instance, one of the outcomes of this conference is to include women’s unpaid care work in counts.  This will value women’s contributions in policy decisions.  Another example is including data on climate change.

Data can help improve people’s lives and advance the SDGs. The experience at the WWW Foundation shows that the benefits are far greater when people participate: when they are involved in designing, collecting, and using data.  A project in Ivory Coast, with the UN, Data2x, and Millennium Foundation showed this. They worked cross-sector to use data to tackle the real problems facing women there.  They not only used existing data, but found gaps in the data that would help if filled and made openly available. For example, if clinics and hospitals could share information about shortages they could shift pregnant women to places where resources would be available, so pregnant women wouldn’t be turned away.  In the process of sharing and discussing data, trust was built between government and NGO groups.

These are examples of how CSOs can engage with government with data to solve problems and meet the SDGs.  Unfortunately, the collection of data has been monopolized by the state, with no participation. The chief reason is accountability.  Technology allows a shift towards more participatory techniques.  However, the rise of big data could make this worse – Manu’s “elite capture”. The majority of data capture is controlled by the private sector now. This is our data, but it belongs to the companies now, and they are not accountable. This is a challenge we need to confront.

We have to open government data to a data commons.  Only 10% of non-personally-identifiable government data is fully open (source). The numbers are similarly low for sector-specific basic data (health and education, environment, etc).  Government spending data is among the least open in the world. A lot is available online, but little of it is “fully open”.

In the US many civic decisions are being left to algorithms now.  We need to be able to interrogate and challenge these, just as we can for standard governmental statistics.  This is critical for informed citizenship.

What does trust mean?

Manu: One kind is trust within society between different groups.  Another is the trust you build as you engage in data collection processes.  This is a strong rationale for national statistics.  Third is the trust in statistics themselves; in the outcomes. This allows a democratic debate about a shared agreement.

Pali: Trust is about integrity. Trust is also about justice.  We know we are fallible.  In the statistics community we are too gentle with each other. We need to confront our failures.  That is what builds trust.

Ola: Trust is a feeling, an emotion. I trust Pali, but I’m not sure why. This is also called confidence.  The over-confidence in this room is enormous.  We trust ourselves, even when we shouldn’t.  I know this because, as a white Caucasian male, I speak to others and we trust each other.  This group just performed worse on my quiz than chimpanzees.

Anne: The latest Edelman global trust barometer indicates there is an implosion. Trust is at an all-time low.  We have to hold ourselves responsible for starting to restore some of that.  We just saw the damage this can do.  So how do we rebuild trust?  One thing we learn from the open-source community is that the more people can be involved in interrogating something, the greater their trust.  This is the opposite of how statisticians think about process.  We should welcome contributions from others.

If you had a magic wand, what would you want to measure?

Manu: It is a matter of finding out what people care about. We don’t have good processes for this.  This matters as much, or more, than the outcome itself.

Pali: Public opinion is very flimsy, but it counts. It reflects inner-being and skepticism.  We need to understand this. In the last local government election in South Africa they measured physical things. When asked for opinions of satisfaction, they showed deep levels of dissatisfaction, out of line with the growth in physical things.

Ola: Knowledge. We’re not measuring the impact of our communication.  Asking voters how to do it is giving up our responsibility.  Measure yourself and your staff, and what you know. Activists score worse than anyone else in their own fields.  They exaggerate their world view of the problem. In the US only 5% got a question about the world’s extreme poverty rate right. They didn’t know it was decreasing. We need to point our fingers at ourselves first.

Anne: Gender data is vitally important.  Secondly I’d ask for joining up the existing data we already have.  This is how you unlock the power of data. This is a therapy session for us to confess our mistakes.

Q&A

(Missed it, sorry)