Data for Change
Five years have passed since former President Barack Obama introduced Dr. DJ Patil as the first-ever Chief Data Scientist of the Federal Government and called on data scientists to join the effort to change the country and the world for the better (keynote video).
But how can the world really be changed for the better through Data Science?
Most of the examples I found showed how its methods and techniques can help in a business setting. However, since my background is in evaluating public policies and programs, I wanted to learn more about how Data Science can be used to determine the impact of a program or policy, or to design evidence-based public policies.
So I set out to find what resources exist, what real use cases look like, and how Data Science is being used for social good. Here is what I found in terms of resources, use case examples, and opportunities, for those who wish to improve their skills in the field and work toward greater social impact.
The Center for Data Science and Public Policy at Carnegie Mellon University: On their website, you can find examples of projects in which data science has had a big impact in social science and public policy. The fields they have contributed to include criminal justice, education, economic and workforce development, energy, environment, public health, transportation and infrastructure, and public safety. Some real use case examples are an early intervention system for preventing adverse police incidents, preventing housing violations in San Jose, reducing water shutoffs through behavioral and data analysis, and reducing HIV infections while improving engagement in HIV medical care. Each project is described in detail on their website, which makes them great case studies for anyone looking for inspiration.
Even better, each year they organize the Data Science for Social Good Summer Fellowship, which brings aspiring data scientists from across the world to Carnegie Mellon University to work for three months on data science projects in partnership with nonprofits and government agencies, learning from experienced mentors and project managers. Fellows are paid for the duration of the program and are provided with housing. To apply, keep an eye on their website, since the application process takes place at the beginning of each year.
The program's success led the Alan Turing Institute, the UK's national institute for data science and AI, to replicate it. This year, due to COVID-19, DSSGx appears to be taking place remotely.
The Alan Turing Institute's website and blog also provide interesting and useful material on data science applications for social impact. Starting this year, they have also launched a new internship program, the Turing Internship Network, which pairs doctoral students with industry. Unfortunately, these internships are open only to those who have the right to work in the UK.
DataKind, whose guiding statements are 'Harnessing the power of data science in the service of humanity' and '…use data to not only make better decisions about what kind of movie we want to see, but what kind of world we want to see.' If you are new to the field and wish to gain experience, you can apply to their volunteering program. For more experienced data scientists, volunteering can be a way of giving back to society and doing meaningful work. One advantage is that, besides their New York headquarters, they have chapters in Bangalore, San Francisco, Singapore, the UK, and Washington, DC. Their advisors include DJ Patil and many other prominent data scientists.
DataforPolicy.org: a UK non-profit that describes itself as a global forum for interdisciplinary and cross-sector discussions around the impact and potential of the digital revolution in the government sector. They organize an annual conference, and the papers presented there are published open access in a community on the Zenodo platform. Example topics include data-driven urban systems for sustainable smart city development, reducing corruption in public procurement using machine learning, fraud detection models, and how to machine-extract archival data.
Microsoft Research Data Science Summer School: I am listing this opportunity last because, even though it is exceptional, it is open only to college students from the New York City area. The Summer School is an intensive, four-week, hands-on introduction to data science, and the selected students also receive a stipend for participating.
Evaluation and Data Science
Agriculture, health, and education are among the sectors in which big data has had the most impact.
For those interested in evaluating public policies and programs, a good read is the report published this year by Pete York, Chief Data Scientist at BCT Partners, and Michael Bamberger, Development Evaluation Advisor. It is freely available online, as its publication was supported by the Rockefeller Foundation. What I like about this report is that it takes a closer look at how big data and data science can serve impact evaluation: the methodological aspects, the conditions necessary for integrating data science and evaluation, and the areas of convergence as well as the points of disagreement. It also includes a case study of how the performance of a child welfare program was improved using machine learning, predictive analytics, and other big data techniques.
Another very useful article I came across while researching how data science can be used in the public sector is Alex Engler's 'What all policy analysts need to know about data science'. Alex is a Fellow at the Brookings Institution who studies the implications of AI and emerging data technologies for society and governance, and I have found his work worth following.
Sources of Government Data
Now let's talk about sources of data, in case you want to explore them and even try your hand at working with them.
The great news is that more and more governments are opening up their data and offering free access for anyone to download and use it.
Below, we can see a graph published by the OECD showing how countries are doing in terms of data availability, data accessibility, and government support for data re-use. It comes as no surprise that Korea is leading the way, as it recently showed the world how it can use Big Data and AI to fight COVID-19 (a good description is given in the article here, published on TDS).
- US Government's Open Data (over 211,609 datasets): https://www.data.gov/
- EU Open Data Portal (15,399 datasets): https://data.europa.eu/euodp/en/data/
- European Data Portal (with metadata harvested for 1,076,894 datasets from 36 countries): https://www.europeandataportal.eu/en
- Open Government Data Korea: https://open.go.kr/
These are only some examples. If you wish to see which countries have joined the Open Government Partnership, you can check its list of members.
If you know of any other useful resources or examples, please let me know in a comment, and I will add them to the list.