Population Health Data Science (PHDS)

Population health data science (PHDS) is the art and science of transforming data into actionable knowledge to improve health. Actionable knowledge is knowledge that influences, informs, or optimizes decision making. PHDS supports decision quality.

PHDS has five analytic domains (see Figure 1): (1) description: measuring the burden of risk factors and outcomes; (2) prediction: early targeting of prevention and response strategies; (3) explanation: discovering and testing causal pathways to design prevention strategies; (4) simulation: modeling processes for epidemiologic and decision insights; and (5) optimization: optimizing decision-making, priority-setting, and resource allocation.

Figure 1: The population health data science landscape (source: http://www.bayesia.com).

PHDS is a rapidly growing field that emerged from solving public health problems. In public health practice, we need to influence, guide, and advise decision makers in a relevant and timely way. Decision makers include patients, providers, policy makers, colleagues, and community stakeholders. When possible, this advice should be delivered in real time. For this purpose, peer-reviewed scientific publications are often ineffective and too slow. The bottom-line challenge is this: transforming data into actionable knowledge means improving decision-making in the setting of complex environments, uncertainty, limited information, multiple objectives, competing trade-offs, and time constraints.

PHDS integrates expertise from public health, epidemiology, medicine, statistics, computer science, decision sciences, health and behavioral economics, and human-centered design. PHDS is the future of public health data analysis, synthesis, and knowledge integration. Knowledge integration is the management, synthesis, and translation of knowledge into decision support systems to improve policy, practice, and, ultimately, population health.

Figure 1 summarizes the data science landscape. The general idea is to design human-centered decision support systems and practices that improve and optimize decision-making for everyone from community residents to policy makers. PHDS approaches familiar to public health include: (a) health impact assessment, (b) decision analysis, (c) cost-effectiveness analysis, and (d) cost-benefit analysis. Approaches less familiar to public health include: (a) operations research, (b) Bayesian networks, (c) machine learning, and (d) artificial intelligence. “Big data” refers to the availability of huge data systems with multi-dimensional, longitudinal data on individuals and their environments. Through computer algorithms, these data enable us to describe, predict, explain, and optimize the human experience, primarily by influencing human choices (decisions), by targeting public health interventions, and by conducting causal research.

Biostatistics and epidemiology, the quantitative sciences of public health, are essential components of PHDS. Epidemiologists contribute causal inference, risk assessment, and decision analysis, and they join data science teams. Biostatisticians contribute statistical learning methods and research.

PHDS with R (book): Transforming data into actionable knowledge

I am writing this book to introduce R—a programming language and environment for statistical computing and graphics—to public health epidemiologists and health care analysts conducting population health analyses. Recent graduates come prepared with a solid foundation in epidemiological and statistical concepts and skills. However, what is sometimes lacking is the ability to implement new methods and approaches they did not learn in school. This is more apparent today with the emergence of data science and the new field of population health data science (PHDS)—the art and science of transforming data into actionable knowledge to improve health.