265519 New frontiers in data analytics: How big data technologies enable novel applications in public health

Monday, October 29, 2012 : 10:30 AM - 10:50 AM

Matthew Dollacker, PMP, CSC, Atlanta, GA
Relational databases, one of the foundational technologies of the information revolution, are beginning to give way to new approaches to storing, managing and analyzing data. These new ‘Big Data’ technologies enable use cases that, only a few years ago, were difficult or impossible to achieve and extremely costly to maintain. Centered on the concept of linear horizontal scalability on commodity compute hardware, these database and information analysis systems build upon approaches formalized in part by Google and Amazon. Within the past few years, new open source tools (Hadoop, MongoDB, Mahout) and commercial services (EC2, DynamoDB) have emerged that make it possible to leverage these innovations simply and cheaply, revolutionizing the way industry and academia work with data.

These new approaches require new thinking about how to structure and analyze data, but they can open new frontiers for the discipline of public health informatics. These frontiers revolve around three main dimensions: 1) the ability to efficiently store and query immense data sets of terabytes or even petabytes in size, 2) the ability to run computationally intensive algorithms on this data, including artificial intelligence and machine learning techniques such as cluster analysis, pattern mining, and Bayesian analysis, and 3) the ability to more easily combine, compare and share extremely large data sets. This last dimension in particular allows for dramatic shifts in the types of analysis and applications public health institutions can perform, freeing large data sets from the confines of the institutional firewall so that they can be more readily combined and ‘mashed up’ with other large data sets.

This talk will focus on exploring the important considerations in using these technologies for novel applications in the public health arena.

Learning Areas:
Administration, management, leadership
Biostatistics, economics
Communication and informatics
Systems thinking models (conceptual and theoretical models), applications related to public health

Learning Objectives:
1. List the three frontiers of data analysis enabled through Big Data technologies
2. Describe the fundamentals of the MapReduce distributed computing pattern
3. Discuss public health informatics use-cases enabled through Big Data technologies
4. Describe the differences between relational and NoSQL database technologies
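
As a rough illustration of the MapReduce pattern named in objective 2, the sketch below counts reports per diagnosis in a single Python process. It is not taken from the talk: the records, the county and diagnosis values, and the function names (map_phase, shuffle, reduce_phase, map_reduce) are all assumptions made up for this example. A framework such as Hadoop would run the map and reduce steps in parallel across many machines; here they run sequentially only to show the shape of the computation.

```python
from collections import defaultdict

# Hypothetical surveillance records: (county, diagnosis) pairs invented
# purely for illustration; they are not real data.
RECORDS = [
    ("Fulton", "influenza"),
    ("DeKalb", "influenza"),
    ("Fulton", "pertussis"),
    ("Fulton", "influenza"),
]


def map_phase(record):
    """Map step: emit one (key, 1) pair per record, keyed by diagnosis."""
    _county, diagnosis = record
    yield (diagnosis, 1)


def shuffle(pairs):
    """Group emitted values by key, the work a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(RECORDS))
```

Because each map call looks at one record and each reduce call looks at one key, the work can in principle be split across commodity machines, which is the horizontal scalability the abstract refers to.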

Keywords: Data/Surveillance, Data Collection

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have been the principal investigator for research into distributed data analytics frameworks geared to the healthcare arena. In addition, I have led four major business intelligence and data warehousing programs for major healthcare and public health institutions. As a key technology and strategy leader in CSC's Federal Health practice, I have advised clients on the applicability of Cloud and Big Data strategies to their businesses and mission areas.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.