A scalable platform for meta-genomic analysis

Edlund, Stefan

334559

Tuesday, November 3, 2015

Stefan Edlund, Public Health Research, IBM Almaden Research Center, San Jose, CA

David Chambliss, PhD, IBM Almaden Research Center, San Jose, CA

James Kaufman, PhD, Public Health Research, IBM Almaden Research Center, San Jose, CA

Dylan Storey, PhD, School of Veterinary Medicine, UC Davis, Davis, CA

Bart Weimer, PhD, School of Veterinary Medicine, UC Davis, Davis, CA

In support of a new consortium to enhance food safety, IBM Research has developed a software platform, the Meta-genomics Compute and Analytics Workbench (MCAW), to perform rapid and scalable informatics. The system's objective is to transform large volumes of raw DNA and RNA sequence data quickly into valid conclusions, with the ultimate goal of detecting potential threats of food-borne illness before they affect public health. MCAW is being used to establish the baseline microbiome of active organisms in samples of food ingredients, to build and share metagenomic libraries from those samples, and ultimately to detect ingredient samples that fall outside the normal range.
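How MCAW decides that a sample falls outside the normal range is not described here. Purely as an illustration, the sketch below assumes a simple per-organism reference range (mean ± k standard deviations of relative abundance across baseline samples, with a hypothetical k = 3) and flags organisms that fall outside that range or never appear in the baseline. The function names, the abundance dictionaries, and the numbers are all hypothetical, not part of MCAW.

```python
from statistics import mean, stdev

def baseline_ranges(baseline, k=3.0):
    """Return {organism: (low, high)} computed from baseline samples.

    Assumption (hypothetical): each sample is a dict of organism -> relative
    abundance, and an organism missing from a sample has abundance 0.0.
    The range is mean +/- k standard deviations (needs >= 2 samples).
    """
    organisms = {org for sample in baseline for org in sample}
    ranges = {}
    for org in organisms:
        values = [sample.get(org, 0.0) for sample in baseline]
        mu, sd = mean(values), stdev(values)
        ranges[org] = (mu - k * sd, mu + k * sd)
    return ranges

def out_of_range(sample, ranges):
    """Return, sorted, the organisms whose abundance lies outside the
    baseline range, plus any organism never observed in the baseline."""
    flagged = set()
    for org, (low, high) in ranges.items():
        if not low <= sample.get(org, 0.0) <= high:
            flagged.add(org)
    flagged.update(org for org in sample if org not in ranges)
    return sorted(flagged)

# Synthetic baseline of three ingredient samples (illustrative numbers only).
baseline = [
    {"Lactobacillus": 0.50, "Pseudomonas": 0.10},
    {"Lactobacillus": 0.55, "Pseudomonas": 0.12},
    {"Lactobacillus": 0.45, "Pseudomonas": 0.08},
]
ranges = baseline_ranges(baseline)
print(out_of_range({"Lactobacillus": 0.48, "Pseudomonas": 0.11}, ranges))
# -> []
print(out_of_range({"Lactobacillus": 0.30, "Pseudomonas": 0.11, "Salmonella": 0.05}, ranges))
# -> ['Lactobacillus', 'Salmonella']
```

A fixed k·SD rule is only one possible choice; with few baseline samples it is unstable, and a production system might prefer percentile ranges or a model of the whole community.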

A MCAW workflow runs informatics programs, such as de novo genome assembly, on a dataset comprising data files (e.g., the raw DNA sequence for a single sample) and laboratory metadata. MCAW automatically parallelizes this processing across cluster nodes and generates aggregate statistics and visualizations of data from multiple samples, to aid trend and factor analysis with respect to the time, temperature, and biochemical metadata of the samples. The full processing history is recorded, so the provenance of every result is clear and any processing step can be reproduced.
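The pattern described above (one processing step per sample, run in parallel, with a provenance record and metadata-based aggregation) can be sketched in a few lines. This is a minimal local sketch under stated assumptions, not MCAW's implementation: a thread pool stands in for a cluster scheduler, `toy_assemble` is a hypothetical placeholder for a real assembler, and the provenance fields, program name, and version string are illustrative choices.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

PROGRAM_NAME = "toy_assembler"  # hypothetical stand-in for a real assembler
PROGRAM_VERSION = "0.1"         # hypothetical version string

def toy_assemble(reads, min_length):
    """Hypothetical placeholder for de novo assembly: it only counts reads at
    least `min_length` long and their total bases. A real workflow step
    would invoke an assembler program here."""
    kept = [r for r in reads if len(r) >= min_length]
    return {"reads_used": len(kept), "bases_used": sum(len(r) for r in kept)}

def run_step(sample_id, record, params):
    """Process one sample and return (result, provenance record)."""
    result = toy_assemble(record["reads"], **params)
    provenance = {
        "sample": sample_id,
        "program": PROGRAM_NAME,
        "version": PROGRAM_VERSION,
        "params": params,
        # Hash of the exact input, so a rerun can confirm it used the same data.
        "input_sha256": hashlib.sha256("\n".join(record["reads"]).encode()).hexdigest(),
        "metadata": record["metadata"],
        "finished_utc": datetime.now(timezone.utc).isoformat(),
    }
    return result, provenance

def run_workflow(dataset, params, workers=4):
    """Fan out one step per sample, then collect results and the history log.
    The thread pool is a local stand-in for distributing work across nodes."""
    sample_ids = sorted(dataset)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_step, sid, dataset[sid], params) for sid in sample_ids]
        outputs = [f.result() for f in futures]
    results = {prov["sample"]: res for res, prov in outputs}
    history = [prov for _, prov in outputs]
    return results, history

def mean_by(results, history, key, field):
    """Average one result field across samples, grouped by a metadata key."""
    groups = {}
    for prov in history:
        groups.setdefault(prov["metadata"][key], []).append(results[prov["sample"]][field])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Synthetic dataset: reads plus laboratory metadata per sample.
dataset = {
    "S1": {"reads": ["ACGTACGT", "GGCC"], "metadata": {"temperature_c": 4}},
    "S2": {"reads": ["ACGTAC", "TTGGCCAA", "AC"], "metadata": {"temperature_c": 25}},
    "S3": {"reads": ["GATTACA"], "metadata": {"temperature_c": 4}},
}
results, history = run_workflow(dataset, {"min_length": 5})
print(mean_by(results, history, "temperature_c", "bases_used"))  # -> {4: 7.5, 25: 14.0}
history_log = json.dumps(history, indent=2)  # what a provenance store might keep
```

Recording the program version, parameters, and an input hash alongside each result is what makes a later rerun checkable; the exact fields a real system stores would depend on its design.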

We will show MCAW results from assembling and cataloguing genomes for 4,100 Salmonella enterica isolates sequenced in the 100K Foodborne Pathogen Genome Project [1]. The number of distinct proteins appears to grow as a power of the number of isolates (exponent α = 0.465); because such a curve keeps rising rather than leveling off, this affirms the importance of integrating a large number of samples when building a pan-genome.
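Only the fitted exponent is reported for the power-law relationship. Writing it as P(n) = κ·n^α, where n is the number of isolates and P(n) the number of distinct proteins, and taking logarithms gives log P = log κ + α·log n, a straight line with slope α. The sketch fits that line by ordinary least squares; this is one common way to estimate such an exponent and is an assumption here, not necessarily how α = 0.465 was obtained. The input points are synthetic, generated from the power law with a hypothetical κ = 5000, so the fit simply recovers the parameters used. With α = 0.465, doubling the number of isolates multiplies the expected number of distinct proteins by 2^0.465 ≈ 1.38.

```python
import math

def fit_power_law(n_isolates, n_proteins):
    """Fit P(n) = kappa * n**alpha by least squares on log P versus log n.

    Returns (kappa, alpha). Requires at least two distinct, positive n values.
    """
    xs = [math.log(n) for n in n_isolates]
    ys = [math.log(p) for p in n_proteins]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx                        # slope of the log-log line
    kappa = math.exp(y_bar - alpha * x_bar)  # intercept, back-transformed
    return kappa, alpha

# Synthetic points on an exact power law (hypothetical kappa = 5000, alpha = 0.465).
n_isolates = [10, 100, 1000, 4100]
n_proteins = [5000 * n ** 0.465 for n in n_isolates]

kappa, alpha = fit_power_law(n_isolates, n_proteins)
print(round(alpha, 3), round(kappa))  # -> 0.465 5000
print(round(2 ** alpha, 2))           # growth factor when isolates double -> 1.38
```

Real pan-genome curves are often built by adding isolates in many random orders and averaging the counts, and the fitted exponent can depend on which range of n is included.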

[1] http://100kgenome.vetmed.ucdavis.edu/

Learning Areas:

Public health biology
Public health or related research

Learning Objectives:

Explain how genomics measurements can reveal threats to food safety.
List the functional requirements for an online metagenomics analysis service.
Compare the scalability and efficiency of manually driven informatics with that achieved using an automated framework.

Keyword(s): Genetics, Food Safety

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am a senior software engineer in the Public Health Research team at IBM Almaden Research Center, CA. My current research interests include building scalable platforms for high-performance bioinformatics algorithms, as well as epidemiology and modeling of infectious diseases. I have an MS degree in computer science from the Royal Institute of Technology in Stockholm.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.

Session 4373.0: Posters on Understanding Health Disparities in Transgender Populations, mHealth Technology, Chronic Disease Registry, Medicare Information, and More

2015 APHA Annual Meeting & Expo

Online Program
