A scalable platform for meta-genomic analysis
An MCAW workflow runs informatics programs such as de novo genome assembly on a dataset comprising data files (e.g., raw DNA sequence for a single sample) and laboratory metadata. MCAW automatically parallelizes the work across cluster nodes and generates aggregate statistics and visualizations of data from multiple samples, to aid trend and factor analysis with respect to the time, temperature, and biochemical metadata of the samples. The processing history is fully recorded, so the provenance of results is clear and processing can be reproduced.
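The pattern described above (a per-sample step run in parallel, aggregate statistics over all samples, and a recorded processing history) can be illustrated with a minimal sketch. This is not MCAW's API: `process_sample`, `run_workflow`, and the record fields are hypothetical names, a local thread pool stands in for cluster nodes, and a GC-content calculation stands in for a real tool such as an assembler.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample):
    """Hypothetical per-sample step; a stand-in for a real informatics tool."""
    sequence = sample["sequence"]
    gc_fraction = sum(base in "GC" for base in sequence) / len(sequence)
    result = {
        "sample_id": sample["id"],
        "length": len(sequence),
        "gc_fraction": gc_fraction,
    }
    # Provenance record: which step ran on which exact input.
    provenance = {
        "sample_id": sample["id"],
        "step": "process_sample",
        "input_sha256": hashlib.sha256(sequence.encode()).hexdigest(),
    }
    return result, provenance


def run_workflow(samples, max_workers=4):
    """Run the step on every sample in parallel, then aggregate."""
    # A thread pool is an assumption made for illustration; a real platform
    # would dispatch to cluster nodes instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(process_sample, samples))  # input order is kept
    results = [result for result, _ in outputs]
    history = [provenance for _, provenance in outputs]
    aggregate = {
        "n_samples": len(results),
        "mean_gc_fraction": sum(r["gc_fraction"] for r in results) / len(results),
    }
    return results, aggregate, history


# Toy input, invented for illustration only.
samples = [
    {"id": "s1", "sequence": "ACGTGGCC"},
    {"id": "s2", "sequence": "ATATATGC"},
]
results, aggregate, history = run_workflow(samples)
```

Hashing each input is one simple way to make a run checkable later: rerunning the same step on a file with the same hash should reproduce the same result.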
We will show MCAW results from assembling and cataloguing genomes for 4,100 Salmonella enterica isolates sequenced in the 100K Foodborne Pathogen Genome Project. The number of distinct proteins appears to grow as a power (α = 0.465) of the number of isolates, which affirms the importance of integrating a large number of samples in building a pan-genome.
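A power-law relationship of this kind, P ≈ k · N^α for N isolates and P distinct proteins, is commonly estimated by a straight-line fit in log-log space. The sketch below shows that approach under stated assumptions: the abstract does not say how α was estimated, `fit_power_law` is a hypothetical helper, and the data are synthetic values generated from an assumed prefactor k = 5000 purely to exercise the fit. They are not project results.

```python
import math


def fit_power_law(n_isolates, n_proteins):
    """Least-squares fit of log(P) = log(k) + alpha * log(N); returns (k, alpha)."""
    xs = [math.log(n) for n in n_isolates]
    ys = [math.log(p) for p in n_proteins]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                        # slope in log-log space
    k = math.exp(mean_y - alpha * mean_x)    # intercept, back-transformed
    return k, alpha


# Synthetic data generated from P = 5000 * N**0.465 (k = 5000 is an assumption).
counts = [10, 100, 1000, 4100]
proteins = [5000 * n ** 0.465 for n in counts]
k, alpha = fit_power_law(counts, proteins)
```

An exponent between 0 and 1 means each additional isolate contributes fewer new proteins than the last, but the total never levels off, which is why a large sample count matters for a pan-genome.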
Learning Areas: Public health biology
Public health or related research
Explain how genomics measurements can reveal threats to food safety. List the functional requirements for an online metagenomics analysis service. Compare the scalability and efficiency of manually driven informatics with that achieved using an automated framework.
Keyword(s): Genetics, Food Safety
Qualified on the content I am responsible for because: I am a senior software engineer in the Public Health Research team at IBM Almaden Research Center, CA. My current research interests include building scalable platforms for high-performance bioinformatics algorithms, as well as epidemiology and modeling of infectious diseases. Stefan has an MS degree in computer science from the Royal Institute of Technology in Stockholm.
Any relevant financial relationships? No
I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.