CCIB Seminar Series presents Dr. Tony Hu, Drexel University

October 25, 2016 @ 12:20pm in the Science Lecture Hall

Dr. Xiaohua Tony Hu is a Professor and the founding Director of the Data Mining and Bioinformatics Lab at Drexel University.

Xiaohua Tony Hu also serves as the founding Co-Director of the NSF Center (I/U CRC) on Visual and Decision Informatics (NSF CVDI), Chair of the IEEE Computer Society Bioinformatics and Biomedicine Steering Committee, and Chair of the IEEE Computer Society Big Data Steering Committee. Tony is a scientist, teacher and entrepreneur. He joined Drexel University in 2002. He founded the International Journal of Data Mining and Bioinformatics (SCI indexed) in 2006. Earlier, he worked as a research scientist in world-leading R&D centers such as Nortel Research Center and Verizon Lab (the former GTE Labs). In 2001, he founded DMW Software in Silicon Valley, California. He has extensive experience in converting original ideas into research prototypes and eventually into commercial products. Many of his research ideas have been integrated into commercial products and applications in data mining, fraud detection, and database marketing.

Tony’s current research interests are in data/text/web mining, big data, bioinformatics, information retrieval, social network analysis, and healthcare informatics. He has published more than 270 peer-reviewed research papers in various journals, conferences and books. His research projects are funded by the National Science Foundation (NSF), the US Dept. of Education, and the PA Dept. of Health. He has obtained more than US$8.0 million in research grants in the past 10 years as PI or Co-PI. He graduated 17 Ph.D. students from 2006 to 2016 and is currently supervising 10 Ph.D. students.

 For more information about Dr. Hu you can visit his webpage at:

Dr. Hu will be speaking on the following topic:

Title: Big Data Analysis for Microbiome Data

Abstract: We know little about the microbial world. Microbiome sequencing (i.e., metagenome, 16S rRNA) extracts DNA directly from a microbial environment without culturing any species. Recently, huge amounts of data have been generated by many microbiome projects, such as the Human Microbiome Project (HMP) and Metagenomics of the Human Intestinal Tract (MetaHIT). Analyzing these data will help us better understand the function and structure of the microbial communities of the human body, the earth, and other environmental ecosystems. However, the huge data volume, the complexity of microbial communities, and the intricate data properties have created many opportunities and challenges for data analysis and mining. For example, it is estimated that the microbial ecosystem of the human gut contains about 1000 kinds of bacteria, with 10 billion bacteria and more than 4 million genes in more than 6000 orthologous gene families. The challenges stem from the complex properties of microbiome data: large scale, complexity, diversity, correlation, compositionality, hierarchy, incompleteness, etc. Current microbiome data analysis methods seldom consider these properties and often make assumptions, such as linearity, Euclidean space, metric space, and continuous data types, that conflict with the true data properties. For example, some similarities are non-metric because of the prevalence of some species, and the interactions among species and the environment are complex and of high order. Thus it is urgent to develop novel computational methods that overcome these assumptions and take the microbiome data properties into account in the analysis procedure. In this talk, we will present some computational methods to analyze and visualize microbiome big data.
Our studies focus on the following four tasks: 1) novel machine learning and computational technologies for dimension reduction and visualization of microbiome data based on non-Euclidean spaces (manifold learning), which discover nonlinear intrinsic features and patterns in these data and overcome the linear assumptions; 2) new probabilistic models and non-metric visualization methods that discover signatures and components in the microbiome and overcome the difficulties of analyzing compositional data; 3) novel statistical methods for variable selection in microbiome data that integrate group information among variables; 4) novel nonlinear methods for network reconstruction from microbial co-occurrence data and time series data, to capture complicated microbial interactions.