Advances in networks, sensors, storage, computing, and high-throughput data acquisition have led to a proliferation of autonomous, distributed data sources in many areas of human activity. New discoveries in the biological, physical, and social sciences and in engineering are being driven by our ability to discover, share, integrate, and analyze disparate types of data. Statistically based machine learning algorithms offer some of the most cost-effective approaches to discovering experimentally testable predictive models and hypotheses from data. However, the large size, distributed nature, and autonomy of the data sources (and the attendant differences in access, queries allowed, processing capabilities, structure, organization, and underlying data models and data semantics) present hurdles to the effective use of machine learning. This research aims to overcome these hurdles by developing efficient, resource-aware distributed algorithms and software services to support collaborative, integrative knowledge acquisition in such a setting. The research team will implement, deploy, and evaluate the resulting algorithms using benchmark data sets, associated data models and ontologies, and user-specified inter-ontology mappings on a distributed test-bed of networked databases and services at Iowa State University and Kansas State University. The resulting open-source software can potentially transform collaborative e-science in the same way that the Web has transformed information sharing. Broader impacts of this research include enhanced opportunities for research-based training of graduate and undergraduate students, interdisciplinary collaborations, participation of under-represented groups, and development of increasingly sophisticated software to support collaborative, integrative e-science.
The ISU project web site (http://www.cild.iastate.edu/projects/indus.html) and the KSU web site (http://people.cis.ksu.edu/~dcaragea/mlb/doku.php?id=indus) provide access to information about the project, benchmark data, publications, software, and documentation.
Research Grant #0711356 - Collaborative Research: Learning Classifiers from Autonomous, Semantically Heterogeneous, Distributed Data, National Science Foundation (2007-2010). Vasant Honavar (PI-ISU) and Doina Caragea (PI-KSU).
This project is supported by the National Science Foundation under Grant No. 0711356. Any opinions, findings, and conclusions or recommendations expressed on this website are those of the authors and do not necessarily reflect the views of the National Science Foundation.