1994 Reports
Toward Scalable and Parallel Inductive Learning: A Case Study in Splice Junction Prediction
Much of the research in inductive learning concentrates on problems with relatively small amounts of training data. With the steady progress of the Human Genome Project, it is likely that orders of magnitude more data in sequence databases will be available in the near future for various learning problems of biological importance. Thus, techniques that provide the means of scaling machine learning algorithms require considerable attention. Meta-learning is proposed as a general technique for integrating a number of distinct learning processes, aiming to provide a means of scaling to large problems. This paper details several meta-learning strategies for integrating classifiers that are learned independently by the same learner on subsets of the training data in a parallel and distributed computing environment. Our strategies are particularly suited for massive amounts of data that main-memory-based learning algorithms cannot handle efficiently. The strategies are also independent of the particular learning algorithm used and of the underlying parallel and distributed platform. Preliminary experiments using different learning algorithms in a simulated parallel environment demonstrate encouraging results: parallel learning by meta-learning can achieve prediction accuracy comparable to serial learning in less space and time.
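The sketch below illustrates one plausible reading of the approach the abstract describes: base classifiers are trained independently on disjoint partitions of the training data (each partition could be assigned to a separate processor), and a meta-classifier is then learned from their predictions on a held-out set. Function names, the choice of scikit-learn decision trees, and the simple combiner-style strategy are illustrative assumptions, not the report's own code; the report stresses that the approach is independent of the particular learning algorithm.

```python
# Minimal sketch of meta-learning over disjoint data partitions.
# Assumption: scikit-learn-style classifiers stand in for "the same learner";
# the combiner strategy shown here is one simple way to integrate base classifiers.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_base_classifiers(X, y, n_partitions=4):
    """Train one classifier per disjoint subset of the training data.
    In the parallel setting, each partition would be handled by a
    separate processor or site."""
    classifiers = []
    for X_part, y_part in zip(np.array_split(X, n_partitions),
                              np.array_split(y, n_partitions)):
        clf = DecisionTreeClassifier()
        clf.fit(X_part, y_part)
        classifiers.append(clf)
    return classifiers

def train_meta_classifier(classifiers, X_val, y_val):
    """Learn a combiner from the base classifiers' predictions on a
    separate validation set."""
    meta_features = np.column_stack([clf.predict(X_val) for clf in classifiers])
    meta_clf = DecisionTreeClassifier()
    meta_clf.fit(meta_features, y_val)
    return meta_clf

def predict(classifiers, meta_clf, X_new):
    """Final prediction: feed base-level predictions to the meta-classifier."""
    meta_features = np.column_stack([clf.predict(X_new) for clf in classifiers])
    return meta_clf.predict(meta_features)
```

Because each base classifier sees only its own partition, the per-process memory footprint stays small and the base-level training runs in parallel, which is the space and time advantage the abstract refers to.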
Files
- cucs-032-94.pdf (application/pdf, 209 KB)
More About This Work
- Academic Units: Computer Science
- Publisher: Department of Computer Science, Columbia University
- Series: Columbia University Computer Science Technical Reports, CUCS-032-94
- Published Here: February 3, 2012