Ensemble-based classifiers

Abstract

The idea of ensemble methodology is to build a predictive model by integrating multiple models. Ensemble methods are well known to improve prediction performance, and researchers from disciplines as diverse as statistics and artificial intelligence have adopted them. This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners interested in building ensemble-based systems.
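
To make the idea concrete, the sketch below combines three heterogeneous classifiers by majority vote, one of the simplest combination schemes the survey covers. It is a minimal illustration, not the paper's method: the scikit-learn models and the synthetic dataset are assumptions chosen for brevity.

```python
# Minimal sketch of the core ensemble idea: train several base
# classifiers and combine their predictions by majority vote.
# Illustrative only -- the model choices and toy dataset are
# assumptions, not the paper's experimental setup.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous base models: the combined output often beats any
# single member because their errors are partly uncorrelated.
models = [
    DecisionTreeClassifier(random_state=0),
    GaussianNB(),
    LogisticRegression(max_iter=1000),
]
for m in models:
    m.fit(X_train, y_train)

def majority_vote(models, X):
    """Predict by plurality vote over the base classifiers."""
    all_preds = [m.predict(X) for m in models]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*all_preds)]

ensemble_preds = majority_vote(models, X_test)
accuracy = sum(p == t for p, t in zip(ensemble_preds, y_test)) / len(y_test)
print(f"majority-vote ensemble accuracy: {accuracy:.3f}")
```

The unweighted vote is only a starting point; the literature surveyed here also covers weighted voting, stacking, and selection among ensemble members.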

Author information

Correspondence to Lior Rokach.

Cite this article

Rokach, L. Ensemble-based classifiers. Artif Intell Rev 33, 1–39 (2010). https://doi.org/10.1007/s10462-009-9124-7
