Ensemble-based classifiers

Abstract

The idea of ensemble methodology is to build a predictive model by integrating multiple models. Ensemble methods are well known to improve prediction performance, and researchers from disciplines as diverse as statistics and artificial intelligence have adopted them. This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners interested in building ensemble-based systems.
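
To make the idea concrete, the sketch below combines three heterogeneous classifiers by majority vote, one of the simplest combination schemes the survey covers. It is a minimal illustration, not the paper's method: the scikit-learn models and the synthetic dataset are assumptions chosen for brevity.

```python
# Minimal sketch of the core ensemble idea: train several base
# classifiers and combine their predictions by majority vote.
# Illustrative only -- the model choices and toy dataset are
# assumptions, not the paper's experimental setup.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous base models: the combined output often beats any
# single member because their errors are partly uncorrelated.
models = [
    DecisionTreeClassifier(random_state=0),
    GaussianNB(),
    LogisticRegression(max_iter=1000),
]
for m in models:
    m.fit(X_train, y_train)

def majority_vote(models, X):
    """Predict by plurality vote over the base classifiers."""
    all_preds = [m.predict(X) for m in models]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*all_preds)]

ensemble_preds = majority_vote(models, X_test)
accuracy = sum(p == t for p, t in zip(ensemble_preds, y_test)) / len(y_test)
print(f"majority-vote ensemble accuracy: {accuracy:.3f}")
```

The unweighted vote is only a starting point; the literature surveyed here also covers weighted voting, stacking, and selection among ensemble members.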

Author information

Correspondence to Lior Rokach.

Cite this article

Rokach, L. Ensemble-based classifiers. Artif Intell Rev 33, 1–39 (2010). https://doi.org/10.1007/s10462-009-9124-7
