Fish recruitment prediction, using robust supervised classification methods

Improving our ability to predict recruitment is a key element of fisheries management. However, the interactions between population dynamics and environmental factors are complex and often non-linear, which makes robust prediction difficult. ‘Machine-learning’ techniques, in particular supervised classification methods, have been proposed as tools to overcome these difficulties. In this study, a methodology is proposed for building a robust classifier for fish recruitment prediction from sparse and noisy data. The methodology consists of four steps: (1) semi-automated discretization of recruitment; (2) supervised discretization of the predictors; (3) multivariate selection of non-redundant predictors; and (4) learning a probabilistic classifier. For fisheries management, the estimated performance of the classifier has important consequences: to make use of a prediction, the manager needs to know the risk being taken when acting on it. Probabilistic classifiers such as ‘naive Bayes’ have the advantage that, in addition to the prediction itself, they estimate the probability of each possible outcome. Anchovy (Engraulis encrasicolus) and hake (Merluccius merluccius) recruitment are used as application examples. ‘Two-interval’ recruitment discretization achieves 70% accuracy and Brier scores of around 0.10 for both anchovy and hake. In comparison, ‘three-interval’ recruitment discretization achieves 50% accuracy, with Brier scores of around 0.25 for anchovy and 0.30 for hake. These statistics result from validating not only the classifier but also the preceding steps, that is, the methodology as a whole.
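
As an illustration only, the last step of the methodology and its evaluation can be sketched in Python. This is a minimal sketch and not the implementation used in the study: it assumes that recruitment and the predictors have already been discretized, that the classifier is a naive Bayes with Laplace smoothing (the `alpha` parameter is an assumption), and that the Brier score is computed in its multi-class form, summed over classes. The function names and the toy data are hypothetical.

```python
from collections import Counter


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Fit a naive Bayes model on already-discretized predictors.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    classes = sorted(set(labels))
    n_features = len(rows[0])
    # Distinct values observed for each predictor, needed for smoothing.
    values = [sorted({r[j] for r in rows}) for j in range(n_features)]
    class_counts = Counter(labels)
    # counts[c][j][v]: number of class-c rows where predictor j equals v.
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[y][j][v] += 1
    return {"classes": classes, "values": values, "counts": counts,
            "class_counts": class_counts, "alpha": alpha, "n": len(labels)}


def predict_proba(model, row):
    """Return the estimated probability of each recruitment class."""
    a = model["alpha"]
    scores = {}
    for c in model["classes"]:
        nc = model["class_counts"][c]
        # Smoothed class prior.
        p = (nc + a) / (model["n"] + a * len(model["classes"]))
        # Multiply by the smoothed conditional probability of each predictor.
        for j, v in enumerate(row):
            p *= (model["counts"][c][j][v] + a) / (nc + a * len(model["values"][j]))
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def brier_score(probas, labels):
    """Mean squared difference between predicted probabilities and outcomes,
    summed over classes (0 is perfect, 2 is the worst possible value)."""
    total = 0.0
    for p, y in zip(probas, labels):
        total += sum((p[c] - (1.0 if c == y else 0.0)) ** 2 for c in p)
    return total / len(labels)


# Hypothetical toy data: two discretized predictors, two recruitment classes.
rows = [("warm", "weak"), ("warm", "strong"), ("cold", "weak"),
        ("cold", "strong"), ("warm", "weak"), ("cold", "strong")]
labels = ["high", "high", "low", "low", "high", "low"]

model = fit_naive_bayes(rows, labels)
proba = predict_proba(model, ("warm", "weak"))
score = brier_score([predict_proba(model, r) for r in rows], labels)
```

Here the score is computed on the training data purely to show the calls. To match the validation described above, the discretization and predictor selection would have to be refitted inside each validation fold, so that the reported statistics reflect the whole methodology rather than the classifier alone.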