Predicting the h-index with cost-sensitive naive Bayes

Bibliometric indices are an increasingly important topic for the scientific community. One of the most successful is the well-known h-index. In view of the attention this index has attracted, our research builds several prediction models to forecast the h-index of Spanish professors with permanent positions for time horizons of up to four years. We built two types of models, junior models and senior models, to distinguish professors by seniority. These models are learnt from bibliometric data using a cost-sensitive naive Bayes approach that takes into account the expected cost of each instance's prediction at classification time. Results show that the h-index is easier to predict at the one-year time horizon than at the longer horizons: the one-year models achieve a higher average accuracy and a lower average total cost. Similarly, the h-index of junior professors is easier to predict than that of senior professors.
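To make the cost-sensitive idea concrete, here is a minimal sketch. It assumes categorical (discretised) bibliometric features and the usual minimum-expected-cost decision rule: for each possible prediction k, compute the sum over true classes c of P(c | x) · cost(c, k), and output the k with the smallest value. The class and function names (`CostSensitiveNaiveBayes`, `h_index`, `total_cost`), the toy features, the class labels `"same"`/`"up"` and the cost matrices are all hypothetical illustrations, not the paper's actual variables or costs. The `h_index` helper follows Hirsch's definition: the largest h such that h papers each have at least h citations.

```python
import math
from collections import Counter


def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


class CostSensitiveNaiveBayes:
    """Categorical naive Bayes that predicts the class of minimum expected cost.

    cost[(true_class, predicted_class)] is the cost of predicting
    predicted_class when the true class is true_class (hypothetical values).
    """

    def __init__(self, cost, alpha=1.0):
        self.cost = cost
        self.alpha = alpha  # Laplace smoothing for the conditional tables

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v]: number of class-c training instances with feature j == v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        self.values = [set() for _ in range(n_features)]
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[yi][j][v] += 1
                self.values[j].add(v)
        return self

    def posterior(self, x):
        """P(class | x) under the naive Bayes independence assumption."""
        log_joint = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(x):
                num = self.counts[c][j][v] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.values[j])
                score += math.log(num / den)
            log_joint[c] = score
        top = max(log_joint.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(s - top) for c, s in log_joint.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}

    def expected_costs(self, x):
        """Expected cost of each possible prediction k: sum_c P(c|x) * cost(c, k)."""
        post = self.posterior(x)
        return {k: sum(post[c] * self.cost[(c, k)] for c in self.classes)
                for k in self.classes}

    def predict(self, x):
        exp_cost = self.expected_costs(x)
        return min(exp_cost, key=exp_cost.get)


def total_cost(y_true, y_pred, cost):
    """Sum of misclassification costs over a test set."""
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred))


# Toy data (hypothetical): discretised (papers, citations) growth -> h-index change.
X = [("low", "low"), ("low", "high"), ("high", "high"),
     ("high", "high"), ("low", "low"), ("high", "low")]
y = ["same", "same", "up", "up", "same", "up"]

COST_SYMMETRIC = {("same", "same"): 0, ("same", "up"): 1,
                  ("up", "same"): 1, ("up", "up"): 0}
# Hypothetical: predicting "up" when the h-index actually stays the same costs 5x more.
COST_ASYMMETRIC = {("same", "same"): 0, ("same", "up"): 5,
                   ("up", "same"): 1, ("up", "up"): 0}

if __name__ == "__main__":
    x_new = ("high", "low")
    print(CostSensitiveNaiveBayes(COST_SYMMETRIC).fit(X, y).predict(x_new))   # up
    print(CostSensitiveNaiveBayes(COST_ASYMMETRIC).fit(X, y).predict(x_new))  # same
```

On the toy instance the posterior is roughly 0.73 for `"up"`, so a symmetric cost matrix yields the plain naive Bayes answer. Once over-predicting growth becomes five times as expensive, the expected cost of `"up"` (about 1.36) exceeds that of `"same"` (about 0.73), and the decision flips. Summing `total_cost` over a held-out set gives a total-cost figure of the kind the abstract reports alongside accuracy.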