The Need for Open Source Software in Machine Learning

Open source tools have recently reached a level of maturity that makes them suitable for building large-scale, real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not realized, since existing implementations are not openly shared, resulting in software with low usability and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors face when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer-reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific communities.
