Proceedings of the IJCAI 2017 Workshop on Learning in the Presence of Class Imbalance and Concept Drift (LPCICD'17)

With the wide application of machine learning algorithms to real-world problems, class imbalance and concept drift have become crucial learning issues. Class imbalance occurs when the data categories are not equally represented, i.e., at least one category is a minority compared to the others. It can bias learning towards the majority class and lead to poor generalization. Concept drift is a change in the underlying distribution of the problem, and is a significant issue especially when learning from data streams. It requires learners to adapt to dynamic changes. Class imbalance and concept drift can each significantly hinder predictive performance, and the problem becomes particularly challenging when they occur simultaneously, because each problem can affect the treatment of the other. For example, drift detection algorithms based on the traditional classification error may be sensitive to the degree of imbalance and become less effective, and class imbalance techniques need to adapt to changing imbalance rates; otherwise the class receiving preferential treatment may not be the actual minority class at the current moment. Therefore, the mutual effect of class imbalance and concept drift should be considered during algorithm design. The aim of this workshop is to bring together researchers from the areas of class imbalance learning and concept drift in order to encourage discussion and new collaborations on solving the combined issue. It provides a forum for international researchers and practitioners to share and discuss their original work on new challenges and research issues in class imbalance learning, concept drift, and their combination. The proceedings include 8 papers on these topics.
