Computationally efficient scoring of activity using demographics and connectivity of entities

Consider a collection of entities, each of which may have demographic properties, and which may be linked in a network structure, for example a social network. Some of these entities are “of interest”; we call them active. What is the relative likelihood that each of the other entities is active? AFDL (Activity from Demographics and Links) is an algorithm designed to answer this question in a computationally efficient manner. AFDL can work with demographic data, link data (including noisy links), or both, and it can process very large datasets quickly. This paper describes AFDL’s feature extraction and classification algorithms, gives timing and accuracy results obtained for several datasets, and offers suggestions for its use in real-world situations.
