Self-Organization and Identification of Web Communities

The vast improvement in information access is not the only advantage of the growing share of human knowledge that is hyperlinked and available on the Web; this material also holds great potential for analyzing interests and relationships within science and society. However, the Web's decentralized and unorganized nature hampers content analysis. The information on the Web is authored by millions of individuals who work independently and who differ widely in background, knowledge, goals, and culture. Despite this decentralized, unorganized, and heterogeneous nature, our work shows that the Web self-organizes and that its link structure allows communities to be identified efficiently. This self-organization is significant because no central authority or process governs the formation and structure of hyperlinks.
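This abstract does not say how communities are extracted from the link structure. As one illustration only, here is a minimal sketch that assumes a max-flow/min-cut formulation. Hyperlinks are treated as undirected edges of capacity k, where k defaults to the number of seed pages. Seed pages hang off a virtual source with effectively infinite capacity, and every other page drains to a virtual sink with unit capacity. The pages still reachable from the source after a maximum flow (computed here with Edmonds-Karp) are returned as the community. The function name `min_cut_community`, the parameter `link_capacity`, the virtual node names, and the capacity choices are all hypothetical assumptions, not details taken from this abstract.

```python
from collections import defaultdict, deque

# Hypothetical names for the two virtual nodes added to the link graph.
SOURCE, SINK = "__source__", "__sink__"


def min_cut_community(links, seeds, link_capacity=None):
    """Return the pages on the source side of an s-t minimum cut (a sketch).

    links: iterable of (page, page) hyperlinks, treated as undirected.
    seeds: pages assumed to belong to the community.
    link_capacity: capacity per link; defaults to len(seeds) (an assumption).
    """
    links = list(links)
    seeds = set(seeds)
    k = len(seeds) if link_capacity is None else link_capacity
    cap = defaultdict(lambda: defaultdict(int))  # residual capacity cap[u][v]
    nodes = set(seeds)
    for u, v in links:
        nodes.update((u, v))
        if u != v:
            cap[u][v] += k  # an undirected link is modelled as two arcs
            cap[v][u] += k
    # Any value larger than the sum of all finite capacities acts as infinity.
    infinite = 2 * k * len(links) + len(nodes) + 1
    for s in seeds:
        cap[SOURCE][s] = infinite
    for n in nodes - seeds:
        cap[n][SINK] += 1  # unit capacity from every non-seed page to the sink

    def bfs():
        """Breadth-first search in the residual graph; returns parent links."""
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == SINK:
                        return parent
                    queue.append(v)
        return parent

    # Edmonds-Karp: repeatedly augment along shortest residual paths.
    while True:
        parent = bfs()
        if SINK not in parent:
            break
        bottleneck, v = infinite, SINK
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = SINK
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # After the final (failed) search, pages reachable from SOURCE form the cut side.
    return set(parent) - {SOURCE}


# Two triangles joined by a single bridge link c-d.
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
print(sorted(min_cut_community(links, {"a", "b"})))  # ['a', 'b', 'c']
```

In this toy graph, cutting the single bridge plus c's sink edge (capacity 3) is cheaper than any other separation. As a result, the densely linked triangle around the seeds is returned and the other triangle is excluded. How well such a cut matches a real Web community depends on the assumed capacities and on how the seeds are chosen.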
