ABSTRACT Title of dissertation: THE DYNAMICS OF MULTI-MODAL NETWORKS Hossam Sharara, Doctor of Philosophy, 2012 Dissertation directed by: Professor Lise Getoor Department of Computer Science The widespread study of networks in diverse domains, including social, technological, and scientific settings, has increased the interest in statistical and machine learning techniques for network analysis. Many of these networks are complex, involving more than one kind of entity, and multiple relationship types, both changing over time. While there have been many network analysis methods proposed for problems such as network evolution, community detection, infor- mation diffusion and opinion leader identification, the majority of these methods assume a single entity type, a single edge type and often no temporal dynamics. One of the main shortcomings of these traditional techniques is their inadequacy for capturing higher-order dependencies often present in real, complex networks. To address these shortcomings, I focus on analysis and inference in dy- namic, multi-modal, multi-relational networks, containing multiple entity types (such as people, social groups, organizations, locations, etc.), and different re- lationship types (such as friendship, membership, affiliation, etc.). An example from social network theory is a network describing users, organizations and in- terest groups, where users have different types of ties among each other, such as friendship, family ties, etc., as well as affiliation and membership links with organizations and interest groups. By considering the complex structure of these networks rather than limiting the analysis to a single entity or relationship type, I show how we can build richer predictive models that provide better understand- ing of the network dynamics, and thus result in better quality predictions. In the first part of my dissertation, I address the problems of network evolu- tion and clustering. For network evolution, I describe methods for modeling the interactions between different modalities, and propose a co-evolution model for social and affiliation networks. I then move to the problem of network clustering, where I propose a novel algorithm for clustering multi-modal, multi-relational data. The second part of my dissertation focuses on the temporal dynamics of in- teractions in complex networks, from both user-level and network-level perspec- tives. For the user-centric approach, I analyze the dynamics of user relationships with other entity types, proposing a measure of the ?loyalty? a user shows for a given group or topic, based on her temporal interaction pattern. I then move to macroscopic-level approaches for analyzing the dynamic processes that occur on a network scale. I propose a new differential adaptive diffusion model for in- corporating diversity and trust in the process of information diffusion on multi- modal, multi-relational networks. I also discuss the implications of the proposed diffusion model on designing new strategies for viral marketing and influential detection. I validate all the proposed methods on several real-world networks from multiple domains. THE DYNAMICS OF MULTI-MODAL NETWORKS by Hossam Samy Elsaid Ibrahim Sharara Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2012 Advisory Committee: Professor Lise Getoor, Chair/Advisor Professor Ashok Agrawala Professor Carl Kingsford Professor William Rand Professor Mark Austin c Copyright by Hossam Samy Elsaid Ibrahim Sharara 2012 Foreword Portions of this dissertation are derived from research and publications co- authored by the candidate and published elsewhere. Chapter 2 is based on the paper Co-evolution of social and affiliation networks[122]. The multi-relational clus- tering work in Chapter 3 is based on the paper Multi-relational Affinity Propaga- tion[96]. The loyalty measure proposed in Chapter 4 is based on the journal article Understanding Actor Loyalty to Event-Based Groups in Affiliation Networks[99]. The information diffusion and viral marketing work in Chapters 5 and 6 are exten- sions of the work in Differential Adaptive Diffusion: Understanding Diversity and Learning whom to Trust in Viral Marketing[98]. Finally, the active surveying model proposed in Chapter 7 is based on the paper Active Surveying: A Probabilistic Ap- proach for Identifying Key Opinion Leaders[97]. ii Acknowledgments First and foremost, I would like to sincerely thank my advisor, Prof. Lise Getoor, for her continuous help and support throughout my PhD research. She is the one who introduced me to the field of relational machine learning and network analysis, and opened up a whole new spectrum of ideas and research opportunities in this very interesting domain that build up upon my prior interest in data mining and machine learning. She has been an excellent mentor and tutor who always guided me for structuring and organizing my ideas, setting high standards to achieve the best quality for my work, and enhancing my skills to be a good researcher, bearing my best interest in mind. On the personal side, her moral support, appreciation for my work, and continuous encouragement have been of extreme value for helping me getting through my PhD studies. I would also like to thank my colleagues in the LINQs group - Galileo Namata, Elena Zheleva, Mustafa Bilgic, Walaa Moustafa, Lilyana Mihalkova, Stanley Kok, Bert Huang, Stephen Bach, Jay Pujara, Ben London, Alex Memory, and Theodoros Rekatsinas, for their useful discussions, and providing a fruitful collaborative work atmosphere. I would like to specially thank Elena, Galileo, Mustafa, and Walaa for their invaluable friendship and continuous support. I thank all my collaborators, especially Prof. William Rand from Robert H. Smith School of Business, who opened up a whole new perspective on the appli- cations of my research in marketing domains. Prof. Rand has always provided me with helpful insights throughout our collaborative research, and with career iii advice and opportunities during my job-hunting process. I also thank Prof. Lisa Singh from Georgetown University for our fruitful collaboration. I owe my deepest gratitude to my wife, Hend, for taking good care of our little family, and always being there for me. I couldn?t have made it through with- out her patience throughout my prolonged working hours and her continuous support through stressful times. I also thank our little son, Youssef, for always bringing joy and happiness to our life. I would also like to thank my Egyptian friends here in College Park, espe- cially my social support group who provided me with a warm, loving environ- ment which helped me and my family settle in College Park, endure our home- sickness, and and get through various tough times. Finally, I would like to thank my parents, who have continuously supported my decisions, and encouraged me to pursue the path I desire in life. Without them, this journey would have not been possible. iv Table of Contents List of Figures viii 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.1 Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.2 Scientific Networks . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2 General Notation and Definitions . . . . . . . . . . . . . . . . . . . . 9 1.3 Contributions and Organization . . . . . . . . . . . . . . . . . . . . 10 I Multi-Modal Networks Structure and Organization 13 2 The Co-evolution of Social and Affiliation Networks 14 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2.1 Evolution of social networks . . . . . . . . . . . . . . . . . . 17 2.2.2 Evolution of affiliation networks . . . . . . . . . . . . . . . . 18 2.3 Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3.1 Group size distribution . . . . . . . . . . . . . . . . . . . . . 20 2.3.2 Node degree vs. Average number of group affiliations . . . 20 2.3.3 Distribution of the number of group affiliations . . . . . . . 21 2.3.4 Properties of group members . . . . . . . . . . . . . . . . . . 22 2.4 Co-evolution properties and model . . . . . . . . . . . . . . . . . . . 24 2.4.1 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.4.2 Desired properties . . . . . . . . . . . . . . . . . . . . . . . . 25 2.4.3 Co-evolution model . . . . . . . . . . . . . . . . . . . . . . . 26 2.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.5.1 Synthetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.5.2 Real data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.5.3 Comparison with the na??ve model . . . . . . . . . . . . . . . 39 2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3 Multi-Relational Affinity Propagation 42 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.1 Model Formulation . . . . . . . . . . . . . . . . . . . . . . . . 51 3.3.2 Message Derivation . . . . . . . . . . . . . . . . . . . . . . . 53 3.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.4.1 Synthetic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.4.2 Social Media Data . . . . . . . . . . . . . . . . . . . . . . . . 61 3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 v II The Temporal Dynamics of Multi-Modal Networks 66 4 Understanding Actor Loyalty to Groups in Affiliation Networks 67 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.2 Loyalty Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3 Modeling time-varying event-based groups . . . . . . . . . . . . . . 73 4.4 Loyalty of Individuals to Affiliation Groups . . . . . . . . . . . . . . 77 4.5 Loyalty Analysis on Individual Data Sets . . . . . . . . . . . . . . . 84 4.5.1 Scientific Publication Network . . . . . . . . . . . . . . . . . 84 4.5.2 Senate Bill Sponsorship Network . . . . . . . . . . . . . . . . 86 4.5.3 Dolphin Social Network . . . . . . . . . . . . . . . . . . . . . 90 4.6 Comparative Loyalty Analysis . . . . . . . . . . . . . . . . . . . . . 93 4.7 Comparison with centrality measures . . . . . . . . . . . . . . . . . 95 4.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5 Differential Adaptive Diffusion: Understanding Diversity and Trust Dy- namics in Complex Networks 99 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.3 Case Study: Digg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.3.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.4 Differential Adaptive Diffusion . . . . . . . . . . . . . . . . . . . . . 110 5.5 Influentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.6 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.6.1 Predicting Adoptions . . . . . . . . . . . . . . . . . . . . . . 116 5.6.2 Identifying Influentials . . . . . . . . . . . . . . . . . . . . . . 119 5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 6 Adaptive Viral Marketing 122 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 6.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 6.3 Conceptual Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 6.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 6.4.1 Fully Observable Mode . . . . . . . . . . . . . . . . . . . . . 131 6.4.2 Learning Preferences Mode . . . . . . . . . . . . . . . . . . . 132 6.4.3 Effect of Spammers . . . . . . . . . . . . . . . . . . . . . . . . 133 6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 7 Active Surveying: A Probabilistic Approach for Identifying Key Opinion Leaders 136 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 7.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 7.2.1 Opinion Leader Identification . . . . . . . . . . . . . . . . . . 139 7.2.2 Active Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 140 7.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 vi 7.4 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 150 7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 8 Conclusion and Future Directions 155 8.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 155 8.1.1 Network Evolution . . . . . . . . . . . . . . . . . . . . . . . . 155 8.1.2 Multi-relational Clustering . . . . . . . . . . . . . . . . . . . 156 8.1.3 Bi-modal Interaction Dynamics . . . . . . . . . . . . . . . . . 157 8.1.4 Information Diffusion . . . . . . . . . . . . . . . . . . . . . . 158 8.1.5 Adaptive Viral Marketing . . . . . . . . . . . . . . . . . . . . 159 8.1.6 Active Surveying . . . . . . . . . . . . . . . . . . . . . . . . . 159 8.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 8.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Bibliography 162 vii List of Figures 1.1 Social network example . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 A dynamic, multi-modal, multi-relational view of the social net- work example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Scientific network example . . . . . . . . . . . . . . . . . . . . . . . 7 2.1 Distribution of group sizes on a log-log scale. . . . . . . . . . . . . . 21 2.2 Node degree vs. average number of group affiliations . . . . . . . . 22 2.3 Distribution of the number of group affiliations for nodes with dif- ferent degrees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.4 Ratio of the number of singletons to the group size (upper series) and ratio of the maximum degree to the group size (lower series). 24 2.5 Degree distribution in a synthetic network . . . . . . . . . . . . . . 33 2.6 Densification in a synthetic network . . . . . . . . . . . . . . . . . . 34 2.7 Degree vs. average number of group affiliations on varying the parameter ( ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.8 Group size distribution on varying the parameter ( ) . . . . . . . . 36 2.9 Group size vs. member attributes on varying the parameter ( ) (dashed line: % ratio of singletons to group size, solid line: % ratio of maximum degree to group size). . . . . . . . . . . . . . . . . . . . 38 2.10 The affiliation properties produced by the na??ve model . . . . . . . 40 3.1 Sample bimodal network . . . . . . . . . . . . . . . . . . . . . . . . 49 3.2 Multi-relational Affinity Propagation Model . . . . . . . . . . . . . 50 3.3 The performance of different clustering approaches for varying the levels of network assortativity. . . . . . . . . . . . . . . . . . . . . . 60 3.4 The effect of varying the cost parameters ( , !) on the net similarity and the modularity of the output clustering . . . . . . . . . . . . . . 62 4.1 center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.2 Single actor dynamic affiliation example . . . . . . . . . . . . . . . . 79 4.3 The evolution of loyalty over time for the affiliation network example 82 4.4 The effect of the smoothing factor in calculating group loyalty . . . 83 4.5 The average topic loyalty for the scientific publication network . . 85 4.6 The average topic loyalty grouped by institution type for the sci- entific publication network. . . . . . . . . . . . . . . . . . . . . . . . 86 4.7 Average topic loyalty across all topics in the senator bill sponsor- ship network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.8 Changing loyalty over time for Edward Kennedy in the senate bill sponsorship network . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.9 Average topic loyalty of 2008 presidential candidates in the senate bill sponsorship network . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.10 Average location loyalty for dolphins . . . . . . . . . . . . . . . . . 92 4.11 Average location loyalty for different dolphins? age groups . . . . . 93 viii 4.12 Loyalty Comparison Across Data Sets . . . . . . . . . . . . . . . . . 94 4.13 Loyalty vs. Centrality for Scientific Publication Network . . . . . . 96 4.14 Author publications in ?Information Visualization? topic . . . . . . 97 5.1 Topic distribution of stories in Digg dataset . . . . . . . . . . . . . . 107 5.2 KL-divergence between the topic distribution of users? submissions and diggs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 5.3 KL-divergence between uniform topic distribution and users? sub- missions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 5.4 Heat map of the average number of diggs for different values of topic divergence between peers across time. . . . . . . . . . . . . . . 112 5.5 ROC performance of two comparison models (Bernoulli and Bernoulli- PC) and the proposed model (Adaptive) on the basis of the False Positive Rate (FPR) and True Positive Rate (TPR) for each model. . 118 5.6 Average number of diggs/post for the top 10% influential users in Digg.com . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.1 Fully observable mode: Varying the conservation parameter . . . 131 6.2 Learning preferences mode: Varying the conservation parameter 133 6.3 Varying the percentage of spammers at ( = 0:5) . . . . . . . . . . . 134 7.1 Example candidates and leaders sets . . . . . . . . . . . . . . . . . . . 144 7.2 Feature generation for an example author network . . . . . . . . . . 145 7.3 The percentage of respondents (y-axis) needed to identify k% of the opinion leaders (x-axis) at ( = 2) . . . . . . . . . . . . . . . . . . 152 7.4 The percentage reduction in required respondents to identify k% of the opinion leaders at ( = 2) . . . . . . . . . . . . . . . . . . . . . 153 ix Chapter 1 Introduction The unprecedented growth in the availability of network data has recently drawn the attention of researchers from multiple disciplines to network analysis. For instance, with the proliferation of online social networks, researchers are now able to observe and analyze social interactions between individuals on a massive scale. Other examples include biological networks, scientific collaboration net- works, and transportation networks. Analyzing these networks enables us to understand the underlying factors that govern the structures and the behavior of the entities involved, and in some cases allow us to predict future interactions. Much of the existing literature limits the analysis to a static snapshot of the network, focusing on a single type of relationship, or single-mode of interactions, between the target entities. However, networks are dynamic by nature, and of- ten encompass different types of entities and relationships, allowing for complex structures. Thus, limiting the analysis to static, single-mode snapshots of the net- work interactions results in the loss of a wealth of information that could lead to better understanding and prediction. In my dissertation, I focus on reasoning about the dynamics of multi-modal, multi-relational networks, analyzing and modeling the different types of interac- tions that occur within this type of networks, and understanding how these in- 1 teractions evolve over time. My hypothesis is that incorporating the additional network modalities will enhance the capability of different network models in both interpreting existing phenomena in complex networks and predicting fu- ture interactions. 1.1 Motivation Until recently, much of the research effort in statistics, machine learning and data mining has focused on problems in which data is assumed to be independent and identically distributed (iid). However, as the underlying systems became more complex, especially with the widespread use of the internet, there was a growing need for more advanced methods that can take into account the inherent dependencies between different instances. Hence, statistical relational learning (SRL) [35] methods were developed to leverage these relationships in order to improve the performance of learning and mining methods. Although leveraging these relationships resulted in significant performance gains over the traditional methods, reasoning about different network modalities in isolation loses a wealth of information present in both the dynamics of different relationships, as well as the mutual effects across different modes. Next, I discuss examples from two domains to illustrate the utility of ana- lyzing networks at different abstraction levels, taking into account both the dy- namic aspects and the different modalities involved at both the entity and the interaction levels. 2 John Jay Janet Mike Alan Lisa Marc Bob Mary Jane Alice Jude Figure 1.1: Social network example 1.1.1 Social Networks Human networks are complex systems of interactions, encompassing differ- ent types of social relationships among individuals, such as friendship, familial, or collegiate relationships. There are numerous factors governing the behavior of individuals in these networks, some of which are dependent on the type of relationships, while others are cross-relational. Consider a simple example of a friendship network illustrated in Figure 1.1. This social network represents the ego-network of ?John,? showing all the friendship relationships he has with his peers. As noted in this representation, the network is static and includes only friendship links. The conclusions that can be drawn from this representation are limited to some structural observations, with neither a clear interpretation of when or how these friendships were created, nor whether they have any effect on any other hidden interactions ?John? is involved in. By introducing the temporal dimension, as illustrated in Figure 1.2(a), the 3 (a) Temporal evolution of the friendship links (b) Temporal evolution of family ties and organizational affiliation links Figure 1.2: A dynamic, multi-modal, multi-relational view of the social network example temporal evolution of this friendship network can be observed. Figure 1.2(a) shows three snapshots of the friendship network at years 2000, 2005, and 2010. By tracking the evolution of ?John?s? social network, we can see that he started with only four friends in the year 2000, two of which are also friends themselves. Five years later, ?John? befriended all the friends of ?Jane,? while ?Jane? became a friend to both ?Mary? and ?Bob.? This observation can be attributed to different factors such as the typical evolution of social networks, where ?John? is increas- ing his social circle, or that ?Jane? brought in her social connections to strengthen the friendship with ?John,? or for any latent factors that are not observed in the 4 data. In 2010, ?John? introduced three additional individuals to his social net- work that weren?t connected to him or any of his friends before that point. Adding in the temporal aspect helped to understand the steps that the so- cial network has undergone to reach its present structure. This dynamic analysis can then be used to predict which friendship links are probable to occur in the future, as well as gaining insights in studying the diffusion dynamics and the influence between different individuals in the social network. However, this is still insufficient to understand the causal mechanism for the formation of these friendship links, or to study the effects of these relationships on other types of interactions. Next, I consider other relationships that exist among the target set of indi- viduals and how they change over time. Figure 1.2(b) shows both the family and organizational affiliation relationships among individuals in ?John?s? social net- work. First, note that ?Mary? and ?John? are siblings, and hence the friendship relationship between them is caused by this family tie. By investigating the evolu- tion of ?John?s? family network, we discover that ?John? and ?Jane? got married in 2005, as well as ?John?s? friend ?Bob? and his sister ?Mary.? In the light of this additional information, we can now hypothesize that the friendship relation- ships that ?John? created with ?Jane?s? friends were caused by their marriage, which also interprets the new friendship links that occurred between ?Jane? and her now sister-in-law, ?Mary,? and her husband. We might also suggest that the friendship relationship between ?John? and ?Bob? might have had an effect on having ?Bob? and ?John?s? sister, ?Mary,? getting married. Finally, the additional 5 friendship links that ?John? introduced with ?Jane,? ?Mike,? and ?Alan? in 2010, are directly correlated with the employed-by relationships they share with the same company, which represent a different node type in the example network. We can then presume that ?John?s? new affiliation is the reason behind forming these new friendship links. This simple example illustrates how different types of relationships between different entities have mutual effects on each other, and how these relationships progress over time. The hypotheses and conclusions that could be drawn by in- corporating different modalities and the dynamics of the social network, are sig- nificantly different from the ones derived from the static, single mode network snapshot. Doing this type of analysis on social networks gives us better insights into the interaction dynamics and the causal mechanism underlying the network evolution. 1.1.2 Scientific Networks Next, I consider another example from a different domain: scientific net- works. These types of networks include scientific and academic collaboration networks, such as citation networks, authorship networks, and scientific collab- oration networks. Scientific networks are multi-modal by nature, including dif- ferent types of entities, such as researchers and publications in authorship net- works, and different types of relationships, such as co-authorships and citations between researchers. One of the commonly used methods for modeling scientific 6 A5 A2 A1 A3 A4 (a) Co-authorship network example A5 A2 A1 A3 A4 D1 D1 D1 D1 advise 2005 advis e 201 0 advis e 200 5 advis e 200 8 2007 200 7 200 7 2008 200 8 200 9 2009 2011 20 11 20 11 cites cite s (b) Multi-modal view of authorship, ci- tation and academic advisory networks Figure 1.3: Scientific network example networks is projecting the network modalities of interest onto a single mode, then analyzing the resulting single mode network. Figure 1.3(a) shows a simple co-authorship network between five authors. The edges between authors in the co-authorship network indicate that the cor- responding authors have co-authored a publication together. The preliminary analysis of this example network shows that author A1 is probably working in a different research area than authors A2, A3, and A4, indicated by the fact that there does not exist any collaboration between them. There is little that can be concluded from the information provided by this network. However, by extending our analysis to the dynamics of different network modalities, shown in Figure 1.3(b), we discover the existence of numerous fac- tors that were not observable in the simpler, single mode version of the network. First, by analyzing the academic advisory network, indicated by the directed solid edges, we note that authors A1, A2 and A3 are actually students advised 7 by author A5 at different points in time. Also, the recent advice relationship from author A2 to author A4 suggest that A2 has already graduated and is now hold- ing an academic position, with A4 being her student. The authorship network, indicated by the dashed line between the authors and the corresponding publi- cations, disambiguates the 3-cliques in Figure 1.3(a), showing that both of them represent the corresponding authors working on a single publication. Finally, the citation relationship between publication D1, and both publications D2 and D3, suggests that our judgement that authorA1 is working in a different research area is probably wrong. The missing collaboration between author A1 and authors A2 and A3 can be attributed to the fact that A1 has recently joined the group, indi- cated by the date of the advisory relationship between her and author A5. This small example of a co-authorship network in a typical research group illustrates the deficiencies in analyzing a projected, single mode network. An- alyzing the full network, taking into consideration both the temporal dynamics as well as the different modalities, is capable of revealing numerous factors that directly impact the accuracy of the predicted model. The previous examples show the utility of the different network modali- ties as well as their temporal dynamics in understanding the dependencies that need to be captured in causal, predictive and discriminative network models. In my dissertation, I focus on incorporating this additional information in the latter two types of network models, as well as other tasks related to general network analysis. Next, I discuss the general notations and definitions used to represent multi-modal, multi-relational networks. 8 1.2 General Notation and Definitions Network data is often represented as a graph G(V;E), where entities are represented by nodes (v 2 V ), and relationships are represented by correspond- ing edges (e 2 E). Both nodes and edges can be associated with a set of fea- tures describing the corresponding entity or relationship. For representing time in dynamic networks Gt(Vt; Et), the graph elements are often associated with a temporal variable representing the creation time of the corresponding entity or relationship. Multi-modal networks refer to networks comprised of multiple entity types, while multi-relational networks refer to different types of relationships across the underlying entities. For simplicity, I use the term multi-modal networks hereafter to refer to multi-modal, multi-relational networks by generalizing the network modalities to include both node and edge types. These networks can be repre- sented by adding a type construct to the graph nodes and edges. Thus, a multi- modal network can be represented a graph G(V ; E), where V = fV 1; V 2; : : : ; V ng, and E = fE1; E2; : : : ; Emg. Each set V i 2 V represents a set of nodes of type i, and each set Ej 2 E represent a set of edges of type j. This representation enables us to account for the heterogeneity in both the entities and relationships. An exam- ple of this type of networks is social and affiliation network, where the network is represented as a multi-modal graph among individuals and organizations. The nodes are represented as V = fV person; V organizationg, while the social links exist among the ?person?-type nodes (Esocial V person V person), and the affiliation 9 links exist only across node types (Eaffiliation V person V organization). To account for the network dynamics, I associate a temporal variable with the elements of the complex graph Gt(Vt; Et), denoting the creation time of the associated entity. Thus, a dynamic, multi-modal network can be represented as a set of graphs G = fG1; G2; : : : ; GTg, where each graph corresponds to a static, multi-modal network at the corresponding time point. This representation gives us the required flexibility to both analyze the dynamics of each type of relation- ship separately, as well as investigate the cross-relation effects as dependencies between the corresponding graphs. 1.3 Contributions and Organization This thesis is organized into two parts that cover different aspects of multi- modal networks. In the first part, I focus on the construction and organization of multi-modal networks. I start by investigating the evolution of multi-modal networks over time in Chapter 2. As an example setting, I analyze the growth patterns and relationships that occur in social and affiliation networks. Based on the observed growth patterns, I then propose a coupled generative model that captures the statistical properties of complex networks. I show that the proposed co-evolution model is able to mimic the evolution of real multi-modal networks, bringing new insights about the role of friendship in joining social groups. After modeling the evolution process in multi-modal networks, I move in Chapter 3 to investigating network clustering as an initial data exploration ap- 10 proach for characterizing the organization of complex networks. Though there have been numerous approaches for clustering, they are often limited to data from the same type, and focus on either attribute similarity or structural connec- tivity. To address these shortcomings, I propose a multi-relational affinity prop- agation model for clustering complex networks. My proposed approach extends the message-passing affinity propagation clustering framework by encoding ad- ditional soft relational constraints to capture the dependencies across different node types. This formulation facilitates the exploration of the middle ground be- tween feature-based similarity clustering, community detection and block mod- eling in complex networks. I evaluate the performance of the algorithm qualita- tively and quantitatively using a variety of evaluation measures, and show that it outperforms a number of other baselines. In the second part of my dissertation, I focus on the temporal dynamics of the interactions that occur in multi-modal networks, from both user-centric and network-level perspectives. For the user-centric approach, I analyze the dynam- ics of user relationships with other entity types in Chapter 4, proposing a measure of the ?loyalty? a user shows for a given group or topic based on her temporal interaction pattern. I evaluate the proposed measure on networks from various domains, and show the utility of the loyalty measure for analyzing the dynamic behavior of users and quantifying the degree of their commitment to different time-varying affiliation groups. I then move to macroscopic-level approaches for dynamic processes that oc- cur on a network scale. As an example of such processes, I analyze the diffusion 11 of information in complex networks in Chapter 5. I propose a novel information diffusion model which captures diversity across different product campaigns and provides a means for incorporating the trust among users in the network and their effects on the adoption process. I show that the proposed adaptive diffu- sion model is able to better predict future adoptions compared to other diffusion models. I also propose a method of influential identification based on the pro- posed differential adaptive diffusion model, and show that it outperforms exist- ing structure-based approaches. I then discuss the implications of the proposed model on viral marketing strategies in Chapter 6. In cases where user preferences and historical interactions are unavailable, analysts often resort to either primary methods, such as surveys, or secondary methods, such as proxy interactions, for inferring the influence network among users. Primary methods rely on surveys and questionnaires that are directly sent to the people in the target population, while secondary methods rely on the net- work properties, such as different centrality measures, of alternative ?proxy? net- works that are assumed to reflect influence, such as collaboration, co-authorship, and citation networks. To augment my study of influential detection, in Chap- ter 7 I propose an active surveying method for leveraging the different modal- ities of the secondary networks, to guide the surveying process and minimize the amount of primary data required. I show that active surveying result in sig- nificant cost reduction in identifying opinion leaders, without sacrificing the in- tegrity of the process. 12 Part I Multi-Modal Networks Structure and Organization 13 Chapter 2 The Co-evolution of Social and Affiliation Networks Most studies on network evolution focus on proposing generative models, which capture the statistical properties of real-world networks related only to a single type of link formation. There have been very few studies that analyze the evolution process of multi-modal networks, where different types of nodes and edges exist. In this chapter, I analyze the evolution process of both social and affiliation links in a network comprised of people and social groups. I pro- pose a novel generative model which captures the co-evolution of both social and affiliation networks. I show that coupling the evolution process between dif- ferent network modes better captures the statistical properties observed in real multi-modal networks than a more simplistic approach that handles each mode separately. 2.1 Introduction In recent years, there has been a proliferation of online social networks. Many of the networks have millions of users, and allow complex interactions through linking to friends, public messaging, photo commenting, participating in groups of interest, and many others. Studies have been performed to charac- terize and explain the behavior of users, and most of them concentrate on mod- 14 eling how users join the network and form links to each other. Little is known about how different types of interaction influence each other. In this chapter, I address the problem of modeling social network generation explaining both link and group formation. In social networks, users are linked to each other based on a binary rela- tionship such as friendship, co-working relation, business contact, etc. Social net- works often contain relationships other than friendship, such as affiliation links, in which users are linked to groups of interest, and groups are linked to their members. In this study, I use three large datasets from online social and affilia- tion networks, and discover a number of interesting properties. The datasets are from Flickr, LiveJournal and YouTube collected by Mislove et al. [75]. Using the previously studied and newly observed statistical properties of these networks, I propose a generative model for social and affiliation networks. The model explains the complex process of network formation, and captures a number of affiliation network properties which have not been captured by a model before: power-law group size distribution, large number of singletons (group members without friends in the group), power-law relation between the node degree and the average number of group affiliations, and exponential dis- tribution of the number of group affiliations for nodes of a particular degree. My findings are important for understanding the evolution of real-world net- works and suggest that the process is more complex than a na??ve model in which groups are added to a fully evolved social network. They also show that users join groups for different reasons and having friends in the group is often not nec- 15 essary. This suggests that information spreads in the network through channels other than friendship links, and this observation has implications on information diffusion and group recommendation models. In addition, this model can be used for synthetic network generation. This is an important application because real-world network datasets are often propri- etary and hard to obtain. Controlling network parameters allows the generation of datasets with different properties which can be used for thorough exploration and evaluation of network analysis algorithms. My contributions include the following: I discover a number of new properties in social and affiliation networks. I propose the first generative model for network evolution which captures the properties of both real-world social and affiliation networks. I provide a thorough evaluation of the model which shows its flexibility for synthetic data generation. Notation. I study the interactions of two graphs, the social network graph, Gs, and the affiliation graph, Ga. For the purposes of the study, a social network is a graph Gs = fV;Esg which has one type of node corresponding to the users that participate in it. Nodes can form links which can be directed or undirected; es(vi; vj; t) denotes the link that vi and vj have formed at time t. A directed link is formed whenever one user links to another. An undirected link requires the approval of both parties in order to be formed. 16 In an affiliation network Ga = fV;H;Eag, there are two types of nodes, the social network users V and the groups H that they have formed. I represent the network as a bipartite graph in which undirected links ea(vi; hj; t) are formed between user vi and group hj at time t when this user becomes a member of the group. There are a number of reasons why groups are formed. For example, groups can exist because of a common interest, such as philately or book-reading clubs; they can be based on common business relation, such as an employing company; or they can be based on common personal traits, such as geographic location. What is common between the groups that I study is that users have voluntarily chosen to be part of the group, as opposed to clustered together by a group detection algorithm. 2.2 Related work The evolution of social and affiliation networks exhibits a number of prop- erties previously studied in the literature. I describe some of them in more detail in Section 2.4.2. 2.2.1 Evolution of social networks The majority of literature on analyzing network properties has focused on friendship networks or actor-actor networks in general. Studying the static snap- shots of graphs has led to discovering properties such as the ?small-world? phe- nomenon [118] and the power-law degree distributions [6, 29]. Time-evolving 17 graphs have also attracted attention recently, where interesting properties have been discovered, such as shrinking diameters, and densification power law [63]. There have been a number of models proposed to capture these proper- ties. For a survey, one can consult the work by Chakrabarti and Faloutsos [15]. For example, unlike the random graph model, the preferential attachment model proposed by Barabasi et al. [6] captures power-law degree distributions. The forest fire model [63] also captures the power-law degree distribution together with densification and shrinking diameters over time. A more recently proposed, microscopic evolution model [62] is based on properties observed in large, tem- poral network data, providing insight into the node and edge arrival processes. Another recent model, the butterfly model [73], concentrates on capturing the evolution of connected components in a graph. In this work, I extend the micro- scopic evolution model by including processes of forming and joining groups of interest. 2.2.2 Evolution of affiliation networks To the best of my knowledge, there is no model that captures the evolution of affiliation networks in online communities. However, there are studies that describe the relationship between friendship links and group formation proper- ties [4, 75]. They show that the probability of a user joining a group increases with the number of friends already in the group [4], and that higher degree nodes tend to belong to a higher number of groups [75]. 18 Group detection is a related problem (for a survey, see [34]). Its goal is to find new communities based on node features and structural attributes. Unlike group detection work, my work concentrates on unraveling the process govern- ing the formation of existing communities. 2.3 Observations Though affiliation groups constitute a major part of many social networks, very little work in the literature focuses on analyzing group memberships and evolution. In this section, I analyze different affiliation networks and try to char- acterize some properties of affiliation groups that are consistent across various datasets. For my analysis, I used three large real-world datasets from LiveJour- nal, Flickr and YouTube. LiveJournal is a popular blogging website whose users form a social net- work through friendship links and form affiliation links to various ?communi- ties,? which are groups of users having similar interests. The LiveJournal dataset considered contains over 5:2 million users, 72 million links, and over 7:4 million affiliation groups. Flickr is a photo-sharing site based on a social network with friends and family links. Groups in Flickr are also based on users with common interest. The Flickr dataset used in the experiments contains over 1:8 million users, 22 million links, and around 100; 000 groups. The third dataset is from YouTube, which is a popular video-sharing site that includes a social network based on user-defined contacts, and an affiliation network based on the category 19 of videos that users post. The YouTube dataset contains over 1:1 million users, 4:9 million links and around 30; 000 groups. The full dataset descriptions can be found in the work of Mislove et al. [75]. Now, I describe the observations that I discovered by analyzing the datasets, and relate them to previously observed properties. 2.3.1 Group size distribution I begin by characterizing the relationship between the size of the affiliation group and its frequency of occurrence. The main observation is that, analogous to the degree distribution, the group size distribution follows a power law with a large number of small groups and a smaller number of large ones. This has also been observed by Mislove et al. [75]. The results are illustrated in Figure 2.1. 2.3.2 Node degree vs. Average number of group affiliations Looking at the relationship between the degree of a node and the number of its group affiliations, I observe that the nodes of lower degree tend to be members of fewer groups than the nodes with higher degree. However, the relation starts declining after a certain point, yielding a lower number of group memberships for very high degree nodes. The relationship is illustrated in Figure 2.2, where the x-axis represents the node degree and the y-axis represents the average number of group affiliations for nodes with that degree. The nodes in the declining part represent a very small portion of the overall number of nodes (< 1% of the size 20 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 Group Size C ou nt (a) LiveJounral 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 Group size C ou nt (b) Flickr 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 Group Size C ou nt (c) Youtube Figure 2.1: Distribution of group sizes on a log-log scale. of the network in all cases), which is why I fitted only the increasing part of the data points to a function. I evaluated over 55 different distributions including lo- gistic, Dagum and Laplace, using EasyFit 1, a software for distribution fitting. A power-law relation was the best fit according to the Kolmogorov-Smirnov rank- ing coefficient. 2.3.3 Distribution of the number of group affiliations The previous observation considered the average number of group affilia- tions for nodes with different degrees. Here, I look at the actual distribution of the number of group affiliations with respect to the node degree. It turns out that the number of group affiliations for nodes of a certain degree k follows an expo- 1At http://www.mathwave.com 21 0 10 20 30 40 50 60 70 80 90 100 1 51 10 1 15 1 20 1 25 1 30 1 35 1 40 1 45 1 50 1 55 1 60 1 65 1 70 1 75 1 Node Degree A vg n um be r of g ro up a ff ili at io ns (a) LiveJounral 0 100 200 300 400 500 1 10 1 20 1 30 1 40 1 50 1 60 1 70 1 80 1 90 1 10 01 11 05 12 11 13 17 14 27 15 46 16 74 18 13 19 50 21 34 23 51 26 14 30 31 36 66 64 24 Node Degree A vg n um be r of g ro up a ff ili at io ns (b) Flickr 0 10 20 30 40 50 1 51 10 1 15 1 20 1 25 1 30 1 35 1 40 1 45 4 51 2 57 1 64 8 74 0 84 4 10 06 12 29 16 27 22 01 48 71 Node Degree A vg n um be r of g ro up a ff ili at io ns (c) Youtube Figure 2.2: Node degree vs. average number of group affiliations nential distribution. Figure 2.3 reports on k = 50 for LiveJournal and Flickr, and on k = 25 for YouTube but this was true for other degrees as well. 2.3.4 Properties of group members According to Backstrom et al. [4], nodes are more likely to join groups in which they have more friends. However, it turns out that, in the datasets, there is a large portion of group members without friends in the group (singletons), meaning that they did not join the group because of a friend. This is surprising because it shows that users join groups for various reasons with friendship being only one of them. I measure the maximum node degree within groups of various sizes in the 22 0 20 40 60 80 100 120 140 160 180 200 0 12 24 36 48 60 72 84 96 10 8 12 0 13 2 14 4 15 6 16 8 18 0 19 2 20 4 21 6 22 8 24 0 25 2 26 4 Number of group affiliations per node N um be r of n od es (a) LiveJounral - Degree = 50 0 10 20 30 40 50 60 0 7 14 21 28 35 43 50 57 64 71 78 85 92 10 0 11 1 12 2 13 1 14 1 15 2 17 5 20 0 35 6 Number of group affiliations per node N um be r of n od es (b) Flickr - Degree = 50 0 50 100 150 200 250 0 2 4 6 8 10 12 14 16 18 20 24 31 40 47 46 9 Number of group affiliations per node N um be r of n od es (c) Youtube - Degree = 25 Figure 2.3: Distribution of the number of group affiliations for nodes with differ- ent degrees. datasets. For all groups of a given size, I measure the average maximum degree per group and the average number of singletons (nodes with no friends within this group) as a percent of the group size. The results show a large number of singletons overall, especially in small groups, indicating that a large percentage of the members of a specific group do not have any friends within this group. This conclusion was confirmed by analyzing the average maximum degree per group. It turned out that the friends of the maximum-degree node within a group do not constitute a large percentage of the group size, even in small groups. The numbers are illustrated in Figure 2.4, where the upper series shows the average ratio of the number of singletons to the group size, and the lower series represents 23 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 501 1001 1502 2016 2570 3247 4066 5179 6683 8953 13432 24526 Group Size R at io (a) LiveJounral 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 51 101 151 201 251 301 351 401 451 502 557 613 676 760 870 1120 Group Size R at io (b) Flickr 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 21 41 61 81 101 121 141 161 181 201 221 241 261 281 301 Group Size R at io (c) Youtube Figure 2.4: Ratio of the number of singletons to the group size (upper series) and ratio of the maximum degree to the group size (lower series). the average ratio of the maximum degree to the group size. This result shows that the larger the group a user belongs to, the more likely it is for him/her to have a friend in the group. For example, in Flickr, 76% of the members of groups of size 50 are singletons, while for groups of size 500, this number drops to 29%. 2.4 Co-evolution properties and model A model which describes the evolution of a social network together with the evolution of an affiliation network needs to capture a number of simple events, as well as statistical properties of both networks. Here, I present the events of the co-evolution model and desired properties, some of which have been presented in other work. Then, I present the co-evolution model, which extends the node 24 arrival and link formation processes of the microscopic evolution model [62] to dynamic social and affiliation networks. 2.4.1 Events The possible events that the model allows are: a node joins the network and links to someone a new group is formed with one member a node joins an existing group a new link is formed between existing users 2.4.2 Desired properties A co-evolution model needs to capture properties of both social and affili- ation networks. Here, I show three types of properties: properties of the social network alone, properties of the affiliation network alone, and properties of both. Properties of the social network. power law degree distribution - the node degrees are distributed according to a power law with a heavy tail. This property has been observed in many other studies. network densification - the density of the network increases with time [63]. shrinking diameter - the effective diameter of the network decreases as more nodes join the network [63]. 25 Properties of the affiliation network. power law group size distribution - the group sizes are distributed according to a power law with a heavy tail. Properties involving both the social and affiliation networks. large number of singletons - many nodes do not have any friends inside the groups they are affiliated with. power-law relation between the node degree and the average number of group affil- iations - see Section 2.3.2. exponential distribution of the number of group affiliations for a particular node degree - see Section 2.3.3. 2.4.3 Co-evolution model I now propose a co-evolution model which captures the desired properties. The model is undirected, and it has two different sets of parameters: one is con- cerned with the evolution of the social network, and the other determines the factors of development of the affiliation network. I also present a na??ve model which assumes that the evolution of the affiliation network is independent of the evolution of the social network. Both models utilize the microscopic evolu- tion model [62] for generating the social network because that model is based on observing the temporal properties of large social networks. I present its main components first. 26 Microscopic evolution model. The main ideas behind the microscopic evo- lution model are that nodes join the social network following a node arrival func- tion, and each node has a lifetime a, during which it wakes up multiple times and forms links to other nodes. These are the set of parameters needed for the microscopic evolution model: N(:) is the node arrival function, is the parame- ter of the exponential distribution of the lifetime, and ; are the parameters of the power law with exponential cut-off distribution for the node sleep time gap. Further details of the model can be found in the paper by Leskovec et al. [62]. I utilize these parts: Node arrival. New nodes Vt;new arrive at time t according to a pre-defined arrival process N(:). Lifetime sampling. At arrival time t, v samples lifetime a from :e :a: v be- comes inactive after time tend(v) = t+ a. First social linking. v picks a friendw with probability proportional to degree(w) and forms edge es(v; w; t). Sleep time sampling. v decides on a discrete sleep time by sampling from 1 Z :( ):e :degree(v): . If the node is scheduled to wake up before the end of its lifetime (t + tend(v)), then it is added to the set of nodes Vt+ that will wake up at time t+ . Social linking. At wake up time t, v creates an edge es(v; w; t) by closing a triad two random steps away (i.e., befriends a friend w of a friend). Na??ve model. Before I present the proposed co-evolution model, I first present a na??ve model which assumes that the evolutions of the social network 27 and the affiliation network are two independent processes. As a first step, it cre- ates the social network using the model of Leskovec et al. [62]. Then, it generates and populates groups in such a way that their sizes follow a power-law distribu- tion with an exponent k. Algorithm 2.1 presents the na??ve model in detail. I use this model as a baseline. Algorithm 2.1 Na??ve model 1: Set of nodes V = ; 2: for each time period t 2 T do 3: Set of active nodes at time t, Vt = ; 4: end for 5: for each time period t 2 T do 6: Node arrival. V = V [ Vt;new 7: for each new node v 2 Vt;new do 8: Lifetime sampling 9: First social linking 10: end for 11: for each node v 2 Vt do 12: Social linking 13: end for 14: for each node v 2 Vt [ Vt;new do 15: Sleep time sampling 16: end for 17: end for 18: Set of groups H = ;. 19: for i=1:number of groups do 20: Group creation. New group hi is created and its size s is sampled from s k. H = H [ fhig. 21: for j=1:s do 22: Group joining. Pick a random node v 2 V and form an affiliation link to it ea(v; hi; null). 23: end for 24: end for Co-evolution model. In this model, the affiliation network evolution co- occurs and depends on the social network evolution. When a node wakes up, besides linking to another node, it also decides on a number of groups to join. 28 Algorithm 2.2 Co-evolution model 1: Set of nodes V = ; 2: Set of groups H = ; 3: for each time period t 2 T do 4: Set of active nodes at time t, Vt = ; 5: end for 6: for each time period t 2 T do 7: Node arrival. V = V [ Vt;new 8: for each new node v 2 Vt;new do 9: Lifetime sampling 10: First social linking 11: end for 12: for each node v 2 Vt do 13: Social linking 14: Affiliate linking. v determines nh, the number of groups to join, sam- pled from an exponential distribution 0e 0nh with a mean 0 = 1 0 = :degree(v) . 15: for i = 1 : nh do 16: if rand() < then 17: Group creation. v creates group h, and forms edge ea(v; h; t). H = H [ fhig. 18: else 19: Group joining. v forms edge ea(v; h; t). Group h is picked through a friend with probability pv; otherwise, or if no friends? groups are available, it joins a random group with prob. proportional to the size of h. 20: end if 21: end for 22: end for 23: for each node v 2 Vt [ Vt;new do 24: Sleep time sampling 25: end for 26: end for With probability , it creates a new group, else, it joins an existing group. There are two mechanisms by which it picks a group to join. In the first one, it joins the group of one of its friends. In the second one, it picks a group at random. Algorithm 2.2 presents the co-evolution model in detail. Here, I present the parameters of the affiliation network evolution part in 29 more detail. The first parameter, , represents a tuning parameter that controls the density of the affiliation links in the network. The second parameter, , is the exponent of the power law that relates node degree with number of group affiliations. The last parameter to the model, , represents the probability by which an actor creates a new group at each time point. All the parameter values range over the interval [0; 1] except which ranges between 0 and the average number of group affiliations per node. I provide some guidelines for picking the right parameter values in the experiments section. As noted in Section 2.4.2, the relationship between node degree and average number of affiliations is a power-law relation. Even though one can vary the exponent of this function, for simplicity, I fixed its value to 0.5, utilizing a square root function to compute this average. It is also worth noting that other, more sophisticated techniques can be uti- lized in both social and affiliation aspects of the model that might be able to cap- ture stronger correlation between the evolution of both kinds of networks. One possible modification for the social link creation is considering random steps but with group bias, such that the probability of choosing a node u to close the triad is proportional to the number of groups the two nodes share. Another possible modification is to specify the number of groups a node will join in advance us- ing the estimated power-law function. A disadvantage of such approach is that the approximated degree is hard to compute because it depends on the expected value of a function which changes with the degree. A thorough investigation of the different alternatives is left as future work. 30 In the group joining step of the algorithm, a node decides to join a group and it has two choices for picking that group. One is through a friend, and the second is by picking a random group with probability proportional to the size of that group. It follows the first choice with some probability pv, else it resorts to the second one. The intuition behind this is that some nodes in each group are singletons while others have friends in it. The second choice is also based on the observation that the size of the groups follows a power-law distribution; on the principle of ?rich get richer,? groups with larger size should have a larger probability of getting picked. There are many options for computing the probability pv such as making it a constant or dependent on the node degree. One can test which is most appro- priate in the presence of temporal data for affiliation networks. Since such data is hard to obtain, I try different possibilities in the model. It turns out that using a constant for pv yields a relationship between the group size and the singleton ratio that decreases at first but then stabilizes around 1 pv at higher group sizes. In contrast, what was observed initially is a relationship which decreases with increasing group sizes (see Figure 2.4). When a pv which is correlated with the degree is used, the resulting relationship becomes closer to the desired one. In particular, I compute: pv = 8 >>< >>: degree(v) if degree(v) < 1 1 (2.1) 31 though other functions of the degree may be more appropriate. The parameter represents the friends? influence on the actor?s decision to join a group; i.e. the likelihood of an actor joining one of the groups of his/her friends increases by increasing the value of . The main intuition behind using a degree-correlated probability is the fact that as a node has more friends, the probability that one of its friends belongs to one of the larger size groups increases. Thus, utilizing the friendship bias parameter actually increases its chances of joining this larger size group of its friend, thus leading to the decreasing relationship noted in the observations. 2.5 Experiments I present three sets of experiments. The first set shows that the model is able to produce a dataset very similar to one of the real-world datasets, and the second set observes the properties of data generated by the co-evolution model. I also present results for the na??ve model which adds groups on top of a social network, showing that this model is not able to produce the real-world affiliation network properties. 2.5.1 Synthetic data In the first set of experiments, I vary the parameters of the model in order to generate a few synthetic datasets. Then, I check whether each dataset has the properties described in Section 2.4.2. 32 I have fixed the parameters of the social evolution part throughout this set of experiments, and varied the parameters of the affiliation part of the network. I assume an exponential node arrival function to achieve higher growth rate in the generated network, which is in accordance with what Leskovec et al. [62] showed in some social networks, such as Flickr. However, other arrival functions can also be utilized within the model. The other parameters of the social evolution aspect were fixed as reported by Leskovec et al. for Flickr data: = 0:0092, = 0:84, and = 0:002. I also fix the value of the second parameter to the affiliation model, , to 0.5. 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+00 1.E+01 1.E+02 1.E+03 Node Degree C ou nt Figure 2.5: Degree distribution in a synthetic network I first illustrate the results for the social network generated using the spec- ified parameters. The model was run for 400 time steps, resulting in a network with 140,158 actors and 245,043 social links. The degree distribution in the result- ing network follows a power-law as Figure 2.5 shows. The network densification property also holds, as illustrated in Figure 2.6 which represents the number of 33 nodes and number of edges at each time point on a log-log scale. 3.7 4 4.3 4.6 4.9 5.2 5.5 3.7 4 4.3 4.6 4.9 5.2 5.5 Node Count E dg e C ou nt Figure 2.6: Densification in a synthetic network In order to test the affiliation aspect of the evolution model, I investigated the effect of each parameter in the model on the properties of the resulting affili- ation network. I start with the first parameter , which represents a tuning factor of the affiliation links? density. The main properties that are affected by varying the value of are the total number of affiliations and the distribution between the node degree and average number of group affiliations. Figure 2.7 illustrates that the general power distribution persists among different values of , but the main effect is the scale of the distribution; as increasing the value of , more affiliation links are created, and correspondingly increasing the average number of group affiliations per node. Theoretically, the values for this parameter can vary from 0, where no affiliation links are created in the network, to the maximum number of groups, where fully connected affiliation network emerges. Practical values for varies between 0 and 25. The total number of affiliation links for each value of 34 is reported in Table 2.1. 0 20 40 60 80 100 120 140 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 Node Degree A vg n um be r of g ro up a ff ili at io ns (a) = 3 0 20 40 60 80 100 120 140 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 Node Degree A vg n um be r of g ro up a ff ili at io ns (b) = 10 0 20 40 60 80 100 120 140 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 Node Degree A vg n um be r of g ro up a ff ili at io ns (c) = 20 Figure 2.7: Degree vs. average number of group affiliations on varying the pa- rameter ( ). Affiliation Count 3 285,536 10 2,411,710 20 4,771,072 Table 2.1: Number of affiliation links on varying the parameter ( ) The next parameter, , represents the probability with which a node creates a new group. This parameter directly affects the number of groups in the result- ing network, as well as the group size distribution. As illustrated in Figure 2.8, we note that although the power law distribution of the group sizes holds for 35 various values of (which is one of the desired properties), the maximum group size decreases significantly with increasing the value of . This decline in the maximum group size is caused by the fact that for higher values of , nodes tend to create new groups more often than joining existing ones, which leads to the existence of a large number of groups with relatively small sizes. This conclu- sion is also clear in the results illustrated in Table 2.2, where the resulting number of groups in the network and the maximum group size vary significantly with changing the parameter value. 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 Group Size C ou nt (a) = 0:1 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 Group Size C ou nt (b) = 0:5 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+00 1.E+01 1.E+02 Group Size C ou nt (c) = 0:9 Figure 2.8: Group size distribution on varying the parameter ( ) Finally, I investigate the parameter , upon which pv depends. represents the extent to which friends influence the decision of a node to join groups. The 36 outcome of increasing the value of this parameter is a decreasing number of sin- gletons and an increasing relative degree of the nodes within different groups. Figure 2.9 shows that the general distribution captures the desired properties and the observations in real data. The value of is highly dependent on the social net- work structure properties, such as the average node degree in the social network and the desired influence of friends on a node?s decision. For instance, if the value of = 0:1 is used in a setting where the expected value for the average node degree is around 10, then we expect to see high percentage of nodes in the network being affected by their friends. 2.5.2 Real data In this set of experiments, I look for the model parameters that will produce a network similar to one of the real-world datasets used in the observations of Section 2.3. I searched for parameters that will produce an affiliation network resembling Flickr since the social network evolution parameters for Flickr have already been reported by Leskovec et al. [62]. In order to get an initial seed of the search space for the evolution parameters of the affiliation network, I analyze the affiliation network properties of Flickr as observed in Section 2.3. A summary of the affiliation network statistics of Flickr is given in Table 2.3. Groups Count Max Group Size 0.1 66,887 39,753 0.5 245,143 560 0.9 332,437 32 Table 2.2: Number of groups on varying the parameter ( ) 37 0 0.2 0.4 0.6 0.8 1 1 5 9 13 17 21 25 29 33 37 42 47 51 56 60 70 84 10 6 26 6 10 58 Group Size R at io (a) = 0:01 0 0.2 0.4 0.6 0.8 1 1 5 9 13 17 21 25 29 33 37 42 47 51 56 60 70 84 10 6 26 6 10 58 Group Size R at io (b) = 0:05 0 0.2 0.4 0.6 0.8 1 1 5 9 13 17 21 25 29 33 37 41 45 49 54 59 65 78 89 11 4 20 7 Group Size R at io (c) = 0:09 Figure 2.9: Group size vs. member attributes on varying the parameter ( ) (dashed line: % ratio of singletons to group size, solid line: % ratio of maximum degree to group size). The Flickr dataset is characterized by a relatively small number of groups in comparison to the number of users, where the actual ratio between the group count and the user count is 0:056. As a result, I expect to have a small value of close to this ratio. On the other hand, the average number of group affiliations per user in the real dataset is 4:62, and I assign this value to . Finally, as observed in Figure 2.4, the average percentage of singletons in each group is lower than the average for the other datasets, indicating more friendship bias, thus increasing the value of . There are other factors to consider when specifying the affiliation network evolution parameters, such as the rate of node arrival and the probabilistic na- 38 Real Network (Flickr) Synthetic Network Users count 1,846,198 1,707,475 Groups count 103,648 88,749 Affiliations count 8,529,435 7,813,910 Avg groups per user 4.62 4.58 Table 2.3: Statistics of a real network (Flickr) vs. a synthetic one ( = 2:5, = 0:5, = 0:1, = 0:03) ture of the node?s lifetime and sleep time gaps. For example, in Flickr?s case, the exponential node arrival rate means that more nodes are created at later times. In this case, the distribution parameters should be a bit lower than desired because many nodes will join towards the end of the evolution process but they will not have sufficient time to create many links and affiliations. By utilizing all these pieces of information to guide the parameter search, I was able to generate a net- work that has similar attributes to Flickr?s, illustrated in table Figure 2.3. I argue that using a similar procedure for parameter selection can result in generating synthetic networks that have many of the properties of a real one. 2.5.3 Comparison with the na??ve model In this set of experiments, I was interested to learn whether the desired net- work properties can be produced by utilizing the na??ve evolution model. The model can clearly capture the social network properties since the process of creat- ing it is the same as in the co-evolution model. In terms of the affiliation network properties, I used the na??ve model to produce a social network similar to Flickr, as described in the previous experiment. Then I created the desired number of groups and picked the size of each one from a power-law distribution with the 39 0 0.2 0.4 0.6 0.8 1 1 38 75 11 2 14 9 18 6 22 3 26 0 29 7 33 8 38 7 43 4 48 3 55 0 63 9 74 6 86 1 10 22 12 84 15 74 20 16 27 49 42 11 72 28 24 68 9 R at io Group Size (a) Average number of singletons (dashed line) and average maxi- mum degree (solid line) 0 1 2 3 4 5 6 7 8 1 17 33 49 65 81 97 113 129 145 161 177 193 209 225 241 257 273 Node Degree A vg n um be r of g ro up a ff ili at io ns (b) Degree vs. avg number of affiliation groups Figure 2.10: The affiliation properties produced by the na??ve model parameters observed in Flickr. Each group was populated by picking random users from the social network. As a result, the na??ve model is able to capture the group size distribution. However, Figure 2.10(a) shows that it is not able to capture the average number of singletons and the average maximum degree as a percent of the group size. By picking random members, almost all members in 40 each group end up being singletons (except for groups with very large sizes), and the average maximum degree is close to 0. Figure 2.10(b) shows that the model is also not able to capture the relation between degree and average number of group affiliations for nodes with lower degrees. The na??ve model generates a relation between them which is closer to linear than a power law. 2.6 Conclusion I presented a generative model for creating social and affiliation networks. The model captures important statistical properties of these networks, and pro- vides new insights into the evolution of networks with both social and affiliation links. It shows that groups can be formed for various reasons and friendship links are not the only propagators of influence. I believe that this observation not only affects the design of network evolution models but it may have broader implica- tions on other mechanism designs, such as group recommendation, information diffusion and viral marketing strategies. 41 Chapter 3 Multi-Relational Affinity Propagation After analyzing the evolution process of multi-modal networks, I now move to characterizing the structural patterns that exist in these networks. I focus on cluster analysis as one of the common initial data exploration approaches. In this chapter, I propose a multi-relational clustering approach for identifying the patterns and grouping structures that occur in complex networks. My proposed approach extends the affinity propagation clustering framework for encoding dif- ferent types of relational constraints to capture the dependencies between differ- ent node types, and across various relationship types. This formulation allows for combining information from both the features of different entities as well as their inter-relational structure to explore different clusterings of the network. The out- put clustering can then be used for further analysis of different network-related tasks. 3.1 Introduction Cluster analysis is one of the foundational components of unsupervised learning and exploratory data analysis. It has long attracted the attention of re- searchers from multiple disciplines. The classical clustering approaches focus on feature similarity for finding latent groupings in the data. However, with the 42 emergence of data that is naturally described in more complex ways, particularly in the form of heterogeneous graphs or networks, these classical approaches are no longer sufficient. In order to address the challenges in this structured data, a number of graph clustering and community detection approaches have been proposed [28, 37, 82]. The methods find groups of nodes that are tightly connected to each other, and loosely connected to nodes in different clusters. Similar ideas for graph partition- ing are also used for feature-based clustering, by constructing a network among the data points based on their attribute similarity rather than intrinsic structure (e.g., [101]). In addition, block modeling approaches have also been proposed for grouping nodes that link to similar collections of other nodes [54, 120, 1]. However, many real-world problems include rich, structured relationships that include multiple dependencies among different entity types. Clustering het- erogeneous (multiple node types), multi-relational (multiple edge types) networks poses a number of challenges that the proposed algorithms should be able to ad- dress. First, the proposed algorithm should be able to account for the structural dependencies between nodes of the same type, as well as the information con- tained in their descriptive features. Second, the algorithm should be able able to model the relationships across different entity types and incorporate them in the clustering process. To motivate the multi-relational clustering problem, consider the task of customer segmentation for marketing purposes. Using only the customer de- mographics as features, the best achievable segmentation is a one based on age, 43 gender, etc. Although this demographic profiling might help determine suitable products or appropriate marketing design strategy, it does not provide insight into the social structure, which may be important for predicting product adop- tion or collections of customers to target. By incorporating the social network structure, a relational clustering algorithm can produce segments that are based on both the demographics and the connectivity of the users in their correspond- ing social communities. This is likely to help in determining the projected adop- tion and gives some insight into the social influence. In addition, by considering information about additional relationship types, such as affiliations and mem- berships between people and other organizations or entities in the network, we may be able to develop a more nuanced picture. For example, a multi-relational clustering algorithm can account for customers? affiliations to different industrial segments and their organizational roles. This may lead to a better quality seg- mentation, that is more helpful for influence estimation or targeted advertising. To address these challenges, a number of multi-relational clustering ap- proaches have been introduced in the literature [109, 78, 10, 67, 5]. While each of these methods has their advantages and disadvantages, the majority of them either make certain distributional assumptions about the underlying data or re- quire certain characteristics in the feature set. In addition, many of these ap- proaches rely on expensive inference methods, such as Gibbs sampling or other MCMC approaches. In this work, I present a novel, general clustering approach that utilizes both feature similarity and relational dependencies across multiple relationship and entity types to produce a clustering that balances between the 44 homogeneity of the data points and their relational structure. The main advan- tages of my approach are that it is simple, elegant, scalable, and does not make any distributional assumptions about the underlying data. My work extends the affinity propagation (AP) clustering algorithm [31] to complex networks do- mains, by leveraging the relational dependencies in the underlying network data through the introduction of structural constraints in the AP model. These con- straints bias the optimization problem to favor clusterings which conserve both the homogeneity of the data points as well as their connectedness. The proposed multi-relational affinity propagation framework uses signals from the links among both similar and different node types to augment the infor- mation gained through features similarity, while allowing the user to control the extent of this effect. This facilitates the exploration of clusterings that account for both feature and structural similarities. I show the advantages of my framework over previous approaches through a set of experiments on a sample network from the social news website, Digg.com. 3.2 Related Work Early work in relational clustering was first done in the ILP community, in which objects of each type are clustered based on the objects of other types linked with them (e.g., [57]). In addition to the logical-based approaches, there has been also a body of literature on probabilistic approaches. Taskar et al.[109] proposed a relational clustering algorithm based on probabilistic relational models that used 45 both feature and link information in uncovering the latent group structure. How- ever, one of the drawbacks of the algorithm is the acyclicity constraint which is hard to satisfy in general network data. Neville et al.[78] proposed a hybrid ap- proach for graph partitioning that relies on both link and feature information. Although a number of clustering methods have been proposed to combine both feature and structural information, most of them have focused on cluster- ing a single node type, with the link structure serving as an additional factor in determining similarities or enforcing constraints on the clustering problem. Re- cently, the problem of clustering general heterogeneous data in multi-relational settings has recently started to attract the attention of more researchers, especially with the increased complexity of the existing data and the associated analysis tasks. An early example is the framework proposed by Zeng et al.[121] for clus- tering heterogeneous web objects, through an iterative reinforcement clustering process. A different approach was proposed by Xu et al.[120] by introducing an infinite dimensional latent variable for each entity in the network, as part of a Dirichlet process mixture model. As the inference in this approach mainly relies on the Chinese Restaurant Process, the method?s performance might not scale fa- vorably for large networks. More recently, a probabilistic framework approach was proposed by Long et al.[68] for clustering different types of entities, taking into consideration the multiple types of relationships among them. However, one of the limitations of this work is that it assumes the underlying statistical distribution of the data belongs to the exponential family. Bekkerman et al.[10] proposed a framework that simultaneously clusters 46 variables of different types based on their pairwise interactions. More recently, Banerjee et al.[5] proposed a multi-way clustering approach for relational data, that relies on simultaneously clustering multiple entity types represented as a multi-modal tensor. One of the limitations of this approach is that it is only ap- plicable to Bregman loss functions. However, formulating the problem as tensor clustering is an active area of research that has been recently attracting the at- tention of multiple researchers (e.g., [50, 107]). Other related work includes the framework proposed by Plangprasopchok et al.[84], which extends affinity prop- agation to account for structural constraints in inferring consistent taxonomies from shallow personal hierarchies on the web. In addition to the previous ap- proaches, a recent logic-based approach was proposed by Kok et al.[58] for dis- covering new concepts in ILP settings, using a second-order Markov logic frame- work. The proposed model forms multiple relational clusterings, while itera- tively refining them based on the underlying data. 3.3 Method I represent the underlying multi-modal network structure as a complex graph G(V ; E ; T ), where T denotes the different node types in the network, V = fV t : t 2 T g represents the set of node sets of each type in G, and E = fEt1!t2 : t1; t2 2 T g represents the set of edge sets in G. I distinguish between two types of edges in E : homogeneous and heterogeneous edges. Homogeneous edges are edges among the same node type (e.g., friendship links among people 47 in a social network), and take the form Et!t. To simplify the notation, homo- geneous edges between nodes of type t are represented as Et. Heterogeneous edges link entities of different types (e.g., affiliation links between people and organizations), are denoted as Et1!t2 where t1 6= t2. Feature-based clustering approaches focus on clustering data points using similarity measures defined over their features. One simple framework that has been recently proposed is affinity propagation (AP) [31]. AP is an exemplar- based clustering that relies on a message passing algorithm. Given the similari- ties among the underlying data points, it finds a clustering by identifying a set of exemplars, and finds an optimal assignment of the rest of the data points to these exemplars. One of the appealing aspects of affinity propagation is its formulation as a max-sum algorithm on a binary factor graph model [39]. This formulation facili- tates the incorporation of new constraints via functional nodes in the underlying factor graph. The similarity values among all pairs of data points, along with the 1-of-N constraint which enforces that each node is assigned to a single exemplar, and the exemplar consistency constraint that asserts that exemplar nodes should only choose themselves as exemplars, constitute the core of the affinity propaga- tion algorithm that I use as a base for my approach. In addition to the feature similarities among the nodes, the edges in in G also encode a set of relational dependencies across the corresponding entities the nodes represent. These dependencies should be made use of in the proposed multi-relational clustering algorithm. The proposed algorithm takes these de- 48 Figure 3.1: Sample bimodal network pendencies into consideration along with the feature-based similarity during the clustering process. Thus, I require the clustering algorithm to satisfy these addi- tional conditions: 1. Nodes that are connected by homogeneous links should be in the same clus- ter (modularity constraint). 2. Nodes that are connected by heterogeneous links to nodes of a different type residing in the same cluster should be clustered together (block con- straint). The first condition, referred to as a modularity constraint, favors clusterings that ensure a high degree of connectivity among the members of the same cluster. This is a common assumption made in a variety of community detection algo- rithms. The second condition ensures that nodes of one type that are connected to the same cluster of nodes of a different type, are also clustered together. This is a common assumption made in a variety of block-modeling algorithms, and I re- 49 ct11 c t 1j cti1 c t ij c t iN ctN1 c t Nj c t NN ct1N c t' 11 c t' 1j' ct'i'1 c t' i'j' c t' i'M ct'M1 c t' Mj' c t' MM ct'1M Et1 E t j E t N E t' 1 E t' j' E t' M It1 Iti ItN It'1 It'i' It'M Ht1 Htj H t N H t' 1 Ht'j' H t' M st11 s t 1j s t 1N stiN s t ij s t i1 stNj s t N1 stNN Lj t?t' st'11 s t' 1j' s t' 1M st'i'1 s t' i'j' s t' i'M st'M1 s t' Mj' s t' MM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LN t?t' . . . L1 t?t' . . . L1 t'?t . . . Lj' t'?t LM t'?t (a) Binary variable diagram with the structural dependencies factors. To simplify the diagram, only one instance of the factor L is included ctij S t (i, j) ?tij ?tij ?tij ? t ij ?tij ? t ij ?ij t?tx ?ij t?tx Iti Etj Stij Htj Lj t?tx ? ij t? t 1 ? ij t? t 1 Lj t?t1 Lj t?tT ? ij t?tT ? ij t?tT : : : : (b) Factor graph representation Figure 3.2: Multi-relational Affinity Propagation Model fer to this constraint as a block constraint. My goal is to encode these conditions in a flexible clustering framework that allows users to vary the importance of each. 50 3.3.1 Model Formulation Starting from the binary factor graph model introduced by Givoni and Frey [39], I augment it with the additional information needed in the multi-relational setting. Each possible assignment of node i of type t to an exemplar j is mod- eled as a binary variable ctij , such that (c t ij = 1) iff node i is assigned to the cluster represented by exemplar j. To simplify the discussion, I consider the bi- modal network illustrated in Figure 3.1. The example network G contains only two different node types V = V t; V t 0 ; jV tj = N; jV t 0 j = M , one homogeneous link type among each node type, and one heterogeneous link type across them E = Et; Et 0 ; Et!t 0 . Note that the same analysis can be easily extended to set- tings where there are more than two node and edge types. Given the above network, the possible assignments of nodes from both types to their corresponding exemplars can be described by the two sets: ctij ; i; j 2 f1; 2; : : : ; Ng, and ct 0 i0j0 ; i0; j0 2 f1; 2; : : : ;Mg. Accordingly, I extend all the con- straints defined in the original AP model to the multi-modal settings by replicat- ing the factor nodes for each node type in the model as shown in Figure 3.2. Next, I introduce two additional factors for each type to enforce my pro- posed modularity and block constraints. I formulate the structural constraints as soft constraints, parametrized by different costs for violating them. As opposed to the formulation of the 1-of-N and exemplar consistency constraints in the orig- inal AP model as hard constraints, the proposed soft structural constraints allow the user to control the level of impact of the relational dependencies on the clus- 51 tering output and at the same time increases the search space by permitting the model to violate some of the constraints to reach a better solution in the optimiza- tion process. The modularity constraint is represented by the factor H t, which is defined over each node type t as follows: H tj(c t 1j; :::; c t Nj) = 8 >>>>< >>>>: P i2V t ti 8e t(i; k) 2 Et : ctij = 0; c t kj = 1 0 otherwise (3.1) The factor H tj is defined over nodes of the same type, and it penalizes clusterings that assign different exemplars for nodes that are directly linked by an edge. For each potential exemplar j for node k, if there is an edge that links i to k, where j is not the current exemplar for node i, a cost ti is included in the objective function to reduce the likelihood of the corresponding clustering. This cost value can either be constant or variable depending on some structural properties of the terminal nodes (such as clustering coefficient, degree, etc.). For the second type of constraint capturing the clustering across edges among different node types, I introduce the block constraint factor Lt!t 0 , defined for each pair of node types t and t0 as follows: 52 Lt!t 0 j (c t 1j; :::; c t Nj) = 8 >>>>>>>>>< >>>>>>>>>: P i2V t !t!t 0 i 8e(i; i 0); e(k; k0) 2 Et!t 0 ; ct 0 i0j0 = c t0 k0j0 = 1 : ctij = 0; c t kj = 1 0 otherwise (3.2) The factor Lt!t 0 is defined over heterogeneous edges, connecting nodes of type t to all the potential exemplars of the nodes of type t0 that they are linked to. It penalizes clusterings that assign different exemplars for nodes of type t which are linked to nodes of type t0 residing in the same cluster by introducing a cost !t!t 0 for such configurations. This guides the clustering process to favor clusterings that account for the structural dependencies across node types. The global objective function of my proposed multi-relational affinity prop- agation on a network G can then be expressed as follows: S(c111; : : : ; c T NT NT ) = X t2T i;j2V t Stij(c t ij) + X t2T i2V t I ti (c t i1; : : : ; c t iNt) + X t2T j2V t Etj(c t 1j; : : : ; c t Ntj) + X t2T j2V t H tj(c t 1j; : : : ; c t Ntj) + X t;t02T j2V t Lt!t 0 j (c t 1j; : : : ; c t Ntj) (3.3) 3.3.2 Message Derivation Following the derivation of the original AP model, I use the max-sum al- gorithm to optimize the objective function in Equation 3.3, by deriving the scalar message updates in the factor graph model shown at Figure 3.2(b). The max-sum message update rules from a variable node to a factor node can be simply defined 53 as follows [12]: tij = t ij + t ij + X t02T t0 6=t t!t 0 ij + Sij (3.4) tij = t ij + t ij + X t02T t0 6=t t!t 0 ij + S t ij (3.5) tij = t ij + t ij + X t02T t0 6=t t!t 0 ij + S t ij (3.6) t!t 0 ij = t ij + t ij + t ij + X t002T t00 6=t;t0 t!t 00 ij + S t ij (3.7) where the value Sij corresponds to the feature-similarity between node i and its potential exemplar j, and the message pairs ( tij; t ij) and ( t ij; t ij) are the ones de- fined for factors Etj and I t i in the original affinity propagation model for encoding the 1-of-N and the exemplar consistency constraints. The message pair ( tij; t ij) is the one associated with the introduced modularity constraint factor H tj , and the messages ( t!t 0 ij ; t!t0 ij ) are the ones associated with the block constraint factor Lt!t 0 ij . I now move to the derivation of the message updates from the introduced factors to the corresponding variable nodes. I start with the modularity constraint factor H tj defined over nodes of similar types. To simplify the notation, I remove the type qualifiers, as the message derivation for the factor Hj is independent of the node type. To derive the message associated with factor Hj , we have to consider the two possible settings for each variable node cij . First, when cij = 1, 54 we get: ij(1) = max ckj ;k 6=i (Hj(c1j; :::; cij = 1; :::; ciN) + X l 6=i lj(clj)) (3.8) For the cases where a node l is not connected to i, the value of the function Hj re- duces to zero. However, for the set of neighboring nodesD(i) = fk : 9e(i; k) 2 Eg that are homogeneously linked with i, there are two different cases: first, if k is in the same cluster j as i (i.e., ckj = 1), then the function Hj reduces to zero. However, in the second case where (ckj = 0), the function Hj evaluates to the cor- responding cost k. By taking both cases into consideration, Equation 3.8 can then be re-written as follows ij(1) = X k2D(i) max( kj(1); kj(0) k) + X l =2D(i) max clj lj(clj) (3.9) Next, I consider the case when cij = 0: ij(0) = max ckj ;k 6=i (Hj(c1j; :::; cij = 0; :::; ciN) + X l 6=i lj(clj)) Similarly, the assignment of the nodes that are not directly linked to i is uncon- strained. However, for the nodes k 2 D(i), Hj evaluates to zero for the nodes that are not assigned to exemplar j (ckj = 0), and to the cost value i of the node i for the ones associated with the value ckj = 1. Therefore, the previous equation reduces to ij(0) = X k2D(i) max( kj(0); kj(1) i) + X l =2D(i) max clj lj(clj) (3.10) 55 By taking the difference between Equations 3.9 and 3.10, we get ij = X k2D(i) max(min( kj; i); min( k; i k kj)) = X k2D(i) max(min( kj; i); min( kj; i) k kj) = X k2D(i) (min( kj; i) + max(0; kj k)) = X k2D(i) (min( kj; i) min(0; kj + k)) (3.11) It is worth noting that in the final message value for ij , if the costs i and k are replaced with infinite value, turning the modularity constraint HJ into a hard constraint, the message value reduces to ( ij = P k2D(i) kj), which corresponds to the summation of all the incoming messages to i from its similar-type neighbors. However, if the costs are replaced by zero instead, the value of ij reduces to zero, effectively removing the effect of the corresponding constraint. For deriving the update messages for the second factor typeLt!t 0 j , I first gen- eralize the definition of the typed neighbor setDt 0 (i) = i0 2 V t 0 : 9e(i; i0) 2 Et!t 0 as the set of neighboring nodes of type t0 that are directly linked to a given node i of type t. I start by considering the case where ctij = 1: t!t 0 ij (1) = max ckj ;k 6=i Lt!t 0 j (c1j; : : : ; cij = 1; : : : ; cNj) + X l 6=i t!t 0 lj (clj) + X i0;j02V t0 t!t 0 l0j0 (c 0 l0j0) (3.12) 56 To evaluate the function Lt!t 0 j in the previous equation, we need to consider all the potential settings of the other variables ctkj of type t. For each node c t kj = 0, the function evaluates to the cost value !t!t 0 k for all the nodes k 0 of type t0 that are in the neighbor set Dt 0 (k), and are associated with with a value of c0k0j0 = 1. The function evaluates to zero in all other cases. To simplify the notation, considering a network with two different entity types, I refer to variables of type t with cij and the ones of the opposite type t0 with c0ij . During the derivation, I also remove the type qualification from the , , and ! values as I am focusing on deriving the messages of type t, depending on one alternate type t0 at a time. So, all the values mentioned in the equations afterwards are presumably qualified with t! t0 type dependency. Thus, the previous equation can be written as follows: ij(1) = X j02V t0 " X l0 =2f[xD0(x)g max cl0j0 l0j0(c 0 l0j0) + X i02D0(i) max i0j0(0)+ X k 6=i X k02D0(k) max( kj(ckj)+ k0j0(c 0 k0j0)); i0j0(1) + X k 6=i max h kj(1) + X k02D0(k) max c0 k0j0 k0j0(c 0 k0j0); kj(0) + X k02D0(k) max( k0j0(0); k0j0(1) !k) i # (3.13) The previous equation consists of two main parts: First, all nodes of type t0 that are not connected to any node of type t have unconstrained assignment to the any exemplar j0. The second part is a summation over the neighbor nodes of type t0 57 that are connected to the current node i; the function Lt!t 0 j evaluates to zero for the nodes in D0(i) that are not assigned to the current exemplar j0, and thus all other similar and opposite type nodes are now unconstrained. However, for the nodes i0 2 D0(i) that are associated with the value (c0i0j0 = 1), we need to consider all the other nodes k that are of the same type t as node i; if (ckj = 1) then the assignment of the nodes in D0(k) is unconstrained, while if (ckj = 0) the function Lt! t0j reduces to the cost value !k for all the nodes in D0(k) that are assigned to exemplar j0. Similarly, for cij = 0, the same derivation applies, and the value for ij(0) can be represented as follows: ij(0) = X j02V t0 " X l0 =2f[xD0(x)g max cl0j0 l0j0(c 0 l0j0) + X i02D0(i) max i0j0(0)+ X k 6=i X k02D0(k) max( kj(ckj)+ k0j0(c 0 k0j0)); i0j0(1) + X k 6=i max h kj(0) + X k02D0(k) max c0 k0j0 k0j0(c 0 k0j0); kj(1) + X k02D0(k) max( k0j0(0); k0j0(1) !i) i # (3.14) 58 By taking the difference between Equations 3.13 and 3.14, we get ij = X j02V t0 X i02D0(i) min i0j0+ X k 6=i min(0; kj); X k 6=i min(Ai;k; kj) min i0j0 + X k 6=i min(0; kj); X k 6=i min(0; kj Bk) ! (3.15) where the variables Ai;k and Bk are defined as follows Ai;k = X k02D0(k) min(!i; k0j0) min(0; k0j0) Bk = X k02D0(k) max(0; k0j0 !k) max(0; k0j0) 3.4 Experimental Evaluation To evaluate the proposed multi-relational affinity propagation approach, I show its performance on a number of cluster quality measures using both syn- thetic and real-world data. I compare the proposed algorithm to a variety of baselines, including the MMRC relational clustering algorithm proposed in Long et al. [68], as well as the original affinity propagation model and the modularity maximization algorithm[82]. The experimental results show that while the prop- soed approach doesn?t achieve the best performance on any single measure, un- like other baselines, it provides good performance across both the feature-based and the structure-based measures. I also show that the clustering that the algo- rithm generates is often closer to the real grouping of the data, when the ground truth is available, than the clusterings generated by the other baselines. 59 Figure 3.3: The performance of different clustering approaches for varying the levels of network assortativity. 3.4.1 Synthetic Data One of the main factors that affects the performance of the proposed al- gorithm is the level of assortativity[80] in the network, which is a measure of similarity among linked nodes. Networks with high assortativity levels tend to have connected communities of nodes that have similar characteristics, as well as highly correlated features of the connected nodes across different types. The pro- posed multi-relational AP algorithm can leverage the feature-similarity among the nodes, and, using the structural constraints, can capture the community con- nectivity; using both, it can boost overall performance. In order to test these effects, I generated a sample network of 250 users with 1209 homogeneous friendship links, and 100 social groups with 692 hetero- geneous affiliation links using the co-evolution model proposed by Zheleva et 60 al. [122]. Next, I used the labeling heuristic proposed by Rattigan et al. [86] to generate a set of labels for both node types. By varying the percentage of seed nodes for each label, different node labelings with different assortativity coeffi- cients can be generated. Finally, I generated a set of features for each node based on the assigned label using a Naive Bayes model. Figure 3.3 shows the performance of the clustering algorithms using nor- malized mutual information (NMI) [105] between the output clustering and the assigned labels for the user nodes. As shown in the figure, the proposed multi- relational AP approach outperforms the other baselines for moderate and high levels of assortative mixing. Similar trends were also obtained on the inferred labels for the social groups. 3.4.2 Social Media Data To evaluate the proposed multi-relational AP algorithm on social media data, I use a dataset from Digg.com, a popular social news website, where users can post stories on different topics, and then vote on them in a process referred to as ?digging? to determine the story?s ranking on the front page. Digg users form a social network by ?following? other users, which in turn results in the target user posts showing on their homepages. I constructed a sample from the Digg net- work which includes 104,478 ?following? links among 3750 users and their 438,379 ?digging? links to 3305 stories. I use the ?title? and ?description? of the stories to construct a normalized tf-idf word vector for each post, which is then used 61 ? ? (a) Modularity ? (b) Net Similarity Figure 3.4: The effect of varying the cost parameters ( , !) on the net similarity and the modularity of the output clustering to measure cosine similarities between different stories. Similarly for the users, I used the ?about? field that the user provides upon registration to compute the cosine similarities between users over the corresponding tf-idf vectors. The AP algorithm relies on setting a preference value for each node in the network that reflects the likelihood of this point being an exemplar, which then affects the number of clusters in the output. In the experiments, I follow the ap- proach that was proposed in the original AP model where there is no prior bias for certain nodes to be exemplars, and thus I set the preference value to the me- dian of the corresponding input similarities. For evaluation, I use both the mod- ularity of the resulting community structure and the total similarity, referred to as net similarity, of the exemplars to their assigned nodes in the output clustering to show the impact of different cost settings. First, I show the performance of the multi-relational affinity propagation al- gorithm over a range of cost values for both the homogeneous and heterogeneous 62 structural constraints. Figure 3.4(a) shows the performance of the proposed algo- rithm for both the modularity and the net similarity measures of the user nodes clustering in the Digg dataset. Note that for lower values of !, the modularity of the output clustering increases with increasing the value of , which corresponds to having higher costs on violating the homogeneous communities constraint. However, by increasing the value of the heterogeneous link cost ! for a given value of , the modularity of the output clustering increases initially, and then starts to decreases on higher values of !. This can be attributed to the fact that increasing the cost of violating the block constraint initially provides additional evidence for the clustering of the alternate node type, but after a given point it starts fragmenting the clustering output, resulting in an increased number of clusters which decreases the overall modularity of the output clustering. On the other hand, Figure 3.4(b) shows the trade-off in terms of the decrease in the net similarity of the clustering output with increasing the cost values. However, it can be noted from the figure that the average decrease in the net similarity across different cost settings is much lower than the increase gained in terms of the mod- ularity of the output clustering. Due to the lack of edges among story nodes, it is infeasible to show the effect of changing the value of on the clustering of stories, or to compute the modularity of the output clustering. However, by varying the cost of the block constraint, similar trends in the net similarity are obtained. To compare the proposed approach with other baselines, I use both Davies- Bouldin [18] and Dunn [26] indices for internal clustering validation, as well as normalized mutual information (NMI) [105] for external validation with the 63 Users Stories Modularity DB Index Dunn Index NMI DB Index Dunn Index MMRC 0.005 2.23 0.63 0.106 2.09 0.81 Modularity Maximization 0.458 2.4 0.57 N/A N/A N/A Affinity Propagation 0.072 1.504 0.67 0.209 1.86 0.86 Multi-relational AP 0.13 (0.28) 1.52 (1.54) 0.76 (0.78) 0.287 (0.34) 1.852 (1.859) 0.868 (0.87) Table 3.1: Comparison of MMRC, Modularity Maximization, AP, and multi- relational AP on different clustering evaluation measures. For multi-relational AP, the reported values are the average over all settings of the cost parameters, while the ones in parentheses are obtained from the optimal parameter settings, identified through an exhaustive search. The entries in bold face correspond to the best performance for the corresponding measure. ground truth when available. The results in Table 3.1 show that the proposed multi-relational affinity propagation approach results have a superior performance compared to the MMRC relational clustering algorithm on all evaluation mea- sures. By analyzing the clustering of the ?users? type, I find that while the mod- ularity maximization algorithm achieves the best modularity score for its out- put, it performs poorly on similarity-based measures. On the contrary, the origi- nal affinity propagation model shows better performance on the similarity based measures than the modularity score. However, the proposed multi-relational AP algorithm shows good performance on both structure-based and similarity-based evaluation measures, illustrating the balance that it is capable of achieving be- tween both paradigms. Finally, I move to the evaluation of the clustering for the ?stories? node type. Due to the fact that there are no links among stories in Digg, I am unable to compute the modularity measure. However, the stories on Digg are manu- ally assigned to a specific topic when posted. Therefore, the story topic can be used as an evaluation measure for clustering this node type, which enables the 64 computation of the normalized mutual information (NMI) quality measure of the output clustering. As can be seen in Table 3.1, the proposed algorithm achieves the best performance on all evaluation measures, including the NMI measure with the ground truth. This shows the value of the signal from block constraint and the favorable effect of coupling the clustering process across different node types. Another important advantage of the multi-relational affinity propagation algorithm over the MMRC algorithm is computational efficiency, as it is orders of magnitude faster than the MMRC algorithm. 3.5 Conclusion In this work, I presented a novel, multi-relational clustering framework for identifying latent groupings in complex network domains. The proposed ap- proach provides a simple and elegant way of extending the affinity propagation framework to multi-relational network domains by incorporating different soft constraints for capturing the structural dependencies among different types of nodes in the network. I showed how my proposed multi-relational affinity prop- agation algorithm could be used to output different clustering with varying de- grees of dependence on both the feature similarity of the nodes as well as their re- lational structure. I conducted a set of experiments on both a synthetic and a real dataset from from an online social news website, and showed that the proposed approach outperformed previous approaches for multi-relational clustering. 65 Part II The Temporal Dynamics of Multi-Modal Networks 66 Chapter 4 Understanding Actor Loyalty to Groups in Affiliation Networks After characterizing the evolution and clustering aspects of the multi-modal networks, I now proceed to characterizing the temporal dynamics of the interac- tions occurring in these complex networks, from both user-level and network- level perspectives. I start by analyzing the individual user behavior with respect to other entities in the network, and how this behavior changes over time. In this chapter, I introduce a method for analyzing the temporal dynam- ics of affiliation networks as an example of a 2-mode network. I define event- based affiliation groups as those describing temporally related subsets of actors and propose an approach for exploring changing memberships in these affiliation groups over time. To model the dynamic behavior in these networks, I introduce a measure that captures an actor?s loyalty to an affiliation group as the degree of ?commitment? an actor shows to the group over time. I evaluate the proposed loyalty measure using three real world affiliation networks: a publication net- work, a senate bill co-sponsorship network and a dolphin network. The results show the utility of the measure for analyzing the dynamic behavior of actors and quantifying the stability of their interactions with different time-varying affilia- tion groups. 67 4.1 Introduction Across many fields, researchers are interested in understanding an individ- ual?s commitment to a group [52], the social structure of groups [27], and the changing dynamics of group structure [100]. In marketing, researchers inves- tigate customer behavior, comparing the purchasing behavior of different cus- tomer groups in an attempt to determine customer satisfaction and brand loyalty [83]. In sociology, researchers investigate commitment [52], community cohesion [91] and structural embeddedness of social groups [76]. In computer science, re- searchers have also modeled time-varying links to improve automatic discovery of relational communities or groups [13, 46]. While some statistical models have been developed for longitudinal analysis of social networks (see Snijders [102] for an overview), work remains to better understand the variation in actor com- mitment or loyalty to groups over time. Social psychologists have investigated the role played by feelings of loyalty to groups. Druckman explains that ?loyalty to a group strengthens one?s identity and sense of belonging? [25]. In this chapter, I focus on an operational definition of loyalty to affiliation groups in an attempt to adequately measure this ubiqui- tous idea. Consistent with sociology literature [76], I believe that high loyalty may be an indicator of group cohesion. More specifically, I investigate actor loyalty to groups in two-mode affilia- tion networks. A two-mode affiliation network contains two different types of nodes, one for actors and one for events. Edges between actor nodes and event 68 nodes are used to indicate relationships between actors and events in which the actors participate [115]. Affiliation networks capture a wide variety of domains, including communication data among people (email, cell phone calls, etc.); orga- nizational data describing peoples? roles on teams or in companies; and epidemi- ological networks describing people and the specific disease strain with which they are infected. In time-varying affiliation networks, an actor?s participation in a particular event is associated with a specific time, representing when this par- ticipation occurred. Annotating affiliation networks with temporal information allows us to capture changing actor behavior over time. In this chapter, I focus on this changing behavior as it relates to groups. Consider an author/publication network describing authors, with the pub- lications represented as events in which the co-authors are participants. If the publications are annotated with topic areas, then I can create groups of actors who publish in the same topic area at the same time. Furthermore, I can see how loyal an author is to specific topic areas over time by examining their changing publication topics. One common scenario is that an author starts publishing in a specific area, then after some time she begins publishing in additional areas, and eventually she might end up switching areas completely. Another common scenario is that an author starts publishing in an area, and, rather than adding ad- ditional areas, remains steadfast, and continues publishing regularly in the same area over a long period of time. In this chapter, I introduce a measure that cap- tures this dynamic behavior of actors in time-varying affiliation networks by in- troducing the concept of affiliation group loyalty and define an actor?s loyalty 69 to an affiliation group as the degree of ?commitment? an actor shows to the group over time. 4.2 Loyalty Background Within literature across different disciplines, terms like loyalty, commitment, and cohesion have been given a number of different theoretical and operational definitions. my goal is not to provide an exhaustive literature review on these subjects, but rather to give a context for the remainder of the discussion on loyalty in affiliation networks. Sociologists first formalized commitment as a way to link extraneous inter- ests with a consistent line of activity [9]. While other definitions and theories concerning commitment exist in sociology [52] and social psychology [56], a def- inition proposed by [74] is as follows: Commitment is a force that binds an individual to a course of action that is of relevance to a particular target. Loyalty extends the concept of commitment. For example, [83] defines cus- tomer loyalty in terms of brand commitment (the strength of the relationship be- tween customers and a particular brand), and gives the following multi-faceted definition: Loyalty is a deeply held commitment to re-buy or re-patronize a preferred product/service consistently in the future, thereby causing repetitive same- 70 brand or same brand-set purchasing, despite situational influences and mar- keting efforts having the potential to cause switching behavior. A well known operational definition of loyalty in business literature defines brand loyalty as the percent of purchases devoted to one?s most often purchased brand [17]. Newman et al. [79] define loyal customers as those repurchasing a brand considering only that brand, while Tellis [110] views loyalty as repeat purchasing frequency or the relative volume of same brand purchasing. Jacoby et al. [49] state that frequent purchasing of a product is not synonymous with brand loyalty and that the notion of commitment is essential for distinguishing between brand loyalty and frequent purchasing of a product. While business research tends to focus on the economic component, based on purchasing behavior, social psychologists have investigated ways that people relate to groups. One dimension of this is the role played by feelings of loyalty to groups. Druckman [25] explains: The feelings of attachment that comprise loyalty are not whimsical, but are generally basic to the individual?s definitions of themselves. Loyalty to a group strengthens one?s identity and sense of belonging. As will be discussed later, I define a group in terms of related events. I fo- cus on an operational dimension of loyalty to affiliation groups in an attempt to adequately measure a ubiquitous idea. I demonstrate effects of my operational definition as it compares to frequency based brand loyalty in the business litera- ture. Consistent with sociology literature [76], I believe that a group containing 71 actors with high loyalty may be an indicator of a highly cohesive group. Because of the size and complexity of social networks, computer scientists, physicists and other scientists have also begun investigating different aspects of social networks. The community detection literature uses measures of cohesion and clustering to find subsets of actors that are densely connected to each other, but less densely connected to others. The majority of research conducted on com- munity detection focuses on static networks and constrains the problem by letting an actor belong to only a single community [13, 30, 38, 48, 64, 81]. Recently, researchers have begun to analyze the dynamics of communities over time [3, 4, 11, 32, 102, 106, 108]. Much of this research focuses on two ques- tions: what are the communities that exist in a particular data set, and how do they change or evolve over time. In contrast, the approach that I propose is a more micro-level analysis that focuses on the dynamics of specific actors or indi- viduals in the network. While I focus on creating groups using affiliation event attributes (as will be described in the next section), my analysis of actors can be conducted using the output from any grouping, clustering, or community detec- tion algorithm. Once the social groups are established, my goal is to understand the dynamics of actors and their social relationships in the context of these pre- defined social groups. One approach which also proposes methods for identifying important ac- tors in dynamic networks is the work of Habiba et al. [45]. They identify nodes in a single mode network that are likely to be good spread blockers, individu- als that can block the spread of a dynamic process through the population. To 72 accomplish this, they introduce dynamic measures for density, diameter, degree, betweenness, closeness and clustering coefficient. The graph summarization method proposed in Sharan et al [95] also uses a measure similar to ours to build a classifier for predicting evolving domains. While both of my measures attempt to quantify temporal aspects of the network, there are differences between their work and ours. First, the graph summariza- tion method is used to create an aggregation of network snapshots over time by weighting the edges according to the point in time in which they occur. In con- trast, my loyalty measure is used to quantify an actor?s participation pattern in different affiliation groups. Second, though the authors mention that their pro- posed weighting kernels are able to model both temporal recurrence and tempo- ral locality, which represents the aspects of consistency and recency I am address- ing in the loyalty measure, it is unclear how the weighting kernels used account for temporal recurrence. In contrast, the recursive formulation of the proposed loyalty measure encodes this aspect directly. 4.3 Modeling time-varying event-based groups There are many ways to define a group. Groups can be formed using com- munity detection algorithms, clustering algorithms, etc. Because I am interested in understanding groups based on affiliation networks, I describe an approach that defines groups based on a participation relationship between actors and events. Formally, an affiliation network can be represented as a bi-partite graph 73 G(A; E ;P) containing a set of actor nodes A, a set of event nodes E , and a set of participation edges P that connect actors in A to events in E : A = fa1; a2; a3; : : : ; ang E = fe1; e2; e3; : : : ; emg; and P = f(ai; ej)jai 2 A; ej 2 Eg: I denote participation of actor ai in event ej as pi;j . For clarity, I use a running example of an author publication network in which the actors are authors, the events are publications, and the participation relation is paper authorship. Figure 4.1 shows an example network with three author nodes, A = fa1; a2; a3g, fifteen publication nodes, E = fe1; e2; : : : e15g, and twenty paper authorship edges. As an example, participations involving actor a1 are the following: Pa1 = fp1;1; p1;2; p1;3; p1;4; p1;5; p1;7; p1;8; p1;9; p1;10; p1;11; p1;13; p1;14; p1;15g. Each actor node and event node can have attributes associated with them. For example, each author in Figure 4.1 may have a name and an age. For author a1, we may have the following attribute values a1:name = ?Peter Pan? and a1:age = 50. Each publication event may have a title attribute, e.g. e1:title = ?Static net- works as non-evolving dynamic networks? and a topic attribute, e1:topic = ?social networks?. In Figure 4.1, I use shapes to indicate topic. Since e1 is a circle, all the events that are circular have the same value for topic, e.g. ?social networks?. For ease of exposition, I map each shape to the following topics: circle - topic1, square 74 Figure 4.1: An affiliation network example with 3 actors, 15 events and 20 rela- tionships across 5 time points. - topic2, triangle - topic3, diamond - topic4. Because the affiliation networks are temporal, a discrete time attribute is as- sociated with each event ej , and is denoted as ej:time. For affiliation networks, this time is the same as the time of the participation relationship. In the exam- ple, the time attribute is the date of publication. I have labeled the time point associated with each event in Figure 4.1. While the publication event serves as a grouping of a subset of actors, this event only occurs at one particular time. Because my goal is to understand the dynamics of affiliation networks over time, I am interested in analyzing actor participation in groupings of similar events across time. I propose grouping events based on values of an event attribute. In other words, a social group is defined based on a shared event attribute value. The choice of a specific method for grouping actors depends on the semantics of the underlying analysis task. Using 75 shared event attributes is particularly meaningful for affiliation networks since it incorporates the semantics of events into the data model. For other types of social networks, particularly uni-mode networks, it is reasonable to use other methods for defining social groups. Each event feature or attributeF has an associated domainDomain = fg1; : : : ; gpg, where p is the number of distinct values of F . I denote a particular value gl of an event ej for event attribute F as ej:F = gl. Based on this, I define an affiliation group to be a subset of actors having the same group value gl at time t for an event ej : G(gl; t) = faijai 2 A; (ai; ej) 2 P ; where ej:F = gl and ej:time = tg. In the example, suppose the grouping attribute is topic. Referring back to Figure 4.1, G(topic1; 1) = fa1; a2g is the set of actors in topic group topic1 at time 1. I pause to mention a few advantages of my grouping formulation. First, ac- tors can belong to multiple affiliation groups at a particular time. In other words, membership in different groups can be overlapping. In the example, author a1 participates in five events at time 1. Also, actors are not required to be part of an event (or group) at every time t. This is also illustrated in the example. Author a1 participates in an event at every time step. Authors a2 and a3 do not. In my ex- perience, these assumptions better capture the dynamics of real world affiliation networks. 76 4.4 Loyalty of Individuals to Affiliation Groups In order to better understand the loyalty of an actor to groups based on event affiliation, the participation of the actor in different groups over time should be quantified. Based on the example in Figure 4.1, Figure 4.2 shows actor a1?s membership in topic groups, topic1, topic2, and topic3 across five time steps. The rectangles represent different topic groups. An edge from the author to a topic means that an author has published on the linked topic. The count on the edge represents the number of publications an actor ai has published on this topic during a particular time period. For example, the network snapshot of the first time period shows author a1 having three publications with topic1 and two pub- lications with topic2. As time continues, author a1 stops publishing on topic1, continues publishing on topic2 at each time step, and begins publishing on topic3 in the last time step. Intuitively, by considering the loyalty of the author at time step 5, it is preferable to see a higher loyalty score for topic2 since the author has published in this topic since time step 1. At time step 2, a topic shift occurs from topic1 to topic2. My goal is to create a measure that is sensitive to both continual group membership and changing group membership over time. I begin by considering two simple loyalty measures: frequent participation and recent participation, illustrating how poorly they capture the nuanced nature of loyalty in dynamic networks. Loyalty based on frequent participation, which I refer to as frequency-based loyalty considers an actor loyal if she appears in a group frequently. Let n(ai; gl) represent the number of participations of actor ai 77 in group gl and n(ai; ) represent the number of participations of actor ai in all groups. Then the frequency-based loyalty of actor ai is defined as the number of participations in a particular group gl divided by the number of participations across all groups: LoyFP (ai; gl) = n(ai; gl) n(ai; ) Using the running example, author a1 publishes in topic1 six times, topic2 six times, and topic3 one time. Therefore, LoyFP (a1; topic1) = LoyFP (a1; topic2) = 6=13 and LoyFP (a1; topic3) = 1=13. topic1 and topic2 are considered equally impor- tant even though the author has not published in topic1 since time step 2. Thus, considering frequency alone ignores the temporal component of the group affili- ation and results in assigning higher loyalty values to groups that the actor was once active in, but may not be active in any longer. Frequency-based loyalty can be viewed as a static measure of commitment. Focusing on the temporal aspect of the data, a recency-based loyalty measure considers an actor loyal if she has participated recently in a specific group. Let n(ai; gl; t) represent the number of participations of actor ai in group gl at time step t. The recency-based loyalty of actor ai is defined as the number of partici- pations in a particular group gl at the last time step tf divided by the number of participations across all groups at time tf : LoyRP (ai; gl) = n(ai; gl; tf ) n(ai; ; tf ) 78 In the example, when tf = 5, LoyRP (a1; topic2) = LoyRP (a1; topic3) = 1=2 and LoyRP (a1; topic1) = 0. Author a1 is equally loyal to topic2 and topic3 even though topic3 only appears in the current time step. If the last two time steps are considered (using a recent window as opposed to a recent time point), then a1 is most loyal to topic2. While this is accurate, the strong early participation of actor a1 to topic1 is not captured at all since LoyRP (a1; topic1) = 0. Using recent partic- ipation leads to assigning an actor high loyalty values for groups that the actor participates in during current time steps, but it disregards earlier participation. Figure 4.2: Single actor dynamic affiliation example This simple example shows that a temporal measure of affiliation group loy- alty should incorporate participation frequency for giving higher preference to actors with a large number of participations in the affiliation group, consistency for putting more bias toward actors with regular group participations across time over those with more sparse, isolated participation, and recency for favoring ac- tors with current participations. In order to capture all of these aspects, I incor- porate frequency and recency based loyalty into a more comprehensive measure of loyalty. Let Ttot represent the total number of time points over which the dynamic affiliation is defined. The loyalty of an actor to a group that she has not partic- 79 ipated in yet is equal to zero. In order to keep track of consistent participation over time, we need to keep track of the actor?s loyalty in the time step that pre- cedes the current one. Thus, I define tprev as the previous time point (relative to the current time point t) that actor ai participated in group gl. Let n(ai; gl; t) be the number of participations of actor ai in group gl from the starting time point t0 until the current time point t, and let n(ai; ; t) be the number of participations of actor ai to all groups from t0 until time t. I define the loyalty of an actor ai to a group gl on his first participation in the group as Loy(ai; gl; t0) = n(ai; gl; t0) n(ai; ; t0) where t = t0 and the loyalty on any consecutive participation is given by Loy(ai; gl; t) = n(ai; gl; t) n(ai ; t) Loy(ai; gl; tprev) t tprev Ttot where represents a smoothing parameter that will be described shortly. By examining the different components of the loyalty measure, we note that the first term, n(ai;gl; t)n(ai; ; t) , accounts for the frequency of participation of an actor into a specific group. The second term includes the component Loy(ai; gl; tprev) which takes into consideration the most recent recorded loyalty for an actor in a specific group, gl, and is used to favor recent participation in that group. Finally, to favor continuous actor participation, the second term includes an exponent term for the recent loyalty. This decreases the effect that the loyalty in the previous time 80 step has on the calculated loyalty in the current time step based on how long in the past this previous participation occurred. The more recent and continuous the participation, the larger the effect of this component on the overall loyalty of the actor to the group. The smoothing parameter is introduced to control the overall effect of time. The value of can be varied from 0 to Ttot. A value of 0 means Loy = LoyFP , focusing on the frequent participation component of the measure. A value of Ttot means that the recent participation component of the measure is dominant. For exploratory analysis, setting the value of to 1 represents a good initial point to start off, where the loyalty accounts for both the frequency and the recency factors. For consistency with the group membership notation, where the actor mem- bership values in various groups sum up to 1, the values of loyalty of a specific actor to various groups that she participated in over the considered time period are normalized. As a result, the final loyalty value of actor ai to group gl at the final point in time tf can be defined as follows Loyalty(ai; gl; tf ) = Loy(ai; gl; tf ) P j Loy(ai; gj; tf ) where the summation parameter j ranges over all the groups that actor ai partic- ipated in during the entire time period. Returning to the earlier example, we see that the proposed measure results in the desired effect. Setting the value of ( = 1), the results for actor a1 loyalty to 81 Figure 4.3: The evolution of loyalty over time for the affiliation network example different topics are as follows: Loyalty(a1; topic1; t5) = 0:429 Loyalty(a1; topic2; t5) = 0:474 Loyalty(a1; topic3; t5) = 0:097 The evolution of the author?s loyalty for each topic at each time step with = 1 is illustrated in Figure 4.3. topic1 begins with the highest loyalty at time 1. Its loyalty increases at time 2 and then begins to decline. After time 4, author a1?s loyalty to topic topic2 overtakes that of topic1 because of the effect of recency. To further illustrate the effect of the smoothing factor, Figure 4.4 shows the different values for the loyalty of the author to all the topics at the final time step 82 by varying the value of . When = 0, the loyalty values are the same as if only the normalized frequency, LoyFP , is considered. As the value of increases, the effect of recency starts to dominate the frequency. At the maximum value of ( = 5), the highest loyalty score is assigned for topic3, which corresponds to the most recent group. Figure 4.4: The effect of the smoothing factor in calculating group loyalty The interpretation here is that as the value of increases, the measure fa- vors new topics occurring at the last time point. Therefore, topic3, which has just occurred at the last time point, dominates all other topics because its loyalty is not decreased by previous occurrences. The loyalty values of topic1 and topic2 are overwhelmed by the large exponential factor resulting from the large . 83 4.5 Loyalty Analysis on Individual Data Sets I analyze my proposed loyalty measure on three data sets - a scientific pub- lication network, a senate bill sponsorship network and a dolphin social network. In order to consider frequency, consistency, and recency, I set = 1. 4.5.1 Scientific Publication Network The scientific publication network is based on publications in the ACM Computer-Human Interaction (ACMCHI) conference from 1982 until 2004. Sim- ilar to my running example, this data set describes an author/publication affil- iation network. The data set was extracted from the ACM Digital Library and contains 4,073 publications and 6,358 authors. There are 12,727 participation re- lationships (edges) between authors and publications. In this data set, I filtered 5230 authors having only one publication over the entire period of time since no ?dynamic? group loyalty exists for these actors. Also, by removing them, I avoid biasing the average loyalty statistics calculated for the data set. The remaining 1,128 authors had 4,688 relationships with publication events. There are a number of features that the publication events can be grouped on; for this analysis, like the simple running example, publications were grouped by their topic. There are 15 different topics, and the loyalty of authors to different topics was measured. The results of applying the proposed loyalty measure on the ACMCHI data set are shown in Figure 4.5. This box plot highlights the aver- age loyalty, outliers and the amount of spread for actor loyalty to different topics. 84 Figure 4.5: The average topic loyalty for the scientific publication network The topic loyalty of authors range from 0.2 to 1, while the average topic loyalty ranges from 0.5 to 0.65 for all 15 topics. While there are a number of interesting observations to be made, I highlight two of them. First, the average topic loyalty is fairly uniform across the topics. This is an indication of the continued impor- tance of these topics at the ACMCHI conference. Second, the average loyalty of authors to topic groups is very high across all the topics. This is an indication that, in general, authors in this data set consistently published in a particular research area as opposed to oscillating among multiple areas. To better understand the distribution of author loyalty as it relates to an au- thor?s employer type, Figure 4.6 shows the average loyalty of authors categorized by employer type (corporate institutions, universities, research laboratories, and 85 Figure 4.6: The average topic loyalty grouped by institution type for the scientific publication network. government). One interesting result is that authors from corporate institutions, i.e. Inc., have a statistically significant higher average loyalty to their topic areas than the authors from academic institutions (like universities and research labo- ratories). One possible explanation for this is that authors from corporate insti- tutions are more likely to publish in an area that coincide with corporate product or research goals, while authors from academia have more flexibility in terms of research agenda. 4.5.2 Senate Bill Sponsorship Network The senate bill sponsorship network is based on data collected about United States senators and the bills they sponsor ([43]). The data contains senators? demographic information and the bills each senator sponsored or co-sponsored from 1993 through February 2008. Each bill has a date and topics associated with 86 Figure 4.7: Average topic loyalty across all topics in the senator bill sponsorship network it. I group the bills using their high-level topic, and then measure the loyalty of senators to different topics. After removing the senators that do not sponsor a bill or sponsor only a single bill and removing bills that do not have a topic, my analysis uses 181 senators, 28,372 bills, and 188,040 participation relationships spanning 100 high level topics. When considering only the topics that each senator is most loyal to, the three bill topics that have the highest average loyalty values are Commemora- tions, Senate, and Congress. This average loyalty ranges from 0.22 to 0.27. By in- vestigating the data set, I found that these three topics constitutes 56,035 (approx- imately 30%) of the total number of sponsorship/co-sponsorship relationships. This finding seems consistent since bills with these topics occur frequently, reg- ularly, and have a large number of senators sponsoring them. Figure 4.7 shows 87 the 10 bill topics with the highest average loyalty across all the topic groups each senator sponsors a bill in. When looking across all topics for each senator, foreign policy has the second highest average loyalty value. Because the United States has been at war in recent years, this result is not surprising. The average loyalty of senators to bill topics is generally low. This is because of the large number of bills sponsored by senators across a large number of topics. Many may find com- fort in this result since senators supporting bills across topics can be interpreted to mean that they are servicing a wider constituency. To better understand the changes in loyalty over time, I investigate the changing dynamics of a particular senator?s loyalty over time. I selected the senator that sponsored the largest number of bills - Senator Edward Kennedy, a democrat from Massachusetts. As illustrated in Figure 4.8, I calculated his group loyalty at 5 different time points. Although he sponsors bills across 130 topics, the graph shows nine topics with the highest means and standard devia- tions for loyalty values across the entire time period. During each time period, he consistently sponsors or co-sponsors roughly 10% of the Senate bills. The fig- ure illustrates that Senator Kennedy starts out with a distribution of loyalty that favors a small number of bill topics. He does not sponsor bills across all the top- ics listed. Over time his loyalty to some of the topics decreases and increases to others as highlighted in the figure. It is also interesting to note that the variance of his loyalty across the topics decreases over time. Finally, I briefly consider the 2008 presidential election. Examining results in the spring of 2008, in the time period preceding the 2008 fall presidential elec- 88 Figure 4.8: Changing loyalty over time for Edward Kennedy in the senate bill sponsorship network tion, I compared the loyalty of the presidential candidates, John McCain, Barack Obama, and Hillary Clinton across a subset of bill sponsorship topics. The results are shown in Figure 4.9. These bill sponsorship loyalty values are consistent with priorities emphasized on the campaign trail. All the candidates have strong po- sitions on foreign policy. Senator McCain made it a centerpiece of his campaign. 89 Figure 4.9: Average topic loyalty of 2008 presidential candidates in the senate bill sponsorship network Senator Clinton had highlighted her commitment to health care. Both Senators Obama and Clinton also spent a lot of time discussing education. Interestingly, Senator McCain?s loyalty to sponsoring education bills is very low. 4.5.3 Dolphin Social Network I also consider an affiliation network based on a data set describing a long- term study of a wild bottlenose dolphin (Tursiops sp.) population in Shark Bay Australia [70]. It is the most comprehensive dolphin data set in research today with over 20 years of behavioral, reproductive, demographic and ecological data on wild bottlenose dolphins. For this analysis, I focus on observational surveys, collected by researchers on the Shark Bay Dolphin Research Project (SBDRP). 90 Data gathered includes location, animal behaviors, associates, habitat, photo- graphic information, and physical data (e.g., scars, condition, speckles). These surveys are brief, typically lasting 5 to 10 minutes. They are used to present a ?snapshot? of associations and behaviors among dolphins. The affiliation network is defined by using dolphins as actors and surveys as events. Dolphins observed in a survey constitutes the participation relationship. I group survey observations together by the location the observation takes place. There are six different general regions in this data set. Similar to the other analy- sis, I remove dolphins with few sightings (less than 5) as well as the surveys with no location. After doing this, the analysis includes 560 dolphins, 10,731 surveys, and 36,404 relationships between dolphins and surveys for the loyalty analysis. Figure 4.10 show the average loyalty of dolphins to different locations based on the observational surveys. First, the average loyalties of dolphins across all lo- cations ranges from 0.45 to 0.9. Some locations appeared to invite higher loyalty than others, e.g. East and Red Cliff Bay. One explanation for this is the vary- ing habitat structure. For example, East, which has the highest loyalty, is mostly deep channels bisected by shallow sea grass banks. Many dolphins spend a large amount of time foraging. The extensive habitat heterogeneity might limit the re- gion to dolphins with certain foraging specializations (channel foragers or sea grass bed foragers) [72]. For example, a subset of the dolphins in this popula- tion use sponges as foraging tools, and will forage almost exclusively in the East channels [92]. Peron is at the tip of the peninsula and is a very open area where the western and eastern gulf meet. This open habitat (to the Indian Ocean) may 91 Figure 4.10: Average location loyalty for dolphins allow for great mobility and less loyalty when compared to other areas. Previous work by project biologists indicates calves are most tied to the locations of their mothers and maternal foraging type [71]. After weaning, ju- veniles might range further and develop bonds with others separate from the mother. Figure 4.11 looks at the distribution of location loyalty among different age groups: calves (0-4 years), juveniles (5-11 years), young adults (12-24 years), and old adults(25+ years). The results indicate that loyalty decreases with age, but still remains very high. This may occur because older dolphins travel more during the course of their life and they explore more places, while calves tend to have higher loyalty to a small number of locations (which happen to be the ones their mothers are also in). Location loyalty is a nice indication of long-term residency in the population and allows researchers to track individuals over long 92 Figure 4.11: Average location loyalty for different dolphins? age groups periods of time. 4.6 Comparative Loyalty Analysis I now compare the average actor loyalty across these different communities. As the loyalty metric values can vary from zero to one, I divided the range of loy- alty into three bins; low loyalty (scores from 0 to 0.25), moderate loyalty (scores from 0.25 to 0.75), and high loyalty (scores from 0.75 to 1). The results in Figure 4.12 shows the percentage of actors with loyalty scores falling in each of the three bins for the scientific publication network, the dolphin survey network, and the political bill sponsorship network, respectively. This figure highlights the different distribution of actor loyalty in the different data sets. The results for the ACMCHI publication network show that 79.2% of the 93 authors have moderate loyalty and most of the rest (20.4%) have high loyalty to the topic of their publications. In the political data set, 53.3% of the senators have moderate loyalty to the topic of the bill they sponsor, and the rest fall in the low loyalty category. For the dolphin affiliation network, we can observe that most of the dolphins (61.5%) have high loyalty to their locations. This large variation in the distribution of actor loyalty across data sets reinforces the utility of a measure that captures changing loyalty of actors to affiliation groups. Figure 4.12: Loyalty Comparison Across Data Sets These classification results are consistent with the interpretations of com- munity loyalty presented in the previous section. The figure highlights the vary- ing distribution of actor loyalty to groups in each affiliation network. As a final analysis, for each affiliation network I consider the average number of events each actor participates in. This allows us to compare the loyalty of these affilia- 94 tion networks to the density of the connections in the network. The averages are as follows: 1. Average number of Publications per Author = 3.61 2. Average number of Bills per Senator = 159.94 3. Average number of Observations per dolphin = 19.16 The network with the highest density is the senate bill sponsorship net- work, followed by the dolphin social network. The author publication network is much more sparse than the other two networks. Interestingly enough, the loyalty categories are not completely consistent with these frequency averages, thereby affirming that frequency alone may not be sufficient to capture loyalty. 4.7 Comparison with centrality measures It is natural to want to understand how loyalty compares to existing central- ity measures. Does it capture the same information, or does it provide additional insight? I begin by comparing actor loyalty to the most common centrality mea- sures. The first centrality measure used is betweenness centrality, calculated by computing all pairs shortest paths in the network and computing the number of shortest paths that the target node occurs on. The second centrality measure used is the closeness centrality, defined as the average of shortest paths from the target nodes to all other nodes reachable from it. Lastly, eigenvector centrality measures importance of a node based on the importance of neighboring nodes. For more 95 Figure 4.13: Loyalty vs. Centrality for Scientific Publication Network detail about these measures, refer to [115]. For each affiliation group I consider the set of actors participating in the group and generate the underlying co-membership subnetwork. I do not gen- erate a single clique of actors as the subnetwork of the group, but instead cre- ate a static subnetwork using the aggregation of projections of all the participa- tion edges at each time point. We then compute the various centrality measures on each of the generated subnetworks corresponding to each of the affiliation groups and compare the centrality measures to the loyalty scores of the actors in each group. As can be noted in Figure 4.13, the scatter plot between loyalty and various centrality measures on the publication data set shows authors having all combination of values for both measures, with no visible trend in the results. 96 Note that actors may appear multiple times in this figure because they participate in multiple affiliation groups. The same results holds for the other two data sets. Figure 4.14: Author publications in ?Information Visualization? topic To further investigate loyalty to a particular topic, I take a more detailed look at the ?Information Visualization? topic as a sample affiliation group. For this group, Benjamin Bederson is ranked as the author with highest betweenness and eigenvector centrality. However, by examining Figure 4.14, I notice that his publi- cation pattern is neither consistent across time nor numerous. This is also true for Robert Spence who was ranked first according to the closeness centrality. On the other hand, the time-consistent, recent and numerous publications of the most loyal author, namely Stuart Card, as shown in the same figure, illustrates exactly what the proposed loyalty measure captures that the other centrality measures missed. 97 4.8 Conclusion I proposed a new measure for capturing loyalty in time-varying affiliation networks. I begin by defining affiliation groups, which describe temporally re- lated subsets of actors. This is accomplished by grouping events over time based on attribute values. To model the dynamic behavior of affiliations to groups, I consider the concept of loyalty and introduce a measure that captures an actor?s loyalty to an affiliation group as the degree of ?commitment? an actor shows to the group over time. I compare the proposed measure to both frequency-based loy- alty and recency-based loyalty and find my measure to be more flexible since it incorporates components for frequency, consistency, and recency. I then demon- strate its utility on three real world affiliation networks: a publication network, a senate bill co-sponsorship network, and a dolphin network. It is interesting to note that the distribution of actor loyalty varies substantially across data sets, thereby reinforcing the utility of a measure that captures changing loyalty of ac- tors to affiliation-based groups. 98 Chapter 5 Differential Adaptive Diffusion: Understanding Diversity and Trust Dynamics in Complex Networks After analyzing the user-level interactions in affiliation networks, I now move to a macroscopic analysis of network dynamics, investigating the process of information diffusion in complex network settings. Information diffusion tech- niques focus on modeling the spread of innovations, diseases, products, etc. on the existing social network among users. A number of probabilistic models have been proposed in the literature to model the diffusion process, and estimate the levels of adoption/infection in the network given a set of initial seeds. Although these models capture the dynamics of diffusion within a given scenario, they have a number of shortcomings. The first drawback lies in the fact that these models are independent of the nature of information that is spreading over the network. Treating the information as an orthogonal dimension to the diffusion process loses a wealth of information in the existing relationships between the users and the ideas that are spreading. For example, in viral marketing domains, traditional techniques neglect the existing user-product preference networks. My first hypothesis is that leveraging these additional dependencies allows us to bet- ter model the diffusion process. The second drawback of existing information diffusion models, is the exist- 99 ing implicit assumption that the underlying social network structure is a static proxy for the influence among users. This assumption fails to model the dynam- ics of the relationships among users, such as their trust in each others? recommen- dations, which is crucial to the diffusion process. Returning to the viral market- ing example, if there exists a spammer in the network who continuously make irrelevant product recommendations for her peers, traditional diffusion models would not be able to capture the decrease in the influence along the correspond- ing links. Thus, my second hypothesis focuses on the premise that an adaptive diffusion model that captures the change in confidence values among users as a result of their prior interactions would better model the true underlying dynam- ics of information diffusion. To evaluate my hypotheses, I propose a differential adaptive diffusion model for complex networks. I focus on the diffusion of different stories / news posts in a real-world network extracted from the Digg social news website. By analyzing the diffusion in the Digg network, I provide insights into the effects of network dynamics and topic preferences on the adoption of stories of different topics. The experiments show that the proposed model outperforms earlier non-adaptive dif- fusion models in predicting future adoptions. I also discuss the implications of the proposed adaptive diffusion model on identifying influentials in social net- works. 100 5.1 Introduction How information diffuses through social networks is a question that has attracted the attention of scholars from a wide variety of research disciplines. A richer understanding of the mechanism governing the spread of new ideas or trends in social media has implications for marketing, sociology, journalism, computer science and many other research areas. Models of network diffusion have been used to study phenomena as widespread as product recommendation systems [65], viral marketing [20, 61], disease transmission [19], herding behavior in financial markets [24], and even the contagion properties of obesity [16]. This is in part because the widespread growth and use of online social networks has created a new opportunity to observe diffusion processes on a very large scale and across different types of interactions from email to microblogging to photo- sharing. Most of the existing information diffusion literature builds upon the premise that the social network can be used as a static proxy for influence, where the dif- fusion process is mainly dependent upon the structure of the network. However, in real settings, the influence within social networks is not static. As consumers continue to listen to their friends and family, they learn that some of their social connections have recommendations that are more appropriate for them and that other members of their social network simply do not have the same interests as they do. This is in part because different individuals are interested in different topics. 101 As an example, for someone who is not interested in sports, if a friend is constantly talking to her about new sports developments, sending her emails and links to promotions for sporting events, then this friend is essentially act- ing as a spammer and the focal individual will eventually decrease their trust in her recommendations. However, if another friend makes a recommendation and the focal individual adopts the product that they recommend, then the trust of the focal individual in that friend?s recommendation will increase. As a result of these processes, the social network of confidence changes over time as a re- sult of the diffusion and adoption process. Although the dynamics of social trust has attracted the attention of multiple researchers [40], most information diffu- sion models do not fully address the social trust aspect, nor the heterogeneity of preferences that individuals have for different topics. In this chapter, I present an adaptive model that addresses these shortcom- ings by allowing individuals to have different preferences for different types of information, while adapting their confidence in other individuals? recommenda- tions on the basis of their historical interactions. I show the novelty in my model over previous ones which assume the confidence that a user has in her peers re- mains constant over time and that the preference for adoption is not dependent on the type of information being diffused. By incorporating network-level dy- namics into a standard diffusion model and allowing for heterogeneous prefer- ences, the proposed model provides a better prediction of expected users? adop- tion. Finally, I discuss one application of the proposed model for the problem of identifying influentials and seed users for maximizing the adoption process. 102 5.2 Background One of the first and most influential diffusion models was proposed by Bass [8]. This model of information diffusion predicts the number of people who will adopt an innovation over time, and though it does not explicitly account for the social network, it does assume that the rate of adoption is dependent on other members of the population, specifically the current proportion who have already adopted. The diffusion equation used by this model describes the cumulative proportion of adopters in the population at any time as a function of the intrin- sic adoption rate and a measure of social contagion. The model describes an S-shaped curve, where adoption is slow at first, then takes off exponentially and flattens at the end. The Bass model has been shown to effectively model word-of- mouth product diffusion at the aggregate level [69], but does not explicitly model the decision of an individual consumer. Though the Bass model can easily be generalized to address individual- level decisions [104], most diffusion models that capture the process of adoption of an idea or a product at an individual level use different mechanisms and can generally be divided into two groups: threshold models and cascade models. Thresh- old models are based on the work performed by Granovetter [44] and Schelling [93] in the late 70?s. Basically, each individual, v, in the network has a personal adoption threshold v 2 [0; 1], typically drawn from some probability distribu- tion. A given individual v in the network adopts a new idea or product if the sum of the connection weights of its neighboring peers that have already adopted it 103 N(v) is greater than her personal threshold: X u2N(v) w(u; v) v: Although the above model represents a linear threshold model, it can be easily gen- eralized further with replacing the summation with an arbitrary function on the set of active neighbors of individual v. Dodds and Watts [19] have also shown that a more general model than this can be used to describe both the Bass model and the threshold model. Cascade models [41] were originally inspired by research on interacting par- ticle systems. In these type of models, whenever a peer u of an individual v adopts a given idea, then individual v also adopts with probability pu;v. In other words, each individual has a single, probabilistic chance to activate each one of her currently inactive peers, after becoming active herself. A very common ex- ample is the independent cascade model, in which the probability that an individ- ual is activated by a newly active peer is independent of the set of peers who have attempted to activate her in the past. Kempe et al. [55] proposed a broader framework that simultaneously generalizes the linear threshold and independent cascade models, having equivalent formulations in both cases. Regardless of the adoption model, one of the key aspects that affects infor- mation diffusion is the interaction structure. For instance, a model for product adoption in small-world networks was proposed by Centola et al. [14], where an individual?s probability of adopting a product is dependent on having more than 104 one neighbor who has previously adopted the product. Wu et al. [119] modeled opinion formation on different network topologies and found that if highly con- nected nodes were seeded with a particular opinion, this would proportionally affect the long term distribution of opinions in the network. The work of Holme et al. [47] focuses on coupling the evolution of both the social network and opinion formation, where both aspects adapt to each other during the evolution process. Once a diffusion model and a network topology are specified, the next ques- tion is which set of individuals should be targeted to maximize the spread of information throughout the network. The problem of influence maximization was formalized by Domingos et al. [21], who noticed that ordinary data min- ing techniques that reason about consumer behavior in independent settings do not utilize network information. They proposed a probabilistic model of user- interaction to study influence propagation in networks, and then explored how to identify a group of individuals, who if they adopted a product, would max- imize the speed and amount of adoption throughout the network. Even before Domingos et al. formalized this problem, one hypothesis as to how to maximize diffusion centered around the concept of influentials, who are individuals that have a disproportionate effect, compared to average individuals, on the amount and rate of information diffusion. In many information diffusion models, it has been shown that the most influential individuals in a network are the most cen- tral, where centrality is measured in a variety of different ways, including the most highly connected nodes, i.e. degree centrality [115, 2]. Other solutions have also been proposed, for instance, Stonedahl et al. [104] show that not only is de- 105 gree centrality important in maximizing diffusion, but in real social networks it is important to consider the clustering of a node?s neighbors since tight clustering slows the diffusion process. 5.3 Case Study: Digg Many popular online social network platforms allow for individuals to rec- ommend items of interest and exchange knowledge. One such example is Digg.com, which is a popular social news website, where users can share and vote on dif- ferent stories, referred to as ?digging?, to elevate the ranking of the story on the website. Digg?s users form a social network by ?following? other users in the net- work, which enables automatic tracking of their future diggs and submissions. Each news story on Digg belongs to one of ten topics; Business, Entertainment, Gaming, Lifestyle, Offbeat, Politics, Science, Sports, Technology, and World News. I constructed a sample from the Digg network which included both the diggs and follows for 11,942 users and the stories they submitted over a 6 months period (Jul - Dec 2010). The sample include 1.3 million follows relationships among the users, with over 1.9 million diggs, on 48,554 news stories. The network alone is not enough to describe the diffusion process in a net- work, it is also important to understand the mechanism by which a user pro- vides recommendations to their peers. These mechanisms differ by platform and marketing strategy. For example, some mechanisms are based on broadcast tech- niques, where all the peers of a given user are informed when she adopts a prod- 106 5% 10% 15% 20% 25% er ce n ta ge o f s to ri es 0% Pe rc en ta ge o f s to ri es Topics Figure 5.1: Topic distribution of stories in Digg dataset uct. In other settings, the user has to explicitly select peers to send her prod- uct recommendations to after adoption. Digg.com uses a broadcast mechanism, where connected users are able to see all the activities of their peers as soon as it is performed. 5.3.1 Analysis I begin by analyzing the topic distribution of the news stories in the col- lected data. As shown in Figure 5.1, though there are differences, all ten topics are represented at comparable levels in the dataset, without a single topic dom- inating the others. Technology, Entertainment, and Lifestyle are among the topics with higher frequency, while Gaming, Science, and Sports are the ones with lowest number of submissions. I use the topic distribution of individual user submissions (the actual stories 107 / links they submitted), as opposed to their diggs, as an influence-independent source for determining a user?s topic preferences. Given this topic distribution, I then measure the correlation between the users? topic preferences and their ac- tual adoptions, i.e., their diggs. Figure 5.2 shows the Kullback-Leibler divergence between the topic distribution of the users? submissions versus their diggs. For most users, there is very little divergence between their adoption behavior and their inferred preferences according to their submissions. However, in approxi- mately 10% of the users, there is a quite significant difference between the topic distribution of the stories they digg and the ones they submit. One possible expla- nation is that while most people adopt only stories of interest to them, there are a smaller percentage of ?imitators? who are easily influenced by their peers and do not weight their own preferences as highly. Similar results were obtained using normalized mutual information (NMI) between the topic distribution of users? preferences and adoptions, with imitators appearing to be even more prominent (e16% of the users). In order to characterize users? topic preferences, I measure the KL-divergence between the topic distribution of each user?s submissions and a uniform distribu- tion of topics. Lower values indicates that the user?s submission pattern is closer to uniform, while higher values indicate that the user is more interested in cer- tain topics but not in others. From Figure 5.3, we can distinguish three different groups of users in the network: Focused users (e53% of the users) who are char- acterized by having highly skewed preferences towards one or two topics, Biased users (e32% of the users) who have less skewed preferences towards a larger set 108 0% 2% 4% 6% 8% 10% 12% 14% 0.01 0.1 1 10 P er ce nt ag e of U se rs KL-divergence (log scale) Figure 5.2: KL-divergence between the topic distribution of users? submissions and diggs. of topics, and Balanced users (e15% of the users) who have almost-uniform topic preferences in their submissions. Finally, I analyze the dynamics of change in the nature of the social relation- ships between users, and how it affects peer influence over time. I hypothesize that as time passes, peers with similar preferences in topics start gaining con- fidence in each other?s recommendations, yielding higher levels of adoptions, while on the other hand, peers whose preferences are farther apart from each other become less confident in each other?s recommendations, resulting in lower adoption levels. To test my hypothesis, I measured, at different time points, the average number of diggs on the same story by different peers for different values of KL-divergence between their topic preferences. Figure 5.4 shows that peers with lower KL-divergence in their topic preferences increase their number of shared diggs over time, while the ones with higher levels of divergence have 109 10% 15% 20% 25% Pe rc en ta ge o f u se rs 0% 5% 0 2 4 6 8 10 12 14 Pe rc en ta ge o f u se rs KL-divergence Figure 5.3: KL-divergence between uniform topic distribution and users? submis- sions a decreasing pattern of adoptions over time. 5.4 Differential Adaptive Diffusion The input social network can be viewed as a directed weighted graphG(V;E), where V represents the network users, and E represents the social relationships among them. Each edge e(u; v) 2 E is associated with a confidence valuewi(u; v) 2 [0; 1] representing the confidence user v has in the recommendations of her peer u during campaign i. This confidence value wi(u; v) is updated only once per campaign, and in general this update could take place either immediately after a recommendation or at the end of a campaign. In the model results presented here, the confidence weights are only updated at the end of a campaign. Given a preference function F(v; c) : V C ! [0; 1] that quantifies user preferences for different product categories c 2 C for a given user v, the probability of node v 110 adopting a product of category c 2 C within campaign i as a result of node u adopting it can be defined as: p(u; v) , wi(u; v) F(v; c) To start a new campaign for a certain product xc of category c, a market- ing incentive is provided to a chosen set of seed nodes in the network to initi- ate the diffusion. As the diffusion process unfolds, the set of nodes who adopt the product at each time step, t, referred to as the ?active? nodes, influence their peers through recommendations. These recommendations cause their neighbors to consider whether or not to adopt the product. The adoption function can take any form including any of the functions described in the background section, but throughout the following discussion I will assume an independent cascade pro- cess. Thus each active node u in time step t has a single chance of activating a peer v that has not already adopted the product where it succeeds with probabil- ity p(u; v), which will result in v adopting the product. Once node u attempts to activate an inactive node v, it can never attempt to activate node v, in any future time step, i.e., node u will return to an inactive but adopted state after this time step. Given the set of active neighbors Nt(v) of a given inactive node v at time t, the posterior probability of v adopting the product at time t + 1 can be defined as pt+1(v; xcjNt(v)) = 1 Q u2Nt(v) (1 p(u; v)). When a node adopts the product, it becomes active and starts activating its currently inactive neighbors at future time points. The diffusion process continues until no further adoptions occur for 111 Figure 5.4: Heat map of the average number of diggs for different values of topic divergence between peers across time. the current product. At the end of each campaign, the confidence values among peers are up- dated according to the outcome of the product recommendation across the cor- responding edge. I denote by t i (v) the time step within campaign i at which a node v adopts the product. If a given node u ends up not adopting the product by the end of campaign i, t i (u) is set to1. Using a kernel function K, the change in confidence values at the end of campaign i for product xc can be calculated as Wi+1 = K(Wi; ), where 2 [0; 1] is a kernel parameter specifying the rate of 112 change. For instance, a linear kernel can be defined as: KL(Wi; ) = 8 >>< >>: 1 wi(u;v)t i (v) t i (u)+1 ; t i (u) <1^ t i (v) <1 wi(u;v)tmaxi (v) t i (u)+1 ; t i (u) <1^ t i (v) =1 where tmaxi (v) = maxt i ft i (u) : (u; v) 2 E ^ t i (u) < 1g represents the time of the last adoption by any of v?s peers. This linear kernel assigns credit to each peer u of a node v proportional to the elapsed time between that peer?s recommendation and node v adopting the product. The intuition is that the node u, that last recommended the product, has the highest impact for influencing node v to adopt the product, and thus should be assigned higher confidence in her future recommendations to v. If node v ends up not adopting the product by the end of the campaign, each peer u who recommended the product to node v is penalized relative to the time of the last recommendation. In this case, the last person to recommend the product, even though v still has not adopted it and will not adopt it, gets the maximum penalty for their recommendation. Different types of kernels can be used to control the dynamics of the confi- dence levels in the network. For instance, this kernel could be exchanged with a kernel where only the last node to provide a recommendation is penalized or rewarded, as opposed to all nodes, or one where all nodes are punished or re- 113 warded equally. Regardless, as a new campaign is initiated for a different prod- uct, the new, updated confidence values are used to compute the influence proba- bilities, thus enabling the model to capture the dynamics of the diffusion process across different product types. 5.5 Influentials One of the tightly related problems to information diffusion is identifying the set of users that should be initially targeted to maximize the spread of infor- mation throughout the network. This problem was formalized by Domingos et al.[21] who noted that ordinary data mining techniques that reason about each consumer behavior independently, lead to suboptimal marketing decisions re- sulting from not accounting for the influence effects among users in the network. They suggested that incorporating the users? network effect into the marketing decision leads to better decisions and, consecutively, higher profit. Thus, instead of deciding whether or not to market to a customer based solely on the expected profit that would be gained from her making a purchase, marketing companies should instead take into account the effect that this consumer would have within her social network. This gives rise to the notion of ?influential? users within the network; these are users who are capable of spreading the information through- out the network at a higher rate than other members. One of the standard approaches for identifying influentials in networks is using degree centrality, where high-degree nodes are considered the most influ- 114 ential as they can reach out to many other nodes in the network [2, 116]. However, most of the centrality-based approaches ignore the dynamics of the user interac- tions that occur as a result of the diffusion processes themselves, as well as the heterogeneity in user preferences. In order to account for these shortcomings, I propose a confidence-based approach for identifying influentials based on the differential adaptive diffusion model. The proposed confidence-based approach relies on using the confidence values wi(u; v) on the edges in the social network, at the start of the target cam- paign i, to construct a confidence-weighted influence score si(v) for each user as follows: si(v) = X u2N(v) wi(u; v) At the beginning of each campaign, the confidence weights are updated according to the utilized kernel function K, and the new scores for the users in the network are calculated. Then, the network is filtered to keep only the set of users with the highest preference to the current product category c. Finally, the remaining set of users are sorted based on their current scores, and the top k% are chosen as a seed influential set for the product. By using both the user-preference network for filtering, and the adaptive scores si for sorting, the set of influentials chosen at the beginning of each cam- paign is able to capture both the diversity in user preferences as well as the trust dynamics in the network. 115 5.6 Experimental Evaluation To test the proposed model, I used the first four months of interactions, i.e., diggs and submissions, on the Digg network as training data to learn the confi- dence values between different users, and used the last two months for evalua- tion. I use the action of ?digging? a story as a proxy for product adoption, and the topic distribution of users? submissions to estimate their preferences. Starting from a uniform assignment of confidence values across all peers, I track the prop- agation of user diggs and update the corresponding confidence values according to the proposed model. The learned values along with the user preferences can then be used to predict adoptions for new stories, and to identify the influentials for future campaigns. 5.6.1 Predicting Adoptions To evaluate the accuracy of the proposed diffusion model in predicting fu- ture adoptions, I compare my approach with two proposed approaches in [42] for learning the influence probabilities from training data. In the first approach (Bernoulli), they consider each recommendation a separate Bernoulli trial, and then estimate the confidence between two users as the maximum likelihood es- timate (MLE) of the ratio of successful recommendations over the total number within a given contagion time. In the second proposed approach (Bernoulli-PC), the authors use the same Bernoulli representation but in this approach they give partial credit for each product adoption based to the set of peers who recom- 116 mended the product within a given time frame. Although both approaches have comparable performance, Goval et al. show that introducing the notion of ?con- tagion time? as a factor in estimating the influence probability outperforms static methods and yields more accurate results. The above method utilizes a threshold adoption rule as opposed to the cascade rule that is utilized in the model (Adaptive). We can convert between these two models; as shown by Kempe et al. [55], the independent cascade model is equivalent to a threshold model where the adoption threshold is set to the posterior probability of adoption; i.e. for a given user v, if we set v = 1 Q u2N(v)(1 p(u; v)), the threshold model is equivalent to the independent cas- cade model. I use this conversion to facilitate in-depth evaluation of my model. I compare the different models by means of ROC curves, which are more appro- priate than precision-recall curves in this setting [85]. The ROC curve shows the relative trade-offs between the true positives (correctly identified adoptions) and the false positives (unrealized predicted adoptions) as the discrimination thresh- old is varied. Each point in the ROC curve corresponds to one possible value of activation threshold for the users. Figure 5.5 illustrates the performance of all three models using ROC curves where the x-axis is the false positive rate (FPR) and the y-axis is the true positive rate (TPR). The proposed model (Adaptive) outperforms both baselines (Bernoulli and Bernoulli-PC), yielding higher true positive rates at low values of false posi- tives. I also experimented with using a predictor that ignores the peer-influence altogether and relies only on the stories that were promoted to the ?top sto- 117 c01c02c01 c01c02c03 c01c02c04 c01c02c05 c01c02c06 c01c02c07 c01c02c08 c01c02c09 c01c02c0a c01 c02c0b c03c02c01 c01 c01c02c03 c01c02c04 c01c02c05 c01c02c06 c01c02c07 c01c02c08 c01c02c09 c01c02c0a c01c02c0b c03 c0cc0dc0ec0fc10c11c12c13 c14c13c15c16c17c18c19c19c11 c14c13c15c16c17c18c19c19c11c1ac1bc1c c01c02c03c04c05c06c07c08c04c09c0ac09c0bc05c06c0cc02c0ac05 c0dc0e c0fc05 c06c07 c08c04 c09c0ac09 c0bc05 c06c0c c02c0a c05 Figure 5.5: ROC performance of two comparison models (Bernoulli and Bernoulli-PC) and the proposed model (Adaptive) on the basis of the False Posi- tive Rate (FPR) and True Positive Rate (TPR) for each model. ries? section in Digg.com. This popularity-based predictor yielded an accuracy of 45:7%, which is lower than random prediction This indicates that individuals? connections and interactions with their content preferences are more important factors than the overall popularity. Similar results were also confirmed by [60] These results show that by modeling the dynamics of the diffusion process at a finer-grained level, taking into account the heterogeneity of users and the dynamics of the social network, it is possible to create a model which outperforms a more na??ve model. This in turn leads to a better understanding of the whole diffusion process. 118 5.6.2 Identifying Influentials Using the same experimental setup, I compute the confidence-weighted scores s for all the users in the dataset over the training period. Then, I use the computed scores along with the learned preferences for each user to identify the set of confidence-based influentials at the start of each campaign in the over the last two months in the dataset. I compare the proposed method to a random base- line and a degree-centrality approach for identifying influentials. The evaluation is based on the cascade size, measured by the average number of diggs / post, of the stories posted by the chosen set of influentials in each of the compared methods during the test period. 0 10 20 30 40 50 60 70 80 90 100 Bu sin es s En ter tai nm en t Ga mi ng Li fes tyl e Of fb ea t Po lit ics Sc ien ce Spor ts Te chno logy W or ld Ne ws A vg di gg s/ po st Topic Figure 5.6: Average number of diggs/post for the top 10% influential users in Digg.com By choosing the top 10% influential users according to the compared meth- 119 ods, 5.6 shows that the proposed confidence-based approach consistently outper- form both the random and the degree-centrality baselines across all topics. This confirms our findings that the reputation of the users and their past behavior plays an important role in their future influence and impact over the network. The suggested confidence-approach method also provides the brand managers with a mean for identifying sets of users in the network for Ad-targeting to max- imize the spread of different product types. 5.7 Conclusion In this work, I provided new insights into the effect of network-level dy- namics and individual heterogeneity on the diffusion process in real-world net- works. Utilizing a sample of users? interactions on the Digg.com social news web- site, I analyzed the effect of peers? confidence in each other?s recommendations on the adoption of different news posts over time. I presented an adaptive diffu- sion model that is able to capture the observed properties, and showed that it out- performs earlier non-adaptive models in predicting future adoptions. I also pro- posed a confidence-based approach for identifying influentials based on the pro- posed differential adaptive diffusion model. I showed that the proposed method for identifying influentials outperforms the classical, structure-based approaches across different topics / product categories. I believe one of the important future steps is studying the implications of the proposed adaptive diffusion model on existing viral marketing mechanisms. The 120 proposed model suggests that the current incentive structure of most of the ex- isting viral marketing techniques doesn?t account for the trust dynamics among users, and might lead to decreased efficacy of these strategies in the long run. An- alyzing the performance of the existing viral marketing strategies in the light of the proposed diffusion model will provide new insights about potential methods for designing more efficient incentive structures. 121 Chapter 6 Adaptive Viral Marketing Viral Marketing has proved to be one of the most successful marketing strategies that allowed companies to effectively reach a large segment of the po- tential customers that are resistant to more traditional marketing strategies. The basic idea behind viral marketing is relying on the concept of information dif- fusion over the existing social network among customers to advertise for differ- ent products. One of the main implications of the differential adaptive diffusion model proposed in the last chapter is a better understanding of the effects of ex- isting viral marketing strategies on the underlying social networks in the long term. The model suggests that user recommendations are most effective when recommended to the right subset of friends. If a user is very selective and makes each recommendation to only a few friends, then the chances of success are slim due to limited network exposure. On the other hand, recommending a product to everyone may have limited returns as well, due to the effect of irrelevant rec- ommendations on the confidence levels in the underlying social network. In this chapter, I focus on analyzing the implications of the differential adap- tive diffusion model, discussed in the previous chapter model, on existing viral marketing strategies. I illustrate the effect of classical viral marketing techniques on the trust dynamics among users in the social network, and then propose a 122 new viral marketing strategy for maintaining the trust levels among users over time. I show the utility of the proposed adaptive viral marketing method over the multiple campaigns of different product, compared to the existing strategies. 6.1 Introduction Viral marketing builds upon the ideas from network-based diffusion pro- cesses. The main goal of viral marketing is to exploit existing social networks among customers by encouraging those customers to share product information with their friends. This goal is based on the premise that consumers? purchasing decisions are heavily influenced by recommendations and referrals from their family, friends, and colleagues; an assumption that has been supported by some of the earliest studies of diffusion [89]. Recently, viral marketing has become more appealing to marketers as consumers have started to show an increasing resistance to traditional forms of advertising such as TV or newspaper ads. One of the major early success stories of viral marketing was the introduc- tion of ?Hotmail.? When this web-based email service started in 1996, each mes- sage sent by a user included a promotional message with the URL of the service. As a result, ?Hotmail? gained its first twelve million subscribers in just eighteen months, on an advertising budget of only $50,000 [51]. Similarly, cell phone com- panies are another industry where providers take advantage of social network- based diffusion by offering highly discounted rates for customers talking to other customers within the same network. Thus, if a customer?s social circle (family, 123 friends, colleagues) is using a certain provider, there?s an added incentive for her to use the same provider. In order to motivate users to spread product recommendations throughout the network, most viral marketing strategies includes some kind of an incen- tive or a ?reward? for sending the product recommendations through the users? personal connections. A major drawback that arises from this mechanism is the emergence of star-like patterns [61] where a set of users recommend the product to all their peers in an effort to increase their expected reward. As a result, due to the heterogeneity of the user-product preferences in the network, a percentage of the users in the network end up receiving recommendations for products that they might not be interested in, just to maximize the possible benefit for the rec- ommender. Moreover, as I discussed in chapter 5, this kind of behavior leads to a change in the trust dynamics among the users in the network, which in turn affect the information diffusion process. The majority of literature on viral marketing assumes that the way infor- mation spreads throughout the network is static, and conveyed solely through the existence of links [36] or some other structural properties of the network [33]. This is evident in the methods for choosing the influentials for initial targeting, which are often determined using some structural property (degree, betweenness centrality, etc.). This strategy for identifying influentials is based on the premise that a large number of connections in the social network directly correlates with a larger impact and more potential for spreading information over the network [2, 116]. 124 In this chapter, I investigate the implications of the differential adaptive dif- fusion model on the design of the viral marketing strategies for different product types. Based on these implications, I propose an adaptive design for a new vi- ral marketing strategy which is capable of sustaining the influence level between peers in the network. My hypothesis is that utilizing more selective methods which take into account sustaining the trust among users will result in better cu- mulative adoption of various products over time. In order to test my hypothesis, I compare the proposed adaptive viral marketing strategy with classic strategies that focus only on maximizing the individual product?s adoption rate. I show that the proposed adaptive viral marketing strategy is able to incorporate: (1) multiple different product campaigns, (2) the diversity in user preferences among different product categories, and (3) changing confidence in peers? recommenda- tions over time. These factors allows the model to sustain the trust values among users in the network, thus achieving better adoption rates of different products over time. 6.2 Background Recent work by Leksovec et al. [61] focused on tracking the actual diffusion of recommendations through email, in order to quantify the importance of var- ious factors introduced in the literature. They used a product recommendation dataset from an online retailer who employed a viral marketing strategy based on rewarding the referring customer who makes a successful recommendation for a 125 product with 10% credit, and the referred customer who accepts the referral with 10% discount on their purchase. Leksovec et al. utilized the recommendation data for different products to model its suitability for viral marketing in terms of both the properties of the network and the product itself. The first observation the authors made was that the nature of the product highly affected the recommendation pattern, which can be attributed to users having different preferences for different product types. For instance, users tend to buy more DVDs and are more likely to recommend them to their friends, while they seem to be more conservative with books. The authors proposed different potential reasons for this behavior, among which is the fact that books needs further time investment for the user to read it and actually recommend it to a friend, in opposition to the nature of a DVD which can be viewed in a shorter period of time. Other factors include assumptions about the consumer behavior, as people in general are more informed about certain products (like DVDs in that case) through other means of advertisements, which in turn gives the user more confidence in making the recommendation. The second observation the authors made was that the probability of a user making a recommendation at all, given that she has already adopted the product, declines after an initial increase as one gets deeper into the cascade. However, if this deeply nested individual chooses to make recommendations, she tends to recommend the product to a larger number of peers on average. The authors also provided a thorough analysis on how the effectiveness of recommendations changes as one received more recommendations from the 126 same person. The experiments showed that recommendations start to lose effect after more than two or three are passed between two individuals. As the number of exchanged recommendations increases, the probability of buying starts to de- crease to about half of the original value and then levels off. From an aggregate perspective, they carried out a set of experiments to measure how the average number of purchases changes with the number of outgoing recommendations, and showed that the result varies with different product types. Similar findings were also discovered in [98] by analyzing another dataset from Digg.com social news website. However, in the work by Sharara et al., the authors provided a formal adaptive model for information diffusion that takes into consideration both the change in trust dynamics between users over time, and the diversity in user-product preferences. 6.3 Conceptual Model Given a reward value of r units, the classical incentive structure is to grant the recommender a full reward (r units) for each successful recommendation that results in a purchase or an adoption. However, this incentive structure encour- ages users to spread the product recommendation for all their peers, without accounting for their preferences. This behavior can lead to decreased values of confidence in peer-recommendation throughout the network, which negatively affects the diffusion process as discussed in 5. In order to avoid the side effects of this incentive structure, we need to de- 127 sign a new strategy for viral marketing that aligns the immediate utility of the users with the end goal of sustaining the confidence in peer-recommendation over time. The proposed ?adaptive reward? mechanism achieves this tradeoff by restructuring the incentives to account for both successful and unsuccessful recommendations. Specifically, when a user makes a successful recommendation to one of her peers, she gets rewarded ( r) units, whereas if the recommen- dation is unsuccessful, the user gets penalized ((1 ) r). The parameter acts as a ?conservation parameter?, varying from 0 to 1, with 0 representing fully conservative behavior and 1 representing fully non-conservative behavior. According to the classic viral marketing mechanism, where users only re- ceive rewards for successful recommendations that result in product adoptions and no penalties for the unsuccessful ones, there is no reason for a user to be selective in the choice of whom to recommend the product to. This behavior en- courages the users to send the recommendations to all their peers, as the expected reward can only increase by expanding the domain of users receiving the recom- mendation. This corresponds to setting the conservation parameter to one in the proposed adaptive rewards mechanism. However, by varying the value of , the penalty for unsuccessful recom- mendations starts to affect the net reward that the users acquire, as for values of ( < 1), it is no longer the case that additional recommendation can only increase the net reward. To illustrate the effect of , consider the following example: Sup- pose a user v adopts a product of a given category, and decides to recommend the product to her neighbors N(v). Assume that only M(v) N(v) of her peers 128 have high preference for this product category, and thus adopt the product as a result of v?s recommendation. Therefore, the net reward that v acquires can be expressed as: Rnet(v) = max(0; r ( jM(v)j (1 )jN(v)=M(v)j)) Therefore, if a user chooses to follow a nonconservative strategy, the ex- pected reward decreases by a penalty relative to the number of unsuccessful rec- ommendations she makes. Tuning the conservation parameter varies the trade- off between the reward and the penalty that the user incurs, and thus allows us to test different mechanisms and analyze their effect on both the rate of product adoption as well as the overall confidence levels among users. Despite the fact that the main benefits of the proposed adaptive strategy appears on the network level through reducing the spamming behavior within the social network, it also carries an advantage for users by maximizing their re- wards over time. While the users have different preferences for different product categories, their judgment in the confidence of their peers is evaluated on an ag- gregate level. So, if a user chooses to engage in spamming behavior, this will lead to increased resistance by her peers to any future recommendation they receive from her, regardless of their preference for the product category, thus decreasing her future rewards significantly. As a result, by using the proposed method, users must face the penalty of spamming behavior explicitly, and as a result they will be more likely to follow a strategy which will maintain their peers? confidences 129 in them on the long run, and therefore increase their long term reward. 6.4 Experiments To evaluate the proposed viral marketing strategy, I use an agent-based model to simulate the behavior of customers in real settings. First, I create a synthetic social network using the preferential attachment [7] model. Then, for a given number of product categories, the user-preference network is generated by assigning a set of preference values for each agent in the network. Follow- ing the differential adaptive diffusion model[98], each link in the social network is assigned a weight representing the confidence that the target agent has in the source agent?s recommendations. Given both the preference values and the peer- confidence, the influence probability of agent u on agent v for a product of cate- gory c at campaign i can be fully specified according to the differential adaptive diffusion model as: p(u; v) = wi(u; v) F(v; c) where F(v; c) represent the preference agent v has for product category c. For the purpose of the experiments, the confidence values across all agents are initialized to unity, and are updated using the linear kernel function defined in [98]. The objective of each agent in the model is set to maximize its cumulative reward according to the incentive structure in effect. As the agents? beliefs about the preferences of its peers play an important role in the recommendation process, I conduct two sets of experiments. In the first set, referred to as ?fully observable? 130 mode, the agents are allowed to directly observe the product preferences of their peers, and base their recommendation decisions accordingly. The second set of experiments, ?learning preferences?, is a more realistic setting which allows the agents to learn the preferences of their peers according to the output of prior recommendations. For each set of experiments, I simulate the diffusion of 500 products campaigns using 5 different categories, with an initial target set of 10% of the users. c01 c02c01c01 c03c01c01 c04c01c01 c05c01c01 c06c01c01 c01c07 c02c01c07 c03c01c07 c04c01c07 c05c01c07 c06c01c07 c08c01c07 c09c01c07 c0ac01c07 c0bc01c07 c02c01c01c07 c0cc0dc02c0e c0cc0dc01c0fc09c06c0e c0cc0dc01c0fc06 c0e c0cc0dc01c0fc03c06c0e c0cc0dc01c0e c01c02c03c04c02c05c06c07 c08c09 c0ac04 c0bc05c0a c07c0c c04c0d c0ec0f c0dc07 c0bc02 c06c0d (a) Adoption Rate c01 c02c01c01 c03c01c01 c04c01c01 c05c01c01 c06c01c01 c01 c01c07c02 c01c07c03 c01c07c04 c01c07c05 c01c07c06 c01c07c08 c01c07c09 c01c07c0a c01c07c0b c02 c0cc0dc02c0e c0cc0dc01c07c09c06c0e c0cc0dc01c07c06c0e c0cc0dc01c07c03c06c0e c0cc0dc01c0e c01c02c03c04c02c05c06c07 c08c09 c0ac0b c02c06 c0ac0cc0d c0ec07 c0fc05c10 c0ac07 c0dc0a c0cc11c0a c09c0a c11 (b) Confidence Level Figure 6.1: Fully observable mode: Varying the conservation parameter 6.4.1 Fully Observable Mode Figure 6.1 shows that by decreasing the value of , encouraging the users to be more conservative in their decisions, the rate of decline in the average con- fidence level between peers decreases. However, as a side effect of being more conservative, the spread of the product information over the network decreases as well, leading to lower adoption rates. For higher values of the conservation pa- rameter , the reward for successful recommendations is higher than the penalty, which encourages the users to send out more recommendations for their peers. 131 Despite the fact that this behavior leads to an initial increase in the product adop- tion rate, we notice that the adoption rate declines substantially in later cam- paigns due to the rapid decrease in confidence levels between peers. However, we notice that utilizing intermediate values for (e.g. = 0:5, corresponding to equal chances of reward and penalty) consistently maintains high adoption rates and high overall confidence even over a large number of marketing campaigns. The robustness of this result was tested by varying the number of product categories and the size of the initial seeding set. The same conclusion holds across all of these changes in the parameters of the systems. 6.4.2 Learning Preferences Mode In real settings, users do not necessarily know the preferences of their peers in advance, but rather learn them through the peers? responses to different rec- ommendations. To account for this more realistic situation, I give agents the abil- ity to learn the preferences of their peers instead of directly observing them. At each time step, if the agent decides to recommend a product to one of its peers, it stores the output of this recommendation (whether or not it resulted an adop- tion). Then, when deciding to make a new recommendation for a similar product category in the future, the agent uses the stored outcomes to estimate that peer?s preference toward different product categories. The basic hypothesis for this experimental mode is the increase of adoption rate over the fully observable mode, due to the fact that the agents inference of 132 20% 40% 60% 80% 100% do pt io n pe rc en ta ge ? = 0 ? = 0.25 ? = 0.5 ? = 0.75 ? = 1 0% 0 100 200 300 400 500 A do pt io n pe rc en ta ge Campaign (a) Adoption Rate (b) Confidence Level Figure 6.2: Learning preferences mode: Varying the conservation parameter their peers? preferences also takes into account the confidence levels, since the peers? response to recommendations account for both factors. This additional in- formation is not contained in the direct observation of peers? preferences and since it is the composite of confidence and preference that determines actual adoption, the agents should be able to better predict their peers? adoptions. As shown in Figure 6.2, for moderate values of , the performance of the proposed strategy is remarkably better than low and high levels of , in terms of both prod- uct adoption and maintaining confidence levels in the network, which indicates that encouraging agents to target a small subset of their peers is the optimal strat- egy. This also shows that the adaptive rewards mechanism may work even better in contexts when individuals do not have perfect knowledge of their peers? pref- erence but must infer them from observing past behavior. 6.4.3 Effect of Spammers In order to test the robustness of the adaptive viral marketing model, I car- ried out another experiment where a set of spammers are manually inserted into 133 20% 40% 60% 80% 100% do pt io n Pe rc en ta ge 10% Spammers 25% Spammers 50% Spammers 0% 0 100 200 300 400 500 A do pt io n Pe rc en ta ge Campaign (a) Adoption Rate (b) Confidence Level Figure 6.3: Varying the percentage of spammers at ( = 0:5) the network. A spammer is an agent that forwards recommendations for any product it adopts to all its peers, regardless of their preferences. I set ( = 0:5) for the rest of the users, and examined the effect of various percentages of seeded spammers. As illustrated in Figure 6.3, the agents in the network are able to identify the spamming agents after a relatively small number of campaigns, dropping their confidence in them. The effect of spamming behavior is obvious in this figure through the decreased adoption rate as the percentage of spammers present in the network is increased. However, the collective behavior of the non-spammer agents maintains the confidence level among trusted peers, while removing any confidence in spammers, which minimizes the effect of the spamming behavior on the adoption rates over time. 134 6.5 Conclusion By analyzing the implications of the differential adaptive diffusion model on existing viral marketing strategies, I illustrated that most existing strategies fo- cus on maximizing the product spread within each campaign, but fail to account for the long-term effects that spamming behavior can have on the underlying social network across campaigns. I introduced a new viral marketing strategy based on an adaptive incentive structure that accounts for the social network dynamics across different product campaigns. The experiments show that the proposed adaptive viral marketing strategy is able to account for the changes in peers? confidence across multiple campaigns, maintaining higher levels of prod- uct adoptions than those attained by classic strategies in the long term. I also showed that the proposed adaptive strategy is more robust to the existence of spammers in the network. 135 Chapter 7 Active Surveying: A Probabilistic Approach for Identifying Key Opinion Leaders Opinion leaders play an important role in influencing people?s beliefs, ac- tions and behaviors. Although a number of methods have been proposed for identifying influentials using secondary sources of information, the use of pri- mary sources, such as surveys, is still favored in many domains. In this chapter, I present a new surveying method which combines secondary data from differ- ent observable network modes, with partial knowledge from primary sources to guide the information gathering process. I apply the proposed active surveying method to the problem of identifying key opinion leaders in the medical field, and show how active surveying is able to accurately identify the opinion leaders while minimizing the amount of primary data required, which results in signifi- cant cost reduction in data acquisition without sacrificing its integrity. 7.1 Introduction Studying influence in social networks is an important topic that has at- tracted the attention of a variety of researchers in different domains [87, 55]. Peo- ple often seek the opinion and advice of their peers regarding various decisions, whether it is to try a new restaurant, buy a certain product or even to support a 136 particular politician [53]. This behavior gives rise to a certain set of individuals in the social network, referred to as influentials or opinion leaders, who have a huge impact on other people?s opinions, actions and behavior. In the commercial space, the question of how to identify opinion leaders within a given population of purchasers or decision makers is of great impor- tance [77, 59]. Identifying these individuals properly leads to more effective and efficient sales and marketing initiatives [114]. This is true in multiple industries; here I begin my exploration in the medical domain, studying the influence net- works of local physicians relative to the treatment of specific disease states. Key opinion leader identification has been the focus of multiple studies in the health care literature [103, 23]. Secondary data describing suggested influence is often easy to obtain; whereas primary data, representing surveys that measure trust and advice-seeking, is harder and much more expensive to acquire. For instance, citations are often used as an indirect indicator of influence in an academic settings, where influen- tial authors? publications tend to receive higher citations than average. Obtain- ing a citation network between a set of authors in a certain field (e.g. infectious disease) can be easily constructed by looking at the publication record of each author. However, measuring the influence of each author directly requires more work, and often involves a labor-intensive process of interviewing subjects and extracting their ?network of influence?, e.g., who they turn to for advice and rec- ommendations. Methods for identifying opinion leaders can be classified into two categories 137 according to the type of data they use for drawing their conclusions. Primary methods rely on manually collecting information about peer-influence in a given population from the individuals themselves. One of the most commonly used primary methods is surveys, where the respondents are asked to report their opinion about who they perceive as opinion leaders. Although primary meth- ods are considered to be the most informative about actual peer-influence, their main drawback is the high associated costs due to the time-intensive nature of the process: in many cases surveys are obtained through one-on-one interviews with the respondents, sometimes over the phone, but often in person. On the other hand, secondary methods rely mainly on using an underlying interaction network as a ?proxy? for influence, thus avoiding the manual aspect of primary methods. One of the most widely used techniques in this setting is re- lying on network centrality measures of these secondary networks (e.g., citation, co-authorship, etc.) to identify the opinion leaders. However, the major draw- back of these methods is the fact that the correlation between peer-influence in the actual social network and the interactions occurring in the proxy networks cannot be verified. In a recent study on public opinion formation [117], the au- thors showed through a series of experiments that the customers who are critical in accelerating the speed of diffusion need not be the most connected in their corresponding social network. In this work, I show how to combine the use of primary and secondary methods for leadership identification in the medical domain. I use primary data describing a physician nomination network in which physicians are surveyed to 138 nominate other physicians whom they turn to for professional advice. I augment this network with secondary data describing publication history (citation and co- authorship), as well as hospital affiliation information. I use ideas from the active learning literature to build a model that can use partial knowledge of primary data, together with secondary data, to guide the survey process. By targeting the most informative physicians for additional primary data collection, I minimize the amount of primary data needed for accurate leadership identification. As this type of primary data collection requires significant investment, this technique empowers organizations to tackle the task of accurate leadership identification in a much more cost effective and efficient manner. The rest of the chapter is organized as follows. Section 7.2 provides a brief overview of the related work and background for both opinion leader identifi- cation and active learning. In Section 7.3, I give a detailed description of the problem and an outline of the proposed method. Section 7.3 describes the details of the active surveying algorithm. Section 7.5 discusses the experimental settings, the dataset and the results of using the proposed method compared to different baselines. Finally, Section 7.6 concludes my work and proposes future directions. 7.2 Background 7.2.1 Opinion Leader Identification In the diffusion of innovation literature, there are two main methods for identifying opinion leaders from primary sources: self-designation and surveys 139 [88]. In the self-designation method, respondents are asked to report to what ex- tent they perceive themselves to be influential. However, as can be expected, such methods are usually biased and often reflect self-confidence rather than actual influence. On the other hand, surveys are based on having selected individuals, referred to as respondents, report who they perceive as opinion leaders in a given domain [22]. Peer-identified opinion leaders are believed to be better sources of true influence compared to self-identified ones. Due to the high costs associated with primary methods for leadership iden- tification, there has also been a great deal of attention to methods that make use of secondary data sources. These methods rely mainly on using different structural measures for determining the importance of nodes in a proxy interac- tion network. In the sociology literature, various centrality measures [115] have been used to determine the most important individuals in a given social network. Among the most commonly used measures are degree centrality, indicating the most connected individuals in the network, and betweenness centrality, distin- guishing the ?brokers? in the network. 7.2.2 Active Learning In this work, I build on ideas from the field of active learning, where the learner is able to acquire labels of additional examples to construct an accurate classifier or ranker while minimizing the number of labeled examples acquired. This is achieved by providing an intelligent, adaptive querying technique for ob- 140 taining new labels to attain a certain level of accuracy with minimal training in- stances. A generic algorithm for active learning is described in [90], where a learner is applied to an initial sample L of labeled examples, then each exam- ple in the remaining unlabeled pool is assigned an ?effectiveness score,? based on which the subsequent set of examples to be labeled is chosen until some pre- defined condition is met. The main difference between various active learning methods is how the effectiveness score of each example is computed; the score usually corresponds to the expected utility that the newly acquired example can add to the learning process. One widely used method for active learning is uncertainty sampling [66], where the learner chooses the most uncertain data point to query, given the cur- rent model and parameters. Measuring the uncertainty depends on the under- lying model used, but it usually translates to how close the data point is to the decision boundary. For instance, if a probabilistic classifier is used, the poste- rior probability can be used directly to guide the selection process. By acquiring the labels for the data points closer to the decision boundary, the model can be improved by better defining the existing margin. A variety of active learning methods have been proposed [94], with various ways to reduce the generaliza- tion error of the underlying model during learning. Active learning has proved to be useful in settings where acquiring labeled data is expensive. It has been ap- plied successfully in numerous domains, such as image processing [112], speech recognition [113], and information extraction [111]. 141 7.3 Problem Description The problem can be formulated as determining the minimal set of respon- dents needed to correctly identify at least k% of the set of opinion leaders present in a given population. In order to achieve this goal, we need a method that can guide the surveying process for selecting the next respondent, such that the ex- pected set of identified opinion leaders is maximized at each step. I apply a sim- ple threshold model on the survey responses to identify opinion leaders; if a can- didate receives more than nominations, she is considered an opinion leader. A key difference between this problem setting and the traditional active learning setting is that the acquisition of a survey response is more complex than that of a single label. A survey response is a structured object that includes a set of nominations fnominate(v; u) : u 2 populationg made by a given respondent v; all of which should be accounted for in both the learning and inference phases. In some cases there may be weights associated with each nomination; although here I am assuming uniform weights, it is straightforward to extend the model to cases where weights vary. I propose an active surveying approach that combines partial knowledge from primary sources along with secondary information to provide a dynamic framework for intelligently gathering additional primary data for opinion leader identification. In my approach, the next survey respondent is chosen to maximize the likelihood of identifying new opinion leaders. After the proposed respondent is surveyed, the survey results are incorporated back into the model to update 142 future predictions. First, we need to define the conditions upon which the next respondent should be selected in order to maximize the set of identified opinion leaders. Suppose we are given an initial set of survey responses, and a threshold that de- termines the minimum number of nominations an individual should obtain to be declared an opinion leader. Let the set of nominations received by a given nom- inee u be denoted as nominations(u) = fv : nominate(v; u) ^ v 2 respondentsg. From the initial set of responses, we can generate the following two sets of indi- viduals: leaders = fl : jnominations(l)j g candidates = fc : 0 >< >>: v WUS with probability p = Havg arg maxv S(v) with probability p = (1 Havg) Thus, the probability of choosing a respondent based on uncertainty sampling to enhance the classifier accuracy increases with higher uncertainty in the classifier output, while being more confident in the predictions yields a higher probability 149 of choosing a respondent that optimizes the objective function S(v). The full details are presented in Algorithm 7.1. Algorithm 7.1 Active Survey Algorithm repeat Train classifier C using observed nominations for each un-surveyed respondent v do Compute the objective function S(v) Compute the weight W (v) using uncertainty sampling end for Normalize uncertainty sampling weights W (v) Set vS arg maxv S(v) Set vWUS W (v) With probability p = Havg, set v vWUS , otherwise set v vS Survey respondent v , update leaders and candidates sets according to the re- sulting nominations Remove v from the un-surveyed respondents and add her survey results to the set of observed nominations. until required number of opinion leaders is obtained 7.5 Experimental Evaluation To test the proposed method, I use a health care dataset generously pro- vided by Community Analytics, a social marketing research organization which specializes in analyzing influence networks and identifying opinion leaders through conducting surveys of their clients? target audiences. The data represents survey information for nominating influential local physicians, provided by their peers. The dataset consists of 2004 physicians, with 899 actual survey respondents generating 1598 nominations. As the surveys are based on identifying locally in- fluential physicians, I limit the potential edges for each respondent to the physi- cians whose locations are within a 150 mile radius, yielding a set of 127,420 po- 150 tential edges. By setting the nomination threshold ( = 2), 260 opinion leaders could be identified. By using the physicians? lists of publications from PubMed1, I constructed both a citation and a co-authorship network among the physicians in the pri- mary network. I also used the physicians? affiliation information to construct a co-affiliation network as a third supplementary source to leverage the data. Fi- nally, using these three secondary networks, I generated a set of 20 edge features on the primary physician network and enriched the node features with addi- tional attributes from these networks. A sample of the features included in the augmented network as the input to the model are illustrated in Table 7.1. Feature Name Source Network -Geographical Distance Gnomination -Respondent?s current position Gnomination (academic/non-academic) -Nominee?s current position Gnomination -Number of co-authored publications Gco-authorship -Nominee?s publications count Gco-authorship -Number of respondent?s citations of Gcitation the nominee?s publications -Nominee?s h-index Gcitation -Number of common affiliations Gco-a liation Table 7.1: Sample features in the annotated physician nomination network To conduct the experiments, I use a logistic regression classifier and vary the target percentage k of opinion leaders to be identified, showing the correspond- ing percentage of respondents required to reach this target using the proposed active survey method. I compare active surveying with a random baseline and 1http://www.ncbi.nlm.nih.gov/pubmed 151 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% P er ce nt ag e of r es po nd en ts k Page Rank Betweenness Centrality Degree Centrality Random PI Passive Survey Active Survey Figure 7.3: The percentage of respondents (y-axis) needed to identify k% of the opinion leaders (x-axis) at ( = 2) a set of other baselines based on various centrality measures for determining the most informative physicians. I use three widely used centrality measures for the structural baselines: degree centrality, betweenness centrality, and page rank. In order to understand the relative contribution of the classifier versus active learn- ing, I compare the proposed approach to a ?passive? surveying method, which follows the same procedure of active surveying for optimizing the score function S(v) based solely on the classifier?s output, but does not incorporate uncertainty sampling. Finally, I show the performance of a method referred to as perfect information (PI). PI uses the fully observed network and, at each step, greed- ily selects the survey respondent which identifies the maximum number of new opinion leaders. Note that the PI method represents a pseudo-optimal solution at each point, and hence the lower bound for the number of required respondents 152 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% % R ed uc ti on in r es po nd en ts k Passive Survey Active Survey PI Figure 7.4: The percentage reduction in required respondents to identify k% of the opinion leaders at ( = 2) at each step. As can be seen in Figure 7.3, while the performance of the centrality-based methods is indistinguishable from the random baseline, both the passive and the active surveying methods perform significantly better than the baselines. Fur- thermore, the proposed active surveying method outperforms passive survey- ing, showing that an intelligent acquisition strategy helps to improve the quality of the learned classifier. Figure 7.4 shows the actual percentage of reduction in the size of the respondent set of both the active survey method and the perfect information method, with respect to the minimum set obtained by the best per- forming baselines at the corresponding value of k. As can be noted from the figure, the proposed approach yields a 30% average reduction in the number of respondents required, as compared to a 19% average reduction by the passive 153 approach. The reduction attained by the active surveying method is reflected di- rectly in surveying costs, thus helping survey conductors achieve their required goal at minimum cost. For instance, if a survey costs $500 per person, then in order to identify 50% of the opinion leaders in the used physician network, the active survey method needs only 270 surveys rather than 375 surveys required by the best performing baseline; this leads to a savings of $52,500. 7.6 Conclusion In this chapter, I presented a novel, dynamic framework for prioritizing the acquisition of survey information, for the purpose of leadership identification. The approach enables intelligent integration of both primary and secondary data to identify which respondents to survey, based on both the likelihood of them ex- panding the set of identified opinion leaders and also the utility of the informa- tion for improving future predictions. I then validated the results on a real-world dataset describing a physician nomination network. Although the algorithm is focused on opinion leadership identification, I believe the idea of exploration vs. exploitation behind active surveying can gen- erally be applied in different settings for guiding the survey process to reduce the associated costs. 154 Chapter 8 Conclusion and Future Directions In my dissertation, I have taken some initial steps towards reasoning in dy- namic, multi-modal, multi-relational networks. I have shown the importance of considering the complex structure of these networks over time as opposed to lim- iting the analysis to a single entity or relationship type in different network anal- ysis tasks. In this chapter, I first summarize the contributions of this dissertation, then I discuss the potential directions for future work and conclude. 8.1 Summary of Contributions My dissertation focuses on analyzing and modeling different aspects associ- ated with the dynamics of multi-modal, multi-relational networks, such as mod- eling network evolution, finding cohesive clusters in multi-relational domains, analyzing the dynamics of interactions among different entity types at both micro and macro-levels, and investigating the effect of these interactions on information diffusion and opinion leader identification. 8.1.1 Network Evolution I started my research by analyzing how multi-modal networks evolve over time. To investigate this problem, I characterized the growth patterns of different 155 network modalities by analyzing the existing dependencies between the evolu- tion of users and social groups in social and affiliation networks. Based on my findings, and other social network properties that were previously described in the literature, I proposed the first generative model that captures the statistical properties of multi-modal networks [122]. The proposed co-evolution model was able to mimic the evolution of real multi-modal networks, bringing new insights about the role of friendship in joining social groups. I believe that these insights not only affect the design of network evolution models but may have broader implications on mechanism design for tasks such as group recommendation, in- formation diffusion and viral marketing strategies. 8.1.2 Multi-relational Clustering Relational clustering algorithms try to bridge the gap between traditional, similarity-based data clustering and community detection algorithms for net- work data, by accounting for both feature and structure similarities. In multi- modal networks, there are more complex dependencies among different entity types that should be also taken into consideration to provide more meaningful clusterings. To address this aspect, I proposed a multi-relational affinity propaga- tion clustering algorithm [96] to facilitates an exploration of a middle ground be- tween feature-based similarity clustering, community detection, and block mod- eling in multi-relational networks. The proposed clustering approach extends the affinity propagation clustering algorithm to multi-relational domains by encod- 156 ing the dependencies across different node types in the form of soft structural constraints, with parametrized control over how they influence the final clus- tering. I showed empirically that the proposed algorithm outperforms previous approaches for multi-relational clustering. 8.1.3 Bi-modal Interaction Dynamics In order to characterize the temporal interactions occurring in multi-modal networks, I started by investigating the user interactions with event-based groups in bi-modal networks from multiple domains: a political network of senator-bill sponsorship history, a publication network, and a dolphin observation network. By analyzing the participation patterns of users in groups over time, I was able to identify a set of factors that characterize the stability of these interactions: 1) frequency: stable relationships tend to have a larger number of associated interac- tions, 2) recency: up-to-date interactions indicate the liveliness of the correspond- ing relationship, and 3) consistency: stable relationships are usually associated with a consistent pattern of interactions among the corresponding entities over time. Taking these factors into consideration, I proposed a measure for quantify- ing users? loyalty to different groups as the degree of commitment a user shows to the group over time [99]. I showed that the proposed measure provides new insights about the temporal aspects of user-group interactions in dynamic affilia- tion networks that are not captured by existing centrality measures. 157 8.1.4 Information Diffusion The next aspect I investigated was how information diffuses in multi-modal networks. To answer this question, I focused my analysis on the factors involved in product adoption in multi-modal networks, as opposed to relying only on the inferred influence across friendship links. I started by analyzing the shortcom- ings of existing information diffusion, such as 1) considering the diffusion of a single topic or type of information, neglecting the wealth of information in the existing user-topic preferences, and 2) the inherent assumption that the under- lying social network among users is static, and thus failing to model the evolu- tion of the aspects of individual relationships, such as trust, during the course of subsequent diffusion processes. To address these issues, I proposed an adap- tive diffusion model that exploits both the dynamic social relationships among users, as well as their preferences for different topics or product types. Specif- ically, the proposed model was able to capture: 1) information across multiple diffusion processes (e.g. different product marketing campaigns), 2) the diversity in user preferences for different topics / product categories, and 3) the variation of confidence in peers recommendations over time [98]. The empirical evalua- tion of my proposed differential adaptive diffusion model on real-world network data showed that it outperforms earlier non-adaptive models in predicting fu- ture product adoptions. I also showed how the model can be utilized to identify influential users in the network for initial targeting. 158 8.1.5 Adaptive Viral Marketing Next, I moved to discussing the implications of the proposed diffusion model on viral marketing strategies. I showed that existing techniques do not address the trust dynamics among users in the underlying social network, and thus might adversely affect the diffusion process over time. To address that, I proposed a novel adaptive viral marketing strategy that aligns the immediate utility of the users with the end goal of maintaining higher adoption levels across multiple campaigns. I showed that this objective can be achieved by redesigning the incen- tive structure of the viral marketing strategy. I validated the proposed adaptive viral marketing technique through a number of simulations over an agent-based model, and showed that the proposed strategy outperformed the classical ones in providing high adoption rates over time. 8.1.6 Active Surveying In cases where historical interactions are unavailable, there?s a need for al- ternate methods for estimating influence and identifying opinion leaders. To ad- dress this problem, I investigated how to leverage different network modalities in the process of identifying opinion leaders through primary sources, such as surveys. I proposed a new active surveying method for combining secondary data from different observable network modalities with partial knowledge from pri- mary sources, to guide the information gathering process. I applied the proposed active surveying method to the problem of identifying key opinion leaders in the 159 medical domains [97], and showed that it could accurately identify the opinion leaders while minimizing the amount of primary data required. This results in significant cost reduction in data acquisition without sacrificing its integrity. 8.2 Future Directions There are a number of promising future directions for the research on dy- namic, multi-modal networks. Some of these potential avenues include: Combining structured information from networks with related unstructured data that is often under utilized in network models, such as text corpus, videos, music, etc. Incorporating these types of data into relational learn- ing models offers a wealth of information that can be used in developing a holistic view of the associated network. Utilizing multi-modal network models in different application-oriented tasks from different domains, such as behavioral models, financial markets, rec- ommender systems, and economic models. Scaling up network models to be able to handle large-scale problems. One potential approach for tackling the problem at the semantic level, is through analyzing the networks at successive levels of abstractions, as well as ex- ploring semi-supervised techniques for network analysis. 160 8.3 Conclusion With this thesis, I hope to motivate further research in developing new, scalable machine learning models that are capable of leveraging the temporal, multi-dimensional dependencies in complex networks. My work suggests that by incorporating additional network modalities, we can build rich models that provide a better understanding of the dynamics in complex networks, and thus help in providing highly accurate predictions about future events. I believe com- plex networks research constitutes an important pillar towards understanding and modeling how the local behavior of entities can affect the network on the global scale. By providing a holistic view of the corresponding network dynam- ics, this evolving line of research has the potential for considerably changing the systems and applications design for networks from different domains, which will in turn increase the value of the provided services for the users in these networks. It will also provide new means for researchers to uncover the behavioral patterns of the users, which enables accurate modeling of users activities and understand- ing the causal processes governing different social processes. 161 Bibliography [1] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981?2014, 2008. [2] R. Albert, H. Jeong, and A. Barabasi. Error and attack tolerance of complex networks. Nature, 406:378 ? 382, 2000. [3] S. Asur, S. Parthasarathy, and D. Ucar. An event-based framework for char- acterizing the evolutionary behavior of interaction graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2007. [4] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: Membership, growth, and evolution. In Proceed- ings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2006. [5] A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2007. [6] A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509?512, 1999. [7] A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 285(5439):509 ? 512, 1999. [8] F. Bass. A new product growth for model consumer durables. Management Science, 15(5):215 ? 227, 1969. [9] H. S. Becker. Notes on the concept of commitment. Journal of Sociology, 66:32?42, 1960. [10] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005. [11] T. Berger-Wolf and J. Saia. A framework for analysis of dynamic social networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2006. [12] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. [13] D. Cai, Z. Shao, X. He, X. Yan, and J. Han. Community mining from multi- relational networks. In Proceedings of the 9th European Conference on Princi- ples and Practice of Knowledge Discovery in Databases (PKDD), October 2005. 162 [14] D. Centola and M. Macy. Complex contagion and the weakness of long ties. American Journal of Sociology, 113:702?734, 2005. [15] D. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Computing Surveys (CSUR), 38(1), 2006. [16] N. A. Christakis and J. H. Fowler. The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357:370 ? 379, 2007. [17] R. M. Cunningham. Brand loyalty - what, where, how much? Harvard Business Review, 34:116?128, 1956. [18] D. L. Davies and D. W. Bouldin. A clustering separation measure. IEEE PAMI Trans., 1(2):224?227, 1979. [19] P. S. Dodds and D. J. Watts. A generalized model of social and biological contagion. Journal of Theoretical Biology, 232:587 ? 604, 2005. [20] P. Domingos. Mining social networks for viral marketing. IEEE Intelligent Systems, 20(1):80 ? 82, 2005. [21] P. Domingos and M. Richardson. Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2001. [22] S. Dorfman and J. Maynor. Under the influence. Pharmaceutical Executive, 26:148 ? 150, 2006. [23] G. Doumit, M. Gattellari, J. Grimshaw, and M. A. O?Brien. Local opin- ion leaders: effects on professional practice and health care outcomes. Cochrane Database of Systematic Revs., 2007. [24] M. Drehmann, J. Oechssler, and A. Roider. Herding and contrarian be- havior in financial markets : An internet experiment. American Economic Review, 95(5):1403 ? 1426, 2005. [25] D. Druckman. Nationalism, patriotism, and group loyalty: A social psy- chological perspective. Mershon International Studies Review, 38(1):43?68, 1994. [26] J. C. Dunn. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3(3):32?57, 1973. [27] D. C. Dunphy. The social structure of urban adolescent peer groups. So- ciometry, 26:230?246, 1963. [28] M.G. Everett and S.P. Borgatti. Analyzing clique overlap connections. Con- nections, 21(1):49?61, 1998. 163 [29] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In Proceedings of the conference on Applications, tech- nologies, architectures, and protocols for computer communications (SIGCOMM), September 1999. [30] G. Flake, S. Lawrence, C. Giles, and F. Coetzee. Self-organization and iden- tification of web communities. Computer, 35(3):66?71, 2002. [31] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315:972?976, 2007. [32] L. Friedland and D. Jensen. Finding tribes: identifying close-knit individ- uals from employment patterns. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Au- gust 2007. [33] A. Galstyan and P. Cohen. Influence propagation in modular networks. In Proceedings of the AAAI Symposium on Social Information Processing, 2008. [34] L. Getoor and C. P. Diehl. Link mining: a survey. ACM SIGKDD Explo- rations Newsletter, 7(2):3 ? 12, December 2005. [35] L. Getoor and B. Taskar. Introduction to Statistical Relational Learning. The MIT Press, 2007. [36] R. Ghosh and K. Lerman. Leaders and negotiators: An influence-based metric for rank. In Proceedings of the third International AAAI Conference on Weblogs and Social Media (ICWSM), 2009. [37] M. Girvan and M. E. J. Newman. Community structure in social and bio- logical networks. PNAS, 99(12):7821?7826, 2002. [38] M. Girvan and M. E. J. Newman. Community structure in social and bi- ological networks. Proceedings of the national academy of science (PNAS), 99(12):7821?7826, 2002. [39] I. E. Givoni and B. J. Frey. A binary variable model for affinity propagation. Neural Computation, 21(6):1589?1600, 2009. [40] J. Golbeck, editor. Computing with Social Trust. Springer, 2009. [41] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex sys- tems look at the underlying process of word-of-mouth. Marketing Letters, 3(12):211 ? 223, 2001. [42] A. Goval, F. Bonchi, and L. V. S. Lakshmanan. Learning influence probabili- ties in social networks. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM), February 2010. 164 [43] Govtrack. Senate bill sponsorship data. website: http://www.govtrack.us, 2008. [44] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83(6):1420 ? 1443, 1978. [45] Habiba, T. Y. Berger-Wolf, Y. Yu, and J. Saia. Finding spread blockers in dynamic networks. In Proceedings of the 2nd SNA-KDD workshop on Web mining and social network analysis (SNA-KDD), August 2008. [46] S. Hill, D. Agarwal, R. Bell, and C. Volinsky. Building an effective represen- tation of dynamic networks. Journal of Computational and Graphical Statistics, 25:584?608, 2006. [47] P. Holme and M. E. J. Newman. Non-equilibrium phase transition in the coevolution of networks and opinions. Physical Review E, 74, 2006. [48] J. Hopcroft, O. Khan, B. Kulis, and B. Selman. Natural communities in large linked networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2003. [49] J. Jacoby and D. B. Kyner. Brand loyalty vs. repeat purchasing behavior. Journal of Marketing Research, 10(1):1?9, 1973. [50] S. Jegelka, S. Sra, and A. Banerjee. Approximation algorithms for tensor clustering. In Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT), 2009. [51] S. Jurvetson. What exactly is viral marketing? Red Herring, pages 110 ? 111, 2000. [52] R. M. Kanter. Commitment and social organization: A study of commit- ment mechanisms in utopian communities. American Sociological Review, 33:499?517, 1968. [53] E. Keller and J. Berry. One American in ten tells the other nine how to vote, where to eat, and what to buy. They are the influentials. Free Press, 2003. [54] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learn- ing systems of concepts with an infinite relational model. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI), 2006. [55] D. Kempe, J. Kleinberg, and E?. Tardos. Maximizing the spread of influence in a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2003. [56] C. A. Kiesler. The psychology of commitment: Experiments linking behavior to belief. Academic Press, 1971. 165 [57] M. Kirsten and S. Wrobel. Relational distance-based clustering. In Proceed- ings of the 8th International Conference on Inductive Logic Programming (ILP), 1998. [58] S. Kok and P. Domingos. Statistical predicate invention. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007. [59] D. Krackhardt. Structural leverage in marketing. Networks in Marketing, pages 50 ? 59, 1996. [60] K. Lerman. Social networks and social information filtering on digg. In Proceedings of the first International Conference on Weblogs and Social Media (ICWSM), March 2007. [61] J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral market- ing. ACM Transactions on the Web, 1, 2007. [62] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolu- tion of social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2008. [63] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. ACM Transactions on Knowledge Discovery from Data, 1(1):177?187, 2007. [64] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Statistical properties of community structure in large social and information networks. In Pro- ceeding of the 17th international conference on World Wide Web (WWW), April 2008. [65] J. Leskovec, A. Singh, and J. Kleinberg. Patterns of influence in a recom- mendation network. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), April 2006. [66] D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM-SIGIR Conference on Re- search and Development in Information Retrieval (SIGIR), July 1994. [67] B. Long, Z. Zhang, X. Wu?, and P. S. Yu. Spectral clustering for multi-type relational data. In Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006. [68] B. Long, Z. Zhang, and P. S. Yu. A probabilistic framework for relational clustering. In Proceedings of the thirteenth ACM SIGKDD International Con- ference on Knowledge Discovery and Data Mining (KDD), 2007. [69] V. Mahajan, E. Muller, and F.M. Bass. New product diffusion models in marketing: A review and directions for research. The Journal of Marketing, 54(1):1?26, 1990. 166 [70] J. Mann, R. C. Connor, L. M. Barre, and M. R. Heithaus. Female reproduc- tive success in bottlenose dolphins (/tursiops/ sp.): Life history, habitat, provisioning, and group size effects. Behavioral Ecology, 11:210?219, 2000. [71] J. Mann and B. Sargeant. Like mother, like calf: The ontogeny of foraging tra- ditions in wild Indian Ocean bottlenose dolphins, pages 236?266. Cambridge University Press, 2003. [72] J. Mann, B.L. Sargeant, J.J. Watson-Capps, Q.A. Gibson, M.R. Heithaus, R.C. Connor, and E. Patterson. Why do dolphins carry sponges. PLoS One, 3(12):e3868, 2008. [73] M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted graphs and discon- nected components. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2008. [74] J. P. Meyer, T. E. Becker, and C. Vandenberghe. Employee commitment and motivation: A conceptual analysis and integrative model. Journal of Applied Psychology, 89(6):991?1007, 2004. [75] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (IMC), October 2007. [76] J. Moody and D. R. White. Structural cohesion and embeddedness: A hi- erarchical concept of social groups. American Sociological Review, 68(1):103? 127, 2003. [77] J. Myers and T. Robertson. Dimensions of opinion leadership. Journal of Marketing Research, 9:41 ? 46, 1972. [78] J. Neville, M. Adler, and D. Jensen. Clustering relational data using at- tribute and link information. In Proceedings of the Text Mining and Link Anal- ysis Workshop, 2003. [79] J. W. Newman and R. A. Werbel. Multivariate analysis of brand loyalty for major household appliances. Journal of Marketing Research, 10:404?409, 1973. [80] M. Newman. Mixing patterns in networks. Phys. Rev. E, 67:026126, 2003. [81] M. E. J. Newman. Detecting community structure in networks. The Euro- pean Physical Journal B, 38:321?330, 2004. [82] M. E. J. Newman. Modularity and community structure in networks. PNAS, 103(23):8577?8696, 2006. [83] R. L. Oliver. Loyalty: Whence consumer loyalty? Journal of Marketing, 63:33?44, 1999. 167 [84] A. Plangprasopchok, K. Lerman, and L. Getoor. A probabilistic approach for learning folksonomies from structured data. In Proceedings of the 4th International Conference on Web Search and Data Mining (WSDM), 2011. [85] F. J. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estima- tion for comparing induction algorithms. In Proceedings of the 15th Interna- tional Conference on Machine Learning (ICML?98), July 1998. [86] M. Rattigan, M. Maier, and D. Jensen. Exploiting network structure for active inference in collective classification. Technical report, University of Massachusetts Amherst, 2007. [87] B. H. Raven. Social influence and power. Current studies in social psychology, pages 371 ? 382, 1965. [88] E. M. Rogers and D. G. Cartano. Methods of measuring opinion leadership. Public Opinion Quarterly, 26:435 ? 441, 1962. [89] B. Ryan and N.C. Gross. The diffusion of hybrid seed corn in two Iowa communities. Rural sociology, 8(1):15?24, 1943. [90] M. Saar-Tsechansky and F. Provost. Active sampling for class probability estimation and ranking. Machine Learning, 54(2):153 ? 178, 2004. [91] R. J. Sampson and W. B. Grove. Community structure and crime: testing so- cial disorganization theory. American Journal of Sociology, 94:774?802., 1989. [92] B. L. Sargeant, A. J. Wirsing, M. R. Heithaus, and J. Mann. Can environmen- tal heterogeneity explain individual foraging variation in wild bottlenose dolphins? Behavioral Ecology and Sociobiology, 61:679?688, 2007. [93] T. C. Schelling. Micromotives and Macrobehavior. Norton, 1978. [94] B. Settles. Active learning literature survey. Technical Report 1648, Univer- sity of Wisconsin - Madison, 2009. [95] U. Sharan and J. Neville. Temporal-relational classifiers for prediction in evolving domains. In Proceedings of the IEEE International Conference on Data Mining (ICDM), December 2008. [96] H. Sharara and L. Getoor. Multi-relational affinity propagation. Active submission, 2012. [97] H. Sharara, L. Getoor, and M. Norton. Active surveying: A probabilistic approach for identifying key opinion leaders. In Proceedings of the 22nd International Joint Conference On Artificial Intelligence (IJCAI), 2011. 168 [98] H. Sharara, W. Rand, and L. Getoor. Differential adaptive diffusion: Un- derstanding diversity and learning whom to trust in viral marketing. In Proceedings of the fifth International AAAI Conference on Weblogs and Social Media (ICWSM), 2011. [99] H. Sharara, L. Singh, L. Getoor, and J. Mann. Understanding actor loyalty to event-based groups in affiliation networks. Journal of Advances in Social Networks Analysis and Mining, 1(2):115?126, 2011. [100] M. E. Shaw. Group dynamics. Annual Review of Psychology, 12:129?156, 1961. [101] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI Trans., 22(8):888?905, 2000. [102] T. Snijders. Models for Longitudinal Network Data, page Chapter 11. New York: Cambridge University Press, 2005. [103] S. Soumerai, T. McLaughlin, J. Gurwitz, E. Guadagnoli, P. Hauptman, C. Borbas, N. Morris, B. McLaughlin, X. Gao, D. Willison, R. Asinger, and F. Gobel. Effect of local medical opinion leaders on quality of care for acute myocardial infarction: A randomized controlled trial. The Journal of the American Medical Association, pages 1358 ? 1363, 1998. [104] F. Stonedahl, W. Rand, and U. Wilensky. Evolving viral marketing strate- gies. In Proceedings of the 12th annual conference on Genetic and evolutionary computation (GECCO), July 2010. [105] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining partitionings. In AAAI, 2002. [106] J. Sun, C. Faloutsos, S. Papadimitriou, and P. Yu. Graphscope: parameter- free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Jose, CA, August 2007. [107] I. Sutskever, R. Salakhutdinov, and J. Tenenbaum. Modelling relational data using Bayesian clustered tensor factorization. In Proceedings of the 23rd An- nual Conference on Neural Information Processing Systems (NIPS), 2009. [108] C. Tantipathananandh, T. Berger-Wolf, and D. Kempe. A framework for community identification in dynamic social networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Jose, CA, August 2007. [109] B. Taskar, E. Segal, and D. Koller. Probabilistic classification and clustering in relational data. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), 2001. 169 [110] G. Tellis. Advertising exposure, loyalty, and brand purchase: A two-stage model of choice. Journal of Marketing Research, 25:134?144, 1988. [111] C. A. Thompson, M. E. Califf, and R. J. Mooney. Active learning for natu- ral language parsing and information extraction. In Proceedings of the 16th International Conference on Machine Learning (ICML), June 1999. [112] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proceedings of the 9th ACM international conference on Multimedia, September 2001. [113] G. Tu?r, D. Hakkani-Tu?r, and R.E. Schapire. Combining active and semisu- pervised learning for spoken language understanding. Speech Communica- tion, 24(2):171 ? 186, 2005. [114] T. Valente and R. Davis. Accelerating the diffusion of innovations using opinion leaders. The ANNALS of the American Academy of Political and Social Science, 566(1):55 ? 67, 1999. [115] S. Wasserman and K. Faust. Social network analysis: methods and applications. Cambridge University Press, 1994. [116] S. Wasserman and K. Faust. Social network analysis: Methods and applications. New York, Cambridge University Press, 1994. [117] D. Watts and P. Dodds. Influentials, networks, and public opinion forma- tion. Journal of Consumer Research, 34(4):441 ? 458, 2007. [118] D. J. Watts and S. H. Strogatz. Collective dynamics of ?small-world? net- works. Nature, 393:440?442, 1998. [119] F. Wu and B. A. Huberman. Social structure and opinion formation. Com- putational Economics, 2004. [120] Z. Xu, V. Tresp, K. Yu, , and H. Kriegel. Infinite hidden relational models. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006. [121] H. J. Zeng, Z. Chen, and W. Y. Ma. A unified framework for clustering heterogeneous web objects. In Proceedings of the 3rd International Conference on Web Information Systems Engineering (WISE), 2002. [122] E. Zheleva, H. Sharara, and L. Getoor. Co-evolution of social and affiliation networks. In Proceedings of the 15th ACM SIGKDD Conference On Knowledge Discovery and Data Mining (KDD), 2009. 170