Abstract
1- Introduction
2- Related work
3- Notations and problem statement
4- Content diffusion models
5- Experimental validation
6- Influence maximization (IM)
7- Conclusion
References
Abstract
Predicting the diffusion of information in social networks is a key problem for applications such as Opinion Leader Detection, Buzz Detection or Viral Marketing. Many diffusion models are direct extensions of the Cascade and Threshold models, initially proposed for epidemiology and social studies. In such models, the diffusion process is based on the dynamics of interactions between neighbor nodes in the network (the social pressure), and largely ignores important dimensions such as the content diffused and the active or passive role users tend to have in social networks. We propose here a new family of models that aims at predicting how a content diffuses in a network by making use of additional dimensions: the content diffused, the user's profile and the user's willingness to diffuse. In particular, we show how to integrate these dimensions into simple feature functions, and propose a probabilistic model to account for the diffusion process. These models are then illustrated and compared with other approaches on two blog datasets. The experimental results obtained on these datasets show that taking into account the content diffused is important to accurately model the diffusion process. Lastly, we study the influence maximization problem with these models and prove that it is NP-hard, before proposing an adaptation of the greedy algorithm to approximate the optimal solution.
Introduction
Propagation models in content networks, i.e. social networks in which content is shared and diffused among users, aim at reproducing the diffusion of information between users. Being able to accurately model this diffusion has several practical applications, such as the identification of influence hubs, the choice of initial diffusers for a maximal diffusion, or the identification of links one has to remove in order to limit the diffusion (e.g. for stopping rumors).
Most of the models proposed in the domain of information diffusion are extensions of the Independent Cascade model (IC) [1] and the Linear Threshold model (LT) [2]. The IC model is based on the following simple principle: as soon as a user (i.e. a node in the social network) nj is infected, she has a single chance to infect each of her direct neighbors ni, with a probability Pji that depends on both nj and ni. The LT model considers that a node ni of the social network (i.e. a user) is contaminated if the sum of the weights on its incoming edges from already contaminated neighbors is above a threshold θi specific to ni, this threshold being chosen randomly in many instances of the model [3]. Both models nevertheless fail to account for two important elements:
• They ignore the content of the information diffused even though, in a given social network, two different pieces of information will not propagate in the same way;
• They tend to ignore users' characteristics even though the interest of a particular user plays a major role in the diffusion process.
Ignoring the content being diffused entails that, in these models, different contents issued from the same user will diffuse in the same manner. In other words, in content-agnostic models, the diffusion cascades originating from a given user are the same, regardless of the content being diffused.
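To make the two baseline models concrete, the following is a minimal Python sketch of one IC cascade and one LT cascade, written from the verbal descriptions above rather than taken from [1] or [2]. The function names, the dictionary-based graph encoding and the use of `>=` for the threshold test are illustrative assumptions. Note that neither function takes the diffused content as an argument, which is exactly the content-agnostic behavior discussed above.

```python
import random


def independent_cascade(neighbors, prob, seeds, rng):
    """Simulate one Independent Cascade run (illustrative sketch).

    neighbors: dict mapping node j to a list of its direct neighbors i
    prob:      dict mapping an edge (j, i) to the probability P_ji
    seeds:     iterable of initially infected nodes
    rng:       a random.Random instance (passed in for reproducibility)
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_infected = []
        for j in frontier:
            for i in neighbors.get(j, []):
                # n_j gets a single attempt on each not-yet-infected neighbor n_i
                if i not in infected and rng.random() < prob[(j, i)]:
                    infected.add(i)
                    newly_infected.append(i)
        # only nodes infected in this round try to infect others in the next
        frontier = newly_infected
    return infected


def linear_threshold(in_weights, thresholds, seeds):
    """Simulate one Linear Threshold run (illustrative sketch).

    in_weights: dict mapping node i to a dict {j: weight of edge j -> i}
    thresholds: dict mapping node i to its threshold theta_i
    seeds:      iterable of initially contaminated nodes
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for i, weights in in_weights.items():
            if i in active:
                continue
            # social pressure: total weight coming from contaminated neighbors
            pressure = sum(w for j, w in weights.items() if j in active)
            # assumption: ">=" is used as the threshold test
            if pressure >= thresholds[i]:
                active.add(i)
                changed = True
    return active


if __name__ == "__main__":
    # Hypothetical toy network, used only for illustration.
    neighbors = {"a": ["b", "d"], "b": ["c"]}
    prob = {("a", "b"): 0.8, ("a", "d"): 0.3, ("b", "c"): 0.5}
    print(independent_cascade(neighbors, prob, ["a"], random.Random(0)))

    in_weights = {"b": {"a": 0.6}, "c": {"a": 0.3, "b": 0.3}, "d": {"c": 0.2}}
    thresholds = {i: random.Random(1).random() for i in in_weights}
    print(linear_threshold(in_weights, thresholds, ["a"]))
```

In the usage example the LT thresholds are drawn at random, mirroring the remark that the threshold is chosen randomly in many instances of the model. Calling either function twice with the same seed user and the same random state yields the same cascade whatever content that user posts.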