P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng, An application of reinforcement learning to aerobatic helicopter flight, Advances in Neural Information Processing Systems 19, pp.1-8, 2007.

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu, Intelligent crawling on the World Wide Web with arbitrary predicates, Proceedings of the 10th International Conference on World Wide Web, WWW '01, 2001.
DOI : 10.1145/371920.371955

G. Almpanidis, C. Kotropoulos, and I. Pitas, Combining text and link analysis for focused crawling: An application for vertical search engines, Inf. Syst, vol.32, issue.6, 2007.
DOI : 10.1007/11551188_30

E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts, Everyone's an influencer: Quantifying influence on Twitter, Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM '11, pp.65-74, 2011.
DOI : 10.1145/1935826.1935845

A. Baranes and P. Y. Oudeyer, R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, vol.1, issue.3, pp.155-169, 2009.
DOI : 10.1109/tamd.2009.2037513

URL : https://hal.archives-ouvertes.fr/hal-00818174

D. Bergmark, C. Lagoze, and A. Sbityakov, Focused crawls, tunneling, and digital libraries, ECDL, 2002.
DOI : 10.1007/3-540-45747-x_7

C. Boutilier, R. Dearden, and M. Goldszmidt, Exploiting structure in policy construction, Proceedings of the 14th International Joint Conference on Artificial Intelligence, vol.2, pp.1104-1111, 1995.

C. Boutilier, R. Dearden, and M. Goldszmidt, Stochastic dynamic programming with factored representations, Artif. Intell, vol.121, issue.1-2, pp.49-107, 2000.
DOI : 10.1016/s0004-3702(00)00033-3

R. I. Brafman and M. Tennenholtz, R-max: A general polynomial time algorithm for near-optimal reinforcement learning, JMLR, vol.3, 2003.

S. Chakrabarti, M. van den Berg, and B. Dom, Focused crawling: A new approach to topic-specific web resource discovery, Computer Networks, vol.31, 1999.

D. Chakraborty and P. Stone, Structure learning in ergodic factored MDPs without knowledge of the transition function's in-degree, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp.737-744, 2011.

M. Chau and H. Chen, A machine learning approach to web page filtering using content and structure analysis, Decis. Support Syst, vol.44, issue.2, 2008.

S. Chen, J. Fan, G. Li, J. Feng, K. Tan et al., Online topic-aware influence maximization, Proc. VLDB Endow, vol.8, pp.666-677, 2015.
DOI : 10.14778/2735703.2735706

W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou, Robust influence maximization, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pp.795-804, 2016.

W. Chen, T. Lin, and C. Yang, Real-time topic-aware influence maximization using preprocessing, Computational Social Networks, vol.3, issue.1, 2016.
DOI : 10.1007/978-3-319-21786-4_1

URL : http://arxiv.org/pdf/1403.0057

W. Chen, C. Wang, and Y. Wang, Scalable influence maximization for prevalent viral marketing in large-scale social networks, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, pp.1029-1038, 2010.
DOI : 10.1145/1835804.1835934

J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec, Can cascades be predicted?, Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pp.925-936, 2014.
DOI : 10.1145/2566486.2567997

URL : http://arxiv.org/pdf/1403.4608

S. P. Choi, D. Yeung, and N. L. Zhang, An environment model for nonstationary reinforcement learning, Advances in Neural Information Processing Systems, vol.12, pp.987-993, 2000.

B. C. da Silva, E. W. Basso, A. L. Bazzan, and P. M. Engel, Dealing with non-stationary environments using context detection, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pp.217-224, 2006.

P. Dai and J. Goldsmith, Topological value iteration algorithm for Markov decision processes, Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI'07, pp.1860-1865, 2007.

P. Dai and E. A. Hansen, Prioritizing Bellman backups without a priority queue, Proceedings of the Seventeenth International Conference on International Conference on Automated Planning and Scheduling, ICAPS'07, pp.113-119, 2007.

B. D. Davison, Topical locality in the web, SIGIR, 2000.

T. Degris and O. Sigaud, Factored Markov Decision Processes, pp.99-126, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01336925

T. Degris, O. Sigaud, and P. Wuillemin, Learning the structure of factored Markov decision processes in reinforcement learning problems, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pp.257-264, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01336925

J. S. Dibangoye, B. Chaib-draa, and A.-I. Mouaddib, A novel prioritization technique for solving Markov decision processes, FLAIRS Conference, pp.537-542, 2008.

T. G. Dietterich, The MAXQ method for hierarchical reinforcement learning, ICML, 1998.

M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, Focused crawling using context graphs, VLDB, 2000.

C. Diuk, L. Li, and B. R. Leffler, The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.249-256, 2009.

C. Diuk, A. L. Strehl, and M. L. Littman, A hierarchical approach to efficient reinforcement learning in deterministic domains, AAMAS, 2006.

P. Domingos and G. Hulten, Mining high-speed data streams, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '00, pp.71-80, 2000.

P. Domingos and M. Richardson, Mining the network value of customers, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp.57-66, 2001.

M. Ester, M. Groß, and H. Kriegel, Focused web crawling: A generic framework for specifying the user interest and for adaptive crawling strategies, VLDB, 2001.

S. Faußer and F. Schwenker, Ensemble methods for reinforcement learning with function approximation, Proceedings of the 10th International Conference on Multiple Classifier Systems, MCS'11, pp.56-65, 2011.

M. D. Garcia-Hernandez, J. Ruiz-Pinales, E. Onaindia, J. G. Cervantes, S. Ledesma-Orozco et al., New prioritized value iteration for Markov decision processes, Artif. Intell. Rev, vol.37, issue.2, pp.157-167, 2012.

A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. P. How, Online discovery of feature dependencies, Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML'11, pp.881-888, 2011.

A. Geramifard, T. J. Walsh, N. Roy, and J. P. How, Batch-iFDD for representation expansion in large MDPs, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI'13, pp.242-251, 2013.

M. Ghavamzadeh and S. Mahadevan, A multiagent reinforcement learning algorithm by dynamically merging Markov decision processes, AAMAS, 2002.

M. Ghavamzadeh and S. Mahadevan, Learning to communicate and act using hierarchical reinforcement learning, AAMAS, 2004.

M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou, Walking in Facebook: A case study of unbiased sampling of OSNs, Proceedings of the 29th Conference on Information Communications, INFOCOM'10, pp.2498-2506, 2010.

S. Goel, D. J. Watts, and D. G. Goldstein, The structure of online diffusion networks, Proceedings of the 13th ACM Conference on Electronic Commerce, EC '12, pp.623-638, 2012.

J. Goldenberg, B. Libai, and E. Muller, Talk of the network: A complex systems look at the underlying process of word-of-mouth, Marketing Letters, vol.12, issue.3, pp.211-223, 2001.

G. Gouriten, S. Maniu, and P. Senellart, Scalable, generic, and adaptive systems for focused crawling, HyperText, pp.35-45, 2014.
DOI : 10.1145/2631775.2631795

URL : https://hal.archives-ouvertes.fr/hal-01069821

M. Granovetter, Threshold models of collective behavior, American Journal of Sociology, vol.83, issue.6, pp.1420-1443, 1978.
DOI : 10.1086/226707

A. Grigoriadis and G. Paliouras, Focused crawling using temporal difference learning, 2004.
DOI : 10.1007/978-3-540-24674-9_16

M. Grounds and D. Kudenko, Parallel reinforcement learning with linear function approximation, Proceedings of the 5th , 6th and 7th European Conference on Adaptive and Learning Agents and Multi-agent Systems: Adaptation and Multi-agent Learning, ALAMAS'05/ALAMAS'06/ALAMAS'07, pp.60-74, 2008.
DOI : 10.1145/1329125.1329179

M. Grześ and J. Hoey, Efficient planning in R-max, The 10th International Conference on Autonomous Agents and Multiagent Systems, vol.3, pp.963-970, 2011.

M. Grześ and J. Hoey, On the convergence of techniques that improve value iteration, The 2013 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2013.

C. Guestrin, R. Patrascu, and D. Schuurmans, Algorithm-directed exploration for model-based reinforcement learning in factored MDPs, Proceedings of the International Conference on Machine Learning, pp.235-242, 2002.

J. Guo, P. Zhang, C. Zhou, Y. Cao, and L. Guo, Personalized influence maximization on social networks, Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM '13, pp.199-208, 2013.
DOI : 10.1145/2505515.2505571

M. Han, P. Senellart, S. Bressan, and H. Wu, Routing an autonomous taxi with reinforcement learning, Proc. CIKM, 2016.
DOI : 10.1145/2983323.2983379

URL : https://hal.archives-ouvertes.fr/hal-02143465

M. Han, P. Senellart, and P. Wuillemin, Focused crawling through reinforcement learning, Proc. ICWE, 2018.
DOI : 10.1007/978-3-319-91662-0_20

URL : https://hal.archives-ouvertes.fr/hal-01851547

P. Hernandez-Leal, M. Taylor, B. Rosman, L. E. Sucar, and E. M. de Cote, Identifying and tracking switching, non-stationary opponents: A Bayesian approach, 2016.

T. Hester and P. Stone, Generalized model learning for reinforcement learning in factored domains, AAMAS, 2009.

T. Hester and P. Stone, Learning and Using Models, pp.111-141, 2012.

G. Hulten, L. Spencer, and P. Domingos, Mining time-changing data streams, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp.97-106, 2001.
DOI : 10.1145/502512.502529

Q. Jiang, G. Song, G. Cong, Y. Wang, W. Si et al., Simulated annealing based influence maximization in social networks, Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI'11, pp.127-132, 2011.

K. Jung, W. Heo, and W. Chen, IRIE: Scalable and robust influence maximization in social networks, Proceedings of the 2012 IEEE 12th International Conference on Data Mining, ICDM '12, pp.918-923, 2012.
DOI : 10.1109/icdm.2012.79

URL : http://arxiv.org/pdf/1111.4795

M. Kearns and D. Koller, Efficient reinforcement learning in factored MDPs, Proceedings of the 16th International Joint Conference on Artificial Intelligence, vol.2, pp.740-747, 1999.

M. Kearns and S. Singh, Near-optimal reinforcement learning in polynomial time, Mach. Learn, vol.49, issue.2-3, pp.209-232, 2002.

D. Kempe, J. Kleinberg, and E. Tardos, Maximizing the spread of influence through a social network, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pp.137-146, 2003.

J. M. Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM, vol.46, issue.5, 1999.

J. Z. Kolter and A. Y. Ng, Near-bayesian exploration in polynomial time, Proceedings of the 26th International Conference on Machine Learning (ICML-09), p.65, 2009.

G. Konidaris and A. Barto, Building portable options: Skill transfer in reinforcement learning, Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp.895-900, 2007.

J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen et al., Cost-effective outbreak detection in networks, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '07, pp.420-429, 2007.

H. Li, S. S. Bhowmick, J. Cui, Y. Gao, and J. Ma, GetReal: Towards realistic selection of influence maximization strategies in competitive networks, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1525-1537, 2015.

Y. Li, J. Fan, D. Zhang, and K. Tan, Discovering your selling points: Personalized social influential tags exploration, Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pp.619-634, 2017.

S. Lin, S. Lin, and M. Chen, A learning-based framework to handle multi-round multi-party influence maximization on social networks, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pp.695-704, 2015.

M. Lopes, T. Lang, M. Toussaint, and P.-Y. Oudeyer, Exploration in model-based reinforcement learning by empirically estimating learning progress, Advances in Neural Information Processing Systems, vol.25, pp.206-214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00755248

M. Lopes and P. Y. Oudeyer, The strategic student approach for life-long exploration and learning, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), pp.1-8, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00755216

A. S. Maiya and T. Y. Berger-Wolf, Benefits of bias: Towards better characterization of network sampling, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pp.105-113, 2011.

H. B. McMahan and G. J. Gordon, Fast exact planning in Markov decision processes, ICAPS, pp.151-160, 2005.

F. Menczer, Lexical and semantic clustering by web links, J. Am. Soc. Inf. Sci. Technol, vol.55, issue.14, 2004.

F. Menczer, Web Data Mining: Exploring Hyperlink, Content and Usage Data, 2007.

F. Menczer and R. K. Belew, Adaptive retrieval agents: Internalizing local context and scaling up to the web, Mach. Learn, vol.39, issue.2-3, 2000.

F. Menczer, G. Pant, and P. Srinivasan, Topical web crawlers: Evaluating adaptive algorithms, ACM Trans. Internet Technol, vol.4, issue.4, 2004.

R. Meusel, P. Mika, and R. Blanco, Focused crawling for structured data, CIKM, 2014.

S. Mihara, S. Tsugawa, and H. Ohsaki, Influence maximization problem for unknown social networks, Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, ASONAM '15, pp.1539-1546, 2015.

A. W. Moore and C. G. Atkeson, Prioritized sweeping: Reinforcement learning with less data and less time, Mach. Learn, vol.13, issue.1, pp.103-130, 1993.

T. T. Nguyen, T. Silander, and T. Leong, Transferring expectations in model-based reinforcement learning, Proceedings of the 25th International Conference on Neural Information Processing Systems, vol.2, pp.2555-2563, 2012.

L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, 1999.

G. Pant, P. Srinivasan, and F. Menczer, Exploration versus exploitation in topic driven crawlers, WWW Workshop on Web Dynamics, 2002.

R. Parr, C. Painter-Wakefield, L. Li, and M. Littman, Analyzing feature generation for value-function approximation, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.737-744, 2007.

J. Peng and R. J. Williams, Efficient learning and planning within the Dyna framework, Adapt. Behav, vol.1, issue.4, pp.437-454, 1993.

M. Qu, H. Zhu, J. Liu, G. Liu, and H. Xiong, A cost-effective recommender system for taxi drivers, KDD, 2014.

R. Rana and F. S. Oliveira, Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning, Omega, vol.47, pp.116-126, 2014.

J. Rennie and A. McCallum, Using reinforcement learning to spider the web efficiently, ICML, 1999.

M. Richardson and P. Domingos, Mining knowledge-sharing sites for viral marketing, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pp.61-70, 2002.

M. Riedmiller, T. Gabel, R. Hafner, and S. Lange, Reinforcement learning for robot soccer, Autonomous Robots, vol.27, issue.1, pp.55-73, 2009.

B. Rosman, M. Hawasly, and S. Ramamoorthy, Bayesian policy reuse, Machine Learning, vol.104, pp.99-127, 2016.

T. C. Schelling, Micromotives and macrobehavior, 2006.

S. Singh and D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, Proceedings of the 9th International Conference on Neural Information Processing Systems, NIPS'96, pp.974-980, 1996.

A. L. Strehl, C. Diuk, and M. L. Littman, Efficient structure learning in factored-state MDPs, AAAI, 2007.

R. S. Sutton, Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, pp.216-224, 1990.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

G. Tesauro, R. Das, H. Chan, J. Kephart, D. Levine et al., Managing power consumption and performance of computing systems using reinforcement learning, Advances in Neural Information Processing Systems, vol.20, pp.1497-1504, 2008.

P. E. Utgoff, N. C. Berkman, and J. A. Clouse, Decision tree induction based on efficient tree restructuring, Mach. Learn, vol.29, issue.1, pp.5-44, 1997.

C. J. Watkins, Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, 1989.

C. J. Watkins and P. Dayan, Technical note: Q-learning, Mach. Learn, vol.8, issue.3-4, pp.279-292, 1992.

L. Weng and F. Menczer, Topicality and impact in social media: Diverse messages, focused messengers, PLOS ONE, vol.10, issue.2, 2015.

M. A. Wiering and H. van Hasselt, Ensemble algorithms in reinforcement learning, Trans. Sys. Man Cyber. Part B, vol.38, issue.4, pp.930-936, 2008.

D. Wingate and K. D. Seppi, P3VI: A partitioned, prioritized, parallel value iterator, Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, p.109, 2004.

D. Wingate and K. D. Seppi, Prioritization methods for accelerating MDP solvers, J. Mach. Learn. Res, vol.6, pp.851-881, 2005.

J. Yuan, Y. Zheng, X. Xie, and G. Sun, T-drive: Enhancing driving directions with taxi drivers' intelligence, TKDE, vol.25, issue.1, 2013.

N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie, T-Finder: A recommender system for finding passengers and vacant taxis, TKDE, vol.25, issue.10, 2013.

Y. Zheng, Trajectory data mining: An overview, ACM Trans. Intell. Syst. Technol, vol.6, issue.3, 2015.

Y. Zhu, D. Li, and Z. Zhang, Minimum cost seed set for competitive social influence, IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications, pp.1-9, 2016.

H. Zhuang, Y. Sun, J. Tang, J. Zhang, and X. Sun, Influence maximization in dynamic social networks, 2013 IEEE 13th International Conference on Data Mining, pp.1313-1318, 2013.

For topic-based (or category-based) factors, to decide whether a post is relevant to the topic (or category), we can use either a classification method or the cosine similarity between a word vector of the given topic (or category) and that of a post. When using cosine similarity, a threshold must be selected; if the similarity exceeds this threshold, the post can be considered relevant to the given topic. On this basis, we can compute the publication rate of a given topic (or category) among all the posts generated by a user: for a user, the number of the user's posts that are relevant to the topic, divided by the user's total number of posts (a minimal sketch of this computation follows the factor list below). When visiting a node, we record for its child nodes the parenthood information of the current node:

- Average publication rate of all parents for the given topic
- Average publication rate of all parents for the categories
- Change in the recorded rate for the given topic
- Distance from the last activated node
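
The publication-rate computation above is simple enough to sketch in code. The following Python fragment is a minimal illustration under stated assumptions, not the thesis's implementation: the bag-of-words vectors, the helper names, and the threshold value of 0.3 are all placeholders.

    from collections import Counter
    from math import sqrt

    def cosine_similarity(a, b):
        # Cosine similarity between two sparse word-count vectors (Counters).
        dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
        norm_a = sqrt(sum(v * v for v in a.values()))
        norm_b = sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def publication_rate(posts, topic_vector, threshold=0.3):
        # Fraction of the user's posts whose similarity to the topic vector
        # exceeds the threshold, i.e. posts considered relevant to the topic.
        if not posts:
            return 0.0
        relevant = sum(
            1 for post in posts
            if cosine_similarity(Counter(post.lower().split()), topic_vector) > threshold
        )
        return relevant / len(posts)

    # Toy usage: one of the two posts is relevant to the topic, so the rate is 0.5.
    topic = Counter("machine learning model training data".split())
    posts = ["training a model on new data", "lunch was great today"]
    print(publication_rate(posts, topic))

In a real crawler the vectors would come from the text-processing pipeline (e.g. TF-IDF weights rather than raw counts), but plain token counts keep the sketch self-contained.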

Action. Action factors are based on a node's own information and combine two types of information: one is the user's general behavior on the social network, and the other is the user's interest. The Number of children factor is a good indicator of whether or not a user has a large audience group. The Number of posts factor can be used to predict the user's activity. The two other factors, Post rate for the given topic and Post rate for the categories, reflect the user's interest. A hypothetical grouping of all these factors is sketched below.
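
To make the factor inventory concrete, the sketch below gathers the parent-based and action factors listed above into one record per node. This is a hypothetical grouping for illustration; the field names are not taken from the thesis.

    from dataclasses import dataclass

    @dataclass
    class NodeFeatures:
        # Parent-based factors, recorded for a child when its parent is visited.
        avg_parent_topic_rate: float      # average parent publication rate for the given topic
        avg_parent_category_rates: dict   # average parent publication rate per category
        topic_rate_change: float          # change in the recorded rate for the given topic
        distance_to_last_activated: int   # distance from the last activated node
        # Action factors, based on the node's own information.
        num_children: int                 # audience size (general behavior)
        num_posts: int                    # user activity (general behavior)
        topic_post_rate: float            # post rate for the given topic (user interest)
        category_post_rates: dict         # post rate per category (user interest)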