The general theme of the course is algorithmic, graph-theoretic, and
application-oriented issues related to large-scale complex social networks. The
specific focus is on models and algorithms for social network structure;
information, influence, and belief propagation; privacy and security issues;
anonymization and de-anonymization; and data mining in social networks.
The course will have a seminar format and will be based on recently published
material. The list of recommended reading will be updated throughout the
semester. Students are expected to present selected papers on some of the
topics and to implement selected projects related to social networking.
The course will start with a review of necessary background topics, such as
applications of social networking in different fields, the emerging enabling
technologies for social networking, and big data analytics. The course will
then discuss technical issues in social networking, such as large-scale network
modeling, information propagation, influence modeling and propagation, spam
detection, sentiment analysis, privacy and security, anonymization and
de-anonymization, clustering, and the business side of social networking.
Theories, algorithms, and protocols will complement the application-oriented
material of the class. New and emerging topics in both theoretical research and
applications will be presented as well.
The goal of the course is to provide students with the foundations needed to
apply social networking theory and algorithms in the fields of social
networking and big data. The class focuses on discussing and understanding the
challenges in emerging social networking systems and mobile networks.
1) Classroom: Stuart Building 107 (changed from 220 SB)
2) Date/Time: W 6:25pm-9:05pm (see IIT calendars for holidays)
3) Instructor: XiangYang Li; Electronic contact: xli at cs dot iit dot edu; Office: SB 229C; Office hours: M, W: 2-3pm
4) Teaching Assistant: ???, Email: ???; Office: SB 019B; Office hours: Friday 1PM to 3PM
5) Course Lectures: will be put online (Blackboard) when ready
Working knowledge of data structures and of at least one programming language
such as C++, Java, or Python is required. Familiarity with basic algorithmic
concepts, probability theory, statistics, and linear algebra is also preferred
(some are required). Programming projects will require knowledge of C (or C++),
Java, or Python.
Courses required: CS 116, CS 330, and CS 331.
Courses recommended: CS 422, CS 430.
There is no mandated textbook. Recommended books:
1. Networks, Crowds, and Markets, by D. Easley and J. Kleinberg (an online
version is available).
2. Think Complex, by Allen Downey (an online PDF version is available).
Students who take the course for credit will be required to complete a course
project, which constitutes 50% of the overall grade. While preparing the
project, each student will give one presentation on several related papers
(preferably related to the project); the presentation is worth 20% of the
grade. A reaction paper that summarizes the student's initial thoughts on the
chosen topic is worth another 20%. Finally, active class participation accounts
for the remaining 10%.
1. One class presentation: 20%
2. One reaction paper: 20%
3. One project: 50%
4. Class participation: 10%
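The breakdown above amounts to a weighted sum of the four components. The following is only an illustrative sketch: the 0-100 scale for each component and the names `WEIGHTS` and `overall_grade` are assumptions made for this example, not part of the official grading procedure.

```python
# Weights copied from the grading breakdown; everything else is illustrative.
WEIGHTS = {
    "presentation": 0.20,
    "reaction_paper": 0.20,
    "project": 0.50,
    "participation": 0.10,
}


def overall_grade(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum over the four graded components.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, scores of 80 (presentation), 90 (reaction paper), 70 (project), and 100 (participation) combine to 16 + 18 + 35 + 10 = 79.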
Incompletes will not be given.
Late Assignment Policy: There will be a penalty of 10% per day, up to three days
late. After that no credit will be given.
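A small sketch of how the late penalty could be computed. Two points are assumptions of this example rather than stated policy: the 10% is taken off the earned score for each day late, and a partial day counts as a full day. The function name `late_adjusted_score` is hypothetical.

```python
import math


def late_adjusted_score(raw_score, days_late):
    """Apply a 10%-per-day penalty; no credit after three days.

    Assumptions (not stated in the policy): the penalty is a fraction of
    the earned score, and partial days are rounded up to whole days.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    whole_days = math.ceil(days_late)
    if whole_days > 3:
        return 0.0  # more than three days late: no credit
    return raw_score * (1.0 - 0.10 * whole_days)
```

Under these assumptions, an assignment that earned 90 points and was submitted two days late would receive 72 points.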
The
course is primarily based on recent material (from the past 5-10 years); this
means that most of it exists in the form of papers on the Web, and the existing
literature raises a lot of interesting issues that have yet to be explored.
In the course of preparing the project, each student will give one presentation
on some papers (preferably related to the project). The presentation will
usually be an initial attempt to become familiar with the area the student will
work on for the rest of the semester. However, a student may choose to present
on a topic unrelated to his/her project theme.
As a way to get everyone thinking about the research issues underlying the
course, there will be a short reaction paper. The reaction paper should be
structured as follows. First, read at least two (preferably more) closely
related papers relevant to a particular section of the course; the most recent
or most widely cited papers are the best choices. Then write at least 5 pages
in IEEE Transactions format in which you address the following points:
1. What is the main technical content of the papers?
2. Why is it interesting in relation to the corresponding section of the course?
3. What are the weaknesses of the papers, and how could they be improved?
4. What are some promising further research questions in the direction of the
papers, and how could they be pursued?
Reaction
papers should not just be
summaries of the papers you read; most of your text should be focused on
synthesis of the underlying ideas, and your own perspective on the papers. To
make this concrete, you should make sure that you devote much of the content to
the last bullet above: promising directions for further research. In
particular, the reaction paper should contain at least some amount of each of
the following types of content:
1. A proposal for a model or algorithm - potentially extending, varying, or
improving something in the papers you have read - together with some
mathematical analysis of it. You should also show the feasibility of your
approach:
   1. What are the hypotheses that you want to show? How will you verify them?
   Where will you get the data to test your method? Why will your method work?
   How will you evaluate it? What will you do if your method does not work as
   expected?
   2. The time plan for your project.
2. A test of a model or algorithm (either your own or something from one of the
papers) on a dataset or on simulated data.
The
reaction paper should be considered as a very good way to explore a potential
project topic.
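To illustrate what "a test of a model on simulated data" might look like, here is a minimal sketch that simulates the independent cascade model (the diffusion model studied in the Kempe, Kleinberg, and Tardos paper on the reading list) on a synthetic random graph. The graph generator, the function names, and the uniform activation probability `prob` are all assumptions made for this example; a real reaction paper would justify its own choices.

```python
import random


def random_graph(n, p, rng):
    """Synthetic directed graph: each ordered pair (u, v) is an edge w.p. p."""
    return {u: [v for v in range(n) if v != u and rng.random() < p]
            for u in range(n)}


def independent_cascade(graph, seeds, prob, rng):
    """One cascade run; returns the set of nodes that end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets one chance per inactive neighbor.
            for v in graph[u]:
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, prob, runs, seed=0):
    """Monte Carlo estimate of the average number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, prob, rng))
                for _ in range(runs))
    return total / runs
```

A typical experiment would compare `expected_spread` for different seed sets (for example, high-degree nodes versus random nodes) and report how the estimate changes with the number of runs.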
The final
piece of the work for the course will be a project. You can work on this in
groups of up to 3 students, and it is largely up to you to define the topic and
scope of the project.
The first
step in the project will be a short proposal. This is meant just
to be a brief description of what you are intending for the project
- about 2 pages in length, with a discussion of relevant background work and
tentative plans for how you would proceed. If your project is based on your reaction paper,
then you do not need to repeat things you have said in the reaction paper - it
is enough to describe how you plan to turn the ideas from the
reaction paper into a larger project.
The basic genres of project are the following:
1. An experimental evaluation of an algorithm, model, or measure on an
interesting dataset. The datasets on the course home page suggest some possible
domains in which to think about such experiments, but you can also assemble
your own data. See the SNAP project from Stanford for some interesting
datasets.
2. A simple Facebook, Twitter, or Weibo application.
3. A theoretical project that considers an algorithm, model, or measure in
social networks, in the area of one of the course topics, and derives rigorous
results about it (check out the related papers from ACM STOC, IEEE FOCS, ACM
SODA, ACM EC, and so on).
4. An extended, critical survey of one of the course topics, going into
significant depth and offering a novel perspective on the area.
As with the reaction paper, the project should contain at least some amount of
mathematical analysis and some experimentation on real or synthetic data (this
is recommended even for a survey paper).
The result of the project will typically be a 10-15 page paper in ACM format (a
survey paper will be longer), describing the approach, the results, and the
related work. The references on the course home page serve as examples of what
such papers tend to look like; of course, the overall form of the paper will
depend on the nature of the project.
The final
stage will be a presentation of the projects in class by each group. The exact
schedule for the project presentations will be worked out later in the
semester.
Before the project, every student needs to write code for crawling data from
Twitter, Facebook, or Weibo (using our own code and code published online).
Alternatively, you can use the large-scale social networking data published
online at the following sites:
1. http://snap.stanford.edu/data/index.html (SNAP from Stanford)
2. http://www-personal.umich.edu/~mejn/netdata/
3. http://iv.slis.indiana.edu/db/index.html
4. http://math.nist.gov/~RPozo/complex_datasets.html (NIST)
5. http://www.infochimps.com/tags/social
6. http://www.infochimps.com/tags/socialnetwork
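As a starting point for working with downloaded data, the sketch below parses a plain-text edge list and computes a degree distribution. It assumes the common layout used by many SNAP datasets (comment lines starting with `#`, then one whitespace-separated pair of integer node IDs per line); other sites in the list use different formats, so check each dataset's description. The function names are hypothetical.

```python
from collections import Counter


def read_edge_list(lines):
    """Parse 'src dst' lines, skipping blank lines and '#' comments."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges


def degree_distribution(edges):
    """Map each degree to the number of nodes having that degree.

    Edges are treated as undirected and are not de-duplicated, so a file
    that lists both (u, v) and (v, u) will count that edge twice.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())
```

With a real file, the same functions can be applied to an open file handle, for example `read_edge_list(open(path))`, where `path` points to the downloaded and uncompressed dataset.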
For network analysis and data visualization, we can use the Stanford Network
Analysis Platform (SNAP), a general-purpose, high-performance system for
analysis and manipulation of large networks. See
http://snap.stanford.edu/snap/index.html for instructions.
1. Sentiment analysis: given a tweet, analyze the sentiment; also analyze how
the sentiment evolves (similar to sentiment140 and the Wisdom project by
MicroStrategy).
2. Belief propagation: given a tweet, capture the belief propagation in social
networks.
3. Personality analysis of users: given the data collected, analyze the
personality of a user and how it evolves.
4. Spam detection in large-scale online social networks.
5. Online social network user behavior modeling and capture.
6. Online social network user belief modeling and capture.
7. Given the data collected, analyze the sentiment and feedback of users
towards a certain product (such as cameras in general, or Canon cameras). We
need to separate feedback given after the action (e.g., buying) from feedback
given before it.
8. Privacy protection in large-scale social networks.
9. Non-tracking and advertisement in social networks.
10. Integrating social networking with mobile computing: mobile social
networks.
11. Integrating social networking with crowd computing: crowd-based
location-based services, crowd-based emergency handling and evacuation.
12. How to find useful data in a social network. Here usefulness depends on the
application. For example, the keywords of interest differ across business
models: an anti-terrorism application may be interested in certain keywords and
patterns; Best Buy cares about selling products to customers; Samsung cares
only about its own products.
13. Something similar to the Wisdom project by MicroStrategy?
14. Integrate data from various social network sources (Facebook, Twitter, and
phone records) to de-anonymize users.
15. Integrate data from various social network sources to study a user's
personality, or a user's value for a commercial application.
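For the sentiment-analysis topics in the list above, a very simple baseline is a lexicon-based scorer. The sketch below is only meant to show the shape of such a baseline: the two word lists are tiny, made-up examples, and the scoring rule is one assumption among many. A real project would use a published lexicon or a trained classifier and would have to handle negation, emoticons, and sarcasm.

```python
import re

# Tiny illustrative word lists (assumptions for this sketch only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (pos - neg) / (pos + neg), or 0 if neutral."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no lexicon words found
    return (pos - neg) / (pos + neg)
```

Tracking how the average of this score for a keyword changes over time gives a first, rough view of sentiment evolution.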
David Easley and Jon Kleinberg: Networks, Crowds, and Markets: Reasoning About
a Highly Connected World.
Generative models for social networks
1.
Michael Mitzenmacher, A Brief History of Generative Models for
Power Law and Lognormal Distributions,
2.
Bela Bollobas and Oliver Riordan, The Diameter of a Scale-Free
Random Graph, Combinatorica 24 (1), 2004, pages 5-34.
3.
S. N. Dorogovtsev, A. V. Goltsev, J. F. F. Mendes, Critical
phenomena in complex networks
4.
Hossam Sharara, Lisa Singh, Lise Getoor, Janet Mann: The Dynamics of Actor Loyalty to Groups in
Affiliation Networks. International Conference on Advances in Social
Network Analysis (ASONAM) 2009.
5.
Elena Zheleva, Hossam Sharara, Lise Getoor: Co-evolution of social and affiliation
networks. ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD) 2009.
6.
Silvio Lattanzi, D. Sivakumar: Affiliation networks. ACM
Symposium on Theory of Computing (STOC), 2009.
7.
Jure Leskovec, Christos Faloutsos: Scalable modeling of real graphs using
Kronecker multiplication. International Conference
on Machine Learning (ICML), 2007.
8.
Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws,
Shrinking Diameters and Possible Explanations. ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)
2005.
9.
M. E. J. Newman: Power laws, Pareto distributions and Zipf's law,
Contemporary Physics.
10.
R. Albert and L.A. Barabasi, Statistical Mechanics of Complex
Networks, Rev. Mod. Phys. 74, 47-97 (2002).
11.
B. Bollobas: Mathematical Results in Scale-Free random Graphs.
12.
D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. J. Newman,
and S. H. Strogatz: Are randomly grown graphs really random? Phys.
Rev. E 64, 041902 (2001).
13.
D.J. Watts: Networks, Dynamics, and the Small-World Phenomenon.
American Journal of Sociology, Vol. 105, Number 2, 493-527, 1999
14.
Watts, D. J. and S. H. Strogatz: Collective dynamics of small-world networks. Nature
393:440-42, 1998
Information propagation / collaboration
1.
Michael Mathioudakis, Nick Koudas: Efficient identification of starters and
followers in social media. Extended DataBase
Technology Conference (EDBT), 2009.
2.
Theodoros Lappas, Kun Liu, Evimaria Terzi: Finding a team of experts in social networks. ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)
2009.
3.
Heikki Mannila, Evimaria Terzi: Finding links and initiators: a
graph-reconstruction problem. SIAM Data Mining
Conference (SDM) 2009.
4.
Deepayan Chakrabarti, Yang Wang, Chenxi Wang, Jure Leskovec,
Christos Faloutsos: Epidemic thresholds in real networks. ACM
Transactions on Information and Systems Security, 2008.
5.
Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan: Discovering leaders from community actions. ACM
Conference on Information and Knowledge Management (CIKM) 2008.
6.
Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos
Faloutsos, Jeanne VanBriesen, Natalie S. Glance: Cost-effective outbreak detection in networks. ACM
International Conference on Knowledge Discovery and Data Mining (KDD), 2007.
7.
Hanghang Tong, Christos Faloutsos: Center-piece subgraphs: problem definition and
fast solutions. ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD), 2006.
8.
David Kempe, Jon Kleinberg, Eva Tardos: Maximizing the spread of influence through a
social network. ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD), 2003.
9.
Pedro Domingos, Matthew Richardson: Mining the network value of customers. ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD),
2001.
Privacy and social networks
1. Barnes, S.B., A privacy paradox: Social networking in the United States, First Monday, volume 11, page 11--15, 2006
2. Rosenblum, D., What anyone can know: The privacy risks of social networking sites, IEEE Security & Privacy, vol 5, number 3, 2007
1. Lei Zou,
Lei Chen, M. Tamer Özsu: K-automorphism: a general framework for
privacy preserving network publication. Proceedings of the VLDB
Endowment (PVLDB) 2009.
2. Elena
Zheleva, Lise Getoor: To join or not to join: the illusion of
privacy in social networks with mixed public and private user profiles.
International Conference on World Wide Web (WWW), 2009.
3. Arvind
Narayanan, Vitaly Shmatikov: De-anonymizing Social Networks. IEEE
Symposium on Security and Privacy 2009.
4. Michael
Hay, Chao Li, Gerome Miklau, David Jensen: Accurate Estimation of the Degree Distribution
of Private Networks. IEEE International Conference on Data
Mining (ICDM) 2009.
5. Kun Liu,
Evimaria Terzi: A framework for computing the privacy score of users in online
social networks. IEEE International Conference on Data
Mining (ICDM) 2009.
6. X. Ying
and X. Wu: Graph generation with prescribed feature
constraints. Siam Data Mining Conference (SDM), 2009.
7. Kun Liu,
Evimaria Terzi: Towards identity anonymization on graphs, ACM
International Conference on Management of Data (SIGMOD) 2008.
8. Michael
Hay, Gerome Miklau, David Jensen, Don Towsley, Philipp Weis: Resisting structural identification in
anonymized social networks. Conference on Very Large
Databases (VLDB) 2008.
9. Lars
Backstrom, Cynthia Dwork, Jon Kleinberg: Wherefore art thou r3579x?: anonymized social
networks, hidden patterns and structural steganography,
International Conference on World Wide Web (WWW), 2007.
Influence
propagation and Influencer Determination
3. Kempe, D. and Kleinberg, J. and Tardos, E., Maximizing the spread of influence through a social network, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137--146, 2003.
3. J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, Cascading Behavior in Large Blog
Graphs, arXiv.
4. Hartline, J. and Mirrokni, V. and Sundararajan, M., Optimal marketing strategies over social networks, Proceedings of the 17th international conference on World Wide Web, 2008.
5. Tang, J. and Sun, J. and Wang, C. and Yang, Z., Social influence analysis in large-scale networks, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.
6. Dasgupta, K. and Singh, R. and Viswanathan, B. and Chakraborty, D. and Mukherjea, S. and Nanavati, A.A. and Joshi, A., Social ties and their relevance to churn in mobile telecom networks, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, 2008.
7. Bakshy, E. and Hofman, J.M. and Mason, W.A. and Watts, D.J., Everyone's an influencer: quantifying influence on twitter, Proceedings of the fourth ACM international conference on Web search and data mining, 2011.
8. Chen, W. and Wang, C. and Wang, Y., Scalable influence maximization for prevalent viral marketing in large-scale social networks, Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 2010.
9. Chen, W. and Wang, Y. and Yang, S., Efficient influence maximization in social networks, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009
Community
Detection and Partition in Social Networks
10. Girvan, M. and Newman, M.E.J., Community structure in social and biological networks, Proceedings of the National Academy of Sciences, volume 99, number 12, year 2002, National Acad Sciences.
11. Newman, M.E.J. and Girvan, M., Finding and evaluating community structure in networks, Physical review E, volume 69, number 2, 2004
12. Palla, G. and Derenyi, I. and Farkas, I. and Vicsek, T., Uncovering the overlapping community structure of complex networks in nature and society, Nature, volume 435, number 7043, 2005
13. Clauset, A. and Newman, M.E.J. and Moore, C., Finding community structure in very large networks, Physical review E, 2004
14. Newman, M.E.J., Fast algorithm for detecting community structure in networks, Physical Review E, 2004
15. Leskovec, J. and Lang, K.J. and Dasgupta, A. and Mahoney, M.W., Statistical properties of community structure in large social and information networks, Proceeding of the 17th international conference on World Wide Web, 2008.
16. Mislove, A. and Marcon, M. and Gummadi, K.P. and Druschel, P. and Bhattacharjee, B. ,Measurement and analysis of online social networks, Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007.
17. Jin, E.M. and Girvan, M. and Newman, M.E.J., Structure of growing social networks, Physical review E, Volume 64, 2001
18. Kumar, R. and Novak, J. and Tomkins, A., Structure and evolution of online social networks, Link Mining: Models, Algorithms, and Applications, pages 337--357, 2010
19. McPherson, M. and Smith-Lovin, L. and Cook, J.M., Birds of a feather: Homophily in social networks, Annual review of sociology, Pages 415--444, 2001
20. Newman, M.E.J., Finding community structure in networks using the eigenvectors of matrices, Physical review E, 2006
21. Danon, L. and Diaz-Guilera, A. and Duch, J. and Arenas, A., Comparing community structure identification, Journal of Statistical Mechanics: Theory and Experiment, 2005
22. Backstrom, L. and Huttenlocher, D. and Kleinberg, J. and Lan, X., Group formation in large social networks: membership, growth, and evolution, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44--54, 2006
23. Markines, B. and Cattuto, C. and Menczer, F., Social spam detection, Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, 2009,
24. Wang, A.H., Don't follow me: Spam detection in twitter, Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT), 2010.
25. Benevenuto, F. and Rodrigues, T. and Almeida, V. and Almeida, J. and Zhang, C. and Ross, K., Identifying video spammers in online social networks, Proceedings of the 4th international workshop on Adversarial information retrieval on the web, 2008,
26. Tseng, C.Y. and Chen, M.S., Incremental SVM model for spam detection on dynamic email social networks, International Conference on Computational Science and Engineering, 2009.
27. Heymann, P. and Koutrika, G. and Garcia-Molina, H., Fighting spam on social web sites: A survey of approaches and future challenges, IEEE Internet Computing, 2007.
28. Stringhini, G. and Kruegel, C. and Vigna, G., Detecting spammers on social networks, Proceedings of the 26th Annual Computer Security Applications Conference, 1-9, 2010.
29. Benevenuto, F. and Rodrigues, T. and Almeida, V. and Almeida, J. and Goncalves, M., Detecting spammers and content promoters in online video social networks, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 620--627, 2009
30. Wang, A., Detecting spam bots in online social networking sites: a machine learning approach, Data and Applications Security and Privacy XXIV, pages 335--342, 2010
31. Lee, K. and Caverlee, J. and Webb, S., Uncovering social spammers: social honeypots+ machine learning, Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval,435-442, 2010.
32. Leskovec, J. and Huttenlocher, D. and Kleinberg, J., Predicting positive and negative links in online social networks, Proceedings of the 19th international conference on World wide web, 641--650, 2010.
33. Yang, W.S. and Dia, J.B. and Cheng, H.C. and Lin, H.T., Mining social networks for targeted advertising, Proceedings of the 39th Annual Hawaii International Conference on System Sciences, 2006. HICSS'06. 2006.
34. Trusov, M. and Bucklin, R.E. and Pauwels, K., Effects of word-of-mouth versus traditional marketing: Findings from an internet social networking site, Robert H. Smith School Research Paper No. RHS, 2008.
35. Provost, F. and Dalessandro, B. and Hook, R. and Zhang, X. and Murray, A., Audience selection for on-line brand advertising: privacy-friendly social network targeting, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.
36. Narayanan, A. and Shmatikov, V., De-anonymizing social networks, 30th IEEE Symposium on Security and Privacy, 2009,
37. Domingos, P., Mining social networks for viral marketing, IEEE Intelligent Systems, pages 80--82, 2005.
38. Hartline, J. and Mirrokni, V. and Sundararajan, M., Optimal marketing strategies over social networks, Proceedings of the 17th international conference on World Wide Web, 189--198, 2008.
39. Mossel, E. and Roch, S., On the submodularity of influence in social networks, Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, 128--134, 2007.