Editing
Machine Learning/Kaggle Social Network Contest
Jump to navigation
Jump to search
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Status == {| border="1" cellspacing="0" cellpadding="2" |- ! |Tasks !| Status !| target date !|subpage |- !|Lit review | started | - | [[/lit review| Lit Review]] |- |- !|Load data | started | 11/24 | [[/load data| Load data]] |- !|Describe network | started | 11/24 | [[/Network Description| Network Description]] |- !|Choose problem representation | started | 11/24 | [[/Problem Representation| Problem Representation]] |- !|Generate candidate features | 0% | 11/24 | [[/Features | Features]] |- !|fit to model | 0% | | [[/model | Model]] |- !|Win competition | 0% | | [[/what to do with all the money | Prize Plan]] |} == Official Contest Links == * Overview: http://kaggle.com/socialNetwork * Data Details: http://kaggle.com/socialNetwork?viewtype=data (login required) == Official Data Downloads == * Official Training Data File: http://dl.dropbox.com/u/14895843/social-network-kaggle/social_train.csv * Official Test Data File: http://dl.dropbox.com/u/14895843/social-network-kaggle/social_test.txt * Official Sample Submission File: http://dl.dropbox.com/u/14895843/social-network-kaggle/sample_submission.csv == Key Contest Info == The data has been downloaded using the API of a social network. There are 7.2m contacts/edges of 38k users/nodes. These have been drawn randomly ensuring a certain level of closedness. You are given 7,237,983 contacts/edges from a social network (social_train.zip). The first column is the outbound node and the second column is the inbound node. The ids have been encoded so that the users are anonymous. Ids reach from 1 to 1,133,547. There are 37,689 outbound nodes and 1,133,518 inbound nodes. Most outbound nodes are also inbound nodes so that the total number of unique nodes is 1,133,547. The way the contacts were sampled makes sure that the universe is roughly closed. Note that not every relationship is mutual. The test dataset contains 8,960 edges from 8,960 unique outbound nodes (social_test.csv). Of those 4,480 are true and 4,480 are false edges. You are tasked to predict which are true (1) and which are false (0). You need to supply back a file with outbound node id,inbound node id,[0,1] in each row. This means you can assign a probability of being true to an edge. You are being scored on the AUC. A random model will have an AUC of 0.5, so you need to try to do better than that (ie have a higher AUC). Your entry should conform to the format in sample_submission.csv. You are encouraged to explore techniques which explain the social network/graph. The best entrant should try to explain his approach/method to other users. Donβt despair if your first couple of solutions score low, this is an explorative process. == Our Working Data Dumps == * Adjacency list based from the training data: http://dl.dropbox.com/u/14895843/social-network-kaggle/adj_list.out.csv First column: outbound vertex Remaining columns: list of vertices to which it points Note: Useful when loaded up as a hashtable keyed on outbound vertex returning the list. * Adjacency list of the reversed Graph: http://dl.dropbox.com/u/14895843/social-network-kaggle/reverse_adj_list.out.csv First column: inbound vertex Remaining columns: list of vertices which point to it Note: This is useful if interested in following the edges backwards quickly. This is useful to load as a hashtable keyed on inbound vertex returning the list. * Degree Features for all Nodes: http://dl.dropbox.com/u/14895843/social-network-kaggle/node_degree_features.csv First column: Node Id Second column: Outbound Degree (count of the number of outbound edges from node) Third column: Inbound Degree (count of the number of inbound edges to node) Note: You can think of these as number of followees and followers (respectively). Additionally, note that only the first 32.7k rows have 'followees' == Useful Links == * [[http://arxiv.org/abs/1011.4071 | Supervised Random Walks: Predicting and Recommending Links in Social Networks]] * Matrix Digraph Algs: http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/GraphAlgor/graphIntro.htm * "Strongly Connected Components": http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/GraphAlgor/strongComponent.htm * http://en.wikipedia.org/wiki/Graph_theory#Graph-theoretic_data_structures * http://en.wikipedia.org/wiki/Glossary_of_graph_theory * [[http://books.google.com/books?id=Ww3_bKcz6kgC&lpg=PA67&ots=aFSGYEjA_g&dq=calculate%20degree%20in%20%22directed%20graph%22%20OR%20digraph&pg=PP7#v=onepage&q&f=false | Another Google Book on Social Network Analysis]]
Summary:
Please note that all contributions to Noisebridge are considered to be released under the Creative Commons Attribution-NonCommercial-ShareAlike (see
Noisebridge:Copyrights
for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource.
Do not submit copyrighted work without permission!
To protect the wiki against automated edit spam, we kindly ask you to solve the following CAPTCHA:
Cancel
Editing help
(opens in new window)
Navigation menu
Personal tools
Not logged in
Talk
Contributions
Log in
Request account
Namespaces
Page
Discussion
English
Views
Read
Edit
View history
More
Search
Dig in!
Noisebridge
- Status: MOVED
- Donate
- ABOUT
- Accessibility
- Vision
- Blog
Manual
MANUAL
Visitors
Participation
Community Standards
Channels
Operations
Events
EVENTS
Guilds
GUILDS
- Meta
- Electronics
- Fabrication
- Games
- Music
- Library
- Neuro
- Philosophy
- Funding
- Art
- Crypto
- Documentation/Wiki
Wiki
Recent Changes
Random Page
Help
Categories
(Edit)
Tools
What links here
Related changes
Special pages
Page information