Machine Learning/Kaggle Social Network Contest/Features: Difference between revisions
Revision as of 22:54, 19 November 2010
TODO
- Precisely define the listed features
Possible Features
- Node Features
  - nodeid
  - outdegree
  - indegree
  - local clustering coefficient
  - inbound reciprocation probability (number of reciprocated inbound edges / number of inbound edges)
  - outbound reciprocation probability (number of reciprocated outbound edges / number of outbound edges)
- Edge Features
  - nodetofollowid
  - shortest distance from nodeid to nodetofollowid
  - density? (replacing the struck-out "median path length")
  - is nodetofollowid following nodeid?
  - number of common friends
  - indegree and outdegree of nodetofollowid
- Network Features
  - unweighted random walk score
  - global clustering coefficient
  - Adamic-Adar score
    - see original paper (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf)
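A minimal sketch of the node and edge features in Python, assuming the follow graph is given as (follower, followee) pairs. The helper names (build_graph, friends, node_features, edge_features, shortest_distance) and the dictionary keys are hypothetical. Where the list above leaves a definition open, the sketch makes an assumption: a "friend" is any node linked in either direction, the local clustering coefficient is computed on that undirected view, shortest distance follows edges in the follow direction, and a reciprocation probability is 0.0 when there are no edges to reciprocate. Density is left out because its meaning here is still undecided.

```python
from collections import deque


def build_graph(edges):
    """Successor and predecessor sets from (follower, followee) pairs."""
    out_nbrs, in_nbrs = {}, {}
    for u, v in edges:
        out_nbrs.setdefault(u, set()).add(v)
        in_nbrs.setdefault(v, set()).add(u)
        # Make sure both endpoints appear in both maps.
        out_nbrs.setdefault(v, set())
        in_nbrs.setdefault(u, set())
    return out_nbrs, in_nbrs


def friends(out_nbrs, in_nbrs, node):
    """Assumption: a friend is any node linked to `node` in either direction."""
    return (out_nbrs.get(node, set()) | in_nbrs.get(node, set())) - {node}


def node_features(out_nbrs, in_nbrs, node):
    outs = out_nbrs.get(node, set())
    ins = in_nbrs.get(node, set())
    mutual = outs & ins  # nodes linked to `node` in both directions
    nbrs = friends(out_nbrs, in_nbrs, node)
    k = len(nbrs)
    # Local clustering on the undirected view: linked friend pairs / possible pairs.
    linked = sum(
        1
        for a in nbrs
        for b in nbrs
        if a < b and (b in out_nbrs.get(a, set()) or a in out_nbrs.get(b, set()))
    )
    return {
        "outdegree": len(outs),
        "indegree": len(ins),
        "local_clustering": 2 * linked / (k * (k - 1)) if k > 1 else 0.0,
        # Assumption: 0.0 when there are no edges to reciprocate.
        "inbound_reciprocation": len(mutual) / len(ins) if ins else 0.0,
        "outbound_reciprocation": len(mutual) / len(outs) if outs else 0.0,
    }


def shortest_distance(out_nbrs, source, target):
    """Breadth-first search along follow edges; None if target is unreachable."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in out_nbrs.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def edge_features(out_nbrs, in_nbrs, nodeid, nodetofollowid):
    common = friends(out_nbrs, in_nbrs, nodeid) & friends(out_nbrs, in_nbrs, nodetofollowid)
    return {
        "shortest_distance": shortest_distance(out_nbrs, nodeid, nodetofollowid),
        "target_follows_node": nodeid in out_nbrs.get(nodetofollowid, set()),
        "common_friends": len(common),
        "target_indegree": len(in_nbrs.get(nodetofollowid, set())),
        "target_outdegree": len(out_nbrs.get(nodetofollowid, set())),
    }


# Usage on a tiny made-up graph.
edges = [(1, 2), (2, 1), (2, 3), (3, 1), (1, 4)]
out_nbrs, in_nbrs = build_graph(edges)
print(node_features(out_nbrs, in_nbrs, 1))
print(edge_features(out_nbrs, in_nbrs, 1, 3))
```

The sets-of-neighbors representation keeps every feature a few set operations away; on the full contest graph the local clustering loop is quadratic in a node's degree, so high-degree nodes may need sampling.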
The response variable is the probability that the edge from nodeid to nodetofollowid will be created in the future.
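The network features can be sketched the same way, again as a hypothetical Python sketch over (follower, followee) pairs. Assumptions: the global clustering coefficient is computed as transitivity (3 * triangles / connected triples) on the undirected view; the Adamic-Adar score uses its common link-prediction form, the sum over common neighbors z of 1 / log(degree(z)); and the "unweighted random walk score" is read as a random walk with restart from nodeid, where `restart` and `iters` are arbitrary illustrative values. The last two score a nodeid, nodetofollowid pair directly, so they relate closely to the response variable, though neither is a calibrated probability.

```python
import math


def undirected_neighbors(edges):
    """Undirected view of the follow graph: node -> nodes linked to it either way."""
    nbrs = {}
    for u, v in edges:
        if u != v:  # ignore self-follows
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs


def successors(edges):
    """Directed view: node -> set of nodes it follows."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    return succ


def global_clustering(nbrs):
    """Transitivity on the undirected view: 3 * triangles / connected triples."""
    closed = triples = 0
    for ns in nbrs.values():
        k = len(ns)
        triples += k * (k - 1) // 2  # paths of length 2 centred on this node
        # Linked neighbor pairs = triangles through this node; each triangle is
        # counted once per corner, so the running total is 3 * triangles.
        closed += sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
    return closed / triples if triples else 0.0


def adamic_adar(nbrs, x, y):
    """Sum of 1 / log(degree) over the common neighbors of x and y."""
    if x == y:
        return 0.0
    common = nbrs.get(x, set()) & nbrs.get(y, set())
    # A common neighbor is linked to both x and y, so its degree is >= 2 and log > 0.
    return sum(1.0 / math.log(len(nbrs[z])) for z in common)


def random_walk_score(succ, source, target, restart=0.15, iters=50):
    """Assumed reading of 'unweighted random walk score': random walk with restart.

    Each step, the walker jumps back to `source` with probability `restart`;
    otherwise it moves to one of the current node's followees, chosen uniformly
    (unweighted). A walker on a node that follows no one also jumps back.
    Returns the approximate probability of being at `target`.
    """
    prob = {source: 1.0}
    for _ in range(iters):
        nxt = {source: restart}
        for node, p in prob.items():
            outs = succ.get(node)
            if not outs:
                nxt[source] += (1 - restart) * p
                continue
            share = (1 - restart) * p / len(outs)
            for v in outs:
                nxt[v] = nxt.get(v, 0.0) + share
        prob = nxt
    return prob.get(target, 0.0)


# Usage on a tiny made-up graph: one triangle (1, 2, 3) plus the edge 3 -> 4.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
nbrs = undirected_neighbors(edges)
print(global_clustering(nbrs))  # 0.6
print(adamic_adar(nbrs, 1, 4))  # 1 / log(3), roughly 0.91
print(random_walk_score(successors(edges), 1, 4))
```

Power iteration keeps only the nodes the walker has reached, which suits one walk per nodeid; scoring many candidate pairs for the same nodeid can reuse the full `prob` dictionary instead of calling the function once per pair.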