Machine Learning/Kaggle Social Network Contest/Features: Difference between revisions

From Noisebridge
Line 3 (previous revision):

== Possible Features ==
*nodeid
*nodetofollowid
*median path length
*shortest distance from nodeid to nodetofollowid
*inbound edges
*outbound edges
*clustering coefficient
*reciprocation probability (num of edges returned / num of outbound edges)

The response variable is the probability that the nodeid to nodetofollowid edge will be created in the future

From the Backstrom and Leskovec, for a node s and a potential target c
* Network features
** unweighted random walk score
** Adamic-Adar score
*** see [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf original paper]
** number of common friends
** indegrees and outdegrees of s
*** the indegree is the number of edges coming into node s
*** the outdegree is the number of edges leaving node s
** indegrees and outdegrees of c

Line 3 (revision as of 22:54, 19 November 2010):

== Possible Features ==
*Node Features
**nodeid
**outdegree
**indegree
**local clustering coefficient
**reciprocation of inbound probability (num of edges returned / num of inbound edges)
**reciprocation of outbound probability (num of edges returned / num of outbound edges)

*Edge Features
**nodetofollowid
**shortest distance nodeid to nodetofollowid
**density? (<strike>median path length</strike>)
**is nodetofollowid following nodeid?
**number of common friends
**indegrees & outdegrees of nodetofollowid

* Network features
** unweighted random walk score
** global clustering coefficient
** Adamic-Adar score
*** see [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf original paper]

The response variable is the probability that the nodeid to nodetofollowid edge will be created in the future

Revision as of 22:54, 19 November 2010

TODO

  • Precisely define the listed features

Possible Features

  • Node Features
    • nodeid
    • outdegree
    • indegree
    • local clustering coefficient
    • inbound reciprocation probability (number of edges returned / number of inbound edges)
    • outbound reciprocation probability (number of edges returned / number of outbound edges)
  • Edge Features
    • nodetofollowid
    • shortest distance nodeid to nodetofollowid
    • density? (median path length, struck out)
    • is nodetofollowid following nodeid?
    • number of common friends
    • indegrees & outdegrees of nodetofollowid
  • Network features
    • unweighted random walk score
    • global clustering coefficient
    • Adamic-Adar score
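
Since the TODO asks for precise definitions, here is one possible reading of several of the listed features as a minimal Python sketch. All function names (`build_graph`, `node_features`, `edge_features`, and so on) are hypothetical. The sketch assumes the graph is a list of directed `(nodeid, nodetofollowid)` pairs, that an edge counts as "returned" when the reverse edge also exists, and that "friends" means neighbors in either direction. The last assumption is also used for the local clustering coefficient and the Adamic-Adar score and may not be what the group intended. The random walk score and the global clustering coefficient are left out.

```python
import math
from collections import defaultdict, deque
from itertools import combinations

def build_graph(edges):
    """Return (out_nbrs, in_nbrs) adjacency sets for a directed edge list."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    return out_nbrs, in_nbrs

def neighbors(node, out_nbrs, in_nbrs):
    """Assumption: a 'friend' is any node linked to `node` in either direction."""
    return out_nbrs.get(node, set()) | in_nbrs.get(node, set())

def local_clustering(node, out_nbrs, in_nbrs):
    """Fraction of neighbor pairs that are themselves linked (direction ignored)."""
    nbrs = neighbors(node, out_nbrs, in_nbrs) - {node}
    if len(nbrs) < 2:
        return 0.0
    linked = sum(
        1 for a, b in combinations(nbrs, 2)
        if b in out_nbrs.get(a, set()) or a in out_nbrs.get(b, set())
    )
    return linked / (len(nbrs) * (len(nbrs) - 1) / 2)

def node_features(node, out_nbrs, in_nbrs):
    outs, ins = out_nbrs.get(node, set()), in_nbrs.get(node, set())
    returned = len(outs & ins)  # edges that exist in both directions
    return {
        "outdegree": len(outs),
        "indegree": len(ins),
        "local_clustering": local_clustering(node, out_nbrs, in_nbrs),
        "recip_inbound": returned / len(ins) if ins else 0.0,
        "recip_outbound": returned / len(outs) if outs else 0.0,
    }

def shortest_distance(src, dst, out_nbrs):
    """Breadth-first search along directed edges; None if dst is unreachable."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        cur, dist = queue.popleft()
        for nxt in out_nbrs.get(cur, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def edge_features(src, dst, out_nbrs, in_nbrs):
    common = (neighbors(src, out_nbrs, in_nbrs)
              & neighbors(dst, out_nbrs, in_nbrs)) - {src, dst}
    # Adamic-Adar: sum over common friends z of 1 / log(degree(z)).
    adamic_adar = 0.0
    for z in common:
        degree = len(neighbors(z, out_nbrs, in_nbrs))
        if degree > 1:  # guard against log(1) == 0
            adamic_adar += 1.0 / math.log(degree)
    return {
        "shortest_distance": shortest_distance(src, dst, out_nbrs),
        "is_followed_back": src in out_nbrs.get(dst, set()),
        "common_friends": len(common),
        "adamic_adar": adamic_adar,
        "target_indegree": len(in_nbrs.get(dst, set())),
        "target_outdegree": len(out_nbrs.get(dst, set())),
    }
```

For example, with `out_nbrs, in_nbrs = build_graph([(1, 2), (2, 1), (1, 3), (3, 2), (4, 1)])`, calling `edge_features(4, 2, out_nbrs, in_nbrs)` gives the feature values for the candidate edge from node 4 to node 2.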

The response variable is the probability that the edge from nodeid to nodetofollowid will be created in the future.
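
To fit a model for this response, labeled examples are needed. One common way to get them, which is an assumption here and not something stated on this page, is to hide a fraction of the known edges and treat them as "future" edges (label 1), then sample the same number of node pairs that have no edge (label 0). The sketch below illustrates that idea. The function name `make_examples` and its parameters are hypothetical.

```python
import random

def make_examples(edges, holdout_fraction=0.2, seed=0):
    """Split edges into an observed graph plus labeled (edge, label) examples.

    Hidden edges stand in for edges created in the future (label 1).
    Sampled non-edges serve as negatives (label 0).
    """
    rng = random.Random(seed)
    edge_set = set(edges)
    shuffled = sorted(edge_set)
    rng.shuffle(shuffled)

    n_hidden = max(1, int(len(shuffled) * holdout_fraction))
    hidden, observed = shuffled[:n_hidden], shuffled[n_hidden:]

    nodes = sorted({node for edge in edge_set for node in edge})
    # Upper bound on available non-edges, so the loop below always terminates.
    max_negatives = len(nodes) * (len(nodes) - 1) - len(edge_set)
    target = max(0, min(len(hidden), max_negatives))

    negatives = set()
    while len(negatives) < target:
        src, dst = rng.choice(nodes), rng.choice(nodes)
        if src != dst and (src, dst) not in edge_set:
            negatives.add((src, dst))

    examples = [(e, 1) for e in hidden] + [(e, 0) for e in sorted(negatives)]
    return observed, examples
```

Features would then be computed on the observed edges only, so that information about the hidden edges does not leak into the inputs.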