Open Academic Graph is announced!

This data set is generated by linking two large academic graphs, Microsoft Academic Graph (MAG) and AMiner, and is intended for research purposes only. This version includes 166,192,182 papers from MAG and 154,771,162 papers from AMiner. We generated 64,639,608 linking (matching) relations between the two graphs. More linking results, such as author links, will be published in the future. The data set can be used as a unified large academic graph for studying citation networks, paper content, and related topics, and can also be used to study the integration of multiple academic graphs.

The overall data set includes three parts, which are described in the table below:

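| Data Set                     | Download Link | #Paper      | Total Size | Date       |
|------------------------------|---------------|-------------|------------|------------|
| Linking relations (matching) |               | 64,639,608  | 1.6GB      | 2017-06-22 |
| MAG papers                   |               | 166,192,182 | 104GB      | 2017-06-09 |
| AMiner papers                |               | 154,771,162 | 39GB       | 2017-03-22 |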

Method

We obtain the linking relations between the two publication graphs in two steps:

  1. Use the Microsoft Graph Search API to query each AMiner paper’s title and obtain candidate matching papers for that AMiner paper.
  2. Match two papers if they have
    • very similar titles,
    • the same number of authors,
    • similar author names, and
    • the same publication year.
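
The matching rule in step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: the thresholds, the dictionary keys (title, authors, year), the position-by-position author comparison, and the use of Python's difflib for string similarity are all hypothetical choices, since the exact similarity measures are not specified here.

```python
from difflib import SequenceMatcher

# Illustrative thresholds (assumptions); the actual values are not published here.
TITLE_THRESHOLD = 0.9
NAME_THRESHOLD = 0.8


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(paper_a, paper_b):
    """Apply the four criteria to two papers given as dicts with the
    hypothetical keys 'title', 'authors' (list of names) and 'year'."""
    # Same publication year.
    if paper_a["year"] != paper_b["year"]:
        return False
    # Same number of authors.
    if len(paper_a["authors"]) != len(paper_b["authors"]):
        return False
    # Very similar titles.
    if similarity(paper_a["title"], paper_b["title"]) < TITLE_THRESHOLD:
        return False
    # Similar author names, compared position by position
    # (assumes both author lists are in byline order).
    return all(
        similarity(x, y) >= NAME_THRESHOLD
        for x, y in zip(paper_a["authors"], paper_b["authors"])
    )
```

Checking the cheap exact criteria (year, author count) before the string comparisons keeps the per-candidate cost low, which matters when every AMiner paper has several candidates to test.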

Evaluation

We randomly sampled 4,100 linking pairs and evaluated the matching accuracy. Of these, 4,029 pairs were true matches, giving a matching accuracy of 98.27%.
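
The accuracy figure follows directly from the two counts above; a minimal check (the variable names are ours):

```python
sampled_pairs = 4100   # linking pairs drawn at random
correct_pairs = 4029   # pairs judged to be true matches

accuracy = correct_pairs / sampled_pairs
print(f"{accuracy:.2%}")  # prints "98.27%"
```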