bibliometRics is an R package for bibliometric analysis of scientific production. It can be used for analysing the production of a single author, a working team, department, institute, etcetera.
This document describes the main functionalities of the package and how to do a bibliometric analysis with
bibliometRics, including producing automatic PDF reports via knitr.
First of all, make sure you have installed the package by sourcing it (note that this is a work in progress, so no ‘official’ package has been created yet). You can download the source code from GitHub.
Obtaining bibliometric data
So far, the only source of bibliometric information accepted by
bibliometRics is the ISI Web of Knowledge, although other sources could be added. It’s easy to select the publications of a given author on the ISI-WoK if you know their author ID. You can also make a selection of articles (for example, for all the members of a research group) and use the ‘add to marked list’ option to create a custom publication list. Once you are happy with your selection, click on the ‘create citation report’ button, then select the ‘save to text file’ option and specify the full range of records. This will generate a text file and download it to your computer.
You may want to edit the AUTHOR field in the text file, although this is not strictly necessary for doing the analysis.
One example of bibliometric information that can be downloaded from ISI-WoK can be found in the file sbegueria.txt, which contains data of papers I have authored, as of January 2015.
The core function for reading ISI-WoK data is, not surprisingly,
read.isiwok. Its only argument is the name of the data file to read:
bib <- read.isiwok('sbegueria.txt')
str(bib)
The result is a list with three elements:
pubs. The latter is a data.frame with the publications in rows and the data referring to each publication in columns, including the number of citations received, year by year.
Analyzing bibliometric data
The main function for analyzing these data is
bibliometric. It takes an object with the bibliometric data resulting from a call to
read.isiwok and returns a data.frame with a number of bibliometric indices.
##                 name  ini years pubs lead pubs_year hin hin_year gin
## 1 BEGUERIA, SANTIAGO 2000    15   80   16      5.33  28     1.87  48
##   gin_year cit_tot cit_year cit_art ifact2 ifact5 i10 i25 i50 cit_max
## 1      3.2    2587   172.47   32.34   5.23   9.26  59  37  13     376
##   pubs09 pubs09_lead pubs099 iscore iscore_lead
## 1     32           8       8   5631        1089
|name||Name of the author, group, department, etc.|
|ini||Initial year of the publication record|
|span||Time span (years) of the publication record|
|pubs||Total number of publications|
|lead||Total number of publications, as the lead (first) author|
|pubs_year||Mean number of publications per year|
|hin_year||h‑index per year|
|gin_year||g‑index per year|
|cit_tot||Total number of citations|
|cit_year||Mean number of citations per year|
|cit_art||Mean number of citations per article|
|ifact2||Impact factor, computed over the last two years|
|ifact5||Impact factor, computed over the last five years|
|i10||Number of publications with 10 or more citations|
|i25||Number of publications with 25 or more citations|
|i50||Number of publications with 50 or more citations|
|cit_max||Number of citations of the most cited publication|
|pubs09||Number of publications over the 90th percentile in its discipline|
|pubs09_lead||Number of publications over the 90th percentile in its discipline, as lead author|
|pubs099||Number of publications over the 99th percentile in its discipline|
|iscore_lead||i‑score, as lead author|
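As a quick sanity check, the per-year and per-article indices can be reproduced from the totals in the example output above. This is only a sketch: the `i_n` helper and the `cites` vector are made up for illustration and are not part of the package, which may compute these indices differently.

```r
# Totals taken from the example output for BEGUERIA, SANTIAGO
years   <- 15
pubs    <- 80
hin     <- 28
gin     <- 48
cit_tot <- 2587

# Per-year and per-article ratios; compare with the columns of the same name
ratios <- c(
  pubs_year = pubs / years,
  hin_year  = hin / years,
  gin_year  = gin / years,
  cit_year  = cit_tot / years,
  cit_art   = cit_tot / pubs
)
print(round(ratios, 2))

# i-N indices: number of publications with at least n citations.
# `i_n` and `cites` are hypothetical, for illustration only.
i_n <- function(cites, n) sum(cites >= n)
cites <- c(60, 30, 12, 9, 0)
i10 <- i_n(cites, 10)
i25 <- i_n(cites, 25)
i50 <- i_n(cites, 50)
```

The ratios agree with the printed table up to rounding, which suggests that the per-year indices are simply the totals divided by the time span of the record.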
There are functions for computing some of the indices, such as the Hirsch and the Egghe indices. These can be computed for the whole period analyzed, or up to a given year.
## [1] 28
## [1] 48
## [1] 13
## [1] 44
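For readers unfamiliar with these two indices, here is a minimal sketch of how they can be computed from citation counts, using the standard definitions. The function names (`h_index`, `g_index`, `index_upto`) and the small citation matrix are hypothetical and are not the package's own functions or data; I only assume, as described above, that citations are available year by year for each publication.

```r
# h-index: largest h such that h publications have at least h citations each
h_index <- function(cites) {
  cites <- sort(cites, decreasing = TRUE)
  sum(cites >= seq_along(cites))
}

# g-index: largest g such that the top g publications together
# have at least g^2 citations (capped at the number of publications)
g_index <- function(cites) {
  cites <- sort(cites, decreasing = TRUE)
  sum(cumsum(cites) >= seq_along(cites)^2)
}

# Hypothetical data: one row per publication, one column per year
cites_by_year <- matrix(
  c(3, 1, 0, 0,   # citations received in 2012
    4, 2, 1, 0,   # citations received in 2013
    5, 3, 2, 1),  # citations received in 2014
  nrow = 4,
  dimnames = list(NULL, c("2012", "2013", "2014"))
)

# Index computed with the citations received up to (and including) a given year
index_upto <- function(m, year, index = h_index) {
  keep <- as.integer(colnames(m)) <= year
  index(rowSums(m[, keep, drop = FALSE]))
}

h_index(rowSums(cites_by_year))            # whole period: 3
g_index(rowSums(cites_by_year))            # whole period: 4
index_upto(cites_by_year, 2013)            # h-index up to 2013: 2
index_upto(cites_by_year, 2013, g_index)   # g-index up to 2013: 3
```

Counting the `TRUE` values works because, once the citations are sorted in decreasing order, both conditions hold for the first few ranks and then fail for all the remaining ones.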
There is also a function for ranking the publications in quantiles according to the ISI-WoK Science Indicators tables. These need to be loaded as a table named quantiles.csv, located in the same directory as the data.
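To give an idea of what such a ranking involves, here is a rough sketch. Everything in it is an assumption made for illustration: the layout of the `thresholds` table (minimum citations needed to reach each percentile, by publication year), the `rank_pub` helper and the example data. The real quantiles.csv and the package's ranking function may well be organised differently.

```r
# Hypothetical thresholds: minimum number of citations needed to reach
# each percentile, by publication year
thresholds <- data.frame(
  year = c(2010, 2011),
  p50  = c(5, 4),
  p90  = c(30, 25),
  p99  = c(120, 100)
)

# Hypothetical publications with their total citations
pubs <- data.frame(
  year  = c(2010, 2010, 2011, 2011),
  cites = c(3, 45, 25, 150)
)

# Classify one publication against the thresholds of its year
rank_pub <- function(year, cites, thresholds) {
  th <- thresholds[thresholds$year == year, c("p50", "p90", "p99")]
  labels <- c("<p50", "p50-p90", "p90-p99", ">=p99")
  # findInterval() counts how many thresholds the citation count has reached
  labels[findInterval(cites, unlist(th)) + 1]
}

pubs$quantile <- mapply(rank_pub, pubs$year, pubs$cites,
                        MoreArgs = list(thresholds = thresholds))

# Counts in the spirit of the pubs09 and pubs099 indices
n_p90 <- sum(pubs$quantile %in% c("p90-p99", ">=p99"))
n_p99 <- sum(pubs$quantile == ">=p99")
```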
A specific plotting function makes it easy to summarize most of this information in graphic form. The plots also show the temporal evolution of the bibliometric indices, which may be useful for evaluating the scientific career of the author or group under evaluation.
The first plot reflects the productivity (quantity) of the author, as well as their impact (citations received). It shows the cumulative number of publications, with distinction between the publications as lead author (black bars) and those as co-author (white bars). The plot also shows the number of citations for all the publications (white circles) and for those as lead author (black circles). There is a fixed ratio of 1⁄10 between the left (publications) and the right (citations) axes, allowing for easy comparison between different authors, groups, etc.
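A plot of this kind can be approximated in base R graphics. The following is a sketch with made-up data, not the package's plotting function. In particular, I assume here that the citation curves are cumulative, like the bars, and the fixed 1⁄10 ratio is obtained by dividing the citations by 10 before drawing them and relabelling the right axis.

```r
# Hypothetical yearly data
years       <- 2010:2014
pubs_total  <- c(2, 3, 5, 4, 6)       # publications per year
pubs_lead   <- c(1, 1, 2, 1, 2)       # of which as lead author
cites_total <- c(5, 20, 40, 60, 90)   # citations per year, all publications
cites_lead  <- c(2, 8, 15, 20, 30)    # citations per year, lead-author publications

# Cumulative series; rows of cum_pubs are stacked in the bar plot
cum_pubs <- rbind(lead     = cumsum(pubs_lead),
                  coauthor = cumsum(pubs_total - pubs_lead))
cum_cites_total <- cumsum(cites_total)
cum_cites_lead  <- cumsum(cites_lead)

# Fixed ratio between the left (publications) and right (citations) axes
ratio <- 10
ymax  <- max(colSums(cum_pubs), cum_cites_total / ratio)

op <- par(mar = c(4, 4, 1, 4))
mid <- barplot(cum_pubs, names.arg = years, col = c("black", "white"),
               ylim = c(0, ymax), xlab = "Year",
               ylab = "Publications (cumulative)")
# Citations, rescaled to the left axis: white = all, black = lead author
points(mid, cum_cites_total / ratio, type = "b", pch = 21, bg = "white")
points(mid, cum_cites_lead / ratio, type = "b", pch = 21, bg = "black")
# Right axis labelled in citation units
ticks <- pretty(c(0, ymax))
axis(4, at = ticks, labels = ticks * ratio)
mtext("Citations (cumulative)", side = 4, line = 2.5)
par(op)
```

Because the ratio is fixed rather than fitted to the data, two such plots for different authors can be compared directly: a citation curve well above the bars means more than 10 citations per publication on average.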
The second plot focuses on the impact of the publications. It shows the annual evolution of Hirsch’s h‑index (black circles) and Egghe’s g‑index (black circles), with a fixed ratio of ½ between them. The evolution of the h‑index is compared with a 1:1 evolution (dashed line), since it is usually assumed that the h‑index grows, on average, at a rate of 1 per year.
The third plot attempts to evaluate the excellence of the publications. It shows the number of publications classified by quantiles, according to the ISI-WoK Science Indicators per discipline. For each quantile, the total number of publications is shown (white bars), as well as the publications as lead author (black bars).
#format_pub(bib$pubs, au=bib$au)
apply(cbind(bib$pubs,rank(bib$pubs)),1,format_pub,au=bib$au)
Automated bibliometric reports
The package also contains a template Rtex file, useful for creating automated reports.
library(knitr)
infile <- 'sbegueria.txt'
outfile <- 'sbegueria.Rtex'
# Base name shared by the input file and all generated files
base <- sub('\\.txt$', '', infile)
# Create custom .Rtex file from the template and knit it
x <- readLines('template.Rtex')
x <- gsub('FILENAME', infile, x, fixed = TRUE)
writeLines(x, outfile)
knit(outfile)
# Compile the resulting .tex file and create a .pdf from it
# (the path to pdflatex is system-specific)
system(paste0('/Library/TeX/texbin/pdflatex ', base, '.tex'))
# Remove all intermediate files, keeping only the .pdf and the .txt
kk <- list.files('.', pattern = base)
file.remove(kk[!grepl('\\.(pdf|txt)$', kk)])
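The placeholder substitution step can be tried on its own, without knitr or LaTeX installed. The sketch below uses a tiny stand-in template written to a temporary file. Its contents and the `make_report_source` helper are hypothetical; the real template.Rtex shipped with the package is certainly longer.

```r
# A stand-in for template.Rtex (hypothetical contents, in knitr's Rtex syntax)
template <- tempfile(fileext = ".Rtex")
writeLines(c("\\documentclass{article}",
             "\\begin{document}",
             "%% begin.rcode",
             "% bib <- read.isiwok('FILENAME')",
             "%% end.rcode",
             "\\end{document}"), template)

# Replace the FILENAME placeholder and write a custom .Rtex file
make_report_source <- function(template, infile, outfile) {
  x <- readLines(template)
  # fixed = TRUE treats the placeholder as a literal string, not a regex
  x <- gsub("FILENAME", infile, x, fixed = TRUE)
  writeLines(x, outfile)
  invisible(outfile)
}

outfile <- tempfile(fileext = ".Rtex")
make_report_source(template, "sbegueria.txt", outfile)
result <- readLines(outfile)
print(result)
```

Wrapping the step in a function makes it straightforward to loop over several data files, for example one per member of a research group, and knit one report for each.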
(I have generated this post automatically with R Markdown and the RWordPress package. I’m not totally satisfied with the rendering, so I need to work on it a bit.)