bibliometRics: an R package for bibliometric analysis

bibliometRics is an R package for bibliometric analysis of scientific production. It can be used for analysing the production of a single author, a working team, department, institute, etcetera.

This document describes the main functions in the package and shows how to do a bibliometric analysis with bibliometRics, including producing automatic PDF reports via knitr.

First of all, make sure you have loaded the package by sourcing it (note that this is a work in progress, so no ‘official’ package has been created yet). You can download the source code from GitHub.

source('bibliometRics.R')

Obtaining bibliometric data

So far the only source of bibliometric information accepted by bibliometRics is the ISI Web of Knowledge (ISI-WoK), but other sources can be added. It’s easy to select the publications of a given author on the ISI-WoK if you know their author ID. You can also make a selection of articles (for example, for all the members of a research group) and use the ‘add to marked list’ option to create a custom publication list. Once you are happy with your selection, click on the ‘create citation report’ button, then select the ‘save to text file’ option and specify the full range of records. This will generate a text file and download it to your computer.
You may want to edit the AUTHOR field on the text file, although this is not strictly necessary for doing the analysis.

One example of bibliometric information that can be downloaded from ISI-WoK can be found in the file sbegueria.txt, which contains data of papers I have authored, as of January 2015.

The core function for reading ISI-WoK data is, not surprisingly, read.isiwok. Its only argument is the name of the data file to read:

bib <- read.isiwok('sbegueria.txt')
str(bib)

The result is a list with three elements: author, reference, and pubs. The last of these is a data.frame with the publications in rows and the data for each publication in columns, including the number of citations received, year by year.

Analyzing bibliometric data

The main function for analyzing these data is bibliometric. It takes the object returned by a call to read.isiwok and returns a data.frame with a number of bibliometric indices.

bibliometric(bib)
##                  name  ini years pubs lead pubs_year hin hin_year gin
## 1 BEGUERIA,  SANTIAGO 2000    15   80   16      5.33  28     1.87  48
##   gin_year cit_tot cit_year cit_art ifact2 ifact5 i10 i25 i50 cit_max
## 1      3.2    2587   172.47   32.34   5.23   9.26  59  37  13     376
##   pubs09 pubs09_lead pubs099 iscore iscore_lead
## 1     32           8       8   5631        1089

These are:

Label Meaning
name Name of the author, group, department, etc.
ini Initial year of the publication record
years Time span (years) of the publication record
pubs Total number of publications
lead Total number of publications, as the lead (first) author
pubs_year Mean number of publications per year
hin Hirsch’s h-index
hin_year h-index per year
gin Egghe’s g-index
gin_year g-index per year
cit_tot Total number of citations
cit_year Mean number of citations per year
cit_art Mean number of citations per article
ifact2 Impact factor, computed over the last two years
ifact5 Impact factor, computed over the last five years
i10 Number of publications with 10 or more citations
i25 Number of publications with 25 or more citations
i50 Number of publications with 50 or more citations
cit_max Number of citations of the most cited publication
pubs09 Number of publications over the 90th percentile in its discipline
pubs09_lead Number of publications over the 90th percentile in its discipline, as lead author
pubs099 Number of publications over the 99th percentile in its discipline
iscore i-score
iscore_lead i-score, as lead author
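
To make a few of these definitions concrete, here is a minimal sketch that computes some of the simpler indices from a plain vector of citation counts. The vector cites and its values are made up for illustration, and the code is not taken from the package; it only mirrors the definitions in the table above.

```r
# Hypothetical citation counts, one value per publication (not real data)
cites <- c(120, 54, 33, 25, 12, 10, 9, 4, 1, 0)

pubs    <- length(cites)      # total number of publications
cit_tot <- sum(cites)         # total number of citations
cit_art <- cit_tot / pubs     # mean number of citations per article
cit_max <- max(cites)         # citations of the most cited publication
i10     <- sum(cites >= 10)   # publications with 10 or more citations
i25     <- sum(cites >= 25)   # publications with 25 or more citations
i50     <- sum(cites >= 50)   # publications with 50 or more citations

print(c(pubs = pubs, cit_tot = cit_tot, cit_art = cit_art,
        cit_max = cit_max, i10 = i10, i25 = i25, i50 = i50))
```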

There are functions for computing some of the indices, such as the Hirsch and the Egghe indices. These can be computed for the whole period analyzed, or up to a given year.

hirsch(bib)
## [1] 28
egghe(bib)
## [1] 48
hirsch(bib, 2010)
## [1] 13
egghe(bib, 2010)
## [1] 44
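
For readers unfamiliar with these two indices, the following sketch applies their usual definitions to a plain vector of citation counts. The function names h_index and g_index and the example vector are hypothetical and are not the package's own code; the g-index here is capped at the number of publications, which is one common convention.

```r
# h-index: the largest h such that h publications have at least h citations each
h_index <- function(cites) {
  s <- sort(cites, decreasing = TRUE)
  sum(s >= seq_along(s))
}

# g-index: the largest g such that the g most cited publications
# together have at least g^2 citations
g_index <- function(cites) {
  s <- sort(cites, decreasing = TRUE)
  sum(cumsum(s) >= seq_along(s)^2)
}

# Hypothetical citation counts (not real data)
cites <- c(10, 8, 5, 4, 3, 0)
h_index(cites)
g_index(cites)
```

With this vector, four publications have at least four citations each, and the five most cited ones add up to 30 citations, which is at least 5^2 = 25 but less than 6^2 = 36.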

There is also a function for ranking the publications in quantiles according to the ISI-WoK Science Indicators tables. These need to be loaded as a table named quantiles.csv, located in the same directory as the data.

rank(bib)
table(rank(bib))
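
The idea behind the ranking can be sketched as follows, assuming a set of citation thresholds for a single discipline and publication year. The threshold values, the class labels and the function rank_citations are all invented for this example; the actual layout of quantiles.csv and the package's rank function may differ.

```r
# Hypothetical thresholds: minimum number of citations needed to reach
# each percentile class in one discipline and publication year
thresholds <- c('0.5' = 5, '0.8' = 15, '0.9' = 30, '0.99' = 100)

# Assign each publication to the highest class whose threshold it reaches
rank_citations <- function(cites, thresholds) {
  thr <- sort(thresholds)
  idx <- findInterval(cites, thr)    # 0 means below the lowest threshold
  labels <- c('<0.5', names(thr))
  factor(labels[idx + 1], levels = labels)
}

# Hypothetical citation counts (not real data)
cites <- c(120, 54, 33, 25, 12, 10, 9, 4, 1, 0)
classes <- rank_citations(cites, thresholds)
table(classes)
```

In practice the thresholds depend on both the discipline and the year of publication, so a real implementation would need to look them up for each paper.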

A dedicated plotting function makes it easy to summarize most of this information in graphic form. The plots also show the temporal evolution of the bibliometric indices, which may be useful for evaluating the scientific career of the author or group under evaluation.

biblioplot(bib)

[Figure: output of biblioplot(bib)]

The first plot reflects the productivity (quantity) of the author, as well as their impact (citations received). It shows the cumulative number of publications, distinguishing between the publications as lead author (black bars) and those as co-author (white bars). The plot also shows the number of citations for all the publications (white circles) and for those as lead author (black circles). There is a fixed ratio of 1/10 between the left (publications) and the right (citations) axes, allowing for easy comparison between different authors, groups, etc.

The second plot focuses on the impact of the publications. It shows the annual evolution of Hirsch’s h-index (black circles) and Egghe’s g-index (black circles), with a fixed ratio of ½ between them. The evolution of the h-index is compared with a 1:1 line (dashed), since it is usually assumed that the h-index grows, on average, at a rate of 1 per year.

The third plot attempts to evaluate the excellence of the publications. It shows the number of publications classified by quantiles, according to the ISI-WoK Science Indicators per discipline. For each quantile, the total number of publications is shown (white bars), as well as the publications as lead author (black bars).
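
The comparison between the h-index and the 1:1 reference used in the second plot can be illustrated with a small sketch. It assumes a matrix of citations received per publication and year, filled here with invented numbers; h_index is a hypothetical helper, and this is not how biblioplot is implemented.

```r
# h-index of a vector of citation counts
h_index <- function(cites) {
  s <- sort(cites, decreasing = TRUE)
  sum(s >= seq_along(s))
}

# Hypothetical citations received by each publication (rows) in each year (columns)
cit <- matrix(c(1, 3, 5, 6,
                0, 2, 4, 4,
                0, 0, 3, 5,
                0, 0, 1, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0('pub', 1:4), 2011:2014))

# Citations accumulated by each publication up to each year
cum_cit <- t(apply(cit, 1, cumsum))

# h-index at the end of each year, and the 1:1 reference (one unit per year)
h_by_year <- apply(cum_cit, 2, h_index)
reference <- seq_along(h_by_year)

rbind(h = h_by_year, reference = reference)
```

In this made-up record the h-index keeps pace with the reference for the first three years and falls one unit behind in the fourth.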

#format_pub(bib$pubs[1], au=bib$au)
apply(cbind(bib$pubs,rank(bib$pubs)),1,format_pub,au=bib$au)

Automated bibliometric reports

The package also contains a template Rtex file, useful for creating automated reports.

library(knitr)

infile <- 'sbegueria.txt'
base <- sub('\\.txt$', '', infile)   # file name without its extension
outfile <- paste0(base, '.Rtex')

# Create custom .Rtex file from the template and knit it
x <- readLines('template.Rtex')
x <- gsub('FILENAME', infile, x, fixed = TRUE)
writeLines(x, outfile)
knit(outfile)

# Compile the resulting .tex file and create a .pdf from it
# (the path to pdflatex is system-specific; adjust it as needed)
system(paste0('/Library/TeX/texbin/pdflatex ', base, '.tex'))

# Remove all intermediate files, keeping only the .pdf and .txt files
kk <- list.files('.', base)
file.remove(kk[!grepl('\\.(pdf|txt)$', kk)])
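
The path to pdflatex used above is specific to one machine. A possible, more portable variant looks the program up on the search path with Sys.which and only runs it if it is found. This is a sketch under that assumption, not part of the package, and the helper tex_name is a hypothetical name.

```r
# Derive the name of the .tex file produced by knitr from the data file name
tex_name <- function(infile) {
  sub('\\.txt$', '.tex', infile)
}

texfile <- tex_name('sbegueria.txt')

# Look for pdflatex on the search path; the result is '' if it is not found
pdflatex <- Sys.which('pdflatex')

# Compile only if pdflatex is available and the .tex file exists
if (nzchar(pdflatex) && file.exists(texfile)) {
  system2(pdflatex, texfile)
} else {
  message('Skipping compilation: pdflatex or ', texfile, ' not found')
}
```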

(I have generated this post automatically with R Markdown and the RWordPress package. I’m not totally satisfied with the rendering; I need to work on it a bit.)
