R Packages and R Code Developed by Our Research Team

R code for CAMDA 2019 Metagenomic Forensic Challenge

The composition of microbial communities can be location-specific, and differences in taxon abundance between locations can help reveal city-specific signatures and accurately predict where a sample originated. In this study, whole genome shotgun (WGS) metagenomics data from samples across 16 cities around the world, and from another 8 cities, were provided as the main and mystery datasets, respectively, as part of the CAMDA 2019 MetaSUB “Forensic Challenge”. Feature selection, normalization, three machine learning methods, PCoA (Principal Coordinates Analysis), and ANCOM (Analysis of Composition of Microbiomes) were applied to both the main and mystery datasets.
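
As a rough illustration of the PCoA step, the sketch below runs a principal coordinates analysis in base R on simulated taxon counts. It is not the code from the challenge submission: the data, the two city labels, the relative-abundance normalization, and the choice of Bray-Curtis dissimilarity are all assumptions made for the example.

```r
# Simulated (hypothetical) taxon counts: 12 samples x 30 taxa from two cities.
set.seed(1)
n_taxa <- 30
city   <- rep(c("CityA", "CityB"), each = 6)
rate_a <- runif(n_taxa, 1, 20)
rate_b <- rate_a * exp(rnorm(n_taxa, 0, 1))  # shifted abundance profile
rates  <- rbind(matrix(rep(rate_a, 6), nrow = 6, byrow = TRUE),
                matrix(rep(rate_b, 6), nrow = 6, byrow = TRUE))
counts <- matrix(rpois(length(rates), rates), nrow = 12)

# Normalize each sample (row) to relative abundances.
rel <- counts / rowSums(counts)

# For rows that sum to 1, Bray-Curtis dissimilarity is half the Manhattan distance.
bc <- dist(rel, method = "manhattan") / 2

# PCoA is classical multidimensional scaling of the dissimilarity matrix.
pcoa <- cmdscale(bc, k = 2, eig = TRUE)

# Share of the positive eigenvalue mass captured by the first two axes.
explained <- pcoa$eig[1:2] / sum(pcoa$eig[pcoa$eig > 0])

coords <- data.frame(city = city, PCo1 = pcoa$points[, 1], PCo2 = pcoa$points[, 2])
print(head(coords))
print(round(explained, 3))
```

Plotting `PCo1` against `PCo2` and coloring by `city` would show whether samples separate by location on the leading axes.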

scREhurdle: Identifying Differentially Expressed Genes with Single Cell RNA-Seq Data

scREhurdle is an R package for detecting differentially expressed genes in discrete single-cell RNA sequencing data. The package interfaces with rstan to fit a mixed-effects hurdle model to zero-inflated count data.

  • Sekula, M., Gaskins, J., Datta, S. (2019) Detection of differentially expressed genes in discrete single cell RNA. Accepted in Biometrics, April 22, 2019. DOI: 10.1111/biom.13074

RankAggreg: Weighted Rank Aggregation of Cluster Validation Measures

RankAggreg aggregates ordered lists based on their ranks, using one of several algorithms: a Cross-Entropy Monte Carlo algorithm, a Genetic algorithm, or a brute force algorithm (for small problems).

  • Pihur, V., Datta, S. and Datta, S. RankAggreg: an R package for weighted rank aggregation. BMC Bioinformatics (2009).
  • Pihur, V., Datta, S. and Datta, S. Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach. Bioinformatics, 23, 1607-1615 (2007).
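
To make the brute force idea concrete, here is a minimal base R sketch that enumerates every candidate ordering and keeps the one with the smallest weighted total distance to the input lists. The helpers `permutations`, `footrule`, and `brute_aggregate` are hypothetical names written for this illustration and are not part of the package, which provides its own `RankAggreg` and `BruteAggreg` functions. The sketch also assumes every input list ranks the same items and uses the Spearman footrule as the distance.

```r
# All permutations of a vector (recursive; only feasible for small inputs).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Spearman footrule: sum of absolute position differences between two orderings.
footrule <- function(a, b) sum(abs(seq_along(a) - match(a, b)))

# Exhaustive search for the ordering closest to all input lists (one list per row).
brute_aggregate <- function(lists, weights = rep(1, nrow(lists))) {
  candidates <- permutations(sort(unique(as.vector(lists))))
  cost <- vapply(candidates, function(cand) {
    sum(weights * apply(lists, 1, footrule, b = cand))
  }, numeric(1))
  list(top = candidates[[which.min(cost)]], cost = min(cost))
}

# Three ordered lists over the same four items.
lists <- rbind(c("A", "B", "C", "D"),
               c("A", "C", "B", "D"),
               c("B", "A", "C", "D"))
res <- brute_aggregate(lists)
print(res$top)
print(res$cost)
```

The number of candidates grows factorially with the number of items, which is why exhaustive search suits only small problems and stochastic searches such as the cross-entropy and genetic algorithms are used otherwise.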

clValid: Validation of Clustering Results

clValid contains functions for validating the results of a clustering analysis. Three main types of cluster validation measures are available: “internal”, “stability”, and “biological”. The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering.

  • Brock, G., Pihur, V., Datta, S. and Datta, S. clValid: an R package for cluster validation. Journal of Statistical Software, 25(4) (2008).
  • Datta, S. and Datta, S. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes, BMC Bioinformatics, 7:397 (2006).
  • Datta, S. and Datta, S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 19, 459-466 (2003).
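
A short usage sketch for internal validation follows. It assumes the clValid package is installed from CRAN and uses the `mouse` example dataset shipped with it; the choice of three clustering methods and a range of 2 to 6 clusters is arbitrary and only for illustration.

```r
# Assumes the clValid package is installed; the block is skipped otherwise.
if (requireNamespace("clValid", quietly = TRUE)) {
  library(clValid)
  set.seed(1)  # k-means uses random starting centers

  # Example expression data bundled with the package.
  data(mouse)
  express <- mouse[, c("M1", "M2", "M3", "NC1", "NC2", "NC3")]
  rownames(express) <- mouse$ID

  # Compare three clustering algorithms over 2 to 6 clusters
  # using the internal validation measures.
  intern <- clValid(express, nClust = 2:6,
                    clMethods = c("hierarchical", "kmeans", "pam"),
                    validation = "internal")

  summary(intern)               # scores for every method and cluster count
  print(optimalScores(intern))  # best method and cluster count per measure
}
```

Switching `validation` to "stability" or "biological" requests the other two families of measures; the biological measures additionally need functional annotation for the genes.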