Computational tools

CMC-GO-Term Activity: an R package for the GO-term activity transformation model
Single-cell RNA sequencing (scRNA-seq) technologies allow for the exploration of cellular and tissue heterogeneity. However, scRNA-seq data are highly noisy and suffer from dropout effects, which limit the power of these heterogeneity analyses. To overcome these challenges, Guoqiang Yu and Yue Wang’s groups developed a GO-term activity transformation model that converts scRNA-seq datasets into a GO-term activity score matrix. Because each GO-term activity score aggregates the expression of multiple functionally related genes, it is more robust to noise and dropout effects than the expression of any single gene.

Download and installation instructions: https://github.com/yu-lab-vt/CMC/tree/CMC-GOTermActivity
Cheng Z, Wei S, Wang Y, Wang Z, Lu R, Wang Y, Yu G. An Efficient and Principled Model to Jointly Learn the Agnostic and Multifactorial Effect in Large-Scale Biological Data. bioRxiv 2024.04.12.589306; doi: https://doi.org/10.1101/2024.04.12.589306
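
To make the robustness argument above concrete, the following Python sketch scores a GO term in each cell as the mean of its member genes' z-scored expression. This naive gene-set average is an assumption chosen purely for illustration and is not the CMC-GO-Term Activity model; the function names, the simulated data, and the 50% dropout rate are all hypothetical.

```python
import numpy as np


def go_term_scores(expr, genes, go_terms):
    """Score GO terms per cell as the mean z-scored expression of their member genes.

    expr: genes x cells array; genes: gene names in row order;
    go_terms: dict mapping a GO term ID to a list of member gene names.
    (Hypothetical helper; a naive gene-set average, not the CMC model.)
    """
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid dividing by zero for constant genes
    z = (expr - mean) / std
    row_of = {g: i for i, g in enumerate(genes)}
    terms = sorted(go_terms)
    scores = np.full((len(terms), expr.shape[1]), np.nan)  # NaN if no member gene is present
    for t, term in enumerate(terms):
        rows = [row_of[g] for g in go_terms[term] if g in row_of]
        if rows:
            scores[t] = z[rows].mean(axis=0)
    return terms, scores


def simulate_and_compare(n_cells=500, n_genes=20, dropout=0.5, seed=0):
    """Hypothetical demo: genes share a hidden cell-state signal, then suffer random dropout."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=n_cells)
    expr = 5.0 + signal + 0.5 * rng.normal(size=(n_genes, n_cells))
    expr *= rng.random((n_genes, n_cells)) > dropout  # zero out dropped entries
    genes = [f"gene{i}" for i in range(n_genes)]
    _, scores = go_term_scores(expr, genes, {"GO:hypothetical": genes})
    term_corr = np.corrcoef(scores[0], signal)[0, 1]
    gene_corrs = [np.corrcoef(row, signal)[0, 1] for row in expr]
    return term_corr, gene_corrs


term_corr, gene_corrs = simulate_and_compare()
print(f"term score vs signal: {term_corr:.2f}; best single gene: {max(gene_corrs):.2f}")
```

On this simulated data the term score tracks the hidden signal more closely than any single dropout-affected gene, which is the intuition behind aggregating functionally related genes.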


CMC Model to explore multifactorial effect in large-scale data
The rich information contained in biological data is often distorted by multiple interacting intrinsic or extrinsic factors. Modeling the effects of these factors is necessary to uncover the underlying true signals. However, this is challenging because no reliable prior knowledge is available on how these factors exert their effects, the extent of their impact, and how these factors interact with each other. Guoqiang Yu’s group has developed a new model, the Conditional Multifactorial Contingency (CMC), to overcome these challenges and jointly learn the multifactorial effects in large-scale data.

Download and installation instructions: https://github.com/yu-lab-vt/CMC
Cheng Z, Wei S, Wang Y, Wang Y, Lu R, Wang Y, Yu G. An Efficient and Principled Model to Jointly Learn the Agnostic and Multifactorial Effect in Large-Scale Biological Data. bioRxiv 2024.04.12.589306; doi: https://doi.org/10.1101/2024.04.12.589306
Cheng Z, Wei S, Yu G. A Single-Cell-Resolution Quantitative Metric of Similarity to a Target Cell Type for scRNA-seq Data. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022, pp. 2824-2831, doi: 10.1109/BIBM55620.2022.9995574.


CAM3.0: an R package for fully unsupervised deconvolution of complex tissues
Complex tissues are dynamic ecosystems consisting of molecularly distinct yet interacting cell types. Computational deconvolution aims to dissect bulk tissue data into cell types and cell-specific information. However, most existing deconvolution tools use supervised approaches that require various types of reference data, which may be unreliable or even unavailable for specific tissue microenvironments. CAM3.0 provides functions to perform fully unsupervised deconvolution of mixture expression profiles using Convex Analysis of Mixtures (CAM), along with auxiliary tools to help interpret the cell type-specific results. Comparative experiments on both realistic simulations and case studies show that CAM3.0 identifies known or novel cell markers, determines cell proportions, and estimates cell-specific expression from bulk tissue data more accurately than the methods it was compared against.

Download and installation instructions: https://github.com/ChiungTingWu/CAM3?tab=readme-ov-file
Wu CT, Du D, Chen L, Dai R, Liu C, Yu G, Bhardwaj S, Parker SJ, Zhang Z, Clarke R, Herrington DM, Wang Y. CAM3.0: determining cell type composition and expression from bulk tissues with fully unsupervised deconvolution. Bioinformatics. 2024 Mar 4;40(3):btae107. doi: 10.1093/bioinformatics/btae107. PMID: 38407991; PMCID: PMC10924278.
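
The convex geometry behind CAM-style unsupervised deconvolution can be sketched in a few lines of Python. Assuming a noise-free linear mixing model X = S A (X is genes x samples, S is genes x cell types, and each sample's cell proportions in A sum to one) with at least one marker gene per cell type, scaling each gene's expression vector to sum to one places marker genes at the vertices of a simplex and every other gene inside it. The sketch finds those vertices with the successive projection algorithm (SPA), a generic vertex-finding method swapped in here for illustration; CAM3.0's own vertex detection, noise handling, and model selection are more involved, and all function names and simulation settings below are hypothetical.

```python
import numpy as np


def simulate_mixture(k=3, n_samples=20, markers_per_type=3, n_other=50, seed=0):
    """Hypothetical noise-free bulk data X = S @ A (genes x samples)."""
    rng = np.random.default_rng(seed)
    # A: k x n_samples cell-type proportions; every column sums to one.
    A = rng.dirichlet(np.ones(k), size=n_samples).T
    # Marker genes are expressed in exactly one cell type.
    markers = np.zeros((k * markers_per_type, k))
    for t in range(k):
        markers[t * markers_per_type:(t + 1) * markers_per_type, t] = rng.uniform(
            1.0, 10.0, size=markers_per_type
        )
    other = rng.uniform(0.0, 10.0, size=(n_other, k))  # genes shared by several types
    S = np.vstack([markers, other])
    return S @ A, A


def successive_projection(R, k):
    """SPA: return indices of k rows of R that are vertices of its convex hull."""
    residual = R.copy()
    chosen = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(residual, axis=1)))  # max norm lies at a vertex
        chosen.append(j)
        u = residual[j] / np.linalg.norm(residual[j])
        residual = residual - np.outer(residual @ u, u)  # project out direction u
    return chosen


def unsupervised_deconvolve(X, k):
    """Estimate k x samples proportions from bulk X without any reference profiles."""
    R = X / X.sum(axis=1, keepdims=True)  # scale each gene's vector to sum to one
    vertices = R[successive_projection(R, k)]
    # Each proportion row is a rescaled vertex; pick scales so every sample sums to one.
    scales, *_ = np.linalg.lstsq(vertices.T, np.ones(X.shape[1]), rcond=None)
    return scales[:, None] * vertices


X, A_true = simulate_mixture()
A_est = unsupervised_deconvolve(X, k=3)
print(np.round(A_est[:, :4], 3))
print(np.round(A_true[:, :4], 3))
```

Because the simulation is noise-free, the estimated proportions match the true ones up to floating-point error and a permutation of cell types. Real bulk data would additionally need the noise handling and the choice of the number of cell types that this sketch omits.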


DDN3.0: An open-source software tool to determine significant rewiring of biological network structures
Complex diseases are often caused and characterized by the misregulation of multiple biological pathways. Differential network analysis enables joint inference of common and rewired biological network structures under different conditions. DDN3.0 provides basic functions to identify a network of significantly rewired molecular players potentially responsible for phenotypic transitions.

Download and installation instructions: https://github.com/cbil-vt/DDN3
Fu Y, Lu Y, Wang Y, Zhang B, Zhang Z, Yu G, Liu C, Clarke R, Herrington DM, Wang Y. DDN3.0: determining significant rewiring of biological network structure with differential dependency networks. Bioinformatics. 2024 Jun 3;40(6):btae376. doi: 10.1093/bioinformatics/btae376. PMID: 38902940; PMCID: PMC11199198.
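
As a minimal illustration of what network "rewiring" means, the Python sketch below estimates a partial-correlation network in each of two conditions from the inverse covariance matrix and flags variable pairs whose partial correlation changes by more than a fixed threshold. This plain comparison is an assumption made for illustration only: DDN3.0 relies on its own estimation procedure and assesses the significance of rewired edges, and the function names, the 0.3 threshold, and the simulated data here are hypothetical.

```python
import numpy as np


def partial_correlations(data):
    """Partial-correlation matrix (variables x variables) from samples x variables data."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def rewired_edges(data_a, data_b, threshold=0.3):
    """Variable pairs whose partial correlation differs by more than a hypothetical threshold."""
    diff = np.abs(partial_correlations(data_a) - partial_correlations(data_b))
    p = diff.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if diff[i, j] > threshold]


def simulate_conditions(n=5000, seed=0):
    """Hypothetical data: variable 0 drives variable 1 in A, but variable 2 in B."""
    rng = np.random.default_rng(seed)
    a0, a1, a2 = rng.normal(size=(3, n))
    cond_a = np.column_stack([a0, a0 + a1, a2])
    b0, b1, b2 = rng.normal(size=(3, n))
    cond_b = np.column_stack([b0, b1, b0 + b2])
    return cond_a, cond_b


cond_a, cond_b = simulate_conditions()
print(rewired_edges(cond_a, cond_b))  # prints [(0, 1), (0, 2)]
```

Both edges that change between conditions are reported as rewired, while the pair (1, 2), which is conditionally independent in both conditions, is not.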


AQuA2: A fast, accurate, and versatile data analysis platform for the quantification of molecular spatiotemporal signals
AQuA2 (Activity Quantification and Analysis 2) is a fast, accurate, and versatile data analysis platform built upon advanced machine learning techniques. AQuA2 allows for the quantification of spatiotemporal signals across biosensors, cell types, organs, animal models, and imaging modalities. Developed by Axel Nimmerjahn and Guoqiang Yu’s groups, AQuA2 is available for MATLAB and as a Fiji plugin.

Download and installation instructions: https://github.com/yu-lab-vt/AQuA2?tab=readme-ov-file
Mi X, Chen ABY, Duarte D, Carey E, Taylor CR, Braaker PN, Bright M, Almeida RG, Lim JX, Rutten VM, Zheng W, Wang M, Reitman ME, Wang Y, Poskanzer KE, Lyons DA, Nimmerjahn A, Ahrens MB, Yu G. Fast, Accurate, and Versatile Data Analysis Platform for the Quantification of Molecular Spatiotemporal Signals. bioRxiv 2024.05.02.592259; doi: https://doi.org/10.1101/2024.05.02.592259.


BILCO: An Efficient Algorithm for Joint Alignment of Time Series
BILCO (BIdirectional pushing with Linear Component Operations) is an efficient algorithm developed by Guoqiang Yu’s group for the joint alignment of time series, which it solves as a min-cut problem on graphical time warping (GTW) graphs. BILCO has the same theoretical time complexity as the most popular min-cut methods, such as HIPR, but it is considerably faster in practice without sacrificing the accuracy of joint alignment. Across thousands of datasets from various simulated scenarios and real applications, BILCO runs about 10 to 50 times faster than the best peer methods while using only about one-tenth of their memory.

Download and installation instructions: https://github.com/yu-lab-vt/BILCO
Mi X, Wang M, Chen ABY, Lim JX, Wang Y, Ahrens M, Yu G. BILCO: An Efficient Algorithm for Joint Alignment of Time Series. Advances in Neural Information Processing Systems (NeurIPS) 2022.
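
BILCO targets the joint alignment of many time series. For orientation only, the Python sketch below shows the classic single-pair alignment problem, dynamic time warping (DTW), solved by dynamic programming. It is not BILCO, which instead solves a min-cut problem on a GTW graph to align many series jointly; the function name and the absolute-difference cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between sequences a and b (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend a match, or repeat an element of either sequence (time warping).
            cost[i][j] = step + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]
print(dtw_distance(x, y))                     # prints 0.0 (warped alignment)
print(sum(abs(p - q) for p, q in zip(x, y)))  # prints 4 (rigid point-by-point comparison)
```

The two example series have the same shape shifted by one time step, so warping aligns them at zero cost while a rigid comparison does not.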


SynBot: An open-source image analysis software for automated quantification of synapses
Quantifying the number of synaptic contacts from light microscopy images has traditionally been a challenging and time-consuming task, with results varying between experimenters. To overcome these limitations, Cagla Eroglu’s group at Duke University has developed SynBot, a new open-source, ImageJ-based software package. SynBot addresses the technical bottlenecks of traditional synapse quantification by automating several stages of the analysis, and it incorporates ilastik, a machine learning-based image analysis tool, to enable accurate thresholding for synaptic puncta identification. Additionally, the software’s code is easily modifiable, allowing users to adapt it to their specific needs.

Download and installation instructions: https://github.com/Eroglu-Lab/Syn_Bot
Savage JT, Ramirez J, Risher WC, Wang Y, Irala D, Eroglu C. SynBot: An open-source image analysis software for automated quantification of synapses. bioRxiv [Preprint]. 2024 Apr 4:2023.06.26.546578. doi: 10.1101/2023.06.26.546578. PMID: 37425715; PMCID: PMC10327002.
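
The kind of counting step that SynBot automates, finding puncta in a presynaptic and a postsynaptic channel and counting where they coincide, can be sketched in Python. This is a simplified, hypothetical version: fixed intensity thresholds stand in for SynBot's ilastik-based thresholding, puncta are treated as 4-connected pixel groups, and a presynaptic punctum counts as a synapse if any of its pixels overlaps a postsynaptic pixel. These rules and all function names are assumptions, not SynBot's actual criteria.

```python
from collections import deque

import numpy as np


def label_puncta(mask):
    """4-connected component labelling of a boolean 2-D mask; returns (labels, count)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                count += 1  # start a new punctum and flood-fill it
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


def count_colocalized(pre, post, pre_threshold, post_threshold):
    """Count presynaptic puncta that overlap at least one above-threshold postsynaptic pixel."""
    pre_labels, _ = label_puncta(pre > pre_threshold)
    post_mask = post > post_threshold
    overlapping = np.unique(pre_labels[(pre_labels > 0) & post_mask])
    return len(overlapping)


pre = np.zeros((6, 6))
pre[0:2, 0:2] = 1.0  # presynaptic punctum 1
pre[4:6, 4:6] = 1.0  # presynaptic punctum 2
post = np.zeros((6, 6))
post[1, 1] = 1.0     # overlaps punctum 1
post[4, 0] = 1.0     # isolated postsynaptic signal
print(count_colocalized(pre, post, 0.5, 0.5))  # prints 1
```

With two presynaptic puncta and postsynaptic signal overlapping only the first, the sketch counts one synapse; swapping the fixed thresholds for a learned pixel classifier is where a tool such as ilastik would come in.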