NADCdb: a database for predicting the occurrence and progression of NADC based on high-throughput sequencing data.

How to cite: NADCdb: A Joint Transcriptomic Database for Non-AIDS-Defining Cancer Research in HIV-Positive Individuals. Int. J. Mol. Sci. 2026, 27(3), 1169; https://doi.org/10.3390/ijms27031169

Critical Note: The results of this website constitute preliminary, exploratory research. They should under no circumstances inform direct clinical decision-making. Their validity and application await further rigorous assessment through additional independent validation studies.

Frequently asked questions (FAQs)

1. What is the NADCdb?

NADCdb is a novel database that integrates key biomarkers and underlying mechanisms for 23 types of non-AIDS-defining cancer (NADC) by jointly mining thousands of RNA-seq and microarray profiles from people living with HIV (PLWH) and cancer patients.
NADCdb encompasses three core models: "rNADC", "dNADC", and "iPredict". Guided by three main theories (immunosuppression, chronic inflammation, and clinical biomarker application), we systematically identified a set of pivotal factors that are specifically upregulated in NADC. By characterizing their normal expression profiles, we developed a risk assessment model for NADC in PLWH, termed "rNADC". We also built an interactive "rNADC" tool that enables users to upload transcriptome data from PLWH for NADC risk assessment.
To further explore potential diagnostic biomarkers through which HIV may promote the onset of NADC, we identified key gene features for each of 16 NADCs by screening dysregulated genes shared between PLWH and cancer patients, together with HIV differentially expressed genes in upstream pathways. These features were then used with the Random Forest (RF) and Conditional Inference Tree (CIT) algorithms to construct "dNADC". For 11 of the 16 cancer types, accuracy was greater than 75%. Among them, the diagnostic models for KICH and UCEC reached more than 90% accuracy.
Because HIV primarily targets the human immune system, leading to a decrease in CD4 and an increase in CD8 during the chronic stage, we categorized PLWH and cancer patients into four groups based on immune status. Within each group, we pinpointed potential immune biomarkers with concordant dysregulation patterns in PLWH and cancer patients, ultimately obtaining 1,905 markers across 16 NADC types for "iPredict".
We then provided detailed annotations for these key factors, covering basic genomic, ontology, function, location, phenotype, and disease information. Furthermore, for the key biomarkers identified by the different models, we performed functional enrichment analyses, constructed their PPI networks and TF-miRNA regulatory networks, and investigated potentially effective compounds via CMap analysis, providing multi-dimensional data for elucidating the development and treatment of NADC.
NADCdb also offers two modules, "Cancer" and "Gene", which allow users to query cancers or genes of interest and directly retrieve detailed regulatory relationships between key factors and NADC.
Finally, NADCdb provides a "FAQs" webpage that explains the content and operation of NADCdb in detail.
Overall, NADCdb is the first public platform designed to describe NADC. It offers multiple user-friendly webpages and graphics to integrate and analyze key biomarkers implicated in NADC development and their intrinsic biological regulatory networks, providing novel insights and impetus for the mechanisms, diagnosis and treatment of NADC.

2. What is NADCdb used for?

NADCdb uses transcriptome data from people living with HIV and from cancer patients to map the key factors in NADC occurrence and development from multiple perspectives, such as immunosuppression, viral interaction, and upstream and downstream relationships in biological pathways. In addition, an online prediction tool is available to assess the risk of NADC in people living with HIV.

3. What are ART and nonART?

To date, it is nearly impossible to completely eradicate HIV from the bodies of infected individuals. Antiretroviral therapy (ART) was developed to maximize the survival time of these patients. The main goals of ART are to suppress HIV replication, reduce viral load, delay disease progression, and restore immune function. ART typically consists of a combination of drugs with different mechanisms of action. In NADCdb, "ART" refers to HIV-infected individuals who have received ART, while "nonART" refers to those who have not. Generally, the immune function of individuals on ART is closer to that of the general population, but they still carry a risk of developing NADCs.

4. How are the potential key genes associated with the pathogenesis of NADC defined?

In NADCdb, there are three main analysis modules: rNADC (Risk Assessment Model), dNADC (Diagnosis Model), and iPredict (Immunobiomarker Prediction). The key genes in each module are defined as follows:

Risk Assessment Model (rNADC): Based on three main theories (immunosuppression, chronic inflammation, and clinical biomarkers), this module identifies genes that change in the same direction in HIV infection and cancer and whose expression patterns can distinguish NADC from HIV-infected individuals without NADC.
Diagnosis Model (dNADC): This module identifies genes that are abnormally expressed in both HIV infection and cancer, or that are regulated by these abnormally expressed genes in cancer-related pathways. The genes are selected by machine learning algorithms, and their expression patterns can distinguish NADC from HIV-infected individuals without NADC. They also reflect the interconnected changes occurring in HIV infection and cancer development.
Immunobiomarker Prediction (iPredict): The genes in this module are derived from HIV-infected individuals or cancer patients in specific immune states, showing similar patterns of dysregulated expression in both conditions. These genes represent specific gene expression profiles for HIV-infected individuals or cancer patients under certain immune states. Given their potential immune relevance, these genes can serve as potential immune biomarkers.
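
The shared-direction criterion that runs through these module definitions can be illustrated with a short Python sketch. It is not the NADCdb pipeline: the function name `concordant_genes`, the use of log2 fold changes as input, the cutoff of 1.0, and the toy gene names and values are all assumptions made for illustration.

```python
def concordant_genes(hiv_lfc, cancer_lfc, cutoff=1.0):
    """Return genes dysregulated in the same direction in both inputs.

    hiv_lfc, cancer_lfc: dicts mapping gene symbol -> log2 fold change
    (disease vs. control). `cutoff` is a hypothetical threshold, not a
    value taken from NADCdb.
    """
    result = {}
    # Only genes measured in both comparisons can be concordant.
    for gene in hiv_lfc.keys() & cancer_lfc.keys():
        h, c = hiv_lfc[gene], cancer_lfc[gene]
        if h >= cutoff and c >= cutoff:
            result[gene] = "up"
        elif h <= -cutoff and c <= -cutoff:
            result[gene] = "down"
    return result


# Toy values for illustration only; not taken from NADCdb.
hiv = {"GENE_A": 2.1, "GENE_B": -1.8, "GENE_C": 1.5, "GENE_D": 0.2}
cancer = {"GENE_A": 1.4, "GENE_B": -2.2, "GENE_C": -1.9, "GENE_E": 3.0}
shared = concordant_genes(hiv, cancer)
print(sorted(shared.items()))
```

In this toy input, GENE_C is excluded because it moves in opposite directions, and GENE_D and GENE_E are excluded because each is present in only one comparison.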

5. What are the detailed annotations of key genes that are closely related to NADC?

NADCdb provides comprehensive annotation of potential key genes associated with NADC, aiming to reveal the underlying functional mechanisms that influence the occurrence and development of NADC after HIV infection. These detailed annotations include:

Basic genomic annotations
- Symbol: HUGO gene symbol.
- Synonyms: aliases of the gene.
- Description: short description.
- Summary: A paragraph describing the function of the gene.
- Protein functions (ChatGPT): A paragraph describing the function of the gene, generated by ChatGPT.
Ontology annotations
- Canonical Pathways: Canonical Pathway from MSigDB.
- KEGG Pathway: KEGG pathway.
- Hallmark Gene Sets & BioCarta Gene Sets: annotation from Hallmark Gene Sets and BioCarta Gene Sets.
Function/Location
- Biological Process, Cellular Component, Molecular Function: BP, CC, and MF annotations from GO.
- KinaseClass: Detailed kinase classes.
- Protein Function: Protein Function from Protein Atlas.
- InterPro: InterPro Protein Family.
- Subcellular Location, Plasma: Subcellular location, and whether the protein is a plasma protein, according to Protein Atlas.
- Transmembrane (UniProt), Secreted (UniProt): Whether the protein is transmembrane or secreted according to UniProt.
Phenotype and disease annotations
- PathogenicLoF (ClinVar): Pathogenic clinical indication due to loss-of-function genetic variations provided by NCBI ClinVar database.
- dbGaP: Gene information from the database of Genotypes and Phenotypes (dbGaP).
- GWAS: Genome-wide association study (NHGRI).
- Human Phenotype Ontology: HPO human phenotype associated with the gene.
- Drug (DrugBank): Drug information of the gene as a target according to DrugBank.
- Disease and Gene Associations (DisGeNET): Human gene-disease associations of the gene according to the DisGeNET database.
- Disease Ontology (DO): Disease ontology information of the gene.
- Disease Drugs (ChatGPT): The related diseases and drug information of the gene generated by ChatGPT.
Tissue expression
- Tissue Specificity (TiGER): Specific tissues the gene is expressed in, according to TiGER.
- RNA Tissue Category, Protein Expression Normal, Protein Expression Cancer: RNA Tissue Category, Protein Expression Normal and Protein Expression Cancer according to Protein Atlas.

The annotations in NADCdb are curated to elucidate the complex interplay between transcriptome patterns and the pathogenesis of NADC in the context of HIV infection, offering a rich resource for researchers in the field of NADC.

6. What other associated analyses are included in NADCdb?

Each NADCdb module contains genes associated with different NADCs. To explore the potential biological significance of these genes, the following analyses were performed:

PPI (protein-protein interaction) analysis was performed to determine how these genes interact.
Enrichment analysis was performed to reveal the potential biological functions of NADC-associated genes.
WGCNA (Weighted Gene Co-Expression Network Analysis) was performed to show how the genes are distributed across cancer co-expression modules and how they correlate with each module and with the cancer phenotype.
Using the CMap database, compounds that can reverse the upregulated gene expression associated with different NADCs in cancer were identified. Because these compounds were selected on the basis of commonalities between HIV infection and cancer development, they offer leads for future HIV and NADC research.
Genes were classified by gene family and subcellular localization, and a regulatory network linking TFs, miRNAs, and NADC-related genes was constructed, providing system-level clues to the occurrence and pathogenesis of NADC.
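
The idea behind the CMap step, finding compounds that reverse upregulated expression, can be illustrated with a simplified ranking. The sketch below is neither the CMap connectivity score nor the procedure used by NADCdb. It only orders hypothetical compounds by the mean expression change they induce in a set of upregulated genes. The function name `rank_reversing_compounds`, the signature format, and all gene names, compound names, and values are assumptions for illustration.

```python
from statistics import mean


def rank_reversing_compounds(up_genes, signatures):
    """Rank compounds by mean expression change over `up_genes`.

    up_genes: iterable of gene symbols upregulated in the disease state.
    signatures: dict compound -> dict gene -> expression change after
        treatment (hypothetical format).
    A more negative mean suggests stronger reversal. Compounds with no
    measurement for any gene in `up_genes` are skipped.
    """
    up_genes = set(up_genes)
    scores = []
    for compound, changes in signatures.items():
        values = [v for g, v in changes.items() if g in up_genes]
        if values:
            scores.append((compound, mean(values)))
    # Most negative (strongest reversal) first.
    return sorted(scores, key=lambda item: item[1])


# Toy values for illustration only; not taken from CMap or NADCdb.
up = ["GENE_A", "GENE_B"]
sigs = {
    "compound_1": {"GENE_A": -1.5, "GENE_B": -0.5, "GENE_X": 2.0},
    "compound_2": {"GENE_A": 0.5, "GENE_B": 1.5},
    "compound_3": {"GENE_X": -3.0},
}
ranking = rank_reversing_compounds(up, sigs)
print(ranking)
```

Here `compound_1` ranks first because it lowers both upregulated genes, while `compound_3` is dropped because its signature does not cover either gene.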

7. How to map cancer abbreviations in NADCdb to specific cancers?

Cancer abbreviations and descriptions in NADCdb are identical to those used in TCGA. The mapping is shown below.

Abbreviation	Description
ACC	Adrenocortical Carcinoma
BLCA	Bladder Urothelial Carcinoma
BRCA	Breast Invasive Carcinoma
CESC	Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
COAD	Colon Adenocarcinoma
DLBC	Diffuse Large B-cell Lymphoma
ESCA	Esophageal Carcinoma
HNSC	Head and Neck Squamous Cell Carcinoma
KICH	Kidney Chromophobe
KIRC	Kidney Renal Clear Cell Carcinoma
KIRP	Kidney Renal Papillary Cell Carcinoma
LIHC	Liver Hepatocellular Carcinoma
LUAD	Lung Adenocarcinoma
LUSC	Lung Squamous Cell Carcinoma
OV	Ovarian Serous Cystadenocarcinoma
PRAD	Prostate Adenocarcinoma
READ	Rectum Adenocarcinoma
SKCM	Skin Cutaneous Melanoma
STAD	Stomach Adenocarcinoma
TGCT	Testicular Germ Cell Tumors
THCA	Thyroid Carcinoma
UCEC	Uterine Corpus Endometrial Carcinoma
UCS	Uterine Carcinosarcoma
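
For scripted analyses, the table above can be held in a simple lookup. The following Python sketch copies the 23 abbreviation and description pairs from the table. The names `TCGA_ABBREVIATIONS` and `describe_cancer` are illustrative and are not part of NADCdb.

```python
# Abbreviation -> description, copied from the table above.
TCGA_ABBREVIATIONS = {
    "ACC": "Adrenocortical Carcinoma",
    "BLCA": "Bladder Urothelial Carcinoma",
    "BRCA": "Breast Invasive Carcinoma",
    "CESC": "Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma",
    "COAD": "Colon Adenocarcinoma",
    "DLBC": "Diffuse Large B-cell Lymphoma",
    "ESCA": "Esophageal Carcinoma",
    "HNSC": "Head and Neck Squamous Cell Carcinoma",
    "KICH": "Kidney Chromophobe",
    "KIRC": "Kidney Renal Clear Cell Carcinoma",
    "KIRP": "Kidney Renal Papillary Cell Carcinoma",
    "LIHC": "Liver Hepatocellular Carcinoma",
    "LUAD": "Lung Adenocarcinoma",
    "LUSC": "Lung Squamous Cell Carcinoma",
    "OV": "Ovarian Serous Cystadenocarcinoma",
    "PRAD": "Prostate Adenocarcinoma",
    "READ": "Rectum Adenocarcinoma",
    "SKCM": "Skin Cutaneous Melanoma",
    "STAD": "Stomach Adenocarcinoma",
    "TGCT": "Testicular Germ Cell Tumors",
    "THCA": "Thyroid Carcinoma",
    "UCEC": "Uterine Corpus Endometrial Carcinoma",
    "UCS": "Uterine Carcinosarcoma",
}


def describe_cancer(abbreviation):
    """Return the description for an abbreviation, ignoring case and
    surrounding whitespace. Raises KeyError for unknown abbreviations."""
    key = abbreviation.strip().upper()
    if key not in TCGA_ABBREVIATIONS:
        raise KeyError(f"Unknown cancer abbreviation: {abbreviation!r}")
    return TCGA_ABBREVIATIONS[key]


print(describe_cancer("kich"))
```

Normalizing the input to upper case lets the lookup accept abbreviations as users commonly type them, while unknown codes fail loudly instead of returning an empty value.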

8. Related tools and software used in this database.

fastqc v0.12.1
multiqc v1.17
cutadapt v4.5
hisat2 v2.1.0
subread v2.0.6
R v4.3.1
arrayQualityMetrics v3.56.0
caret v6.0-94
clusterProfiler v4.8.3
ggplot2 v3.4.3
jsonlite v1.8.7
limma v3.56.2
lumi v2.52.0
oligo v1.64.1
org.Hs.eg.db v3.5.0
party v1.3-15
pROC v1.18.5
randomForest v4.7-1.1
RCy3 v2.20.2
STRINGdb v2.12.1
sva v3.48.0
cytoscape v3.10.2