Frequently asked questions (FAQs) 

1. What is NADCdb? 

    NADCdb is a database for predicting the occurrence and development of non-AIDS-defining cancers (NADCs) from large-scale high-throughput sequencing data and gene chip (microarray) data. Studies have shown that individuals infected with HIV may develop certain cancers classified as NADCs. These cancers occur more often in HIV-infected individuals but are not diagnostic criteria for progression to AIDS.
    To uncover the potential relationship between HIV infection and NADC development, NADCdb uses machine learning algorithms such as CIT, RF, and LASSO to build a series of diagnosis models, termed dNADCs, that identify key genes associated with various NADCs from large-scale transcriptome data of HIV and cancer patients. The dNADCs are evaluated by the area under the ROC curve (AUC), which allows NADCdb to systematically identify genes that may play crucial roles in the occurrence and progression of different NADCs. Protein-protein interaction (PPI) network analysis and the cMAP database are then used to reveal the potential functions and mechanisms of these key genes in NADC development.
    To identify risk assessment genes better suited to HIV-infected individuals, NADCdb selects a series of NADC risk assessment genes using two approaches: cancers induced by immune and inflammatory responses, and pathways shared by HIV infection and cancer development. Using transcriptome data, NADCdb then builds NADC risk assessment models (rNADCs) specific to HIV-infected individuals.
    NADCdb also selects HIV and cancer samples with different immune characteristics and identifies immune-related biomarkers that show similar features across immune dysregulation states. Enrichment analysis, PPI network analysis, and cMAP database analysis further reveal the potential immune functions and mechanisms of these biomarkers in NADC development in HIV-infected individuals.
    Finally, NADCdb integrates the dNADCs, rNADCs, and immune-related biomarkers into a tool that ranks the contribution of dysregulated genes to NADC development in HIV-infected individuals. Given transcriptome data from an HIV-infected patient, this tool returns a risk score for developing NADC and generates a clinically meaningful reference report.
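    The dNADC models are compared by the area under the ROC curve (AUC). As a generic illustration, not NADCdb's own evaluation code, AUC equals the probability that a randomly chosen positive sample (e.g. an NADC case) receives a higher model score than a randomly chosen negative sample (e.g. an HIV-infected individual without NADC), with ties counted as one half. The sketch below computes AUC directly from that definition; the scores and labels are hypothetical.

```python
from itertools import product

def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    scores: model outputs, higher means more likely positive.
    labels: 1 for positive (e.g. NADC), 0 for negative (e.g. HIV without NADC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities from one model, with true labels
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 8 of 9 pairs ranked correctly: 0.889
```

    The pairwise loop is quadratic in sample size; for large cohorts the equivalent rank-based (Mann-Whitney) formula or a library routine is the usual choice.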

2. What is NADCdb used for? 

    NADCdb uses transcriptome data from HIV and cancer patients to present diagnostic biomarkers for various NADCs in different formats and from multiple perspectives. It also provides an online platform with clinically applicable models for assessing the risk of NADC in people living with HIV (PLWH), and it identifies potential NADC-related immune biomarkers.

3. What are ART and nonART? 

    To date, HIV cannot be completely eradicated from the body of an infected individual. To maximize these patients' survival, highly effective antiretroviral therapy (ART) has been developed. The main goals of ART are to suppress HIV replication, reduce viral load, delay disease progression, and restore immune function. ART typically combines drugs with different mechanisms of action. In NADCdb, "ART" refers to HIV-infected individuals who have received ART, and "nonART" refers to those who have not. In general, immune function in patients on ART is closer to that of the general population, but these patients remain at risk of developing NADCs.

4. How are the potential key genes associated with NADC pathogenesis defined?  

    In NADCdb, there are three main analysis modules: rNADC (Risk Assessment Model), dNADC (Diagnosis Model), and iPredict (Immunobiomarker Prediction). The key genes in each module are defined as follows:

  1. Risk Assessment Model (rNADC): Based on three main theories (immunosuppression, chronic inflammation, and carcinogenic pathways), this module identifies genes whose expression changes in the same direction in HIV infection and in cancer, and whose expression patterns distinguish individuals with NADC from HIV-infected individuals without NADC.
  2. Diagnosis Model (dNADC): This module identifies genes that are abnormally expressed in both HIV infection and cancer, or that are regulated by these abnormally expressed genes in cancer-related pathways. These genes are selected by machine learning algorithms, and their expression patterns distinguish individuals with NADC from HIV-infected individuals without NADC. They also reflect the interconnected changes that occur during HIV infection and cancer development.
  3. Immunobiomarker Prediction (iPredict): The genes in this module are derived from HIV-infected individuals or cancer patients in specific immune states, showing similar patterns of dysregulated expression in both conditions. These genes represent specific gene expression profiles for HIV-infected individuals or cancer patients under certain immune states. Given their potential immune relevance, these genes can serve as potential immune biomarkers.
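    The rNADC criterion of genes that "change in the same direction in HIV infection and in cancer" can be illustrated with a minimal sketch. It assumes two differential-expression results are already available as log2 fold changes (HIV vs. control, tumor vs. normal) and keeps genes that pass a cut-off in the same direction in both. The gene names, fold-change values, and the cut-off of 1.0 are hypothetical and are not NADCdb's parameters.

```python
def same_direction_genes(hiv_lfc, cancer_lfc, min_abs_lfc=1.0):
    """Return genes dysregulated in the same direction in both conditions.

    hiv_lfc, cancer_lfc: dict gene -> log2 fold change (hypothetical inputs).
    min_abs_lfc: hypothetical cut-off on |log2FC|, applied to both comparisons.
    """
    shared = sorted(set(hiv_lfc) & set(cancer_lfc))  # genes measured in both
    result = {}
    for gene in shared:
        a, b = hiv_lfc[gene], cancer_lfc[gene]
        strong = abs(a) >= min_abs_lfc and abs(b) >= min_abs_lfc
        if strong and (a > 0) == (b > 0):  # same sign means same trend
            result[gene] = "up" if a > 0 else "down"
    return result

hiv = {"GENE_A": 1.8, "GENE_B": -1.2, "GENE_C": 2.1, "GENE_D": 0.4}
cancer = {"GENE_A": 2.5, "GENE_B": -1.6, "GENE_C": -1.9, "GENE_E": 3.0}
print(same_direction_genes(hiv, cancer))  # {'GENE_A': 'up', 'GENE_B': 'down'}
```

    GENE_C is dropped because it moves in opposite directions, and GENE_D and GENE_E are dropped because each is measured in only one comparison. A real pipeline would also filter on adjusted p-values.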

5. What annotations are provided for the key NADC-related genes?  

    NADCdb provides comprehensive annotations of the potential key genes associated with NADC, aiming to reveal the functional mechanisms underlying the occurrence and development of NADC after HIV infection. These annotations include:

  1. Basic genomic annotations
    • Symbol: HUGO gene symbol.
    • Synonyms: aliases of the gene.
    • Description: Short description of the gene.
    • Summary: A paragraph describing the function of the gene.
    • Protein functions (ChatGPT): A ChatGPT-generated paragraph describing the function of the gene.
  2. Ontology annotations
    • Canonical Pathways: Canonical pathways from MSigDB.
    • KEGG Pathway: KEGG pathways.
    • Hallmark Gene Sets & BioCarta Gene Sets: Annotations from the Hallmark and BioCarta gene sets.
  3. Function/Location
    • Biological Process, Cellular Component, Molecular Function: GO annotations for biological process (BP), cellular component (CC), and molecular function (MF).
    • KinaseClass: Detailed kinase classes.
    • Protein Function: Protein function according to Protein Atlas.
    • InterPro: InterPro protein family.
    • Subcellular Location, Plasma: Subcellular location, and whether the protein is a plasma protein, according to Protein Atlas.
    • Transmembrane (UniProt), Secreted (UniProt): Whether the protein is transmembrane or secreted according to UniProt.
  4. Phenotype and disease annotations
    • PathogenicLoF (ClinVar): Pathogenic clinical significance of loss-of-function variants in the gene, from the NCBI ClinVar database.
    • dbGap: Gene information from the database of Genotypes and Phenotypes (dbGaP).
    • GWAS: Genome-wide association study results (NHGRI).
    • Human Phenotype Ontology: Human phenotypes associated with the gene in the Human Phenotype Ontology (HPO).
    • Drug (DrugBank): Drugs that target the gene, according to DrugBank.
    • Disease and Gene Associations (DisGeNET): Human gene-disease associations of the gene according to the DisGeNET database.
    • Disease Ontology (DO): Disease ontology information of the gene.
    • Disease Drugs (ChatGPT): The related diseases and drug information of the gene generated by ChatGPT.
  5. Tissue expression
    • Tissue Specificity (TiGER): Specific tissues the gene is expressed in, according to TiGER.
    • RNA Tissue Category, Protein Expression Normal, Protein Expression Cancer: RNA Tissue Category, Protein Expression Normal and Protein Expression Cancer according to Protein Atlas.
    The annotations in NADCdb are curated to elucidate the complex interplay between transcriptome patterns and NADC pathogenesis in the context of HIV infection, offering a rich resource for researchers studying NADC.

6. What other associated analyses are included in NADCdb?  

    In NADCdb, each module includes genes associated with different NADCs. To understand the potential biological significance of these genes, we performed protein-protein interaction (PPI) analysis to identify their interactions, and enrichment analysis against multiple databases to reveal their potential biological functions. NADCdb also applied WGCNA (Weighted Gene Co-expression Network Analysis) to cancer transcriptome data to show how the included genes are distributed across cancer co-expression modules and how they correlate with each module and with cancer phenotypes. Finally, using the cMAP database, we identified compounds predicted to reverse the expression of upregulated NADC-related genes in cancer. Because these compounds were identified from the commonalities between HIV infection and cancer development, they provide valuable insights for future research on HIV and NADC.
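    WGCNA starts from a weighted co-expression network in which the unsigned adjacency between genes i and j is |cor(x_i, x_j)| raised to a soft-thresholding power beta. The sketch below computes only that first step from a small hypothetical expression matrix using the standard library; the value beta = 6 is chosen purely for illustration, and the later WGCNA steps (topological overlap, hierarchical clustering into modules, module-trait correlation) are not shown. It is not NADCdb's analysis code.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    expr: dict gene -> expression values across the same samples (hypothetical).
    beta: soft-thresholding power (illustrative; in practice chosen from a
          scale-free topology fit).
    """
    genes = list(expr)
    adj = {}
    for i, g1 in enumerate(genes):
        adj[(g1, g1)] = 1.0  # a gene is perfectly co-expressed with itself
        for g2 in genes[i + 1:]:
            a = abs(pearson(expr[g1], expr[g2])) ** beta
            adj[(g1, g2)] = adj[(g2, g1)] = a  # the network is undirected
    return adj

expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "GENE_C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
adj = soft_threshold_adjacency(expr)
print(round(adj[("GENE_A", "GENE_C")], 6))  # |-0.8| ** 6 = 0.262144
```

    Raising correlations to a power shrinks weak links (0.8 becomes about 0.26) much more than strong ones (GENE_A and GENE_B stay near 1), which is why WGCNA uses soft rather than hard thresholding.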

7. How do I use NADCdb?  

    In the dNADC module, users can explore genes potentially related to different NADCs in HIV patients, whether or not they have received antiretroviral therapy (ART). NADCdb provides annotations for these genes, interaction network information, and, based on the cMAP database, small-molecule drugs with potential anti-tumor activity suitable for the HIV-infected population.
    In the rNADC module, users can explore genes for evaluating the risk of NADCs in HIV-infected individuals from three perspectives: genes related to immune- and inflammation-induced cancer, genes involved in cancer development linked to HIV infection, and genes in pathways shared by HIV infection and cancer. This module scores the contribution of each gene to the development of various NADCs.
    In the iPredict module, users can identify immune features shared by HIV-infected individuals and cancer patients with different immune characteristics. NADCdb offers immune biomarkers, their annotations, PPI interaction networks, and, based on the cMAP database, potential immunotherapy-related drugs.
    Besides these three modules, users can query information on the NADCs and genes included in NADCdb, including the expression profiles of NADC-related genes for each type of NADC and, through WGCNA, the distribution of these genes across co-expression modules. Users can also look up expression and annotation information for genes of interest, including their expression profiles in cancers with specific immune characteristics. In the Tool module, users can input TPM values of key genes to assess the risk of NADC occurrence in PLWH.
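    The Tool module turns TPM values of key genes into an NADC risk score, but the FAQ does not give the scoring formula. The sketch below uses a common pattern as a stand-in: a log2(TPM + 1) transform, a weighted sum with per-gene coefficients, and a logistic function that maps the sum onto the range 0 to 1. All gene names, weights, and the intercept are hypothetical placeholders, not NADCdb's fitted model.

```python
import math

# Hypothetical model parameters: NOT NADCdb's fitted coefficients.
WEIGHTS = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
INTERCEPT = -2.0

def risk_score(tpm):
    """Return a risk score in (0, 1) from TPM values of the model genes.

    tpm: dict gene -> TPM value; every gene in WEIGHTS must be present.
    """
    missing = set(WEIGHTS) - set(tpm)
    if missing:
        raise KeyError(f"missing TPM values for: {sorted(missing)}")
    linear = INTERCEPT
    for gene, weight in WEIGHTS.items():
        if tpm[gene] < 0:
            raise ValueError(f"TPM must be non-negative: {gene}")
        # log2(TPM + 1) compresses the wide dynamic range of TPM values
        linear += weight * math.log2(tpm[gene] + 1.0)
    return 1.0 / (1.0 + math.exp(-linear))  # logistic link maps to (0, 1)

sample = {"GENE_A": 15.0, "GENE_B": 3.0, "GENE_C": 7.0}
print(round(risk_score(sample), 3))  # linear term = 1.1, so this prints 0.75
```

    Because each gene contributes weight * log2(TPM + 1) to the linear term, the per-gene terms can also be reported individually, which is one simple way to rank genes by their contribution to a score.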


© 2024. Institutes of Life & Health Engineering, Jinan University, China.