Welcome to CoffeeProt!
For the integration of QTL and Protein/transcript data.
CoffeeProt: An online tool for correlation and functional enrichment of proteome-wide systems genetics
The integration of genomics, proteomics and phenotypic traits across genetically diverse populations is a powerful approach to discover novel biological regulators. The increasing volume of complex data require new and easy-to-use tools accessible to a variety of scientists for the discovery and visualization of functionally relevant associations. To meet this requirement, we developed CoffeeProt, an open-source tool that analyzes genetic variants associated to protein networks and phenotypic traits. CoffeeProt uses proteomics data to perform correlation network analysis and annotates protein-protein interactions, subcellular localizations and drug associations. It then integrates genetic and phenotypic associations along with variant effect predictions. We demonstrate its utility with the analysis of mouse and human population data enabling the rapid identification of genetic variants associated with druggable proteins and clinical traits. We expect that CoffeeProt will serve the proteomics and systems genetics communities, leading to the discovery of novel biologically relevant associations. CoffeeProt is available at www.coffeeprot.com.
Table of contents
- CoffeeProt workflow
- Data preparation OR using demo data
- Data upload and processing
- Proteomics/expression data
- pQTL/eQTL data
- GWAS/molQTL data
- Analysis
- Correlation analysis
- Database enrichment
- SNP-protein analysis
- Network
- Bait network
- Data exporting
- Tables
- Static plots
- Interactive plots
- App status
- Browser compatibility
- Contact
- Citation
- Acknowledgements
CoffeeProt workflow
Figure. CoffeeProt workflow. The CoffeeProt workflow starts with users accessing the CoffeeProt web user interface at www.CoffeeProt.com to upload datafiles and specify analysis parameters (1). The user interface and server backend running R are connected using the Shiny R package (2). Feature annotations are performed based on local databases included in CoffeeProt as well as remotely hosted databases on the Nectar Research Cloud (3). User data is analyzed to perform summary statistics (I), correlation (II), interaction (III) and network (IV) analyses (4). The results are displayed in the web interface for result interpretation by the user (5). Finally, individual tables and plots are exported (6).
Data preparation OR using demo data
The easiest way to get started using CoffeeProt is by using the demo dataset included with the app. Simply click on the blue Load demo data
button on the right side of the Welcome page
. This option will directly load the proteomics, pQTL and lipidQTL data from the Parker study. To continue, navigate to the Protein/transcript data
, pQTL/eQTL data
and GWAS/molQTL
tabs to further process the data. When using demo data, you can skip the Choose file
steps, but should still perform filtering, annotation and correlation.
Alternatively, analyses in CoffeeProt can be performed on user-uploaded data. This data needs to be converted to the data formats as shown on the CoffeeProt Welcome page. Proteomics (or expression) data requires an identifiers in the first column, and quantitative measurements in all other columns. The preferred identifiers are Gene Names but ENSEMBL genes or UniProt IDs are also accepted (but are internally converted to Gene Names). (Optional) Users can also prepare custom identifier-annotation mapping files (identifier in the first column, annotations in the second) to perform analyses using annotations that are currently not present in CoffeeProt. pQTL/eQTL data files require the columns with information related to the SNP, the affected protein/transcript and a measure of the association. The GWAS/molQTL format is similar to the pQTL/eQTL files, but only needs the following 6 columns: rsID, phenotype, SNP location, SNP chromosome, p-value and grouping. It is recommended to follow the example datasets on the CoffeeProt Welcome page.
Data upload and processing
Proteomics/expression data
- Click “Choose file”
- Select your file (in .csv, .txt or excel format) [1][2]
- Use the slider to select a missing value cut-off
- Click “Process Proteins!”
- Perform correlation by selecting a correlation method and p-value adjustment method
- Click “Correlate!”
[1] Maximum file size of 900 Mb
[2] Any column after the first column should only contain numeric data, blank cells or missing values (NA, Na, na, NaN, NAN)
pQTL/eQTL data
- Click “Choose file”
- Select your file (in .csv, .txt or excel format)
- Select the filter type and significance data type
- Use the slider to apply the filters
- Select a species for variant effect annotation (optional)
- Click “Process pQTLs!”
GWAS/molQTL data
- Click “Choose file”
- Select your file (in .csv, .txt or excel format)
- Select the filter type and significance data type
- Use the slider to apply the filters
- Click “Process molQTLs!”
Analysis
To start analyzing your data, click on to the Analysis tab
which will reveal subtabs for the separate analyses. The analyses are divided into Correlation analysis
, Database enrichment
, SNP-protein analysis
, Network analysis
and Bait network analysis
. Opening any of these tabs will show a new page with some information regarding the analysis, and a checklist indicating whether the required data for the analysis has been uploaded/processed. If all required data is present, new input options will appear allowing the analysis to be performed.
Correlation analysis
This tab displays a summary of the protein-protein correlation analysis. Prior to producing the plots, co-regulation is defined by the user by setting correlation coefficient and q-value cut-offs. The histograms visualize the number of protein-protein interactions that meet these criteria. For each protein, the number of co-regulation partners is determined based on the user-specified criteria.
Database enrichment
Analyses are performed after annotating co-regulated protein pairs to determine the extend of overlapping annotations. Protein-protein interaction databases (STRING, CORUM & BioPlex 3.0) are searched to identify previously discovered protein pairs. It is expected that a larger percentage of co-regulated protein pairs is found in these databases, compared to the non co-regulated pairs. It is recommended to adjust the co-regulation criteria if no enrichment is detected.
SNP-protein analysis
The SNP-Protein plot summarizes the interactions in the uploaded data by combining several visualizations. A Manhattan plot (top) highlights the QTL p-values per chromosome. Edges are drawn (center) connecting QTL and protein data, where edge color indicates the QTL type. Protein-Protein interactions are shown using arc-diagrams, proteins are ordered by complexsize and number of connections. The user can alter the plots by selecting a single chromosome or proteincomplex of interest.
Network analysis
Network plots are used to visualize interactions between co-regulated proteins in interactive plots. The user can produce networks for 1) All protein interactions, 2) all protein interactions involved in QTLs, 3) protein interactions in the CORUM database or 4) protein interactions in the BioPlex 3.0 database. If QTLs have been uploaded they can be added directly to the network plots. Finally, the nodes and edges in the interactive plot can be colored by nodetypes (protein / SNP) and the user-uploaded proxies or annotations. The interactive plot allows zooming in on, moving and highlighting sections of the network.
Bait network analysis
Bait network plots are used to visualize interactions between co-regulated proteins in interactive plots. The bait refers to a single, or list of, proteins or phenotypes of interest.The nodes and edges in the interactive plot can be colored by nodetypes (protein / SNP) and the user-uploaded proxies. The interactive plot allows zooming in on, moving and highlighting sections of the network.
Data exporting
To export all tables or plots as a compressed (Zipped) folder, click the “Export all tables” or “Export all plots” buttons. The table folder contains the plots generated based on the user data, such as the annotated proteomics dataset, correlation results and QTL tally tables. The plot folder contains all static plots created in the data upload tabs, the correlation, database enrichment and QTL-protein analyses. Alternatively, individual plots can be exported in various dimensions or file formats.
App status
CoffeeProt is currently in active development. The following new features will be added in the near future:
- Custom protein/transcript annotation support, allowing users to upload their own annotations
- New plots utilizing the correlation table share_loc and overlap_loc columns
- Option to open (bait)networks in new browser tabs
Browser compatibility
OS | version | Chrome | Firefox | Microsoft Edge | Safari |
---|---|---|---|---|---|
Linux | Ubuntu 20.04.1 LTS | 87.0.4280.88 | 78.0.1 | n/a | n/a |
MacOS | 10.13.6 | 87.0.4280.67 | 83.0 | n/a | 13.1.2 |
Windows | 10 | 87.0.4280.88 | 83.0 | 87.0.664.55 | n/a |
Contact
Email: support@coffeeprot.com
Citation
Acknowledgements
This research was supported by use of the Nectar Research Cloud and by the University of Melbourne Research Platform Services. The Nectar Research Cloud is a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. This work was funded by an Australian National Health and Medical Research Council Ideas Grant (APP1184363) and The University of Melbourne Driving Research Momentum program.
Human Protein Atlas subcellular localization data was obtained from http://www.proteinatlas.org and has previously been described in Thul PJ et al., A subcellular map of the human proteome. Science. (2017).
Drug-gene interaction data was obtained from DGIdb (https://www.dgidb.org/downloads).
Protein complex or protein interaction data were retrieved from the CORUM (http://mips.helmholtz-muenchen.de/corum/#download), BioPlex 3.0 (https://bioplex.hms.harvard.edu/interactions.php) and STRINGdb (https://string-db.org/cgi/download) databases.
The DNA vector image used in the CoffeeProt banner on the Welcome page was obtained from Vecteezy (Human dna design Vectors by Vecteezy)
Demo datasets
OR
Download demo data
Upload protein/transcript data
Please start by preparing protein/transcript data files as described on the Welcome page
. Your dataset should contain identifiers in the first column, which could either be gene names, UniProt IDs or ENSEMBL IDs. All other columns should contain numeric data, corresponding to the protein/transcript abundance per sample. CoffeeProt will convert all IDs to gene names after the data is uploaded.
Upload SNPs associated to a protein/transcript
Please start by preparing pQTL/eQTL data files as described on the Welcome page
. pQTL/eQTL data files require the columns with information related to the SNP, the affected protein/transcript and a measure of the association. An optional haplotype / LD block file can be uploaded to annotate SNPs that are in linkage disequilibrium, which can be used in the network plots to summarize many SNP nodes into fewer LD nodes. Human or mouse SNPs (<500.000) can be annotated with variant effects and variant impacts.
GWAS Catalog publicly available datasets
Upload SNPs associated to a phenotype or molecular trait
Please start by preparing molQTL/GWAS data files as described on the Welcome page
. The GWAS/molQTL format is similar to the pQTL/eQTL files, but only needs the following 6 columns: rsID, phenotype, SNP location, SNP chromosome, p-value and grouping.
Instructions
This tab displays a summary of the protein-protein correlation analysis. Prior to producing the plots, co-regulation is defined by the user by setting correlation coefficient and q-value cut-offs. For the correlation coefficient, a range with a minimum and maximum value can be selected, where any value outside of this range is considered as correlated. The histograms visualize the number of protein-protein interactions that meet these criteria. For each protein, the number of co-regulation partners is determined based on the user-specified criteria.
Instructions
Analyses are performed after annotating co-regulated protein pairs to determine the extend of overlapping annotations. Protein-protein interaction databases (STRING, CORUM & BioPlex 3.0) are searched to identify previously discovered protein pairs. It is expected that a larger percentage of co-regulated protein pairs is found in these databases, compared to the non co-regulated pairs. It is recommended to adjust the co-regulation criteria if no enrichment is detected.
Instructions
Network plots are used to visualize interactions between co-regulated proteins in interactive plots. The user can produce networks for 1) All protein interactions, 2) all protein interactions involved in QTLs, 3) protein interactions in the CORUM database or 4) protein interactions in the BioPlex 3.0 database. If QTLs have been uploaded they can be added directly to the network plots. Finally, the nodes and edges in the interactive plot can be colored by nodetypes (protein / SNP) and the user-uploaded proxies or annotations. The interactive plot allows zooming in on, moving and highlighting sections of the network. Correlated protein pairs outside of the range (minimum and maximum value) are included in the network. To only visualize positively correlated pairs, set the lower correlation cut-off to -1.
Downloading network plots: Network plots can be downloaded as interactive HTML files using the download button next to the plot button. These HTML files can be opened in most browsers and saved to a PDF file using the save to PDF option (control + p). On some browsers the network plots can be directly saved to a SVG or PDF file (Right click network plot > save as ...)
Plot parameters (optional)
Instructions
Bait network plots are used to visualize interactions between co-regulated proteins in interactive plots. The bait refers to a single, or list of, proteins or phenotypes of interest.The nodes and edges in the interactive plot can be colored by nodetypes (protein / SNP) and the user-uploaded proxies. The interactive plot allows zooming in on, moving and highlighting sections of the network.
Downloading network plots: Network plots can be downloaded as interactive HTML files using the download button next to the plot button. These HTML files can be opened in most browsers and saved to a PDF file using the save to PDF option (control + p). On some browsers the network plots can be directly saved to a SVG or PDF file (Right click network plot > save as ...)
Plot parameters (optional)
Instructions
The SNP-Protein plot summarizes the interactions in the uploaded data by combining several visualizations. A Manhattan plot (top) highlights the QTL p-values per chromosome. Edges are drawn (center) connecting QTL and protein data, where edge color indicates the QTL type. Protein-Protein interactions are shown using arc-diagrams, proteins are ordered by complexsize and number of connections. The user can alter the plots by selecting a single chromosome or proteincomplex of interest.
Export all plots & tables
Use this tab to download all plots or all tables compressed in a Zip folder. The plots are exported at the default square ratio and in .svg file format. To export plots in different dimensions or file types, use the individual plot export option.
Download all plots (Zip) Download all tables (Zip)