Welcome to CoffeeProt!

For the integration of QTL and Protein/transcript data.

CoffeeProt: An online tool for correlation and functional enrichment of proteome-wide systems genetics

The integration of genomics, proteomics and phenotypic traits across genetically diverse populations is a powerful approach to discover novel biological regulators. The increasing volume of complex data require new and easy-to-use tools accessible to a variety of scientists for the discovery and visualization of functionally relevant associations. To meet this requirement, we developed CoffeeProt, an open-source tool that analyzes genetic variants associated to protein networks and phenotypic traits. CoffeeProt uses proteomics data to perform correlation network analysis and annotates protein-protein interactions, subcellular localizations and drug associations. It then integrates genetic and phenotypic associations along with variant effect predictions. We demonstrate its utility with the analysis of mouse and human population data enabling the rapid identification of genetic variants associated with druggable proteins and clinical traits. We expect that CoffeeProt will serve the proteomics and systems genetics communities, leading to the discovery of novel biologically relevant associations. CoffeeProt is available at www.coffeeprot.com.

CoffeeProt workflow
Data preparation OR using demo data
Data upload and processing
- Proteomics/expression data
- pQTL/eQTL data
- GWAS/molQTL data
Analysis
- Correlation analysis
- Database enrichment
- SNP-protein analysis
- Network
- Bait network
Data exporting
- Tables
- Static plots
- Interactive plots
App status
Browser compatibility
Contact
Citation
Acknowledgements

CoffeeProt workflow

drawing

Figure. CoffeeProt workflow. The CoffeeProt workflow starts with users accessing the CoffeeProt web user interface at www.CoffeeProt.com to upload datafiles and specify analysis parameters (1). The user interface and server backend running R are connected using the Shiny R package (2). Feature annotations are performed based on local databases included in CoffeeProt as well as remotely hosted databases on the Nectar Research Cloud (3). User data is analyzed to perform summary statistics (I), correlation (II), interaction (III) and network (IV) analyses (4). The results are displayed in the web interface for result interpretation by the user (5). Finally, individual tables and plots are exported (6).

Data preparation OR using demo data

The easiest way to get started using CoffeeProt is by using the demo dataset included with the app. Simply click on the blue Load demo data button on the right side of the Welcome page. This option will directly load the proteomics, pQTL and lipidQTL data from the Parker study. To continue, navigate to the Protein/transcript data, pQTL/eQTL data and GWAS/molQTL tabs to further process the data. When using demo data, you can skip the Choose file steps, but should still perform filtering, annotation and correlation.

Alternatively, analyses in CoffeeProt can be performed on user-uploaded data. This data needs to be converted to the data formats as shown on the CoffeeProt Welcome page. Proteomics (or expression) data requires an identifiers in the first column, and quantitative measurements in all other columns. The preferred identifiers are Gene Names but ENSEMBL genes or UniProt IDs are also accepted (but are internally converted to Gene Names). (Optional) Users can also prepare custom identifier-annotation mapping files (identifier in the first column, annotations in the second) to perform analyses using annotations that are currently not present in CoffeeProt. pQTL/eQTL data files require the columns with information related to the SNP, the affected protein/transcript and a measure of the association. The GWAS/molQTL format is similar to the pQTL/eQTL files, but only needs the following 6 columns: rsID, phenotype, SNP location, SNP chromosome, p-value and grouping. It is recommended to follow the example datasets on the CoffeeProt Welcome page.

drawing

Data upload and processing

Proteomics/expression data

Click “Choose file”
Select your file (in .csv, .txt or excel format) [1][2]
Use the slider to select a missing value cut-off
Click “Process Proteins!”
Perform correlation by selecting a correlation method and p-value adjustment method
Click “Correlate!”

drawing

^{[1] Maximum file size of 900 Mb}
^{[2] Any column after the first column should only contain numeric data, blank cells or missing values (NA, Na, na, NaN, NAN)}

pQTL/eQTL data

Click “Choose file”
Select your file (in .csv, .txt or excel format)
Select the filter type and significance data type
Use the slider to apply the filters
Select a species for variant effect annotation (optional)
Click “Process pQTLs!”

drawing

GWAS/molQTL data

Click “Choose file”
Select your file (in .csv, .txt or excel format)
Select the filter type and significance data type
Use the slider to apply the filters
Click “Process molQTLs!”

Analysis

To start analyzing your data, click on to the Analysis tab which will reveal subtabs for the separate analyses. The analyses are divided into Correlation analysis, Database enrichment, SNP-protein analysis, Network analysis and Bait network analysis. Opening any of these tabs will show a new page with some information regarding the analysis, and a checklist indicating whether the required data for the analysis has been uploaded/processed. If all required data is present, new input options will appear allowing the analysis to be performed.

Correlation analysis

This tab displays a summary of the protein-protein correlation analysis. Prior to producing the plots, co-regulation is defined by the user by setting correlation coefficient and q-value cut-offs. The histograms visualize the number of protein-protein interactions that meet these criteria. For each protein, the number of co-regulation partners is determined based on the user-specified criteria.

Database enrichment

Analyses are performed after annotating co-regulated protein pairs to determine the extend of overlapping annotations. Protein-protein interaction databases (STRING, CORUM & BioPlex 3.0) are searched to identify previously discovered protein pairs. It is expected that a larger percentage of co-regulated protein pairs is found in these databases, compared to the non co-regulated pairs. It is recommended to adjust the co-regulation criteria if no enrichment is detected.

SNP-protein analysis

The SNP-Protein plot summarizes the interactions in the uploaded data by combining several visualizations. A Manhattan plot (top) highlights the QTL p-values per chromosome. Edges are drawn (center) connecting QTL and protein data, where edge color indicates the QTL type. Protein-Protein interactions are shown using arc-diagrams, proteins are ordered by complexsize and number of connections. The user can alter the plots by selecting a single chromosome or proteincomplex of interest.

Network analysis

Bait network analysis

drawing

Data exporting

To export all tables or plots as a compressed (Zipped) folder, click the “Export all tables” or “Export all plots” buttons. The table folder contains the plots generated based on the user data, such as the annotated proteomics dataset, correlation results and QTL tally tables. The plot folder contains all static plots created in the data upload tabs, the correlation, database enrichment and QTL-protein analyses. Alternatively, individual plots can be exported in various dimensions or file formats.

drawing

App status

CoffeeProt is currently in active development. The following new features will be added in the near future:

Custom protein/transcript annotation support, allowing users to upload their own annotations
New plots utilizing the correlation table share_loc and overlap_loc columns
Option to open (bait)networks in new browser tabs

Browser compatibility

OS	version	Chrome	Firefox	Microsoft Edge	Safari
Linux	Ubuntu 20.04.1 LTS	87.0.4280.88	78.0.1	n/a	n/a
MacOS	10.13.6	87.0.4280.67	83.0	n/a	13.1.2
Windows	10	87.0.4280.88	83.0	87.0.664.55	n/a

Contact

Email: support@coffeeprot.com

Citation

Jeffrey Molendijk, Marcus M Seldin, Benjamin L Parker. CoffeeProt: an online tool for correlation and functional enrichment of systems genetics data. Nucleic Acids Research; doi: https://doi.org/10.1101/2020.10.02.323246

Acknowledgements

This research was supported by use of the Nectar Research Cloud and by the University of Melbourne Research Platform Services. The Nectar Research Cloud is a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. This work was funded by an Australian National Health and Medical Research Council Ideas Grant (APP1184363) and The University of Melbourne Driving Research Momentum program.

Human Protein Atlas subcellular localization data was obtained from http://www.proteinatlas.org and has previously been described in Thul PJ et al., A subcellular map of the human proteome. Science. (2017).

Drug-gene interaction data was obtained from DGIdb (https://www.dgidb.org/downloads).

Protein complex or protein interaction data were retrieved from the CORUM (http://mips.helmholtz-muenchen.de/corum/#download), BioPlex 3.0 (https://bioplex.hms.harvard.edu/interactions.php) and STRINGdb (https://string-db.org/cgi/download) databases.

The DNA vector image used in the CoffeeProt banner on the Welcome page was obtained from Vecteezy (Human dna design Vectors by Vecteezy)

Demo datasets

OR

Download demo data

Figure: Correlation coefficient histogram. Correlation analyses can be performed using protein/transcript data to identify correlated protein or transcript pairs. This plot indicates the number of pairs that would be considered to be 'correlated' at the user-selected correlation coefficient cut-off of 0.5. These plots are useful to judge appropriate cut-offs for other analyses that use the same parameters, such as the network analyses.

Figure: Correlation database enrichment plot. The protein or transcript pairs which are considered to be correlated, based on the user-defined cutoffs (correlation coefficient and 1-value), can be further analyzed using enrichment plots. In this example, the fraction of correlated (true) and non-correlated pairs (false) found in the CORUM protein-protein interaction database are shown. As expected, the correlation pairs detected using CoffeeProt are in agreement with the database, leading to a fold change enrichment of <50. Please note that the relatively low percentages on the y-axis are due to the large number of potential correlation pairs (several million) compared to the protein pairs in the database (~30.000).

Figure: SNP-Protein interactions. The SNP-protein interaction plot is constructed from three separate plots. A Manhattan plot (top) highlights the QTL p-values per chromosome. Edges are drawn (center) connecting QTL and protein data, where edge color indicates the QTL type. Protein-Protein interactions are shown using arc-diagrams, proteins are ordered by complex size and number of connections.

Correlation network plot. Interactive correlation networks are produced based on the user-uploaded protein/transcript data. These networks are highly customizable, allowing for the addition of pQTLs/eQTLs and molQTLs, colouring the links by a variable of interest (variant effect, intragenic SNPs). Additionally, known drug-gene interaction data can be added to aid the identification of proteins/transcripts with known drug-interactions.

Correlation bait network plot. Interactive correlation bait networks are produced based on the user-uploaded protein/transcript data. Users can select a single, or multiple proteins/transcripts of interest, to visualize all interacting targets associated traits. Alternatively, users can select traits of interest to show all associated proteins/transcripts. In this example, a bait network was made for the trait CE (cholesterol ester lipid measurement), highlighting interactions with proteins including Tmem97 and Aldh1a7. These networks are highly customizable, allowing for the addition of drug-gene interactions or colouring the links by a variable of interest (variant effect, intragenic SNPs).

Upload protein/transcript data

Please start by preparing protein/transcript data files as described on the Welcome page. Your dataset should contain identifiers in the first column, which could either be gene names, UniProt IDs or ENSEMBL IDs. All other columns should contain numeric data, corresponding to the protein/transcript abundance per sample. CoffeeProt will convert all IDs to gene names after the data is uploaded.

Upload SNPs associated to a protein/transcript

Please start by preparing pQTL/eQTL data files as described on the Welcome page. pQTL/eQTL data files require the columns with information related to the SNP, the affected protein/transcript and a measure of the association. An optional haplotype / LD block file can be uploaded to annotate SNPs that are in linkage disequilibrium, which can be used in the network plots to summarize many SNP nodes into fewer LD nodes. Human or mouse SNPs (<500.000) can be annotated with variant effects and variant impacts.

GWAS Catalog publicly available datasets

Download generated table

Use P-value description column as trait?

Upload SNPs associated to a phenotype or molecular trait

Please start by preparing molQTL/GWAS data files as described on the Welcome page. The GWAS/molQTL format is similar to the pQTL/eQTL files, but only needs the following 6 columns: rsID, phenotype, SNP location, SNP chromosome, p-value and grouping.

Instructions

This tab displays a summary of the protein-protein correlation analysis. Prior to producing the plots, co-regulation is defined by the user by setting correlation coefficient and q-value cut-offs. For the correlation coefficient, a range with a minimum and maximum value can be selected, where any value outside of this range is considered as correlated. The histograms visualize the number of protein-protein interactions that meet these criteria. For each protein, the number of co-regulation partners is determined based on the user-specified criteria.

Instructions

Network plots are used to visualize interactions between co-regulated proteins in interactive plots. The user can produce networks for 1) All protein interactions, 2) all protein interactions involved in QTLs, 3) protein interactions in the CORUM database or 4) protein interactions in the BioPlex 3.0 database. If QTLs have been uploaded they can be added directly to the network plots. Finally, the nodes and edges in the interactive plot can be colored by nodetypes (protein / SNP) and the user-uploaded proxies or annotations. The interactive plot allows zooming in on, moving and highlighting sections of the network. Correlated protein pairs outside of the range (minimum and maximum value) are included in the network. To only visualize positively correlated pairs, set the lower correlation cut-off to -1.

Downloading network plots: Network plots can be downloaded as interactive HTML files using the download button next to the plot button. These HTML files can be opened in most browsers and saved to a PDF file using the save to PDF option (control + p). On some browsers the network plots can be directly saved to a SVG or PDF file (Right click network plot > save as ...)

Plot parameters (optional)

Node distance parameter

Instructions

Bait network plots are used to visualize interactions between co-regulated proteins in interactive plots. The bait refers to a single, or list of, proteins or phenotypes of interest.The nodes and edges in the interactive plot can be colored by nodetypes (protein / SNP) and the user-uploaded proxies. The interactive plot allows zooming in on, moving and highlighting sections of the network.

Downloading network plots: Network plots can be downloaded as interactive HTML files using the download button next to the plot button. These HTML files can be opened in most browsers and saved to a PDF file using the save to PDF option (control + p). On some browsers the network plots can be directly saved to a SVG or PDF file (Right click network plot > save as ...)

Plot parameters (optional)

Node distance parameter

Instructions

Export all plots & tables

Use this tab to download all plots or all tables compressed in a Zip folder. The plots are exported at the default square ratio and in .svg file format. To export plots in different dimensions or file types, use the individual plot export option.

Welcome to CoffeeProt!

CoffeeProt: An online tool for correlation and functional enrichment of proteome-wide systems genetics

Table of contents

CoffeeProt workflow

Data preparation OR using demo data

Data upload and processing

Analysis

Data exporting

App status

Browser compatibility

Contact

Citation