Abstract

Protein–protein interactions, or PPIs, constitute a basic unit of our understanding of protein function. Though substantial effort has been made to organize PPI knowledge into structured databases, maintenance of these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text data. Extracting PPIs from experimental research supports assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein–protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of these methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges facing the application of PPI mining to the text concerning protein families, using the multifunctional 14-3-3 protein family as an example.

Introduction

Protein–protein interactions (PPIs) are a conceptual representation of how proteins serve their roles in nature. As with other biological relationships, an ‘interaction’ can describe a variety of potential protein relationships. The two interactors may bind tightly to yield structural changes, one may modify the other chemically, they may act in concert with other biomolecules to form a stable complex, or all of the above. In this review, we define PPIs as physical interactions detected through direct or indirect experimental means; we do not include genetic interactions (i.e. instances where a change in one gene is associated with a change in another) though they may reveal protein relationships. In their most abstract representation, protein interactions are binary relationships, i.e. Protein A interacts with Protein B. These binary interactions assemble neatly into PPI networks, and, given a sufficiently comprehensive network, we may begin to observe patterns among its interactors. Such networks reveal potential protein functions in the context of entire systems, including pathways, complexes, and more intricate phenomena such as metabolic processes or disease presentations.
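The assembly of binary interactions into a network can be illustrated with a minimal sketch. The protein labels below are placeholders rather than curated interactions, and the helper name `build_network` is our own; the example only assumes that each interaction is an unordered pair.

```python
from collections import defaultdict

def build_network(binary_interactions):
    """Assemble unordered (protein_a, protein_b) pairs into an adjacency map."""
    network = defaultdict(set)
    for protein_a, protein_b in binary_interactions:
        # Physical interactions are treated as undirected here.
        network[protein_a].add(protein_b)
        network[protein_b].add(protein_a)
    return network

# Placeholder interactors, for illustration only.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
network = build_network(pairs)

# Rank interactors by their number of partners (degree), highest first.
hubs = sorted(network, key=lambda protein: len(network[protein]), reverse=True)
print(hubs[0], sorted(network[hubs[0]]))
```

Even this simple representation exposes patterns such as highly connected interactors, which is the kind of observation a sufficiently comprehensive network makes possible.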

Numerous biomolecular studies have sought to identify large PPI sets, including those covering entire proteomes or biological systems (i.e. interactomes). They have pioneered the development of correspondingly high-throughput methods capable of detecting tens of thousands of interacting pairs within a single assay. The resulting data (e.g. from early yeast interactomes [1] to ongoing work toward a human interactome [2]) present many of the challenges common to large biological datasets: they are large, noisy, and variable across methods, and they can be difficult to interpret if some interactors are over- or under-represented. Interpretation of PPI networks, therefore, benefits from a comparison of multiple interaction datasets across different studies.

PPIs are actively curated, as evidenced by the size of interaction databases: BioGRID [3] and IntAct [4] each contain more than 870 000 physical protein interactions, with some redundancy within and across the datasets. A wealth of additional interaction data is also available, including genetic, chemical vs. protein, and complex interactions. We may maximize the value of interaction data by extracting new experimentally observed interactions in an automated manner. Besides the benefits to interaction dataset curation (i.e. automated methods may extract PPIs to be further evaluated by expert curators), extracting PPIs directly from text imposes valuable structure on otherwise unstructured data. Extracted PPIs can be assembled into graphs and combined with other sources of biomedical data to form knowledge graphs. These processes lead to intuitive query methods: researchers may ask ‘under what conditions does Protein A interact with Protein B?’ instead of combing the literature or databases for all mentions of these proteins. As compared with searching raw interaction data, extracting PPIs from the text supports the identification of the interactions most pertinent to biological phenomena and relevant conclusions. Methods to support PPI extraction are particularly valuable in situations involving large sets of proteins of interest, recently characterized interactions, or unexpected biomolecular relationships. Previous findings can then inform experimental planning or contextualize results.

In this review, we highlight recent developments in the extraction of PPIs from unstructured biomedical text. We begin with a brief overview of historically relevant methods for comparison purposes, though we do not cover all previous developments. Many have been reviewed previously [5]. Most have been developed for the identification of physical interactions, and though the underlying strategy is appropriate for other interaction types (e.g. genetic interactions [6]), we focus on their intended applications here. We also do not include methods for PPI prediction based on structured data sources such as amino acid sequence or structure. Our focus in the body of this review is on new developments in PPI extraction, methods for the integration of PPI networks, and tools for learning from the resulting data. We close with a synopsis of the challenges present in this field, using the highly multifunctional 14-3-3 protein family as an illustrative example.

Traditional approaches for extracting PPIs

The core idea of PPI extraction remains similar to that defined 20 years ago by Blaschke et al. [7]: an automatic and accurate description of protein interaction networks is the goal. PPI extraction has since become a popular task for demonstrating relation extraction (RE) methods (see examples in Figure 1), as isolating PPI is conceptually similar to extracting other types of relationships from the text. Past approaches have generally involved some combination of the following strategies and resources. Rule-based methods and parsers, including pattern [8] and argument structure [9] parsers, were at one point de facto standards for the task. Predicate-argument extraction methods emerged around the same time, e.g. SemRep [10] and PASMED [11]; their results include a variety of biomedical relationships, including PPIs. Co-occurrence (i.e. modeling PPIs as two proteins named within a sentence) is an alternate strategy [12,13], though one with fairly low precision. Supervised machine learning has experienced a meteoric rise in popularity in the field, as with other topics. One prominent example, and a precursor to recent methods, is the rich feature vector and automated feature selection method [14]: it produces dependency graphs to identify potential PPI. The precision and recall afforded by newly developed PPI extraction methods have been measured through a set of labeled text corpora: IEPA [15], AIMed [16], LLL [17], BioInfer [18], and HPRD50 [19] are commonly used, with the occasional addition of relationships from PubTator [20] annotations. The BioCreative II.5 [21], V [22], and VI [23] challenges have also provided valuable training and testing corpora, the BioC data format (see Figure 1B for an example), and much of the motivation for new developments in PPI extraction and related tasks.
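To make the co-occurrence strategy concrete, the following is a minimal sketch and not the implementation used in the cited work [12,13]. It assumes a small hypothetical lexicon of protein names, a naive regular-expression sentence splitter, and plain substring matching in place of proper named entity recognition.

```python
import itertools
import re

# Hypothetical lexicon; a real pipeline would rely on named entity recognition.
PROTEIN_NAMES = {"Raf-1", "14-3-3", "Ras", "BAD"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def cooccurring_pairs(text, lexicon=PROTEIN_NAMES):
    """Return the set of unordered protein pairs named within the same sentence."""
    pairs = set()
    for sentence in split_sentences(text):
        # Substring matching is a simplification and will over-match short names.
        found = sorted(name for name in lexicon if name in sentence)
        pairs.update(itertools.combinations(found, 2))
    return pairs

text = (
    "Raf-1 binds 14-3-3 in vitro. "
    "Ras was not detected. "
    "BAD and 14-3-3 were both expressed."
)
print(cooccurring_pairs(text))
```

The third sentence of the example text names two proteins without describing an interaction, yet the pair is still reported. This illustrates why co-occurrence alone tends toward low precision.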


Figure 1.
Examples of protein–protein interactions described in text and features relevant to their extraction.

(A) Two examples of interactions described within experimental documents. The top example from Lin et al. [81] is explicit while the lower example from Kim et al. [82] presents additional semantic complexity that may cause it to be missed by some extraction methods. The interactors in each example are highlighted in blue. (B) An example of an interaction contained within an interaction training set, specifically the AIMed corpus. A selection of the full BioC format record is shown for clarity. The original statement is from Clark et al. [83]. In this case, Raf-CRD refers to the cysteine-rich domain of Raf-1, though it is annotated in the same manner as a full protein. (C) A visual network representation of the example interactions. Though most interactions in this directed network are between two proteins (blue circle nodes), one node represents an entire protein family (blue square) and other nodes represent involvement in biological functions or activities described in the text (red diamonds). Interaction types are labeled; the group membership between 14-3-3 proteins is shown as dashed lines.


New approaches for extracting PPIs from unstructured text

Focused efforts in RE have recently yielded impressive advances in the identification of PPIs within the text. Though these approaches differ methodologically, most have been tested with similar PPI corpora sets (i.e. those listed in the previous section) and address the specific objectives provided by the same computational challenges. Any single method may produce variable results across different evaluation metrics and test datasets. We, therefore, suggest that researchers seeking PPI extraction methods evaluate recent developments by the following qualitative criteria. First, how well does a given method perform on a labeled sample of your own intended text data? Does it or its component processes fail to capture proteins of interest or their interactions? Second, how easily may the method fit into the intended workflow? Is it likely to be more computationally demanding than available resources will allow? Finally, are the method's results interpretable? Will it be possible to determine why one phrase is labeled as a PPI while another is ignored? With these considerations in mind, we structure the following overview in terms of each method's intended use cases rather than methodological similarity alone. See Table 1 for the full list of methods and any corresponding locations of their implementation details, including public code or platforms.

Table 1
Details and scope of PPI extraction methods discussed in this review
Name of method | Implementation details | Scope
Deep CNN for PPI extraction [29] | See paper | Extraction of PPI pairs from individual sentences
Deep learning method for PPI and mutation extraction with gene normalization [32] | https://github.com/bionlproc/biocppi_extraction | Extraction of PPIm pairs
DeepDive [38] | https://github.com/edoughty/deepdive_genegene_app | Extraction of gene pairs suggesting PPIs
DSTK [24] | See paper | Extraction of PPI pairs from individual sentences
Graph kernels for RE [26] | https://bitbucket.org/readbiomed/asm-kernel | Extraction of PPI pairs from individual sentences and across multiple sentences in a document
GRGT [25] | See paper | Extraction of PPI pairs from individual sentences along with interaction term(s)
Method by Fan et al. [33] | See paper | Extraction of PPIm pairs
MNM [34] | See paper | Extraction of PPIm pairs
OntoILPER [39] | See paper | Extraction of PPI pairs from individual sentences
PIPE [37] | See paper | Extraction of PPI pairs from individual sentences
PKDE4J [40] | http://informatics.yonsei.ac.kr/pkde4j/ | Extraction of PPI pairs from individual sentences
PPICurator [36] | https://ppicurator.hupo.org.cn | Generation and visualization of PPI interaction networks from binary relations
PPIm extraction method by Chen et al. [31] | https://bitbucket.org/readbiomed/biocreative-vi | Extraction of PPIm pairs
RNN/CNN method [27] | See paper | Extraction of PPI pairs from individual sentences
Variational autoencoder method [28] | See paper | Extraction of PPI pairs from individual sentences

Extraction of PPI directly from biomedical literature is the intended purpose of several recently described strategies. A comparison of the F1 scores reported by the authors of these methods for various PPI test corpora is provided in Table 2. We note that this single statistic is an approximation of accuracy and that results vary with optimization (e.g. hyperparameter tuning) for specific data. Kernel methods have proven to be particularly effective for the task. The Distributed Smoothed Tree kernel (DSTK) [24] builds on previously demonstrated linear and tree kernel methods by combining representations of both syntactic and semantic information, then merges the results with a feature-based kernel. Recent approaches have also seen advances through the representation of grammatical relationships as graph structures. GRGT [25] is a Grammatical Relationship Graph for Triplets designed to extract terms, relationships, and relationship types. GRGT assumes each pair of interactions is a triplet, then assembles a graph of these triplets and makes predictions on candidate triplets using a decision tree model. A related method employing graph kernels for RE [26] offers an alternative, though its authors note it likely requires parameter tuning for optimal results. In an RNN/CNN (recurrent neural network and convolutional neural network) model for RE [27], a dependency graph and the shortest dependency path (SDP) are generated for each candidate sentence. The SDP is treated as both a sequence of word dependencies and as a sequence of relations. RNNs and CNNs then learn the features of both the relations and word dependencies, eventually using them to output features for RE. Though the resulting hybrid model performs slightly better than a bi-LSTM (i.e. bidirectional long short-term memory) model alone on PPI sets, and especially well on the LLL and IEPA sets, it has difficulty with complex semantic structures where entities are far apart and/or in different clauses or co-ordinate structures. Finally, a variational autoencoder method inspired by successful approaches in computer vision has shown some intriguing preliminary results [28].
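Because the comparison of these methods rests on F1 scores, a brief sketch of how the statistic is derived may be useful. F1 is the harmonic mean of precision and recall. The protein pairs below are placeholders, and the function names are our own; individual benchmarks may differ in how they count candidate pairs, so this is an illustration of the formula rather than of any specific evaluation protocol.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}

def precision_recall_f1(predicted_pairs, gold_pairs):
    """Compute precision, recall and F1 for predicted versus gold-standard pairs."""
    predicted, gold = normalize(predicted_pairs), normalize(gold_pairs)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Placeholder pairs, for illustration only.
gold = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
predicted = [("B", "A"), ("A", "C"), ("A", "D")]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# precision=0.667 recall=0.500 F1=0.571
```

In this example two of three predictions are correct and two of four true interactions are recovered, so a method with reasonable precision can still be penalized for missing interactions.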

Table 2
Results reported for PPI extraction method benchmarks
Best reported F1 score for each corpus
Name of method | AIMed | BioInfer | HPRD50 | IEPA | LLL
Deep CNN for PPI extraction [29] | 85.2 | NR | NR | NR | NR
DSTK [24] | 71.0 | 76.29 | 80.0 | 80.2 | 89.2
Graph kernels for RE [26] | 43.8 | 40.4 | 69.7 | 59.6 | 72.3
GRGT [25] | NR | NR | 64.0 | 74.9 | 83.6
OntoILPER [39] | NR | NR | 75.3 | 76.1 | 79.9
PIPE [37] | 60.6 | 69.4 | 71.5 | 71.4 | 80.6
PKDE4J [40] | 74.0 | 83.0 | 79.0 | 81.0 | NR
RNN/CNN method [27] | 61.7 | 64.8 | 75.6 | 78.2 | 85.2
Variational autoencoder method [28] | NR | 62.2* | NR | NR | NR

Highest score for each test corpus shown in bold. NR, not reported.

* Zhang and Lu use a subset of 4000 of the 7666 samples in BioInfer.

Approaches based upon statistical representations of the contexts of words (i.e. word embeddings) are emerging as solid options for PPI extraction. A deep CNN for PPI extraction [29] learns word embeddings between protein names and generates both sentence feature and linguistic feature vectors. It then combines them into a global feature vector to generate a confidence value for whether two proteins interact. The resulting model appears to yield solid results on the AIMed test set, but its performance is variable. Though focused more on biocuration via document classification and experimental method identification, a method recently described by Burns et al. [30] also employs several different word embedding architectures for processing interaction evidence.

BioCreative VI introduced many biomedical information extraction researchers to the problem of identifying PPI impacted by mutations, or PPIm. This task adds complexity to the challenge of isolating PPI alone as methods must also consider the presence of a related gene and its relationship to the interacting proteins. An approach by Chen et al. [31] is relatively straightforward: following interactor name extraction and normalization, the authors use co-occurrence and a support vector machine for RE. One deep learning method involving gene normalization [32] addresses the task by generating word representations with a CNN-LSTM hybrid model for named entity recognition, normalizing gene names against NCBI Gene, then extracting PPIm sets with a CNN model trained on the same word embeddings as for entity recognition. An approach by Fan et al. [33] includes a supervised model (i.e. a support vector machine-based classifier) and incorporates pre-built word vectors for dependency tree features and some additional dependencies, along with negation. A memory network-based model [34] has also been applied to PPIm extraction: the proposed model first encodes triples (head entity, relation and tail entity) in knowledge bases (KBs) into a continuous vector space, at which point a knowledge representation is learned for each entity and relation. The learned knowledge representations are then introduced into the memory network through attention mechanisms to capture important context clues about each entity pair. Further approaches can be found in the results of the BioCreative VI Precision Medicine track [23] and in the manually curated International Molecular Exchange (IMEx) Consortium mutations data set [35]. In the latter case, curators have identified nearly 28 000 instances where point mutations impact physical interactions; the set, therefore, can serve as training or validation data for PPI extraction methods.

Few platforms or comprehensive frameworks for the analysis of PPIs extracted from the text have been released in recent years, though the few examples are very promising. PPICurator [36] is a web-based tool built on a support vector machine (SVM) classifier. It determines confidence scores, interaction direction, and other annotations on PPIs and provides visualization of the resulting interaction networks. The web tool focuses on PubMed abstracts. PIPE [37], which its authors describe as an interaction pattern tree kernel approach, combines an ‘interaction pattern generation module’ with a tree kernel method and is intended to assist PPI database curators. Other examples of PPI curation approaches can be seen in the responses to the BioCreative V Collaborative Biocurator Assistant Task [22].

Though a focus on the exact details particular to PPI (e.g. protein names or phrases denoting interaction types) should logically provide more accurate extraction results, domain-agnostic methods and those developed for general biomedical purposes have shown promise. The biomedical inference engine DeepDive [38] performs distant supervision to identify entities and relations, following user-definition and training with an initial example set. Uniquely, DeepDive provides a mechanism for training and evaluating results based on existing PPI data, of which there is plenty, and is designed for use with full texts. In practice, the system identifies interactions by co-occurrence within a sentence, though DeepDive also uses CRFs to link candidate relations to each mention of a protein pair. The domain-independent Inductive Logic Programming-based OntoILPER [39] approach yields impressive results on PPI test corpora. Similarly, Song et al. [40] have tested their Public Knowledge Discovery Tool (PKDE4J) on several PPI corpora with reported results superior to several task-focused strategies. These general approaches could see a noticeable improvement with the incorporation of additional PPI-specific design considerations.

PPI extraction is subject to a variety of shortcomings which may limit the wider application of automated approaches. Specific challenges facing PPIm extraction, particularly regarding limitations of current biomedical natural language processing (NLP) tools and BioCreative tasks, were recently addressed in great detail by Chen et al. [41]. Named entity recognition and entity name normalization continue to be limiting factors for nearly all methods, as a failure to detect a protein in the text will generally cause its interactions to be missed as well. Several of the kernel-based approaches, in particular, may be challenged by complex sentences, e.g. those with four or more protein names. A comprehensive syntactic representation may be critical for the identification of PPIs with one–many, many–many, and many–one relationships. Several of the above methods, though capable of high precision in PPI extraction, demonstrate low recall. This may be evidence of limitations in protein name recognition or in the identification of semantic features denoting interactions. For PPIm extraction, methods are challenged by false positives, especially in cases where the text describes a PPI but no mutation affecting it. This may be the result of semantic ambiguity within the input text.

New strategies for integrating and comparing PPI data

Once candidate PPIs have been extracted, they often must be compared with other interaction networks, whether directly from experimental observations or from other text sources. Extracted PPIs should be compared with those currently in databases: BioGRID [3], IntAct [4], and other members of the IMEx Consortium are prominent examples. The 10 organizations within IMEx use the IntAct platform and the standards developed by the Molecular interactions work group of the HUPO Proteomics Standards Initiative (PSI-MI) [42,43] to curate and produce high-quality interaction data. When applied uniformly, the PSI-MI standards support the consistent integration of PPIs. Other integrated databases, such as STRING [44], demonstrate how PPI may be further integrated with a variety of predicted interactions, including those from automated text mining.

Comparing PPIs extracted from multiple document sets and/or those within databases can be a laborious task without computational assistance. Conveniently, we may take advantage of both each network's structure and the properties of its interacting proteins to identify similar subnetworks (Figure 2B). See Table 3 for the full list of methods and any corresponding locations of their code or other resources. If protein names are reasonably consistent, yet they originate from multiple species, a basic strategy is to combine them on the basis of orthology (e.g. with eggNOG orthologous groups [45]) to produce a cross-species interaction network (Figure 2A). In the case of multiple networks containing orthologous proteins, their interactions may be considered interologs. This approach requires proteins to be mapped to a consistent set of identifiers, however, and may improperly combine proteins of similar sequence but different function (e.g. 14-3-3 proteins, as detailed further below). INTERSPIA [46] is a tool and visualization platform for comparing PPI patterns across different species. IMMAN is an R/Bioconductor package for interolog protein network reconstruction, mapping, and mining analysis [47]; it provides additional options for quickly comparing multiple PPI networks based on their orthology.
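The orthology-based strategy can be sketched as follows. This is a simplified illustration, not the procedure used by INTERSPIA or IMMAN: the protein identifiers and orthologous group labels are hypothetical (they are not eggNOG identifiers), and proteins lacking a group assignment are simply skipped.

```python
def merge_by_orthology(networks, orthologous_group):
    """Combine species-specific PPI edge lists into one network of orthologous groups.

    networks: list of edge lists, each edge a (protein_a, protein_b) tuple.
    orthologous_group: dict mapping protein identifier -> group identifier.
    Returns a dict mapping each group-level edge to the indices of the
    input networks that support it.
    """
    merged = {}
    for network_index, edges in enumerate(networks):
        for protein_a, protein_b in edges:
            group_a = orthologous_group.get(protein_a)
            group_b = orthologous_group.get(protein_b)
            if group_a is None or group_b is None:
                continue  # unmapped proteins cannot be placed in the merged network
            edge = frozenset((group_a, group_b))
            merged.setdefault(edge, set()).add(network_index)
    return merged

# Hypothetical identifiers and group assignments, for illustration only.
species_1 = [("s1_A", "s1_B"), ("s1_B", "s1_C")]
species_2 = [("s2_A", "s2_B"), ("s2_B", "s2_X")]
groups = {"s1_A": "OG1", "s2_A": "OG1", "s1_B": "OG2", "s2_B": "OG2", "s1_C": "OG3"}

merged = merge_by_orthology([species_1, species_2], groups)
# Edges supported by more than one network are candidate interologs.
interologs = [edge for edge, support in merged.items() if len(support) > 1]
print(interologs)
```

Note that two paralogs assigned to the same group collapse into a single node, which is precisely how proteins of similar sequence but different function can be improperly combined.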


Figure 2.
Comparing extracted PPI networks based on orthology and low-level network features.

(A) Network comparison by orthology. Proteins belonging to the same orthologous group (OG) may be considered identical within a new, combined network. (B) Network comparison by topology. Two networks may share structural similarities, e.g. the same protein may demonstrate a similar pattern of interactions.


Table 3
Demonstrations and code of PPI network integration/comparison methods and other resources discussed in this review
Name of method | Publicly available code and/or resource | Scope
AligNet [55] | http://bioinfo.uib.es/~recerca/AligNet/ | PPI network alignment
CPredictor 5.0 [65] | See paper | Detection of complexes within PPI networks
CrowdCure [63] | See paper | Platform for crowdsourced curation of PPI
deepNF [50] | https://github.com/Vgligorijevic/deepNF | PPI network integration
Gu et al. graphlet method [53] | https://nd.edu/~cone/colored_graphlets/ | Biological interaction network alignment and integration
IBNAL [52] | http://www.cs.uccs.edu/~linclab/IBNAL/Documentation.html | PPI network alignment
IMMAN [47] | https://bioconductor.org/packages/release/bioc/html/IMMAN.html | PPI network integration across species
INTERSPIA [46] | http://bioinfo.konkuk.ac.kr/INTERSPIA/ | PPI network integration across species; network visualization
LocText [67] | https://www.tagtog.net/-corpora/LocText | Text corpus of abstracts manually annotated for proteins, interactions, and subcellular location
LTOM [68] | See paper | Detection of complexes within PPI networks
Mashup [49] | http://mashup.csail.mit.edu/ | PPI network integration across species
MELODI [64] | http://www.melodi.biocompute.org.uk/ | Biomedical hypothesis generation based on relationships including PPI
MetaBridge [74] | https://www.metabridge.org/ | Generation of PPI networks from metabolomics data
MultiMAGNA++ [54] | http://nd.edu/~cone/multiMAGNA++/ | Biological interaction network alignment
Organization Measurement [69] | See paper | Removing noise and finding missing interactions in PPI networks
PL-PPF [71] | http://ecesrvr.kustar.ac.ae:8080/plppf/ | Protein function prediction based on PPI and other features
PTMOracle [72] | Available as App for Cytoscape v.3.1 or higher | Analysis and visualization of post-translational modifications within PPI networks
PURPOSE [70] | http://rebrand.ly/proteinpurpose | Protein prioritization and visualization based on relationship to organ or disease
Vicus Matrix [66] | See paper | Representation of local structures in networks, including modules within PPI networks

More recently, graph embedding approaches have opened up new options for comparing and combining interaction networks (see Su et al. [48] for a recent review of biomedical network embedding methods). Producing vector relationships is immediately informative for PPI networks and has been incorporated within several projects. Mashup [49] is a framework for integrating interaction networks from multiple, heterogeneous data sources. It learns low-dimensional vector representation of genes and proteins within each network, eventually yielding compact representations of network topology and retaining the most informative features. The deepNF method [50] performs network integration based on multimodal deep autoencoders. Like Mashup, deepNF produces a vector representation of proteins in PPI networks, though it includes some adaptations to account for complex and non-linear network structure. An intensive approach developed by Wang et al. [51] also supports learning from PPI and other data sources, though with the goal of annotating and organizing genes by their function.

Comparing and combining PPI networks can be considered an application of network alignment, so methods developed for this purpose are generally applicable. IBNAL, an Indexes-Based Network Aligner [52], is designed to account for highly connected protein subnetworks and the value of local network topology in predicting protein function similarity. With that in mind, IBNAL produces very rapid PPI network alignments. A broader set of cases of heterogeneous network alignment is addressed by a graphlet-based method by Gu et al. [53]. The authors intend their method to solve the problem of integrating networks containing both different types of interactors and interactions, i.e. protein vs. protein and protein vs. disease phenotype. (The authors note that multimodal network alignment, a related problem where networks of the same nodes have edges of different types, has been explored, though only recently.) The same research group produced MultiMAGNA++ [54], a method for Multiple Network Alignment (MNA), or alignment of more than two networks at a time. This method is not entirely specific to PPI networks methodologically, but the authors test MultiMAGNA++ using multi-species PPI networks from BioGRID. The authors of the recently described AligNet [55] seek to produce both topologically consistent and biologically meaningful PPI network alignments. They do so by generating numerous local alignments, then combining them into a global alignment.
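To make the notion of a topologically consistent alignment concrete, the following minimal Python sketch computes edge correctness, a commonly used measure defined as the fraction of interactions in one network that are preserved in the other under a given node mapping. It is not drawn from any of the tools described above; the function name, the toy networks, and the protein labels are all hypothetical.

```python
def edge_correctness(edges_a, edges_b, mapping):
    """Fraction of edges in network A that map onto an edge in network B.

    edges_a, edges_b: lists of (node, node) pairs, treated as undirected.
    mapping: dict sending each aligned node of A to a node of B.
    """
    edges_b_set = {frozenset(edge) for edge in edges_b}
    conserved = 0
    for u, v in edges_a:
        # An edge counts as conserved only if both endpoints are aligned
        # and their images interact in network B.
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in edges_b_set:
                conserved += 1
    return conserved / len(edges_a) if edges_a else 0.0


# Toy networks with hypothetical protein labels.
network_a = [("A1", "A2"), ("A2", "A3"), ("A1", "A3")]
network_b = [("B1", "B2"), ("B2", "B3")]
alignment = {"A1": "B1", "A2": "B2", "A3": "B3"}

score = edge_correctness(network_a, network_b, alignment)
print(f"Edge correctness: {score:.2f}")
```

Here two of the three interactions in the first network are preserved by the mapping. A purely topological score of this kind says nothing about whether aligned proteins are functionally related, which is why the methods above also weigh biological similarity.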

Making sense of massive PPI collections

The PPI extraction methods described above, though generally designed and tested with relatively small test corpora and well-defined annotations, should ideally be applied to much larger sets of biomedical text. Source articles may be any subset of the more than 19.5 million abstracts on PubMed or the 5.5 million full-text articles in PubMed Central, both retrievable through the NCBI E-utilities [56]. Increasingly, pre-print servers such as bioRxiv [57] are also sources of documents describing protein interactions. In practice, working with unlabeled, largely unstructured corpora containing hundreds of thousands or even millions of documents requires steps beyond RE alone: each document must be processed to a consistent format, then specific proteins must be identified (i.e. with named entity recognition) and normalized (often through consistent database identifiers and/or ontologies). A formal metadata extraction approach, as we have defined for clinical documents [58,59], is advisable. Methods designed for use with abstracts as input will require re-training at a minimum if they are applied to full-text documents [60]. Some of the process may be bypassed through the use of annotation services and datasets such as PubTator [20] or SemMedDB [61]: these sets contain pre-annotated entities and relations, respectively, partially alleviating the burden of labeling individual proteins or potential interactions. Each PPI extraction approach requires some degree of algorithmic ‘glue’ to permit its assembly into a fully functional pipeline. Concerns specific to PPI extraction have been previously explored [5], including as part of an analysis of 15 million full-text research articles [62].
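As a small illustration of the first step in such a pipeline, the sketch below builds a PubMed search request for the E-utilities ESearch endpoint and reads PubMed IDs out of a JSON response. It is a sketch under stated assumptions rather than a complete retrieval client: the helper names are hypothetical, the sample response is made up so that the example runs without network access, and real use should follow NCBI's usage guidelines (rate limits, API keys).

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Return an ESearch URL requesting JSON output for a query term."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def parse_esearch_ids(response_text):
    """Extract the list of record IDs from an ESearch JSON response."""
    payload = json.loads(response_text)
    return payload["esearchresult"]["idlist"]


# Made-up response body with placeholder IDs, used instead of a live request.
SAMPLE_RESPONSE = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'

url = build_esearch_url("14-3-3 sigma")
ids = parse_esearch_ids(SAMPLE_RESPONSE)
print(url)
print(ids)
```

The returned identifiers would then be passed to a fetch step, followed by the formatting, entity recognition, and normalization stages described above.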

Following collection and comparison of a number of potential PPIs from a text corpus, the logical next step is to integrate these PPIs — preferably in network form — with other sources of functional data and methods for identifying informative network properties. Recent years have been an active time for work in this area; see Table 3 for the full list of methods and any corresponding locations of their code or other resources. Irrespective of the biological phenomena of interest, the most informative strategy is likely to involve an entire workflow of tools.

To date, no group has developed a system for external validation of PPI data (or most automatically curated biomedical data, for that matter) superior to human curation. Crowdsourced curation may be performed through the CrowdCure system [63]: as its name suggests, CrowdCure is designed for use by a set of many curators and includes both accommodations for bias and uncertainty and a customized interface for users to select the types of interactions they are most comfortable curating. In the absence of a set of expert curators, however, the literature itself may be used as another source of validation. The web application MELODI [64] can be used for this purpose as it supports the identification of concept similarities across biomedical articles. Though MELODI is not PPI-specific, it uses a graph database pre-populated with biomedical relationships from PubMed articles, MeSH terms and SemMedDB triples [61], with the goal of identifying mechanisms linking two sets of articles (e.g. relationships beyond those between sets of proteins).

A PPI network is a rich resource on its own, capable of providing support for the existence of biologically relevant complexes, modules, and functions. The CPredictor 5.0 method recently proposed for detecting protein complexes [65] combines topological properties of PPI networks with their membership in functional groups to yield subgraphs potentially corresponding to complexes. The Vicus Matrix [66], a variant of the Laplacian matrix, has also been applied to a small exploration of clustering within a PPI network. LocText [67] and LTOM [68] identify similarities among PPIs based on protein co-localization, or the spatio-temporal ability of two proteins to interact (i.e. the interactors must occupy similar cellular locations). The LocText authors also raise the important consideration that mined results may be from retracted sources (a concern applicable to all methods described here), so expert verification remains crucial. Finally, as PPI networks from any source are inherently noisy, the recently described Organization Measurement method [69] may be employed to remove noise and identify missing interactions.

Integration of extracted PPIs with multiple sources of –omics data may provide crucial biomolecular context. A list of potential protein interactors can be prioritized for further study using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores [70], and the functions of its members may be predicted with PL-PPF, a system for inferring protein function from explicitly and implicitly stated relationships, including PPIs [71]. Post-translational modifications (PTMs) may be investigated within PPI networks with the Cytoscape app PTMOracle [72] and with a method for in silico proteomics of PPIs dependent on PTMs [73]. MetaBridge [74] extends these options to integration between metabolomics data and interaction networks.

Extracting PPI for a protein family

The 14-3-3 protein family is a group of broadly multifunctional, highly evolutionarily conserved proteins with roles in cell cycle control, apoptosis, neurotransmitter synthesis, protein stability, and other functions [75,76]. Beyond its many biological functions, this protein family provides illustrative examples of the challenges researchers may encounter in studying PPIs among entire protein families. Briefly, the 14-3-3 family includes seven isoforms conserved throughout mammalian species and beyond, though each isoform appears to participate in a distinct but at times redundant set of PPIs and mechanisms. To further complicate matters, 14-3-3 isoforms assemble into heterodimers. Their roles as interaction hubs in crucial biological processes highlight the value of 14-3-3 proteins as potential therapeutic targets, as does their high conservation. The complexity of their interactions, unfortunately, renders them difficult to study comprehensively (see review by Pennington et al. [77]).

We can hypothesize that automated PPI extraction would assist with the identification of specific 14-3-3 PPIs, particularly in instances where different research fields have studied similar interaction contexts. We immediately face technical challenges upon pursuing this strategy. Within text, 14-3-3 isoforms may be referred to in a variety of manners, including by their collective name (e.g. ‘14-3-3 proteins recognize phosphorylated TFEB’ [78]), by their specific isoform (e.g. ‘14-3-3 sigma’, ‘14-3-3σ’, or its alternative name, ‘stratifin’), or by gene names (e.g. ‘YWHAZ’). Pre-processing steps may inconsistently normalize these identifiers or overlook them entirely, even for reasons as simple as converting input text between Unicode and ASCII formats. Though these challenges apply to most proteins, and in fact may be a major factor limiting RE methods [41], protein families such as 14-3-3 risk having the individual identities of their constituent members lost during both manual and automated interaction extraction, particularly because their name does not resemble those of most protein families.
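The effect of these naming variants can be sketched in a few lines of Python. The example below is illustrative only: the alias table is hand-written and deliberately incomplete, the gene symbols (SFN for 14-3-3 sigma/stratifin, YWHAZ for 14-3-3 zeta) are supplied as our assumption rather than taken from a curated resource, and the function and variable names are hypothetical. It shows both a simple normalization step and how a naive Unicode-to-ASCII conversion discards the isoform entirely.

```python
import re
import unicodedata

# Illustrative, incomplete lookup tables (assumptions for this sketch only).
GREEK_TO_NAME = {"σ": "sigma", "ζ": "zeta"}
ALIAS_TO_GENE = {
    "14-3-3 sigma": "SFN",
    "stratifin": "SFN",
    "sfn": "SFN",
    "14-3-3 zeta": "YWHAZ",
    "ywhaz": "YWHAZ",
}
FAMILY_LABEL = "14-3-3 (family, isoform unspecified)"


def normalize_mention(mention):
    """Map a textual mention to a gene symbol, a family label, or None."""
    text = unicodedata.normalize("NFKC", mention).strip().lower()
    # Spell out Greek letters so '14-3-3σ' and '14-3-3 sigma' converge.
    for greek, name in GREEK_TO_NAME.items():
        text = text.replace(greek, " " + name)
    text = re.sub(r"\s+", " ", text)
    # Drop a trailing generic word such as 'protein' or 'proteins'.
    text = re.sub(r"\s+proteins?$", "", text)
    if text in ALIAS_TO_GENE:
        return ALIAS_TO_GENE[text]
    if text == "14-3-3":
        return FAMILY_LABEL
    return None


mentions = ["14-3-3σ", "14-3-3 sigma protein", "Stratifin", "YWHAZ", "14-3-3 proteins"]
normalized = {m: normalize_mention(m) for m in mentions}

# A lossy ASCII conversion silently removes the Greek letter.
ascii_only = "14-3-3σ".encode("ascii", "ignore").decode("ascii")
after_ascii = normalize_mention(ascii_only)

for mention, result in normalized.items():
    print(f"{mention!r} -> {result}")
print(f"{ascii_only!r} -> {after_ascii}")
```

After the lossy conversion, the mention can only be resolved to the family as a whole, which is exactly the loss of isoform identity described above.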

Without further curation, relations may contain unresolvable ambiguities. Results from PPICurator, for one example, include both ‘14-3-3 sigma’ and ‘14-3-3 protein’ (Figure 3). 14-3-3 proteins are also poorly covered within PPI test corpora: they are annotated just six times out of 225 documents within the AIMed interaction corpus [16] and not at all within the LLL [17], BioInfer [18], HPRD50 [19], or IEPA [15] corpora. The relatively small size of these corpora may contribute to the omission of this — and likely other — protein families. We, therefore, suggest that, while testing new PPI extraction methods is clearly an essential part of their development, demonstrating their ability to isolate interactions of biologically meaningful protein families may be more informative regarding their applicability to biomedical research.

Figure 3.
Example of PPI extraction results.

PPICurator [36] results for a query of ‘14-3-3 sigma’, after selecting the first 50 document results and filtering for interactions with a confidence score above 0.9. This results in 11 PPI across nine source documents. Note that the target protein appears as ‘14-3-3 sigma’ and as ‘14-3-3 sigma protein’.

To assist with future testing purposes, we assembled a set of five text segments, each containing at least one PPI involving a 14-3-3 protein. All source text is from the PubMed Central open access, Creative Commons subset (i.e. CC BY licensed) with documents identified by their PubMed IDs. We manually labeled the PPI through the brat tool [79] and converted the annotations to BioC format [80]; the annotated files are available on Figshare (https://doi.org/10.6084/m9.figshare.8847941). We intend this set to be a minimal example of the full extent of 14-3-3 PPI described through free, unstructured text within the biomedical literature.
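For readers unfamiliar with brat output, the following minimal sketch reads entity and relation lines in brat's standoff format and pairs each relation with the text of its two arguments. It is not the conversion procedure used for the set described above: the sample sentence is the one quoted earlier from [78], while the annotation lines, the labels ‘Protein’ and ‘Interaction’, and the function name are hypothetical. Only contiguous entity spans and binary relations are handled.

```python
SAMPLE_TEXT = "14-3-3 proteins recognize phosphorylated TFEB"

# Hypothetical standoff annotations: T lines are text-bound entities
# (label, start offset, end offset, covered text); R lines are relations.
SAMPLE_ANN = (
    "T1\tProtein 0 15\t14-3-3 proteins\n"
    "T2\tProtein 41 45\tTFEB\n"
    "R1\tInteraction Arg1:T1 Arg2:T2\n"
)


def parse_brat(ann_text):
    """Parse entity (T) and relation (R) lines of a brat .ann file."""
    entities = {}
    relations = []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Contiguous spans only: "Label start end".
            label, start, end = fields[1].split(" ")
            entities[ann_id] = {
                "type": label,
                "start": int(start),
                "end": int(end),
                "text": fields[2],
            }
        elif ann_id.startswith("R"):
            parts = fields[1].split()
            args = dict(part.split(":", 1) for part in parts[1:])
            relations.append((parts[0], args["Arg1"], args["Arg2"]))
    return entities, relations


entities, relations = parse_brat(SAMPLE_ANN)
pairs = [
    (rel_type, entities[a]["text"], entities[b]["text"])
    for rel_type, a, b in relations
]
print(pairs)
```

Checking each entity's offsets against the source text, as in `SAMPLE_TEXT[start:end]`, is a cheap way to catch misaligned annotations before converting them to another format such as BioC.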

Summary

  • Automatic extraction of protein–protein interactions from unstructured text renders disparate observations into a coherent dataset.

  • Though traditional approaches to PPI extraction rely upon carefully engineered rules and isolation of specific grammatical constructs, new methods increasingly leverage machine learning and deep learning to achieve the same goals.

  • Once PPI sets have been identified, they must be compared with other results from a variety of sources of varying degrees of structure and noise.

  • Making sense of large PPI sets remains a challenging prospect and success will depend upon the use case.

  • Highly multifunctional protein families such as 14-3-3 are ideal test sets for evaluating PPI methods in terms of their technical limitations and capability to provide informative value.

Abbreviations

     
  • CNN: convolutional neural network

  • DSTK: Distributed Smoothed Tree kernel

  • GRGT: Grammatical Relationship Graph for Triplets

  • IMEx: International Molecular Exchange

  • KB: knowledge base

  • LSTM: long short-term memory

  • MNA: Multiple Network Alignment

  • NLP: natural language processing

  • PPI: protein–protein interaction

  • PPIm: protein–protein interaction impacted by mutation

  • PTMs: post-translational modifications

  • RE: relation extraction

  • RNN: recurrent neural network

  • SDP: shortest dependency path

Author Contribution

J.H.C. annotated data, planned the article, and wrote the article. P.P. supervised and reviewed the article.

Funding

This work was supported in part by National Heart, Lung, and Blood Institute: R35 HL135772 (to P.P.); National Institute of General Medical Sciences: U54 GM114833 (to P.P.), and the T.C. Laubisch endowment at UCLA (to P.P.).

Acknowledgements

The authors thank Henning Hermjakob for assistance in improving the manuscript.

Competing Interests

The Authors declare that there are no competing interests associated with the manuscript.

References

1 Schwikowski, B., Uetz, P. and Fields, S. (2000) A network of protein-protein interactions in yeast. Nat. Biotechnol. 18, 1257–1261
2 Rolland, T., Taşan, M., Charloteaux, B., Pevzner, S.J., Zhong, Q., Sahni, N. et al. (2014) A proteome-scale map of the human interactome network. Cell 159, 1212–1226
3 Oughtred, R., Stark, C., Breitkreutz, B.-J., Rust, J., Boucher, L., Chang, C. et al. (2019) The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541
4 Orchard, S., Ammari, M., Aranda, B., Breuza, L., Briganti, L., Broackes-Carter, F. et al. (2014) The MIntAct project - IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363
5 Papanikolaou, N., Pavlopoulos, G.A., Theodosiou, T. and Iliopoulos, I. (2015) Protein-protein interaction predictions using text mining methods. Methods 74, 47–53
6 Proux, D., Rechenmann, F. and Julliard, L. (2000) A Pragmatic Information Extraction Strategy for gathering Data on Genetic Interactions. Proceedings of the International Conference on Intelligent Systems for Molecular Biology, American Association for Artificial Intelligence, pp. 279–285
7 Blaschke, C., Andrade, M.A., Ouzounis, C. and Valencia, A. (1999) Automatic extraction of biological information from scientific text: protein-protein interactions. Proceedings of the International Conference on Intelligent Systems for Molecular Biology, American Association for Artificial Intelligence, pp. 60–67
8 Hao, Y., Zhu, X., Huang, M. and Li, M. (2005) Discovering patterns to extract protein-protein interactions from the literature: Part II. Bioinformatics 21, 3294–3300
9 Yakushiji, A., Tateisi, Y., Miyao, Y. and Tsujii, J. (2001) Event extraction from biomedical papers using a full parser. Pac. Symp. Biocomput. 2001, 408–419
10 Rindflesch, T.C. and Fiszman, M. (2003) The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J. Biomed. Inform. 36, 462–477
11 Nguyen, N.T.H., Miwa, M., Tsuruoka, Y., Chikayama, T. and Tojo, S. (2015) Wide-coverage relation extraction from MEDLINE using deep syntax. BMC Bioinformatics 16, 107
12 He, M., Wang, Y. and Li, W. (2009) PPI finder: a mining tool for human protein-protein interactions. PLoS ONE 4, e4554
13 Hakenberg, J., Leaman, R., Vo, N.H., Jonnalagadda, S., Sullivan, R., Miller, C. et al. (2010) Efficient extraction of protein-protein interactions from full-text articles. IEEE/ACM Trans. Comput. Biol. Bioinform. 7, 481–494
14 Van Landeghem, S., Saeys, Y., De Baets, B. and Van de Peer, Y. (2008) Extracting protein-protein interactions from text using rich feature vectors and feature selection. MBM ‘08: Proceedings of the Third Symposium on Semantic Mining in Biomedicine, Turku, Finland: Turku Centre for Computer Sciences (TUCS), pp. 77–84
15 Ding, J., Berleant, D., Nettleton, D. and Wurtele, E. (2002) Mining MEDLINE: abstracts, sentences, or phrases? Pac. Symp. Biocomput. 2002, 326–337
16 Bunescu, R., Ge, R., Kate, R.J., Marcotte, E.M., Mooney, R.J., Ramani, A.K. et al. (2005) Comparative experiments on learning information extractors for proteins and their interactions. Artif. Intell. Med. 33, 139–155
17 Nédellec, C. (2005) Learning language in logic - genic interaction extraction challenge. Proceedings of the 4th Learning Language in Logic Workshop (LLL05), Association for Computing Machinery, pp. 31–37
18 Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J. et al. (2007) Bioinfer: a corpus for information extraction in the biomedical domain. BMC Bioinformatics 8, 50
19 Fundel, K., Küffner, R. and Zimmer, R. (2007) Relex–relation extraction using dependency parse trees. Bioinformatics 23, 365–371
20 Wei, C.-H., Allot, A., Leaman, R. and Lu, Z. (2019) Pubtator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res. 47, W587–W593
21 Leitner, F., Mardis, S.A., Krallinger, M., Cesareni, G., Hirschman, L.A. and Valencia, A. (2010) An overview of BioCreative II.5. IEEE/ACM Trans. Comput. Biol. Bioinform. 7, 385–399
22 Krallinger, M., Pérez-Pérez, M., Pérez-Rodriguez, G., Blanco-Miguez, A., Fdez-Riverola, F., Capella-Gutierrez, S. et al. (2017) The BioCreative V.5 evaluation workshop: tasks, organization, sessions and topics. Proceedings of the BioCreative V5 Challenge Evaluation Workshop, CNIO Centro Nacional de Investigaciones Oncológicas, pp. 8–10
23 Doğan, R.I., Kim, S., Chatr-aryamontri, A., Wei, C.-H., Comeau, D.C., Antunes, R. et al. (2019) Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine. Database 2019, bay147
24 Murugesan, G., Abdulkadhar, S. and Natarajan, J. (2017) Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature. PLoS ONE 12, e0187379
25 Yu, K., Lung, P.-Y., Zhao, T., Zhao, P., Tseng, Y.-Y. and Zhang, J. (2018) Automatic extraction of protein-protein interactions using grammatical relationship graph. BMC Med. Inform. Decis. Mak. 18, 42
26 Panyam, N.C., Verspoor, K., Cohn, T. and Ramamohanarao, K. (2018) Exploiting graph kernels for high performance biomedical relation extraction. J. Biomed. Semantics 9, 7
27 Zhang, Y., Lin, H., Yang, Z., Wang, J., Zhang, S., Sun, Y. et al. (2018) A hybrid model based on neural networks for biomedical relation extraction. J. Biomed. Inform. 81, 83–92
28 Zhang, Y. and Lu, Z. (2019) Exploring semi-supervised variational autoencoders for biomedical relation extraction. Methods
29 Choi, S.-P. (2018) Extraction of protein–protein interactions (PPIs) from the literature by deep convolutional neural networks with various feature embeddings. J. Inf. Sci. 44, 60–73
30 Burns, G.A., Li, X. and Peng, N. (2019) Building deep learning models for evidence classification from the open access biomedical literature. Database 2019, baz034
31 Chen, Q., Chandrasekarasastry, N.P., Elangovan, A., Davis, M. and Verspoor, K.M. (2017) Document Triage and Relation Extraction for Protein-Protein Interactions affected by Mutations. Proceedings of the BioCreative VI Workshop 2017, Association for Computing Machinery
32 Tran, T. and Kavuluru, R. (2018) An end-to-end deep learning architecture for extracting protein-protein interactions affected by genetic mutations. Database 2018, 1–13
33 Fan, Z., Soldaini, L., Cohan, A. and Goharian, N. (2018) Relation Extraction for Protein-protein Interactions Affected by Mutations. Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB ‘18, ACM Press, New York, New York, U.S.A., pp. 506–507
34 Zhou, H., Liu, Z., Ning, S., Yang, Y., Lang, C., Lin, Y. et al. (2018) Leveraging prior knowledge for protein-protein interaction extraction with memory network. Database 2018, bay071
35 IMEx Consortium Curators, Del-Toro, N., Duesbury, M., Koch, M., Perfetto, L., Shrivastava, A. et al. (2019) Capturing variation impact on molecular interactions in the IMEx consortium mutations data set. Nat. Commun. 10, 10
36 Li, M., He, Q., Ma, J., He, F., Zhu, Y., Chang, C. et al. (2019) PPICurator: a tool for extracting comprehensive protein-protein interaction information. Proteomics 19, e1800291
37 Chang, Y.-C., Chu, C.-H., Su, Y.-C., Chen, C.C. and Hsu, W.-L. (2016) PIPE: a protein-protein interaction passage extraction module for bioCreative challenge. Database 2016, baw101
38 Mallory, E.K., Zhang, C., , C. and Altman, R.B. (2015) Large-scale extraction of gene interactions from full-text literature using DeepDive. Bioinformatics 32, btv476
39 Lima, R., Espinasse, B. and Freitas, F. (2019) A logic-based relational learning approach to relation extraction: the OntoILPER system. Eng. Appl. Artif. Intell. 78, 142–157
40 Song, M., Kim, M., Kang, K., Kim, Y.H. and Jeon, S. (2018) Application of public knowledge discovery tool (PKDE4J) to represent biomedical scientific knowledge. Front. Res. Metr. Anal. 3, 7
41 Chen, Q., Panyam, N.C., Elangovan, A. and Verspoor, K. (2018) Biocreative VI precision medicine track system performance is constrained by entity recognition and variations in corpus characteristics. Database 2018, bay122
42 Perfetto, L., Acencio, M.L., Bradley, G., Cesareni, G., Del Toro, N., Fazekas, D. et al. (2019) CausalTAB: the PSI-MITAB 2.8 updated format for signalling data representation and dissemination. Bioinformatics 2019, 1–7, btz132
43 Sivade, M., Alonso-López, D., Ammari, M., Bradley, G., Campbell, N.H., Ceol, A. et al. (2018) Encompassing new use cases - level 3.0 of the HUPO-PSI format for molecular interactions. BMC Bioinformatics 19, 134
44 Szklarczyk, D., Gable, A.L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J. et al. (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613
45 Huerta-Cepas, J., Szklarczyk, D., Forslund, K., Cook, H., Heller, D., Walter, M.C. et al. (2016) eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293
46 Kwon, D., Lee, D., Kim, J., Lee, J., Sim, M. and Kim, J. (2018) INTERSPIA: a web application for exploring the dynamics of protein-protein interactions among multiple species. Nucleic Acids Res. 46, W89–W94
47 Ashtiani, M., Nickchi, P., Jahangiri-Tazehkand, S., Safari, A., Mirzaie, M. and Jafari, M. (2019) IMMAN: an R/Bioconductor package for interolog protein network reconstruction, mapping and mining analysis. BMC Bioinformatics 20, 73
48 Su, C., Tong, J., Zhu, Y., Cui, P. and Wang, F. (2018) Network embedding in biomedical data science. Brief Bioinform., bby117
49 Cho, H., Berger, B. and Peng, J. (2016) Compact integration of multi-network topology for functional analysis of genes. Cell Syst. 3, 540–548.e5
50 Gligorijević, V., Barot, M. and Bonneau, R. (2018) deepNF: deep network fusion for protein function prediction. Bioinformatics 34, 3873–3881
51 Wang, S., Ma, J., Yu, M.K., Zheng, F., Huang, E.W., Han, J. et al. (2018) Annotating gene sets by mining large literature collections with protein networks. Pac. Symp. Biocomput. 23, 602–613
52 Elmsallati, A., Msalati, A. and Kalita, J. (2018) Index-based network aligner of protein-protein interaction networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 330–336
53 Gu, S., Johnson, J., Faisal, F.E. and Milenković, T. (2018) From homogeneous to heterogeneous network alignment via colored graphlets. Sci. Rep. 8, 12524
54 Vijayan, V. and Milenkovic, T. (2018) Multiple network alignment via multiMAGNA++. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 1669–1682
55 Alberich, R., Alcala, A., Llabrés, M., Rosselló, F. and Valiente, G. (2019) Alignet: alignment of protein-protein interaction networks. ArXiv 190207107 [Preprint]
56 Sayers, E. (2010) A General Introduction to the E-utilities. In Entrez Programming Utilities Help. National Center for Biotechnology Information (US), https://www.ncbi.nlm.nih.gov/books/NBK25497/
57 Cold Spring Harbor Laboratory. bioRxiv.org - the preprint server for Biology
58 Caufield, J.H., Liem, D.A., Garlid, A.O., Zhou, Y., Watson, K., Bui, A.A.T. et al. (2018) A metadata extraction approach for clinical case reports to enable advanced understanding of biomedical concepts. J. Vis. Exp. 139, e58392
59 Caufield, J.H., Zhou, Y., Garlid, A.O., Setty, S.P., Liem, D.A., Cao, Q. et al. (2018) A reference set of curated biomedical data and metadata from clinical case reports. Sci. Data 5, 180258
60 Matthies, F. and Hahn, U. (2017) Scholarly information extraction is going to make a quantum leap with pubmed central (PMC). Stud. Health Technol. Inform. 245, 521–525
61 Kilicoglu, H., Shin, D., Fiszman, M., Rosemblat, G. and Rindflesch, T.C. (2012) SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics 28, 3158–3160
62 Westergaard, D., Stærfeldt, H.-H., Tønsberg, C., Jensen, L.J. and Brunak, S. (2018) A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts. PLoS Comput. Biol. 14, e1005962
63 Jamil, H.M. and Sadri, F. (2018) Crowd enabled curation and querying of large and noisy text mined protein interaction data. Distrib. Parallel Databases 36, 9–45
64 Elsworth, B., Dawe, K., Vincent, E.E., Langdon, R., Lynch, B.M., Martin, R.M. et al. (2018) MELODI: mining enriched literature objects to derive intermediates. Int. J. Epidemiol. 47, 369–379
65 Yao, H., Shi, Y., Guan, J. and Zhou, S. (2019) Accurately detecting protein complexes by graph embedding and combining functions with interactions. IEEE/ACM Trans. Comput. Biol. Bioinform.
66 Wang, B., Huang, L., Zhu, Y., Kundaje, A., Batzoglou, S. and Goldenberg, A. (2017) Vicus: Exploiting local structures to improve network-based analysis of biological data. PLoS Comput. Biol. 13, e1005621
67 Cejuela, J.M., Vinchurkar, S., Goldberg, T., Prabhu Shankar, M.S., Baghudana, A., Bojchevski, A. et al. (2018) Loctext: relation extraction of protein localizations to assist database curation. BMC Bioinformatics 19, 15
68 Cheng, L., Liu, P., Wang, D. and Leung, K.-S. (2019) Exploiting locational and topological overlap model to identify modules in protein interaction networks. BMC Bioinformatics 20, 23
69 Brito Correia, F., Coelho, E.D., Luis Oliveira, J. and Arrais, J.P. (2019) Handling noise in protein interaction networks. BioRxiv 527606 [Preprint]
70 Yu, K.-H., Lee, T.-L.M., Wang, C.-S., Chen, Y.-J., , C., Kou, S.C. et al. (2018) Systematic protein prioritization for targeted proteomics studies through literature mining. J. Proteome Res. 17, 1383–1386
71 Taha, K., Iraqi, Y. and Al Aamri, A. (2019) Predicting protein functions by applying predicate logic to biomedical literature. BMC Bioinformatics 20, 71
72 Tay, A.P., Liang, A., Wilkins, M.R. and Pang, C.N.I. (2019) Visualizing post-translational modifications in protein interaction networks using PTMOracle. Curr. Protoc. Bioinformatics 66, e71
73 Schmoker, A.M., Driscoll, H.E., Geiger, S.R., Vincent, J.J., Ebert, A.M. and Ballif, B.A. (2018) An in silico proteomics screen to predict and prioritize protein–protein interactions dependent on post-translationally modified motifs. Bioinformatics 34, 3898–3906
74 Hinshaw, S.J., Lee, A.H.Y., Gill, E.E. and Hancock, R.E.W. (2018) Metabridge: enabling network-based integrative analysis via direct protein interactors of metabolites. Bioinformatics 34, 3225–3227
75 Kleppe, R., Martinez, A., Døskeland, S.O. and Haavik, J. (2011) The 14-3-3 proteins in regulation of cellular metabolism. Semin. Cell Dev. Biol. 22, 713–719
76 Ballone, A., Centorrino, F. and Ottmann, C. (2018) 14-3-3: a case study in PPI modulation. Molecules 23, 1386
77 Pennington, K., Chan, T., Torres, M. and Andersen, J. (2018) The dynamic and stress-adaptive signaling hub of 14-3-3: emerging mechanisms of regulation and context-dependent protein–protein interactions. Oncogene 37, 5587–5604
78 Xu, Y., Ren, J., He, X., Chen, H., Wei, T. and Feng, W. (2019) YWHA/14-3-3 proteins recognize phosphorylated TFEB by a noncanonical mode for controlling TFEB cytoplasmic localization. Autophagy 15, 1017–1030
79 Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S. and Tsujii, J. (2012) brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL, Association for Computational Linguistics (ACL), pp. 2012
80 Comeau, D.C., Islamaj Dogan, R., Ciccarese, P., Cohen, K.B., Krallinger, M., Leitner, F. et al. (2013) Bioc: a minimalist approach to interoperability for biomedical text processing. Database 2013, bat064
81 Lin, J.-P., Fan, Y.-K. and Liu, H.M. (2019) The 14-3-3η chaperone protein promotes antiviral innate immunity via facilitating MDA5 oligomerization and intracellular redistribution. PLoS Pathog. 15, e1007582
82 Kim, Y.H., Choi, J., Yang, M.J., Hong, S.P., Lee, C.-K., Kubota, Y. et al. (2019) A MST1-FOXO1 cascade establishes endothelial tip cell polarity and facilitates sprouting angiogenesis. Nat. Commun. 10, 838
83 Clark, G.J., Drugan, J.K., Rossman, K.L., Carpenter, J.W., Rogers-Graham, K., Fu, H. et al. (1997) 14-3-3 zeta negatively regulates raf-1 activity by interactions with the Raf-1 cysteine-rich domain. J. Biol. Chem. 272, 20990–20993