Efficiently mining protein interaction dependencies from large text corpora

Köster, Johannes; Zamir, Eli; Rahmann, Sven

doi:10.1039/c2ib00126h

2012-07-01

Integrative Biology, Volume 4, Issue 7, pp. 805–812

Biochemical research has yielded extensive information about dependencies between protein interactions, arising from allosteric regulation, steric hindrance and other mechanisms. Collectively, this information is valuable for understanding large intracellular protein networks. However, it is sparsely distributed across millions of publications and written as free-form text intended for manual reading. Here we develop a computational approach for extracting information about interaction dependencies from large numbers of publications. First, keyword-based tokenization reduces full papers to short strings, enabling an efficient search for patterns that are likely to indicate descriptions of interaction dependencies. Sentences that match such patterns are extracted, reducing the amount of text that human curators must read. Applied to the integrin adhesome network, this approach extracted 208 short statements from 59,933 papers, close to half of which indeed describe interaction dependencies. We visualize the resulting hypernetwork of dependencies and illustrate that these dependencies constrain the feasible mechanisms of adhesion-site assembly and generate testable hypotheses about their switchability.
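The tokenize-then-match idea can be illustrated with a minimal sketch. It assumes hypothetical keyword classes (protein names `P`, interaction terms `I`, dependency cues `R`) and a hypothetical pattern requiring at least two proteins, one interaction term and one dependency cue per sentence; these dictionaries, the pattern and the helper names `tokenize` and `extract_candidates` are illustrative assumptions, not the authors' actual keyword lists or patterns.

```python
import re

# Hypothetical keyword classes: class letter -> lowercase keywords.
# Illustrative assumptions only, not the dictionaries used in the paper.
KEYWORD_CLASSES = {
    "P": {"talin", "vinculin", "paxillin", "fak", "actin"},        # protein names
    "I": {"binds", "binding", "interacts", "interaction"},         # interaction terms
    "R": {"inhibits", "prevents", "competes", "enhances", "requires"},  # dependency cues
}

# Hypothetical pattern over token strings: at least two P, at least one I,
# and at least one R, in any order (expressed with lookaheads).
DEPENDENCY_PATTERN = re.compile(r"^(?=(?:.*P){2})(?=.*I)(?=.*R)")


def tokenize(sentence):
    """Reduce a sentence to a short string of class letters, dropping other words."""
    letters = []
    for word in re.findall(r"[a-z0-9]+", sentence.lower()):
        for letter, keywords in KEYWORD_CLASSES.items():
            if word in keywords:
                letters.append(letter)
                break
    return "".join(letters)


def extract_candidates(text):
    """Return sentences whose token string matches the dependency pattern."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if DEPENDENCY_PATTERN.search(tokenize(s))]


if __name__ == "__main__":
    text = ("Talin binding to vinculin inhibits paxillin binding. "
            "Actin is abundant in cells. FAK localizes to adhesions.")
    print(tokenize("Talin binding to vinculin inhibits paxillin binding."))  # PIPRPI
    print(extract_candidates(text))
```

Because pattern matching runs on the short token strings rather than on the full text, the search stays cheap even over tens of thousands of papers; only the matching sentences are passed on for manual curation.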

Additional Metadata
Persistent URL	doi.org/10.1039/c2ib00126h
Journal	Integrative Biology
Citation (APA)	Köster, J., Zamir, E., & Rahmann, S. (2012). Efficiently mining protein interaction dependencies from large text corpora. Integrative Biology, 4(7), 805–812. https://doi.org/10.1039/c2ib00126h
