GoPubMed Ontology Generation Plugin for OBO-Edit 2

by Thomas Wächter

translated January 29, 2009




The ontology learning process is structured in three major steps.

  1. Term Generation
    extracting the terminology from text
  2. Definition Generation
    finding phrases suitable for natural language definitions for terminology
  3. Add to Ontology
    suggesting places to add newly defined terms in the ontology

The following workflow demonstrates these steps using the creation of an ontology on stem cells as an example.

Open the plug-in

The plug-in can be found under the menu item ``Tools/GoPubMed Ontology Generation Tool''. Click ``GoPubMed Ontology Generation Tool'' to open the plug-in. You might also want to add the ``Ontology Tree Editor'', which can be found under the menu item ``Editors/Ontology Tree Editor''.

Load the ontology

Load an ontology, e.g. cell.obo, using ``File/Load Ontologies...''. The ontology can be loaded from the URL http://obo.cvs.sourceforge.net/checkout/obo/obo/ontology/anatomy/cell_type/cell.obo.

Figure: Read Ontology dialog of OBO-Edit2

Image OBOEdit-Ontogen1_ReadOntologyDialog

Figure: Ontology Tree Editor with Gene Ontology loaded from OBO flat file.

Image OBOEdit-Ontogen2_LoadedGO
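
For readers unfamiliar with the OBO flat file format, the following minimal sketch shows how the [Term] stanzas of such a file could be read outside OBO-Edit. It is not how OBO-Edit itself loads ontologies; the function name parse_obo_terms, the sample text and the EX: identifiers are hypothetical and chosen for illustration only.

```python
def parse_obo_terms(text):
    """Return a dict mapping term id to term name for each [Term] stanza.

    Simplified reader: only the 'id:' and 'name:' tags are considered,
    and stanzas of other types (e.g. [Typedef]) are skipped.
    """
    terms = {}
    in_term = False   # True while inside a [Term] stanza
    term_id = None    # id of the stanza currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A stanza header starts a new block.
            in_term = (line == "[Term]")
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif in_term and line.startswith("name:") and term_id is not None:
            terms[term_id] = line[len("name:"):].strip()
    return terms


# Hypothetical sample; the identifiers are not taken from cell.obo.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: cell

[Term]
id: EX:0000002
name: stem cell
is_a: EX:0000001 ! cell

[Typedef]
id: develops_from
name: develops_from
"""

loaded = parse_obo_terms(SAMPLE)
```

The resulting dictionary of term names is the kind of input that the parent term suggestions in Step 3 need.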

Step 1: Term Generation

Submit text and extract the terminology

Paste the text into the text area and click ``Generate'' to extract the terminology. The terms are presented in order of importance (most important first).

Figure: Candidate terms extracted from text.

Image OBOEdit-Ontogen3_TermGenerationFromText

Query PubMed and extract the terminology

Submit the query stem cell. The abstracts are retrieved from PubMed and the terminology is extracted. The terms are presented in order of importance (most important first).

Figure: Candidate terms extracted from PubMed abstracts for query stem cell.

Image OBOEdit-Ontogen3_TermGenerationStemCell
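
How the plug-in retrieves the abstracts is not described in this document. As a rough illustration only, the sketch below shows how the same query could be expressed against NCBI's public E-utilities esearch endpoint. The function name build_pubmed_query_url and the max_results default are assumptions made for this example, and the code only builds the URL without contacting the service.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here for illustration;
# the plug-in may use a different service).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query_url(query, max_results=100):
    """Build an esearch URL that asks PubMed for ids matching the query."""
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


url = build_pubmed_query_url("stem cell")
```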

Search or Filter terms with regular expressions

The list of candidate terms can be searched and filtered using regular expression patterns. The patterns are evaluated case-insensitively. For further documentation on regular expressions, visit http://en.wikipedia.org/wiki/Regular_expression

The following figures show examples of searching and filtering with regular expressions:

Figure: Searching and selecting bone marrow.

Image OBOEdit-Ontogen4_SearchBoneMarrow

Figure: Filtering candidate terms that end with stem cell or stem cells.

Image OBOEdit-Ontogen5_FilterStemCell
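
The two cases shown in the figures can be reproduced with a short sketch. It assumes that a pattern may match anywhere inside a term unless it is anchored, which may differ from the plug-in's behaviour; the function name filter_terms and the candidate list are hypothetical.

```python
import re


def filter_terms(terms, pattern):
    """Keep the terms in which the pattern matches, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [term for term in terms if regex.search(term)]


# Hypothetical candidate terms for illustration.
candidates = [
    "bone marrow",
    "Human Embryonic Stem Cell",
    "neural stem cells",
    "stem cell niche",
    "bone marrow transplantation",
]

# Terms containing "bone marrow".
contains_bone_marrow = filter_terms(candidates, r"bone marrow")

# Terms ending with "stem cell" or "stem cells": "s?" makes the plural
# optional and "$" anchors the match at the end of the term.
ends_with_stem_cell = filter_terms(candidates, r"stem cells?$")
```

Here "stem cell niche" is excluded by the second pattern because of the end anchor, while "Human Embryonic Stem Cell" is kept because case is ignored.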

Select terms, adding terms to clipboard

Clicking the checkbox at the beginning of a row, or pressing the space bar on a selected row, selects the term and automatically adds it to the clipboard.

Figure: Selecting candidate terms.

Image OBOEdit-Ontogen6_SelectTerm

Several queries can be issued for a topic; the selections are kept as long as the plug-in window exists. In the stem cell example, one might also want to search for human stem cell to obtain more specific existing terminology.

Figure: Refine results by generating candidate terms for the query human stem cell

Image OBOEdit-Ontogen7_TermGenerationHumanStemCell

Figure: Filtered results for the query human stem cell

Image OBOEdit-Ontogen7_TermGenerationHumanStemCell_filtered

Loading, saving clipboard and clipboard behaviour

The GoPubMed Ontology Generation plug-in allows you to save the clipboard content to a file and to load it from a file again. Terms in the clipboard are enriched with synonyms and abbreviations when these are found in later queries. All generated candidate terms matching term labels from the clipboard are ticked automatically, and the merged representation is used from then on.

Figure: Save file dialog, when saving the clipboard

Image OBOEdit-Ontogen8_ClipboardSave

Figure: Simple tab-delimited file containing the clipboard content.
human embryonic stem cell [hESC]
hemopoietic stem cells [HSC|HSCs]
embryonic stem cell [ESC]
mesenchymal stem cell [MSCs|MSC]
neural stem cells [NSCs|NSC]
hematopoietic stem cell [HSC|HSCs]
human mesenchymal stem cell [hMSC|hMSCs]
adipose-derived stem cells [ADSCs] adipose-derived stem cells
human neural stem cells [hNSC] human neural stem cells
human neural stem cell [hNSC] human neural stem cell
peripheral blood stem cells [PBSC]
prostate cancer stem cell []
limbal stem cells [LSCs|LSC]
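
A file of this kind can be read with a few lines of code. The sketch below assumes the layout suggested by the listing above: the term label, then a tab, then the abbreviations in square brackets separated by "|", optionally followed by further tab-separated synonyms. The exact field layout is inferred, not documented, and the names parse_clipboard_line and merge_entry are hypothetical. The merge function only imitates the enrichment behaviour described above; it is not the plug-in's implementation.

```python
def parse_clipboard_line(line):
    """Split one clipboard line into label, abbreviations and synonyms."""
    fields = line.rstrip("\n").split("\t")
    label = fields[0].strip()
    abbr_field = fields[1].strip() if len(fields) > 1 else "[]"
    # "[HSC|HSCs]" -> ["HSC", "HSCs"]; "[]" -> []
    abbreviations = [a for a in abbr_field.strip("[]").split("|") if a]
    synonyms = [s.strip() for s in fields[2:] if s.strip()]
    return {"label": label, "abbreviations": abbreviations, "synonyms": synonyms}


def merge_entry(existing, new):
    """Add abbreviations and synonyms from new to existing, skipping duplicates."""
    for key in ("abbreviations", "synonyms"):
        for value in new[key]:
            if value not in existing[key]:
                existing[key].append(value)
    return existing


entry = parse_clipboard_line("mesenchymal stem cell\t[MSCs|MSC]")
empty = parse_clipboard_line("prostate cancer stem cell\t[]")

# A later query finds one more abbreviation for an already selected term.
first = parse_clipboard_line("hematopoietic stem cell\t[HSC]")
later = parse_clipboard_line("hematopoietic stem cell\t[HSC|HSCs]")
merged = merge_entry(first, later)
```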

Step 2: Definition Generation

Generated Definitional Phrases

Clicking the icon labeled DEF fills the definitions table below with definitional phrases for the candidate term. The information button labeled ``I'' links to the web page from which the definition was originally retrieved.

Figure: Generate definitional phrases for the term mesenchymal stem cell.

Image OBOEdit-Ontogen9_DefinitionGeneration_msc

Again, definitions can be filtered as shown for candidate terms above. When a definitional phrase is selected, it appears in the editing area below the definition table. Known abbreviations found in the analysed texts are shown next to it. With the help of the generated definitional statements, the user can quickly write a definition for the term. ``Save definition'' makes OBO-Edit remember the definition until it is closed.

Figure: Edit the definition for the term mesenchymal stem cell.

Image OBOEdit-Ontogen11_DefinitionEditing_msc

Step 3: Add to Ontology

In the final step, the newly discovered and defined term needs to be placed in the ontology. The plug-in helps to find likely parent terms in the ontology and makes adding the term straightforward. The future parent terms can be selected in the following ways:

  1. The parent term is an existing ontology term found in the definition of the term to add.
  2. The parent term is in the list of terms the user searched for.
  3. The parent term is the selected term in the Ontology Tree Editor.
  4. The parent term is one of the parent terms of the currently selected term.
  5. The parent term is a similar existing term from the ontology.

Figure: Proposing suitable parent terms for the term human mesenchymal stem cell.

Image OBOEdit-Ontogen12_AddToOntology
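
The first option in the list above, an existing ontology term found in the definition, can be illustrated with a small sketch. It is a simplified stand-in and not the plug-in's actual matching method: it assumes whole-word, case-insensitive matching of term names and ranks longer (more specific) names first. The function name parents_from_definition, the definition text and the list of names are hypothetical.

```python
import re


def parents_from_definition(definition, ontology_names):
    """Return ontology term names that occur as whole words in the definition.

    Longer names are listed first because they tend to be more specific.
    """
    text = definition.lower()
    hits = []
    for name in ontology_names:
        # \b restricts the match to word boundaries, so "cell" does not
        # match inside "cellular".
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", text):
            hits.append(name)
    return sorted(hits, key=len, reverse=True)


# Hypothetical definition and ontology term names for illustration.
definition = "A multipotent stem cell found in bone marrow."
names = ["cell", "stem cell", "neuron"]

suggestions = parents_from_definition(definition, names)
```

In this example "stem cell" is ranked before "cell", so the more specific candidate is offered first.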

Finally, the new term is added to the ontology developed in OBO-Edit and can be altered using, e.g., the ``Text Editor''.







For feedback please contact the developer of the plug-in:
 
Thomas Wächter
Bioinformatics Group
Biotechnology Center
Technical University Dresden
01062 Dresden
 
email: thomas.waechter@tu-dresden.de
 












