GPRO provides two editing programs, both available via the Editor Tab (the third icon on the menu). The first is a database (plain text) editor designed for editing and manipulating database files (plain-text FASTA files), while the second is a sequence editor aimed at the molecular analysis of nucleotide and amino acid sequences.
The database editor lets you edit and browse sequences and their contents in plain FASTA databases and files. Basic actions such as copying, cutting and pasting sequences can be done simply by right-clicking. As shown in Figure 6.1, the GUI of the database editor offers an intuitive menu (highlighted in red) and two graphical components: 1) the editing frame, where sequences can be edited at any time (users can in fact write any kind of text, but it is preferable to follow the FASTA format); and 2) the Fasta Explorer, which lists the names of all sequences contained in the edited file.
Figure 6.1. Database editor screenshot: 1) Database editor frame; 2) Fasta Explorer.
Almost all the utilities provided by the database editor are organized into a menu bar displayed at the top of the editor (highlighted with a red rectangle in Figure 6.1). A description of each tab and its tools follows.
This tab is used to open, save and close files.
The database editor provides "Search" and "Search and replace" utilities over the sequence names, with three distinct options ("Exact term", "Case sensitive" and "Regular expression"). "Exact term" finds sequences whose name matches the search term exactly. "Case sensitive" distinguishes uppercase from lowercase letters during the search. "Regular expression" interprets the search term as a pattern written in a formal language, so that users can find names matching that pattern (for instance a consensus pattern).
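The three search options can be illustrated with a small Python sketch over a list of sequence names. The helpers `search_names` and `replace_in_names` are hypothetical and do not reproduce GPRO's code. The sketch assumes that "Exact term" means the whole name must match, that searches ignore case unless "Case sensitive" is ticked, and that the term is taken literally unless "Regular expression" is ticked.

```python
import re


def _compile(term, case_sensitive, regex):
    # Hypothetical helper (not GPRO code): turn the chosen options into a pattern.
    pattern = term if regex else re.escape(term)  # literal text unless regex is on
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(pattern, flags)


def search_names(names, term, exact=False, case_sensitive=False, regex=False):
    """Return the names matching `term` under the chosen search options."""
    pattern = _compile(term, case_sensitive, regex)
    if exact:
        # "Exact term": the whole name must match, not just part of it
        return [n for n in names if pattern.fullmatch(n)]
    return [n for n in names if pattern.search(n)]


def replace_in_names(names, term, replacement, case_sensitive=False, regex=False):
    """Search and replace inside names; the replacement text is used literally."""
    pattern = _compile(term, case_sensitive, regex)
    return [pattern.sub(lambda m: replacement, n) for n in names]


names = ["MOV_Gag", "mov_pol", "Ty3_Gag", "MOV"]
print(search_names(names, "MOV"))                       # ['MOV_Gag', 'mov_pol', 'MOV']
print(search_names(names, "MOV", case_sensitive=True))  # ['MOV_Gag', 'MOV']
print(search_names(names, "MOV", exact=True))           # ['MOV']
print(search_names(names, r"_gag$", regex=True))        # ['MOV_Gag', 'Ty3_Gag']
```

Passing a function to `sub` keeps backslashes in the replacement from being interpreted as group references.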
This tool selects a set of sequences from a FASTA file and either removes them or exports them to another file, as full sequences or as FASTA headers only. Clicking this tab opens a new window (Figure 6.2) showing a summary of all sequence names and the available options. Enter a term of interest in the ‘Search’ box (for instance MOV, as in the figure) and then choose a filter option. The tool provides four options: three of them are the usual "Exact match", "Case sensitive" and "Regular expression", while the fourth, "Append selections", lets you perform a new selection with a different term and add its result to the previous selection. Any sequence whose name matches the search is highlighted in the summary. Finally, the tool lets you export or remove the selected sequences from the database file, either as sequences or as FASTA headers only.
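A minimal Python sketch of the select, append, export and remove workflow follows. It is not GPRO's implementation: `read_fasta`, `select`, `export` and `remove` are hypothetical names, only substring matching is shown, and selections are kept as sets of record indices so that "Append selections" becomes a set union.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def select(records, term, previous=None, append=False, case_sensitive=False):
    """Return the indices of records whose header contains `term`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(re.escape(term), flags)
    hits = {i for i, (header, _) in enumerate(records) if pattern.search(header)}
    if append and previous is not None:
        return set(previous) | hits  # "Append selections": add to earlier hits
    return hits


def export(records, selection, headers_only=False):
    """Write the selected records as FASTA text, optionally headers only."""
    lines = []
    for i in sorted(selection):
        header, seq = records[i]
        lines.append(">" + header)
        if not headers_only:
            lines.append(seq)
    return "\n".join(lines) + "\n"


def remove(records, selection):
    """Return the records that were not selected."""
    return [r for i, r in enumerate(records) if i not in selection]


records = read_fasta(">MOV_1 gag\nATGAAA\n>Ty3_1\nATGCCC\n>mov_2 pol\nATGGGG\n")
chosen = select(records, "MOV", case_sensitive=True)  # {0}
chosen = select(records, "Ty3", chosen, append=True)  # {0, 1}
print(export(records, chosen, headers_only=True), end="")  # the two headers only
print(remove(records, chosen))  # [('mov_2 pol', 'ATGGGG')]
```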
This tool searches for sequence patterns (motifs) by scanning all sequences in a nucleotide or protein sequence file. The motifs found can be exported to a new text file or to a CSV file.
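The following Python sketch shows one way to scan every sequence of a file for a motif and write the hits as CSV. It is an illustration only: `find_motifs` and `motifs_to_csv` are hypothetical names, the motif is assumed to be a regular expression, overlapping hits are reported, and coordinates are 1-based and inclusive. GPRO's own output columns may differ.

```python
import csv
import io
import re


def find_motifs(records, motif):
    """Yield (name, start, end, match) for each hit of `motif` in each record.

    `records` is a list of (name, sequence) pairs. Wrapping the motif in a
    lookahead "(?=(...))" lets overlapping occurrences be reported.
    """
    pattern = re.compile("(?=(" + motif + "))", re.IGNORECASE)
    for name, seq in records:
        for m in pattern.finditer(seq):
            hit = m.group(1)
            if hit:  # skip empty matches from patterns such as "A*"
                yield name, m.start() + 1, m.start() + len(hit), hit


def motifs_to_csv(hits):
    """Return the hits as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sequence", "start", "end", "match"])
    writer.writerows(hits)
    return buffer.getvalue()


records = [("seq1", "GAATTCAAGAATTC"), ("seq2", "CCCC")]
print(motifs_to_csv(find_motifs(records, "GAATTC")), end="")
# sequence,start,end,match
# seq1,1,6,GAATTC
# seq1,9,14,GAATTC
```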
This tab cancels or reverses the effect of a previous editing action.
Figure 6.2. "Export and remove Seqs" dialog.
As shown in Figure 6.1, the Fasta Explorer is a summary of all sequences contained in the edited file, based on their FASTA headers. Clicking any header listed in the Fasta Explorer makes the editor jump to the position of the selected sequence within the file. The Fasta Explorer is ideal for finding and selecting particular sequences in large database files (for instance a file containing a full-length genome), which conventional text editors usually cannot handle because of their size. It lets you navigate the file sequence by sequence, so that you can easily edit the selected sequence.
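The idea behind this kind of navigation can be sketched in Python: record the byte offset of every header once, then seek straight to a record instead of loading the whole file. This is an assumption about how such an explorer could work, not a description of GPRO's internals; `index_fasta` and `read_record` are hypothetical names and accept any binary, seekable stream (for example `open(path, "rb")`).

```python
import io


def index_fasta(handle):
    """Map each FASTA header to the byte offset of its '>' line.

    Only headers are kept in memory, so the index stays small even for
    genome-sized files. Duplicate headers keep the last offset.
    """
    handle.seek(0)
    offsets, position = {}, 0
    for line in handle:
        if line.startswith(b">"):
            offsets[line[1:].strip().decode()] = position
        position += len(line)  # count bytes so offsets are valid for seek()
    return offsets


def read_record(handle, offset):
    """Jump to `offset` and read one (header, sequence) record."""
    handle.seek(offset)
    header = handle.readline()[1:].strip().decode()
    chunks = []
    for line in handle:
        if line.startswith(b">"):
            break  # the next record starts here
        chunks.append(line.strip().decode())
    return header, "".join(chunks)


# In-memory example; for a real file use: with open(path, "rb") as handle: ...
handle = io.BytesIO(b">chr1 test\nACGT\nACGT\n>chr2\nTTTT\n")
index = index_fasta(handle)                 # {'chr1 test': 0, 'chr2': 21}
print(read_record(handle, index["chr2"]))   # ('chr2', 'TTTT')
```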
TIME (Munoz-Pomer et al. 2011) is the other editor that can be launched via the Editor Tab. TIME is a sequence editor that permits the editing, display and molecular analysis of sequences of up to 2 × 10⁹ residues (two gigabases, or two billion amino acids), which suffices for the largest chromosomes known to date [54]. TIME is implemented as a GPRO plug-in. Its GUI is organized into three components (Figure 6.3): first, the menu bar, which gives access to the TIME functions and utilities; second, the Sequence Editing Frame, where you can select, cut, paste and edit sequences and frames within sequences with the mouse; and third, the Results Table, which summarizes the results of ORF or motif searches performed via the menu and from which their annotations can be exported to a CSV file or a FASTA database.
Figure 6.3. Screenshot of TIME. 1) TIME menu; 2) Sequence Editing Frame; 3) Results Table.
TIME implements a horizontal menu consisting of the following tabs and functions:
This tab is used to create, open, save and close amino acid and nucleotide FASTA files and to quit the program. If the chosen file is a database containing multiple sequences, TIME opens a tab with a summary of all available sequences; double-clicking a sequence name in that summary displays the sequence in the Sequence Editing Frame.
This tab contains the cut, copy and paste functions. It also lets you undo and redo individual changes with the “Undo” and “Redo” utilities, and unlock a sequence when you want to edit it (sequences are locked by default).
As shown in Figure 6.4, this tab opens a pop-up dialog for translating the sequence under analysis in all six reading frames (or only in the frames whose boxes are checked). The standard genetic code is used by default; however, clicking “Edit” beside “Custom genetic code” opens the genetic code editor, where users can edit the codon assignments, rename the code, save it to a file or open a previously saved custom code. The default colors for start and stop codons are blue and red, respectively, but they can be changed using the palettes shown in the dialog. The Translate utility of TIME can also open Gene Runner’s translation table format (.trt files) and a native plain-text format, which can easily be created in the text editor of your choice. In these files, lines starting with the hash symbol (“#”) are treated as comments and ignored, except for the first line, which holds the name of the code (this line is optional). Each subsequent line consists of a codon (RNA and DNA are both allowed), a hyphen followed by a greater-than symbol (“->”), and an amino acid symbol (1-letter IUPAC code).
Figure 6.4. TIME editor screenshot and pop-up for protein translation and genetic code.
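The plain-text genetic code format described above is simple enough to parse in a few lines. The Python sketch below reads it and translates a sequence in any of the six reading frames. `parse_genetic_code`, `translate` and `six_frames` are hypothetical names, and the sketch assumes that the code name is a "#" line placed first, that RNA codons are stored internally as DNA, and that codons missing from the table translate as "X". The built-in table is the standard genetic code, which TIME uses by default.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_genetic_code(text):
    """Parse 'CODON->AA' lines; '#' lines are comments, a first '#' line is the name."""
    name, table = None, {}
    for number, raw in enumerate(text.splitlines()):
        line = raw.strip()
        if line.startswith("#"):
            if number == 0:
                name = line[1:].strip()  # assumed: the name line starts with '#'
            continue
        if not line:
            continue
        codon, amino_acid = (part.strip() for part in line.split("->"))
        table[codon.upper().replace("U", "T")] = amino_acid.upper()  # store as DNA
    return name, table


def translate(seq, table, frame):
    """Translate in frame +1..+3 (forward) or -1..-3 (reverse complement)."""
    seq = seq.upper().replace("U", "T")
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    return "".join(table.get(seq[i:i + 3], "X") for i in range(start, len(seq) - 2, 3))


def six_frames(seq, table=STANDARD):
    """Translate `seq` in all six reading frames."""
    return {frame: translate(seq, table, frame) for frame in (1, 2, 3, -1, -2, -3)}


custom = "# Demo code\n# comments are ignored\nAUG->M\nTTT->F\nUAA->*\n"
name, table = parse_genetic_code(custom)
print(name, table)  # Demo code {'ATG': 'M', 'TTT': 'F', 'TAA': '*'}
frames = six_frames("ATGTTTTAA")
print(frames[1], frames[-1])  # MF* LKH
```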
This tab lets you convert DNA to RNA and vice versa, and view either RNA or DNA sequences as a single strand or as a double strand.
This tab switches the sequence to its reverse, complement or reverse complement.
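These alphabet and strand conversions are simple string operations. A Python sketch follows; the function names are hypothetical, IUPAC ambiguity codes are complemented as well, and a sequence containing U is assumed to be RNA.

```python
# IUPAC complements: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H; N, S and W map to themselves
DNA_COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                               "TGCAYRMKVBHDNtgcayrmkvbhdn")


def to_rna(seq):
    return seq.replace("T", "U").replace("t", "u")


def to_dna(seq):
    return seq.replace("U", "T").replace("u", "t")


def complement(seq):
    """Complement base by base, keeping the sequence's alphabet (DNA or RNA)."""
    is_rna = "U" in seq.upper()
    result = to_dna(seq).translate(DNA_COMPLEMENT)
    return to_rna(result) if is_rna else result


def reverse(seq):
    return seq[::-1]


def reverse_complement(seq):
    return reverse(complement(seq))


def double_strand(seq):
    """Top strand 5'->3' above its complement written 3'->5'."""
    return f"5' {seq} 3'\n3' {complement(seq)} 5'"


print(reverse("ATGC"), complement("ATGC"), reverse_complement("ATGC"))  # CGTA TACG GCAT
print(to_rna("ATGC"), reverse_complement("AUGC"))                        # AUGC GCAU
print(double_strand("ATGC"))
```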
Using this tab you can search for ORFs in both forward and reverse frames, specifying a minimum ORF length. A report listing all ORFs that satisfy the length condition, with their coordinates, is then shown in the Results Table of the editor (on the right, number 3 in Figure 6.3). Double-clicking the row of a listed ORF selects and highlights that ORF in the Sequence Editing Frame. You can annotate the whole set of listed ORFs (or a subset of them) by ticking the checkboxes to the left of the description column (see Figure 6.3) and exporting the selected ORFs as a CSV or a FASTA file. In the latter case, the ORFs can be exported as nucleotide or as protein sequences.
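A Python sketch of an ORF search on both strands with a minimum-length filter follows. TIME's exact ORF definition is not stated here, so the sketch assumes that an ORF starts at ATG and ends at the first in-frame TAA, TAG or TGA (stop codon included), that lengths are counted in nucleotides, and that coordinates are 1-based on the forward strand. `find_orfs` is a hypothetical name.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_len=300):
    """Return (strand, frame, start, end, length) for each ORF of at least min_len nt.

    Frames are 1-3 on each strand (reverse frames are counted on the reverse
    complement); start and end are 1-based positions on the forward strand.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            i = frame
            while i + 3 <= n:
                if s[i:i + 3] == "ATG":
                    j = i
                    while j + 3 <= n and s[j:j + 3] not in STOPS:
                        j += 3  # extend codon by codon until a stop codon
                    if j + 3 > n:
                        break  # no stop codon left in this frame
                    length = j + 3 - i
                    if length >= min_len:
                        if strand == "+":
                            start, end = i + 1, j + 3
                        else:  # map reverse-complement indices back to the forward strand
                            start, end = n - (j + 3) + 1, n - i
                        orfs.append((strand, frame + 1, start, end, length))
                    i = j  # resume after the stop codon (nested ATGs are skipped)
                i += 3
    return orfs


print(find_orfs("CCATGAAATAGCC", min_len=9))  # [('+', 3, 3, 11, 9)]
print(find_orfs("GGCTATTTCATGG", min_len=9))  # [('-', 3, 3, 11, 9)]
```

The second call uses the reverse complement of the first sequence, so the same ORF is found on the minus strand at the same forward coordinates.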
You can search for particular protein or nucleotide motifs (binding sites, restriction sites, etc.) in the sequence under analysis using the "Find motifs" tab (Figure 6.5A). Results are shown in the Results Table in the same format as the Find ORFs results. The “Find motifs” tool can also search for multiple patterns or motifs, either as single occurrences or as clusters of motifs. Clicking the “Multiple Motif Editor” button at the bottom right of the “Find motifs” dialog opens a new dialog (Figure 6.5B), where you can add and remove as many motifs as necessary, give each one a name, and then select the “Multiple motif” search mode at the top right. The search can be performed for single occurrences (each motif is searched independently) or for clustered motifs (motifs falling together within a sequence frame). Selecting the latter lets you specify three additional search parameters: 1) the minimum cluster size (for instance a frame of 500 nucleotides); 2) the minimum number of motifs within the cluster; and 3) whether clusters may overlap (the overlapping clusters option) or not (Disjoint clusters). You can also add motifs from a file using the Load file tab; for instance, you can load a FASTA list of restriction enzyme sites downloaded from the REBASE web site.
Figure 6.5. Find motifs screenshot. A) “Find Motifs” dialog. Clicking the button at the bottom right of this dialog opens the multiple motif editor (B, right), where you can add motifs and perform searches as single occurrences or as clusters of motifs.
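The clustered search can be approximated with a sliding-window sketch in Python. How TIME defines a cluster internally is not stated here, so the sketch makes its own assumptions: a candidate cluster starts at a motif hit and spans `window` bases (standing in for the cluster size), it is reported when at least `min_motifs` hits fall entirely inside it, and in disjoint mode a hit inside an already reported cluster cannot start a new one. `motif_hits` and `find_clusters` are hypothetical names; motifs are literal sequences given as a name-to-sequence mapping, much like the named entries of a FASTA list.

```python
import re


def motif_hits(seq, motifs):
    """Return sorted (start, end, name) hits, 0-based and end-exclusive."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        motif = motif.upper()
        # Lookahead so that overlapping occurrences are all found
        for m in re.finditer("(?=" + re.escape(motif) + ")", seq):
            hits.append((m.start(), m.start() + len(motif), name))
    return sorted(hits)


def find_clusters(seq, motifs, window=500, min_motifs=2, overlapping=True):
    """Return (start, end, motif names) for each cluster, 1-based inclusive."""
    hits = motif_hits(seq, motifs)
    clusters = []
    covered_until = 0  # end of the last reported cluster (0-based, exclusive)
    for k, (start, _, _) in enumerate(hits):
        if not overlapping and start < covered_until:
            continue  # disjoint mode: this hit already belongs to a cluster
        members = [h for h in hits[k:] if h[1] <= start + window]
        if len(members) >= min_motifs:
            end = max(h[1] for h in members)
            clusters.append((start + 1, end, [h[2] for h in members]))
            covered_until = end
    return clusters


motifs = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
seq = "GAATTC" + "A" * 10 + "GGATCC" + "A" * 100 + "GAATTC"
print(find_clusters(seq, motifs, window=50))
# [(1, 22, ['EcoRI', 'BamHI'])]
print(find_clusters(seq, motifs, window=150, overlapping=False))
# [(1, 128, ['EcoRI', 'BamHI', 'EcoRI'])]
```

With `overlapping=True` and the same 150-base window, a second cluster starting at the BamHI site would also be reported, because it shares hits with the first.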
Llorens, C., Futami, R., Covelli, L., Dominguez-Escriba, L., Viu, J.M., Tamarit, D., Aguilar-Rodriguez, J., Vicente-Ripolles, M., Fuster, G., Bernet, G.P., Maumus, F., Munoz-Pomer, A., Sempere, J.M., LaTorre, A., Moya, A. (2011) The Gypsy Database (GyDB) of Mobile Genetic Elements: Release 2.0. Nucleic Acids Research (NARESE) 39 (suppl 1): D70-D74. doi: 10.1093/nar/gkq1061