Introduction and overview


NGS bioinformatic analysis

Bioinformatic analysis of omic data from next generation sequencing (NGS) usually consists of four essential steps:

  • Pre-processing of raw data, including quality analysis, cleaning, trimming, clipping and/or demultiplexing where applicable
  • Alignment, consisting of either mapping reads to a reference or de novo assembly of reads into contigs, isotigs, scaffolds, etc.
  • Analysis and annotation
  • Post-processing and downstream analysis
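
To make the first step more concrete, the following is a minimal Python sketch of a pre-processing pass: it reads four-line FASTQ records, clips low-quality bases from the 3' end of each read and discards reads that become too short. It is only an illustration and not part of GPRO; the function names, the Phred+33 quality encoding and the thresholds (min_q, min_len) are assumptions chosen for the example.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (identifier, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.strip() for line in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@' of the header


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Clip bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoding (hypothetical default for this sketch).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(lines: List[str], min_q: int = 20, min_len: int = 5) -> List[Read]:
    """Trim every read and keep only those with at least min_len bases left."""
    kept = []
    for name, seq, qual in parse_fastq(lines):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((name, seq, qual))
    return kept


# Two toy reads: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIII##",
    "@read2", "ACGT", "+", "I###",
]
print(preprocess(example))  # [('read1', 'ACGTACGT', 'IIIIIIII')]
```

In this toy input, read1 loses its two low-quality trailing bases and is kept, while read2 is reduced to a single base and is discarded. Real projects normally rely on dedicated tools for this step; the sketch only shows the kind of operation involved.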

In recent years, NGS analysis based on the steps summarized above has become one of the most important and productive topics in bioinformatics in terms of the design and development of databases and software. However, NGS analysis is anything but easy. First, conventional PCs are not recommended for NGS analyses because of the high computational requirements of handling thousands to millions of sequences simultaneously. If you are involved in an NGS project, you will therefore probably need a workstation or a more powerful machine such as a computing server (the usual hardware). Second, data formats and software protocols normally vary depending on the NGS platform employed (Solexa-Illumina, ABI-SOLiD, Roche, Ion Torrent, etc.) and on the goals and background of the project, which may be whole-genome de novo sequencing, targeted genome/exome mapping, transcriptomics/RNA-seq, metagenomics, ChIP-seq, etc. So, although bioinformaticians must have an essential background in biology (normally in genetics and molecular biology), they must also have expertise in informatics, operating systems (usually Unix/Linux) and command syntax.

Much has been done in terms of development, but there is still much to do. Indeed, one of the most interesting challenges for the future of bioinformatics is to bring versatility and automation to the whole NGS analysis, so that any bio-researcher or lab technician can easily manage complex protocols and pipelines with just the skill level of an ordinary PC user. With this aim, we designed and launched GPRO.

About GPRO

GPRO is a proprietary bioinformatics project, "an integrative professional solution" in continuous development, for the genetic analysis and management of NGS (and other sequence) data and databases. It consists of two components: a stand-alone installable software application, coupled online with an infrastructure of computing pipelines. The software is a Java application structured as an Eclipse-like workbench of utilities managed from a central menu. It implements sequence and database editors; a worksheet system for annotation and functional analysis; a suite of tools for data mining and for managing files and folders; an FTP client; and a user-friendly collection of interfaces for managing the pipelines on a remote server. All actions in the software are quite intuitive: you can launch an analysis just by selecting a file or folder, dragging it to a box of options and clicking. It is nevertheless recommended that you read the manual before beginning to work with GPRO.

The online component of GPRO is a package of pipelines that lets users run computationally intensive jobs in private remote sessions. These jobs may be BLAST or HMM searches, mapping/assembly runs, exome analysis with SNP/indel calling, gene prediction, mobilome annotation, GO annotation, RNA-seq, metagenome analysis, downstream analysis, etc. Some of these are not yet ready, because the development and availability of pipelines is continuous work in progress. Our protocols are built on well-known free-source tools, together with a collection of tailored scripts and graphical interfaces (available in the GPRO software menu) that automate the whole data flow.

By default, the GPRO pipelines are installed on the high-end computing servers of GyDB (Llorens et al. 2011), which gives the package its name: "GPRO" stands for "Gypsy Database PROfessional". For each GPRO licensee, the default package also includes user accounts on the computing server, with disk space and computing time, as well as FTP access with which users can easily transfer files from their PCs to their user accounts. Of course, you can acquire and install the GPRO package on your own server or workstation, but bear in mind that the package does not include third-party software. We provide the scripts and interfaces for running the pipelines easily, but you must download and install all the free-source tools on which each pipeline is built. If you are not an expert bioinformatician, do not worry: we provide technical support for all the steps needed to install and run GPRO.

In summary, GPRO is an ideal tool for laboratory experts who are interested in high-quality bioinformatics but whose computing skills are those of ordinary software users, because it implements multiple NGS functions accessible through easy-to-handle menus and an intuitive layout organized into graphical interfaces and simple mouse actions. GPRO is also useful for bioinformatics departments and/or sequencing services that want to provide their users with an integrative tool for navigating and managing the results and databases derived from annotations and NGS projects.

Current version and further implementations

The current GPRO version is 1.1. To use GPRO you will probably need to be familiar with the most basic concepts of bioinformatics and computational biology. This wiki site constitutes the GPRO manual and will be updated in parallel with new versions of GPRO. We will try to upload practical examples, videos, etc. If you are new to the subject, it would also be good to read some essential literature on the topic, such as Durbin et al. (2009) and Higgs and Attwood (2005). By clicking this link you can return to the main GPRO web site, where you can purchase the tool, find a trial version and/or find additional information. Bear in mind that the GPRO project is a self-sustaining initiative that we maintain and update without grants, funded solely by our clients. This means that the way to get and use GPRO is by purchasing a license (for more details, go to the download site). The good news, however, is that the project is constantly upgraded and updated according to the comments, needs and feedback provided by our users.





Welcome to the Gypsy Database (GyDB), an open, editable database about the evolutionary relationships of viruses, mobile genetic elements (MGEs) and genomic repeats, where we invite all authors to contribute their knowledge to improve and expand the topics.
Cite this project:

Llorens, C., Futami, R., Covelli, L., Dominguez-Escriba, L., Viu, J.M., Tamarit, D., Aguilar-Rodriguez, J., Vicente-Ripolles, M., Fuster, G., Bernet, G.P., Maumus, F., Munoz-Pomer, A., Sempere, J.M., LaTorre, A., Moya, A. (2011) The Gypsy Database (GyDB) of Mobile Genetic Elements: Release 2.0. Nucleic Acids Research (NARESE) 39 (suppl 1): D70-D74. doi: 10.1093/nar/gkq1061
