HOMOLOGY SEARCHES: BLAST (Basic Local Alignment Search Tool) & FASTA
BACKGROUND INFORMATION: The three BLAST programs that one will commonly use are BLASTN, BLASTP and BLASTX. BLASTN will compare your DNA sequence with all the DNA sequences in the nonredundant database (nr). BLASTP will compare your protein sequence with all the protein sequences in nr. In BLASTX your nucleotide sequence will be translated in all six reading frames and the products compared with the nr protein database. A tutorial is available at NCBI.
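To make the "all six reading frames" step of BLASTX concrete, here is a minimal Python sketch of a six-frame translation. It assumes the standard genetic code and plain A/C/G/T input; the function names are my own for illustration and are not part of BLAST, which does this internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six products would then be compared with the protein database; stop codons appear as "*" in the output.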
BLAST Homepage - (NCBI)
Nucleotide BLAST (BLASTN)
Protein BLAST (BLASTP) N.B. This program is also coupled with a motif search.
Translated BLAST (BLASTX)
Blast with Microbial Genomes (BLASTN, TBLASTN, TBLASTX etc.). Permits one to compare a nucleic acid or protein sequence against finished & unfinished archaeal and bacterial genomes.
N.B. Depending upon the time of day, your results may appear almost immediately, or your search may be delayed or not accepted at all. Be prepared for plenty of results. You may only want to print the first few pages (e.g. 1-5). Alternatively, under "Format" change the "Number of Descriptions" from 100 (default) to 10 or 50 and the "Number of Alignments" from 50 (default) to 10. Please note that by clicking on the sequence identifier you can gain further information on the sequence (ENTREZ) and the literature reference (MEDLINE).
EMB BLAST - (European Molecular Biology network - Swiss node). Very convenient since it permits one to specifically search databases such as prokaryote, bacteriophage, fungal, & 16S rRNA using BLASTN, and specific bacterial genomes or SwissProt using BLASTX or BLASTN.
ParAlign (CMBN Bioinformatics Group, University of Oslo, Norway) - employs a heuristic method for sequence alignment. In essence, ParAlign is about as sensitive as Smith-Waterman but runs at the speed of BLAST. Nice graphics.
GTOP Sequence Homology Search (Laboratory for Gene-Product Informatics, National Institute of Genetics, Japan) - offers BLASTP search capability against individual Archaea, Bacteria, Eukaryota, and viruses.
T4-like Phage NCBI MegaBLAST (Tulane Univ., New Orleans, U.S.A. & CNRS, Toulouse, France) - includes a growing list of T4-like completed phage sequences as well as those in the draft and contig stages of completion.
WU-BLAST (Washington University BLAST) - The emphasis of this tool is to find regions of sequence similarity quickly, with minimum loss of sensitivity. This will yield functional and evolutionary clues about the structure and function of your novel sequence.
Batch BLAST (Greengene web server) - developed by Michael V. Graves for DNA or protein BLAST sequence analysis against the NCBI databases. It allows one to submit a file that contains multiple sequences and then organizes the results by each individual sequence contained in the file. The results will be available on the server as individual html files. An alternative site is here.
For more sophisticated studies you might want to employ:
PSI-BLAST or PHI-BLAST search - (NCBI) Position-Specific Iterative BLAST creates a profile after the initial search. This profile is used in subsequent searches. Tutorial.
BLAST 2 - (NCBI) BLAST two sequences against one another. N.B. This utilizes BLASTN, P, X as well as TBLASTN and TBLASTX.
Gene Context Tool - is an incredible tool for visualizing the genome context of a gene or group of genes (synteny). In the following diagram an RpoN (Sigma54) protein was analyzed. (Reference: R. Ciria et al. (2004) Bioinformatics 20: 2307-2308).
Other search engines include:
Fasta33 - (EBI) I particularly like the Visual Fasta presentation of the data. I think that it is better than what one gets on BLAST searches.
TC-BLAST (Saier Laboratory Bioinformatics Grp, Univ. San Diego, U.S.A.) - Scans the transport protein database (TC-DB) producing alignments and phylogenetic trees. The TC-DB details a comprehensive classification system for membrane transport proteins known as the Transport Commission (TC) system.
MEROPS BLAST - permits one to screen protein sequences against an extensive database of characterized peptidases (Rawlings, N.D., O'Brien, E. A. & Barrett, A.J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343-346).
SEARCHGTr - is a web-based program for the analysis of glycosyltransferases involved in the biosynthesis of a variety of pharmaceutically important compounds such as adriamycin, erythromycin and vancomycin. This software was developed from a comprehensive analysis of sequence/structural features of 102 GTrs of known specificity from 52 natural product biosynthetic gene clusters (Reference: Kamra, P. et al. 2005. Nucl. Acids Res. 33 (Web Server Issue): W220-W225).
PipeAlign (Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, France) offers an integrated approach to protein family analysis through a cascade of five sequence analysis steps: BALLAST, multiple alignment with DbClustal, alignment analysis with Rascal, removal by NorMD of any sequences that do not belong to the protein family, and clustering into potential functional subfamilies using Secator or DPC. (Reference: F. Plewniak et al. 2003. Nucleic Acids Research, 31: 3829-3832).
MPsrch (EMBL-EBI) - this sequence comparison tool implements the true Smith and Waterman algorithm, identifying hits in cases where Blast and Fasta fail, and also reports fewer false positives. Provides information on: Match %; % Query Match (% of the query sequence matched); Conservative changes; Mismatches; Indels; and Gaps.
GOAnno (University of Strasbourg, France) - this web tool automatically annotates proteins according to the Gene Ontology using hierarchised multiple alignments. Positioning the query protein in its aligned functional subfamily represents a key step to obtain highly reliable predicted GO annotation based on the GOAnno algorithm.
COMPASS - is a profile-based method for the detection of remote sequence similarity and the prediction of protein structure. The server features three major developments: (i) improved statistical accuracy; (ii) increased speed from parallel implementation; and (iii) new functional features facilitating structure prediction. These features include visualization tools that allow the user to quickly and effectively analyze specific local structural region predictions suggested by COMPASS alignments. (Reference: R.I. Sadreyev et al. 2009. Nucl. Acids Res. 37(Web Server issue): W90-W94).
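Since several of the tools above (ParAlign, MPsrch) are described relative to the Smith and Waterman algorithm, a minimal Python sketch of that local alignment method may help. It assumes a simple match/mismatch score and a linear gap penalty chosen by me for illustration; real servers use substitution matrices such as BLOSUM and affine gap penalties, so this is not how any listed site is implemented.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment.

    Scoring values are illustrative assumptions; gaps are penalised linearly.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero (this makes it local).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

The full matrix costs time proportional to the product of the two sequence lengths, which is why heuristic tools such as BLAST and FASTA are used for database-scale searches.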
Unique search engine:
MineBlast - performs BLASTP searches in UniProt to identify names and synonyms based on homologous proteins and subsequently queries PubMed, using combined search terms in order to find and present relevant literature. This tool only allows max. 100 queries per user per day. (Reference: G. Dieterich et al. 2005. Bioinformatics 21: 3450-3451).
Comparison of homology between two small genomes:
SCAN2 (Softberry.com) provides one with a colour-coded graphical alignment of genome length DNAs in Java. In the top panel regions of high sequence identity are presented in red. By highlighting the gray, yellow, green, black boxes one can select specific regions for examination of the sequence alignment. For additional information on the output see here. This site appears to work best with Internet Explorer.
Advanced PipMaker (Schwartz et al. Genome Research Vol. 10, Issue 4, 577-586, April 2000) aligns two DNA sequences and returns a percent identity plot of that alignment, together with a traditional textual form of the alignment. You might want to download Laj (Penn State - Bioinformatics Group, U.S.A.) for viewing and manipulating the output of pairwise alignment programs such as PipMaker.
JDotter: A Java Dot Plot Viewer ( Viral Bioinformatics Resource Center, University of Victoria, Canada) - a dot matrix plotter for Java. Produces similar diagrams to the above mentioned programs, but with better control on output.
multi-zPicture: multiple sequence alignment tool (Comparative Genomics Center, Lawrence Livermore National Laboratory, U.S.A.) - provides nice dotplot graphs and dynamic visualizations. If simple gene locations are provided in the form (e.g. > 2000 5000 RNA_polymerase; indicates that the RNA polymerase gene is found on the plus strand between bases 2000 and 5000) these data will be added to the dynamic visualization. zPicture alignments can be automatically submitted to rVista to identify conserved transcription factor binding sites.
GeneOrder 3.0 (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is ideal for comparing small GenBank genomes (up to 2 Mb). Each gene from the Query sequence is compared to all of the genes from the Reference sequence using BLASTP. There are two display formats: graphical and tabular. Currently the graph is an applet and must be saved as a "SCREEN SHOT". If your data is not present in GenBank use this site.
CoreGenes (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is designed to analyze two to five genomes simultaneously, generating a table of related genes - orthologs and putative orthologs. These entries are linked to their GenBank data. It has a limit of 0.35 Mb, while the newer version CoreGenes 2.0 extends the limit to approx. 2.0Mb. If your data is not present in GenBank use this site. The following diagram is from an analysis of coliphages T3, T7, Yersinia phage phi-YeO3-12 and Roseophage S10I.
CoreGenes 3.0 - is the latest member in the CoreGenes family of tools. It determines unique genes contained in a pair of proteomes. (Caveat: Currently only supports a single pair of genomes). This has proved extremely useful in determining unique genes in comparisons between large Myoviridae.
REPEATS, SECONDARY STRUCTURE & MELTING TEMPERATURE
REPEATS, SECONDARY STRUCTURE
DNA often contains reiterated sequences of differing length. These include direct (e.g. GAAT-N6-GAAT) and inverted (GAAT-N6-ATTC) repeats. The latter, if sufficiently close, may form stable stem-loop structures. For secondary structures of RNA or DNA I recommend most highly Michael Zuker’s sites:
For RNA folding use MFold (Michael Zuker, Rensselaer Polytechnic Institute, U.S.A.). N.B. The data can be presented in a number of graphic formats. For DNA sequences use this site.
Vienna RNA secondary structure prediction (University of Vienna, Austria). I have found this site useful for drawing tRNAs in cloverleaf format.
pknotsRG (Universität Bielefeld, Germany) - is a series of 3 tools for folding RNA secondary structures, including the class of simple recursive pseudoknots. Unfortunately to optimally view the results one needs Microsoft.NET framework (massive) and PseudoViewer2 (School of Computer Science and Engineering, Inha University, Korea).
REPuter - fast computation of maximal repeats in complete genomes (S. Kurtz & C. Schleiermacher @ Universität Bielefeld, Germany) - interesting graphical representation of repeats.
REPFIND (ZLAB, Dr. Zhiping Weng, Boston University, U.S.A.) - on sequences of less than 20kb it provides graphical and statistical analysis on direct repeats.
einverted, palindrome and equicktandem - (EMBOSS) - find inverted and tandem repeats
CRISPRfinder Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) present a curious repeat structure found in many prokaryotic genomes. They show characteristics of both tandem and interspaced repeats. (Reference: I. Grissa et al. 2007. Nucl. Acids Res. 35(Web Server issue): W52-W57).
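To illustrate the direct (GAAT-N6-GAAT) and inverted (GAAT-N6-ATTC) repeats described at the start of this section, here is a small Python sketch that scans a sequence for exact repeat units separated by a spacer. The function name, parameters and default values are my own assumptions; the dedicated tools listed above handle mismatches, statistics and genome-scale input, which this brute-force scan does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_repeats(seq, unit_len=4, min_spacer=0, max_spacer=10):
    """Find exact direct and inverted repeats of a fixed unit length.

    Returns tuples (kind, start of first unit, start of second unit, unit),
    with 0-based positions. A unit that equals its own reverse complement
    is reported as both direct and inverted.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - unit_len + 1):
        unit = seq[i:i + unit_len]
        inverted = reverse_complement(unit)
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + unit_len + spacer
            second = seq[j:j + unit_len]
            if len(second) < unit_len:
                break  # ran off the end of the sequence
            if second == unit:
                hits.append(("direct", i, j, unit))
            if second == inverted:
                hits.append(("inverted", i, j, unit))
    return hits


if __name__ == "__main__":
    print(find_repeats("GAATCCCCCCGAAT", unit_len=4, min_spacer=6, max_spacer=6))
    print(find_repeats("GAATCCCCCCATTC", unit_len=4, min_spacer=6, max_spacer=6))
```

An inverted hit with a short spacer marks a candidate stem-loop: the two units can base-pair and the spacer forms the loop.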
For those with a fuller knowledge of DNA secondary structure you might want to visit ICGEB: International Centre for Genetic Engineering and Biotechnology (Italy)
bend.it and plot.it
model-it - (K. Vlahovicek & S. Pongor) produces incredible pictures of DNA using a variety of parameters. Right click on screen to download the picture, which may not be visible. N.B. you will require Rasmol to visualize the results (*.pdb file).
Sequence-Directed DNA Curvature Wedge Model Formalism and DNA Path Computation (Haifa Genome Diversity Center, University of Haifa, Israel) - this site also gives an elegant output which requires a VRML plug-in and converter for visualizing and manipulating the results. I found that Internet Explorer worked best.
DNAcurve - (D. Wheeler, Marine Biological Laboratory, U.S.A.) also analyzes DNA for curvature using a dinucleotide wedge model. The diagrams are not as high quality. Appears to work best with Netscape.
MELTING TEMPERATURE
Knowing the melting temperature of a DNA fragment or an oligonucleotide is invaluable in determining optimal conditions for carrying out hybridizations. All of the PCR design sites will provide information on oligonucleotides; the following will accommodate longer sequences:
Poland service request form (Heinrich Heine-Universität Düsseldorf, Germany). Please note, in Netscape hit the "stop" button to turn off the animation which may interfere with your ability to use this site.
DAN (Le Centre de Bioinformatique de Bordeaux, France ) - provides one with a plot (in postscript). For a complete picture of your sequence change "window size" to the size of your fragment, change "shift increment" to zero, and click on "Produce a plot".
Hybridization of two different strands of DNA or RNA - Computations consider 5 different ensembles of structures. Partition function calculations are performed for the heterodimer, the two possible homodimers, and for folding of both single strands. Ensemble free energies are computed, leading to simulation of heat capacity, Cp, as a function of temperature. Base pair probabilities are computed and combined with published extinction coefficients to simulate UV absorbance as a function of temperature. (Reference: N.R. Markham & M. Zuker 2005. Nucl. Acids Res. 33: W577-W581).
Homodimer simulations - This simulation considers both the folding and dimerization of one single-stranded DNA or RNA molecule. (Reference: N.R. Markham & M. Zuker 2005. Nucl. Acids Res. 33: W577-W581).
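For a quick back-of-the-envelope estimate before using the sites above, two widely quoted approximations can be sketched in Python: the Wallace rule, Tm = 2(A+T) + 4(G+C), for short oligonucleotides, and a salt-adjusted GC-content formula, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N, for longer fragments. These are rough rules of thumb and not the thermodynamic models used by the sites listed in this section; the function names and the default sodium concentration are my own assumptions.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def tm_wallace(seq):
    """Wallace rule: 2 degrees per A/T and 4 degrees per G/C (short oligos)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_salt_adjusted(seq, na_molar=0.05):
    """GC-content formula with a monovalent salt term (longer fragments).

    na_molar is the Na+ concentration in mol/L; 0.05 is an assumed default.
    """
    n = len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(seq) - 675.0 / n


if __name__ == "__main__":
    oligo = "ATGCATGCATGC"
    fragment = "ATGC" * 25
    print("Wallace Tm:", tm_wallace(oligo))
    print("Salt-adjusted Tm:", round(tm_salt_adjusted(fragment), 2))
```

The two formulas can disagree by several degrees on the same sequence, which is one reason to prefer the nearest-neighbour or partition-function tools for anything that matters.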
DESIGN PCR PRIMERS
BACKGROUND INFORMATION: For sites describing PCR theory, as well as companies marketing PCR products you might want to begin by visiting Horizon Press. For PCR techniques see PCRlink.com.
There are several excellent sites for designing PCR primers:
Primer3: WWW primer tool (University of Massachusetts Medical School, U.S.A.) – This site has a very powerful PCR primer design program permitting one considerable control over the nature of the primers, including size of product desired, primer size and Tm range, and presence/absence of a 3’-GC clamp.
GeneFisher - Interactive PCR Primer Design (Universität Bielefeld, Germany) - a very good site allowing great control over primer design.
PCR Now (Computational Biology Group, PathoGene, Southwestern Medical Center, U.S.A.) - created to design Real-Time Polymerase Chain Reaction (RT-PCR) primers for any number of user-defined coding sequences. Great control over primer properties. If you are interested in designing primers specific to published organismal or viral genes see the related site PathoGene.
Primer3Plus - a new improved web interface to the popular Primer3 primer design program (Reference: A. Untergasser et al. 2007. Nucl. Acids Res. 35(Web Server issue):W71-W74)
OligoCalc: an online oligonucleotide properties calculator - (Reference: W.A. Kibbe. 2007. Nucl. Acids Res. 35(Web Server issue):W43-W46)
Primer-BLAST was developed at NCBI to help users make primers that are specific to the input PCR template. It uses Primer3 to design PCR primers and then submits them to BLAST search against user-selected database. The blast results are then automatically analyzed to avoid primer pairs that can cause amplification of targets other than the input template.
JOG 1.01 Javascript Oligonucleotide Generator (R.D. Mosteller) - will generate Fixed length and composition or Random length and composition oligonucleotides.
RAPD-primer generator (J.Wöstemeyer, Institute of General Microbiology and Microbial Genetics, Germany)
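The kinds of primer constraints mentioned above (primer size, GC content, a 3'-GC clamp) can be checked by hand with a few lines of Python. This is only a sketch: the thresholds are arbitrary example values of mine, the clamp is assumed to mean "the 3'-terminal base is G or C", and none of this reproduces the scoring used by Primer3 or the other sites.

```python
def check_primer(primer, min_len=18, max_len=25, min_gc=40.0, max_gc=60.0):
    """Report simple properties of a candidate primer (5'->3').

    All thresholds are illustrative defaults, not recommendations
    taken from any of the design sites.
    """
    primer = primer.upper()
    gc = 100.0 * (primer.count("G") + primer.count("C")) / len(primer)
    return {
        "length_ok": min_len <= len(primer) <= max_len,
        "gc_percent": round(gc, 1),
        "gc_ok": min_gc <= gc <= max_gc,
        "gc_clamp": primer[-1] in ("G", "C"),  # 3'-terminal base is G or C
    }


if __name__ == "__main__":
    print(check_primer("ATGCATGCATGCATGCATGC"))
    print(check_primer("ATATATATATATATATATAT"))
```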
PCR primers based upon protein sequence:
If you have the protein sequence and want the DNA sequence, the best sites are Reverse Translate a Protein (Colorado State, U.S.A.) or the Java tool Protein backtranslation (Entelechon, Germany). The latter provides one with a wealth of options, including organism-specific codon usage. If you are interested in changing a specific amino acid into another you should consult Reverse Translator (EMBL). One other site is CODEHOP (Fred Hutchinson Cancer Research Center, Washington, U.S.A.). The acronym is from COnsensus-DEgenerate Hybrid Oligonucleotide Primers, and the program is used to design primers based upon multiple sequence alignments.
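As a rough illustration of what reverse translation involves, the following Python sketch converts a protein into a degenerate DNA sequence using IUPAC ambiguity codes. It assumes the standard genetic code and derives each degenerate codon position by position; the function names are mine, and the sites above offer far more control (for example organism-specific codon usage), which this sketch ignores.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC ambiguity codes keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_codon(amino_acid):
    """Return one IUPAC codon covering every codon for the amino acid."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return "".join(
        IUPAC["".join(sorted({codon[pos] for codon in codons}))] for pos in range(3)
    )


def back_translate(protein):
    """Back-translate a one-letter protein string into degenerate DNA."""
    return "".join(degenerate_codon(aa) for aa in protein.upper())


if __name__ == "__main__":
    print(back_translate("MKWIL"))
```

Note that for amino acids with six codons (Leu, Ser, Arg) the single position-wise codon, e.g. YTN for leucine, also matches codons for other amino acids, so a primer design tool would normally split these cases or choose preferred codons.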
PCR primers based upon multialignments:
Primaclade (Molecular Systematics Laboratory at the University of Missouri - St. Louis, U.S.A.) - this application accepts a multiple species nucleotide alignment file as input and identifies a set of PCR primers that will bind across the alignment.
PriFi - upload a file containing Fasta-formatted DNA sequences or alternatively a *.aln file, select the control one wants over the primer design from an extensive list and press "Find primers in alignment." (Reference: J. Fredslund et al. 2005. Nuc. Acids Res. 33: W516-W520).
Genomic scale primers: (N.B. also see the JAVA page for additional downloadable programs)
The PCR Suite (Klinische Genetica, Erasmus MC Rotterdam, Netherlands) - this is a suite of four programs based upon Primer3 for genomic primer design. All offer considerable control on primer properties:
Overlapping_Primers - creates multiple overlapping PCR products in one sequence.
Genomic_Primers - designs primers around exons in genomic sequence. All you need is a GenBank file containing your gene.
SNP_Primers - designs primers around every SNP in a GenBank file.
cDNA_Primers - designs primers around open reading frames. Simply upload a GenBank file containing your genes.
MuPlex: multi-objective multiplex PCR assay design - designed for large-scale multiplex PCR assay design in an automated high-throughput environment, where high coverage is required. (Reference: J. Rachlin et al. 2005. Nucl. Acids Res. 33: W544-W547).
Overlapping primer sets:
Overlapping Primersets - This software is based on the Primer3 program developed by the Whitehead Institute (see above). A closely related site is Multiple Primer Design with Primer 3
GenoFrag - is a software package to design primers optimized for whole genome scanning by long-range PCR. It was developed for the analysis of Staphylococcus aureus genome plasticity by whole genome amplification in ~10 kb-long fragments. Site is in French. (Reference: N. Ben Zakour et al. 2004. Nucl. Acids Res. 32: 17-24)
Short interfering RNA (siRNA) design:
SiRNA Selector - Small interfering RNA (siRNA) guides sequence-specific degradation of the homologous mRNA, thus producing "knock-down" cells. This siRNA design tool scans a target gene for candidate siRNA sequences that satisfy user-adjustable rules. The program evaluates siRNA functionality and specificity. (Reference: N. Levenkova et al. 2004. Bioinformatics 20: 430-432). Other similar programs are siRNA Target Designer (Promega, U.S.A.), or siRNA Target Finder (Ambion, U.S.A.).
siRNA Design Software - compares existing design tools, including those listed above. They also attempt to improve the MPI principles and existing tools by an algorithm that can filter ineffective siRNAs. The algorithm is based on some new observations on the secondary structure. (Reference: S. M. Yiu et al. (2004) Bioinformatics 21: 144-151).
Realtime PCR primer design:
RealTimeDesign (Biosearch Technologies) - free but requires registration.
QuantPrime - is a flexible program for reliable primer design for use in larger qPCR experiments. The flexible framework is also open for simple use in other quantification applications, such as hydrolyzation probe design for qPCR and oligonucleotide probe design for quantitative in situ hybridization. (Reference: S. Arvidsson et al. 2008. BMC Bioinformatics 9:465)
GenScript Real-time PCR (TaqMan) Primer Design (GenScript Corporation)
For additional physicochemical data on the primers the following six sites are useful:
NetPrimer (Premier Biosoft International, U.S.A.) - In my opinion the best site since it provides one with Tm, thermodynamic properties and most stable hairpin & dimers, but it takes a while for the program to load.
dnaMATE - calculates a consensus Tm for short DNA sequences (16-30 nts) using a merged method that is based on three different thermodynamic tables. The consensus Tm value is a robust and accurate estimation of melting temperature for short DNA sequences of practical application in molecular biology. Accuracy benchmarks using all experimental data available indicate that the consensus Tm prediction errors will be within 5 ºC from the experimental value in 89% of the cases. (Reference: A. Panjkovich et al. 2005. Nucl. Acids Res. 33: W570-W572).
OligoAnalyzer (Integrated DNA Technologies, Inc., U.S.A.) - in addition to hairpin and self-dimer analysis of existing primers this site provides one with the opportunity to BLAST the sequence against NCBI's database and measure the impact of incorporating 5'-modifications into the sequence. The oligos can then be ordered directly.
Another excellent site is Oligonucleotide Properties Calculator (Northwestern University Medical School, Chicago, U.S.A.) which provides one with detailed information on the calculations. Also permits analysis of 6-FAM, HEX, or TAMRA-labelled oligos.
Biopolymer Calculator (Yale University, U.S.A.) - Not yet functional.
Melting: enthalpy, entropy and melting temperature (N. Le Novere, Pasteur Institute, Paris, France).
Introduction of silent mutations:
WatCut (Michael Palmer, University of Waterloo, Canada) - takes an oligonucleotide and introduces silent mutations in potential restriction sites such that the amino acid sequence of the protein is unaltered.
When you are ready to set up your PCR reaction see:
PCR Box Titration Calculator (Allotron Biosensor Corporation) - for figuring out the amounts of each reagent to use in a two-dimensional box titration for PCR. For standard PCR reactions adjust volume, and change "row" and "column" number to "1", click on all the "top" or "bottom" and "done".
PCR Reaction Mixture Setup (R. Kalendar, University of Helsinki, Finland) - very nice site.
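The arithmetic behind such reaction setup calculators is the dilution relation C1 x V1 = C2 x V2. The Python sketch below applies it to compute the volume of each stock to pipette, with water making up the remainder. The function, the reagent names and the concentrations in the example are hypothetical and are not taken from either site; always follow your enzyme supplier's protocol.

```python
def reaction_mix(final_volume_ul, reagents, n_reactions=1, excess=0.0):
    """Compute pipetting volumes (in microlitres) for a reaction mix.

    reagents maps a name to (stock concentration, final concentration),
    both in the same units for that reagent. excess is the fractional
    overage for a master mix (e.g. 0.1 for 10% extra).
    """
    scale = n_reactions * (1.0 + excess)
    total = final_volume_ul * scale
    volumes = {}
    for name, (stock, final) in reagents.items():
        # C1 * V1 = C2 * V2  ->  V1 = V2 * C2 / C1
        volumes[name] = total * final / stock
    used = sum(volumes.values())
    if used > total:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = total - used
    return volumes


if __name__ == "__main__":
    # Hypothetical example values: 10x buffer, 10 mM dNTPs, 10 uM primer.
    mix = reaction_mix(
        50,
        {"buffer (x)": (10, 1), "dNTPs (mM)": (10, 0.2), "primer F (uM)": (10, 0.5)},
    )
    for name, volume in mix.items():
        print(f"{name}: {volume:.2f} ul")
```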
Primer presentation on the DNA sequence:
Sequence Extractor (Paul Stothard) - generates a clickable restriction map and PCR primer map of a DNA sequence (Accepted formats are: raw, GenBank, EMBL, and FASTA) offering a great deal of control on output. Protein translations and intron/exon boundaries are also shown. Use Sequence Extractor to build DNA constructs in silico.