1. Arka 2. Bioperl
3. Chemtool 4. ClustalW
5. COALESCE 6. fastDNAml
7. FLUCTUATE 8. HMMER
9. GeneSplicer 10. GP
11. Lucy 12. NUT
13. PdbAlign 14. PHYLIP
15. ProFit 16. RasMol
17. SeWeR 18. STRIDE
19. XYLEM  
  1. Arka is a program that (1) serves as a graphical interface for the programs from the GP package and (2) has some interesting functions of its own. The main scope of the program is the manipulation and visualisation of DNA/RNA/protein sequences. The GP package contains many command-line utilities which fulfill a whole range of tasks (from DNA sequence searches to restriction analysis and determining the melting temperature of oligonucleotides). While those programs are convenient to use in batch processing and CGI scripts (which was their purpose), they lack a nice GUI. Arka remembers the options for the GP programs and knows what both the programs and the options do. Besides, it has some gadgets of its own. It requires GTK+, but doesn't need GNOME. Also, it is small and quick: look, I write and use my programs on an old 486 laptop. It should run like hot butter on your computer. Unless, of course, it is a 386 SX. The name comes from the "UAG" stop codon, which is traditionally called "arka codon".


  2. Bioperl. The Bioperl server provides an online resource for modules, scripts, and web links for developers of Perl-based software for life science research. They can also provide web, FTP and CVS space for individuals and organizations wishing to distribute or otherwise make freely available standalone scripts and code.


  3. Chemtool is a small program for drawing chemical structures on Linux and Unix systems using the GTK toolkit under X11. A short and possibly outdated description of the available functions is available here. Chemtool relies on transfig by Brian Smith for postscript printing and exporting files in PicTeX and EPS formats. Its companion program, XFig, is recommended for enhancing the output of chemtool, and for creation of 2D diagrams and schematics in general. Both are included with most distributions of Linux, and are available through a number of websites including www.xfig.org. If you want to import chemtool drawings into word processing programs other than LaTeX, you will probably want to add a preview bitmap to them, as neither StarOffice/OpenOffice nor that software from Redmond seem to be able to display postscript inserts on screen without them. For this purpose, using either ps2epsi, which comes with ghostscript, or epstool, a part of gsview is recommended. Since chemtool-1.6, this option is supported directly (through the equivalent function offered by recent versions of transfig). Chemtool was originally written by Thomas Volk, then a student of chemistry and biology at the University of Ulm, Germany. His version, which was described in an article in the German periodical LinuxMagazin, was using plain X11. A more recent review of chemtool appeared in Nachr. Chem. Tech. Lab. 49 (2001) 1310-1314.


  4. ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen via viewing Cladograms or Phylograms. Multiple alignments of protein sequences are important tools in studying sequences. The basic information they provide is identification of conserved sequence regions. This is very useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins, and in identifying new members of protein families. Sequences can be aligned across their entire length (global alignment) or only in certain regions (local alignment). This is true for pairwise and multiple alignments. Global alignments need to use gaps (representing insertions/deletions) while local alignments can avoid them, aligning regions between gaps. ClustalW is a fully automatic program for global multiple alignment of DNA and protein sequences. The alignment is progressive and considers the sequence redundancy. Trees can also be calculated from multiple alignments. The program has some adjustable parameters with reasonable defaults. EBI provides a version of Clustal W that can be executed over the Internet on their computers. In addition, you can download a copy of the basic software to run on your own computer. Versions exist for UNIX, DOS, Windows XP (command line mode only) and Mac OSX.
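    The difference between global and local alignment can be illustrated for the pairwise case. The sketch below is not ClustalW's progressive multiple alignment; it only scores two sequences with the textbook Needleman-Wunsch (global) and Smith-Waterman (local) recurrences. The scoring values (match 1, mismatch -1, linear gap -2) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: only the best-matching region is aligned."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

    The global score is penalised for every unmatched residue over the full length, while the local score reflects only the best-matching region.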


  5. COALESCE. Metropolis-Hastings Markov Chain Monte Carlo genealogy sampler. For use in cases without recombination, selection or migration and with constant population size. This program takes as input a set of aligned DNA or RNA sequences from different individuals in a population and uses them to make a maximum likelihood estimate of the parameter "theta," using the method described in Kuhner et al. (1995). Theta is defined as 4 times the effective population size times the mutation rate in a diploid organism, or 2 times the effective population size times the mutation rate in a haploid. (Note that this is mutation rate per site, not per locus.) COALESCE assumes that the sampled population is of constant size, and that the loci sampled are not affected by selection or recombination. If these assumptions are violated the results may be erroneous. The algorithm begins with a genealogy for the sequences and sequentially makes modifications in it, accepting or rejecting the modifications based on the sequence data, and sampling the current genealogy at intervals. From the sampled genealogies it constructs a likelihood curve and maximum likelihood estimate for theta. The aim is to preferentially sample those genealogies which can contribute substantial information to the estimate of theta, avoiding the myriads of possible but unlikely and thus uninformative genealogies. If more than one locus is analyzed, likelihoods from all loci are summed to make an overall likelihood curve and estimate of theta. The basic unit of progress of the program is a "step"--one proposed change to the genealogy, which may be accepted or rejected. A continuous series of steps, all using the same parameter values, is a "chain".
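    The definition of theta and the accept/reject step can be sketched as follows. The function names are hypothetical and are not part of COALESCE. The acceptance rule shown is the generic Metropolis form that compares data likelihoods only, which assumes that the proposal terms cancel; the program's own rule may differ in detail.

```python
import math


def theta(effective_population_size, mutation_rate_per_site, diploid=True):
    """Theta = 4*N*mu for a diploid organism, 2*N*mu for a haploid."""
    ploidy_factor = 4 if diploid else 2
    return ploidy_factor * effective_population_size * mutation_rate_per_site


def acceptance_probability(log_lik_current, log_lik_proposed):
    """Generic Metropolis rule on log-likelihoods (assumed, see lead-in)."""
    if log_lik_proposed >= log_lik_current:
        return 1.0  # an equally good or better genealogy is always accepted
    return math.exp(log_lik_proposed - log_lik_current)
```

    One "step" then consists of proposing a changed genealogy, computing its likelihood from the sequence data, and accepting it with the probability returned above.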


  6. fastDNAml is a program for estimating maximum likelihood phylogenetic trees from nucleotide sequences. Much of this program is based on version 3.3 of Joseph Felsenstein's DNAML program.
    Reference G. J. Olsen, H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10: 41-48.
    • Available Downloads
      • fastDNAml -- The current release of the program.
      • mpi_fastDNAml and pvm_fastDNAml -- Parallel versions based upon MPI or PVM are available from Indiana University.


  7. FLUCTUATE fits a model of a single population that has been growing (or shrinking) according to an exponential growth law. It estimates 4Nu and g, where N is the effective population size, u is the neutral mutation rate per site, and g is the growth rate of the population. If you have a PowerMac, you will want to fetch the PowerMac binary, or if you have an Intel processor with Windows 95/98/NT/2000/XP you will want the exe file.


  8. HMMER. Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis.
    Documentation: User's Guide
    • Available Downloads All distributions below come with full source code, the User's Guide (PDF format), UNIX man pages, and other documentation. Once you download, uncompress (gunzip), and un-tar (tar xf), see the file INSTALL for quick installation instructions. HMMER should build cleanly on any UNIX platform, including Mac OS/X. It should also compile on Microsoft Windows platforms, but you would have to work around the GNU configure script and UNIX makefiles. Porting to other non UNIX operating systems such as VAX/VMS should not be difficult. The code is standard ANSI/POSIX C.
      • Source code
      • AMD Opteron/Linux
      • Apple Macintosh PowerPC OS/X
      • Compaq Alpha Tru64
      • Compaq Alpha Linux
      • Hewlett/Packard IA64 (Itanium2), Linux
      • Hewlett/Packard IA64 (Itanium2), HP/UX
      • IBM Power4, Linux
      • IBM Power4, AIX
      • Intel FreeBSD
      • Intel GNU/Linux
      • Intel GNU/Linux as RPM
      • Intel OpenBSD
      • Intel Solaris
      • Silicon Graphics IA64 (Itanium2), Linux
      • Silicon Graphics MIPS IRIX


  9. GeneSplicer. A fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been trained and tested successfully on Plasmodium falciparum (malaria), Arabidopsis thaliana, human, Drosophila, and rice. Training data sets for human and Arabidopsis thaliana are included. Use the GeneSplicer Web Interface to run GeneSplicer directly, or see below for instructions on downloading the complete system including source code.

    GeneSplicer is released as source code and was tested on Linux RedHat 6.x+, Sun Solaris, and Alpha OSF1, but should work on any Unix system.


  10. GP is a set of small utilities written in ANSI C to manipulate DNA sequences in a Unix fashion, fit for combining within shell and CGI scripts. They are fast and quite reliable, and playing with large numbers of sequences is much more convenient with a command line interface than with standard GUI tools. The programs are supposed to compile fine under any ANSI C compiler, but I never tried any platform other than Unix / Linux. You will find more details online on the GP man pages. And here is an example of a site using GP programs in CGI scripts to do promoter searches on-the-fly.
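    To give a flavour of the small sequence tasks such utilities perform, here is a sketch in Python rather than C. It does not reproduce any GP program: the function names are hypothetical, and the melting temperature uses the Wallace rule (2 degrees C per A/T, 4 degrees C per G/C), a common rule of thumb for short oligonucleotides that GP may or may not use.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def wallace_tm(oligo):
    """Estimate the melting temperature in degrees C with the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc
```

    Functions like these read a sequence and return a plain value, which is the property that makes command-line tools easy to chain in shell and CGI scripts.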


  11. Lucy is a Sequence Cleanup Program. Lucy is a utility that prepares raw DNA sequence fragments for sequence assembly, possibly using the TIGR Assembler. The cleanup process includes quality assessment, confidence reassurance, vector trimming and vector removal. The primary advantage of Lucy over other similar utilities is that it is a fully integrated, stand alone program.
    Reference H. H. Chou and M. H. Holmes. 2001. DNA sequence quality trimming and vector removal. Bioinformatics. 17(12): 1093-1104.
    • Available Downloads
      • Lucy [Unix version]
      • Lucy2 [Hui-Hsien Chou's Windows version]


  12. NUT is an open-source free nutrition software that records what you eat and analyzes your meals for nutrient levels in terms of the "Daily Value" or DV which is the standard for food labeling in the US. The program uses the free food composition database from the USDA. By experimenting, you can find the optimal level of the various nutrients and how to implement this with foods available to you. NUT can help reconstruct the lost instruction manual to your care and feeding because, when the authorities and crackpots disagree on the proper human diet, you can design an experiment using the food composition tables to discover the truth!
    • Features of NUT include:
      • 7538 foods and 143 nutrients--the complete, latest USDA database
      • Foods easy to find and add to daily meals
      • Configurable for 1-19 meals per day and any dietary plan--including low carb, zone, low fat
      • Comprehensive meal analysis for any number of consecutive meals
      • Presents both easy-to-read percentage summaries and in-depth nutrient analysis, including Omega-3 and Omega-6 essential fatty acids
      • Defaults to ounces or grams based on user input
      • Suggests foods based on current diet
      • Can easily create additional databases for other family members
      • Auto-transfer of successful dietary strategies from analysis screen to configuration settings
      • Allows recording of recipes and customary meals for fast data entry
      • Guesses recipes of packaged foods
      • Creates graphs of nutrient intake showing daily and monthly trends
      • Sorts foods richest in each of the 136 nutrients
      • Reveals which foods contribute most to user's nutrition
      • Runs on Linux, *nix, Windows (DOS); allows dual-boot PC systems to share the same data; and has no dependencies on other programs
      • The price is right--it's free! And you can read and modify the source code.


  13. PdbAlign. Given a GCG multiple sequence alignment file (a GCG MSF file) which includes a sequence of known structure, the program pdbalign maps the sequence variability onto the known structure. The central premise is, of course, that for a closely related family of proteins (sequence ID > 40%) the 3-D structures will not be significantly different.
    Reference Roger A. Sayle, Mansoor A. S. Saqi, M. Weir, Andrew Lyall. 1995. PdbAlign, PdbDist and DistAlign: tools to aid in relating sequence variability to structure. Computer Applications in the Biosciences. 11(5): 571-573.
    Documentation: README
    • Available Downloads
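    One simple way to quantify "sequence variability" per alignment column is Shannon entropy. The sketch below is an assumption for illustration, not PdbAlign's actual measure; the function name and the set of gap characters are hypothetical, and the alignment is passed as a list of equal-length strings rather than parsed from an MSF file.

```python
import math
from collections import Counter

GAP_CHARS = {"-", ".", "~"}  # assumed gap symbols


def column_entropy(alignment):
    """Return the Shannon entropy (bits) of each alignment column.

    0.0 means a fully conserved column; larger values mean more variability.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        residues = [c.upper() for c in column if c not in GAP_CHARS]
        if not residues:
            entropies.append(0.0)  # column contains only gaps
            continue
        total = len(residues)
        counts = Counter(residues)
        entropies.append(
            sum(-(n / total) * math.log2(n / total) for n in counts.values())
        )
    return entropies
```

    The resulting per-column values could then be assigned to the matching residues of the sequence of known structure, for example as a colouring scale.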


  14. PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). It is available free over the Internet, and written to work on as many different kinds of computer systems as possible. The source code is distributed (in C), and executables are also distributed. In particular, already-compiled executables are available for Windows (95/98/NT/2000/me/xp), MacOS 8 and 9, MacOS X, and Linux systems. Complete documentation is available on documentation files that come with the package.
    Methods that are available in the package include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.
    The programs are controlled through a menu, which asks the users which options they want to set, and allows them to start the computation. The data are read into the program from a text file, which the user can prepare using any word processor or text editor (but it is important that this text file not be in the special format of that word processor -- it should instead be in "flat ASCII" or "Text Only" format). Some sequence analysis programs such as the ClustalW alignment program can write data files in the PHYLIP format. Most of the programs look for the data in a file called "infile" -- if they do not find this file they then ask the user to type in the file name of the data file.
    Output is written onto special files with names like "outfile" and "outtree". Trees written onto "outtree" are in the Newick format, an informal standard agreed to in 1986 by authors of a number of major phylogeny packages.
    At this stage there is no mouse-windows interface for PHYLIP.
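    A data file for the PHYLIP programs can also be produced by a short script instead of a text editor. The sketch below writes aligned sequences in the sequential layout: a first line with the number of sequences and the number of sites, then one line per sequence with the name in the first 10 columns. The function name is hypothetical, and truncating long names to 10 characters is an assumption based on the classic format.

```python
def format_phylip(sequences):
    """Return aligned sequences as text in PHYLIP sequential format.

    `sequences` maps each taxon name to its aligned sequence.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    lines = [f"{len(sequences)} {n_sites}"]
    for name, seq in sequences.items():
        # Names occupy exactly 10 columns: truncate or pad with spaces.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"
```

    Writing the returned text to a plain-text file called "infile" in the working directory lets most of the programs find the data without asking for a file name.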


  15. ProFit (pronounced Pro-Fit, not profit!) is designed to be the ultimate program for performing least squares fits of two protein structures. It performs a very simple and basic function, but allows as much flexibility as possible in performing this procedure. Thus one can specify subsets of atoms to be considered, specify zones to be fitted by number, sequence, or by sequence alignment.
    ProFit does not try to address the question of sorting out equivalent atoms for you beyond doing a sequence alignment. There are other programs such as SSAP and GAFIT which address that problem. You must specify which residues and atoms you consider to be equivalent although the program supports internal sequence alignment to set the zones automatically.
    • Documentation
    • Available Downloads. ProFit is freely available for use by not-for-profit organisations and for commercial organisations (providing they inform the author that they are using it). It may not be distributed without the author's permission, but must be obtained from this site. It is supplied as a gzipped tar file of source code and as a Linux binary. Bernhard Rupp has kindly provided a ZIP file of ProFit compiled for Windows (Win32). This is only available for Version 2.3 of ProFit.


  16. RasMol is a molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication quality images. The program has been developed at the University of Edinburgh's Biocomputing Research Unit and the Biomolecular Structures Group at Glaxo Research and Development, Greenford, UK.

    RasMol reads in molecular co-ordinate files in a number of formats and interactively displays the molecule on the screen in a variety of colour schemes and representations. Currently supported input file formats include Brookhaven Protein Databank (PDB), Tripos' Alchemy and Sybyl Mol2 formats, Molecular Design Limited's (MDL) Mol file format, Minnesota Supercomputer Center's (MSC) XMol XYZ format, CHARMm format, MOPAC format, CIF format and mmCIF format files. If connectivity information and/or secondary structure information is not contained in the file this is calculated automatically. The loaded molecule may be shown as wireframe, cylinder (Dreiding) stick bonds, alpha-carbon trace, spacefilling (CPK) spheres, macromolecular ribbons (either smooth shaded solid ribbons or parallel strands), hydrogen bonding and dot surface. Atoms may also be labelled with arbitrary text strings. Alternate conformers and multiple NMR models may be specially coloured and identified in atom labels. Different parts of the molecule may be displayed and coloured independently of the rest of the molecule or shown in different representations simultaneously. The space filling spheres can even be shadowed. The displayed molecule may be rotated, translated, zoomed, z-clipped (slabbed) interactively using either the mouse, the scroll bars, the command line or an attached dials box. RasMol can read a prepared list of commands from a `script' file (or via interprocess communication) to allow a given image or viewpoint to be restored quickly. RasMol can also create a script file containing the commands required to regenerate the current image. Finally the rendered image may be written out in a variety of formats including both raster and vector PostScript, GIF, PPM, BMP, PICT, Sun rasterfile or as a MolScript input script or Kinemage.
    RasMol will run on a wide range of architectures and systems including SGI, sun4, sun3, sun386i, DEC, HP and E&S workstations, IBM RS/6000, Cray, Sequent, DEC Alpha (OSF/1, OpenVMS and Windows NT), IBM PC (under Microsoft Windows, Windows NT, OS/2, Linux, BSD386 and *BSD), Apple Macintosh (System 7.0 or later), PowerMac and VAX VMS (under DEC Windows). UNIX and VMS versions require an 8bit, 24bit or 32bit X Windows frame buffer (X11R4 or later). The X Windows version of RasMol provides optional support for a hardware dials box and accelerated shared memory rendering (via the XInput and MIT-SHM extensions) if available.


  17. SeWeR is an acronym that stands for SEquence analysis using WEb Resources. It serves as a single door to all the common web-based services for sequence analysis. And it sews. It sews all these services together. For a refined mind, SeWeR is an integrated portal to common web-based services in bioinformatics. SeWeR is cross-browser DHTML. It is written entirely in JavaScript 1.2. Hence it will run only in Netscape 4.0 or higher and Internet Explorer 4.0 or higher.
    Reference M. K. Basu. 2001. SeWeR: a customizable and integrated dynamic HTML interface to bioinformatics services. Bioinformatics. 17(6): 577-578.
    • Available Downloads: SeWeR is feather-light! The whole package is just around 300K. You can even run it from a floppy. The zip archive is available at two locations:


  18. STRIDE is a program to recognize secondary structural elements in proteins from their atomic coordinates. It performs the same task as DSSP by Kabsch and Sander but utilizes both hydrogen bond energy and mainchain dihedral angles rather than hydrogen bonds alone. It relies on database-derived recognition parameters with the crystallographers' secondary structure definitions as a standard-of-truth. Please see Frishman and Argos for detailed description of the algorithm.
    Reference D. Frishman & P. Argos. 1995. Knowledge-based secondary structure assignment. Proteins. 23: 566-579.
    • Available Downloads: Executables of STRIDE for several UNIX platforms, VAX/VMS, OpenVMS, Dos and Mac together with documentation and source code are available by anonymous FTP from ftp.ebi.ac.uk (directories /pub/software/unix/stride, /pub/software/dos/stride, /pub/software/vms/stride, /pub/software/mac/stride).
      Data files with STRIDE secondary structure assignments for the current release of the PDB databank are in the directory /pub/databases/stride of the same site. Atomic coordinate sets can be submitted for secondary structure assignment through electronic mail. A mail message containing HELP in the first line will be answered with appropriate instructions. See also WWW page http://www.embl-heidelberg.de/stride/stride_info.html.
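    The mainchain dihedral angles mentioned above (such as phi and psi) are each defined by four consecutive atoms. The sketch below shows the standard geometric calculation from atomic coordinates; it is not taken from the STRIDE source, and the function name is hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

    For phi the four atoms would be C(i-1), N(i), CA(i), C(i), and for psi N(i), CA(i), C(i), N(i+1), read from the coordinate file of the protein.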


  19. XYLEM. XYLEM(1) is a package of tools designed to exploit the Unix environment to enable the user to identify, extract and manipulate data from major databases such as GenBank, EMBL and PIR. SPLITDB splits database files into annotation, sequence, and index files for more efficient searching. Fundamental to the power of these programs is the ability to perform operations on groups of sequences, represented by names or accession numbers which function as virtual database subsets. Keyword searches can be performed by FINDKEY. Hits can be retrieved using FETCH. The most powerful program is FEATURES, which uses the GETOB parser to evaluate GenBank/EMBL/DDBJ Features Table expressions, thereby extracting features (e.g. mRNA, sig_peptide, intron) from lists of entries. Additional programs perform operations such as translation or randomization of datasets, and formatting of multiply-aligned sequences for publication. XYLEM is compatible with the Fristensky Sequence Analysis Package and the Pearson FASTA programs(2), and can be used from within the Genetic Data Environment (GDE) of Steven Smith(3).
    Reference:
    1. B. Fristensky. 1993. Feature expressions: creating and manipulating sequence datasets. NAR. 21: 5997-6003.
    2. W. R. Pearson and D. J. Lipman. 1988. Improved tools for biological sequence comparison. PNAS. 85: 2444-2448.
    3. S. W. Smith, R. Overbeek, C. R. Woese, W. Gilbert and P. M. Gillevet. 1994. The genetic data environment: an expandable GUI for multiple sequence analysis. Computer Applications in the Biosciences. 10: 671-675.
