MetaPIGA, an Application for Phylogeny Inference
MetaPIGA is a robust implementation of several stochastic heuristics
for large phylogeny inference under maximum likelihood: random-restart
hill climbing, simulated annealing, a classical genetic algorithm, and
the metapopulation genetic algorithm (metaGA). It also supports complex
substitution models, discrete Gamma rate heterogeneity, and data
partitioning.
Project website: http://www.metapiga.org
For Site Admins:
| Application version | Cores/Job | Memory/Job | Input/Job | Output/Job | Walltime/Job | Users |
|---|---|---|---|---|---|---|
| 2 | 1 | 500MB | 1-100MB | 1-100MB | 0-12 hours | 2 |
Estimated number of jobs per day (or week or month): 100
Estimated job submission frequency (constant flow or occasional submission): constant flow
Planned start date: 01.09.2012
Planned completion date: 31.12.2012
Contact
Institution: UNIGE
Application contact person name and email: Prof. Michel Milinkovitch,
Michel.Milinkovitch AT unige.ch
SMSCG contact person: Marko Niinimaki
Application and Project Description
A MetaPIGA calculation starts from an input dataset: a file that
contains the DNA sequences of different taxa. A DNA sequence is composed
of k nucleotides (A, C, G and T). In metaGA terminology, a tree is
called an individual. The goal is to find the most suitable individual,
i.e. the phylogenetic tree that has the greatest probability of having
generated the corresponding dataset. To do so, the client program
(MetaPIGA) randomly generates P independent populations of individuals
(trees). Each population undergoes mutation and selection, while
topological changes are guided by comparisons between populations.
These tasks appear to be computationally heavy and consequently need
substantial machine resources. At the end, a consensus process running
on the client side constructs the final best tree from the P solutions
produced by metaGA. Note that the number of trees needed to obtain a
robust final solution cannot be known in advance, so MetaPIGA keeps
running further analyses until the consensus process provides the best
tree.
Value
MetaPIGA is a well-known and widely used implementation of the
metaGA algorithm.
INSTALLATION INSTRUCTIONS
Runtime file:
APPS/BIO/METAPIGA-2
#!/bin/bash
# shared directory for application installation
export application_base_path='/YOUR/DIR/HERE'
case "$1" in
0 )
# nothing to do on the frontend
# Note: for SGE LRMS here is the place for setting the Parallel Environment
# export joboption_nodeproperty_0=mpich
;;
1 )
# source the environment on the node and prepare MPIARGS
#
# Note: the site also needs to make sure the MPI environment works
# properly (this is normally site dependent)
#
export PATH=$PATH:/usr/bin:$application_base_path
;;
2 )
;;
* )
# Now, calling argument is wrong or missing.
# If call was made from NorduGrid ARC, it is considered
# an error. If this script is to be used also to initialize
# MPI environment for local jobs in cluster, raising error here
# could be improper.
return 1
;;
esac
$application_base_path/metapiga.sh exec file:
#!/bin/bash
# is there a parameter?
if [ $# -eq 0 ]
then
    echo "This script requires a NEXUS input file as a parameter" >&2
    exit 1
fi
# what is the location of this script?
mydir=$(dirname "$0")
# pass all arguments through to MetaPIGA unchanged
java -jar "$mydir/mp2.jar" noupdate silent "$@"
Metapiga mp2.jar file: http://www.metapiga.org/download.html
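Before publishing the runtime environment, a site admin may want to confirm that the pieces the wrapper relies on are in place. The following is a minimal sketch, not part of MetaPIGA or ARC: the function name check_metapiga_install is hypothetical, and it assumes that mp2.jar and metapiga.sh are both installed in the directory given as the first argument.

```bash
# Hypothetical helper: verify that a MetaPIGA installation directory is usable.
# $1: installation directory (defaults to the current directory)
check_metapiga_install() {
    local dir="${1:-.}"
    local status=0

    # the wrapper calls "java -jar", so java must be reachable
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        status=1
    fi

    # the wrapper expects mp2.jar next to itself
    if [ ! -f "$dir/mp2.jar" ]; then
        echo "mp2.jar not found in $dir" >&2
        status=1
    fi

    # the wrapper itself must exist and be executable
    if [ ! -x "$dir/metapiga.sh" ]; then
        echo "metapiga.sh missing or not executable in $dir" >&2
        status=1
    fi

    return $status
}
```

It could be called as check_metapiga_install "$application_base_path" once that variable has been set; a non-zero return status means at least one component is missing, and the reason is printed to standard error.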
FOR END USERS
Example data file and xrsl file.
metapiga.xrsl
&(executable="$application_base_path/metapiga.sh")
(arguments="sample.nex")
(inputFiles= ("sample.nex" ""))
(gmlog="gmlog")
(stdout="std.out")
(stderr="std.err")
(jobname="mp test 4")
(outputFiles=("RESULTS" ""))
(runTimeEnvironment="APPS/BIO/METAPIGA-2")
sample.nex
#NEXUS
Begin data;
Dimensions ntax=4 nchar=15;
Format datatype=dna symbols="ACTG" missing=? gap=-;
Matrix
Species1 atgctagctagctcg
Species2 atgcta??tag-tag
Species3 atgttagctag-tgg
Species4 atgttagctag-tag
;
End;
Begin Metapiga;
Settings
Dir = RESULTS
;
End;
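Since a grid job may wait in a queue before it fails on a malformed input, it can be useful to check the data block locally first. The sketch below compares the ntax and nchar values of the Dimensions line with the matrix that follows. The function name check_nexus_dims is hypothetical, and the check assumes a non-interleaved matrix with one whitespace-separated name and sequence per line, and no spaces around "=" in the Dimensions line, as in the sample above.

```bash
# Hypothetical helper: check that ntax/nchar match the matrix in a NEXUS file.
# $1: path to the NEXUS file
check_nexus_dims() {
    awk '
        BEGIN { bad = 0; rows = 0; ntax = 0; nchar = 0; in_matrix = 0 }
        { line = tolower($0) }
        # read the declared dimensions, forcing numeric values with "+ 0"
        line ~ /dimensions/ {
            if (match(line, /ntax=[0-9]+/))  ntax  = substr(line, RSTART + 5, RLENGTH - 5) + 0
            if (match(line, /nchar=[0-9]+/)) nchar = substr(line, RSTART + 6, RLENGTH - 6) + 0
            next
        }
        # the matrix starts after a line holding only the keyword
        line ~ /^[ \t]*matrix[ \t]*$/ { in_matrix = 1; next }
        # a semicolon closes the matrix
        in_matrix && line ~ /;/ { in_matrix = 0; next }
        # each matrix row is "name sequence"
        in_matrix && NF == 2 {
            rows++
            if (length($2) != nchar) {
                printf "%s: %d characters, expected %d\n", $1, length($2), nchar
                bad = 1
            }
        }
        END {
            if (rows != ntax) {
                printf "%d taxa found, expected %d\n", rows, ntax
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

Running check_nexus_dims sample.nex prints one line per mismatch and returns a non-zero status if any is found, so it can guard the submission step in a script.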
