Hapler

Latest software: Hapler version 1.60
Documentation: PDF
Older versions:
Hapler version 1.53
Hapler version 1.52
Hapler version 1.5
Hapler version 1.0

Sample files:
pz_p450_CP6B6_pop.tigr - butterfly, TIGR format
agambiae.1925488521.sam - mosquito, SAM format

Hapler is a tool for assembling robust haplotype regions from alignments (produced by de novo assembly or by mapping to a reference) of genetically diverse sequence data. Hapler compares each sequence to every other sequence and groups the sequences into sets that contain no conflicts, i.e., it finds a minimum coloring of the sequence 'conflict graph'. Minimum coloring is NP-hard for general graphs, but Hapler can compute one in O(n^3) time because it assumes that sequences contain no gaps (e.g., it ignores mate-pair information). Because such a minimum coloring is usually not unique, Hapler by default produces many pseudo-random colorings and keeps only the haplotype groupings that are common to all of them. In practice this drastically increases the correctness of the results.
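The conflict-graph idea and the "keep only groupings common to all colorings" step can be sketched as follows. This is a hypothetical illustration, not Hapler's source: reads are assumed to be gapless strings over the alignment columns with ' ' marking uncovered columns, and the coloring step uses a randomized greedy heuristic as a stand-in, which (unlike Hapler's method) is not guaranteed to find a minimum coloring. The class and method names are made up for this example.

```java
import java.util.*;

// Hypothetical sketch, not Hapler's source. Reads are gapless strings over the
// alignment columns; ' ' marks a column the read does not cover (assumed
// encoding). '-' is an ordinary allele, so it can cause conflicts.
final class ConflictColoring {

    // Two reads conflict if they cover a common column with different characters.
    static boolean conflicts(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x != ' ' && y != ' ' && x != y) return true;
        }
        return false;
    }

    // Adjacency matrix of the conflict graph, built from all pairwise comparisons.
    static boolean[][] conflictGraph(List<String> reads) {
        int n = reads.size();
        boolean[][] g = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                g[i][j] = g[j][i] = conflicts(reads.get(i), reads.get(j));
        return g;
    }

    // Greedy coloring in a shuffled vertex order. NOTE: a stand-in heuristic;
    // it always yields a proper coloring but not necessarily a minimum one.
    static int[] greedyColoring(boolean[][] g, Random rng) {
        int n = g.length;
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, rng);
        int[] color = new int[n];
        Arrays.fill(color, -1);
        for (int v : order) {
            Set<Integer> used = new HashSet<>();
            for (int u = 0; u < n; u++)
                if (g[v][u] && color[u] >= 0) used.add(color[u]);
            int c = 0;
            while (used.contains(c)) c++;
            color[v] = c;
        }
        return color;
    }

    // Two reads stay in the same group only if they got the same color in
    // every run, so each group is a grouping common to all colorings.
    static Collection<List<Integer>> stableGroups(List<String> reads, int runs, long seed) {
        boolean[][] g = conflictGraph(reads);
        Random rng = new Random(seed);
        int n = reads.size();
        List<List<Integer>> signatures = new ArrayList<>();
        for (int i = 0; i < n; i++) signatures.add(new ArrayList<>());
        for (int r = 0; r < runs; r++) {
            int[] color = greedyColoring(g, rng);
            for (int i = 0; i < n; i++) signatures.get(i).add(color[i]);
        }
        Map<List<Integer>, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++)
            groups.computeIfAbsent(signatures.get(i), k -> new ArrayList<>()).add(i);
        return groups.values();
    }

    public static void main(String[] args) {
        List<String> reads = List.of(
                "ACGT    ",
                "  GTAC  ",
                "  GAAC  ",   // conflicts with the first two reads at column 3
                "    ACGG");
        System.out.println(stableGroups(reads, 20, 42L));
    }
}
```

Grouping by the per-read vector of colors works because "same color in every run" is an intersection of equivalence relations, so it is itself an equivalence relation and no extra merging step is needed.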

Because Hapler strives for correctness when assembling haplotype regions, in practice these regions are often shorter than the alignment as a whole. Hapler therefore also includes functionality to reconstruct a full-length consensus sequence that minimizes, and identifies, possible chimeric points.
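One way to picture "minimizing chimeric points" is as a minimum interval cover: stitch the full length together from as few haplotype regions as possible, and flag every hand-off between regions as a possible chimeric point. The sketch below assumes each region is a half-open column interval [start, end) and that chimeric points correspond exactly to such hand-offs; both are assumptions for illustration, and this is not necessarily how Hapler builds its consensus. `RegionCover`, `minCover`, and `switchPoints` are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch: each haplotype region is a half-open interval
// [start, end) of alignment columns, and every switch from one region to
// the next is treated as a possible chimeric point. Not Hapler's algorithm.
final class RegionCover {

    record Region(int start, int end) {}

    // Greedy minimum interval cover of [0, length): among the regions starting
    // at or before the current column, take the one reaching furthest right.
    // Returns null if some column is covered by no region.
    static List<Region> minCover(List<Region> regions, int length) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt(Region::start));
        List<Region> chosen = new ArrayList<>();
        int pos = 0, i = 0;
        while (pos < length) {
            Region best = null;
            // Regions consumed in earlier steps end at or before pos, so only
            // newly reachable regions need to be examined here.
            while (i < sorted.size() && sorted.get(i).start() <= pos) {
                Region r = sorted.get(i++);
                if (best == null || r.end() > best.end()) best = r;
            }
            if (best == null || best.end() <= pos) return null; // gap at column pos
            chosen.add(best);
            pos = best.end();
        }
        return chosen;
    }

    // Columns where the cover hands off from one region to the next.
    static List<Integer> switchPoints(List<Region> cover) {
        List<Integer> points = new ArrayList<>();
        for (int k = 1; k < cover.size(); k++) points.add(cover.get(k - 1).end());
        return points;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(new Region(0, 6), new Region(2, 4),
                new Region(4, 12), new Region(10, 20), new Region(5, 9));
        List<Region> cover = minCover(regions, 20);
        System.out.println(cover + " switch points: " + switchPoints(cover));
    }
}
```

The greedy "reach furthest" rule is the standard optimal strategy for covering a line segment with the fewest intervals, so under these assumptions it also yields the fewest switch points.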

Hapler takes as input multiple alignments in either TIGR or SAM format, which can optionally be read from standard input, and it writes its results to standard output in an easy-to-parse format. Three SNP callers are built in: simple (any sequence difference induces a SNP), simplestrict (the minority allele must be present at least twice, except in regions where coverage is below 10X), and 454 (ignores variants that resemble homopolymer runs). Hapler can also take as input a list of user-defined SNP loci.
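The simple and simplestrict rules can be read as column-wise tests on allele counts. The sketch below is one interpretation of those descriptions, not Hapler's implementation: it assumes reads are gapless strings with ' ' marking uncovered columns, counts '-' as an allele, and takes "minority allele" to mean any allele other than the most frequent one. The 454 caller is omitted because its homopolymer logic is not specified here. `SnpCallers` and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch of two column-wise SNP rules, based on one reading of
// the "simple" and "simplestrict" descriptions. Reads are gapless strings over
// alignment columns with ' ' marking an uncovered column (assumed encoding).
final class SnpCallers {

    // Allele counts at one column, ignoring reads that do not cover it.
    static Map<Character, Integer> alleleCounts(List<String> reads, int col) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String r : reads) {
            if (col < r.length() && r.charAt(col) != ' ')
                counts.merge(r.charAt(col), 1, Integer::sum);
        }
        return counts;
    }

    // "simple": any difference among covering reads makes the column a SNP.
    static boolean simple(Map<Character, Integer> counts) {
        return counts.size() >= 2;
    }

    // "simplestrict" (as interpreted here): at coverage >= 10, an allele other
    // than the most frequent one must occur at least twice; below 10X, any
    // difference counts, as in "simple".
    static boolean simpleStrict(Map<Character, Integer> counts) {
        if (counts.size() < 2) return false;
        int coverage = 0;
        for (int c : counts.values()) coverage += c;
        if (coverage < 10) return true;
        List<Integer> sorted = new ArrayList<>(counts.values());
        sorted.sort(Collections.reverseOrder());
        return sorted.get(1) >= 2; // count of the second-most-frequent allele
    }

    // Columns called as SNPs by the chosen rule.
    static List<Integer> callSnps(List<String> reads, boolean strict) {
        int width = 0;
        for (String r : reads) width = Math.max(width, r.length());
        List<Integer> snps = new ArrayList<>();
        for (int col = 0; col < width; col++) {
            Map<Character, Integer> counts = alleleCounts(reads, col);
            if (strict ? simpleStrict(counts) : simple(counts)) snps.add(col);
        }
        return snps;
    }

    public static void main(String[] args) {
        List<String> reads = new ArrayList<>();
        for (int i = 0; i < 9; i++) reads.add("ACGT");
        reads.add("ACTA"); // lone variant at column 2, variant 'A' at column 3
        reads.add("AGGA"); // lone variant at column 1, second 'A' at column 3
        System.out.println("simple:       " + callSnps(reads, false));
        System.out.println("simplestrict: " + callSnps(reads, true));
    }
}
```

With 11 reads, the single-occurrence variants at columns 1 and 2 are SNPs under the simple rule only, while column 3, where the minority allele appears twice, passes both rules.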

Hapler works on any alphabet (RNA, DNA, protein, ...). '-' characters are treated as conflict-causing indel alleles, and '~' characters are treated as unknown "gaps", which split a sequence into smaller sequences that contain no '~' (similar to ignoring mate-pair information).
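The '~' splitting rule amounts to cutting an aligned read into its maximal '~'-free runs while remembering where each run starts in the alignment. The following is a minimal sketch of that step, assuming a read is given as a starting alignment column plus its aligned characters; `TildeSplitter`, `Piece`, and `split` are hypothetical names, not part of Hapler.

```java
import java.util.*;

// Hypothetical sketch: split an aligned read at '~' (unknown gap) characters
// into gapless pieces, keeping each piece's alignment offset. '-' is left
// alone because it is an ordinary (indel) allele. Not Hapler's source code.
final class TildeSplitter {

    record Piece(int offset, String seq) {}

    // 'offset' is the alignment column of the read's first character.
    static List<Piece> split(int offset, String read) {
        List<Piece> pieces = new ArrayList<>();
        int start = -1; // start index of the current piece, or -1 if none
        for (int i = 0; i <= read.length(); i++) {
            boolean gap = i == read.length() || read.charAt(i) == '~';
            if (!gap && start < 0) start = i;            // a piece begins
            if (gap && start >= 0) {                     // a piece ends
                pieces.add(new Piece(offset + start, read.substring(start, i)));
                start = -1;
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // A mate pair joined by an unknown gap becomes two independent pieces.
        System.out.println(split(100, "ACG-T~~~~~GGA"));
    }
}
```

Each resulting piece is gapless, so it can then be compared with other sequences exactly like an unpaired read.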


Example usages:

cat pz_p450_CP6B6_pop.tigr | java -jar Hapler.jar --snp-caller 454

java -jar Hapler.jar --input agambiae.1925488521.sam --alignment-type sam --human-readable false