Homework 4
Due: Wednesday,
Read the following paper on enhanced suffix arrays:
http://www.zbh.uni-hamburg.de/staff/kurtz/papers/AboKurOhl2002.pdf
(1) Build the enhanced suffix array for the following string:
S = agttacgtacgatga$
The enhanced suffix array must include at least the index, suftab, lcptab, and suffix fields. Assume the lexicographic order a < c < g < t < $.
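A hand-built table can be cross-checked with a small script. The Python sketch below builds suftab and lcptab naively by sorting all suffixes under the custom order a < c < g < t < $. A custom key is needed because Python's built-in string comparison would place '$' first. This naive sort costs O(n² log n), which is fine for a 16-character string but is not the linear-time construction the paper relies on. The names `ORDER` and `build_esa` are illustrative and do not come from the paper. Positions are 0-based, and lcptab[0] is set to 0 by convention.

```python
# Rank each character by the order the assignment prescribes: a < c < g < t < $.
# Python's built-in string comparison would put '$' first, so a custom key is needed.
ORDER = {ch: rank for rank, ch in enumerate("acgt$")}


def build_esa(s):
    """Naively build suftab and lcptab for s (0-based positions)."""
    n = len(s)
    # suftab: starting positions of all suffixes, sorted under ORDER.
    suftab = sorted(range(n), key=lambda i: [ORDER[ch] for ch in s[i:]])
    # lcptab[k]: length of the longest common prefix of the suffixes starting
    # at suftab[k - 1] and suftab[k]; lcptab[0] is 0 by convention.
    lcptab = [0] * n
    for k in range(1, n):
        a, b, length = suftab[k - 1], suftab[k], 0
        while a + length < n and b + length < n and s[a + length] == s[b + length]:
            length += 1
        lcptab[k] = length
    return suftab, lcptab


S = "agttacgtacgatga$"
suftab, lcptab = build_esa(S)
print(f"{'i':>2}  {'suftab':>6}  {'lcptab':>6}  suffix")
for i, (pos, lcp) in enumerate(zip(suftab, lcptab)):
    print(f"{i:>2}  {pos:>6}  {lcp:>6}  {S[pos:]}")
```

Each printed row gives the index i, suftab[i], lcptab[i], and the suffix S[suftab[i]..n-1], which are the four fields the question asks for.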
(2) Give two reasons why suffix arrays are superior to suffix trees for whole-genome analysis.
(3) What is a tandem repeat? What is a branching Tandem Repeat?
(4) Give an intuitive explanation (not necessarily the optimal approach) of how you
would check for a branching tandem repeat of length at least 2l using this enhanced suffix array.
Is there a branching tandem repeat of length 2l with l ≥ 4 in S? If yes, what is it?
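One intuitive way to use the tables is quadratic and is not the paper's optimized algorithm. A tandem repeat ww with |w| = l starts at position i exactly when the suffixes starting at i and i + l share a prefix of length at least l. That prefix length can be read off the enhanced suffix array as the minimum lcptab value strictly after the smaller rank, up to and including the larger rank. The sketch assumes the usual definition of branching: the tandem repeat (i, 2l) is branching when S[i+l] ≠ S[i+2l]. The helper names `rank`, `suffix_lcp`, and `branching_tandem_repeats` are hypothetical, and `build_esa` repeats the naive construction so the block runs on its own.

```python
ORDER = {ch: rank for rank, ch in enumerate("acgt$")}  # a < c < g < t < $


def build_esa(s):
    """Naive suftab/lcptab construction (sort all suffixes under ORDER)."""
    n = len(s)
    suftab = sorted(range(n), key=lambda i: [ORDER[ch] for ch in s[i:]])
    lcptab = [0] * n
    for k in range(1, n):
        a, b, length = suftab[k - 1], suftab[k], 0
        while a + length < n and b + length < n and s[a + length] == s[b + length]:
            length += 1
        lcptab[k] = length
    return suftab, lcptab


def branching_tandem_repeats(s, min_l):
    """Return (start, length, substring) for every branching tandem repeat
    ww in s with |w| >= min_l, using lcp queries on the enhanced suffix array."""
    suftab, lcptab = build_esa(s)
    n = len(s)
    # rank[p]: row of suffix p in suftab (the inverse suffix array).
    rank = [0] * n
    for r, pos in enumerate(suftab):
        rank[pos] = r

    def suffix_lcp(p, q):
        # LCP of two distinct suffixes = min of lcptab over (lower rank, higher rank].
        lo, hi = sorted((rank[p], rank[q]))
        return min(lcptab[lo + 1:hi + 1])

    found = []
    for l in range(min_l, n // 2 + 1):
        for i in range(n - 2 * l + 1):
            # ww starts at i with |w| = l iff suffixes i and i + l agree on l characters.
            if suffix_lcp(i, i + l) >= l:
                # Branching: the character after ww differs from the first character of the second w.
                if i + 2 * l < n and s[i + l] != s[i + 2 * l]:
                    found.append((i, 2 * l, s[i:i + 2 * l]))
    return found


print(branching_tandem_repeats("agttacgtacgatga$", 4))
```

Each tuple reports a 0-based start position, the repeat length 2l, and the repeated substring ww. Trying every (i, l) pair keeps the idea transparent. The paper instead walks the lcp-intervals of the enhanced suffix array to avoid this quadratic scan.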