Tandem repeats (TRs) are frequently not perfect, containing a number of mutations accumulated during evolution. One of the main problems is to distinguish between the sequences that contain highly imperfect TRs and the aperiodic sequences. The majority of proteins with TRs in sequences have repetitive arrangements in their 3D structures. Therefore, the 3D structures of proteins can be used as a benchmarking criterion for TR detection in sequences. Different TR detection tools use their own scoring procedures to determine the boundary between repetitive and non-repetitive protein sequences. Here we described these scoring functions and benchmark them by using known structural TRs. Our survey shows that none of the existing scoring procedures are able to achieve an appropriate separation between genuine structural TRs and non-TR regions. This suggests that if we want to obtain a collection of structurally and functionally meaningful TRs from a large scale analysis of proteomes, the TR scoring metrics need to be improved.

You do not currently have access to this content.