To study the sequence requirements for addition of O-linked N-acetylgalactosamine to proteins, amino acid distributions around 174 O-glycosylation sites were compared with distributions around non-glycosylated sites. In comparison with non-glycosylated serine and threonine residues, the most prominent feature in the vicinity of O-glycosylated sites is a significantly increased frequency of proline residues, especially at positions -1 and +3 relative to the glycosylated residue. The frequencies of alanine, serine and threonine are also significantly increased. The high serine and threonine content of O-glycosylated regions is due to the presence of clusters of several closely spaced glycosylated hydroxy amino acids in many O-glycosylated proteins. Such clusters can be predicted from the primary sequence in some cases, but isolated O-glycosylation sites do not appear to be predictable from primary sequence data.
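
The position-wise comparison described above can be illustrated with a minimal sketch. It only tabulates raw residue frequencies at fixed offsets from each serine or threonine; the abstract does not state which significance test was used, so none is implemented here. The function names, the toy sequence and the choice of glycosylated positions are all hypothetical and are not data from the study.

```python
from collections import Counter


def window_counts(sequence, sites, offsets):
    """Count residues at each offset relative to each site (0-based indices)."""
    counts = {off: Counter() for off in offsets}
    for site in sites:
        for off in offsets:
            pos = site + off
            # Skip offsets that fall outside the sequence.
            if 0 <= pos < len(sequence):
                counts[off][sequence[pos]] += 1
    return counts


def residue_frequency(counts, offset, residue):
    """Fraction of observed residues at `offset` that equal `residue`."""
    total = sum(counts[offset].values())
    return counts[offset][residue] / total if total else 0.0


# Hypothetical toy input: one short sequence and assumed glycosylated positions.
sequence = "APTSPAPSTPAGSLKTEV"
glycosylated = [2, 7]

# Every other Ser/Thr is treated as a non-glycosylated comparison site.
hydroxy_sites = [i for i, aa in enumerate(sequence) if aa in "ST"]
non_glycosylated = [i for i in hydroxy_sites if i not in glycosylated]

offsets = range(-4, 5)
glyco_counts = window_counts(sequence, glycosylated, offsets)
non_counts = window_counts(sequence, non_glycosylated, offsets)

for off in (-1, 3):
    print(
        off,
        residue_frequency(glyco_counts, off, "P"),
        residue_frequency(non_counts, off, "P"),
    )
```

With real data, the same tabulation would be run over all annotated sites in a set of proteins, and the two frequency profiles would then be compared with an appropriate statistical test before drawing any conclusion about positions such as -1 and +3.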

