Abstract
Heparan sulfate (HS) and chondroitin sulfate (CS) proteoglycans (PG) consist of a core protein to which the glycosaminoglycan (GAG) chains, HS or CS, are attached through a common linker tetrasaccharide. In the extracellular space, they are involved in the regulation of cell communication, assuring development and homeostasis. The HSPG biosynthetic pathway comprises 51 documented genes, and many diseases are associated with defects in some of them. The phenotypic consequences of this genetic variation in humans and of genetic ablation in mice, together with the expression patterns of these genes, led to a phenotype-centered model of the HSPG biosynthetic pathway. In this model, HS sequences produced by the ubiquitous NDST1, HS2ST and HS6ST enzymes are essential for normal development and homeostasis, whereas tissue-restricted HS sequences produced by the non-ubiquitous NDST2-4, HS6ST2-3, and HS3ST1-6 enzymes are involved in adaptive behaviors, cognition, tissue responsiveness to stimuli, and vulnerability to disease. The model indicates that the flux through the HSPG/CSPG pathways and their diverse branches is regulated by substrate preferences and protein-protein interactions. This results in a privileged biosynthesis of HSPGs over that of CSPGs, explaining the phenotypes of linkeropathies, diseases caused by defects in genes involved in the biosynthesis of the common tetrasaccharide linker. Documented feedback loops whereby cells regulate HS sulfation, and hence the interactions of HS with protein partners, may be similarly implemented, e.g., through protein tyrosine sulfation and other posttranslational modifications of enzymes of the HSPG pathway. Together, ubiquitous HS, specialized HS, and their biosynthesis model can facilitate research for a better understanding of HSPG roles in physiology and pathology.
Introduction
The extracellular proteome and glycome, which mediate cell communication, enabled the transition from unicellularity, where natural selection acts on the individual cell, to multicellularity, where it acts on the organism. Proteoglycans (PG), consisting of a core protein (CP) on which glycosaminoglycan (GAG) chains are synthesized, are key players in regulating virtually all aspects of development and homeostasis in multicellular organisms by virtue of their vast interactome (reviewed in [1]). These GAG chains, heparan sulfate (HS), chondroitin sulfate (CS), dermatan sulfate (DS), and keratan sulfate, are secondary gene products, synthesized in the Golgi by a battery of enzymes and remodeled after secretion. Of the PGs, the HS chains of HSPGs possess the most complex sulfated saccharide structures and are able to interact with over 800 protein partners, depending not only on their structures, but also on their cellular and tissue locations [1–3]. Other PGs, such as CSPGs, are structurally simpler and have fewer protein partners. Among PGs, HSPGs and CSPGs have a common biosynthetic starting point, the synthesis of a tetrasaccharide linker on the core protein (GAG-CP linker).
The biosynthesis of HSPGs in the endoplasmic reticulum and then in the Golgi involves the products of 51 genes (Table S1), each of which is associated with a particular structural feature of the PG product. There is a large body of information on the encoded proteins, and considerable progress has been made in understanding the synthesis of HS and CS chains and their interactions with proteins (reviewed in [1]). This has demonstrated that HS is the most structurally diverse macromolecule in biology, a consequence of which is its large interactome, and has led to new ideas relating to the control of cell communication (reviewed in [1]). There is an equally large body of information on the phenotypic consequences of mutations in the genes encoding these proteins in humans and of their genetic knockout in transgenic mice (Tables 1-4 and SI2). Intriguingly, mutations in only some of these 51 genes cause overt phenotypes, which distinguishes essential genes from non-essential or redundant ones. However, it is noteworthy that under stress, transgenic mice lacking a non-essential gene involved in HSPG biosynthesis can exhibit some phenotypes.
Gene | Associated human disease (traits of clinical variants) | Phenotype of the null mouse | Most affected organs or tissues | Reference |
---|---|---|---|---|
GROUP I Essential widely expressed | ||||
HSPG2 | From mild to lethal diseases: Lethal Kniest-like syndrome, Schwartz Jampel syndrome type 1, Stuve-Wiedemann syndrome, dyssegmental dysplasia of Silverman-Handmaker type (various degrees of bone dysplasia, cardiovascular and pulmonary defects, myotonia, hyperthermia, etc.). | Embryonic lethality (E10-12) with severe chondrodysplasia, defective skeletal and cephalic development. | Cartilage, bone, bone marrow, cardiac muscle, brain, vasculature, etc. | [25,26,28] |
AGRN | Congenital myasthenic syndrome with distal muscle weakness and atrophy with or without synaptic dysfunction. | Embryonic lethal or die at birth due to respiratory failure, altered growth plate, central nervous system defects, etc. | Neuromuscular junctions, muscle, brain, growth plate. | [29,30,32] |
COL18A1 | Knobloch syndrome (severe eye and encephalocele defects). | Viable but with defects in the vasculature, ocular tissue, kidney proximal tubules, heart valves, epidermis, choroid plexus. | Eyes, kidney, heart, vasculature, brain, and epidermis. | [33–35,160] |
GPC3 | Simpson-Golabi-Behmel syndrome (pre/post-natal overgrowth, skeletal, facial, visceral, neurological anomalies). Wilms tumor (nephroblastoma). | Die perinatally with kidney and lung dysplasia, cystic kidneys, etc. Recapitulates Simpson-Golabi-Behmel syndrome. | Bone, brain, kidney, lung, muscle, etc. | [36,37] |
GPC4 | Keipert syndrome (craniofacial and digital abnormalities, cognitive impairment, deafness, etc.). Associated with autism spectrum disorders. | Craniofacial and digital abnormalities, behavioral alterations. | Brain, cranial bone, limb, hearing. | [39,40] |
GPC6 | Omodysplasia (facial dysmorphism, short stature, shortened limbs…). SNP association with osteoporosis and multiple sclerosis. | Die at birth with severe facial dysmorphism and short bones. | Skeleton, facial bones. | [41–43] |
GROUP III Non-essential widely expressed | ||||
GPC1 | No pathogenic variants reported. SNP association with risk of schizophrenia and biliary atresia. | Viable, fertile, healthy. However, reduced brain size. | Brain. | [91,92,161] |
SDC1 | No pathogenic variants reported. SNP association with risk of coronary plaque in patients with coronary artery disease and with risk of biliary atresia. | Viable, fertile, and healthy. However, altered response to stimuli during wound healing, inflammation, fibrosis, re-vascularization, infection… | Several tissues: altered response to instigating agents and angiogenesis. | [86,87,162] |
SDC2 | No pathogenic variants reported. SNP association with risk of posttraumatic stress disorder. | Mouse has not been generated. However, altered adaptive responses during tissue recovery after injury (fibrosis, inflammation). | Several tissues: altered response to instigating agents and angiogenesis. | [80,162–164] |
SDC3 | No pathogenic variants reported. SNP association with resistance to obesity and risk of metabolic syndrome and female hyperandrogenism. | Viable, fertile, and healthy. However, altered resistance to obesity and feeding behavior. | Brain, immune and adrenal systems. | [81,83,84,90,162,165] |
SDC4 | No pathogenic variants reported. SNP association with misregulation of whole-body energy metabolism and with longevity and lipid profile in healthy elderly. | Viable, fertile, and healthy. However, altered reactivity to stimuli related to wound healing. | Several tissues: altered response to instigating agents and angiogenesis. | [82,88,166] |
GROUP IV Non-essential restricted expression | ||||
GPC2 | No pathogenic variants reported. SNPs increase polygenic risk for Alzheimer’s disease. | Mouse has not been generated. However, transcript and protein levels increase when neurogenesis is stimulated in the adult mouse brain and decrease when neurogenesis is ablated. | Brain. | [102,167] |
GPC5 | No pathogenic variants reported. SNP association with risk for multiple sclerosis, autoimmune thyroid disease, and acquired nephrotic syndrome. | Mouse has not been generated. | Kidney and testis. | [104–106] |
Genes are organized based on their expression in human tissues and traits of their clinical variants and/or phenotype of null mice.
Gene | Traits of human clinical variants | Phenotype of the null mouse | Most affected organs or tissues | Reference |
---|---|---|---|---|
GROUP I Essential widely expressed | ||||
XYLT1 | Desbuquois dysplasia (DBQD) type 2 (also known as Baratela–Scott syndrome), characterized by short stature, joint laxity, advanced carpal ossification, mental retardation. | Not available. However, a mouse carrying a Xylt1 mutation shows premature chondrocyte maturation, early ossification, and dwarfism. | Connective and skeletal tissue, brain. | [44,45] |
XYLT2 | Spondyloocular syndrome (post-developmental osteoporosis, cataract, hearing and learning impairment). | Develops normally but shows post-developmental defects in liver, biliary epithelial and renal cysts, low body weight, adipose tissue loss and lipodystrophy. | Skeletal, adipose and connective tissues, liver, eye, bone, brain, kidney. | [46,140,168] |
B4GALT7 | Spondylodysplastic Ehlers-Danlos Syndrome (spEDS)-progeroid type (connective and skeletal tissues with radioulnar synostosis) and Reunion Island Larsen-like syndrome (large spectrum). | Not reported. | Bone, cartilage, skin, brain, eye. | [47,48,169] |
FAM20B | Short limb dysplasia resembling XYLT1-related DBQD with mid-face and thoracic hypoplasia (leading to respiratory failure), very short stature, mesomelic limb shortening, and multiple joint dislocations. | Die as embryos and show severely stunted growth, multisystem organ hypoplasia, and delayed development of the skeletal system, eyes, lung, gastrointestinal tract, and liver. | Multiorgan hypoplasia, delayed growth and development. | [52,170,171] |
B3GALT6 | spEDS progeroid type 2 (affecting connective and skeletal systems with severe skeletal involvement), spondyloepimetaphyseal dysplasia with joint laxity, learning defects. | Not reported. | Bone, cartilage, skin, tendons, ligaments, brain. | [49,50] |
B3GAT3 | Larsen-like syndrome with variable phenotypes similar to Antley-Bixler, Shprintzen-Goldberg, and Geroderma osteodysplastica syndromes. | Embryonic lethality before the 8-cell stage because of failed cytokinesis. | Bone, cartilage, skin, tendons, ligaments, heart, brain. | [51] |
CANT1 | Various skeletal phenotypes related to DBQD and Larsen syndrome with growth retardation, short extremities, progressive scoliosis, joint laxity, severe prenatal and postnatal growth retardation, etc. | Continuum of skeletal dysplasia phenotypes including Desbuquois dysplasia and multiple epiphyseal dysplasia. | Skeletal and connective tissues, brain. | [9,146] |
SLC10A7 | Shortened long bones, growth plate and tooth enamel anomalies. | Short stature, amelogenesis imperfecta, and skeletal dysplasia with scoliosis. | Skeletal and connective tissues, brain. | [11] |
SLC26A2 | Inherited chondrodysplasias including, in order of decreasing severity, achondrogenesis 1B, atelosteogenesis 2, diastrophic dysplasia (DTD) and recessive multiple epiphyseal dysplasia. | Knock-in mouse with a partial loss of function: growth retardation, skeletal dysplasia and joint contractures recapitulating human phenotype. | Skeletal and connective tissue. | [147,172] |
SLC35A2 | Early-onset epileptic encephalopathies (EOEE) with symptoms such as epilepsy and autism. | Perinatal lethal with chondrodysplasia recapitulating human vertebral anomalies. | Skeletal, brain. | [148] |
SLC35A3 | Severe epileptic encephalopathy with skeletal abnormalities (arthrogryposis, dorso-lumbar convex scoliosis, microcephaly) and severe intellectual disability. | Lethal chondrodysplasia with vertebral anomalies. | Skeletal, brain. | [173] |
Genes are organized based on their expression in human tissues and on the traits of their clinical variants and/or phenotype of the corresponding null mice.
Gene | Traits of human clinical variants | Phenotype of the null mouse | Most affected organs or tissues | Reference |
---|---|---|---|---|
GROUP I Essential widely expressed | ||||
EXTL3 | Severe autosomal recessive skeletal dysplasia, including epispondylo-metaphyseal dysplasia, with developmental delay, immunodeficiency, and neuromotor and brain development delay. | Embryonically lethal at around 9 days post-fertilization, with no detectable HS and overproduction of CS in chondrogenic tissues. | All tissues. | [55,174] |
EXT1 | Hereditary multiple exostoses (multiple cartilaginous tumors), scoliosis, seizures, macrocephaly, defects in pancreas, lung, heart, cornea, etc. SNP. | Ext1−/−Ext2−/− mice die as embryos due to gastrulation failure. Ext1−/− form several exostoses. | Cartilage and bone. | [56,175] |
EXT2 | Hereditary multiple exostoses (less severe than that caused by EXT1). | Ext1−/−Ext2−/− mice die as embryos due to gastrulation failure. Ext1−/− form several exostoses. | Cartilage and bone. | [56,57,175] |
NDST1 | Compound heterozygous mutations cause developmental delays, muscular hypotonia, ataxia, history of seizures, intellectual disability, epilepsy, gastroesophageal reflux, minor malformation, etc. Partially compensated by NDST2 except in brain (expressing NDST3 and 4), and skeletal muscle. | Ndst1−/− mice die perinatally of respiratory failure and show skeletal, brain, and heart defects. Heterozygous mice show severe developmental defects of the forebrain and forebrain-derived structures. | Brain, lung, skeletal and heart muscles, stomach, etc. | [59,62] |
GLCE | No pathogenic variants reported, possibly because they are lethal. SNP associations with hypertension and cerebrovascular events. | Neonatally lethal with severe developmental abnormalities of the kidney, lung, skeleton, spleen, thymus, lymph node, etc. | Several organs: kidney, lung, skeleton, spleen, thymus, lymph node, etc. | [13,58,176] |
HS2ST1 | No pathogenic variants reported, possibly because they are lethal. SNPs have been associated with low metabolic rate and longevity in pigs and birds. | Neonatally lethal with onset of abnormalities after mid-gestation leading to traits including complete failure of kidney development. | Several tissues, particularly kidney. | [61,177] |
HS6ST1 | No pathogenic variants reported, possibly because they are lethal. SNP association with delayed puberty and idiopathic hypogonadotrophic hypogonadism. | Lethal at late embryonic stages, with abnormalities in lung morphology, angiogenesis, and retinal axon guidance. If viable, mice showed growth retardation and lung defects. | Several organs: lung, kidney, cartilage, vascular system, reproduction organs, etc. | [60] |
SULF2 | No pathogenic variants reported. SNP associations with hypertension and cerebrovascular events. | Significant lethality with reduction in brain mass during neuronal development with some kidney, lung, and skeletal defects. SNPs are reported as a risk factor for altered regulation of lipoprotein metabolism. | Various, including brain, kidney, lung, skeletal tissue, etc. | [176,178] |
GROUP III Non-essential widely expressed | ||||
EXTL2 | No pathogenic variants reported. | Normal embryonic development. Healthy during adult life but show altered recovery after tissue injury. | Alters tissue homeostasis when lost. | [93,94] |
SULF1 | No pathogenic variants reported. SNP association with multiple sclerosis, fetus failure in IVF technique, and preeclampsia. | No overt phenotype; viable with no phenotypic or histological defects. | Alters tissue homeostasis when lost. | [42,87,96,97] |
HPSE | No pathogenic variants reported. SNP association with chronic graft-versus-host disease. | Viable, anatomically normal, and fertile. | Alters tissue homeostasis when lost. | [98,100,179] |
Genes are organized based on their expression in human tissues and on the traits of their clinical variants and/or phenotype of the corresponding null mice.
Gene | Human clinical variant | Null mouse phenotype | Most affected organs or tissues | Reference |
---|---|---|---|---|
GROUP II Essential restricted expression | ||||
HPSE2 | Urofacial syndrome (UFS, Ochoa disease). | Viable, anatomically normal, and fertile. | Urinary bladder. | [78,79,180] |
GROUP IV Non-essential restricted expression | ||||
EXTL1 | No pathogenic variants reported. SNP association with risk of muscle lipid composition. | Not reported. However, overexpression alters B-cell maturation. | In immune system. | [181] |
NDST2 | No pathogenic variants reported. SNP association with risk of coronary artery disease and chronic kidney disease. | Viable, healthy, and fertile. Mast cells show decreased HS sulfation. | Mast cells. | [132,182] |
NDST3 | No pathogenic variants reported. SNP association with risk of mental disorders such as schizophrenia and bipolar disorder. | Develop normally, are fertile, and show subtle hematological and behavioral abnormalities, with reduced anxiety. | Brain. | [112–114] |
NDST4 | No pathogenic variants reported. SNP association with risk of reading disability and language impairment. | No pathological outcomes, but some anomalies observed in histology of proximal colon (gene is not expressed). | In cartilage, brain, immune system. | [108,109] |
HS6ST2 | No pathogenic variants reported. SNP association with risk of X-linked intellectual disability. | Viable, healthy, and fertile. | Brain and cartilage. | [116] |
HS6ST3 | No pathogenic variants reported. SNP association with lower risk of obesity and diabetic retinopathy. | Not reported. | Brain, eye. | [118,183] |
HS3ST1 | No pathogenic variants reported. SNP association with risk of arteriosclerosis, coronary artery disease, and Alzheimer’s disease. | Viable, healthy, and fertile but show a proinflammatory phenotype when subjected to pathologic stimuli. | Vascular system and brain. | [184,185] |
HS3ST2 | No pathogenic variants reported. Expression is increased in Alzheimer’s disease. SNP association with risk of schizophrenia. | Viable, healthy, and fertile. Responds to adrenergic response to light stimuli. | Brain. | [15,149,186] |
HS3ST3A1 | No pathogenic variants reported. SNP association with risk of HSV1 and P. falciparum infection. | Viable, healthy, and fertile. Minor alterations in adult salivary gland function. | Vascular system, kidney. | [187,188] |
HS3ST3B1 | No pathogenic variants reported. SNP association with risk of P. falciparum infection. | Viable, healthy, and fertile. Minor alterations in adult salivary gland function. | Brain and immune system; peripheral T lymphocyte and Jurkat T cell response. | [152,187–189] |
HS3ST4 | No pathogenic variants reported. SNP association with risk of altered verbal declarative memory, age-related macular degeneration, schizophrenia, and Alzheimer’s disease. | Not reported. | Brain. | [125,128,183,190] |
HS3ST5 | No pathogenic variants reported. SNP association with risk of intellectual disability and gray matter volume in schizophrenia. | Not reported. | Brain. | [128,191] |
HS3ST6 | No pathogenic variants reported. SNP association with risk of altered response to stimuli that triggers angioedema. | Viable, healthy, and fertile. | Skin, vascular system. | [129,130,187] |
Genes are organized based on their expression in human tissues and on the traits of their clinical variants and/or phenotype of the corresponding null mice.
Little is known about how the HSPG biosynthetic machinery is organized to produce specific HS structures that sustain different essential or non-essential biological roles, that is, how the biochemistry of HSPG biosynthesis may explain their functions in physiology and actions in pathology. Moreover, the biochemical basis of why genetic variability in only a subset of these genes leads to deleterious or lethal diseases, whereas variability in the remaining genes is not deleterious and only marks vulnerability to disease, is unknown. This review highlights the data that address these questions and that, in turn, lead to a model of HSPG biosynthesis. The model indicates that the HSPGs synthesized by the essential genes are required in development and to maintain physiological homeostasis, whereas the non-essential genes enable the synthesis of HSPGs with tissue-specific functions, which appear to be related to higher order functions, e.g., cognition, behavior, and response to external stimuli.
Overview of HS biosynthesis
While each HSPG core protein is encoded by a single gene, i.e., HSPG2, AGRN, COL18A1, SRGN, GPCs, or SDCs (all full-time HSPGs) [4], the synthesis of the GAG-CP linker and its associated HS chain requires the expression of multiple genes (Figure 1). The GAG-CP linker biosynthesis requires the glycosyltransferases XYLT1, XYLT2, B4GALT7, B3GALT6, B3GAT3 [5,6], the kinase FAM20B [7], and the phosphatase PXYLP1 [8]. In contrast, the nucleotidase CANT1 [9], the nucleotide transporters SLC35A2 and SLC35A3 [10], and the ion transporter SLC10A7 [11] are required from the first step of the synthesis of both the GAG-CP linker and the GAG (HS and CS) chains. After formation of the GAG-CP linker, HS chain synthesis commences [12]. Briefly, chain polymerization starts by the addition of N-acetyl glucosamine (GlcNAc) to the GAG-CP linker through the action of EXTL3. Then, chain elongation is assured by the sequential addition of glucuronic acid (GlcA) and GlcNAc, catalyzed by EXT1 and EXT2. Extensive modifications follow: N-sulfation of GlcNAc residues by the N-deacetylase/N-sulfotransferases NDST1, NDST2, NDST3, and/or NDST4, epimerization of some GlcA units to iduronic acid (IdoA) by C5 epimerase (GLCE) [13], and O-sulfation of the glycan chain by the 2-O-sulfotransferase HS2ST1, the 6-O-sulfotransferases HS6ST1, HS6ST2, and HS6ST3, and the 3-O-sulfotransferases HS3ST1, HS3ST2, HS3ST3A1, HS3ST3B1, HS3ST4, HS3ST5, and HS3ST6 [14,15]. In the Golgi, synthesis of HS chains can be terminated by the HS-unproductive glycosyltransferases EXTL2 and EXTL1 [16], whereas outside the cell the two 6-O-sulfatases SULF1 and SULF2 and heparanase (HPSE) can, respectively, remove 6-O-sulfates and cleave the HS chain [17–19]. A summary of each of these genes is available in SI2.
Schematic representation of HSPG biosynthesis and the genes coding for the biosynthetic machinery
Distinct structural domains are found in the HS chain [20], and these are considered to be the product of the ‘Major Pathway’ of HS chain biosynthesis [1,21]: ‘NA’ domains (tracts of GlcA-GlcNAc disaccharides occasionally sulfated by the minor pathway); transition or ‘NA/NS’ domains, in which one disaccharide in two or three carries N-sulfated GlcNS, with some GlcA epimerized to IdoA and some O-sulfation; and ‘NS’ domains, which contain contiguous GlcNS disaccharides and have the highest levels of IdoA and O-sulfation, though both remain well below the maximum possible. This domain model (discussed in [1]) embraces a vast diversity in the length, number, and level of sulfation of these domains. It is worth noting that, while exceptionally diverse at the structural level, HS are usually referred to as a single molecular species (mostly called ‘heparan sulfate’ rather than ‘heparan sulfates’), although they are a family of molecules. It is this structural diversity that enables the HS chains of PGs to bind and so regulate the activities of over 800 extracellular proteins [1]. The GAGosome concept, whereby the biosynthetic enzymes are clustered into a higher order structure, was proposed [22] to account, among other things, for the speed of biosynthesis of such HS chains.
Essential and non-essential HSPG biosynthetic genes
To delve into the structural diversity of HS chains from a physiological/phenotypic perspective, the consequences for health status of genetic variability in the 51 human genes required for HSPG biosynthesis were reviewed using the ClinVar, Orphanet, and dbSNP databases (Figure S1). Retrieved information was confirmed with the Database of Genomic Variants Archive (DGVa), web platforms such as Online Mendelian Inheritance in Man (OMIM), and by the review of clinical case reports and of available literature relating to the phenotypes of the corresponding gene null mouse (Figure S1). The result was that the 51 genes formed two clusters (Figure 2). The first cluster, here referred to as ‘Essential’, included all genes in which variability led to a pathologic condition and/or for which the corresponding null mouse had a lethal or overt phenotype (Tables 1-3 and SI2). The second cluster, here referred to as ‘Non-essential’ (Figure 2), included all genes for which genetic variability was not causative of human diseases or syndromes and/or for which the corresponding null mouse was viable, fertile, lived normally, and showed no overt phenotype (Tables 1, 3 and 4). Interestingly, single nucleotide polymorphisms (SNPs) in several of the non-essential genes were associated with vulnerability to develop altered behaviors, infection, or complex diseases that commonly develop with ageing (Tables 1, 3 and 4; SI2). Similarly, although no deleterious phenotypes have been observed in the corresponding non-essential gene null mice, these animals showed altered responses to external stimuli leading to altered behaviors or altered vulnerability to pathological incursions (Tables 1, 3 and 4; SI2).
Essential and non-essential HSPG biosynthetic genes
Ubiquitous and tissue restricted HSPG biosynthetic genes
To establish a relation between clinical variants and the expression levels of the essential and non-essential genes in human tissues and organs, the 51 genes were re-clustered depending on their expression, as documented by RNAseq databases of healthy human organs and tissues (Figures S1 and 3). As the expression of most of these genes is altered in most if not all cancer types [19,23,24], changes in expression of the 51 genes in tumor development and growth were not considered. Genes with transcripts detectable in all analyzed tissues/organs were considered as ‘widely expressed’ (ubiquitous), whereas genes whose expression was only detected in some tissues/organs were considered to have ‘restricted expression’ or be ‘specialized’ (Figure 3). The clustering based on clinical traits associated with genetic variants, followed by re-clustering based on sites of gene expression, resulted in the 51 genes falling into four groups (Groups I-IV) (Figure 4).
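The two-step clustering described above (essentiality first, then breadth of expression) amounts to a two-by-two decision rule. The following is a minimal sketch of that rule, not the procedure actually used to build Figure 4: the `GeneRecord` fields, the `assign_group` function, and the tissue counts in the example records are hypothetical and chosen only for illustration, while the group labels assigned to the four example genes are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical per-gene summary; field names are illustrative assumptions."""
    symbol: str
    pathogenic_variant: bool  # a human variant is causative of a disease or syndrome
    null_mouse_overt: bool    # null mouse is lethal or shows an overt phenotype
    tissues_detected: int     # tissues/organs with detectable transcripts
    tissues_analyzed: int     # tissues/organs surveyed in the RNAseq data


def assign_group(gene: GeneRecord) -> str:
    """Return 'I', 'II', 'III' or 'IV' from essentiality and expression breadth."""
    essential = gene.pathogenic_variant or gene.null_mouse_overt
    ubiquitous = gene.tissues_detected == gene.tissues_analyzed
    if essential:
        return "I" if ubiquitous else "II"
    return "III" if ubiquitous else "IV"


# Tissue counts below are made-up placeholders, not values from the RNAseq data.
examples = [
    GeneRecord("NDST1", True, True, 30, 30),
    GeneRecord("HPSE2", True, True, 3, 30),
    GeneRecord("SULF1", False, False, 30, 30),
    GeneRecord("HS3ST2", False, False, 2, 30),
]
groups = {g.symbol: assign_group(g) for g in examples}
```

Treating "ubiquitous" as detection in every surveyed tissue mirrors the definition given in the text; a real analysis would also need an explicit detection threshold, which the text does not specify.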
The human HSPG biosynthetic machinery organized by gene essentiality and expression levels
Simplified organization of the HSPG biosynthetic genes
Group I. Essential HSPG biosynthetic genes that are ubiquitously expressed
The Group I cluster consisted of genes encoding proteins that are widely expressed and are established as essential for normal development and homeostasis (Figures 3 and 4, Tables 1-3 and SI2). This group includes genes encoding the pericellular core proteins perlecan (HSPG2) [25–28], agrin (AGRN) [29–32], and collagen 18 (COL18A1) [33–35] (Table 1 and SI2), the membrane-associated core proteins glypicans 3, 4, and 6 (GPC3 [36–38], GPC4 [39,40], GPC6 [41–43]) (Table 1 and SI2), and all the genes encoding proteins involved in the synthesis of the GAG-CP linker, including glycosyl UDP-Xyl transferases (XYLT1 [44,45] and XYLT2 [46]), UDP-Gal transferases (B4GALT7 [47,48] and B3GALT6 [49,50]), the UDP-GlcA transferase (B3GAT3 [51]), the kinase FAM20B (FAM20B) [52], and the phosphatase PXYLP1 (PXYLP1) [8], as well as other Golgi enzymes or ion and nucleotide transporters (CANT1 [9], SLC10A7 [11], SLC26A2, SLC35A2 and SLC35A3 [10,53]) (Table 2 and SI2). Importantly, Group I also includes genes encoding the HS glycosyltransferases involved in HS chain initiation and elongation (EXTL3 [54,55], EXT1 [56], EXT2 [57]), the epimerase (GLCE) [58], and three sulfotransferases (NDST1, HS2ST1, and HS6ST1) [59–62] (Table 3 and SI2), thus enabling the synthesis of the more common N-, 2-O- and 6-O-sulfated sequences in HS. We term this class of HS chain ubiquitous HS (uHS), as their biosynthetic enzymes are expressed in all tissues.
Since Group I genes are essential, uHS can be expected to be required for normal development and homeostasis. uHS fine structures will be diverse depending on the relative level of expression of these genes, on their subsequent translation and posttranslational regulation, and on the activity of the encoded enzyme in each cell or tissue [63–66]. Analysis of the in vitro activity of NDST1 shows that this enzyme synthesizes tracts of continuous N-sulfated glucosamine that meet the definition of NS domains [60,67–71], although moderately sulfated compared with the most highly sulfated motifs made in conjunction with NDST2 [69,70,72,73]. This is consistent with the analysis of disaccharides of HS from mouse skeletal muscle, a tissue which expresses very low levels of HS6ST2 and does not express detectable levels of NDST2-4 [74] (Figure 3). The disaccharides in the latter HS also indicate that unsulfated NA domains of uHS contain significant IdoA [67,75], consistent with the activity of the minor pathway [1,21]. Moreover, the in vitro analysis of the substrate specificities of the HS6STs indicates that the essential HS6ST1, in contrast to HS6ST2 and HS6ST3, has a preference for non-sulfated IdoA residues adjacent to the target GlcNS residue [69,70,73]. Thus, together with the absence of HS3STs from Group I, these data indicate that the NS domains of uHS will have a lower sulfate content than the HS synthesized in the presence of enzymes in Group III, which are described below. There remain important open questions: (i) do differences in the activities of the NDSTs and of the HS6STs affect the length and prevalence of NS and NA/NS domains? (ii) do the essential core proteins in Group I carry HS chains made of uHS sequences in all cells and tissues, or do they also carry specialized HS (sHS) sequences in the same chains? (iii) does the presence of SULF2 in Group I have consequences for the formation or processing of uHS sequences?
Group II. Essential HSPG biosynthetic genes that are restrictedly expressed
Only one gene, HPSE2, was found to cluster in Group II. This encodes HPSE2 (also called heparanase 2), predominantly expressed in urinary bladder, prostate and endometrium (Figure 3). In contrast to HPSE, which clustered in Group III (see below), HPSE2 does not exhibit enzymatic activity, but binds to HS with higher affinity [76]. HPSE2 may then compete with HPSE for HS, protecting HS from HPSE and thus regulating HS processing in the extracellular space. This is supported by the proposed capacity of HPSE2 to inhibit HPSE enzymatic activity [76,77], underlining its role in HS remodeling in tissues where it is expressed. Accordingly, HPSE2 loss-of-function mutations are responsible for the urofacial syndrome (UFS, Ochoa disease) (Table 4 and SI2) [78], suggesting an essential role in the fine functioning of the urinary bladder, in which it shows its highest expression (Figure 3). As expected, Hpse2−/− mice exhibit similar phenotypes and die within one month after birth (Table 4 and SI2) [79]. Further studies are needed to understand the physiological mechanisms of action of HPSE2, particularly whether it involves interactions with HPSE and/or HSPG.
Group III. Non-essential HSPG biosynthetic genes that are ubiquitously expressed
Genes in Group III were characterized by being widely expressed, but non-essential for normal development and homeostasis (Figures 2 and 3, Tables 1 and 3 and SI2). However, as for those in Group IV (see below), these genes are required for appropriate tissue responsiveness to stimuli, and SNPs in them can increase or decrease vulnerability to peripheral diseases, infection, neurological disorders, etc. (Tables 1 and 3 and SI2). All the genes encoding syndecan (SDC1-4) core proteins [80–90] and one glypican, GPC1 [91,92], clustered in Group III (Figures 3 and 4, Table 1 and SI2). In relation to HS chain biosynthesis, Group III contained only genes whose products are involved in HS chain remodeling: EXTL2 [93,94], SULF1 [87,95–97], and HPSE [98–100]. The encoded proteins control chain length by arresting HS chain initiation in the Golgi (EXTL2), by extracellularly processing 6-O-sulfation (SULF1), and by extracellularly cleaving the HS chain (HPSE) (Figures 2-4, Table 3 and SI2). Open questions on these genes include whether the Group III core proteins carry uHS, sHS or both.
Group IV. Non-essential HSPG biosynthetic genes that are restrictedly expressed
The Group IV cluster included non-essential genes expressed in a tissue-restricted manner. Genes in this group include those coding for two HSPG core proteins, GPC2 [101–103] and GPC5 [104–106], as well as NDST2-4 [107–114], HS6ST2 [115,116] and HS6ST3 [117,118], and all seven HS3STs [119–127] (Figures 3 and 4, Tables 1 and 4; SI2). The Group IV genes are not essential for normal development and homeostasis and are not associated with disease. Accordingly, when available, Group IV gene null mice develop normally, are fertile, and show non-deleterious phenotypes (Tables 1 and 4; SI2). It is interesting to note that all the HS sulfotransferase-encoding genes that do not cluster in Group I are in Group IV (Figure 4). It is also noteworthy that both 6-O-sulfotransferases in this group (HS6ST2 and HS6ST3) can have as substrates HS sequences with higher 2-O-sulfation and GlcA adjacent to the GlcNS [69,70], whereas the HS3STs in this group add the rarer 3-O-sulfate modification, which usually results in two sulfates being on adjacent (C2-N and C3) carbons surrounded by other sulfations in more distant positions that might contribute to specific 3-O-sulfated patterns in sHS sequences [119–130]. This indicates that the Group IV HS6ST2, HS6ST3, and HS3STs can generate in sHS more highly sulfated sequences than those found in uHS [15,67,69–71,73,75,111,131–135]. These sequences include the trisulfated disaccharide common in heparin. Heparin, used clinically as an anticoagulant, is an sHS produced by mast cells, which express NDST2, and is also characterized by carrying long NS domains. Thus, NDST2 expression is associated with more extensive NS domains than those produced by the Group I NDST1 [132]. The longer NS domains provide more substrate for the Group IV O-sulfotransferases [132].
There is clearly a need for detailed characterization of the catalytic activities of NDST2-4 alone, and in combinations with each other and NDST1, to understand how they affect the lengths of NS domains, as well as the NS domains number and position along a HS chain. Moreover, there is the possibility that NDST2-4 may associate differently with other proteins involved in HS biosynthesis, so enabling the synthesis of different specific sHS structures in individual tissues and cells.
Interestingly, SNPs in Group IV genes have been associated with vulnerability or resistance of the specific tissue or organ in which the gene is expressed to develop disease or altered behaviors (brain regions), and with the capacity of the expressing cells and tissues to respond to stimuli (Tables 1 and 4; SI2). These processes include inflammation, immunity, infection, and neurological stimuli leading to altered adaptive behaviors. Examples of the latter are alertness and satiety during feeding, and response to stress (Tables 1 and 4; SI2). Accordingly, genetic variation in humans and analysis of null mice together indicate that particular classes of sHS are required for adaptive behaviors in brain. In this way the Group IV genes provide additional fine structural specificities to HS, such that sHS may have more subtle interactions with protein partners necessary for such specific or fine responses.
Implications of the new model of HS biosynthesis: pathway flux and protein-protein interactions explain phenotypes
Genetic defects alter substrate flux through the branches of the GAG-CP linker biosynthesis pathway
The GAG-CP linker tetrasaccharide is synthesized on the core protein by a biosynthetic pathway common to HSPGs and CSPGs [14]. A family of diseases called linkeropathies is caused by genetic variability in the genes associated with the synthesis of this linker. Because of this shared starting point, it would be expected that defects in any of the GAG-CP linker biosynthetic genes would affect production of both PG types in all tissues. However, linkeropathies are mainly characterized by phenotypes indicative of altered CSPG biosynthesis, such as skeletal and soft tissue-associated phenotypes [136]. The analysis of data on clinical variants, transcript expression, enzyme selectivities, and protein clusters (Table 2, Figure 3, SI1) provides an explanation for these counterintuitive observations (Figure 5): the linker biosynthetic pathway is regulated such that if substrate flux is altered, as in the case of a genetic defect, the substrate pool will predominantly serve the HS pathway at the expense of the CS pathway (as shown by arrow thickness in Figure 5A,B); the synthesis of CSPG is thus sacrificed to maintain HSPG synthesis. Pathway flux is regulated by substrate-enzyme and protein-protein affinities, which define how the substrate pool is channeled to the different branches (Figure 5A,B). This regulates not only the relative synthesis of HSPG vs CSPG, but also the channeling of substrate between the CSPG-aggrecan/neurocan (skeletal phenotypes) and the CSPG-decorin branches (skin phenotypes), and between the CSPG-aggrecan and CSPG-neurocan branches, with the overarching consideration that HSPG synthesis is always privileged.
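The idea that affinities alone can privilege one branch when substrate becomes scarce can be illustrated with a deliberately simplified worked example. It assumes, purely for illustration, that the HS and CS branch-point enzymes each follow Michaelis-Menten kinetics on a shared linker substrate pool of concentration $[S]$, and that a lower $K_M$ stands in for the higher affinity of the HS branch; the symbols $V_{\max,\mathrm{HS}}$, $V_{\max,\mathrm{CS}}$, $K_{M,\mathrm{HS}}$ and $K_{M,\mathrm{CS}}$ are assumptions of this sketch, and no kinetic parameters are taken from the data reviewed here.

```latex
% Branch fluxes on a shared substrate pool (illustrative assumption)
v_i = \frac{V_{\max,i}\,[S]}{K_{M,i} + [S]}, \qquad i \in \{\mathrm{HS}, \mathrm{CS}\}

% Ratio of HS to CS flux
\frac{v_{\mathrm{HS}}}{v_{\mathrm{CS}}}
  = \frac{V_{\max,\mathrm{HS}}}{V_{\max,\mathrm{CS}}}
    \cdot \frac{K_{M,\mathrm{CS}} + [S]}{K_{M,\mathrm{HS}} + [S]}

% Dependence of the ratio on substrate availability
\frac{d}{d[S]}\!\left(\frac{K_{M,\mathrm{CS}} + [S]}{K_{M,\mathrm{HS}} + [S]}\right)
  = \frac{K_{M,\mathrm{HS}} - K_{M,\mathrm{CS}}}{\left(K_{M,\mathrm{HS}} + [S]\right)^{2}} < 0
  \quad \text{when } K_{M,\mathrm{HS}} < K_{M,\mathrm{CS}}
```

Under these assumptions the HS/CS flux ratio rises as $[S]$ falls, from $V_{\max,\mathrm{HS}}/V_{\max,\mathrm{CS}}$ at saturating substrate toward $(V_{\max,\mathrm{HS}}/V_{\max,\mathrm{CS}})(K_{M,\mathrm{CS}}/K_{M,\mathrm{HS}})$ when substrate is scarce, so a reduced linker supply penalizes the CS branch proportionally more than the HS branch. The protein-protein interactions discussed in the text are not captured by this two-enzyme picture.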
Schematic representation of HSPG biosynthesis
Effect of XYLT1 defects on pathway channeling
Biosynthesis of the GAG-CP linker is initiated by the addition of a xylose (Xyl) from UDP-Xyl to specific serine residues in core proteins (Ser-CP) [14,137]. This is catalyzed by XYLT1 (XYLT1) during development and by both XYLT1 and XYLT2 (XYLT2) after birth [45] (Figures 1 and 5A). During development, XYLT1 is highly expressed in chondrocytes [45], the skeletal precursor cells that produce the large quantities of CSPG-aggrecan required for the cartilage template supporting skeletal formation. Interestingly, XYLT1 binds to the aggrecan core protein with lower affinity than to the core protein of the small leucine-rich (SLRP) CSPG decorin [44,138], the latter stabilizing collagen fibrils in soft tissues including skin and cornea [139]. During development, when synthesis of the aggrecan core protein is very high, the higher affinity of XYLT1 for decorin over aggrecan divides the flux through the GAG-CP linker biosynthetic pathway between the CSPG-decorin and HSPG/CSPG-aggrecan branches (Figure 5A). Consequently, when XYLT1 is defective, the flux through the decorin branch will be maintained whereas that through the aggrecan branch will be reduced. Consistent with this interpretation, XYLT1 variability results in Desbuquois dysplasia type 2 (known as Baratela–Scott syndrome) [44], a severe skeletal growth retardation with multiple dislocations, joint laxity, and advanced carpal ossification, but without overt skin phenotypes (the CSPG-decorin branch is not affected) (Table 2 and SI2). After birth, under physiological conditions, a decrease in XYLT1 expression is accompanied by expression of XYLT2, which shows similar affinity for both decorin and aggrecan core proteins. Thus, the two enzymes (XYLT1 and XYLT2) are expressed in all tissues after birth, although to different extents (Figure 3), and post-natal XYLT1 expression can in part compensate for XYLT2 defects.
Accordingly, XYLT2 variants cause the post-developmental spondyloocular disorder, characterized by osteoporosis, cataract, and hearing and learning impairment, with non-skeletal traits [46,140] (the CSPG-aggrecan branch is not affected).
Effect of defects in B4GALT7 and in B3GALT6 on pathway channeling
After Xyl addition, the next step in GAG-CP linker biosynthesis is the addition of galactose (Gal) by B4GALT7 to Xyl-(Ser-CP) [141] (Figure 5B). B4GALT7 has a similar affinity for aggrecan and decorin core proteins, indicating that defective B4GALT7 should affect both CSPG-decorin and CSPG-aggrecan branches. Accordingly, B4GALT7 variants lead to B4GALT7-related Ehlers-Danlos syndrome (spondylodysplastic EDS type 1) [48], whose traits are similar to those caused by XYLT1 mutations (CSPG-aggrecan branch affected) with additional soft tissues overt phenotypes including hyperextensible, soft, thin, translucent, and doughy skin (CSPG-decorin branch also affected) (Table 2 and SI2) [48]. In the HSPG/CSPG-aggrecan branch, the resultant Gal-Xyl-(Ser-CP) is then phosphorylated by FAM20B to form Gal-Xyl(P)-(Ser-CP) (Figure 5B), which promotes both HSPG and CSPG-aggrecan synthesis [52,142]. This is consistent with the lethality observed in FAM20B clinical variants, which possess a phenotype similar to that observed in XYLT1-Desbuquois dysplasia but that additionally affect most organs and both CSPG-aggrecan and HSPG biosynthesis [52] (Figure 5B).
The following step in GAG-CP linker synthesis is the transfer of a second Gal to Gal-Xyl(P)-(Ser-CP) and to Gal-Xyl-(Ser-CP) by B3GALT6 [141], which provides substrates to both the CSPG-decorin and the HSPG/CSPG-aggrecan pathways (Figure 5B). Consistent with this, B3GALT6 mutations lead to B3GALT6-related spondylodysplastic Ehlers-Danlos syndrome (spEDS), characterized by bone dysplasia, joint laxity, and mild skin hyperelasticity [50] (CSPG-aggrecan and CSPG-decorin branches are both affected) (Table 2 and SI2). Then, B3GAT3 acts at different levels of the overall GAG-CP pathway depending on the availability of both the substrate pools and of its partner PXYLP1. In the HSPG/CSPG-aggrecan branches, when B3GAT3 is complexed with PXYLP1, for which the enzyme shows the higher affinity, GlcA addition takes place followed by immediate dephosphorylation of the Xyl(P) to form GlcA-Gal-Gal-Xyl-(Ser-CP), the preferred substrate for EXTL3, which initiates HS chain biosynthesis [8,142,143] (Figure 5B,C). It is noteworthy that the EXTL3 substrate GlcA-Gal-Gal-Xyl-(Ser-CP) can also be recognized by CSGALNACT1, although with lower affinity, sending the excess of the substrate pool to the CSPG-aggrecan branch (Figure 5B). As the level of expression of B3GAT3 is higher than that of PXYLP1 in many if not all tissues (Figure 3), B3GAT3 unbound to PXYLP1 acts on the residual pool of Gal-Gal-Xyl(P)-(Ser-CP) to form GlcA-Gal-Gal-Xyl(P)-(Ser-CP), which is also a good substrate for CSGALNACT1, which recognizes the phosphorylated substrate and sends it to the CSPG-neurocan branch. CSGALNACT1 can also be bound to PXYLP1, which dephosphorylates the GlcA-Gal-Gal-Xyl(P)-(Ser-CP) substrate, thus supporting the CSPG-aggrecan branch [8,142,143] (Figure 5B).
On the other hand, B3GAT3 also acts in the CSPG-decorin branch, characterized by the absence of the phosphate group (the substrate pool that was not subjected to FAM20B phosphorylation) and where sulfation of the tetrasaccharide linker may occur by an as yet unknown sulfotransferase(s), as shown by analysis of the GAG-CP linker of CS chains isolated from different CSPGs [142].
The flux through these complex interlinked pathways occurs at different rates and so results in the mixed skeletal and soft tissue phenotypes observed in B3GAT3 clinical variants responsible for Larsen, Antley-Bixler, Shprintzen-Goldberg, Geroderma osteodysplastica, and spEDS [51]. These functional data are consistent with the phenotypic continuum of the linkeropathies [51], in which flux through the HSPG pathway is maintained at the expense of flux through the CSPG pathway, so that genetic alterations primarily affect the synthesis of different CSPG species, with HSPG biosynthesis being affected only in the more severe cases. In any case, HSPG biosynthesis is privileged unless EXTL3 variants lead to a defective enzyme or to a cellular location that favors the CSPG-aggrecan branch (Figure 5B,C) [8,142,143].
Effect of defects in GAG-CP linker biosynthesis supporting functions on pathway channeling
In addition to variability in the genes encoding the enzymes directly involved in the synthesis of the GAG-CP linker, genetic variability in ion and nucleotide transporters and other Golgi proteins that provide the sugar nucleotides and ions necessary for the function of these enzymes can dramatically alter the biosynthesis of the GAG-CP linker [144,145]. For instance, the calcium-activated nucleotidase-1 (CANT1) is a Golgi enzyme that hydrolyses uridine diphosphate (UDP), which inhibits glycosyltransferases from the initial steps and all along biosynthesis, but with a higher impact on the very first steps (those involving the XYLTs and B4GALT7) [145] (Figure 5B). Accordingly, CANT1 clinical variability leads to Desbuquois dysplasia traits like those observed in XYLT1 and B4GALT7 variants [9,146] (Table 2 and SI2). Moreover, because CANT1 is calcium-dependent, variants in the Ca2+ transporter SLC10A7 can result in skeletal dysplasia with traits overlapping those seen in variants of CANT1 and XYLT1 [11] (Table 2 and SI2). Similarly, clinical variants in genes coding for the plasma membrane sulfate ion transporter (SLC26A2) [147], Golgi nucleotide (UDP-GlcNAc and UDP-Gal) transporters (SLC35A2 and SLC35A3), and transporters of the ions (Mg2+ and Mn2+) required for the enzymatic activity of several glycosyltransferases, kinases, and sulfotransferases in the Golgi (Figure 5B) can result in linkeropathy-like phenotypes [10,145,148].
Non-essential HS are required for appropriate response to stimuli
NDST2, NDST3, NDST4, HS6ST2, HS6ST3, and all HS3STs are tissue-restricted sulfotransferases coded by genes clustered in Group IV (Figures 3-5D). It is important to note that while these enzymes are depicted as possibly acting after those responsible for uHS (Figure 5C,D), it is likely that in any particular tissue they might also act together, in accord with the concept of the GAGosome. No clinical variants are known for any of these genes, and the corresponding knockout mice, when available, have been shown to develop normally, to be fertile, and to remain healthy throughout life (Table 4 and SI2). However, SNPs in this group of genes are associated with risk of or resistance to diseases affecting the organs in which they are expressed (Tables 1 and 4). For instance, in brain, polymorphisms in HSPG biosynthetic genes have been associated with risk for mental disorders such as schizophrenia and bipolar disorders (NDST3) [112,114], reading difficulty and language impairment (NDST4) [108], intellectual disability (HS6ST2) [116], satiety and obesity regulation (HS6ST3) [114,117], and risk of Alzheimer’s disease-related traits (HS3ST1) [121,126]. Moreover, the Group IV genes that are predominantly expressed in brain have been implicated in the control of the response to neurologic stimuli. For example, Ndst3−/− mice exhibited subtly reduced anxiety with no compensatory effects from other Ndst genes [113], and Hs3st2 expression was increased in the rat pineal gland after adrenergic stimulation [149]. In peripheral tissues, polymorphisms have been associated with coronary artery diseases (HS3ST1), ulcerative colitis and Crohn's colitis (HS3ST2) [120], chronic obstructive pulmonary disease (HS3ST2) [127], and hereditary angioedema (HS3ST6) [130].
Accordingly, mice lacking Hs3st1 show a strong proinflammatory phenotype [124], HS3ST3A1 expression increases during normotensive and pre-eclamptic pregnancies [119], HS3ST3B1 is involved in T lymphocyte activation [150], and NDST2 is essential for mast cell responsiveness during inflammation and during innate and adaptive immunity toward pathogens [151]. Moreover, 3-O-sulfated HS (3S-HS) are established to favor infection by certain pathogens [15]. Accordingly, polymorphisms in both HS3ST3A1 and HS3ST3B1 have been associated with risk of P. falciparum infection [152].
Among non-essential modifications, 3-O-sulfation might finely tune physiological processes [124,153]. Two representative 3-O-sulfated heparan sulfate (3S-HS) sequences have been distinguished, one carrying the GlcA-GlcNS3S±6S disaccharide characteristic of heparin (called HSAT because of its affinity for antithrombin) [15], and a second carrying the IdoA2S-GlcNS3S±6S disaccharide (called HSgD because of its interaction with the gD protein of the HSV1 virus capsid) [15,133]. HSAT is produced by HS3ST1 and HS3ST5, whereas HSgD is produced by HS3ST2, HS3ST3A1, HS3ST3B1, HS3ST4, HS3ST5 and HS3ST6. Although redundancy among the HS3STs is supported by the fact that HSgD is made by several HS3STs [153–155], the lack of, or defects in, one HS3ST seems not to be compensated by paralogous genes, likely because of the requirement for specific substrate precursor pools, proposed as a limiting factor for the efficient activity of HS3STs in the tissues in which they are expressed [156]. Alternatively, different specific structures might surround the 3-O-sulfated motif in the sHS chain, giving an additional degree of complexity that allows s3S-HS to act as fine-tuners of physiological responses (SI2). Thus, although similar 3S-HS structures could be made by several HS3STs [15], the present analysis suggests that, depending on the tissue in which they are expressed, sulfotransferases in Group IV might produce specific HS sequences carrying different structures required for appropriate responsiveness to tissue-specific stimuli. Moreover, redundancy is not supported by genetic polymorphisms, which in these genes differently affect susceptibility or resistance to disease, as suggested by the association of genetic variability with risk for various pathologic conditions. This opens a large field of research on the relation of HS to the response of cells and tissues to specific stimuli such as inflammation, immunity, and infection, to specific tissue vulnerability to disease, and to adaptive neurologic behaviors.
With respect to the latter, and to neurological diseases, the brain expresses many Group IV genes (Figure 3, Table 4 and SI2).
Conclusions
There are marked differences in the severity of the clinical manifestations of defects in the genes encoding the Group I essential core proteins. For example, the severity of defects in the three extracellular, and thus ECM-located, core proteins can be ranked perlecan > agrin > collagen XVIII, and similarly for the Group I glypican core proteins, where defects in GPC3 have more severe outcomes than those in GPC4 and GPC6 (Figures 3 and 5 and Table 1). This may reflect differences in the interactions of the respective core proteins, which are likely to be of functional significance and which would localize the core protein’s HS chains to a particular local molecular domain. In addition, the HS chains on these essential core proteins may not be able to fully compensate for each other functionally, due either to location or to HS sequences. Related to the last point is the question of the extent to which HS sequences (uHS and sHS) on these core proteins are equivalent in vivo.
The linkeropathies at first glance have somewhat counterintuitive phenotypes. The present analysis demonstrates that these phenotypes are a consequence of the regulation of the substrate pool flux through the branches leading to the CSPG-decorin, CSPG-aggrecan or HSPG biosynthetic pathways. Thus, a reduction in the substrate linker biosynthetic flux that leads to HS biosynthesis is readily compensated for by rerouting some of the CS pathway substrate flux to secure HS biosynthesis. Interestingly, this suggests that cells are able to sense the GAGs they produce. Through feedback mechanisms, biosynthesis is then altered to restore, at least in part, the production of GAG chains essential at that stage of development, so that consequences arise later. As described above, in the linkeropathies, such sensing involves at least substrate pool levels and protein-protein interactions in the Golgi (such as those of XYLT1 and XYLT2 with core proteins, and of PXYLP1 with B3GAT3 or CSGALNACT1). Moreover, there are examples of adaptation to genetic defects in HS biosynthesis steps beyond the GAG-CP linker. For example, the HS2ST1 null mouse would be expected to have at the least a fibroblast growth factor-2-like phenotype. Fibroblast growth factor-2 has a very strong requirement for IdoA2S to form a complex with HS and its receptor tyrosine kinase, and so trigger intracellular signaling. Whereas HS2ST1 null mice are not viable, embryonic fibroblasts from these mice respond to fibroblast growth factor-2 despite their lack of 2-O-sulfated iduronate [61]. Interestingly, HS purified from these cells has increased N- and 6-O-sulfation. This is consistent with a sensing system that alters sulfation in the Golgi, enabling cells to produce HS chains with geometries sufficiently similar to those in the wild-type to allow fibroblast growth factor-2 signaling.
This indicates that the conformational degeneracy of HS enables a degree of adaptation to even the loss of an essential sulfotransferase such as HS2ST1. However, lethality occurs in due course, and the extent to which these sequences can replace functional uHS in their physiological context has not been demonstrated.
There is then the question of the differences between uHS and sHS. The obvious difference is 3-O-sulfation. As noted, whether a core protein can simultaneously carry both uHS and sHS sequences in vivo is not known. Moreover, the effects, if any, of enzymes involved in sHS synthesis on the activities of proteins involved in uHS synthesis are not known. Pertinent to this question, and to the extent of redundancy in, e.g., HS3STs, are the consequences of posttranslational modification of the enzymes involved in HS biosynthesis. Known modifications include glycosylation and, most recently, tyrosine sulfation of HS6ST1 and HS6ST2 [157]. The latter discovery is intriguing because tyrosine sulfation is predicted to occur on all the enzymes involved in HS chain synthesis and modification, but not on the enzymes that produce CS chains [158]. Post-translational modifications (PTMs) often play important roles in regulating protein-protein interactions, and it would seem reasonable to propose that PTMs may alter the associations of enzymes synthesizing specific HS sequences or chains. Hence the expression of specific Group IV enzymes in tissues would provide a means for tissue-specific interactions between Group IV enzymes and those in Group I to generate different sHS (Figure 5). In this way the concept of the GAGosome would reflect a very dynamic entity, which would include in some instances vectorial secretion of at least some sulfotransferases (HS6ST1, HS6ST2, HS3ST2 [157,159]).
Summary
A new model for PG biosynthesis is provided through the analysis of the consequences of genetic variation in, and the expression patterns of, the 51 genes involved in HS and CS biosynthesis.
The model explains the CS phenotypes of linkeropathies and introduces the concept of uHS and sHS. Whereas uHS are essential and defects in their synthesis cause severe phenotypes, defects or changes in sHS are associated with altered responses to stimuli and with vulnerability to tissue-specific diseases. sHS and the mechanisms involved in regulating their synthesis would afford an important target in diseases where such responses need modulation, from inflammatory conditions to ageing and Alzheimer’s disease.
Whether uHS are those that regulate the entirety of trophic and morphogenic functions assuring normal development and homeostasis is not known. This would be challenging to demonstrate, since it would require cells or animals null for all genes synthesizing sHS. Similarly, whether uHS and sHS chains are associated with particular core proteins is an open question.
The control of flux to CS and HS by means of phosphorylation of Xyl and protein-protein interactions between PXYLP1 and B3GAT3 or CSGALNACT1, allied to the discovery of protein tyrosine sulfation of HS6ST1 and HS6ST2, suggests that regulation by phosphorylation and tyrosine sulfation of the interactions between enzymes synthesizing and modifying HS chains may be an important mechanism whereby cells sense and regulate the HS structures they produce.
The sulfotransferase(s) responsible for the sulfation of the tetrasaccharide linker and the role of this in regulating the flux of synthesis through the branches of the HS/CSPG biosynthetic pathway remain to be discovered.
Competing Interests
The authors declare that there are no competing interests associated with this manuscript.
Funding
This work has received funding from the ANR SkelGAG (18-CE14-0040-03) and ANR MAT-PL (22-CE18-0013), from the European Union’s Horizon 2020 Research and Innovation Program (grant agreement no. 737390), from BBSRC awards BB/V003372/1, BB/Y003292/1, BB/T012099/1, and from the Northwest Cancer endowment. KR was funded by the DFG (BA 4036/4-1). G. Barreto was funded by the University Paris Est Créteil.
Author Contribution
DPG designed the review strategy and the synthesis model. DPG wrote the manuscript with participation from MOO, BDSF, MBH, XL, WGH, SC, GLD, NR, AM, HM, ADL, AD, OGV, KR, GB, KB, VCD, JDR, and DF. DPG, VCD, KR, and DF critically analyzed the new model for PG biosynthesis. DPG and DF edited the final manuscript.
Acknowledgements
We thank all the ArrestAD partners, particularly Prof. T. van Kuppevelt (Radboud University, the Netherlands), Prof. Urzula Wodja (NENCKI Institute of Experimental Biology, Warsaw, Poland) and Prof. Lidia Gimenez-Llort (Universidad Autònoma de Barcelona, Spain), as well as Prof. P. Albanese, Dr. A. Fifre (University Paris Est Créteil, France), Prof. J-P. Li, Prof. U. Lindahl, and Prof. L. Kjellen (Uppsala University, Sweden), for interesting discussions.
Abbreviations
- CP
Core protein
- CS
Chondroitin sulfate
- CSPG
Chondroitin sulfate proteoglycan
- DBQD
Desbuquois dysplasia
- DS
Dermatan sulfate
- DTD
Diastrophic dysplasia
- EOEE
Early-onset epileptic encephalopathies
- GAG
Glycosaminoglycan
- GAG-CP
Glycosaminoglycan-core protein
- HS
Heparan sulfate
- HSPG
Heparan sulfate proteoglycan
- HSV1
Herpes simplex virus 1
- IVF
In vitro fertilization
- PG
Proteoglycan
- LMW
Low molecular weight
- PTM
Post-translational modification
- QC
Quality control
- sCP
Specialized core proteins
- sHS
Specialized heparan sulfate
- SLRP
Small leucine rich proteoglycan
- SNP
Single nucleotide polymorphism
- spEDS
Spondylodysplastic Ehlers-Danlos syndrome
- uCP
Ubiquitous core proteins
- UDP
Uridine diphosphate
- uHS
Ubiquitous heparan sulfate
- uTetr
Ubiquitous GAG-CP linker tetrasaccharide