Metabolic network reconstructions define metabolic information within a target organism and can therefore be used to address incomplete metabolic information. In the present study we used a computational approach to identify human metabolites whose metabolism is incomplete on the basis of their detection in humans but exclusion from the human metabolic network reconstruction RECON 1. Candidate solutions, composed of metabolic reactions capable of explaining the metabolism of these compounds, were then identified computationally from a global biochemical reaction database. Solutions were characterized with respect to how metabolites were incorporated into RECON 1 and their biological relevance. Through detailed case studies we show that biologically plausible non-intuitive hypotheses regarding the metabolism of these compounds can be proposed in a semi-automated manner, in an approach that is similar to de novo network reconstruction. We subsequently experimentally validated one of the proposed hypotheses and report that C9orf103, previously identified as a candidate tumour suppressor gene, encodes a functional human gluconokinase. The results of the present study demonstrate how semi-automatic gap filling can be used to refine and extend metabolic reconstructions, thereby increasing their biological scope. Furthermore, we illustrate how incomplete human metabolic knowledge can be coupled with gene annotation in order to prioritize and confirm gene functions.
The wealth of genomic and metabolic information that has amounted in the past decade has highlighted a need for methods that facilitate the functional annotation of biochemicals that take part in human metabolism . Components of the human metabolome can be categorized on the basis of their degree of functional annotation into well-annotated, incompletely annotated and unknown components. The first category provides the basis for BiGG (biochemically, genetically and genomically) structured metabolic networks that have found widespread use in high-throughput data analysis, biochemical engineering and pathway discovery [2,3]. The latter two categories reflect that our knowledge of human metabolism is incomplete. Underlying these are (i) genes and proteins of unknown function, (ii) proteins that are not associated with a sequence (orphan enzymes) and (iii) metabolites, for which metabolic knowledge is incomplete. Estimates report that approximately 30–50% of all enzymes associated with EC (Enzyme Commission) numbers are not associated with genes [4–6] and that the function of over 50% of genes in higher organisms have not been experimentally determined . Dead-end metabolites, i.e. compounds whose anabolic or catabolic route within humans are not completely defined, are also numerous .
BiGG metabolic network reconstructions contain components of an organism's metabolome, from information on genes and metabolic enzymes to metabolic reactants and products, and on how these components interact . Incomplete genetic, genomic and biochemical information can be identified as network gaps in BiGG reconstructions. The two-step procedure of identifying knowledge gaps and, subsequently, suggesting candidate gap-filling reactions is referred to as network gap filling [10–12]. Because BiGG reconstructions can be converted into a mathematical format, gap filling is carried out computationally. The SMILEY gap-filling algorithm, for example, has been successfully used to unravel novel metabolic functions in Escherichia coli  and to propose novel metabolic functions in humans . Similar metabolic pathway discovery algorithms have found widespread use in biotechnology for pathway discovery .
The global human genome-scale metabolic reconstruction RECON 1 represents one of the most comprehensive BiGG metabolic reconstructions generated to date . RECON 1 has found diverse applications in human metabolic research (reviewed in ) and has the potential to facilitate novel metabolic reaction discovery through a gap-filling approach. Although successful examples exist for bacterial metabolism [9,17], the use of eukaryotic metabolic models to this end has received limited attention .
Although comprehensive, RECON 1, and indeed all genome-scale reconstructions, is incomplete as knowledge of metabolism accumulates over time. Incorporation of acquired biological data can be performed using a gap-filling approach. If the incorporated biological components are incompletely annotated, then hypotheses of their biological functions can be generated from the biological network context. In the present study, we set out to identify and characterize human orphan metabolites that were defined here as metabolites that have been detected in human biofluids but were not accounted for in RECON 1. The present study aimed to elucidate the metabolism of these orphan metabolites in humans and simultaneously expand the biological accuracy of RECON 1. To our knowledge the systematic identification and characterization of human orphan metabolites has not been carried out previously. We used a gap-filling algorithm to compute reaction hypotheses that could account for these metabolites in humans and show that the approach not only helps to reconstruct known biochemical pathways, as expected, but is also capable of generating biologically plausible non-intuitive reaction hypotheses. We exemplified this by validating one of our predictions experimentally, which resulted in the characterization of a human gluconokinase.
Pre-processing to metabolite integration
RECON 1 was downloaded from the BiGG database . The reconstruction was imported into Matlab (MathWorks) using the COBRA toolbox . Subsequently, we decompartmentalized all reactions in RECON 1 by placing all intracellular compartment reactions into the cytosol and removing duplicates. Extra-organism-located reactions were kept. We downloaded the KEGG Ligand database (version as of 10/2009)  to obtain a universal reference reaction list. For each metabolite present in RECON 1 and in the universal database, we created a transport reaction from cytosol to the extra-organism space. Furthermore, we downloaded the HMDB (Human Metabolome Database, release 2.5) . We mapped the entries from the HMDB on to RECON 1 using the BiGG identifiers provided by the HMDB.
Computation of candidate solutions using gap filling
The gap-filling algorithm has been described previously . We used the implementation provided by the COBRA toolbox. Briefly, RECON 1 served as the model to which reactions from either the universal reaction list and/or the transport reaction list should be added. In order to identify candidate anabolic and catabolic solutions for each metabolite, we added a sink (sink_A: Ø→1 A) or demand reaction (DM_A: 1 A→Ø) to the model and performed the gap-filling analysis by requiring the flux through either DM_A or sink_A to be greater than zero. Up to ten solutions for each metabolite were computed. The gap-filling algorithm requires metabolites in the reconstruction to have KEGG identifiers in order to calculate metabolic reaction solutions. This restrained the metabolites identified to metabolites from the HMDB with KEGG identifiers. Note that not all metabolites in RECON 1 have a KEGG identifier, which may also limit the identification of candidate solutions from the KEGG database.
Gap analysis and validation of the gap-filling solution output
The gap-filling algorithm generated a total of 1131 solutions for the 343 orphan metabolites identified. In the majority of cases metabolites either had five solutions or only one (Supplementary Table S1 at http://www.biochemj.org/bj/449/bj4490427add.htm). Solutions were filtered with the following criteria in mind: (i) only solutions containing gene-associated resolving reactions were picked; (ii) if no solution met this criteria, solutions with the least number of resolving reactions and leading to integration of the metabolite within the network as opposed to transport were picked; and (iii) if no such solution was found, then the solution involving the least number of resolving reactions was picked, irrespective of the resolving reaction type. In cases where two solutions met these criteria an informed decision on the basis of a literature review was made to select the most biologically relevant solution. These criteria generated a filtered candidate solution output (S1 and S2), which is reported in the Results section (Supplementary Table S2 at http://www.biochemj.org/bj/449/bj4490427add.htm).
Candidate reactions comprising the S1 and S2 solutions were investigated individually to generate plausible hypotheses for the gap filling of RECON 1. The orphan metabolites’ biological origin was determined and the reactions composing the gap-filling solutions were assessed on the basis of the underlying literature and their annotations within multiple reaction and metabolite databases. Biochemical and metabolic information was obtained from the HMDB , the HSDB (Hazardous Substances Database; http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?HSDB) and the KEGG database . For resolving reactions, not annotated in humans, BLAST homology searches of genes encoding the resolving reaction activity were performed against the human sequence (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and using STRING . On the basis of literature and electronic data, solutions were classified as: (i) accepted, solutions were composed of reactions for which genetic or biochemical data had been reported in humans or closely related species; (ii) hypothesis, solutions were biochemically plausible but genetic or biochemical information was incomplete and requires experimental investigation; (iii) discarded, solutions were composed of reactions for which no evidence could be found in humans or closely related species; and (iv) no solution, the gap-filling algorithm failed to generate a solution. All metabolic reaction and subcellular compartment abbreviations are as described in the BiGG database .
Overexpression of C9orf103 and enzyme assays
C9orf103 isoform I (BI826543) and isoform II (BC142991) were directionally cloned into a pT7CFE1-CHis expression vector (Thermo Scientific) at the BamHI and XhoI restriction sites following amplification using the forward primers 5′-GGAATTCCATATGGGATCCGCGGCGCCGGGCGCGCTGCTG-3′ and 5′-GGAATTCCATATGGGATCCAAGAAACCGAGGCCCGCGGCTGC-3′ for isoforms I and II respectively, and the reverse primer 5′-TATCTCGCTCGAGTTTCATTTTTAGGGTTTCCATAATTG-3′. This generated two vector constructs, pT7CFE1-GlncKIsoI and pT7CFE1-GlnKIsoII, which, following linearization with SpeI, were used in 25 μl coupled in vitro transcription/translation reactions using the Pierce® Human In Vitro Protein Expression kit for 6 h at 30°C. Confirmation of overexpressed products was performed by Western blotting using the HisProbe-HRP reagents and kit (Thermo Scientific). GFP (green fluorescent protein) expressed from pCFE-GFP (Thermo Scientific) was used as a positive control in in vitro translation reactions and the gluconokinase assay. Following translation, lysates were diluted in 1×kinase reaction buffer [100 mM phosphate, 40 mM NaCl and 2.5 mM MgCl2 (pH 7.2)] to a volume of 50 μl and desalted using spin columns with a 7.5 kDa molecular mass cut-off. A 12 μl volume of this diluted desalted lysate was then used in gluconokinase assays. The gluconokinase assay was carried out in 33 μl reaction volumes in kinase reaction buffer with 0.1 mM or 1 mM gluconate and 1 mM ATP. Purified recombinant E. coli gluconokinase (50 ng; Megazyme, specific activity 31.8 units/mg) was added and used as a positive control. Following mixing by pipetting, reaction mixtures were immediately split into 20 μl and 13 μl aliquots. The 20 μl aliquot was incubated for 2 h at room temperature (20°C), and then frozen at −80°C for MS analysis. To the 13 μl aliquot, 2 μl of 6PGDH (6-phosphogluconate dehydrogenase) master mix was added and incubated at room temperature for 2 h. NADPH production was measured by monitoring fluorescence at 445 nm following excitation at 340 nm and interpolating values to that of an NADPH standard curve. The 6PGDH master mix resulted in the addition of 0.015 units of 6PGDH (Megazyme) and a 625 μM final concentration of NADP+. Following fluorescence measurements, a 3 μl aliquot was removed from the kinase reaction, diluted to 30 μl with kinase reaction buffer and the ATP content of the reaction was determined with a CellTiter-Glo assay (Promega) according to the manufacturer's instructions. All data are results of averaging over three replicate samples and were similar in n=3 repeats.
LC-MS (liquid chromatography-MS)
A UPLC system (UPLC Acquity, Waters) coupled in-line with a quadrupole-time-of-flight hybrid mass spectrometer (Synapt G2, Waters) was used. Separation was achieved by HILIC (hydrophilic interaction chromatography) analysis on an 1.7 μm Acquity amide column (2.1 mm×150 mm) (Waters). An electrospray ionization interface was used in negative mode to direct column eluent to the mass spectrometer. The flow rate was 0.4 ml/min. Mobile phase A was 10 mM acetonitrile/sodium bicarbonate (95:5) and mobile phase B contained 10 mM acetonitrile/sodium bicarbonate (5:95). The following elution gradient was used: 0 min at 99% A, 3 min at 20% A, 4 min at 99% A and 5 min at 99% A. A capillary voltage of 1.8 kV and a cone voltage of 35 V was used. MS spectra were acquired in centroid mode from m/z 50 to 1000 using a scan time of 0.35 s. Leucine enkephalin (2 ng/μl) was used as the lock mass (m/z 554.2615). [13C6]Gluconate was used as the IS (internal standard) for enzyme assays when gluconate was used as the substrate, whereas gluconate was used as the IS for enzyme assays when [13C6]gluconate was used as the substrate. A 5 μl volume of IS (0.6 mM) was added to 20 μl of sample. Successively 300 μl of methanol was added and samples were vortex-mixed for 5 min and centrifuged for 10 min at 10000 g. Supernatants were recovered, dried under nitrogen and reconstituted in 80 μl of acetonitrile/water (50:50, v:v). Samples (5 μl) were injected into the UPLC-MS system.
Identification and characterization of human orphan metabolites
We aimed to identify human orphan metabolites. Therefore we compared metabolites in RECON 1 with those that have been detected in human biofluids as reported within the HMDB database  (Figure 1A). A total of 343 metabolites were identified whose metabolic fate was not captured in RECON 1 but can be assumed to have a metabolic degradation or production route within humans on account of their detection in biofluids.
Identification of human orphan metabolites and of reactions capable of explaining their metabolic role
The metabolites were characterized on the basis of their chemical taxonomy super-classes (Figure 2A). Organic acids, both aromatic and linear, followed by various alcohols and amino acid derivatives were found to be the most numerous. These compound classes complement the gaps caused by blocked reactions identified in RECON 1 reported previously, where the largest proportion of blocked reactions have been observed in amino acid metabolism, carbohydrate metabolism and lipid metabolism . The majority of the metabolites were detected in either blood or urine and reflects that these biofluids are most commonly investigated due to ease of sample access (Figure 2B). Numerous metabolites were identified that would not normally be associated with basal metabolism, for example, drug derivatives and plant-derived antioxidants. Some of these metabolites, therefore, represent metabolic scope gaps , as they lie outside the biological scope of RECON 1 and/or are involved in non-metabolic pathways.
Characterization of identified orphan metabolites and the properties of their computed solutions
Incorporation of orphan metabolites into RECON 1
In order to incorporate the identified metabolites into RECON 1, we computed candidate solutions, composed of non-organism-specific metabolic reactions from the KEGG database that could account for the metabolic production and consumption of each metabolite within RECON 1 (see the Experimental section). Metabolites that could not be incorporated using reactions from KEGG were connected to RECON 1 using an artificial transport function (Figure 1A). Up to ten solutions were generated for each metabolite (Supplementary Table S1) of which we chose two solutions routes (S1 and S2) corresponding to its metabolic production and consumption route. These were categorized on how they incorporated the metabolites into RECON 1 (Figure 1B) and whether they corresponded to biologically plausible gap-filling solutions (Figure 2C) (defined in the Experimental section). Owing to the reversibility of enzymatic reactions, we did not assign solution routes specifically as production or consumption routes. A total of 119 metabolites had candidate solutions that were either accepted or corresponded to hypotheses. The remaining metabolites had solutions that were discarded (128) or had no solution at all (95). This filtered gap-filling output was the main focus of the following report (Supplementary Table S2).
Accepted candidate solutions
Analysis of the candidate solutions revealed that not all of the metabolites identified corresponded to orphan metabolites. In fact, solutions for 73 metabolites could be added directly to RECON 1 (Figure 2C). In all cases, the reactions associated with these metabolites, with the exception of added transport reactions, were either associated with known human genes or biochemically identified orphan enzyme activities in humans or closely related mammals. With these metabolites, the gap-filling algorithm expanded the metabolic reconstruction through addition of characterized human reactions and metabolites. For example, cholesterols and derivatives represented the most populous group of compounds with accepted solutions (Figure 2D). Five oestrogens were identified that could be added directly to RECON 1 using type I or II solutions, thereby expanding the C18 steroid biosynthesis pathway of RECON 1 (Supplementary Figure S1 at http://www.biochemj.org/bj/449/bj4490427add.htm). A total of 58 out of the 73 metabolites with accepted solutions had type I or II solutions. Having been identified in biofluids, metabolites with type I solutions suggest that at least a portion of these can be produced and consumed within the cell.
Metabolites with type III solutions do not add to a reconstruction knowledgebase in the same manner as type I and II solutions. Type III solutions introduce flux loops into the reconstruction. The plant metabolite phytosterol, for example, was connected to RECON 1 in three reactions: transport into the cell, conversion by Δ-sterol reductase (EC 22.214.171.124) activity into isofucosterol and subsequent transport out of the cell. Contextualization of this type of knowledge is, however, important in terms of modelling sterol metabolism, where biological effects of these compounds could be coupled to knowledge of their metabolic interconversions. Any abnormalities of Δ-sterol reductase activity could then be associated with biological effects of these compounds.
Computed candidate solutions direct hypothesis generation
As a next step we analysed how many biologically plausible hypotheses could be obtained using the gap-filling algorithm by a thorough literature review. We identified 47 orphan metabolites with plausible hypotheses. Most of these applied to organic acids, alcohols, nucleoside derivatives and amino acids (Figure 2D). A total of 22 of these metabolites could be connected to RECON 1 using type I solutions. One of these was cysteine S-sulfate, an NMDA (N-methyl-D-aspartate) receptor agonist excreted into the urine of individuals with sulfite oxidase deficiency . Cysteine S-sulfate was connected to RECON 1 through the L-cystine and L-cysteine by the addition of three enzymatic reactions and the metabolite O-acetyl-L-serine (Figure 3). The formation of cysteine S-sulfate from L-cystine was deemed plausible as glutathione-CoA-glutathione transhydrogenase (EC 126.96.36.199) is an orphan reaction in rat and bovine  and has also been shown to be spontaneous and reversible . Conversion of cysteine S-sulfate into L-cysteine involved cysteine synthase (EC 188.8.131.52), cystathionine δ-synthase (EC 184.108.40.206) and O-acetylhomoserine aminocarboxypropyl-transferase (EC 220.127.116.11). These three enzymes have sequence similarity to two PLP (pyrridoxal 5-phosphate)-dependent enzymes of the human trans-sulfuration pathway, cystathione β-synthase (EC 18.104.22.168) and cystathionine δ-lyase (EC 22.214.171.124), which have recently been shown to have broad substrate specificity in the β-replacement reactions and in the elimination reactions that these enzymes catalyse respectively [27,28]. Cystathionine β-synthase is, for example, known to catalyse the α,β-elimination reaction of L-cysteine to produce L-serine [27,28]. Whether these two enzymes are capable of eliminating thiosulfide from cysteine S-sulfate, to generate serine and thiosulfate, has not been experimentally determined.
Orphan metabolites and examples of gap-filling candidate solutions
Type I solutions were also obtained for compounds originating from coffee or tea, such as theophyllin, theobromine and caffeine. The metabolism of these compounds is fairly well characterized, although the enzymes associated with their degradation have not been fully characterized and the resulting excreted metabolites are heterogeneous. Accordingly, the solution output consisted of multiple degradation routes, none of which could be discriminated against and highlights a shortcoming of automated hypothesis generation when multiple solutions are correct or can be obtained (Supplementary Figure S2 at http://www.biochemj.org/bj/449/bj4490427add.htm).
Type II solutions were obtained for 18 out of the 47 metabolites. Among these was 5α-androstane-3β,17β-diol, a steroid that has been shown to exert an inhibitory effect on prostate cancer cell migration via activation of oestrogen receptor β . The candidate solution involved production from dihydrotestosterone with 3β (or 20α)-hydroxysteroid dehydrogenase (EC 126.96.36.199). Its activity has been detected in ox and rat but has not been associated with a gene in mammals . A recombinant 3β-hydroxy-Δ5-steroid dehydrogenase (EC 188.8.131.52) was, however, previously shown to carry out this reaction (Figure 3) .
A total of seven metabolites had type III solutions. For example, the metabolic route of menthol was predicted to involve intracellular transport, oxidation of menthol to generate p-menthane-3,8-diol followed by extracellular transport of p-menthane-3,8-diol (Figure 3). Menthol is known to enter either enterohepatic circulation as a biliary metabolite or be oxidized by cytochrome P450 generating menthol diols that are then excreted into the urine [32,33]. Type III solutions were also obtained for valproic acid, an epileptic drug. These two examples highlight the need for a detailed reconstruction of xenobiotics, which are currently not captured in RECON 1. Type III solutions were also obtained for methylated nucleosides, well-known cancer markers, which are also not represented in RECON 1 as they are derived from the degradation of RNA and therefore are beyond its scope.
Experimental validation of a computed hypothesis
Candidate reaction solutions remain hypotheses until experimentally proven. We performed experimental confirmations of the type II solution obtained for gluconate. Gluconate is the C-1-oxidized linear derivative of glucose and a common food and drug constituent . The proposed metabolism involved the oxidation of glucose followed by hydrolysis of D-glucono-1,5-lactone to generate D-gluconate that is subsequently metabolized upon phosphorylation in the PPP (pentose phosphate pathway) (Figure 3). Hexose-1-dehydrogenase (EC 184.108.40.206) and gluconolactonase (EC 220.127.116.11) activity have previously been reported in humans [35,36]. An enzyme with gluconokinase activity has also been previously purified and characterized from pig kidney extracts . Performing a homology search of bacterial enzymes having gluconokinase activity (EC 18.104.22.168), we identified C9orf103 as a plausible candidate gene, which may be capable of providing a catabolic route for gluconate in humans. A BLAST query of isoform I of C9orf103 against the E. coli genome revealed a 40% sequence identity to gluconokinase (gene symbol GNTK) with an E value of 8−40 and over 90% sequence coverage. Furthermore, 12 out of the 14 amino acid residues that have been shown to be important for E. coli gluconokinase substrate binding are conserved in C9orf103 (Supplementary Figure S3 at http://www.biochemj.org/bj/449/bj4490427add.htm).
In order to confirm gluconokinase activity of C9orf103, we overexpressed isoforms I and II of C9orf103 in human HeLa cell lysates (Supplementary Figure S4 at http://www.biochemj.org/bj/449/bj4490427add.htm) and assayed for gluconokinase activity using a 6PGDH-coupled enzyme assay. Following incubation of cell lysates with added gluconate, we detected more NADPH in lysates containing overexpressed isoform I than with isoform II or the negative control containing GFP. The NADPH concentration was similar to that observed for the E. coli gluconokinase positive control. Little or no NADPH was detected in the water negative control (Figure 4A). These data suggest that C9orf103 encodes an active gluconokinase. The difference in NADPH concentration can be largely attributed to the reduction of NADP+ that accompanies the oxidation of 6-phosphogluconate produced by the C9orf103 gene product. A drop in ATP concentration upon gluconate addition was also observed in the lysate expressing isoform I, consistent with ATP-dependent kinase activity (Figure 4B). A change in both NADPH and ATP concentration was also observed upon gluconate addition in isoform II and GFP samples, perhaps hinting at background gluconokinase activity in these lysates.
End point enzyme assays of overexpressed C9orf103 in HeLa lysates confirms gluconokinase activity of isoform I of C9orf103
To further confirm the gluconokinase activity of isoform I and completely rule out activity of isoform II or background activity, we quantified gluconate and 6-phosphogluconate in the cell lysates using LC-MS (Figure 5 and Supplementary Figure S5 at http://www.biochemj.org/bj/449/bj4490427add.htm). Component analysis showed little or no gluconate in isoform I and in the positive-control samples, whereas 6-phosphogluconate was readily detected. The opposite was true for isoform II and the negative-control samples. An experiment where uniformly labelled [13C]gluconate was used as the substrate unequivocally demonstrates that the added gluconate is indeed converted into 6-phosphogluconate in the presence of overexpressed isoform I. No kinase activity was observed for isoform II above the negative control. This is most likely due to the absence of the phosphate-binding loop binding domain that is present in the N-terminal domain of isoform I but missing in isoform II (Supplementary Figure S3), suggesting a different function for this isoform. These data confirm gluconokinase activity in humans and support a gluconate degradation route through the PPP similar to the one observed in lower organisms .
LC-MS analysis of gluconate and 6-phosphogluconate composition in HeLa lysates expressing C9orf103
Metabolites with discarded or no solutions
A large proportion of the metabolites identified had either candidate solutions that were discarded or had no solutions at all (Figure 2C). Approximately 60% of the metabolites having discarded solutions were organic acids, alcohols or amino acid derivatives, which also represented the most numerous metabolite classes. In the majority of the cases, no evidence was found to support the occurrence of the proposed solutions in humans. Some metabolites were also found to not correspond to orphan metabolites. Exogenous compounds, such as rosmarinic acid and curcumin, have been shown to be excreted following glucuronidation or glycosylation [39,40]. Similarly, compounds, such as secondary bile acids, tartarate, raffinose, melbiose and 6-hydroxynicotinic acid, which are known to be indirectly metabolized in humans by the gut microbiome, do not need to be accounted for in RECON 1. Compounds that had no solutions were mostly inorganic compounds, amino acid derivatives, organic acids (Figure 2D) and also compounds which are not associated with basal metabolism, such as arsenic, lead, ibuprofen and naproxen.
In the present study, RECON 1 was used as a discovery tool alongside a library of chemical reactions to computationally propose metabolic routes for human orphan metabolites found in biofluid samples. The key results include (i) the identification and characterization of 343 metabolites, (ii) metabolic hypotheses for multiple orphan metabolites and the biochemically detailed description of five of these, (iii) the experimental validation of one of the computationally proposed hypotheses leading to the characterization of C9orf103 as a human gluconokinase, and (iv) a large proportion of the identified metabolites could not be accounted for using our approach and may focus future gap-filling efforts. The results of the present study demonstrate how gap filling can focus the expansion of metabolic networks and facilitate metabolite and gene annotation through a metabolite-centred approach.
Our approach assumed that RECON 1 correctly captured current knowledge and, therefore, that all metabolites identified were orphan metabolites. Clearly this was not the case as more than one-fifth of the metabolites could be added directly to RECON 1, thus not being true orphan metabolites. This result demonstrates that the metabolic network reconstruction process is iterative requiring periodical refinement so that the network model sustains its biological accuracy. With respect to the accepted solutions, the gap-filling algorithm therefore fulfils its role as a network gap-filling tool .
We generated biologically plausible hypotheses for almost 50 orphan metabolites. The notion that almost half of these have proposed anabolic and catabolic routes infers that their detection in biofluids may be associated with abnormal metabolic states, as is the case for cysteine S-sulfate (OMIMM# 272300). The computed hypotheses (Figure 3) conforms with the previously proposed metabolism of cysteine S-sulfate from L-cysteine in the pathogenesis of sulfite oxidase deficiency  albeit in biochemical detail. The second computed solution, through L-serine, is also plausible on the account of the catalytic promiscuity of the PLP-dependent enzymes involved towards sulfur-containing amino acids [27,28]. PLP-dependent enzymes have broad catalytic activity on account of the diverse reactivity of the PLP Schiff base intermediate . For modelling of disease-associated metabolic phenotypes, such as inborn errors of metabolism , these types of discoveries are valuable. The remainder of the hypotheses involved type II and III solutions. Interestingly, during the preparation of the present paper the computed candidate reaction for menthol was confirmed to be catalysed by CYP2A6 in human liver microsomes by Nakanishi and co-workers . Multiple approaches have been devised to derive gene or protein function . With the recent surge in non-targeted metabolite profiling, the number of metabolites whose biological roles are not defined will surely increase . The three solution types highlight how network gap filling directs metabolic hypotheses of orphan metabolites through contextualization of biological data. The linkage between metabolites and genes can therefore be successfully done in a semi-automated manner using a high-quality genome-scale metabolic reconstruction.
Our network gap-filling approach suffers a drawback in that it is not fully automated. Since the universal reaction database (KEGG) used captures biochemical reactions identified in all domains of life, manual ranking and curation of candidate solutions was required before addition of computed reaction solutions to the network, similar to the laborious bottom-up network reconstruction procedure . The prioritization of predicted pathways between metabolites has been a focus point in synthetic pathway engineering, where candidate reactions between metabolites have been ranked according thermodynamic feasibility, pathway length, chemical similarity and organism specificity . Automated ranking and filtering of the gap-filling output would be valuable when faced with multiple candidate solutions.
We confirmed the computationally proposed hypotheses for gluconate consumption and showed that C9orf103 encodes a functional gluconokinase. The proposed metabolism of gluconate is identical with what is observed in lower organisms where it has been extensively characterized . The molecular details of gluconate metabolism in mammals are less well understood, although early biochemical experiments suggest a similar degradation route. Using 14C-labelled gluconate, Stetten and Topper  demonstrated that the carboxyl carbon of gluconate yields CO2 abundantly in intact rats. Because reduction of gluconate directly to glucose had been shown to not take place , the finding was consistent with omission of CO2 from 6-phosphogluconate by 6PDGH (EC 22.214.171.124) [47,49]. C-6 of gluconate was, however, observed to contribute to fatty acid synthesis, presumably following integration into glycolysis through the non-oxidative branch of the PPP . These studies of gluconate metabolism took part in pioneering investigations into oxidative glucose metabolism that aided in the discovery of the PPP . An active gluconokinase was later characterized and purified from rat and pig kidney .
We have demonstrated for the first time that gluconokinase activity is encoded within the human genome. C9orf103 was initially identified during cDNA library construction  and later shown to be located within the commonly deleted region of del(9q) acute myeloid leukaemia . These studies did not report on the possible function of the gene. C9orf103 was, however, electronically annotated prior to the present study as a putative gluconokinase. A similarity in protein sequence does not always transfer to enzyme function . Isoform II was, for example, shown in the present study to be catalytically inactive. Also, an 18 amino acid insertion within both isoforms of C9orf103 as compared with E. coli GlcnK (Supplementary Figure S3) does not appear to abolish the gluconokinase activity of isoform I. The location of this peptide insertion within the amino acid sequence of isoform I hints at a structural difference between isoform I and E. coli GntK previously observed between E.coli GlcnK and various NMP kinases that are known to catalyse the phosphorylation of a broad range of substrates . Although this might suggest additional functions for the C9orf103 protein product, the gluconokinase purified from pig kidney was found to exhibit strict specificity for D-gluconate . Ultimately, the result of the present study provides a missing genetic link between mammalian gluconokinase activity and our computationally proposed gluconate catabolic route previously proposed through biochemical pulse–chase experiments in rats.
An anabolic route for gluconate through the oxidation of glucose was also computed. Although both activities for this conversion are found in humans, cell biology experiments will be required to confirm whether these act synergistically to generate gluconate in vivo. There is no apparent gain of oxidation of glucose through gluconate compared with glucose 6-phosphate. Both presumably yield the same pentose phosphates and two molecules of NAPDH. This result does, however, not rule out a metabolic contribution of gluconate as a nutrient with a catabolic role via the PPP following, for example, cellular uptake or as a degradation product of intracellular carbohydrates. Similar experiments would also shed light on the effect of down-regulation of C9orf103 associated with del(9q) acute myeloid leukaemia .
We identified more than 200 orphan metabolites for which no solution could be obtained as their metabolism was beyond the scope of RECON 1. These metabolites point to reaction pathways that should be incorporated into RECON 1 to account for metabolic phenotypes associated with, for example, phase I and II xenobiotic detoxification routes, the contribution of the human microbiome, signalling cascades, and transcription and translation. Integration of such non-metabolic reconstructions would help address orphan metabolites, whose metabolic fate lies beyond the scope of RECON 1 and subsequently better define a list of true human orphan metabolites.
The finding that RECON 1 can be used to generate computational hypotheses of human biological functions is intriguing as it implies that metabolic models can be used to prioritize the discovery of human metabolic functions albeit by focusing on metabolites rather than genes. This prioritization stems from filling gaps in knowledge of human metabolism, thereby systematically expanding metabolic knowledge through a bottom-up approach. This approach does, however, not unearth functions that are globally uncharacterized as the method relies on annotated functions and genes. The manner in which incomplete biological data is put into context with current metabolic knowledge is therefore the driving force behind this approach of biological target identification and subsequent functional validation. The result is an expanded metabolic reconstruction, which better captures and defines the known and unknowns of human metabolism.
Óttar Rolfsson designed the study, performed the experiments, analysed the data and wrote the paper. Giuseppe Paglia and Manuela Magnusdóttir performed MS analysis. Bernhard Palsson contributed to experimental design and writing of the paper. Ines Thiele designed the study, performed the experiments and contributed to the writing of the paper.
This work was supported by the European Research Council [grant number 232816].