Life science research frequently involves optimizing experimental conditions to maximize product yield so that the research pipeline can progress efficiently. In biochemistry, for instance, many processes in the protein production and characterization pipeline need optimizing, such as cloning, protein expression, protein purification, and a range of structural biology techniques and functional assays. Design of Experiments (DoE) is an approach that plans a set of experiments varying all factors at the same time for rapid process optimization; however, its adoption is hindered by statistical complexities and high software costs. To address this, DoE is introduced here alongside Spreadsheet DoE, an open-access, user-friendly educational factorial design tool that provides step-by-step guidance and transparent spreadsheet calculations. Protein expression optimization in Escherichia coli is used as an example, and other biochemical optimizations are also explored. This guide, along with Spreadsheet DoE, aims to encourage researchers with minimal statistical background to employ DoE methodologies effectively in their life science studies.

Biochemical research often involves complex biological systems that are affected by multiple factors (explanatory variables), which need optimizing to maximize product yield (response variable). It is human nature to jump into a trial-and-error approach and optimize factors one at a time, as this requires little planning and yields quick results. However, a little extra planning at the start, using a statistical approach known as “Design of Experiments” (DoE), saves significant time and resources and provides deeper insights into the system. DoE is a method that plans a set of experiments to explore how different factors and their interactions influence a specific outcome. DoE will find the optimal factor levels to maximize the outcome much more quickly than studying the effect of one factor at a time (OFAT), as seen in Figure 1.

Figure 1

Optimizing two factors by different approaches. Two factor levels can be optimized using the inefficient one-factor-at-a-time approach (left), which holds all other factors constant and varies just one factor at a time, or the efficient DoE approach (right), which changes all factor levels at the same time.


DoE tests several variables at once, helping researchers understand the relative impact of each factor and how factors work together to generate a response, providing deeper insight into the system than OFAT. This approach is commonly used in industries like manufacturing, pharmaceuticals, and agriculture to optimize processes, improve product quality, and reduce costs. It provides a practical way to gather insights and make informed decisions, leading to more reliable and faster results. However, DoE approaches can be challenging to understand and implement, particularly for non-statisticians, and have not been widely adopted in academia. To address this, this article demystifies DoE with two-level factorial designs and showcases some current DoE applications in biochemical research, with outputs generated by Spreadsheet DoE.

As an example, protein expression in Escherichia coli is chosen, as it often requires optimization to maximize protein yield (the response variable) by adjusting different factors (the explanatory variables). Four common factors (growth temperature, growth time after induction, inducer concentration, and cell type) will be explored with the widely used two-level fractional factorial design. For any given run, each factor takes either a “high” or a “low” value, and certain combinations are tested in a series of eight experimental runs (Table 1). For any given factor, there are four runs where it is at the low setting and four runs where it is at the high setting, allowing a comparison of means to be calculated for all factors.

Table 1
Two-level fractional factorial design to optimize protein expression yields using four different factors in eight runs, from the C tab in the 4.8 spreadsheet. Each factor is at one of two levels (high or low) for each of the eight runs. The last column indicates the amount of pure protein obtained from the eight runs (eight different one-litre cultures grown under the different conditions indicated). Factor levels are chosen for each run to make a balanced design, which allows important terms to be identified in a minimal number of runs. In this table, runs are in standard order; however, when collecting the data, the order of runs is randomized to minimize the effect of any possible systematic bias.
| Standard Run (SR) Number | Factor A: Temp. (°C) | Factor B: Time (hours) | Factor C: IPTG (mM) | Factor D: Cell Type | Response: Protein (mg) |
|---|---|---|---|---|---|
| SR 1 | 25 |  | 0.1 | BL21 (DE3)pLysS | 85.50 |
| SR 2 | 35 |  | 0.1 | BL21 (DE3) | 69.20 |
| SR 3 | 25 | 12 | 0.1 | BL21 (DE3) | 91.30 |
| SR 4 | 35 | 12 | 0.1 | BL21 (DE3)pLysS | 9.30 |
| SR 5 | 25 |  | 0.5 | BL21 (DE3) | 83.60 |
| SR 6 | 35 |  | 0.5 | BL21 (DE3)pLysS | 73.10 |
| SR 7 | 25 | 12 | 0.5 | BL21 (DE3)pLysS | 97.40 |
| SR 8 | 35 | 12 | 0.5 | BL21 (DE3) | 12.60 |

An equivalent trial-and-error or “OFAT” approach to compare each factor mean (with four replicates) at the high and low level would require 20 runs. Furthermore, OFAT cannot detect any interactions between factors, and detecting interactions is one of the most powerful aspects of the DoE approach. For example, if you were investigating two drugs for a positive health response, then just knowing the effect of each (A and B) is not as informative as knowing whether there are any synergies or anti-synergies (+AB or −AB). An interaction effect means that the effect of a high level of A differs depending on the level of B; that is, A and B are not independent of each other. DoE approaches are therefore an efficient way to perform a relatively small number of experiments yet obtain the maximum amount of information and a potentially predictive linear regression model (Box 1) about the system under study. This allows the response to be maximized by selecting the factor levels in the correct combination according to the sign and magnitude of the coefficients in the model. DoE is used widely in the pharmaceutical industry, where efficiency equals profitability; however, by demystifying the approach and providing some easy-to-use educational DoE spreadsheets, others can greatly benefit from this approach too.
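To make the design's structure concrete, here is a minimal Python sketch (not part of Spreadsheet DoE) that generates the same eight coded runs. It assumes the standard generator D = A*B*C, which matches the level pattern in Table 1 when 25°C, the shorter induction time, 0.1 mM IPTG, and BL21 (DE3)pLysS are taken as the low (−1) levels.

```python
import random

# Full 2^3 factorial for A (temperature), B (induction time) and
# C (IPTG concentration) in standard order (A changes fastest),
# with the fourth factor D (cell type) set by the generator D = A*B*C.
runs = []
for c in (-1, 1):
    for b in (-1, 1):
        for a in (-1, 1):
            runs.append({"A": a, "B": b, "C": c, "D": a * b * c})

for i, run in enumerate(runs, start=1):
    print(f"SR {i}:", run)

# Randomize the run order before collecting data to minimize
# systematic bias, as recommended in the Table 1 legend.
random.shuffle(runs)
```

Because D is simply the product of the other three coded columns, the design stays balanced: every factor is high in four runs and low in the other four.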

Box 1. The linear regression model

The effects of each factor or interaction generated by the DoE approach are included as terms in a final linear equation to predict the response. The sign and magnitude of the coefficients of each term help researchers determine their relative importance and whether each factor should be high or low to maximize the response. For example, a model may take the form below, where each letter is replaced by +1 or −1 depending on its level for that run.

Response = 7.5 + 5.2*A + 2.4*AB − 1.1*B

The “-1.1*B” term has a negative coefficient, which may suggest factor B should be kept at its low level to increase the response. However, the “2.4*AB” term indicates that there is a “positive” interaction between factors A and B. In other words, a high level of A adds 2.4 to the response when B is simultaneously at its high level and subtracts 2.4 from the response when B is at its low level. Therefore, changing the levels of A and B at the same time produces an effect on the response that is greater than the effect of the B term alone. In this case, we would choose both A and B at their high levels to maximize the response (predicted to be 7.5 + 5.2 + 2.4 − 1.1 = 14), even though the “−1.1*B” term alone reduces the response by 1.1; this reduction is smaller than the 2.4 gain from the interaction effect. Linear regression models are a core component of many statistical analyses.
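To see the interaction arithmetic in action, this short sketch evaluates the example model from Box 1 at all four combinations of A and B:

```python
def predict(a, b):
    # Example model from Box 1: each of a and b is +1 (high) or -1 (low)
    return 7.5 + 5.2 * a + 2.4 * (a * b) - 1.1 * b

for a in (-1, 1):
    for b in (-1, 1):
        print(f"A={a:+d}, B={b:+d} -> predicted response {predict(a, b):5.1f}")
# A=+1, B=+1 gives the maximum predicted response of 14.0
```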

Intro to Spreadsheet DoE

As an educational tool to explain and implement DoE for non-statisticians, a series of five Spreadsheet DoE workbooks is available. Each spreadsheet corresponds to optimizing systems affected by 3–6 factors, with the goal of producing a predictive linear regression model using two-level factorial designs. They are named by the number of factors and the minimum number of runs required to explore the system; for example, DoE spreadsheet 6.16 would be used for exploring a process affected by six factors in a minimum of 16 runs. Within each spreadsheet, there are five design tabs labelled A–E. Design C is the standard design with the minimal number of runs, intended to be performed at the same time and not replicated. Design A can be used when all factors under investigation are continuous and numerical, such as temperature (also known as a quantitative factor), and includes four additional replicate runs where all factor values are midway between the high and low levels. This is the most powerful design, as it provides an independent measure of both confidence in the model and confidence in predicting a linear trend between and beyond the factor levels tested. Design B can also provide an additional independent error estimate by duplicating Design C, although this increases the number of runs more than Design A does. Designs D and E allow you to perform your experiments in two blocks (i.e., half the runs at one time and the other half at a different time) and estimate any unintended effect between blocks. The spreadsheets are standalone educational tools with instructions guiding you throughout, and many cells contain comments and equations for further explanation. We will go through the key steps to analyze protein expression data from a minimal number of runs collected at the same time using the 4.8 spreadsheet, as outlined in Table 2. DoE encompasses other more complex designs besides two-level factorial designs, but the principles in Table 2 provide the foundation for these using the materials found in the further reading/resources section.

Table 2
The five steps for statistical analysis of two-level factorial design of experiments
| Step | Statistical analysis approach |
|---|---|
| 1 | Calculate factor and interaction term effects |
| 2 | Estimate which terms to include in regression model |
| 3 | Form a linear regression model |
| 4 | Use analysis of variance to determine a measure of confidence in the model |
| 5 | Check model assumptions by analyzing residuals |

Step 1: Calculate factor and interaction term effects

Formally, the effect of a factor is the change in response produced by the change in the level (from high to low) of that factor. Mathematically, it is the difference between the mean of the subset of responses at the high level of a given factor and the mean of the subset of responses at its low level. The same is true for an interaction effect (Table 3). This is the power of DoE: the same set of relatively few collected responses from your design can be grouped into several different means to calculate an effect for each term. A term is defined as a factor or factor interaction and is usually represented by one or more letters (top row of Table 3).

Table 3
Calculation of effects and Sum of Squared deviations (SS) for four factors and three interactions in a 4.8 design. This is a coded version of an expanded Table 1 that includes the factor interaction terms generated in Spreadsheet DoE. Each factor/interaction is at one of two levels (high or low) for each of the eight runs, indicated by the code 1 or −1. The SS is calculated from the effect using a simple formula within Spreadsheet DoE. The terms are ordered from smallest (left) to largest (right) SS values. Factor A has the biggest effect and SS.
| Run | AC | AD | D | C | B | AB | A | Response |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | −1 | −1 | −1 | 1 | −1 | 85.50 |
| 2 | −1 | 1 | 1 | −1 | −1 | −1 | 1 | 69.20 |
| 3 | 1 | −1 | 1 | −1 | 1 | −1 | −1 | 91.30 |
| 4 | −1 | −1 | −1 | −1 | 1 | 1 | 1 | 9.30 |
| 5 | −1 | −1 | 1 | 1 | −1 | 1 | −1 | 83.60 |
| 6 | 1 | −1 | −1 | 1 | −1 | −1 | 1 | 73.10 |
| 7 | −1 | 1 | −1 | 1 | 1 | −1 | −1 | 97.40 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12.60 |
| Av(high) − Av(low) (Effect) | 0.75 | 1.85 | −2.15 | 2.85 | −25.20 | −35.00 | −48.40 |  |
| Sum of Squared Deviations | 1.13 | 6.85 | 9.24 | 16.24 | 1270.08 | 2450.00 | 4685.12 |  |
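Spreadsheet DoE performs these comparisons of means for you, but the underlying calculation is simple enough to sketch in a few lines of Python. The coded columns below are reconstructed from Tables 1 and 3 (with D = A*B*C as the design generator), and the SS conversion uses the balanced two-level formula discussed further in Step 4:

```python
import numpy as np

# Coded levels for the four factors over the eight standard-order runs
A = np.array([-1,  1, -1,  1, -1,  1, -1,  1])
B = np.array([-1, -1,  1,  1, -1, -1,  1,  1])
C = np.array([-1, -1, -1, -1,  1,  1,  1,  1])
D = A * B * C                              # design generator
y = np.array([85.5, 69.2, 91.3, 9.3, 83.6, 73.1, 97.4, 12.6])

terms = {"AC": A * C, "AD": A * D, "D": D, "C": C,
         "B": B, "AB": A * B, "A": A}

for name, col in terms.items():
    # Effect = mean response at the high level minus mean at the low level
    effect = y[col == 1].mean() - y[col == -1].mean()
    ss = len(y) * effect**2 / 4            # SS for a balanced two-level design
    print(f"{name:>2}: effect = {effect:7.2f}, SS = {ss:8.2f}")
```

Running this reproduces the Effect and SS rows of Table 3.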

Step 2: Estimate which terms to include in regression model

A half-normal probability plot helps identify which terms are important and likely to contribute to a predictive regression model, representing “signal” above those terms that represent background “noise”. If every term represented only noise, all terms would lie on the line projected through the early (smallest) points. Any terms off the line are considered “signal” and may be useful to include in the regression model (Figure 2).

Figure 2

Half normal probability plot. This helps identify terms that are true signal above noise and to be included in a regression model. These terms are A, AB and B and lie to the right of the projected line.

It is not expected that many terms will deviate from the line, as the “sparsity of effects” rule states that only 20% of the main effects and two-factor interactions will be significant in most systems and that the other terms will vary only to the extent of normal error.
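A half-normal plot like Figure 2 can be drawn from the absolute effects in Table 3. The sketch below uses one common plotting-position convention; the exact positions used by Spreadsheet DoE may differ slightly.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

effects = {"AC": 0.75, "AD": 1.85, "D": -2.15, "C": 2.85,
           "B": -25.20, "AB": -35.00, "A": -48.40}   # from Table 3

names = sorted(effects, key=lambda k: abs(effects[k]))
abs_effects = np.array([abs(effects[k]) for k in names])
n = len(abs_effects)

# Half-normal quantiles for the ordered absolute effects
probs = 0.5 + 0.5 * (np.arange(1, n + 1) - 0.5) / n
z = norm.ppf(probs)

plt.scatter(z, abs_effects)
for zi, ei, name in zip(z, abs_effects, names):
    plt.annotate(name, (zi, ei))
plt.xlabel("Half-normal quantile")
plt.ylabel("|Effect|")
plt.show()   # A, AB and B stand clear of the line through the small effects
```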

Step 3: Form a linear regression model – a predictive equation

A linear regression model (Box 1) is a mathematical equation that uses the data from a completed design to predict the response for any combination of coded factor level settings, which are either high (+1) or low (−1). Usually, only those terms that are significant, based on the outliers in the half-normal plot, are included in the model. However, additional diagnostic graphs, R2 values, and probability percentages (steps 4 and 5) are also used to determine the final terms included in the model. The remaining terms not used in forming the model are grouped to give an estimate of error (see next section). To construct the model, we simply take the arithmetic average of the responses from all runs as a starting point and then add each model term multiplied by its coefficient (calculated by dividing the term’s “effect” by two). The model from our example using the 4.8 design predicts the following response (amount of purified protein in milligrams) from three terms.

Response = 65.3 − 24.2*A − 17.5*AB − 12.6*B

Therefore, Factor A (temperature) has the biggest effect on the response, followed by Factor B (growth time after induction), while Factors C (IPTG concentration) and D (cell type) make no significant impact. Furthermore, A and B have an interaction (with a bigger effect than Factor B alone), which suggests that choosing a low level for A and a high level for B would maximize the response. We would therefore grow our cells at 25°C for 12 hours after induction, with either level of IPTG concentration and either cell type; furthermore, we might explore decreasing the temperature or increasing the growth time to see whether the trend continues.
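A sketch of this prediction step: the coefficients are half the effects from Table 3 (with the intercept being the grand average of 65.25 mg), and evaluating the model at the four corners of A and B confirms the choice of factor levels.

```python
mean = 65.25                                    # grand average response (mg)
coef = {"A": -48.40 / 2, "AB": -35.00 / 2, "B": -25.20 / 2}

def predict(a, b):
    # C and D are omitted as their effects were indistinguishable from noise
    return mean + coef["A"] * a + coef["AB"] * (a * b) + coef["B"] * b

for a in (-1, 1):
    for b in (-1, 1):
        print(f"A={a:+d}, B={b:+d} -> {predict(a, b):6.2f} mg")
# Best corner: A=-1 (25 degC) and B=+1 (12 h), predicting 94.35 mg
```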

Step 4: Use analysis of variance to determine a measure of confidence in the model

A linear regression model is only useful if it passes a confidence measure (this section) and all the assumptions made in constructing the model hold true (next section). An important parameter that helps determine confidence is the variance, which is easily calculated from the sum of squared deviations (SS). For a set of simple replicates, SS is calculated by squaring the difference between each data point and the grand arithmetic average and summing these squared differences. This provides insight into the variability, or spread, of the observations around the average. Dividing SS by the sample size minus one gives the variance, and the square root of the variance is the standard deviation, commonly reported alongside averages. The SS for all responses in a factorial design can be similarly calculated against the grand mean to reveal the spread of observations in our design, which we seek to explain by the factor and factor interaction terms included in the regression model. SS calculations for each term in a DoE design are more complicated than for one set of replicates, since there are replicates at the high (+1) and low (−1) levels and we want to know how much the change in level of that term contributes to the variability in the response. A small effect for a term results in a small SS for that term, and for these balanced designs there is a simple formula to convert each term’s effect to its SS (and therefore variance), which together allow an analysis of variance (ANOVA) of the model.
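For a balanced two-level design with $n$ runs in total, that simple formula takes the form below; it reproduces the SS values in Table 3 (e.g., for factor A, $\tfrac{8}{4}(-48.40)^2 = 4685.12$).

$$SS_{\text{term}} = \frac{n}{4}\left(\text{effect}_{\text{term}}\right)^{2}$$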

ANOVA is based on separating the total variance of a set of responses (i.e., the spread of protein yields in the expression experiments) into two parts, the proposed “signal” (explained variance) and the proposed “noise” (unexplained variance), and then comparing them. The signal is the pooled variance from the terms in the regression model, while the noise is the pooled variance from the left-over residual terms that we assume represent random variability. The F-statistic for the model is calculated by dividing the signal variance by the noise variance; the higher the ratio, the more likely the model is significant (above noise), and this is expressed in Spreadsheet DoE as a probability percentage, where greater than 95% is deemed significant (equivalent to a p-value < 0.05). F-statistics can be evaluated for any signal-to-noise comparison; for example, the significance of individual terms in the model can also be calculated (Table 4). For a model with high confidence, in addition to high probability values for the model terms and the overall model, the model R2 value should be close to one, as this quantifies the proportion of the variation explained by the predictive model (calculated by dividing the model SS by the sum of the residual SS and model SS). A summary of ANOVA statistics for models that include an increasing number of model terms is found in Table 5, which shows that three terms (A, AB, and B) give the best statistics and best meet the conditions above; given similar statistics, it is important to choose the model with fewer terms.
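The ANOVA arithmetic is compact enough to verify in a short script. The sketch below reproduces the headline numbers in Table 4 from the term SS values, using the F-distribution from scipy to express significance as a probability percentage:

```python
import numpy as np
from scipy.stats import f

y = np.array([85.5, 69.2, 91.3, 9.3, 83.6, 73.1, 97.4, 12.6])
ss_total = np.sum((y - y.mean()) ** 2)         # corrected total SS = 8438.66

model_ss = {"A": 4685.12, "AB": 2450.00, "B": 1270.08}   # from Table 3
ss_model = sum(model_ss.values())              # signal = 8405.20
ss_resid = ss_total - ss_model                 # noise  = 33.46

df_model = len(model_ss)                       # one df per model term
df_resid = len(y) - df_model - 1               # 8 runs - 3 terms - intercept

F = (ss_model / df_model) / (ss_resid / df_resid)
significance = 100 * f.cdf(F, df_model, df_resid)   # >95% deemed significant
r2 = ss_model / ss_total

print(f"F = {F:.2f}, significance = {significance:.2f}%, R2 = {r2:.3f}")
```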

Table 4
ANOVA table for the model with three terms (A, AB, and B), built from Table 3, with associated R2 values. The mean square is the variance and is calculated for the overall model, the individual model terms, and the residual (noise). The R2 adjusted value is calculated by penalizing additional terms in the model to avoid overfitting; although it is lower than the R2 value, it is a better value to use in conjunction with the R2 predicted. R2 predicted evaluates how well the model is likely to predict responses for new, unseen data and is used to assess the predictive power of the model, not just how well it fits the data used to build it. This value can be negative when the predictive power of the model is worse than simply using the overall response average to predict any factor combination.
| Source | Sum of squares | df | Mean square | F-statistic | Significance (%) |
|---|---|---|---|---|---|
| Model | 8405.2 | 3 | 2801.73 | 334.94 | 100.00 |
| A | 4685.12 | 1 | 4685.12 | 560.09 | 100.00 |
| AB | 2450.00 | 1 | 2450.00 | 292.89 | 99.99 |
| B | 1270.08 | 1 | 1270.08 | 151.83 | 99.98 |
| Residual | 33.46 | 4 | 8.36 |  |  |
| Corrected total | 8438.66 | 7 |  |  |  |
|  |  |  |  | R2 | 1.00 |
|  |  |  |  | R2 adjusted | 0.99 |
|  |  |  |  | R2 predicted | 0.98 |
Table 5
Summary of ANOVA statistics for the different models, each containing an increasing number of model terms. The chosen model with three terms (highlighted in bold) has high probability percentages for the overall model and the individual terms and good R2 values (close to 1), with a small difference between R2 adjusted and R2 predicted.
|  | Model prob % | A | AB | B | C | R2 | R2 adj | R2 pred | R2 adj-pred |
|---|---|---|---|---|---|---|---|---|---|
| 1 term in model | 96.61 | 96.61 |  |  |  | 0.56 | 0.48 | 0.21 | 0.27 |
| 2 terms in model | 99.06 | 99.18 | 97.21 |  |  | 0.85 | 0.78 | 0.6 | 0.18 |
| **3 terms in model** | **100.00** | **100.00** | **99.99** | **99.98** |  | **1.00** | **0.99** | **0.98** | **0.01** |
| 4 terms in model | 99.98 | 99.99 | 99.98 | 99.93 | 80.89 | 1.00 | 1.00 | 0.99 | 0.01 |

Step 5: Check model assumptions by analyzing residuals

Residuals are the differences between the observed response values and those predicted by the regression model equation for each experimental run; they should be normally distributed to fulfil the assumptions used to create the predictive model. Normality can be tested by plotting the residuals against normally distributed theoretical z-scores for the dataset, to assess whether a straight line is produced, and against the predicted values, to assess whether the data show an even vertical spread from left to right (Figure 3). If both diagnostic tests are generally passed, the residuals are deemed normally distributed and independent with constant variance, confirming the statistical assumptions made in the ANOVA approach. Minor deviations from these assumptions are usually acceptable. Furthermore, a plot of the actual responses versus the predicted responses should give a straight line with a slope of 1, showing (in another way) how well the model fits the data. Transforming the responses before forming models can help better meet these assumptions and improve the diagnostic plots; this is discussed further in the spreadsheets.
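Both diagnostic plots are easy to reproduce outside the spreadsheet. This sketch computes the residuals of the three-term model and draws the normal probability plot and the residuals-versus-predicted plot:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
y = np.array([85.5, 69.2, 91.3, 9.3, 83.6, 73.1, 97.4, 12.6])

predicted = 65.25 - 24.2 * A - 17.5 * (A * B) - 12.6 * B
residuals = y - predicted

# Theoretical z-scores for the ordered residuals
probs = (np.arange(1, len(y) + 1) - 0.5) / len(y)
z = norm.ppf(probs)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(z, np.sort(residuals))             # should form a straight line
ax1.set(xlabel="Theoretical z-score", ylabel="Residual")
ax2.scatter(predicted, residuals)              # look for an even vertical spread
ax2.axhline(0, linestyle="--", color="grey")
ax2.set(xlabel="Predicted response (mg)", ylabel="Residual")
plt.tight_layout()
plt.show()
```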

Figure 3

Diagnostic plots to check assumptions of normality for model with three terms (A, B, and AB).


This method can be applied in various fields for both quick evaluations (screening) and in-depth analysis (characterization). It often delivers 80% of the improvements in results and quality with just 20% of the total effort. When curvature is detected, full optimization (to get that final 20% improvement) may require experimentation with a response surface design, a related approach that includes additional factor levels and requires more specialist software for analysis (for other software, see below). DoE has been used throughout the protein production and characterization pipeline, including PCR, cloning, cell culture, protein expression, purification, crystallization, and characterization (see table 6 in Papaneophytou1 for a summary of typical factors investigated).

DoE helps in the optimization of cell culture conditions in biopharmaceutical production; factors like oxygen levels, carbon sources, and agitation rates can be adjusted to improve the growth rate and productivity of cultured cells. Protein expression is optimized (often as C-terminal GFP folding reporter fusions) using parameters similar to those in this article’s example (as well as different N-terminal fusion tags) to maximize protein yields, allowing costs to be reduced and smaller culture volumes to be grown.2 Affinity-tagged protein purification using different bead types, buffers, chromatography method parameters, and other conditions has often led to greater yields,3 and add-on software for ÄKTA systems from Cytiva can be purchased to perform many other chromatography designs. Other purchasable DoE software that can perform two-level factorial as well as other DoE designs includes Design Expert (by Stat-Ease), JMP (by SAS), Minitab, and MODDE (by Sartorius). Furthermore, a free online resource called SAmBA (https://www.igs.cnrs-mrs.fr/samba/) will create a DoE design called an (unbalanced) incomplete factorial. This allows researchers to define any number of factors and factor levels in a required number of runs, sampling across the combinatorial space. However, these incomplete factorial designs provide fewer system insights and confidence measures, as they are not balanced and do not have an equal number of experimental runs at each level for all terms.

Protein stability in different buffers is often screened by studying factors such as pH, salts, osmolytes such as glycerol, and nondenaturing detergents such as Tween-20. The melting temperature of the protein in the different buffers can be measured to find the most important factors and interactions for keeping the protein stable.4 This is essential for the long-term characterization of proteins that occurs in NMR and other biophysical methods, where more stable conditions minimize protein degradation during lengthy data collection periods. Protein stability is also essential in the formulation of protein therapeutics and may allow these therapeutics to be stored at higher temperatures while remaining intact and effective.5 Other areas include finding buffer conditions for optimizing enzyme reactions, protein folding, and protein crystallization.6-8 In fact, DoE can be applied to almost any process where factors can be identified and responses can be measured; although traditionally used in process industries (agricultural, food, and pharmaceutical), it has been applied in almost all fields of research and will almost certainly continue to help us better understand and efficiently optimize many more systems in the future.

To aid biochemistry researchers using DoE to optimize the protein production and characterization pipeline, and to reveal any emerging optimization trends, completed Spreadsheet DoE designs that researchers share with the author will be categorized and made publicly available as a community resource in the future.

The five Spreadsheet DoE workbooks and the 4.8 Spreadsheet with the example data used in this article are available at https://doi.org/10.5281/zenodo.13998442.


I work as a Biochemistry Senior Lecturer in the School of Biosciences at the University of Liverpool, where I am the programme director for the Biochemistry degree. Before Liverpool, from 2010 to 2017, I worked as an Associate Professor of Biochemistry at Eastern New Mexico University, USA. I also worked as an Assistant Professor in the Chemistry/Biochemistry department at Mount Allison University, Canada (2008-2010), a Post-Doc at the Hospital for Sick Children and University of Toronto, Canada (2004-2008), a Ph.D. student at Cambridge University, UK (1999-2004), an English and Chemistry Teacher in Jalisco, Mexico (1998-1999), a Research Scientist at GlaxoSmithKline (1996-1997), where I first started using DoE, and an Undergraduate student at Leeds University, UK (1994-1998). My research lab uses bioinformatics and biophysical tools to probe protein structure, dynamics and interactions to gain a deeper understanding of protein interaction specificity, and routinely uses Spreadsheet DoE to rapidly optimize the protein production and characterisation pipeline. I am also interested in pedagogical research to better understand and generate resources for student success in the lab, which inspired the development of Spreadsheet DoE. Email: [email protected].

1. Papaneophytou C. Design of experiments as a tool for optimization in recombinant protein biotechnology: from constructs to crystals. Mol Biotechnol 2019;61:873-91.
2. Papaneophytou CP, Kontopidis G. Statistical approaches to maximize recombinant protein expression in Escherichia coli: a general review. Protein Expr Purif 2014;94:22-32.
3. McGraw J, Tatipelli VK, Feyijinmi O, et al. A semi-automated method for purification of milligram quantities of proteins on the Qiacube. Protein Expr Purif 2014;96:48-53.
4. Ablinger E, Hellweger M, Leitgeb S, et al. Evaluating the effects of buffer conditions and extremolytes on thermostability of granulocyte colony-stimulating factor using high-throughput screening combined with design of experiments. Int J Pharm 2012;436(1-2):744-52.
5. Grant Y, Matejtschuk P, Dalby PA. Rapid optimization of protein freeze-drying formulations using ultra scale-down and factorial design of experiment in microplates. Biotechnol Bioeng 2009;104:957-64.
6. Rodríguez ÁP, Leiro RF, Cerdán ME. Kluyveromyces lactis β-galactosidase crystallization using full-factorial experimental design. J Mol Catal B Enzym 2008;52:178-82.
7. Onyeogaziri FC, Papaneophytou C. A general guide for the optimization of enzyme assay conditions using the design of experiments approach. SLAS Discov Adv Life Sci R&D 2019;24:587-96.
8. Willis MS, Hogan JK, Prabhakar P, et al. Investigation of protein refolding using a fractional factorial screen: a study of reagent effects and interactions. Protein Sci 2005;14:1818-26.
Published by Portland Press Limited under the Creative Commons Attribution License 4.0 (CC BY-NC-ND)