Evidence provided in support of a proposed label claim for efficacy must demonstrate strong correlation between application of the product and the claim. Well-designed, well-conducted efficacy trials are critical to building your portfolio of evidence for submission to the APVMA.
The experimental design and analysis must be able to demonstrate that it is the application of the product that is having the measured effect, rather than some other variable. If a trial is poorly designed, even the most skilled statistician will have difficulty in applying appropriate and meaningful statistical analysis to prove that the product works as claimed.
Poorly designed trials and incorrect statistical analyses introduce doubt about the efficacy of a product and could result in the APVMA being unable to be satisfied against the prescribed criteria that the product is efficacious. Therefore, you should carefully consider both trial design and analysis before commencing any trial.
This guideline provides advice about designing efficacy trials and may be used as a checklist of design principles. It is not intended to replace the considerable information available on the subject elsewhere and does not include all matters relating to trial design and analysis. We encourage you to seek advice from a professional biometrician or statistician when planning trials.
The guideline draws heavily on Design and analysis of efficacy evaluation trials, which is published by the European and Mediterranean Plant Protection Organization (EPPO) (PP 1/152(4), 2012).
When designing efficacy trials for registration purposes, you should consider the following matters in order to generate data that are both reliable and robust:
- objectives and scope
- design type and analysis requirements
- treatments and controls
- statistical power
- plot and sample size
- managing contamination between plots
- measurement methodology.
Certain agricultural chemical products may require trials to be designed in a particular way in order to demonstrate a proposed label claim for efficacy. For example, there are specific requirements for experimental design and analysis for testing of novel swimming pool and spa sanitisers in order to meet the APVMA’s efficacy criteria.
The type of analysis that can be conducted on trial data is heavily dependent on the trial design. Therefore, it is crucial that the trial design and type of analysis be considered together.
1. Objectives and scope
The objectives and scope for a trial should be clearly defined. The trial design should not be more complicated than is required to meet the objectives. Trial objectives may even place constraints on the trial design.
The main objective of any efficacy trial is to demonstrate that the product will be efficacious when used in accordance with the proposed label instructions. However, the final label instructions (for example, instructions on rates and frequency of use) may not be obvious if pilot studies have not been conducted to determine the likely rate(s) or number of applications. Pilot studies are often designed differently from pivotal studies for this reason.
An example objective could be: To demonstrate that product A will provide control of pest B in situation C, using commercially available equipment, at a level deemed commercially acceptable and no less than current industry standard D.
Trial objectives lead to the scope of the trial. The scope of the trial encompasses all reasonable variables that could affect the trial objectives and ultimately the product’s intended use. The scope must be sufficient to cover all important possibilities and variables and allow a statistical analysis to demonstrate a strong correlation between the application of the product and the measured effect.
Limitations on the scope should also be considered. For example, sometimes objectives can only be met by conducting a number of separate trials, and environmental factors affecting efficacy can vary from plot to plot and over time. The scope of the trial or groups of trials should address these issues to ensure that the design allows for adequate randomisation of treatments, replication of treatments and repeats of trials. Other scope parameters include variation in the testing environment, using the appropriate rates and application methods of the new product and applicable industry standards.
2. Design type
There are many trial designs, each with benefits and constraints. The best design allows the true extent of a product’s efficacy to be assessed by an appropriate statistical analysis. It will take account of any variability in parameters in the study and allow their influences to be measured or ranked in importance. A good trial design will also minimise the probability of any result, efficacious or not, being the consequence of chance.
Efficacy trials should be designed to allow valid and appropriate statistical analyses of the data generated. The most appropriate experimental design will depend on the pest or situation, and you should consider those factors and the test objective when designing the experiment. Treatment group sizes and the number of replicates depend on treatment differences for each pest situation and may need to be determined in preliminary range-finding studies.
Most efficacy trials (especially laboratory trials and field-based trials of agricultural pesticides) include the test product(s), reference product(s) and an untreated control (the treatments). Further details on treatments are provided below. The trial design determines how many of each treatment type are required (that is, replicates), where each treatment is placed in relation to others (for example, randomisation) and how they are placed in relation to the trial environment (for example, blocks of treatment sets). It is the pattern of all these features that makes up the trial design and determines the type of analysis possible.
Common designs used in efficacy trials include randomised complete block designs and Latin squares.
In a randomised complete block design, the block is a group of plots within which the environment is homogeneous, and each block contains only one of each treatment, placed in a random order within the block. This design is useful if the trial area is variable (heterogeneous) but there are patches of homogeneity where a block of treatments will fit. The placement of the blocks should aim to control the variability of the site by ensuring that each treatment is compared against all other treatments under the same trial conditions within the block. Examples of areas where randomised complete block designs are suitable are plantings of horticultural crops grown in areas of varying topography or soil.
Latin squares are a modification on a block design. The design is formed on the basis of a square matrix in which one axis of the matrix is a treatment and the other axis represents another variable, such as time, space or a person. Each treatment appears once in each row and each column. For example, Figure 1 shows a square matrix with four treatments (A, B, C and D):
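The figure itself is not reproduced in this text. As an illustration only, the sketch below builds one possible 4 x 4 Latin square by cyclic shifting and checks the property described above (each treatment once per row and once per column). The function names are hypothetical, and shuffling rows and columns is one simple way to randomise a square; it does not sample uniformly from all possible Latin squares.

```python
import random


def cyclic_latin_square(treatments):
    """Build a Latin square by shifting the treatment list one place per row."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def randomised_latin_square(treatments, seed=None):
    """Shuffle treatment labels, rows and columns of a cyclic square.

    Permuting rows or columns keeps the Latin property intact.
    """
    rng = random.Random(seed)
    labels = list(treatments)
    rng.shuffle(labels)
    square = cyclic_latin_square(labels)
    rng.shuffle(square)  # permute the rows
    col_order = list(range(len(labels)))
    rng.shuffle(col_order)  # permute the columns
    return [[row[c] for c in col_order] for row in square]


def is_latin_square(square):
    """Check that every treatment appears exactly once in each row and column."""
    n = len(square)
    expected = set(square[0])
    rows_ok = all(len(row) == n and set(row) == expected for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == expected for c in range(n))
    return len(expected) == n and rows_ok and cols_ok


square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
```

In a field trial, the rows and columns would correspond to the two sources of variation being controlled (for example, position across and along a slope).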
The type of analysis to be conducted depends on the purpose and design of the trial and the type of observations made. Statistical analysis is not required in all cases; nor is it appropriate in certain situations. However, when a comparison between two products or between one product and no treatment is required, statistical analysis must be provided to support the interpretation of the data. Novel statistical analysis submitted in support of experimental data should be accompanied by the raw data and the published literature that references the statistical technique. This guideline cannot describe all analytical approaches for all trial designs, but aims to provide some principles of analysis to assist applicants. If you are not confident of your knowledge in this area, it is highly advisable to seek the assistance of a competent statistician before starting trial work.
Typically, it is the type of variable that determines which broad type of analysis is required (that is, parametric or non-parametric). If the variable is quantitative (binary, binomial, discrete or continuous), parametric statistical methods should be used, such as analysis of variance or linear or logistic regression. If the variable is qualitative (nominal or ordinal, such as ranking or scoring), non-parametric methods are required.
Before conducting a parametric analysis of variance, three assumptions should be met to ensure that the analysis is valid:
- additivity of effects
- homogeneity of variance
- normality of the error.
If these three assumptions cannot be met, non-parametric methods may be preferred.
Additivity requires that the sources of variability (eg treatments) are independent of each other. Independence results in an additive (rather than, for example, a multiplicative or logarithmic) effect on the response variable (eg pest population). The more the variables interact with each other, the greater the chance that the observed response is not the result of the individual treatments, which may invalidate the observed results. Sometimes, effects are not additive on the natural scale and the data must be transformed to a different scale (for example, probit or logit) to meet additivity requirements. Methods to test additivity are available (such as Tukey’s test of additivity).
Homogeneity of variance requires that all the populations tested contain the same level of variability. The less homogeneity between variances of populations being compared, the less likely it is that a parametric method will be able to accurately produce a significant result. There are many tests used to test homogeneity of variance, each with advantages and disadvantages.
Normality requires the distribution of errors (variance around the mean) to be normal. Normal distribution is important because the further the distributions are from normal, the less validity any analysis of variance assessments will have, as there is a greater chance that a significant result will be false (and vice versa). Standard tests and graphical displays are available to demonstrate normality.
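As a rough first screen for the homogeneity assumption described above, the ratio of the largest to the smallest treatment variance can be calculated (the statistic used in Hartley's F-max test, which assumes equal group sizes). This is a minimal sketch with hypothetical function names and data; it is not a substitute for a formal test chosen with a statistician, and the guideline does not prescribe any particular test.

```python
from statistics import variance


def variance_ratio(groups):
    """Return (largest variance / smallest variance, per-group variances).

    `groups` maps a treatment name to its list of replicate values.
    """
    variances = {name: variance(values) for name, values in groups.items()}
    smallest = min(variances.values())
    if smallest == 0:
        raise ValueError("a group has zero variance; the ratio is undefined")
    return max(variances.values()) / smallest, variances


# Hypothetical pest counts per plot, four replicates per treatment.
counts = {
    "untreated": [42, 38, 45, 40],
    "product_1x": [12, 15, 10, 13],
    "standard": [14, 16, 11, 15],
}
ratio, per_group = variance_ratio(counts)
print(f"largest/smallest variance ratio: {ratio:.2f}")
```

A ratio close to 1 is consistent with homogeneous variances; a large ratio suggests that a transformation or a non-parametric method should be considered.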
Analysis of variance
When reporting the results of an analysis of variance (ANOVA), you should present a table of means of each of the treatments, along with the standard error or confidence interval (the variability around the mean). Presenting means with the variability of results can overcome the difficulty of explaining statistically equivalent results when differences between means are large (and vice versa).
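A table of this kind can be produced with a few lines of code. The sketch below is illustrative only: the data are hypothetical, and it reports a per-treatment standard error calculated from each treatment's own replicates, whereas an ANOVA report may instead use a pooled standard error derived from the residual mean square.

```python
from math import sqrt
from statistics import mean, stdev


def treatment_summary(results):
    """Return n, mean and standard error of the mean for each treatment."""
    summary = {}
    for treatment, values in results.items():
        summary[treatment] = {
            "n": len(values),
            "mean": mean(values),
            # Standard error of the mean = sample SD / sqrt(n).
            "se": stdev(values) / sqrt(len(values)),
        }
    return summary


# Hypothetical pest counts per plot, four replicates per treatment.
trial = {
    "untreated": [42, 38, 45, 40],
    "product_1x": [12, 15, 10, 13],
    "standard": [14, 16, 11, 15],
}
summary = treatment_summary(trial)
for name, row in summary.items():
    print(f"{name:<12} n={row['n']}  mean={row['mean']:.2f}  se={row['se']:.2f}")
```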
Formal statistical tests, often as F-tests, are usually also performed to demonstrate any significant results between treatments. Typically, study reports present an analysis to compare all treatment means against each other. In considering the original objective of the trial, this may not be necessary and may confuse the analysis and interpretation. For example, not all treatments need be compared against each other, especially if the comparison of interest is only a limited set of treatments, such as the new product versus the industry standard at proposed label rates. If the trial is designed for this purpose, these matters should be considered at the trial design stage (for example, t-tests may be appropriate). You should consult appropriate texts and professional statisticians if you are unsure of the most appropriate test or procedure.
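To show what an F-test compares, the sketch below calculates the F statistic for a one-way (completely randomised) layout as the ratio of the between-treatment mean square to the within-treatment mean square. It is a simplified illustration with hypothetical names and data; it does not handle blocks, and the p-value would still need to be obtained from F tables or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` maps a treatment name to its list of replicate values.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the treatment means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variation of individual plots around their own treatment mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Small hypothetical data set chosen so the arithmetic is easy to follow.
example = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [3, 4, 5]}
f_stat, df_between, df_within = one_way_anova_f(example)
print(f"F = {f_stat:.2f} on {df_between} and {df_within} df")
```

Here `df_within` is the residual degrees of freedom for this layout, which links the test to the discussion of statistical power later in this guideline.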
Non-parametric methods may be required and are preferred when the data are qualitative rather than quantitative or the three assumptions described above cannot be met. However, non-parametric methods should be used with caution when analysing small data sets. There are a number of different non-parametric methods, many of which are suitable only in certain situations. You may wish to refer to the EPPO’s Design and analysis of efficacy evaluation trials (PP 1/152(4)) for references describing which test is relevant to a particular type of data set.
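As one example of a rank-based approach, the sketch below calculates the Mann-Whitney U statistic for two treatments by counting, over all pairs of plots, how often a value from one treatment exceeds a value from the other. The choice of this test, the function name and the scores are assumptions for illustration; the guideline does not prescribe a particular non-parametric method, and significance would need to be assessed from tables or a statistics package.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by direct pair counting.

    U_a counts the pairs in which the value from sample_a is larger;
    tied pairs contribute one half to each statistic.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical ordinal damage scores per plot (0 = none, 4 = severe).
untreated_scores = [3, 4, 3, 4, 2]
treated_scores = [1, 0, 2, 1, 1]
u_untreated, u_treated = mann_whitney_u(untreated_scores, treated_scores)
print(f"U (untreated) = {u_untreated}, U (treated) = {u_treated}")
```

The smaller of the two U values is the one usually compared against critical values; a value near zero indicates that the two sets of scores barely overlap.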
You may need to conduct separate but closely similar trials at different locations and/or at different times. The series of trials can be analysed together in certain circumstances (for example, if they have the same methods, external impacting factors and pest abundance and similar standard error) and for particular reasons (for example, to estimate treatment effects over sites and years or to test potential confounding factors). Such an analysis should not be conducted unless it has been planned for at the trial design stage so that all requirements can be met. See EPPO guideline PP 1/152(4) for more details.
3. Treatments and controls
Trials often include multiple treatments to allow for comparisons between treatments of interest. These could be treatments using different rates of the same product to show any different rate effects or treatments using different products to show equivalence or differences between products. Applying no treatment at all is also very valuable, as it can highlight many issues within an experiment and allow different comparisons to be made (see below).
For a test to be acceptable for regulatory purposes, in most cases the test treatment should be compared to one or more control treatments. They can include:
- an untreated (negative) control
- a reference product standard (positive control).
The untreated control allows a comparison to be made by measuring the difference between the treatment and what happens if no treatment is applied. It also allows the effects of any other variables present in the trial to be measured. For example, an unexpected drop in temperature, a hailstorm or spray drift from another area could considerably reduce the population of pests in a crop trial and therefore have an impact on the trial results. By considering the measurements from the negative control, the magnitude of other effects can be determined. The untreated control results are also used to provide a modified, per cent control figure that is specific to the treatment and not any other influential factors. When the test treatment includes additives such as wetting agents, the untreated control can include the same additive so that the true effect of the active constituent can be determined.
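The guideline does not state which formula should be used for the modified per cent control figure. One widely used form is Abbott's formula, which expresses the reduction in the treated plots relative to the untreated control. The sketch below assumes that formula and uses hypothetical counts.

```python
def percent_control(untreated_mean, treated_mean):
    """Abbott-style corrected per cent control.

    Both arguments are mean pest counts (or per cent survival) measured
    at the same assessment time in untreated and treated plots.
    """
    if untreated_mean <= 0:
        raise ValueError("untreated mean must be positive to calculate per cent control")
    return 100.0 * (untreated_mean - treated_mean) / untreated_mean


# Hypothetical example: 40 pests per plot untreated, 10 per plot treated.
print(percent_control(40, 10))
```

Because the figure is expressed relative to the untreated control, a population decline that affects all plots equally (for example, a cold snap) is not attributed to the product.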
In small-scale simulated trials, the inclusion of an untreated control is quite easy. However, the APVMA appreciates that in certain situations, such as large-scale field trials and public health situations, untreated controls may be uneconomic or inappropriate. In such situations, you may need to conduct longer pre-treatment monitoring to determine whether there are any extraneous reasons for population fluctuations. In addition, more trials may be needed to show consistent results, thereby discounting location- and time-specific reasons for population reductions.
The placement of untreated controls within the trial design depends on the specifics of the trial (product type, situation and pest) and the analysis. Untreated controls are most often placed just like any other treatment (included controls), but can also be placed next to every treatment (for example, by splitting individual plots into treated and untreated sections; that is, adjacent controls), by placing them outside the treatment group (excluded controls) or by systematically placing them between and within treatment groups to account for variability in the trial area (imbricated controls).
Positive controls can be used as an industry benchmark, thus providing an idea of how each product compares in equivalent situations. Statistically equivalent efficacy (with a low probability of a chance result) is the minimum aim in this comparison. Equivalent efficacy obtained under difficult conditions usually constitutes good evidence of the efficacy of a proposed product.
Every trial will have factors influencing the effect provided by the treatment applied. This could include variations in the trial environment, such as patchy pest abundance or physical parameters of the trial site. Assigning treatments to plots randomly (and having sufficient replication) is considered the best way of ensuring that any uncontrolled sources of variation affect each treatment evenly and that any bias in assigning treatments to certain plots is removed. A simple approach is to assign a number to each treatment or plot and use a random number generator to allocate treatments to plots.
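The sketch below shows one way to carry out this random allocation with a seeded random number generator, for both a completely randomised layout and a randomised complete block layout. The function names and treatment labels are hypothetical; recording the seed allows the allocation to be reproduced and documented in the trial report.

```python
import random


def randomise_completely(treatments, replicates, seed=None):
    """Return a random plot order with each treatment repeated `replicates` times."""
    rng = random.Random(seed)
    plots = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(plots)
    return plots


def randomise_in_blocks(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block.

    Every block contains each treatment exactly once.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        layout.append(order)
    return layout


treatments = ["untreated", "test product", "industry standard"]
layout = randomise_in_blocks(treatments, n_blocks=4, seed=2024)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {', '.join(order)}")
```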
Even when testing in what appear to be identical situations, the result of a pest control treatment will be variable due to differences between individuals or populations of the same pest species, slight differences in applications made and variations in environmental conditions. To allow an appropriate assessment, the extent of variation in performance needs to be understood, especially when comparing two products or when comparing against a negative control treatment. Therefore, repeat the same treatment a number of times to measure the likely variability, so that any difference between treatments can be deemed to be due to the treatments and not just chance.
If the nature of the trial does not lend itself to statistical analysis (for example, repeated commercial-scale field tests in separate locations or areas not considered equivalent for statistical purposes), the separate trials are not considered true replicates. Commercial-scale field trials are typically used to demonstrate that the product can be used on a commercial scale with commercial equipment. By themselves, such trials are usually insufficient because they are not designed to demonstrate that efficacy is due only to the treatment being applied. However, if control plots are included in each location, a statistical analysis could be done with these data.
If data held by you or a third party are insufficient to demonstrate an appropriate level of control, you should consider collecting data using a research permit.
6. Replication versus pseudo replication
True replication, not ‘pseudo replication’, is essential in ensuring that the trial can be appropriately analysed. A true replicate involves a single and separate application of the treatment. Each separate treatment application should have one result per parameter measured or experimental unit. If multiple samples are taken from each single treatment application, they should be added, joined, composited etc, as appropriate to the analysis, to present a single value per replicate. A common error made by researchers is to make one large treatment application and take multiple replicate samples from the one treatment. These samples are considered to be ‘pseudo replicates’ and should not be used as replicates in statistical analysis, because doing so is inconsistent with the assumptions underlying the statistical method used (eg ANOVA).
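The sketch below illustrates reducing multiple samples to a single value per replicate before analysis. It uses the mean of the subsamples as the combined value, which is only one of the options mentioned above (a sum or a physical composite may be more appropriate for some measurements). The record layout and names are hypothetical.

```python
from statistics import mean


def one_value_per_replicate(samples):
    """Collapse subsamples to a single mean value per (treatment, replicate).

    `samples` is a list of (treatment, replicate_id, value) records, where
    each replicate_id identifies one separate treatment application.
    """
    grouped = {}
    for treatment, replicate_id, value in samples:
        grouped.setdefault((treatment, replicate_id), []).append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical data: three leaf samples taken from each of two treated plots.
records = [
    ("test product", 1, 4), ("test product", 1, 6), ("test product", 1, 5),
    ("test product", 2, 8), ("test product", 2, 7), ("test product", 2, 9),
]
per_replicate = one_value_per_replicate(records)
print(per_replicate)
```

The analysis would then use two values for this treatment (one per plot), not six.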
It can be difficult, depending on the trial site and the pest being treated, to determine how best to create proper replicates. One way of ensuring this is to apply blocked separation of treatments and randomisation (see above).
The number of replicate treatments required is dependent on a number of factors, including how much statistical ‘power’ is required (see below), the precision of the measurements being taken (see below) and the variability (heterogeneity) of the test site.
7. Statistical power
The power of a statistical test is the probability that it will yield significant results (Cohen 1977) or the probability of detecting a given difference between treatments if such a difference exists (EPPO PP 1/152(4)). The main goal of determining statistical power is to allow the researcher to decide, while in the process of designing an experiment:
- how large a sample is needed to enable statistical judgements that are accurate and reliable
- the likelihood that the statistical test will detect effects of a given size in a particular situation.
If a design does not have sufficient power, the experiment might not be able to determine with any confidence that a significant result has occurred. A finding beforehand that a test has low power should lead to a review of the experimental design. A test found afterwards to have low power should convince the researcher to either perform the experiment again with a larger sample size or to at least consider what conclusions about demonstrating efficacy, if any, can be drawn from the experiment.
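For a rough sense of scale at the design stage, the sketch below uses the common normal approximation for comparing two treatment means, n = 2((z_alpha + z_power) x SD / difference)^2 per treatment. It is an assumption-laden illustration: the expected difference and standard deviation must come from pilot data, the function names are hypothetical, and the approximation tends to understate the replication needed in small trials, where an exact t-based calculation by a statistician is preferable.

```python
from math import ceil, sqrt
from statistics import NormalDist


def replicates_per_treatment(difference, sd, alpha=0.05, power=0.8):
    """Approximate replicates per treatment to detect `difference` (two-sided test)."""
    if difference == 0:
        raise ValueError("difference must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


def approximate_power(difference, sd, n, alpha=0.05):
    """Approximate power of a two-sided comparison with n replicates per treatment."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n)
    return z.cdf(abs(difference) / standard_error - z_alpha)


# Hypothetical pilot estimates: difference of 10 pests per plot, SD of 10.
n_needed = replicates_per_treatment(difference=10, sd=10)
print(n_needed, round(approximate_power(10, 10, n_needed), 2))
```

Halving the difference to be detected roughly quadruples the replication required, which is why the size of a commercially meaningful difference should be agreed before the trial is designed.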
Researchers use the number of residual degrees of freedom (rdf) available in a trial design to determine whether the trial is likely to have sufficient statistical power. Statisticians also use rdf to describe the number of values in the final calculation of a statistic that are free to vary. Generally, the minimum accepted number of rdf required for a trial design to be considered adequate is 12 (EPPO PP 1/152(4)). However, this should be increased if there is low precision in the measurements taken. Table 1 gives the rdf in relation to a number of sites, treatments and replicates in a site. You may wish to obtain advice from an expert statistician about this aspect of trial design.
Source: EPPO PP 1/152(4). Efficacy evaluation of plant protection products: design and analysis of efficacy evaluation trials.
The rdf for an estimate equals the number of observations (values) minus the number of additional parameters estimated for that calculation. As the number of parameters to be estimated increases, the degrees of freedom available decrease. It can also be thought of as the number of observations that are freely available to vary, given the additional parameters estimated.
When estimating a single population mean (for example, the average number of weeds per square metre from n square-metre samples), the number of rdf is n – 1 (the 1 represents the mean around which all other sample measurements will vary). When estimating the difference between two population means (that is, a two-sample t-test), the number of degrees of freedom is n1 + n2 – 2 (or (n1 – 1) + (n2 – 1)).
In a completely randomised design trial with 5 treatments (for example, no treatment, 0.5x proposed rate, 1x proposed rate, 2x proposed rate and an industry standard) and 4 replicates, there are 15 rdf. This is calculated by the total df (that is, 5 x 4 – 1) minus the treatments df (5 – 1 = 4); that is, 19 – 4 = 15.
In a randomised complete block design trial with 3 treatments (for example, no treatment, full rate and an industry standard) and 7 replicates, there are 12 rdf (that is, total df (7 x 3 – 1) – treatments df (3 – 1) – blocks df (7 – 1) = 20 – 2 – 6 = 12 rdf).
In a completely randomised design trial with 3 treatments and 4 replicates repeated at 3 equivalent sites, there are 18 rdf (that is, total df (3 x 4 x 3 – 1) – treatments df (3 – 1) – sites df (3 – 1) – interaction treatment x sites df ((3 – 1) x (3 – 1)) – replicate df over sites ((4 – 1) x 3) = 35 – 2 – 2 – 4 – 9 = 18 rdf).
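The three worked examples above follow the same pattern (total df minus the df used by each term in the model), which can be written as small helper functions for checking a proposed design against the minimum of 12 rdf. The function names are hypothetical, and the multi-site formula simply mirrors the terms listed in the third example.

```python
def rdf_completely_randomised(treatments, replicates):
    """Total df minus treatment df."""
    total = treatments * replicates - 1
    return total - (treatments - 1)


def rdf_randomised_complete_block(treatments, blocks):
    """Total df minus treatment df minus block df."""
    total = treatments * blocks - 1
    return total - (treatments - 1) - (blocks - 1)


def rdf_multi_site(treatments, replicates, sites):
    """Total df minus treatment, site, treatment x site and replicate-over-sites df."""
    total = treatments * replicates * sites - 1
    return (
        total
        - (treatments - 1)
        - (sites - 1)
        - (treatments - 1) * (sites - 1)
        - (replicates - 1) * sites
    )


print(rdf_completely_randomised(5, 4))      # first example: 15
print(rdf_randomised_complete_block(3, 7))  # second example: 12
print(rdf_multi_site(3, 4, 3))              # third example: 18
```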
We recommend that you consult a statistician or a compendium of appropriate experimental designs to determine the most suitable study design for each specific situation (host, pest, experimental purpose) and product type. Some product-specific guidelines that may be included in the regulatory guidelines suggest possible trial designs that suit the particular type of product and types of trials that are typically used for that type of product. Alternatively, you may seek an assessment of a proposed trial protocol from the APVMA.
8. Experimental units (plots)
The experimental unit is that part of the trial material to which a single treatment is applied and on which observations are made (EPPO PP 1/152(4)). For example, it could be a plot of wheat receiving a selective herbicide application, an apartment with a cockroach treatment, an arm of a person with personal insect repellent or an apple tree receiving a fungicide treatment.
The experimental units should be representative of the population the trial is testing and representative of the likely situation the product being tested will be used in if registration is granted. Lack of environmental uniformity between units can sometimes be dealt with by blocking, as described above.
The size of an experimental unit must be uniform between treatments and between trial sites if comparison is required. The size required is dependent on the variables of the trial. The ‘bigger is better’ rule is true for most situations, as accuracy increases with plot size, but only while the environmental variables of the plot remain uniform. The plot should allow a treatment to be applied that replicates or simulates real use and allows an adequate sample size to be taken. Where specific product type guidelines exist, they may include experimental-unit size recommendations.
Interference between plots is also of considerable importance. Individual plots need to be sufficiently separate from each other to ensure that there is no interference or contamination of treatments (or pests) from one unit to another. For example, in situations where pesticides are sprayed on a crop, spray drift can carry chemical from one treatment plot to another if there is insufficient protection or distance between plots.
Untreated buffer rows between plots can be added to the experimental design to reduce effects such as drift from one treatment to another. Depending on the situation, the rows may be of a taller or denser species or a more disease-resistant or spray-tolerant cultivar. Physical barriers such as plastic sheets can also be used at the time of application. Plots may also be planted farther apart to provide a distance break. Application equipment must also be suitable for this purpose. Alternatively, plots may be large enough to allow for factors such as drift, and sampling the effect of the treatment is only done where those factors are unlikely to be present; for example, a plot size is 5 m x 20 m but only the middle 3 m is used for assessment.
Where the pest of interest is very mobile (such as vertebrate pests or certain insects), it may also be necessary to ensure that units are sufficiently separate or protected so that individuals in one treatment are not able to move out of the treatment area or into another treatment area, thereby affecting each treatment result.
Other types of interference depend on the product, pest and situation being tested. You should document any possible interference issues and explain how they were managed.
9. Variables and methods of measurement
The nature of the variable being observed for change as a result of a treatment is important because it usually influences the statistical method used to interpret the results of a trial (EPPO PP 1/152(4)). Variables can be binomial (for example, yes/no or presence/absence of damage), nominal (for example, non-ordinal descriptions such as species present or type of damage: root, stem or leaf), ordinal (for example, ordered but not measured, or qualitative descriptions such as levels: bad, good, best) or quantitative (that is, measured and ordered results).
Any observations made should be of a type that can be consistently accurate, relevant to the aim of the trial and, wherever relevant, allow an appropriate statistical analysis.
Observations on variables can be in the form of measurements, visual estimations, ranking and/or scoring. Different types of observations require different analysis (that is, parametric versus non-parametric) and in certain situations may not provide suitable evidence for demonstrating efficacy.
There are certain benefits to visual estimation, scoring and ranking, such as speed of observation and possibly lower cost, but you should take care that the type of observation does not limit the choice and power of the statistical method. For example, determining the effect of an ant poison by estimating the number of ants present within 20 cm of a food station at a point in time can be difficult. One method would be to estimate whether the number was less than 10, between 10 and 30, between 30 and 100 or more than 100. This method, while easy, quick and inexpensive, will be unlikely to provide the type of information that will allow a powerful statistical analysis. However, taking a photo and counting ants individually or measuring the amount of food consumed in a particular period may provide the type of information required for an appropriate analysis.
Observations that will best support a statistical analysis will have:
- precision—a combination of accuracy (an absence of any bias by the observer) and reliability (they will have low variability)
- sensitivity—an ability to detect small changes in the parameter observed (measured) in the experimental unit
- repeatability—able to provide the same or a closely similar value to the same observer with identical experimental units
- reproducibility—with the same or closely similar value for a different observer with identical experimental units.
Often, it is necessary or beneficial to make more than one type of observation to determine the effectiveness of a pesticide. In addition, there are usually many ways a product’s effect can be observed and measured. This can lead to different analysis options and, as discussed above, different probabilities of detecting differences in treatments where they exist.
It is critically important, however, that the type of observation made can support the product label claims being proposed. For example, an efficacy trial of a pesticide for use in crops can look at both the effect on the pest concerned and any effect on crop yield. Together, these results provide much more information than either alone. Although crop yield is usually the most important end result for the user, for regulatory purposes it needs to be demonstrated that it is the effect on the particular pest being observed that is producing the yield effect.
It is imperative that the type of observation to be made is considered along with the trial design and analysis before the trial is conducted, and that the observations support the claims to be made on the label. We recommend that you seek professional advice if you are unsure.
Cohen, J 1977, Statistical power analysis for the behavioral sciences, Academic Press, New York.
European and Mediterranean Plant Protection Organization (EPPO) 2012, Design and analysis of efficacy evaluation trials (PP 1/152(4)), EPPO.
Food and Agriculture Organization of the United Nations (FAO) 1985, Guidelines on efficacy data for the registration of pesticides for plant protection, FAO, Rome.