John Libbey Eurotext

Science et changements planétaires / Sécheresse


Evaluating technologies for agricultural development: How to capture the true impact?

Volume 24, issue 4, October-November-December 2013




Authors: Camille Boudot, André Butler, Nikhil Dugal

Institute for Financial Management and Research (IFMR), Centre for Microfinance, 24 Kothari Road, Nungambakkam, Chennai 600 03, India

Reprints: C. Boudot

Agricultural Technology for Development

In much of the developing world the agricultural sector is fundamental for driving economic growth, overcoming poverty, and enhancing food security. Despite rapid urbanisation, about three-quarters of the developing world's poor still live in rural areas where agriculture is the major source of income and employment (Ravallion et al., 2007). Poverty alleviation is therefore inextricably linked to agriculture. However, in many regions agriculture is characterized by low use of modern technology, resulting in modest productivity (Foster and Rosenzweig, 2010; Suri, 2011; Duflo et al., 2009). Enhancing agricultural growth will not be possible without developing and disseminating cost-effective, yield-increasing technologies.

The green revolution is testament to the potential impact of agricultural technology adoption on poverty alleviation and food security. Driven by research, development, and technology transfer initiatives between the 1940s and 1970s, the green revolution resulted in dramatic growth of the agricultural sector in many developing countries. Improved varieties of maize, wheat, and rice were widely adopted, along with the use of fertilizer and investment in irrigation systems. This led to substantial increases in cereal production, allowing countries such as India, China, and Mexico to overcome food insecurity (Evenson and Gollin, 2003). Despite the large gains achieved during the green revolution, adoption of promising technologies has been far from ubiquitous. In sub-Saharan Africa, adoption of new technologies has lagged behind that of Asia. For example, in 2000 improved maize varieties accounted for only 17% of the total maize area harvested in sub-Saharan Africa, compared to 90% in Asia and the Pacific (Gollin et al., 2005). Furthermore, intensive systems where green revolution technologies made their largest impact have recently shown signs of yield stagnation or even decline (Ray et al., 2012).

A given agricultural technology may dramatically increase yields in ideal conditions; however, this does not necessarily mean it will be adopted. A variety of market inefficiencies, such as limited access to credit or to supply chains, as well as behavioural factors including risk aversion, often constrain adoption (Foster and Rosenzweig, 2010). As a result, there is a prominent adoption gap among smallholder farmers for many promising technologies, despite billions of dollars having been invested in their development. Innovative impact evaluation techniques offer an opportunity to incorporate issues such as marketing, finance, labour and social interactions, which have traditionally been neglected in on-station agricultural research. Rigorous impact evaluation studies are therefore needed to assess the potential of such technologies for reducing poverty and meeting food production demands without irreversible degradation of the natural resource base.

First, this review will provide an overview of the potential constraints to adoption of agricultural technologies, helping to guide technology transfer initiatives in order to maximize their impact. Second, we will address the challenges of impact evaluation and introduce various methods for effectively capturing the true impact of programs and technologies. Finally, we will compare and contrast the different approaches to help researchers make an informed decision on which impact evaluation methods to adopt.

Constraints to Adoption

The main objective of many agricultural technologies is to alleviate productivity limitations. The most direct channel is through yield-increasing technologies, including improved seed varieties, fertilizer, and management practices. Growth can also be enhanced through gains in efficiency, for instance seed varieties that require fewer inputs such as fertilizer while maintaining similar levels of production (Chen et al., 2011). However, many farmers do not make what appear to be profitable investments, raising the question of what constrains the adoption of agricultural technologies (Kelsey, 2011).

Financial constraints

One of the leading presumptions has been that farmers refrain from potentially profitable investments because they lack the initial capital (Fletschner et al., 2010; Udry, 2010; McIntosh et al., 2013). However, the inherent risk of crop failure caused by drought, pests and diseases often deters farmers from taking loans and investing in available technologies. Anticipating these risks, some farmers employ low-return, low-risk production strategies, locking themselves into a poverty trap. As a result, risk-reducing technologies such as insurance, irrigation, drought-resistant seed varieties, and livestock vaccinations have been identified as potential catalysts for further adoption (Giné and Yang, 2009; Heffernan et al., 2011; Kostandini et al., 2011). Yet in many cases adoption of insurance products has remained low, with evidence from India pointing to liquidity and trust as major barriers (Cole et al., 2012). Accordingly, relaxing liquidity constraints by bundling cash grants with insurance, as was done in an impact evaluation study in northern Ghana, increased the take-up of weather-index-based insurance and led to significantly larger agricultural investment and riskier production choices (Karlan et al., 2012).

Information constraints

Information inefficiencies and market failures are particularly acute in developing countries and often suppress adoption of agricultural technologies (Kelsey, 2011). Effective communication about the existence of new technologies, and about how to optimize their benefits, is likely to promote adoption. The benefits and costs of most technologies, however, are heterogeneous (Suri, 2011), so farmers require customized advice. Mobile phone technology has proven an effective tool for reducing the barriers to adoption caused by information inefficiencies. The introduction of a mobile phone-based agricultural consulting service in Gujarat, India increased the adoption of more effective pesticides as well as the cultivation of cumin, a lucrative but risky crop (Cole and Fernando, 2012) (see Case Study 1). Yet in many cases market failures may prevent farmers from accruing the full benefits of technologies. For instance, biofortified crops have a higher nutritional content, which should be reflected in their market value (Chowdhury et al., 2011); but if the health benefits are inefficiently signalled or not internalized by consumers, farmers will be unable to command a higher price. Such constraints can be alleviated through marketing channels, such as contracting between smallholders and commodity-processing firms (Ashraf et al., 2008; Barrett et al., 2012).

Case Study 1 The Value of advice: evidence from mobile phone-based agricultural extension

Shawn Cole and Nilesh Fernando, Harvard Business School Finance Working Paper, 2012.

Agricultural productivity varies dramatically even between farmers in the same region. Some of this variation is caused by scarcity of customized information on optimal management practices.

This study explores the potential of a mobile phone platform to reduce information inefficiencies, improve agricultural management and lower barriers to technology adoption. The randomized experiment focused on cotton farmers in Gujarat and tested two treatments: i) a physical extension service combined with information provided over the phone; and ii) information provided through mobile phones only.

About half the treated farmers called the mobile extension service within the first seven months of the intervention. The advice changed management practices: treated farmers increased their adoption of more effective pesticides and cultivated a significantly larger quantity of cumin, a lucrative but risky crop.


Agricultural technologies that reduce environmental externalities and create positive spill-overs often remain at low levels of adoption because some or all of their benefits are not accrued by the adopting farmer. For instance, nitrogen fertilizer that is neither taken up by crops nor stored in soil organic nitrogen pools is lost to the environment, causing major pollution problems¹ and representing a wasted resource for farmers (Tilman et al., 2002). However, technologies that enhance input-use efficiency, such as nitrogen-use-efficient crops, often suffer a comparative disadvantage because the environmental benefits are not valued by the farmer.

Challenges of Impact Evaluation

The intrinsic objective of any impact evaluation is to identify the causal effect of a program or technology on a desired outcome. For example, at the end of an impact evaluation you would want to be able to say “fertilizer subsidies cause an improvement in household income within agricultural communities”. Causal inference relies on identifying an accurate counterfactual: the true impact of a program can only be determined by comparing the outcome of the intervention to what would have happened without it. Comparing the same individuals before and after a program will not, in most cases, give a reliable estimate of its impact, since external factors that affect the outcomes of interest may have changed since the program's introduction. This is particularly true in the agricultural sector, where inter-annual fluctuations in rainfall, production costs and produce prices can have a larger impact on the outcome of interest than the program itself. In contrast, a group of individuals who were identical at the outset to those who participated in an intervention provides a credible counterfactual for evaluating impact. Individuals in these groups live through the same external events over the same period of time, and thus encounter the same external intervening factors. The only difference between the two groups is that those in the ‘treatment group’ are exposed to the intervention while those in the counterfactual, more commonly known as the ‘control group’, are not. Therefore, any difference in outcomes between the two groups at the end of the study can be attributed to the intervention itself.

In practice, people who participate in a program are systematically different from those who do not. Participation in a program or access to a technology may be subject to preconditions that target a subsection of the population. Consider DrumNet, an NGO that provides smallholder farmers in Kenya with information and services allowing them to switch to export crops. Farmers are screened for participation based on: i) being a member of a registered farmer group (self-help group); ii) expressing an interest in growing crops marketed by DrumNet; iii) having irrigated land; and iv) their ability to meet the first Transaction Insurance Fund commitment (Ashraf et al., 2008). Individuals who meet these specific requirements will evidently be different from the general population. Furthermore, the decision to participate in the program is voluntary, creating self-selection: individuals who choose to participate may, for example, be better educated or have larger land holdings. Comparing participants with non-participants, even if they live in the same region at the same time, would therefore introduce selection bias, since any difference between the groups could be due either to the impact of the program or to pre-existing differences, and the two are difficult to disentangle.

Randomized Controlled Trials

The most straightforward way of avoiding selection bias and establishing a credible counterfactual is to randomly assign sampling units to either a treatment or a control group (Duflo et al., 2007). Randomization ensures that the two groups are identical on average at the outset, the only difference being that those in the treatment group are exposed to the intervention and those in the control group are not. Individuals in both groups live through the same external events over the same period of time, which isolates external intervening factors, such as a favourable harvest or fluctuating prices, from the impact of the intervention itself. Any difference in outcomes between the two groups at the end of the study can therefore be attributed to the intervention. Furthermore, random assignment establishes the direction of causality: for example, that adoption of an agricultural technology causes an increase in farmers' wealth, rather than that wealthy farmers are more likely to adopt the technology. Selection bias, arising both from who chooses to join a program and from whom the program targets, often prevents non-randomized evaluations from demonstrating a causal link.
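
The selection-bias argument above can be made concrete with a small simulation. The sketch below is illustrative only: the income equation, the land-holding distribution and the adoption rule are hypothetical assumptions, not data from any study cited here. It compares a naive participant versus non-participant contrast, in which larger landholders are more likely to adopt, with the same contrast under random assignment.

```python
import random
import statistics

def simulate_selection_bias(n=20_000, true_effect=100.0, seed=1):
    """Return (naive estimate, randomized estimate) of a known treatment effect.

    Hypothetical model: income rises with land holdings, and in the
    self-selection scenario larger landholders are more likely to adopt.
    """
    rng = random.Random(seed)
    land = [rng.gauss(2.0, 1.0) for _ in range(n)]  # hectares, hypothetical

    def income(hectares, treated):
        return 500 + 150 * hectares + true_effect * treated + rng.gauss(0, 100)

    def diff_in_means(assignment):
        y = [income(h, t) for h, t in zip(land, assignment)]
        treated = [yi for yi, t in zip(y, assignment) if t == 1]
        control = [yi for yi, t in zip(y, assignment) if t == 0]
        return statistics.fmean(treated) - statistics.fmean(control)

    # Self-selection: adoption probability increases with land holdings.
    self_selected = [int(rng.random() < min(max(0.2 + 0.2 * h, 0.0), 1.0)) for h in land]
    # Random assignment: a fair coin flip, independent of land holdings.
    randomized = [int(rng.random() < 0.5) for _ in land]
    return diff_in_means(self_selected), diff_in_means(randomized)

naive, rct = simulate_selection_bias()
print(f"naive comparison: {naive:.1f}, randomized comparison: {rct:.1f} (true effect 100)")
```

Under these assumptions the naive contrast lands well above the true effect of 100, because it also picks up the income advantage of larger landholders, while the randomized contrast is close to 100.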

Randomized controlled trials (RCTs) have emerged as one of the best ways to determine which types of interventions and products are effective. RCTs originally gained widespread popularity following an influential study by Lalonde (1986), which concluded that non-experimental econometric methods could not accurately replicate experimental results, although, as we discuss in the final section, more recent evidence has questioned this conclusion. Over the last decade, randomized experiments have experienced a rebirth, with several influential studies emerging from the field of development economics (Karlan and Zinman, 2009; Pande, 2011; Duflo et al., 2009; Beaman et al., 2012), and they are now widely viewed as the gold standard for impact evaluation.

The empirical specification for analysing experimental data from RCTs is relatively straightforward. Simple cross-sectional experimental data can be analysed by regressing the outcome ($y_i$) on an intercept ($\alpha$) and a dummy variable indicating treatment status ($T_i$), which yields an unbiased estimator of the average effect of the treatment ($\beta$).
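
$$y_i = \alpha + \beta T_i + \varepsilon_i \qquad (1)$$

$$\hat{\beta} = \hat{E}[y_i \mid T_i = 1] - \hat{E}[y_i \mid T_i = 0] \qquad (2)$$
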
As equation (2) suggests, the estimated average treatment effect ($\hat{\beta}$) is the difference in sample averages by treatment status. Assuming the error term is uncorrelated with treatment, which random assignment guarantees in expectation, the estimate has a causal interpretation. Baseline covariates can be included in the regression to improve precision without jeopardizing consistency, since randomization implies that in large samples the treatment indicator and the covariates are independent. This analysis can be represented as:

$$y_{it} = \alpha + \beta_1 T_i + \gamma x_{i(t-1)} + \varepsilon_{it} \qquad (3)$$
where $i$ represents individuals and $t$ represents time (often two time periods, before and after). As in equation (1), the treatment effect is captured by the coefficient on the treatment dummy, here $\beta_1$. In addition, we now account for $x_{i(t-1)}$, a vector of baseline variables. Controlling for baseline values of covariates that are likely to influence the outcome does not change the average treatment effect being estimated ($\beta_1$); it can, however, reduce the variance of the estimator, which has implications for the chosen sample size. That is, controlling for baseline variables that have a large effect on the outcome reduces the standard errors of the estimates and can therefore reduce the required sample size. Conversely, controlling for variables that have little or no influence on the outcome can increase the standard errors by reducing the degrees of freedom.
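
Equations (1) to (3) can be illustrated with a short simulation in plain Python. The sketch assumes a single baseline covariate (a hypothetical baseline yield) and computes the covariate-adjusted OLS coefficient through the pooled within-group slope, which for one covariate is algebraically the same as running the regression in equation (3). Across many hypothetical experiments both estimators centre on the true effect, but the adjusted one varies much less.

```python
import random
import statistics

def treatment_estimates(y, t, x):
    """Return (difference in means, covariate-adjusted estimate) of the treatment effect.

    The adjusted estimate equals the OLS coefficient on T in
    y = alpha + beta1 * T + gamma * x + e, for a single baseline covariate x.
    """
    groups = {g: [i for i, ti in enumerate(t) if ti == g] for g in (0, 1)}
    ybar = {g: statistics.fmean([y[i] for i in idx]) for g, idx in groups.items()}
    xbar = {g: statistics.fmean([x[i] for i in idx]) for g, idx in groups.items()}
    # Regression on a treatment dummy alone reduces to the difference in means.
    unadjusted = ybar[1] - ybar[0]
    # Pooled within-group slope of y on the baseline covariate (gamma-hat).
    sxy = sum((x[i] - xbar[g]) * (y[i] - ybar[g]) for g, idx in groups.items() for i in idx)
    sxx = sum((x[i] - xbar[g]) ** 2 for g, idx in groups.items() for i in idx)
    gamma = sxy / sxx
    # Remove the part of the gap explained by chance imbalance in the baseline covariate.
    adjusted = unadjusted - gamma * (xbar[1] - xbar[0])
    return unadjusted, adjusted

def repeated_experiments(reps=300, n=400, true_effect=50.0, seed=7):
    """Simulate many hypothetical RCTs and summarise both estimators."""
    rng = random.Random(seed)
    unadj, adj = [], []
    for _ in range(reps):
        x = [rng.gauss(1000, 200) for _ in range(n)]  # baseline yield (kg/ha), hypothetical
        t = [1] * (n // 2) + [0] * (n - n // 2)
        rng.shuffle(t)  # random assignment of half the sample to treatment
        y = [0.8 * xi + true_effect * ti + rng.gauss(0, 60) for xi, ti in zip(x, t)]
        u, a = treatment_estimates(y, t, x)
        unadj.append(u)
        adj.append(a)
    return {
        "unadjusted_mean": statistics.fmean(unadj),
        "unadjusted_sd": statistics.stdev(unadj),
        "adjusted_mean": statistics.fmean(adj),
        "adjusted_sd": statistics.stdev(adj),
    }

summary = repeated_experiments()
print({k: round(v, 1) for k, v in summary.items()})
```

Keeping a single covariate avoids any matrix algebra; with several baseline variables one would run the full multiple regression instead.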

Factorial Design – Dealing with Several Simultaneous Treatments

Agricultural production involves complex ecological systems with many interacting factors. Impact evaluations should therefore focus on testing relevant research questions that account for this complexity. Optimizing productivity is not a purely biological problem but one with social, cultural, political and economic dimensions. Inputs and associated management practices are often complementary. For example, most improved crop varieties only express their full yield potential relative to traditional varieties when combined with adequate inputs, in which case the adoption of one input can act as a catalyst for further adoption of complementary technologies. Similarly, farmers are often induced to re-optimize other aspects of the production system in response to a technology. This can be seen among female rice farmers in Mali, who increased their use of both herbicide and hired labour in response to fertilizer transfers (Beaman et al., 2013). The most effective components of a complex intervention can be isolated by testing different treatments and their complementarity, which requires building carefully designed experimental evaluations around comprehensive hypotheses that consider the entire production function.

In practice, agricultural technology transfer initiatives often involve a suite of technologies bundled with financial products and services that account for the complexity of the system. For example, Giné and Yang (2009) evaluated whether the provision of insurance against a major source of production risk induces maize farmers in Malawi to take out loans to adopt new technologies. This intervention encompasses multiple financial products as well as a variety of agricultural technologies. We could simply test the effectiveness of the entire program, but this would not teach us about the channels through which the program affects the target population. Instead, a factorial experimental design can be used to test several treatments and combinations of interventions simultaneously, with randomization conducted so that the treatments are orthogonal² to each other. Factorial experiments can also establish whether treatments have important interaction effects by comparing treatment bundles with individual treatment effects:

$$y_{it} = \alpha + \beta_1 T_{1i} + \beta_2 T_{2i} + \gamma x_{i(t-1)} + \varepsilon_{it}$$
where $i$ represents individuals and $t$ the time period, $\beta_1$ and $\beta_2$ are the coefficients of interest capturing the effects of the two separate treatments, and $x_{i(t-1)}$ is a vector of baseline variables (see the text around equation (3) for an explanation of this vector).
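
A 2×2 factorial design can be sketched as follows. The code assumes two hypothetical binary treatments, assigns units to the four cells in equal shares so that the two treatment indicators are orthogonal by construction, and recovers the effects from cell means. Besides the separate effects (beta1 and beta2), it estimates an interaction term, called beta3 here as an addition for illustration, which shows whether the bundle is worth more or less than the sum of its parts. All effect sizes are invented.

```python
import random
import statistics

def factorial_assignment(n, seed=0):
    """Assign n units (a multiple of 4) to the cells of a 2x2 design in equal shares."""
    rng = random.Random(seed)
    cells = [(a, b) for a in (0, 1) for b in (0, 1)] * (n // 4)
    rng.shuffle(cells)
    t1 = [a for a, _ in cells]
    t2 = [b for _, b in cells]
    return t1, t2

def factorial_effects(y, t1, t2):
    """Separate effects and interaction, computed from the four cell means.

    These equal the OLS coefficients of y = a + b1*T1 + b2*T2 + b3*T1*T2 + e.
    """
    mean = {
        (a, b): statistics.fmean([yi for yi, u, v in zip(y, t1, t2) if (u, v) == (a, b)])
        for a in (0, 1) for b in (0, 1)
    }
    beta1 = mean[1, 0] - mean[0, 0]  # treatment 1 alone versus pure control
    beta2 = mean[0, 1] - mean[0, 0]  # treatment 2 alone versus pure control
    beta3 = mean[1, 1] - mean[1, 0] - mean[0, 1] + mean[0, 0]  # interaction
    return beta1, beta2, beta3

# Hypothetical effects: 20 (treatment 1), 30 (treatment 2), +15 extra when combined.
rng = random.Random(42)
n = 4000
t1, t2 = factorial_assignment(n)
y = [100 + 20 * a + 30 * b + 15 * a * b + rng.gauss(0, 40) for a, b in zip(t1, t2)]
b1, b2, b3 = factorial_effects(y, t1, t2)
print(f"beta1={b1:.1f}, beta2={b2:.1f}, beta3={b3:.1f}")
```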

An example of effective use of factorial design comes from a study of the Atención a Crisis pilot program in Nicaragua, which aimed to reduce households' vulnerability to climate shocks through conditional cash transfers (CCTs) and diversification of income-generating activities (Macours et al., 2012). The full program combined a CCT (aimed at short-term shock relief) with either vocational training or a productive investment grant (aimed at long-term risk reduction). To isolate the different components of the program, households received either a basic CCT or the same CCT plus one of the two diversification interventions. A comparison between treatment groups revealed that households receiving a diversification intervention in addition to the CCT were better protected against the negative impact of drought shocks.


Unit of randomization

The unit of randomization is the unit at which the treatment is allocated. There are two main options: i) groups (e.g. villages, self-help groups) or ii) individuals (which can include individual households or people). The choice often depends primarily on the kind of intervention being evaluated. The randomization unit may, however, differ from the sampling or observation unit. For example, when analysing the impact of agricultural loans on farmer well-being, access to loans through a microfinance institution may be randomly allocated at the village level while the observation unit is the individual. In this case, the regression analysis introduced in the previous section becomes:

$$y_{ijt} = \alpha + \beta_1 T_{jt} + \gamma x_{ij(t-1)} + \delta_j + \varepsilon_{ijt}$$
where $j$ represents the group (e.g. village) and $\delta$ is the group fixed effect (a time-constant group indicator).
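
When treatment is assigned by village, the analysis must treat the village as the independent unit. The sketch below works under hypothetical assumptions (60 villages of 20 farmers, a shared village-level shock, and an invented effect size): it randomizes whole villages and then compares village means. Collapsing to village means is one simple, conservative way to respect group-level assignment; it is an illustration, not the fixed-effects regression described in the text. Treating the 1,200 farmers as independent observations would understate the standard error, as the printed comparison shows.

```python
import random
import statistics

def assign_villages(village_ids, seed=0):
    """Randomly assign half of the villages (rounded down) to treatment."""
    rng = random.Random(seed)
    ids = sorted(set(village_ids))
    rng.shuffle(ids)
    treated = set(ids[: len(ids) // 2])
    return {v: int(v in treated) for v in ids}

def village_level_estimate(y, village, status):
    """Difference in village means and its standard error, with villages as the unit."""
    members = {}
    for yi, v in zip(y, village):
        members.setdefault(v, []).append(yi)
    means = {v: statistics.fmean(vals) for v, vals in members.items()}
    m1 = [m for v, m in means.items() if status[v] == 1]
    m0 = [m for v, m in means.items() if status[v] == 0]
    estimate = statistics.fmean(m1) - statistics.fmean(m0)
    se = (statistics.variance(m1) / len(m1) + statistics.variance(m0) / len(m0)) ** 0.5
    return estimate, se

def naive_individual_se(y, village, status):
    """Standard error that (wrongly) treats every farmer as an independent observation."""
    y1 = [yi for yi, v in zip(y, village) if status[v] == 1]
    y0 = [yi for yi, v in zip(y, village) if status[v] == 0]
    return (statistics.variance(y1) / len(y1) + statistics.variance(y0) / len(y0)) ** 0.5

rng = random.Random(3)
village = [v for v in range(60) for _ in range(20)]  # 60 villages x 20 farmers
status = assign_villages(village)
shock = {v: rng.gauss(0, 30) for v in range(60)}     # shared village-level shock
y = [200 + 25 * status[v] + shock[v] + rng.gauss(0, 50) for v in village]
est, se = village_level_estimate(y, village, status)
print(f"estimate={est:.1f}, village-level SE={se:.1f}, "
      f"naive SE={naive_individual_se(y, village, status):.1f}")
```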

Choosing between group and individual randomization units can also be used to tackle the problem of spill-over. One of the major difficulties in conducting social experiments is effectively isolating the treatment. Many agricultural technologies and practices do not have clearly defined boundaries and are thus susceptible to contamination, or spill-over, between treatment and control units (Cole and Fernando, 2012). Sharing of agricultural equipment, peer effects that increase adoption of agricultural practices among neighbours, and contamination of neighbouring fields with genetically modified varieties are all examples of spill-over. If spill-overs from treatment to control units are positive (positive externalities), they lead to an underestimation of treatment effects, because they raise outcomes in the control group as well as in the treatment group; negative spill-overs, conversely, lead to an overestimation of treatment effects, because the treatment lowers outcomes in the control group. The simplest and most effective way of minimizing spill-over effects is to ensure sufficient geographical distance between randomly sampled units, for example by randomizing at the village rather than the household level. However, where spill-over is not a significant problem, treatment can be randomly assigned at the same level as the sampling unit, for instance by randomly assigning households to treatment or control within a set of villages, so that the same village may contain both treatment and control units.
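
The direction of the spill-over bias follows from simple arithmetic. In the hypothetical sketch below, the treated group's mean equals a baseline plus the direct effect, and the control group's mean equals the same baseline plus whatever spill-over it receives; all numbers are invented for illustration.

```python
def observed_gap(direct_effect, spillover_to_control, baseline=1000.0):
    """Treatment-control difference when the control group receives a spill-over.

    Simplifying assumption: spill-overs affect only the control group's mean.
    """
    treated_mean = baseline + direct_effect
    control_mean = baseline + spillover_to_control
    return treated_mean - control_mean

print(observed_gap(200, 0))    # 200.0: no spill-over, the gap equals the direct effect
print(observed_gap(200, 50))   # 150.0: positive spill-over, effect underestimated
print(observed_gap(200, -50))  # 250.0: negative spill-over, effect overestimated
```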

A unit of observation particular to agriculture is the plot, frequently used when researchers are interested in measuring the impact of technology adoption on yield. The impact of a new technology on productivity in real-world farms is likely to differ significantly from that measured in the controlled environment of a research station, mainly because of variability in inputs, management practices, and environmental parameters. A study in Kenya designed to ascertain the profitability of fertilizer offers an example of plot-level design. Within each sampled farm, two treatments (fertilizer alone, or fertilizer with hybrid seed) were randomly allocated to two of three adjacent plots of equal size, with the remaining plot acting as a control. Environmental conditions and the farmer's ability and skills were therefore held constant across the three plots, so that, on average, the plots differed only in their treatment status (Duflo et al., 2008).
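
The within-farm plot design can be sketched as follows. The code randomly allocates the three arms (control, fertilizer, fertilizer with hybrid seed) to the three plots of each farm and estimates each effect as the average within-farm difference from the control plot, so that anything shared by the three plots, such as the farmer's skill, cancels out. The yield model, including the large farm-level differences, and the effect sizes are hypothetical.

```python
import random
import statistics

ARMS = ("control", "fertilizer", "fertilizer+hybrid")

def assign_plots(n_farms, rng):
    """For each farm, a random permutation of the three arms over plots 0, 1 and 2."""
    plans = []
    for _ in range(n_farms):
        arms = list(ARMS)
        rng.shuffle(arms)
        plans.append(arms)
    return plans

def within_farm_effects(yields, plans):
    """Average (treated plot - control plot) yield difference within each farm."""
    diffs = {arm: [] for arm in ARMS[1:]}
    for farm_yields, arms in zip(yields, plans):
        by_arm = dict(zip(arms, farm_yields))
        for arm in diffs:
            diffs[arm].append(by_arm[arm] - by_arm["control"])
    return {arm: statistics.fmean(d) for arm, d in diffs.items()}

rng = random.Random(11)
plans = assign_plots(200, rng)
true_effect = {"control": 0.0, "fertilizer": 300.0, "fertilizer+hybrid": 450.0}  # kg/ha, hypothetical
yields = []
for arms in plans:
    farm_level = rng.gauss(1500, 500)  # soil, skill, etc.: shared by all three plots
    yields.append([farm_level + true_effect[a] + rng.gauss(0, 100) for a in arms])
effects = within_farm_effects(yields, plans)
print({arm: round(v) for arm, v in effects.items()})
```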


Sample populations can be subdivided, or stratified, on the basis of observable characteristics such as income or education. Stratified randomization ensures that the treatment and control groups are balanced along these characteristics, for example that the same fraction of individuals lie above and below the poverty line in both groups. It also allows researchers to evaluate the impact of a program on the specific subgroups defined by these characteristics.
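
Stratified randomization can be implemented by running a separate randomization inside each stratum. The sketch below assumes a single stratifying variable (a hypothetical below/above poverty line label) and assigns half of each stratum, rounded down, to treatment, so the two groups end up with the same share of poor households.

```python
import random
from collections import defaultdict

def stratified_assignment(strata, seed=0):
    """Randomize within each stratum; return a 0/1 treatment indicator per unit.

    strata: one stratum label per unit (e.g. 'below' / 'above' the poverty line).
    In every stratum, half of the units (rounded down) are treated.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for i, label in enumerate(strata):
        members[label].append(i)
    treat = [0] * len(strata)
    for idx in members.values():
        rng.shuffle(idx)
        for i in idx[: len(idx) // 2]:
            treat[i] = 1
    return treat

strata = ["below"] * 38 + ["above"] * 62  # hypothetical sample of 100 households
treat = stratified_assignment(strata)
for label in ("below", "above"):
    n_treated = sum(t for t, s in zip(treat, strata) if s == label)
    print(label, n_treated, "treated of", strata.count(label))
```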

Stratification is useful for establishing whether the impact of an intervention differs across groups of people. Establishing, for example, that expected outcomes from an intervention are higher for people with certain characteristics could help target the intervention for the greatest impact. Consider fertilizer adoption, which has highly heterogeneous returns for apparently similar households, in part because of differences in pre-existing soil fertility (Marenya and Barrett, 2009). Soil organic matter acts as a nutrient reservoir that helps preserve the nutrient status of the soil and consequently enhances the efficiency of synthetic fertilizers. As a result, cultivation on highly degraded soils, which is often the case for poor farmers, could make fertilizer subsidies less pro-poor and equitable than is commonly assumed (Beaman et al., 2013). In this case researchers and policy makers may be interested in testing the returns to investment for wealthy and poor farmers separately, for example by stratifying assignment to treatment on a measure of income or wealth.

Sample size

In general, the notion that “more is better” applies to sample size; however, practical and budgetary constraints on impact evaluations in the field compel researchers to estimate the optimal size for an experiment. Sample size is important because it decisively influences the statistical power of an experiment³, which is the probability that a treatment effect will be correctly detected if there is one. Researchers can also choose the number of sampling units so as to target a Minimum Detectable Effect (MDE), which is the smallest true treatment effect that the experiment can detect at the chosen significance level and power.
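
For an individually randomized trial with a share $P$ of units treated, a standard normal-approximation formula links these quantities: $\text{MDE} = (z_{1-\alpha/2} + z_{\text{power}}) \sqrt{1/(P(1-P))}\,\sigma/\sqrt{N}$, where $\sigma$ is the standard deviation of the outcome and $N$ the total sample size. The sketch below implements it with Python's standard library; the defaults (5% significance, 80% power, half the sample treated) are conventional assumptions, not recommendations from this review.

```python
import math
from statistics import NormalDist

def _multiplier(alpha, power):
    """z_{1-alpha/2} + z_{power} for a two-sided test (normal approximation)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)

def minimum_detectable_effect(n, sigma, share_treated=0.5, alpha=0.05, power=0.8):
    """Smallest true effect detected with probability `power` at level `alpha`."""
    return (_multiplier(alpha, power)
            * math.sqrt(1 / (share_treated * (1 - share_treated)))
            * sigma / math.sqrt(n))

def required_sample_size(mde, sigma, share_treated=0.5, alpha=0.05, power=0.8):
    """Total number of units needed to reach a target MDE (same formula, inverted)."""
    n = (_multiplier(alpha, power) * sigma / mde) ** 2 / (share_treated * (1 - share_treated))
    return math.ceil(n)

print(required_sample_size(mde=0.2, sigma=1.0))                # 785 units for an effect of 0.2 SD
print(round(minimum_detectable_effect(n=1000, sigma=1.0), 3))  # 0.177
```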

Once the targets for statistical power and MDE have been established, researchers also need to consider the variability of the outcome variable, which plays a pivotal role in determining the sample size. If the variance between sampled units is small, that is, the population is homogeneous in its main characteristics (e.g. income, consumption, ethnicity, occupation), a smaller sample is sufficient to achieve a given power and detect a treatment effect when there is one. This is especially the case when stratification is used along some main characteristics, as this reduces the residual variance in outcomes (since some of their determinants are controlled for) and therefore decreases the necessary sample size. Another special consideration arises when randomization is done at the group level. In this case one needs to decide whether to increase the number of groups or the number of individuals sampled within each group, which in turn depends on the variance both between and within groups. Outcomes of members of the same group are often strongly correlated, because they share the same environment and receive the same treatment; this is measured by the intra-cluster correlation, the share of total variance that lies between groups. When it is high, adding members to a group increases power only marginally, whereas adding groups increases it substantially. In such cases, more focus should therefore be placed on increasing the number of groups rather than the group size.
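
The trade-off between more groups and larger groups can be quantified with the usual design-effect adjustment, which inflates the individual-level MDE by $\sqrt{1 + (m-1)\rho}$, where $m$ is the number of individuals per group and $\rho$ the intra-cluster correlation. The sketch below assumes half the groups are treated, a normal approximation, and a hypothetical $\rho$ of 0.2; it compares doubling the number of groups with doubling their size.

```python
import math
from statistics import NormalDist

def clustered_mde(n_groups, group_size, icc, sigma=1.0, alpha=0.05, power=0.8):
    """MDE when whole groups are randomized and half of them are treated.

    The individual-level MDE is inflated by the design effect 1 + (m - 1) * icc.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    design_effect = 1 + (group_size - 1) * icc
    n_total = n_groups * group_size
    # sqrt(1 / (P * (1 - P))) = 2 when P = 0.5
    return multiplier * 2 * sigma * math.sqrt(design_effect / n_total)

base = clustered_mde(40, 10, icc=0.2)           # 40 villages x 10 farmers
more_groups = clustered_mde(80, 10, icc=0.2)    # 80 villages x 10 farmers
bigger_groups = clustered_mde(40, 20, icc=0.2)  # 40 villages x 20 farmers
print(f"{base:.3f} {more_groups:.3f} {bigger_groups:.3f}")  # 0.469 0.331 0.434
```

Both alternatives survey 800 farmers, but adding villages lowers the MDE far more than adding farmers to existing villages.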

Impact and Participation

In the majority of cases the adoption of new agricultural technology is voluntary, so the intervention only increases the probability that a farmer receives the treatment but does not ensure adoption. Consider a new crop variety that is made available for purchase in treatment villages, while farmers in control villages continue to cultivate existing local varieties. A comparison of mean yields between the two groups is not an estimate of the yield benefits of the new variety, but of the impact of marketing the seed on the entire treatment group, regardless of whether individual farmers adopt the technology. This estimate is known as the intention to treat (ITT) effect and is likely to be smaller than the direct yield benefit of the new variety, since only a fraction of farmers in the treatment group will have adopted it.

When compliance is incomplete, the direct benefit of the technology for those who choose to adopt can also be informative; this is known as the average effect of the treatment on the treated (ATT). In seeking to increase the adoption of an existing technology, impact evaluation projects may face the situation in which members of the control group also have access to the technology being evaluated. For example, randomized experiments in Kenya gave farmers the opportunity to experiment with fertilizer on their own farm by providing a “starter kit” consisting of a small quantity of fertilizer, or fertilizer and hybrid seeds, sufficient for a 30-square-meter plot (Duflo et al., 2009). The starter kits were intended to encourage farmers in the treatment groups to use fertilizer, but some individuals in the control group may also have been using it. In such cases, the effect of adoption can be recovered using treatment group status as an instrumental variable (we will return to this method in the instrumental variable section). The resulting estimate is known as the local average treatment effect (LATE), which can be interpreted as the average treatment effect for the specific group of individuals who are induced by the intervention to take up a technology (Imbens and Angrist, 1994; Heckman, 1997). When nobody in the control group adopts the technology, the ATT can simply be estimated by dividing the ITT estimate by the probability of adoption in the treatment group.
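
The ITT and the instrumental-variable (Wald) estimate can be computed directly from group means: the Wald estimate divides the ITT by the difference in adoption rates between treatment and control groups, and reduces to dividing by the adoption rate in the treatment group when no control farmer adopts. The simulation below is hypothetical: it assumes a constant yield gain from adoption, 40% of farmers who adopt only if offered the technology, 10% who adopt regardless, and the rest who never adopt.

```python
import random
import statistics

def itt_and_wald(y, assigned, adopted):
    """Return (ITT, first stage, Wald estimate) from outcome, assignment and adoption."""
    def mean_where(values, group):
        return statistics.fmean([v for v, a in zip(values, assigned) if a == group])

    itt = mean_where(y, 1) - mean_where(y, 0)
    first_stage = mean_where(adopted, 1) - mean_where(adopted, 0)  # extra adoption caused by assignment
    return itt, first_stage, itt / first_stage

rng = random.Random(5)
n = 20_000
assigned = [int(rng.random() < 0.5) for _ in range(n)]
adopted, y = [], []
for a in assigned:
    u = rng.random()
    if u < 0.4:    # complier: adopts only when offered
        d = a
    elif u < 0.5:  # always-taker: adopts regardless (control-group access)
        d = 1
    else:          # never-taker
        d = 0
    adopted.append(d)
    y.append(2000 + 500 * d + rng.gauss(0, 300))  # kg/ha; adoption adds 500, hypothetical

itt, first_stage, wald = itt_and_wald(y, assigned, adopted)
print(f"ITT={itt:.0f}, first stage={first_stage:.2f}, Wald={wald:.0f}")
```

Here the ITT is roughly 40% of the adoption effect, and dividing by the first stage recovers an estimate close to the assumed gain of 500.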

Propensity Score-Matching

Propensity score matching (PSM) enables researchers to estimate the effect of receiving a treatment using a sample in which the intervention was not randomly assigned (Rosenbaum and Rubin, 1983). This can be the case for projects where random allocation of treatment is not possible, for ethical or logistical reasons. PSM creates a control group by pairing treated units (villages, households or individuals) with untreated units that have similar characteristics. Consider, for example, comparing household income between those who take up agricultural credit and those who do not. The results of such an analysis would be subject to selection bias because of factors that differ systematically between the two groups, such as education level, land holdings and input usage. PSM uses these variables to predict the probability of taking up credit, which provides a scale along which to build a control group of untreated units whose characteristics are similar to those of credit users.

The propensity score measures the probability of adopting an intervention given a set of observed characteristics. The propensity score $e(x_i)$ for subject $i$ ($i = 1, \dots, N$) is estimated using a logistic regression⁴ so as to find the conditional probability of being assigned to the treatment group given a set of observed covariates (Imbens and Wooldridge, 2008). That is:

$$e(x_i) = \Pr(T_i = 1 \mid x_i)$$
where $x_i$ are the variables that predict receiving the treatment ($T_i = 1$). Dillon (2011) uses this methodology to create counterfactual groups for farmers who access small-scale versus large-scale irrigation projects in Mali. The logistic regression used to estimate the propensity score includes covariates such as household size, assets, age of the household head, education levels, and farm capital, all of which are likely determinants of access to irrigation.
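
Estimating the propensity score only requires a logistic regression of treatment status on pre-treatment covariates. To stay self-contained, the sketch below fits the logit by Newton-Raphson in plain Python rather than calling a statistics package; the two covariates (household size and a standardized asset index) and their coefficients are hypothetical stand-ins for the richer covariate sets used in studies such as Dillon (2011).

```python
import math
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logit(X, t, iters=25):
    """Logistic regression of t on X (an intercept is added) via Newton-Raphson."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # X'(t - p)
        info = [[0.0] * k for _ in range(k)]  # X'WX with W = diag(p(1 - p))
        for row, ti in zip(rows, t):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (ti - p) * row[a]
                for c in range(k):
                    info[a][c] += w * row[a] * row[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of treatment e(x_i) for each row of X."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))))
            for x in X]

rng = random.Random(8)
X, t = [], []
for _ in range(2000):
    size = rng.gauss(6, 2)    # household size, hypothetical
    assets = rng.gauss(0, 1)  # standardized asset index, hypothetical
    p = 1 / (1 + math.exp(-(-2.0 + 0.3 * size + 0.8 * assets)))
    X.append((size, assets))
    t.append(int(rng.random() < p))
beta = fit_logit(X, t)
scores = propensity_scores(X, beta)
print([round(b, 2) for b in beta])
```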

The construction of a credible counterfactual relies on an effective matching technique. The most basic is the nearest-neighbour method, in which each treated individual is paired with the control unit that has the closest propensity score. This method is vulnerable to imprecise matching if even the closest neighbour is far away. This can be addressed by defining a maximum acceptable difference in propensity scores above which no match is assigned to a treated individual. Radius matching (Dehejia and Wahba, 2002) builds on this idea by using all the control units within the defined radius, which allows for a large control group when there are many close matches.
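
Nearest-neighbour matching with a maximum distance and radius matching differ only in how many controls each treated unit is compared with. The sketch below takes propensity scores as given (for instance from a logistic regression) and uses a tiny hand-made example; matching is done with replacement, which is one common choice but an assumption here.

```python
def nearest_neighbour_att(p_treated, y_treated, p_control, y_control, max_distance=None):
    """Average treated-minus-matched-control outcome, matching with replacement.

    Each treated unit is paired with the control whose propensity score is closest;
    if max_distance is set, treated units without a control that close are dropped.
    Returns (ATT estimate, number of treated units matched).
    """
    gaps = []
    for p, yt in zip(p_treated, y_treated):
        j = min(range(len(p_control)), key=lambda k: abs(p_control[k] - p))
        if max_distance is None or abs(p_control[j] - p) <= max_distance:
            gaps.append(yt - y_control[j])
    return sum(gaps) / len(gaps), len(gaps)

def radius_att(p_treated, y_treated, p_control, y_control, radius):
    """Compare each treated unit with the mean outcome of all controls within `radius`."""
    gaps = []
    for p, yt in zip(p_treated, y_treated):
        close = [yc for pc, yc in zip(p_control, y_control) if abs(pc - p) <= radius]
        if close:  # treated units with no control inside the radius are dropped
            gaps.append(yt - sum(close) / len(close))
    return sum(gaps) / len(gaps), len(gaps)

# Hypothetical propensity scores and outcomes (e.g. household income in $100s).
p_t, y_t = [0.30, 0.62, 0.90], [12.0, 15.0, 20.0]
p_c, y_c = [0.28, 0.33, 0.60, 0.65, 0.10], [10.0, 11.0, 12.0, 14.0, 5.0]
print(nearest_neighbour_att(p_t, y_t, p_c, y_c))                     # no limit: all 3 matched
print(nearest_neighbour_att(p_t, y_t, p_c, y_c, max_distance=0.05))  # (2.5, 2)
print(radius_att(p_t, y_t, p_c, y_c, radius=0.05))                   # (1.75, 2)
```

The treated unit with a score of 0.90 has no close control, so imposing a maximum distance drops it rather than pairing it with a poor match.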

Kernel and local linear matching are non-parametric estimators in which each treated unit is matched with a weighted average of all control units (Heckman et al., 1997; Heckman et al., 1998). The weights decline with the distance between propensity scores. The main advantage of this approach is the reduction in variance achieved by using more information. Local linear matching additionally includes a linear term in the propensity score, alongside the intercept, in the weighted regression around each treated unit, which increases accuracy when comparison group observations are distributed asymmetrically around the treated unit. This method is commonly used in conjunction with difference-in-differences estimation, which eliminates time-constant sources of bias and makes the estimator more robust (Dillon, 2008).
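
Kernel matching replaces the single matched control with a weighted average of all controls. The sketch below uses a Gaussian kernel with a hypothetical bandwidth of 0.05 on the propensity-score scale; the choice of kernel and bandwidth is an assumption (other kernels, such as the Epanechnikov, are also used), and the extra linear term of local linear matching is omitted.

```python
import math

def kernel_att(p_treated, y_treated, p_control, y_control, bandwidth=0.05):
    """ATT where each treated unit's counterfactual is a kernel-weighted mean of controls.

    Gaussian weights: controls with propensity scores closer to the treated unit's
    score receive more weight; distant controls receive almost none.
    """
    gaps = []
    for p, yt in zip(p_treated, y_treated):
        w = [math.exp(-0.5 * ((pc - p) / bandwidth) ** 2) for pc in p_control]
        total = sum(w)
        if total == 0.0:  # every control is too far away to carry any weight
            continue
        counterfactual = sum(wi * yc for wi, yc in zip(w, y_control)) / total
        gaps.append(yt - counterfactual)
    return sum(gaps) / len(gaps)

# Hypothetical propensity scores and outcomes.
p_t, y_t = [0.30, 0.62], [12.0, 15.0]
p_c, y_c = [0.28, 0.33, 0.60, 0.65, 0.10], [10.0, 11.0, 12.0, 14.0, 5.0]
print(round(kernel_att(p_t, y_t, p_c, y_c), 2))
```

A wider bandwidth uses more controls per treated unit, lowering variance at the cost of comparing less similar units.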

While this approach allows researchers to estimate impact when a randomized experiment is not feasible, it nevertheless faces challenges that reduce its reliability. The underlying assumption of PSM is that there is no difference in average outcomes of interest, or in their observable determinants, between the treated and untreated groups before the intervention. This assumption can only be maintained if all the variables that cause differences between the two groups are controlled for. This requires the variables to be observable and measurable, which is possible for variables such as ethnicity, income and education, but much harder for variables such as attitudes to risk and time preferences. It is rarely possible to rule out omitted variables completely, and the degree to which this is achieved is not easily testable, which makes selection bias an important issue to consider.
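
The omitted-variable problem can be made concrete with a small simulation, a minimal sketch under assumed data rather than an analysis from the studies cited. A hypothetical unobserved trait (`risk`, standing in for risk tolerance) raises both adoption and the outcome; matching on the one observed covariate (`educ`) then overstates the true effect. With a single binary covariate, exact matching on it is equivalent to matching on the propensity score.

```python
import random


def mean(values):
    return sum(values) / len(values)


def att_matched_on_educ(rows):
    """ATT from exact matching on one binary observable (education): the
    treated-minus-control gap within each education group, weighted by the
    number of treated units in that group."""
    total, n_treated = 0.0, 0
    for e in (0, 1):
        y_t = [y for ed, adopt, y in rows if ed == e and adopt == 1]
        y_c = [y for ed, adopt, y in rows if ed == e and adopt == 0]
        total += len(y_t) * (mean(y_t) - mean(y_c))
        n_treated += len(y_t)
    return total / n_treated


# Simulated farmers (hypothetical). Risk tolerance is never observed, yet it
# raises both the chance of adopting and the outcome itself.
random.seed(3)
TRUE_EFFECT = 1.0
rows = []
for _ in range(20000):
    educ = 1 if random.random() < 0.5 else 0  # observed
    risk = random.gauss(0, 1)                 # unobserved
    adopt = 1 if risk + random.gauss(0, 1) + 0.5 * educ > 0.5 else 0
    y = 2.0 * educ + 2.0 * risk + TRUE_EFFECT * adopt + random.gauss(0, 1)
    rows.append((educ, adopt, y))

estimate = att_matched_on_educ(rows)
print(f"true effect {TRUE_EFFECT}, matched estimate {estimate:.2f}")
```

Nothing in the observed data flags the problem: within each education group the treated and control units look balanced on everything that was measured.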

As mentioned previously, Dillon (2011) uses a wide range of household- and village-level observable covariates. Results on agricultural production and income are robust, showing that small-scale irrigation yields higher profits (see Case Study 2). However, results on per capita consumption are mixed, potentially because of unobservable characteristics not accounted for in the propensity score estimation. Rigorously analysing what influences adoption can help limit the potential for omitted variables. Mendola (2007) uses a cross-sectional household survey from rural Bangladesh to estimate propensity scores for the adoption of high-yielding varieties. The author estimates four different logistic regressions in order to test the importance of accounting for different observables, including time-related differences between the dry and wet seasons. Through this rigorous matching process, matched units are close to identical in their observable characteristics.

Case Study 2 Do differences in the scale of irrigation projects generate different impacts on poverty and production?

Andrew Dillon, International Food Policy Research Institute, 2011

In 2007, only 4% of the total cultivated area in sub-Saharan Africa was under irrigation. This suggests significant potential to alleviate poverty by increasing investment in irrigation. However, irrigation has so far not proved uniformly successful.

This paper quantifies the differences in welfare between farmers with access to large-scale irrigation, small-scale irrigation or no irrigation in Mali. Two main identification problems arise: i) non-random program placement due to the targeting of projects (e.g. placing projects in less productive areas to reach the poor); and ii) non-random program participation (e.g. more educated farmers are more likely to adopt new technologies). To address the potential bias in comparing “users” with “non-users”, the author uses PSM to construct the treatment and control groups. Propensity scores are estimated using a probit model that accounts for both the household and village characteristics predicting the probability of adopting irrigation. Treatment and control groups are then constructed from these scores and used to estimate the average treatment effect on the treated.

The paper finds a significant increase in yields and income for those in small-scale irrigation projects. Results on consumption, however, are less precise and less consistent.

Difference-in-Differences

Difference-in-Differences (DID) is based on a framework involving both a treatment and a control group observed over at least two periods of time. The approach can therefore be used to measure impact in both randomized experimental settings (Ashraf et al., 2008) and non-experimental settings (Dillon, 2008). In this section, however, we focus on the use of DID to estimate the impact of non-experimental interventions. For example, DID can be used to analyse the impact of a change in agricultural subsidy law that affects one state but not a neighbouring one; the neighbouring state then serves as a natural counterfactual.

For this method, both study groups are followed over at least two time periods. The impact of the intervention is calculated as the average change in the treatment group minus the average change in the control group. This can be estimated via the following equation:

$$ y_{it} = \alpha + \beta_1 T_i + \beta_2 P_t + \beta_3 (T_i \times P_t) + \varepsilon_{it} $$

where i indicates the household and t the time period. Tᵢ is a dummy variable taking the value 1 if the individual is in the treatment group and 0 if they are in the control group; its coefficient β1 captures possible differences between the two groups prior to the intervention. Pₜ is a dummy variable taking the value 1 in the post-treatment period and 0 in the pre-treatment period; its coefficient β2 captures aggregate factors that would cause changes in y even in the absence of the intervention. Our main parameter of interest is β3, which estimates the impact on the treated group (T) beyond the changes experienced by the control group (C) over time, accounting for any pre-existing differences between the two groups (Imbens and Wooldridge, 2008).
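
The calculation can be checked on simulated data. In the regression with T, P and their interaction, the OLS estimate of β3 equals the difference of the four group-by-period means, so this minimal sketch computes it directly from those means. The simulated coefficients (a group gap of 3, a common trend of 2 and a true effect of 1.5) are hypothetical.

```python
import random


def did_estimate(data):
    """Difference-in-differences from (treat, post, y) records: the change in the
    treated group's mean minus the change in the control group's mean. This
    equals the OLS estimate of beta3 in y = a + b1*T + b2*P + b3*T*P + e."""
    def cell_mean(treat, post):
        ys = [y for t, p, y in data if t == treat and p == post]
        return sum(ys) / len(ys)

    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_control = cell_mean(0, 1) - cell_mean(0, 0)
    return change_treated - change_control


# Simulated panel (hypothetical): treated villages start 3 units higher (beta1),
# everyone gains 2 units over time (beta2), and the true effect is 1.5 (beta3).
random.seed(4)
data = []
for _ in range(2000):
    treat = 1 if random.random() < 0.5 else 0
    for post in (0, 1):
        y = 10.0 + 3.0 * treat + 2.0 * post + 1.5 * treat * post + random.gauss(0, 1)
        data.append((treat, post, y))

print(f"DID estimate of beta3: {did_estimate(data):.2f}")
```

A simple post-period comparison of treated and control villages would instead return roughly 4.5, mixing the pre-existing gap with the effect.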

The within-group difference across time removes any time-invariant, group-specific unobservables. For example, the control group may include villages that on average have smaller markets, which limits the volume of sales. In figure 1, this is represented by the vertical distance between the two groups in the pre-treatment period. The difference between groups then removes the bias from time trends unrelated to the intervention, shown in figure 1 by the general upward trend over time of both the control and the treatment group. For example, favourable weather will increase agricultural output. If we were analysing the impact of a new seed variety, DID would remove both the bias that weather would introduce into the evaluation and any fixed differences between users and non-users of the new variety, thereby focusing the results on the impact of the treatment itself.

The validity of this method rests on various assumptions⁵. The most difficult to verify is the parallel trends assumption, which requires both groups to follow the same trajectory over time (Imbens and Wooldridge, 2008). That is, the rates of change of the two groups would have been the same in the absence of the intervention. In figure 1 this is represented by the dashed line, which shows the trajectory the treatment group would have followed had it not been part of the intervention. There are various ways to assess this assumption. One innovative approach is that of Azam (2012), which exploits the phased implementation of the Indian government's rural employment program (NREGA), launched in 2006 in only 200 districts, with additional districts included in 2007–2008 and again in 2008–2009. Both baseline and post-intervention data were available for all districts covered. To test the parallel trends assumption, the author estimates a DID using two pre-intervention data rounds, showing that the treatment and control groups were in fact on the same trajectory. Another common way of supporting the parallel trends assumption is to use propensity score matching to create a credible counterfactual (as explained in the previous section). Furthermore, when PSM is combined with DID, data from two time periods allow the matching to account for differences in trends rather than just differences in the initial levels of observables, which makes the two groups more comparable.
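
The pre-intervention check described above amounts to running a "placebo" DID between two rounds in which nobody was treated. The following minimal sketch simulates three hypothetical rounds (two before, one after the intervention), once with parallel trends and once with a treated group that drifts upward; the drift size and all data are assumptions.

```python
import random


def did_between(data, t_before, t_after):
    """DID comparing the change from t_before to t_after, treated vs control."""
    def cell_mean(group, t):
        ys = [y for g, period, y in data if g == group and period == t]
        return sum(ys) / len(ys)

    return (cell_mean(1, t_after) - cell_mean(1, t_before)) - (
        cell_mean(0, t_after) - cell_mean(0, t_before))


def simulate(extra_treated_trend, seed):
    """Rounds t = 0 and 1 are pre-intervention; t = 2 is post-intervention with
    a true effect of 1.5. `extra_treated_trend` makes the treated group drift
    away from the control group even before treatment."""
    rng = random.Random(seed)
    data = []
    for _ in range(2000):
        g = 1 if rng.random() < 0.5 else 0
        for t in (0, 1, 2):
            effect = 1.5 if (g == 1 and t == 2) else 0.0
            y = 5.0 + 2.0 * g + 1.0 * t + extra_treated_trend * g * t + effect + rng.gauss(0, 1)
            data.append((g, t, y))
    return data


parallel = simulate(extra_treated_trend=0.0, seed=5)
diverging = simulate(extra_treated_trend=0.8, seed=6)

# The placebo DID on the two pre-intervention rounds should be near zero
# only when the trends are parallel.
print(f"placebo (parallel):  {did_between(parallel, 0, 1):.2f}")
print(f"placebo (diverging): {did_between(diverging, 0, 1):.2f}")
print(f"effect (parallel):   {did_between(parallel, 1, 2):.2f}")
```

A placebo estimate close to zero is consistent with parallel trends but cannot prove them: the trends could still diverge after the last pre-intervention round.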

Regression Discontinuity

Regression discontinuity (RD) was first introduced by Thistlethwaite and Campbell (1960) for impact evaluations in the field of education. Though it is not a novel approach, it has only recently become widely recognised as a means of evaluating the impact of development programs (Lee and Lemieux, 2009). The method is especially applicable when estimating the effects of an intervention that targets a specific population through threshold-based treatment assignment. For example, farmers may only be eligible for a means-tested subsidy program if they fall below a defined wealth threshold. This threshold is then exploited to simulate a randomized setting, on the premise that individuals who fall just above it represent a valid counterfactual for individuals who fall just below it. That is, the method relies on the assumption that individuals on either side of the threshold have similar characteristics. Therefore any difference in the outcome variable between these two groups following the intervention can be attributed to the treatment. This can be represented as:

$$ y_i = \alpha + \beta_1 D_i + \beta_2 x_i + \varepsilon_i $$

where for individual i, y is the outcome, x is the explanatory variable on which assignment is based, and D is a dummy variable such that D = 1 if x ≥ c and D = 0 if x < c, where c is the threshold at which treatment occurs. In this model, the coefficient of interest is β1, which estimates the impact on the treated group (Imbens and Wooldridge, 2008). Kanz (2012) uses this methodology to analyse the impact of the Indian debt relief program for small and marginal farmers. The author exploits the eligibility criterion that households with less than two hectares of land receive 100% unconditional debt relief. Comparing households in the immediate vicinity of the program threshold, the author shows that they do not differ in their observable pre-program characteristics; therefore any post-program difference can be attributed to the debt relief intervention.
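
A minimal sketch of this regression on simulated data: OLS of y on an intercept, the treatment dummy D and the assignment variable (with one common slope on both sides), using only observations within a hypothetical window h of the threshold. The data, the window and the small normal-equations solver are illustrative assumptions; a statistics package would normally be used instead.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def rd_jump(x, y, c, h):
    """Coefficient on D (D = 1 if x >= c) in a regression of y on 1, D and x,
    using observations within h of c. Centring x at c changes only the intercept."""
    rows, ys = [], []
    for xi, yi in zip(x, y):
        if abs(xi - c) <= h:
            rows.append([1.0, 1.0 if xi >= c else 0.0, xi - c])
            ys.append(yi)
    return ols(rows, ys)[1]


# Simulated data (hypothetical): a true jump of 2.0 at the threshold c = 2.0.
random.seed(7)
x = [random.uniform(0.0, 4.0) for _ in range(4000)]
y = [1.0 + 0.5 * xi + 2.0 * (1 if xi >= 2.0 else 0) + random.gauss(0, 1) for xi in x]
print(f"estimated jump at the threshold: {rd_jump(x, y, c=2.0, h=1.0):.2f}")
```

The common-slope form is the simplest version; many applications let the slope differ on each side of the threshold.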

As with all the previous methods considered in this paper, there are some concerns about the validity of this method. The main issue is that individuals should not be able to manipulate their position relative to the threshold. That is, the presence of the treatment should not change the behaviour of those close to being eligible, for example by giving them an incentive to alter their circumstances so as to meet the eligibility criteria. Secondly, it is essential to ensure that no other variable that determines the outcome changes discontinuously at the threshold. For example, being eligible for a subsidised fertiliser scheme should not change the probability of obtaining credit. If it does, the estimated treatment effect will be biased (Lee and Lemieux, 2009).
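
One symptom of manipulation is bunching just on the eligible side of the threshold. The sketch below is a crude count comparison, not a formal density test: it compares how many observations fall within a narrow window just below and just above the cut-off. The 2 ha cut-off mirrors the debt relief example; the land data, the 30% misreporting rate and the window width are hypothetical.

```python
import math
import random


def bunching_z(x, c, delta):
    """Compare counts just below and just above the threshold c. With a smooth
    density and a narrow window, each side should hold about half of the
    observations; z is the normal approximation to that binomial comparison."""
    below = sum(1 for xi in x if c - delta <= xi < c)
    above = sum(1 for xi in x if c <= xi < c + delta)
    return (below - above) / math.sqrt(below + above)


random.seed(8)
land = [random.uniform(0.5, 3.5) for _ in range(20000)]  # hectares (hypothetical)

# Hypothetical manipulation: 30% of households holding 2.0-2.2 ha report a
# holding just under the 2 ha eligibility cut-off.
manipulated = []
for ha in land:
    if 2.0 <= ha < 2.2 and random.random() < 0.3:
        ha = random.uniform(1.9, 2.0)
    manipulated.append(ha)

print(f"z without manipulation: {bunching_z(land, 2.0, 0.1):.1f}")
print(f"z with manipulation:    {bunching_z(manipulated, 2.0, 0.1):.1f}")
```

Because real densities are rarely flat, the window must be narrow for this comparison to be meaningful. The second concern, other variables jumping at the threshold, can be checked in the same spirit by comparing pre-program covariates on either side.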

A growing number of studies have adopted the regression discontinuity design to estimate program effects in a wide variety of economic contexts (see Lee and Lemieux, 2009 for a thorough review of the methodology and its applications). There is clear scope for applying this method to agricultural programs, especially when the intervention is targeted at a specific group on the basis of clear eligibility criteria such as land holding size or a household wealth index.

Instrumental Variables

When attempting to evaluate an intervention in a non-experimental setting, one of the main challenges is establishing causality. For instance, large irrigation dams are likely to increase agricultural productivity, but regions with high productivity also have more capacity to invest in dam construction. Because of this reverse causality, simple correlations do not represent the true impact on the outcomes of interest. Instrumental variables offer one way of solving this empirical problem.

The instrumental variable methodology relies on finding an instrument that influences the main outcome only indirectly, through the variable of interest. For example, when analysing the relationship between dams and agricultural productivity, Duflo and Pande (2007) exploit differences in river gradient across districts within Indian states, which influence the probability of dam construction. This link is plausible, as the most suitable sites for dam construction are river stretches with a low gradient, so gradient directly affects how suitable a site is for construction. River gradient is, however, otherwise unrelated to agricultural productivity, which fulfils the requirements of a valid instrumental variable.

The first step in the IV approach is to estimate the variable of interest (the endogenous predictor) using the chosen instrument. In the second step, this estimate replaces the endogenous variable in the main equation, which is then used to estimate the impact on the outcome. This can be represented with the following equations:

$$ x_i = \rho + \beta_1 z_i + u_i $$
$$ y_i = \alpha + \beta_2 \hat{x}_i + \varepsilon_i $$

where β1 captures the effect of the instrument z on the variable of interest x. In the second stage, x̂, the value of x predicted by the first stage (x̂ = ρ̂ + β̂1 z), is used to estimate the true impact of x on the outcome y (Imbens and Wooldridge, 2008).
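
The two stages can be reproduced on simulated data with two ordinary least-squares fits. In this minimal sketch, a hypothetical unobserved productivity term drives both the endogenous variable (dam construction) and the outcome, while the instrument shifts only the endogenous variable; all coefficients are assumptions. Standard errors from a manual second stage like this are not valid, which is why dedicated IV routines are normally used.

```python
import random


def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on a single regressor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Simulated districts (hypothetical): unobserved productivity u drives both dam
# construction x and output y, while the instrument z shifts x only.
random.seed(9)
z, x, y = [], [], []
for _ in range(5000):
    zi = random.gauss(0, 1)  # instrument, e.g. a standardised gradient measure
    ui = random.gauss(0, 1)  # unobserved productivity
    xi = 0.8 * zi + 1.0 * ui + random.gauss(0, 1)
    yi = 1.5 * xi + 2.0 * ui + random.gauss(0, 1)  # true effect of x is 1.5
    z.append(zi)
    x.append(xi)
    y.append(yi)

_, ols_slope = simple_ols(x, y)                     # naive OLS, biased by u
fs_intercept, fs_slope = simple_ols(z, x)           # first stage: x on z
x_hat = [fs_intercept + fs_slope * zi for zi in z]  # predicted x
_, iv_slope = simple_ols(x_hat, y)                  # second stage: y on x_hat
print(f"OLS: {ols_slope:.2f}  2SLS: {iv_slope:.2f}")
```

With one instrument, the second-stage slope equals the ratio cov(z, y) / cov(z, x), which makes clear why the instrument must affect y only through x.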

The main drawback of this methodology is the difficulty of finding an appropriate and effective instrument: it is very hard to find a variable that affects the outcome solely through the variable of interest. It is possible, however, to use randomly varying incentives to participate in a program, which do not otherwise affect the outcomes of interest, as an effective instrument (Imbens and Angrist, 1994). As mentioned previously, in a randomized experiment with voluntary compliance, random assignment to the treatment group can be used as an instrument for actual participation.
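
With a randomly assigned incentive as the instrument, the IV estimate reduces to a simple ratio (the Wald estimator): the effect of the offer on the outcome divided by its effect on take-up. The sketch below assumes a hypothetical voucher offer and an unobserved "motivation" trait; with a constant treatment effect, as simulated here, the ratio recovers that effect, whereas with heterogeneous effects it identifies the effect for those induced to participate by the offer (Imbens and Angrist, 1994).

```python
import random


def mean(v):
    return sum(v) / len(v)


# Simulated encouragement design (hypothetical): a randomly assigned offer Z
# (say, a voucher) raises take-up D, but motivated farmers adopt more often
# regardless, and motivation also raises the outcome.
random.seed(10)
Z, D, Y = [], [], []
for _ in range(20000):
    z = 1 if random.random() < 0.5 else 0
    motivation = random.gauss(0, 1)  # unobserved
    d = 1 if motivation + 1.2 * z > 0.6 else 0
    y = 2.0 * d + 1.5 * motivation + random.gauss(0, 1)  # true effect 2.0
    Z.append(z)
    D.append(d)
    Y.append(y)

# Wald estimator: effect of the offer on the outcome / effect of the offer on take-up.
reduced_form = mean([y for y, z in zip(Y, Z) if z == 1]) - mean([y for y, z in zip(Y, Z) if z == 0])
first_stage = mean([d for d, z in zip(D, Z) if z == 1]) - mean([d for d, z in zip(D, Z) if z == 0])
wald = reduced_form / first_stage

# Naive comparison of adopters with non-adopters, contaminated by motivation.
naive = mean([y for y, d in zip(Y, D) if d == 1]) - mean([y for y, d in zip(Y, D) if d == 0])
print(f"Wald estimate: {wald:.2f}, naive: {naive:.2f}")
```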

A Comparison of Methods

As we have seen in this review, a range of methods are available for evaluating the impact of agricultural technologies. It is therefore important to compare and contrast these methods, so that researchers can make an informed choice about which is most appropriate for a given intervention.

RCTs are often heralded as the gold standard in impact evaluation, due to their credibility in establishing that the measured impact is indeed caused by the intervention (Fisher, 1925; LaLonde, 1986; Duflo et al., 2007). However, when we conduct experiments, our goal is to evaluate the direct causal effect of an intervention in a way that enables us to make statements about people at large. The degree to which the findings of RCTs can be generalised to a wider population, often referred to as external validity, has been the subject of much debate (Banerjee and Duflo, 2008; Deaton, 2009; Imbens, 2009; Heckman and Urzúa, 2010; Pritchett and Sandefur, 2013). Pritchett and Sandefur (2013) argue that the focus of RCTs on internal rather than external validity can be misleading, and that non-experimental evidence from the right context is more accurate and meaningful for policy makers than impact estimates from randomized studies conducted in remote geographical, economic, or cultural contexts. However, this issue is not restricted to RCTs; it applies to any geographically and temporally restricted empirical method. To address this concern, Banerjee and Duflo (2008) advocate replicating studies in different contexts (for example, with a different country, implementing agency, or research team). There is in fact a growing number of such replication studies, which allow researchers to assess how far results generalise and build knowledge about the importance of context-specific characteristics. For instance, the study by Miguel and Kremer (2004) in Kenya, which tested the impact of deworming on school attendance, was replicated in North India by Bobonis et al. (2006) with the addition of iron supplements, as the authors had found high levels of anaemia among children (unlike in Kenya). Both studies found that the program had a significant and cost-effective impact on school attendance.

RCTs have also been criticised for their lack of theoretical content (Deaton, 2009). However, as Banerjee and Duflo (2008) argue, it is precisely because experimental results do not rely on theory for their identification that they are often very innovative. Nevertheless, theory should inform the design of an experiment, as it often does, just as experimental results are vital for refining theory.

An important point to remember is that not all interventions can be evaluated using randomized experiments, owing to both practical and ethical obstacles. In such cases, quasi-experimental or non-experimental evaluation methods can provide a feasible alternative.

Using randomized experiments as the benchmark, it is possible to evaluate the accuracy and consistency of non-experimental methods. Using this approach, LaLonde (1986) provided evidence that non-experimental econometric estimates often differ significantly from experimental results, leading him to endorse RCTs as more reliable. However, a more recent analysis of the same dataset revealed that, relative to the estimators LaLonde evaluated, propensity score matching estimates of the treatment impact are much closer to the experimental benchmark (Dehejia and Wahba, 1999; Dehejia and Wahba, 2002). The consistency of propensity score matching nonetheless appears to be sensitive to the variables included in the probability estimates. This suggests that the accuracy of propensity score matching estimates does not carry over automatically from one application to another, and careful consideration should be given to the availability and choice of variables to include in the score estimation.

A comparison of different methods for evaluating the impact of an education program in Western Kenya that randomly provided flip charts to primary schools suggests that non-experimental estimates seriously overestimate the impact on student test scores. This problem was minimized by analysing propensity-score-matched pairs with a difference-in-differences approach (Glewwe et al., 2004). This result is also supported by a study of the impact of migration to New Zealand, which found that non-experimental methods other than instrumental variables overestimate the gains from migration by 20–82%, with difference-in-differences and bias-adjusted matching estimators performing best among the alternatives to instrumental variables (McKenzie et al., 2010).

One of the main constraints on applying regression discontinuity analysis is the availability of data in the immediate vicinity of the treatment selection threshold. Relaxing the data selection criteria to include observations further from the discontinuity increases the statistical power for estimating a treatment effect, but it also increases the selection bias of the study. This bias can be minimised by using a local linear or kernel-based semi-parametric estimator, in which observations are weighted by their distance from the selection threshold (Hahn et al., 2001; Porter, 2003). In benchmark comparison studies, well-designed regression discontinuity analyses have shown a high degree of consistency with experimental estimates (Buddelmeyer and Skoufias, 2004; Cook et al., 2008).
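
A minimal sketch of the local linear approach follows: a separate weighted straight-line fit on each side of the threshold, with triangular kernel weights that fall from 1 at the threshold to 0 at a bandwidth h, and the jump taken as the difference between the two fitted values at the threshold. The triangular kernel, the bandwidths tried and the curved simulated outcome are assumptions chosen for illustration; bandwidth selection rules used in practice are not shown.

```python
import random


def weighted_line_at_zero(r, y, w):
    """Weighted least-squares line of y on r; returns its fitted value at r = 0."""
    sw = sum(w)
    mr = sum(wi * ri for wi, ri in zip(w, r)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    srr = sum(wi * (ri - mr) ** 2 for wi, ri in zip(w, r))
    sry = sum(wi * (ri - mr) * (yi - my) for wi, ri, yi in zip(w, r, y))
    slope = sry / srr
    return my - slope * mr


def local_linear_rd(x, y, c, h):
    """Jump at c from separate local linear fits on each side, using triangular
    kernel weights that fall linearly from 1 at the threshold to 0 at distance h."""
    sides = {True: ([], [], []), False: ([], [], [])}
    for xi, yi in zip(x, y):
        r = xi - c
        w = 1.0 - abs(r) / h
        if w <= 0:
            continue  # outside the bandwidth: zero weight
        rs, ys, ws = sides[r >= 0]
        rs.append(r)
        ys.append(yi)
        ws.append(w)
    return weighted_line_at_zero(*sides[True]) - weighted_line_at_zero(*sides[False])


# Simulated data (hypothetical): a curved outcome with a true jump of 2.0 at c = 0.
random.seed(11)
x = [random.uniform(-2.0, 2.0) for _ in range(10000)]
y = [1.0 + 0.8 * xi + 0.6 * xi ** 2 + 2.0 * (1 if xi >= 0 else 0) + random.gauss(0, 1) for xi in x]
for h in (0.25, 0.5, 1.0):
    print(f"bandwidth {h}: estimated jump {local_linear_rd(x, y, 0.0, h):.2f}")
```

Narrow bandwidths use few observations and give noisier estimates; wide ones rely more on the straight-line approximation holding far from the threshold, which is the power-versus-bias trade-off described above.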

Having compared these methodologies for impact evaluation, it is fair to conclude that RCTs provide the most reliable estimates of treatment effects. However, reliance on RCTs should not limit the range of programs evaluated: interesting interventions that cannot be evaluated using RCTs should not be left behind. As this paper has shown, a range of other methods can provide consistent estimates, especially when used in conjunction with one another.


We would like to thank Laura Litvine at the Centre for Microfinance – IFMR, Simone Verkaart at ICRISAT – Nairobi, Ben Roth at MIT, Nilesh A. Fernando at Harvard University, and Loic Watine at Innovations for Poverty Action (IPA) for their valuable comments on this manuscript. All errors are our own.