Evidence from Rural Mexico
M Csapek SA, Tecnológico de Monterrey
Published in Education Economics, 2025
The puzzle: Returns to education are positive and significant — yet educational attainment remains very low
Standard explanation: external constraints
Growing evidence: internal constraints (aspirations, grit, self-efficacy, self-esteem) also play a critical role (Cunha and Heckman 2009; Dalton et al. 2016; Heckman and Kautz 2012)
This paper: Can a holistic child sponsorship program shift children’s aspirations toward higher education?
Compassion International: Third largest child sponsorship program worldwide
Duration: Average 9.3 years of sponsorship (\(\approx\) 4,000 hrs of organized activities)
Program components:
Hypothesis: By broadening horizons and alleviating income constraints, CI may raise children’s aspirations to pursue further education
Research Question
Does Compassion International’s sponsorship program raise the aspiration of rural Mexican children (ages 12–15) to acquire a higher education degree?
Three contributions to the CI literature:
Institutional Framework & Data
Material support:
Educational support:
Socio-emotional development:
The holistic design is the key mechanism: not just income relief but also expanding perceived career possibilities and social horizons
Multi-stage targeting procedure:
Eligibility criteria:
Key implication for identification: Selection is not random — younger and more economically disadvantaged children are more likely to be selected
Survey design:
Survey subjects:
Data collected as part of a companion study with Ross et al. (2021) evaluating CI’s impact on psychological indicators
| Step | Restriction | N |
|---|---|---|
| Initial survey | Ages 10–18 | 926 |
| Age restriction | Ages 12–15 only | — |
| Final sample | | 403 |
| Sponsored | (CI group) | 163 |
| Non-sponsored | (Control group) | 240 |

Reasons for the age restriction: (1) children start working after primary school (\(\approx\) age 12); (2) CI sites have been operational < 6 years on average; (3) it aligns with sponsorship eligibility (age ≤ 9 at start).
Note: In Model 2 (with subjective expectations), sample further restricted to 271 children who correctly interpreted probability questions used to elicit income beliefs
| | All Mean | Sponsored Mean | Non-Spons. Mean | Difference (t-test) |
|---|---|---|---|---|
| Aspires: higher ed. (any) | 0.730 | 0.712 | 0.742 | -0.030 |
| Aspires: university degree | 0.620 | 0.571 | 0.654 | -0.084* |
| Age | 13.375 | 13.006 | 13.625 | -0.619*** |
| Male | 0.469 | 0.429 | 0.496 | -0.066 |
| Asset index | 0.057 | -0.216 | 0.244 | -0.460*** |
| Protestant | 0.506 | 0.730 | 0.354 | 0.376*** |
| Education father (yrs) | 6.797 | 7.000 | 6.659 | 0.341 |
| Education mother (yrs) | 6.490 | 6.321 | 6.604 | -0.283 |
| N | 403 | 163 | 240 | |
*** \(p<0.01\), ** \(p<0.05\), * \(p<0.1\). Asset index from first principal component of household assets (proxy for income).
Differences: sponsored children are younger, more likely Protestant, and from poorer households — consistent with CI’s targeting
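The significance stars on the binary rows of the table can be sanity-checked from the reported means and group sizes alone, because the variance of a 0/1 variable follows from its mean. The sketch below is an illustration, not the author's procedure: it assumes a Welch-type standard error and a normal approximation for the p-value, and the function names are made up for this example.

```python
import math

def diff_in_means_binary(p1, n1, p0, n0):
    """Difference in means, standard error and t-statistic for a 0/1 variable.

    For a binary variable the sample variance is p(1-p) * n/(n-1),
    so the group means and group sizes are sufficient inputs.
    """
    var1 = p1 * (1 - p1) * n1 / (n1 - 1)
    var0 = p0 * (1 - p0) * n0 / (n0 - 1)
    se = math.sqrt(var1 / n1 + var0 / n0)  # unequal-variance (Welch) standard error
    diff = p1 - p0
    return diff, se, diff / se

def two_sided_p_normal(t):
    """Two-sided p-value under a normal approximation (assumes large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))

# "Aspires: university degree": sponsored 0.571 (N=163) vs non-sponsored 0.654 (N=240)
diff, se, t = diff_in_means_binary(0.571, 163, 0.654, 240)
p = two_sided_p_normal(t)
print(f"diff={diff:.3f}  se={se:.3f}  t={t:.2f}  p={p:.3f}")
```

With these inputs the two-sided p-value falls between 0.05 and 0.10, in line with the single star in the table. The difference comes out as -0.083 rather than -0.084 because the inputs are the rounded means.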
Adapted from Attanasio et al. (2012) (used in Prospera evaluation):
For each education level \(\ell \in \{\)primary, middle, high school, technical, university\(\}\):
Assuming a triangular distribution with density \(f_{Y^\ell}\) on \([y^\ell_{\min}, y^\ell_{\max}]\):
\[\mathbb{E}[\ln(Y^\ell)] = \int_{y^\ell_{\min}}^{y^\ell_{\max}} \ln(y)\, f_{Y^\ell}(y)\, dy\]
Perceived returns: \(\;\rho^\ell = \mathbb{E}[\ln(Y^\ell)] - \mathbb{E}[\ln(Y^{\ell-1})],\quad \ell = 2,\ldots,5\)
Subjective Expectations
From the survey data, for each individual \(i\) and education level \(\ell\):
\[\mathbb{E}[\ln(Y^{\ell}_i)] = \int_{y^\ell_{\min,i}}^{y^\ell_{\max,i}} \ln(y)\, f_{Y^\ell_i}(y)\, dy\]
This is computed directly from the individual’s stated \(y^\ell_{\min}\), \(y^\ell_{\max}\), and the probability question that pins down the triangular distribution.
Perceived return to education level \(\ell\):
\[\rho^\ell_i = \mathbb{E}[\ln(Y^\ell_i)] - \mathbb{E}[\ln(Y^{\ell-1}_i)]\]
Used directly as a control variable in the outcome equations.
Distributional assumption
Income conditional on \((y^\ell_{\min}, y^\ell_{\max})\) follows a triangular distribution. Robustness: uniform distribution gives nearly identical results.
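A minimal sketch of how \(\mathbb{E}[\ln(Y^\ell)]\) and the perceived return \(\rho^\ell\) could be computed from one child's answers. Two things here are assumptions rather than the paper's procedure: the mode of the triangle is placed at the midpoint of \([y_{\min}, y_{\max}]\) unless supplied (the survey's probability question would normally pin it down), and the integral is approximated with a plain midpoint rule. The income bounds in the example are hypothetical, not survey data.

```python
import math

def triangular_pdf(y, lo, hi, mode):
    """Density of a triangular distribution on [lo, hi] with lo < mode < hi."""
    if y < lo or y > hi:
        return 0.0
    if y <= mode:
        return 2 * (y - lo) / ((hi - lo) * (mode - lo))
    return 2 * (hi - y) / ((hi - lo) * (hi - mode))

def expected_log_income(lo, hi, mode=None, n=2000):
    """E[ln Y] under a triangular density, via the midpoint rule with n cells."""
    if not 0 < lo < hi:
        raise ValueError("need 0 < lo < hi")
    if mode is None:
        mode = 0.5 * (lo + hi)  # assumption: symmetric triangle
    if not lo < mode < hi:
        raise ValueError("need lo < mode < hi")
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        total += math.log(y) * triangular_pdf(y, lo, hi, mode)
    return total * h

def expected_log_income_uniform(lo, hi):
    """Closed form of E[ln Y] for Y uniform on [lo, hi] (robustness variant)."""
    return (hi * math.log(hi) - lo * math.log(lo)) / (hi - lo) - 1.0

def perceived_return(bounds_level, bounds_prev):
    """rho^l = E[ln Y^l] - E[ln Y^(l-1)] for (min, max) pairs of two levels."""
    return expected_log_income(*bounds_level) - expected_log_income(*bounds_prev)

# Hypothetical answers in pesos (illustrative only)
high_school = (3000.0, 6000.0)
university = (6000.0, 12000.0)
print("rho_HE =", round(perceived_return(university, high_school), 4))
```

Because both university bounds in the example are exactly twice the high-school bounds, the perceived return equals \(\ln 2\), which makes a convenient check on the integration.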
Comparison with 2015 census at 2017 prices (Table 2, medians in MX pesos):
| | High School Male | High School Female | University Male | University Female |
|---|---|---|---|---|
| Census: Oaxaca (pop < 50k) | 4,272 | 3,296 | 6,408 | 6,408 |
| Census: Chiapas (pop < 50k) | 3,204 | 2,746 | 4,577 | 4,577 |
| Survey: Sponsored (Oaxaca) | 6,766 | 2,739 | 13,110 | 5,826 |
| Survey: Non-Sponsored (Oaxaca) | 4,245 | 3,248 | 9,762 | 6,471 |
Key patterns:
Reading the figure:
Children’s beliefs are realistic on average — but extremely diverse across individuals
Roy Model
Naive comparison is biased:
Standard IV approach (LATE):
This paper: Binary Roy-type model (Aakvik et al. 2005)
Three latent variable equations:
1. Selection equation — who gets sponsored:
\[S^*_i = Z_i\gamma - U_{Si}, \qquad S_i = \mathbf{1}[S^*_i > 0]\]
2. Outcome for sponsored (\(S_i=1\)):
\[Y^*_{1i} = \beta^1_0 + \rho_{HE,i}\,\beta^1_2 + \mathit{Dist}_i\,\beta^1_3 + \tilde{X}_i\,\beta^1_4 - U_{1i}\]
3. Outcome for non-sponsored (\(S_i=0\)):
\[Y^*_{0i} = \beta^0_0 + \rho_{HE,i}\,\beta^0_2 + \mathit{Dist}_i\,\beta^0_3 + \tilde{X}_i\,\beta^0_4 - U_{0i}\]
Observed outcome: \(Y_i = S_i Y_{1i} + (1-S_i)Y_{0i}\), where \(Y_{ji} = \mathbf{1}[Y^*_{ji}>0]\)
\[S^*_i = \gamma_0 + \sum_{p=6}^{8} \mathit{Age}p_i\,\gamma_{p-5} + \mathit{AssetIndex}_i\,\gamma_4 + \mathit{Protestant}_i\,\gamma_5 + \mathit{SiteCI}_i\,\gamma_6 - U_{Si}\]
Exclusion restrictions: the \(\mathit{Age}p\) dummies, \(p \in \{6, 7, 8\}\) (age at program arrival)
Exclusion restriction
\(\mathit{Agep}\) affects selection probability but not aspirations directly — valid if aspirations depend on current characteristics, not age at program arrival
Other controls: Asset index (wealth), Protestant (church attendance), CI site dummy
Regressors \(X_i = (1,\, \rho_{HE,i},\, \mathit{Dist}_i,\, \tilde{X}_i)\):
Why \(\rho_{HE}\) matters:
The error terms share a common latent factor \(\theta_i\):
\[U_{Si} = -\theta_i + \varepsilon_{Si}, \qquad U_{1i} = -\alpha_1\theta_i + \varepsilon_{1i}, \qquad U_{0i} = -\alpha_0\theta_i + \varepsilon_{0i}\]
What this allows:
Normalization: \(\mathrm{Var}(\theta_i) = \mathrm{Var}(\varepsilon_{ji}) = 1\) for \(j \in \{S, 1, 0\}\)
vs. IV: IV assumes \(\alpha_1 = \alpha_0\) (no selection on gains). The Roy model relaxes this. More flexible, but requires the normality assumption.
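To see what the factor structure implies, the three equations can be simulated for a single covariate profile. This is an illustrative sketch only: the index values and loadings are hypothetical, and the idiosyncratic errors are assumed to be independent standard normals. With \(\alpha_1 \neq \alpha_0\) there is selection on gains, so the naive sponsored-minus-non-sponsored gap, the ATT, and the ATE all differ.

```python
import random

def simulate_roy(n, x_beta1, x_beta0, z_gamma, alpha1, alpha0, seed=0):
    """Simulate selection and both potential outcomes for n children.

    Returns (naive gap in observed Y, ATE, ATT) in the simulated sample.
    """
    rng = random.Random(seed)
    n_treated = 0
    y_sum_treated = 0      # observed Y among sponsored children
    y_sum_control = 0      # observed Y among non-sponsored children
    gain_sum_all = 0       # Y1 - Y0 summed over everyone
    gain_sum_treated = 0   # Y1 - Y0 summed over sponsored children
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)                   # common latent factor
        u_s = -theta + rng.gauss(0.0, 1.0)
        u_1 = -alpha1 * theta + rng.gauss(0.0, 1.0)
        u_0 = -alpha0 * theta + rng.gauss(0.0, 1.0)
        sponsored = z_gamma - u_s > 0                 # S = 1[S* > 0]
        y1 = int(x_beta1 - u_1 > 0)                   # aspiration if sponsored
        y0 = int(x_beta0 - u_0 > 0)                   # aspiration if not sponsored
        gain_sum_all += y1 - y0
        if sponsored:
            n_treated += 1
            y_sum_treated += y1
            gain_sum_treated += y1 - y0
        else:
            y_sum_control += y0
    naive = y_sum_treated / n_treated - y_sum_control / (n - n_treated)
    return naive, gain_sum_all / n, gain_sum_treated / n_treated

# Hypothetical parameter values chosen for illustration
naive, ate, att = simulate_roy(50_000, 0.3, 0.3, 0.0, alpha1=1.0, alpha0=0.5)
print(f"naive={naive:.3f}  ATE={ate:.3f}  ATT={att:.3f}")
```

In this configuration the naive comparison overstates the ATT, and the ATT exceeds the ATE, because children with a high \(\theta_i\) are both more likely to be selected and gain more.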
Integrate over the unobserved factor \(\theta_i\):
\[L = \prod_{i=1}^{N} \int \Pr(S_i, Y_i \mid X_i, Z_i, \theta_i)\, \phi(\theta_i)\, d\theta_i\]
where:
\[\begin{align} \Pr(Y_i=1 \mid S_i=1, X_i, \theta_i) &= \Phi(X_i\hat{\beta}_1 + \hat{\alpha}_1\theta_i) \\ \Pr(Y_i=1 \mid S_i=0, X_i, \theta_i) &= \Phi(X_i\hat{\beta}_0 + \hat{\alpha}_0\theta_i) \\ \Pr(S_i=1 \mid Z_i, \theta_i) &= \Phi(Z_i\hat{\gamma} + \theta_i) \end{align}\]
Numerical integration: Gauss-Hermite quadrature with 10 nodes — accurate approximation to the normal integral over \(\theta_i\). Standard errors via bootstrapping.
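The likelihood above can be sketched in a few lines. This is a hedged illustration, not the author's estimation code: the function and argument names are invented for the example, hats are dropped, and the only library call relied on is NumPy's `hermgauss`, whose nodes are rescaled by \(\sqrt{2}\) and whose weights are divided by \(\sqrt{\pi}\) so that they integrate against a standard normal density.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

_erf = np.vectorize(math.erf)

def norm_cdf(v):
    """Standard normal CDF, applied elementwise to an array."""
    return 0.5 * (1.0 + _erf(np.asarray(v, dtype=float) / math.sqrt(2.0)))

def log_likelihood(params, S, Y, X, Z, n_nodes=10):
    """Log-likelihood with the latent factor theta integrated out.

    params = (beta1, beta0, gamma, alpha1, alpha0); S and Y are 0/1 arrays of
    length N; X and Z are the outcome and selection design matrices.
    """
    beta1, beta0, gamma, alpha1, alpha0 = params
    t, w = hermgauss(n_nodes)            # rule for the weight exp(-t^2)
    theta = math.sqrt(2.0) * t           # change of variable to N(0, 1)
    w = w / math.sqrt(math.pi)           # weights now sum to one

    xb1 = (X @ beta1)[:, None]
    xb0 = (X @ beta0)[:, None]
    zg = (Z @ gamma)[:, None]
    p_s = norm_cdf(zg + theta)               # Pr(S=1 | Z, theta)
    p_y1 = norm_cdf(xb1 + alpha1 * theta)    # Pr(Y=1 | S=1, X, theta)
    p_y0 = norm_cdf(xb0 + alpha0 * theta)    # Pr(Y=1 | S=0, X, theta)

    S = np.asarray(S)[:, None]
    Y = np.asarray(Y)[:, None]
    p_y = np.where(S == 1, p_y1, p_y0)       # outcome equation of the observed regime
    joint = np.where(S == 1, p_s, 1.0 - p_s) * np.where(Y == 1, p_y, 1.0 - p_y)
    return float(np.sum(np.log(joint @ w)))  # weighted sum over the nodes
```

A function like this would be handed to a numerical optimizer (with the sign flipped), and the bootstrap mentioned above would repeat that optimization on resampled data.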
Average Treatment Effect (ATE):
\[\mathrm{ATE}(x) = \Phi\!\left(\frac{x\hat{\beta}_1}{\sqrt{1+\hat{\alpha}_1^2}}\right) - \Phi\!\left(\frac{x\hat{\beta}_0}{\sqrt{1+\hat{\alpha}_0^2}}\right)\]
Average over all children with characteristics \(x\)
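The denominator \(\sqrt{1+\alpha_j^2}\) appears because \(\mathrm{Var}(U_j) = \alpha_j^2 + 1\) under the normalization, so the index has to be rescaled before applying \(\Phi\). A small sketch of the formula, with hypothetical parameter values:

```python
import math

def norm_cdf(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def ate(x_beta1, x_beta0, alpha1, alpha0):
    """ATE(x) given the indices x*beta_j and the factor loadings alpha_j."""
    p1 = norm_cdf(x_beta1 / math.sqrt(1.0 + alpha1 ** 2))  # Pr(Y1 = 1 | x)
    p0 = norm_cdf(x_beta0 / math.sqrt(1.0 + alpha0 ** 2))  # Pr(Y0 = 1 | x)
    return p1 - p0

# Hypothetical values: identical indices, different loadings
print(round(ate(0.3, 0.3, alpha1=1.0, alpha0=0.5), 4))
```

Averaging `ate` over the sample values of \(x\) gives the model-level \(E[\mathrm{ATE}(x)]\).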
Average Treatment on Treated (ATT):
\[\mathrm{ATT}(x, S=1) = \frac{1}{F_{U_S}(z\hat{\gamma})} \left[F_{U_S, U_1}(\cdot) - F_{U_S, U_0}(\cdot)\right]\]
Average only over sponsored children
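Under the one-factor structure, the joint probabilities in the ATT can be written as one-dimensional integrals over \(\theta\), since selection and outcomes are independent once \(\theta\) is fixed (this assumes the idiosyncratic errors are mutually independent). The sketch below uses a simple midpoint rule instead of a bivariate normal CDF routine; names and parameter values are hypothetical.

```python
import math

def norm_cdf(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def integrate_over_theta(g, lo=-8.0, hi=8.0, n=4000):
    """Midpoint-rule approximation of the integral of g(theta) * phi(theta)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        theta = lo + (k + 0.5) * h
        total += g(theta) * norm_pdf(theta)
    return total * h

def att(x_beta1, x_beta0, z_gamma, alpha1, alpha0):
    """ATT(x, S=1): expected gain among children who are selected."""
    def gain_times_selection(theta):
        gain = norm_cdf(x_beta1 + alpha1 * theta) - norm_cdf(x_beta0 + alpha0 * theta)
        return gain * norm_cdf(z_gamma + theta)        # weight by Pr(S=1 | theta)
    pr_selected = norm_cdf(z_gamma / math.sqrt(2.0))   # F_{U_S}(z*gamma), Var(U_S) = 2
    return integrate_over_theta(gain_times_selection) / pr_selected

# Hypothetical values for illustration
print(round(att(0.3, 0.3, 0.0, alpha1=1.0, alpha0=0.5), 4))
```

When \(\alpha_1 = \alpha_0 = 0\) the weighting by selection drops out and the ATT collapses to the ATE.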
Marginal Treatment Effect (MTE):
\[\begin{aligned} \mathrm{MTE}(x, u_S) &= \Pr(Y_1=1 \mid X=x, U_S=u_S) \\ &\quad - \Pr(Y_0=1 \mid X=x, U_S=u_S) \end{aligned}\]
Effect for children at the margin of selection \(u_S\)
MTE is a building block: ATE and ATT are weighted averages of the MTE (Heckman and Vytlacil 2007)
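A sketch of the MTE under the normal one-factor structure, together with a numerical check that integrating it over the distribution of \(U_S\) recovers the ATE. The closed form uses the conditional normal distribution of \(U_j\) given \(U_S = u_S\), which follows from \(\mathrm{Var}(U_S) = 2\) and \(\mathrm{Cov}(U_S, U_j) = \alpha_j\) when the idiosyncratic errors are independent (an assumption of this sketch). Parameter values are hypothetical.

```python
import math

def norm_cdf(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def mte(x_beta1, x_beta0, alpha1, alpha0, u_s):
    """MTE(x, u_S) = Pr(Y1=1 | x, u_S) - Pr(Y0=1 | x, u_S).

    U_j | U_S = u is normal with mean (alpha_j / 2) * u and
    variance 1 + alpha_j**2 / 2.
    """
    def pr_y(x_beta, alpha):
        sd = math.sqrt(1.0 + 0.5 * alpha ** 2)
        return norm_cdf((x_beta - 0.5 * alpha * u_s) / sd)
    return pr_y(x_beta1, alpha1) - pr_y(x_beta0, alpha0)

def ate_from_mte(x_beta1, x_beta0, alpha1, alpha0, n=4000, width=12.0):
    """Integrate the MTE against the N(0, 2) density of U_S (midpoint rule)."""
    h = 2.0 * width / n
    total = 0.0
    for k in range(n):
        u = -width + (k + 0.5) * h
        density = norm_pdf(u / math.sqrt(2.0)) / math.sqrt(2.0)
        total += mte(x_beta1, x_beta0, alpha1, alpha0, u) * density
    return total * h

# Hypothetical values: the MTE falls as u_S rises when alpha1 > alpha0
for u in (-1.0, 0.0, 1.0):
    print(u, round(mte(0.3, 0.3, 1.0, 0.5, u), 4))
```

The ATT would use the same MTE with weights that put more mass on low values of \(u_S\), i.e. on children who are more likely to be selected.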
Results
Table 3: Selection equation (Model 1, N=403; Model 2, N=271)
| | Model 1 Coeff. | Model 1 Marg. effect | Model 2 Coeff. | Model 2 Marg. effect |
|---|---|---|---|---|
| Dummy(Age 6) | 1.19*** | 0.215*** | 1.69*** | 0.257*** |
| Dummy(Age 7) | 1.63*** | 0.210*** | 1.85*** | 0.285*** |
| Dummy(Age 8) | 1.43*** | 0.259*** | 1.67*** | 0.271*** |
| Protestant | 1.31*** | 0.236*** | 1.92*** | 0.338*** |
| Asset index | -0.165** | -0.029** | -0.27*** | -0.043*** |
| Treated site | 3.313 | 0.599*** | 3.149 | 0.432*** |
Table 4: Marginal effects (Model 1, N=403). Bootstrap standard errors in parentheses.
| | Spons. ME | (SE) | Non-Spons. ME | (SE) |
|---|---|---|---|---|
| Dummy(Prospera) | 0.057 | (0.108) | 0.076 | (0.091) |
| Dummy(Male) | -0.043 | (0.074) | -0.119** | (0.055) |
| Asset Index | 0.016 | (0.024) | 0.336*** | (0.022) |
| Parental Education | 0.027** | (0.013) | 0.023** | (0.010) |
| Distance to Univ. (km) | -0.006 | (0.001) | 0.000 | (0.001) |
*** \(p<0.01\), ** \(p<0.05\), * \(p<0.1\). Model-level averages (both groups): \(E[\mathrm{ATE}(x)] = 0.016\) (SE 0.083); \(E[\mathrm{ATT}(x)] = 0.170\) (SE 0.183).
Table 4: Marginal effects (Model 2, N=271). Bootstrap standard errors in parentheses.
| | Spons. ME | (SE) | Non-Spons. ME | (SE) |
|---|---|---|---|---|
| Dummy(Prospera) | 0.112 | (0.154) | 0.005 | (0.127) |
| Dummy(Male) | -0.078 | (0.097) | -0.035 | (0.080) |
| Asset Index | -0.011 | (0.033) | 0.060** | (0.030) |
| Parental Education | 0.050*** | (0.017) | 0.024 | (0.015) |
| Distance to Univ. (km) | 0.000 | (0.001) | 0.001 | (0.001) |
| \(\rho_{HE}\) (perceived returns) | -0.113 | (0.072) | 0.016 | (0.069) |
*** \(p<0.01\), ** \(p<0.05\), * \(p<0.1\). Restricted to N=271 children who correctly interpreted probability questions. Model-level averages (both groups): \(E[\mathrm{ATE}(x)] = -0.008\) (SE 0.009); \(E[\mathrm{ATT}(x)] = 0.204\) (SE 0.180).
Average treatment effect on the treated (ATT):
How to interpret imprecision:
Back-of-envelope: If aspirations translate to behavior, a 20 pp increase in aspiration implies \(\approx\) 8 more months of schooling — consistent with Wydick et al. (2013) who find 1.03–1.46 additional years for adult outcomes
Statistical explanation:
Substantive interpretation:
Proposed mechanism: CI de facto requires school attendance as condition for continued support
\(\Rightarrow\) More time in school \(\Rightarrow\) stronger educational identity
Reading the MTE curve:
Pattern: MTE declines in \(u_S\)
Key finding from MTE:
\[\mathrm{Cov}(U_S, U_1) = \alpha_1 > 0\]
Children who are more likely to be selected are also those who benefit more from sponsorship
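The covariance follows directly from the factor structure of the error terms. A short derivation, assuming the idiosyncratic errors \(\varepsilon_{Si}\) and \(\varepsilon_{1i}\) are independent of \(\theta_i\) and of each other, and using \(\mathrm{Var}(\theta_i) = 1\):

\[\mathrm{Cov}(U_S, U_1) = \mathrm{Cov}(-\theta + \varepsilon_S,\; -\alpha_1\theta + \varepsilon_1) = \alpha_1\,\mathrm{Var}(\theta) = \alpha_1, \qquad \mathrm{Corr}(U_S, U_1) = \frac{\alpha_1}{\sqrt{2\,(1+\alpha_1^2)}}\]

Since \(S_i = 1\) when \(U_{Si} < Z_i\gamma\) and \(Y_{1i} = 1\) when \(U_{1i}\) falls below the outcome index, a positive \(\alpha_1\) means that children with low \(U_S\) (more likely to be sponsored) also tend to have low \(U_1\), i.e. higher aspirations under sponsorship.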
Policy implication for CI:
Formal result: The positive correlation between selection propensity and treatment effect means CI’s self-selected targeting is efficient — children most in need extract the greatest gains
Result: \(\rho_{HE}\) (perceived returns to higher education) is not statistically significant in either the sponsored or non-sponsored outcome equations (Model 2)
Two possible interpretations:
Related to Genicot and Ray (2017): feasibility matters as much as desirability. If a goal is perceived as unattainable, even high expected returns won’t shift aspirations.
\(\Rightarrow\) Short-term attainable milestones may be more effective than long-run income arguments
Heterogeneity Analysis
Aspirations (outcome equation):
Income expectations (Table 2):
Puzzle: Females aspire more to higher education than males — yet expect substantially lower earnings. Suggests aspirations and income expectations are driven by different mechanisms for boys and girls
Discussion & Conclusions
Why does CI have a positive (if imprecise) effect on aspirations?
Main results are robust to:
The direction of the effect (positive ATT) and the non-significance are robust across all specifications
For child sponsorship programs:
For gender equity:
Intervening early (ages 9–12) may be more productive than waiting for aspirations to be fully formed