Methods

This paper combines two approaches that are rarely used together: a structural procurement auction model to recover unobserved firm costs from observed bids, and stochastic frontier analysis (SFA) to measure how efficiently each firm uses its inputs relative to the production frontier. This section explains both approaches in detail, including the assumptions they require and the estimation challenges they pose.

Step 1: Structural auction model. Observed bids → pseudo-costs, via the GPV nonparametric approach with asymmetric bidders.

Step 2: Stochastic frontier analysis. Pseudo-costs + input prices → efficiency index, under half-normal and truncated-normal models.

Key contribution: the pseudo-costs from Step 1 serve as the cost variable in Step 2, circumventing the need for direct observation of firm costs, which is typically unavailable in procurement data.

Step 1: Structural Procurement Auction Model

The Identification Problem

The fundamental challenge in procurement auction analysis is that firm costs are unobserved. We observe the bids that firms submit, but not the actual cost each firm would incur to complete the contract. Without knowing costs, we cannot say whether a winning firm is genuinely efficient or simply lucky in the strategic interaction.

A reduced-form approach, simply comparing winning bids across firm types, confounds cost differences with strategic behavior. A Type 1 firm that bids higher than a Type 0 firm might be doing so because: (a) it has higher costs, (b) it has lower costs but shades its bid more aggressively knowing it faces less competition, or (c) some combination of both. The structural model separates the cost channel from the strategic channel by exploiting the equilibrium relationship between bidding strategies and cost distributions.

Setup and Assumptions

The government runs a first-price sealed-bid procurement auction. There are \(n\) risk-neutral bidders partitioned into two types:

  • Type 1 (\(n_1\) bidders): firms that have received at least one contract by direct allocation or I3P invitation at some point during the sample period
  • Type 0 (\(n_0\) bidders): firms that have only ever competed in public auctions, with \(n_0 + n_1 = n\)

Each firm \(i\) of type \(j \in \{0, 1\}\) privately observes its own cost \(c_{ji}\), drawn independently from \(F_j(\cdot)\), a continuous distribution on \([\underline{c}, \bar{c}]\).

Note: The independent private values (IPV) assumption

The model assumes independent private values: each firm knows only its own cost, and knowing a rival’s cost would not change its own cost assessment. This is the standard assumption in structural auction models and is particularly well-suited to this setting.

For small, homogeneous pavement contracts — paving 8–9 city blocks with hydraulic concrete — there is little room for a “common value” component. The value of a pavement contract is largely determined by each firm’s own input costs and operational efficiency, not by uncertain qualities of the contract itself. Contrast this with oil lease auctions, where knowing a rival’s seismic estimates would update your own assessment of the reserve value — that is the classic common value case (Wilson, 1969; Milgrom and Weber, 1982). For pavement, each firm’s cost is largely idiosyncratic.

The IPV assumption also rules out ex-post regret about one’s own bid: if you win, you pay what you bid and deliver the project at your private cost. There is no “winner’s curse” in the traditional sense.

Equilibrium Characterization

At a Bayesian Nash Equilibrium, each firm chooses a bid \(b_{ji}\) to maximize its expected profit:

\[\pi_{ji} = (b_{ji} - c_{ji}) \cdot \Pr(\text{firm } i \text{ wins} \mid b_{ji})\]

In a first-price procurement auction, the firm submitting the lowest bid wins (the government awards to the cheapest supplier). Thus:

\[\Pr(i \text{ wins} \mid b_{ji}) = \Pr(\text{all rivals bid above } b_{ji})\]

At equilibrium, each firm uses a bidding strategy \(s_j(\cdot)\) — a strictly increasing function mapping its private cost into a bid — and the equilibrium strategies are mutually best responses. Several properties hold at equilibrium:

  1. Bidding above cost: \(b_{ji} > c_{ji}\) always, as firms shade their bids upward to earn a markup
  2. Monotonicity: \(s_j(\cdot)\) is strictly increasing, meaning lower-cost firms bid lower
  3. Type-specific strategies: because \(F_0 \neq F_1\), the equilibrium strategies \(s_0\) and \(s_1\) differ

The existence and uniqueness of such equilibria in asymmetric auctions is non-trivial and was established by Maskin and Riley (2003) under regularity conditions satisfied in this setting.

Differentiating the expected profit with respect to \(b_{1i}\) for Type 1 firms and \(b_{0i}\) for Type 0 firms yields the first-order conditions. For \(i = 1, \ldots, n_1\):

\[(b_{1i} - c_{1i})\left\{(n_1-1)\frac{f_1[s_1^{-1}(b_{1i})]}{1-F_1[s_1^{-1}(b_{1i})]}\frac{1}{s_1'[s_1^{-1}(b_{1i})]} + n_0\frac{f_0[s_0^{-1}(b_{1i})]}{1-F_0[s_0^{-1}(b_{1i})]}\frac{1}{s_0'[s_0^{-1}(b_{1i})]}\right\} = 1 \tag{1}\]

and for \(i = 1, \ldots, n_0\):

\[(b_{0i} - c_{0i})\left\{n_1\frac{f_1[s_1^{-1}(b_{0i})]}{1-F_1[s_1^{-1}(b_{0i})]}\frac{1}{s_1'[s_1^{-1}(b_{0i})]} + (n_0-1)\frac{f_0[s_0^{-1}(b_{0i})]}{1-F_0[s_0^{-1}(b_{0i})]}\frac{1}{s_0'[s_0^{-1}(b_{0i})]}\right\} = 1 \tag{2}\]

This system does not have a closed-form solution. The indirect approach of Flambard and Perrigne (2006) avoids having to solve it explicitly.
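Although the asymmetric system has no closed form, the symmetric special case \(F_0 = F_1\) does, and it provides a useful sanity check on the indirect approach. A minimal sketch, assuming costs uniform on \([0, 1]\) (an illustrative distribution, not the paper's data):

```python
import numpy as np

# Symmetric case: n bidders, costs c ~ U[0,1], lowest bid wins.
# The unique equilibrium strategy is s(c) = c + (1 - c)/n: the markup
# (1 - c)/n shrinks as competition n grows.
def equilibrium_bid(c, n):
    return c + (1.0 - c) / n

# Implied bid distribution on [1/n, 1]: G(b) = (n*b - 1)/(n - 1) and
# g(b) = n/(n - 1). Rewriting the first-order condition in terms of
# (G, g) recovers the cost exactly, without ever solving for s(.).
def invert_bid(b, n):
    G = (n * b - 1.0) / (n - 1.0)
    g = n / (n - 1.0)
    return b - (1.0 - G) / ((n - 1.0) * g)

for c in np.linspace(0.05, 0.95, 10):
    assert abs(invert_bid(equilibrium_bid(c, n=5), n=5) - c) < 1e-12
```

The same logic, applied to estimated rather than known bid distributions, is exactly what the indirect approach does in the asymmetric case.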

The GPV Inversion Formula

The core identification result is due to Guerre, Perrigne, and Vuong (2000), extended to asymmetric bidders by Flambard and Perrigne (2006). Using the one-to-one mapping between equilibrium strategies and bid distributions — \(G_j(b) = F_j(s_j^{-1}(b))\) and \(g_j(b) = f_j(s_j^{-1}(b))/s_j'(s_j^{-1}(b))\) — the system of first-order conditions can be rewritten directly in terms of the observable bid distributions \(G_j(\cdot|n_1,n_0)\) and \(g_j(\cdot|n_1,n_0)\), eliminating the need to solve for the equilibrium strategies \(s_j(\cdot)\).

For a Type 1 firm (\(i = 1, \ldots, n_1\)), the pseudo-cost is recovered by:

\[c_{1i} = b_{1i} - \frac{1}{(n_1 - 1)\,\dfrac{g_1(b_{1i}|n_1,n_0)}{1 - G_1(b_{1i}|n_1,n_0)} + n_0\,\dfrac{g_0(b_{1i}|n_1,n_0)}{1 - G_0(b_{1i}|n_1,n_0)}} \tag{3}\]

For a Type 0 firm (\(i = 1, \ldots, n_0\)), the pseudo-cost is:

\[c_{0i} = b_{0i} - \frac{1}{n_1\,\dfrac{g_1(b_{0i}|n_1,n_0)}{1 - G_1(b_{0i}|n_1,n_0)} + (n_0 - 1)\,\dfrac{g_0(b_{0i}|n_1,n_0)}{1 - G_0(b_{0i}|n_1,n_0)}} \tag{4}\]

Here \(G_j(\cdot|n_1,n_0)\) and \(g_j(\cdot|n_1,n_0)\) are the bid CDF and PDF of type \(j\) firms, conditional on the number of bidders of each type. Note that the two formulas are not symmetric: in equation (3) the rival type 0 terms enter with weight \(n_0\) (all type 0 rivals), while in equation (4) the type 0 hazard rate enters with weight \((n_0-1)\) (excluding firm \(i\) itself). Both formulas deliver pseudo-costs \(c_{ji}\) that converge to true costs under the model’s assumptions as the sample grows.
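Equations (3) and (4) map directly into code. A sketch, where `G1`, `g1`, `G0`, `g0` are placeholder callables standing in for the estimated conditional bid CDFs and densities (the function name is illustrative):

```python
def pseudo_cost(b, bidder_type, n1, n0, G1, g1, G0, g0):
    """Invert a bid into a pseudo-cost via eq. (3) (type 1) or eq. (4) (type 0)."""
    h1 = g1(b) / (1.0 - G1(b))  # hazard rate of type 1 bids at b
    h0 = g0(b) / (1.0 - G0(b))  # hazard rate of type 0 bids at b
    if bidder_type == 1:
        pressure = (n1 - 1) * h1 + n0 * h0   # own type enters with weight n1 - 1
    else:
        pressure = n1 * h1 + (n0 - 1) * h0   # own type enters with weight n0 - 1
    return b - 1.0 / pressure                # cost = bid - markup
```

With identical type distributions the two branches coincide with the single-type GPV formula, which gives a quick correctness check.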

Note: Intuition for the inversion formula

Equations (3) and (4) say: cost = bid minus markup. The markup is the term being subtracted — the reciprocal of the denominator. A firm bids above its cost, and the formula tells us exactly how much above, so we can back out the cost.

The denominator measures how many rivals are bidding just above your bid level. More precisely, each term \(g_j(b)/(1-G_j(b))\) is the hazard rate of type \(j\) rivals’ bids at \(b\): the conditional density of a type \(j\) rival bidding at \(b\), given that they bid at least \(b\). Multiplied by the number of rivals of that type and summed across both types, the denominator gives the total rate at which rivals are clustered just above \(b\). When that rate is high, raising your bid even slightly risks losing the contract, so the optimal markup is small. When rivals are sparse near \(b\), you can afford a larger markup. The optimal markup is the reciprocal of this competitive pressure.

Since the bid \(b_{ji}\) is directly observed, and \(G_j(\cdot)\) and \(g_j(\cdot)\) are estimated from the observed bids of all firms in the sample, every term on the right-hand side is either observable or estimable — which is why we can compute a cost estimate \(\hat{c}_{ji}\) for every bidder \(i\) of every type \(j\).

Nonparametric Estimation

The estimation proceeds in three steps:

Step 1: Estimate bid distributions. Consider \(L\) auctions indexed by \(\ell = 1, \ldots, L\). Let \(\mathbf{z}_\ell\) be a vector of covariates characterizing auction \(\ell\) (cubic meters of concrete, sewage dummy), observed by the econometrician. In auction \(\ell\), there are \(n_{1\ell}\) Type 1 bidders submitting bids \(\{b_{1p\ell}\}_{p=1}^{n_{1\ell}}\) and \(n_{0\ell}\) Type 0 bidders submitting bids \(\{b_{0q\ell}\}_{q=1}^{n_{0\ell}}\).

The bid CDFs are estimated by:

\[\hat{G}_1(b \mid \mathbf{z}, n_1, n_0) = \frac{\displaystyle\sum_{\substack{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0}} \frac{1}{n_1}\sum_{p=1}^{n_{1\ell}} \mathbf{1}(b_{1p\ell} \leq b)\; K_G\!\!\left(\frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zG}}\right)}{\displaystyle\sum_{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0} K_G\!\!\left(\frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zG}}\right)}\]

\[\hat{G}_0(b \mid \mathbf{z}, n_1, n_0) = \frac{\displaystyle\sum_{\substack{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0}} \frac{1}{n_0}\sum_{q=1}^{n_{0\ell}} \mathbf{1}(b_{0q\ell} \leq b)\; K_G\!\!\left(\frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zG}}\right)}{\displaystyle\sum_{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0} K_G\!\!\left(\frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zG}}\right)}\]

The corresponding bid densities are:

\[\hat{g}_1(b \mid \mathbf{z}, n_1, n_0) = \frac{\dfrac{1}{h_{1g}}\displaystyle\sum_{\substack{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0}} \frac{1}{n_1}\sum_{p=1}^{n_{1\ell}} K_g\!\!\left(\frac{b - b_{1p\ell}}{h_{1g}},\; \frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zg}}\right)}{\displaystyle\sum_{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0} K_g\!\!\left(\frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zg}}\right)}\]

\[\hat{g}_0(b \mid \mathbf{z}, n_1, n_0) = \frac{\dfrac{1}{h_{0g}}\displaystyle\sum_{\substack{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0}} \frac{1}{n_0}\sum_{q=1}^{n_{0\ell}} K_g\!\!\left(\frac{b - b_{0q\ell}}{h_{0g}},\; \frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zg}}\right)}{\displaystyle\sum_{\ell:\, n_{1\ell}=n_1,\, n_{0\ell}=n_0} K_g\!\!\left(\frac{\mathbf{z}-\mathbf{z}_\ell}{h_{zg}}\right)}\]

where \(K_G(\cdot)\) and \(K_g(\cdot)\) are kernel functions, and \(h_{zG}\), \(h_{zg}\), \(h_{1g}\), \(h_{0g}\) are optimal bandwidths. The key feature of these estimators is that the sums in the denominators run only over auctions with exactly \(n_{1\ell} = n_1\) and \(n_{0\ell} = n_0\), conditioning on the same competitive environment as the auction being evaluated. A triweight kernel \(K(u) = \tfrac{35}{32}(1-u^2)^3\,\mathbf{1}(|u|\leq 1)\) is used for the continuous variable (concrete volume), and the Aitchison-Aitken (1976) kernel for the sewage dummy.
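The CDF estimator can be sketched in a simplified form: continuous covariate only, assuming the auctions passed in have already been filtered to a common \((n_1, n_0)\) cell, and omitting the Aitchison-Aitken factor for the sewage dummy. Function and argument names are illustrative:

```python
import numpy as np

def triweight(u):
    """Triweight kernel K(u) = (35/32)(1 - u^2)^3 on |u| <= 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)

def G_hat(b, z, auction_bids, auction_z, h):
    """Kernel-weighted conditional bid CDF at (b, z).

    auction_bids: list of bid arrays, one per auction (pre-filtered to a
    common (n1, n0) cell); auction_z: continuous covariate per auction.
    """
    num = 0.0
    den = 0.0
    for bids_l, z_l in zip(auction_bids, auction_z):
        w = triweight((z - z_l) / h)          # weight auctions near z
        num += w * np.mean(bids_l <= b)       # within-auction empirical CDF
        den += w
    return num / den if den > 0 else np.nan
```

When all auctions share the same covariate value, the weights are equal and the estimator collapses to the pooled average of within-auction empirical CDFs, as expected.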

Bandwidth selection is a critical practical choice in kernel estimation. An overly wide bandwidth produces a smooth but biased estimate; an overly narrow bandwidth produces a noisy but unbiased estimate. The optimal bandwidth is chosen following Guerre, Perrigne, and Vuong (2000), with a correction for the triweight kernel following Härdle (1991).

A more serious problem is boundary bias: near the lower boundary of the bid support (\(\underline{b}\)), the kernel function extends below the support, causing the density estimator to underestimate the true density. Two strategies address this:

  1. Log transformation: bids are log-transformed before kernel estimation. Since bids are strictly positive, this maps \((0, \infty)\) onto \((-\infty, \infty)\), pushing the lower boundary away from the data and also reducing right-skewness.

  2. 10% boundary trimming: the 10% of observations at the tails of the bid distribution (combined across both ends) are excluded from the inversion step. This removes the region where boundary effects are most severe, at the cost of losing information about the tails. An alternative that avoids trimming altogether is the boundary-corrected kernel estimator proposed by Hickman and Hubbard (2015), which directly adjusts the kernel near the support boundary.
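The trimming rule is a one-liner. A sketch, assuming the 10% is split evenly, 5% per tail (the split is an assumption; the text specifies only the combined total):

```python
import numpy as np

def trim_tails(x, total_frac=0.10):
    """Drop total_frac of observations, split evenly across both tails."""
    lo, hi = np.quantile(x, [total_frac / 2.0, 1.0 - total_frac / 2.0])
    return x[(x >= lo) & (x <= hi)]
```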

Step 2: Recover pseudo-costs. Plugging the estimated distributions into the type-specific inversion formulas (3) and (4) yields the sample analogues. For each Type 1 bid \(b_{1p\ell}\) in auction \(\ell\):

\[\hat{c}_{1p\ell} = b_{1p\ell} - \frac{1}{(n_1-1)\,\dfrac{\hat{g}_1(b_{1p\ell}\mid\mathbf{z}_\ell, n_1, n_0)}{1 - \hat{G}_1(b_{1p\ell}\mid\mathbf{z}_\ell, n_1, n_0)} + n_0\,\dfrac{\hat{g}_0(b_{1p\ell}\mid\mathbf{z}_\ell, n_1, n_0)}{1 - \hat{G}_0(b_{1p\ell}\mid\mathbf{z}_\ell, n_1, n_0)}} \tag{5}\]

and for each Type 0 bid \(b_{0q\ell}\):

\[\hat{c}_{0q\ell} = b_{0q\ell} - \frac{1}{n_1\,\dfrac{\hat{g}_1(b_{0q\ell}\mid\mathbf{z}_\ell, n_1, n_0)}{1 - \hat{G}_1(b_{0q\ell}\mid\mathbf{z}_\ell, n_1, n_0)} + (n_0-1)\,\dfrac{\hat{g}_0(b_{0q\ell}\mid\mathbf{z}_\ell, n_1, n_0)}{1 - \hat{G}_0(b_{0q\ell}\mid\mathbf{z}_\ell, n_1, n_0)}} \tag{6}\]

These \(\hat{c}_{jp\ell}\) values are the pseudo-costs: not directly observed, but recovered by inverting the equilibrium condition using only observable bids and estimated distributions. Under the model’s assumptions, \(\hat{c}_{jp\ell} \to c_{jp\ell}\) as the sample grows.

Step 3: Estimate cost distributions. Use kernel density estimation on the recovered pseudo-costs \(\{\hat{c}_{ji}\}\) to obtain \(\hat{f}_j(\cdot)\). These cost distributions are the structural objects of interest.


Step 2: Stochastic Frontier Analysis

Efficiency in Production: The Conceptual Framework

Even if we could observe a firm’s actual cost directly, that cost tells us little about whether the firm is using its inputs efficiently. A firm may face high costs simply because wages or materials are expensive in its region; that is not inefficiency. Technical inefficiency refers to excess input use given the firm’s output and the prices it faces: a firm is technically inefficient if it uses more inputs than necessary to produce a given output.

The concept of a production frontier is that there exists a minimum cost achievable by the best-practice firm given its output level and input prices. The distance between a firm’s actual cost and the frontier cost is a measure of inefficiency. The challenge is that the frontier is unobserved and must be estimated from the data.

Ordinary least squares (OLS) on a cost function:

\[\ln C_i = \alpha + \boldsymbol{\beta}' \mathbf{x}_i + \varepsilon_i\]

estimates the average relationship between inputs and costs, not the frontier. The intercept \(\alpha\) absorbs both the frontier and the average inefficiency level, making it impossible to recover firm-level efficiency from residuals.
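This intercept problem is easy to see in a short simulation with made-up numbers (true frontier intercept 1.0, half-normal inefficiency with scale 0.3; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
eta = np.abs(rng.normal(0.0, 0.3, size=n))   # one-sided inefficiency, eta >= 0
noise = rng.normal(0.0, 0.1, size=n)         # symmetric noise
y = 1.0 + 0.5 * x + eta + noise              # true frontier intercept is 1.0

X = np.column_stack([np.ones(n), x])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ np.array([alpha_hat, beta_hat])

# alpha_hat converges to 1.0 + E[eta] = 1.0 + 0.3*sqrt(2/pi) ≈ 1.24:
# OLS shifts the whole frontier up by the average inefficiency, so the
# residuals are centered at zero and say nothing about firm-level eta.
# Their positive skewness, however, betrays the one-sided component,
# which is exactly the moment SFA exploits.
```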

The Stochastic Frontier Model

The SFA approach decomposes the cost deviation from the frontier into two components:

\[\ln C^a_i = \underbrace{\ln C^*(w_i, z_i)}_{\text{frontier cost}} + \underbrace{\eta_i}_{\substack{\text{technical} \\ \text{inefficiency}}} + \underbrace{\varepsilon_i}_{\substack{\text{noise}}}\]

where:

  • \(C^*(w_i, z_i)\) is the minimum achievable cost for a firm with output \(z_i\) (cubic meters of concrete) and input prices \(w_i\) (concrete, wages, machinery rental)
  • \(\eta_i \geq 0\) is the inefficiency term, the proportional excess cost due to overuse of inputs; \(e^{\eta_i}\) is the factor by which costs exceed the frontier
  • \(\varepsilon_i \sim \mathcal{N}(0, \sigma_\varepsilon^2)\) captures random noise, including idiosyncratic shocks, measurement error, and other factors outside the firm’s control

The inequality \(\eta_i \geq 0\) is what distinguishes SFA from standard regression: costs can only be at or above the frontier, never below. The key insight is that \(\eta_i\) and \(\varepsilon_i\) have different distributions and are therefore separately identified (in principle) from the skewness of the composite error \(\varepsilon_i + \eta_i\).

The Frontier Cost Function

The frontier \(\ln C^*(w_i, z_i)\) is approximated by a translog specification — the standard flexible functional form for cost functions, which nests the Cobb-Douglas as a special case. Using three inputs (\(w_1\) = daily wage, \(w_2\) = price of concrete, \(w_3\) = daily machinery rent) and imposing homogeneity of degree one in input prices by normalizing on \(w_1\), the estimated cost equation is:

\[\begin{aligned} \ln\!\left(\frac{\hat{c}_i}{w_{1,i}}\right) =\;& \beta_0 + \beta_z \ln z_i + \beta_2 \ln\!\left(\frac{w_{2,i}}{w_{1,i}}\right) + \beta_3 \ln\!\left(\frac{w_{3,i}}{w_{1,i}}\right) \\[4pt] &+ \tfrac{1}{2}\beta_{zz}(\ln z_i)^2 + \tfrac{1}{2}\beta_{22}\!\left[\ln\!\left(\frac{w_{2,i}}{w_{1,i}}\right)\right]^{\!2} + \tfrac{1}{2}\beta_{33}\!\left[\ln\!\left(\frac{w_{3,i}}{w_{1,i}}\right)\right]^{\!2} \\[4pt] &+ \tfrac{1}{2}\beta_{23}\ln\!\left(\frac{w_{2,i}}{w_{1,i}}\right)\ln\!\left(\frac{w_{3,i}}{w_{1,i}}\right) + \beta_{2z}\ln\!\left(\frac{w_{2,i}}{w_{1,i}}\right)\ln z_i + \beta_{3z}\ln\!\left(\frac{w_{3,i}}{w_{1,i}}\right)\ln z_i \\[4pt] &+ \beta_{K+1}\,\text{Controls}_i + \eta_i + \nu_i \end{aligned} \tag{11}\]

Symmetry requires \(\beta_{kk'} = \beta_{k'k}\). Homogeneity of degree one in input prices further requires \(\sum_k \beta_k = 1\), \(\sum_k \beta_{kk'} = 0\) for all \(k'\), and \(\sum_k \beta_{kz} = 0\) — restrictions imposed during estimation by the normalization on \(w_1\). Controls include state fixed effects and a sewage work dummy. The noise term \(\nu_i\) can be interpreted as coming from the error in pseudo-cost estimation.
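For concreteness, the regressor block of eq. (11) (controls aside) can be built as follows; the function name is illustrative, and the \(\tfrac{1}{2}\) factors follow the paper's convention:

```python
import numpy as np

def translog_X(z, w1, w2, w3):
    """Design matrix for eq. (11); normalizing on w1 imposes homogeneity."""
    lz = np.log(z)
    l2 = np.log(w2 / w1)   # ln(w2/w1)
    l3 = np.log(w3 / w1)   # ln(w3/w1)
    return np.column_stack([
        np.ones_like(lz),                          # beta_0
        lz, l2, l3,                                # first-order terms
        0.5 * lz**2, 0.5 * l2**2, 0.5 * l3**2,     # own second-order terms
        0.5 * l2 * l3,                             # cross price term
        l2 * lz, l3 * lz,                          # price-output interactions
    ])
```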

Distributional Assumptions on Inefficiency

The model is estimated under two alternative specifications for \(\eta_i\):

  • Half-normal: \(\eta_i \sim \lvert \mathcal{N}(0, \sigma_u^2) \rvert\), with a single parameter \(\sigma_u\). Restricted: mean inefficiency is \(\sigma_u \sqrt{2/\pi}\), a fixed function of the spread.
  • Truncated-normal: \(\eta_i \sim \mathcal{N}^+(\mu, \sigma_u^2)\), with parameters \(\mu\) and \(\sigma_u\). More general: \(\mu\) is freely estimated, and the half-normal is the \(\mu = 0\) special case.

The half-normal model is simpler and more commonly used. The truncated-normal model allows the modal inefficiency to differ from zero, which is important if most firms in the sample are systematically far from the frontier. Both are estimated and compared for robustness. The specification follows Kumbhakar, Wang, and Horncastle (2015).

Error term specifications. The statistical noise is normally distributed, \(\nu_i \sim N(0, \sigma_\nu^2)\), and the inefficiency term follows a truncated normal whose mean depends on observable firm characteristics \(W\) (firm type, election-year indicator, and municipal–state party alignment):

\[\eta_i \sim N^+\!\bigl(\mu(W),\, \sigma_\eta^2\bigr)\]

The composite error is \(\epsilon_i = \eta_i + \nu_i\).

Likelihood function. Given these parametric assumptions, the log-likelihood for observation \(i\) is (equation 12):

\[L_i = -\frac{1}{2}\ln(\sigma_\eta^2 + \sigma_\nu^2) + \ln\phi\!\left(\frac{\epsilon_i - \mu(W)}{\sqrt{\sigma_\eta^2 + \sigma_\nu^2}}\right) + \ln\Phi\!\left(\frac{\mu_{*i}}{\sigma_*}\right) - \ln\Phi\!\left(\frac{\mu(W)}{\sigma_\eta}\right)\]

where the auxiliary quantities are:

\[\mu_{*i} = \frac{\sigma_\nu^2\,\mu(W) + \sigma_\eta^2\,\epsilon_i}{\sigma_\nu^2 + \sigma_\eta^2}, \qquad \sigma_*^2 = \frac{\sigma_\nu^2\,\sigma_\eta^2}{\sigma_\nu^2 + \sigma_\eta^2}\]
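Given these auxiliary quantities, the per-observation likelihood can be sketched directly; this follows the standard truncated-normal cost-frontier form in Kumbhakar, Wang, and Horncastle (2015), and reduces to the half-normal likelihood when \(\mu = 0\):

```python
import numpy as np
from scipy.stats import norm

def loglik_obs(eps, mu, s_eta, s_nu):
    """Log-likelihood of one observation, truncated-normal cost SFA."""
    s2 = s_eta**2 + s_nu**2
    mu_star = (s_nu**2 * mu + s_eta**2 * eps) / s2
    sig_star = np.sqrt(s_eta**2 * s_nu**2 / s2)
    return (-0.5 * np.log(s2)                          # -ln(sigma)
            + norm.logpdf((eps - mu) / np.sqrt(s2))    # density of eps
            + norm.logcdf(mu_star / sig_star)          # prob. eta >= 0 given eps
            - norm.logcdf(mu / s_eta))                 # truncation normalization
```

Summing `loglik_obs` over observations, with \(\mu(W)\) parameterized as a linear index in the firm characteristics, gives the objective passed to a numerical maximizer.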

Efficiency recovery. Once the model is estimated by MLE, the firm’s excess use of inputs is approximated by \(E[\eta_i \mid \epsilon_i]\), and the efficiency index by \(E[\exp(-\eta_i) \mid \epsilon_i]\), both evaluated at the estimated parameters. A full derivation is provided in the paper’s Appendix (subsection 6.3).
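Both conditional expectations have closed forms for the truncated-normal model (Battese and Coelli, 1988). A sketch in the same notation, using the fact that \(\eta_i \mid \epsilon_i \sim \mathcal{N}^+(\mu_{*i}, \sigma_*^2)\):

```python
import numpy as np
from scipy.stats import norm

def jlms_ineff(eps, mu, s_eta, s_nu):
    """E[eta | eps]: point estimate of a firm's excess (log) cost."""
    s2 = s_eta**2 + s_nu**2
    mu_star = (s_nu**2 * mu + s_eta**2 * eps) / s2
    sig_star = np.sqrt(s_eta**2 * s_nu**2 / s2)
    z = mu_star / sig_star
    return mu_star + sig_star * norm.pdf(z) / norm.cdf(z)

def bc_index(eps, mu, s_eta, s_nu):
    """E[exp(-eta) | eps]: the Battese-Coelli efficiency index in (0, 1)."""
    s2 = s_eta**2 + s_nu**2
    mu_star = (s_nu**2 * mu + s_eta**2 * eps) / s2
    sig_star = np.sqrt(s_eta**2 * s_nu**2 / s2)
    z = mu_star / sig_star
    return (np.exp(-mu_star + 0.5 * sig_star**2)
            * norm.cdf(z - sig_star) / norm.cdf(z))
```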

The Efficiency Index

The Battese and Coelli efficiency index is reported as the primary result:

\[BC_i = e^{-\hat{\eta}_i} \in (0, 1]\]

Interpretation: \(BC_i = 1\) means the firm is on the cost frontier — its actual cost equals the minimum achievable cost. \(BC_i = 0.60\) means that the minimum cost is 60% of the firm’s actual cost, so the firm could reduce its costs by 40% if it operated at the frontier. Values closer to 1 indicate less inefficiency.

The paper reports mean \(BC\) by firm type, testing whether Type 1 (government-selected) and Type 0 (public-auction-only) firms have statistically different efficiency levels.


Connecting the Two Steps

The Methodological Contribution

The key innovation is using pseudo-costs from Step 1 as the dependent variable in Step 2.

Normally, SFA requires actual cost data, such as accounting records or cost reports. In procurement contexts, such data is almost never available: firms do not disclose their project-level costs, and governments typically record only contract values (bids), not actual expenditures. This has historically limited the joint application of auction theory and efficiency analysis.

By recovering pseudo-costs \(\hat{c}_{ji}\) from the structural auction model, we obtain a proxy for \(C^a_i\) that can be plugged into the SFA framework. The model then estimates:

\[\ln \hat{c}_{ji} = \ln C^*(w_{ji}, z_{ji}) + \eta_{ji} + \varepsilon_{ji}\]

treating pseudo-costs as the observable cost measure.

Validity and Potential Concerns

This approach is valid under two conditions:

  1. Consistency of pseudo-costs: The GPV inversion formula recovers true costs consistently as the sample size grows. In finite samples, kernel estimation introduces bias, particularly near the boundaries of the bid support. The boundary trimming and log transformation discussed above mitigate, but do not eliminate, this concern.

  2. Classical measurement error: If pseudo-costs differ from true costs by a classical measurement error (mean zero, independent of explanatory variables), the SFA estimates remain consistent. Non-classical measurement error, i.e., systematic bias correlated with firm type, would bias the efficiency comparison.

Important: What could go wrong

The most important potential concern is that pseudo-costs may be systematically biased for one firm type. If, for example, Type 1 firms submit bids in a thinner part of the distribution where kernel estimation is less precise, their pseudo-costs would have larger measurement error than Type 0. This would artificially inflate the variance of their efficiency distribution. The robustness checks using both half-normal and truncated-normal specifications, and the trimming of extreme observations, are partly designed to address this concern.

Another concern is model misspecification at the auction level: if firms do not play Bayesian Nash Equilibrium, perhaps due to collusion or bounded rationality, the inversion formula does not recover true costs. The paper addresses this indirectly by examining the KS tests for distributional equality and the reduced-form bid regressions as consistency checks.

Why Not Just Compare Bids?

A natural question is: why not simply compare bids between Type 0 and Type 1 firms rather than going through the two-step procedure?

The answer is that bids confound costs and strategy. Type 1 firms bid 8.5% higher on average in the same auctions — but this could reflect: (a) higher costs, (b) lower competitive pressure (fewer rivals of their type), or (c) anticipation of future contract awards that reduces their incentive to bid aggressively. Only by recovering pseudo-costs can we separate the cost component from the strategic markup. And only by applying SFA to pseudo-costs can we further separate genuine cost efficiency from random cost variation.


Summary of Modeling Choices

  • Value model: independent private values. Homogeneous projects; no common uncertainty.
  • Bidder types: two (government-selected vs. public-only). Justified by KS test and institutional analysis.
  • Identification: GPV nonparametric inversion. No parametric assumption on the cost distribution.
  • Asymmetry: Flambard-Perrigne extension. Type-specific bid distributions differ (KS test).
  • Kernel: triweight, log-transformed bids. Bounded support; reduces boundary bias.
  • Trimming: 10% total (combined tails). Reduces kernel boundary effects.
  • Cost frontier: translog log-cost function. Standard flexible form for cost functions.
  • Inefficiency distribution: half-normal and truncated-normal. Robustness check; half-normal is the baseline.
  • Efficiency index: Battese-Coelli \(BC = e^{-\hat{\eta}}\). Standard in the SFA literature.