Most warehouse operations that use external labor benchmarks are using WERC data. The Warehousing Education and Research Council publishes performance standards that the industry has relied on for years. If you’ve used them, you’re in good company.
Here’s the problem: WERC benchmarks are based on a survey, not observed operational data. A survey. With roughly 200 respondents.
Two hundred people, self-reporting their performance metrics, across an industry that processes billions of orders a year. That’s the ruler much of the warehouse industry uses to measure itself.
Why It Matters
Survey-based benchmarks have three structural problems that no amount of careful survey design can eliminate.
First, self-reported data skews optimistic. Operations tend to report how they perform on a good day, how they performed before a difficult period, or what they aspire to. Observed scan data from actual transactions doesn’t have this problem.
Second, averages obscure distribution. A benchmark for average orders-per-hour in batch picking doesn’t tell you how much that number varies by order complexity, facility size, industry vertical, or pick method. An average built from 200 self-reports across all of those variables tells you very little about how your specific operation should perform.
Third, survey benchmarks are static. They’re published periodically and represent a point-in-time snapshot. They don’t update as your operation changes or as industry performance evolves.
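The second problem, averages obscuring distribution, is easy to see with a few numbers. The sketch below uses invented orders-per-hour figures for two hypothetical order types (they are not WERC or Deposco data) and compares one blended average against per-segment averages.

```python
from statistics import mean

# Hypothetical orders-per-hour observations, invented purely for illustration.
rates_by_segment = {
    "single-line": [62, 58, 65, 60],
    "multiline, multi-quantity": [24, 28, 22, 26],
}


def blended_average(segments):
    """Average every observation together, ignoring its segment."""
    all_rates = [rate for rates in segments.values() for rate in rates]
    return mean(all_rates)


def segment_averages(segments):
    """Average each segment on its own."""
    return {name: mean(rates) for name, rates in segments.items()}


blended = blended_average(rates_by_segment)
per_segment = segment_averages(rates_by_segment)

print(f"blended average: {blended:.1f}")
for name, avg in per_segment.items():
    print(f"{name}: {avg:.1f}")
```

In this made-up data the blended figure lands between the two segments and describes neither of them: an operation that ships mostly multiline orders would look far behind it, and one that ships mostly single-line orders would look far ahead.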
What a Better Benchmark Looks Like
The alternative is platform data: actual scan-level transaction data from operations doing the same work you do, processed by AI that can slice it down to the pick method and order type that match yours.
Deposco’s Labor Intelligence benchmarks are built from 60 million tracked labor hours across 5,500+ brands. The benchmark for batch picking isn’t a survey average — it’s the actual 90th percentile performance of workers doing batch picking of the same order types on the same platform. AI models those relationships at a granularity no survey could match: your batch pick performance for multiline, multi-quantity orders is compared to the platform’s batch pick performance for multiline, multi-quantity orders specifically. Not a blended average. The right comparison for your exact situation.
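To make the idea of a segment-specific 90th percentile concrete, here is a minimal sketch. It is not Deposco’s implementation: the record fields (`pick_method`, `order_type`, `units_per_hour`), the sample rates, and the nearest-rank percentile method are all assumptions chosen for illustration.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def make_records(pick_method, order_type, rates):
    """Build hypothetical labor records for one segment."""
    return [
        {"pick_method": pick_method, "order_type": order_type, "units_per_hour": r}
        for r in rates
    ]


def benchmark_for(records, pick_method, order_type, pct=90):
    """Percentile of the rates that match one pick method and one order type."""
    matching = [
        rec["units_per_hour"]
        for rec in records
        if rec["pick_method"] == pick_method and rec["order_type"] == order_type
    ]
    if not matching:
        raise ValueError("no records for this segment")
    return percentile_nearest_rank(matching, pct)


# Invented sample data: two segments of the same pick method.
records = (
    make_records("batch", "multiline_multi_qty", [20, 22, 24, 25, 26, 27, 28, 30, 32, 35])
    + make_records("batch", "single_line", [55, 60, 58, 62, 65])
)

own_rate = 26  # hypothetical rate for your own multiline batch picking
benchmark = benchmark_for(records, "batch", "multiline_multi_qty")
gap = benchmark - own_rate
print(f"benchmark: {benchmark}, gap: {gap}")
```

The filter is the point: the single-line records never enter the multiline benchmark, so the comparison stays like for like.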
And unlike a static survey, the benchmark updates continuously as new data accumulates. You’re always comparing yourself to how your actual peers perform today, not how a sample of them said they performed a year ago.
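One simple way to picture a benchmark that keeps moving with new data is a sliding window over the most recent observations. The source does not say how the platform refreshes its benchmarks, so the window approach, the `RollingBenchmark` class, and the numbers below are assumptions for illustration only.

```python
from collections import deque


class RollingBenchmark:
    """Keeps the most recent observations and reports their percentile."""

    def __init__(self, window_size, pct=90):
        # A deque with maxlen discards the oldest rate when a new one arrives.
        self.rates = deque(maxlen=window_size)
        self.pct = pct

    def add(self, rate):
        self.rates.append(rate)

    def value(self):
        if not self.rates:
            raise ValueError("no observations yet")
        ordered = sorted(self.rates)
        # Nearest-rank percentile using an integer ceiling.
        rank = (self.pct * len(ordered) + 99) // 100
        return ordered[max(rank, 1) - 1]


bench = RollingBenchmark(window_size=5)
for rate in [10, 20, 30, 40, 50]:  # hypothetical rates
    bench.add(rate)
before = bench.value()

bench.add(60)  # a newer observation pushes the oldest one (10) out
after = bench.value()
print(before, after)
```

A published survey figure would stay at its original value until the next edition, while the windowed figure shifts as soon as the newer rate arrives.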
The Practical Implication
If you’re using WERC benchmarks today, you’re not doing anything wrong. They’re the best widely available option for most of the industry. But you should understand what they are — and what they aren’t.
They’re a starting point. They’re not a precise standard for your specific operation running your specific order mix with your specific pick flows. And they don’t have the AI layer that can tell you not just how you compare, but what to do about the gap.
Two hundred survey respondents shouldn’t be setting the bar for your operation. Your actual peers on the platform — and the AI that reasons across their performance — should be.
WERC benchmarks are a fine starting point — but they’re a point-in-time survey, not your operation. Labor Intelligence benchmarks you against 60 million tracked labor hours of real scan data from peers doing your exact order types, and Felix tells you not just where the gap is, but what to do about it.