Hyperion Daily Production Model Changes

Written by Ed Muir | Nov 21, 2025 3:34:27 PM

 

Executive Summary

We've significantly enhanced our natural gas production forecasting methodology with a new three-facet hybrid approach that delivers more accurate daily production estimates across all major producing regions.

What's Changed:

Our previous methodology relied on simpler relationships between pipeline measurements and production and is still used in most subregions. For several subregions, the new framework introduces:

  1. Statistical correction for reporting delays - Accounts for the fact that state production reports start incomplete and are revised upward over time.
  2. More advanced modeling techniques - Automatically identifies the counties where state-reported production is not reflected in interstate pipe scrape data and excludes them from our regression.
  3. Intelligent hybrid coverage - Where counties have poor interstate pipeline coverage, we substitute our short-term forecast for the pipe scrape model to ensure we capture production.

The Bottom Line:

In the areas where it is used, the new methodology produces materially different production estimates compared to our previous approach.

Production Changes by Region

We have updated regression coefficients in all subregions. Of these subregions, South Texas, West Texas, and Northeast Pennsylvania now use the more advanced regression model, as we saw significant improvement in the fit between state-reported and modeled production in those regions. (Note that a small change to the STF yesterday makes some of these changes slightly different from those shown, as the charts were generated before the STF change.)

South Texas:

The most affected region is South Texas, where a combination of long reporting lags and few correlated interstate pipelines led to significant underreporting in the previous model. Our new model increases our production estimate by around 0.4 Bcf/d from May 2025.

West Texas:

West Texas also suffers from long reporting lags. When these reporting lags are corrected for, our production estimates increase moderately from June 2025. We use our new county-based regression model here to account for counties such as Crockett and Ward, which are poorly covered by interstate pipe flows.

Northeast Pennsylvania:

While the majority of changes in Northeast Pennsylvania reflect updated state production data, our new county-based regression model shows significant improvement in the backtested fit. This enhanced accuracy should carry forward into future production estimates.

Lower 48:

Looking at the Lower 48 as a whole, the difference is about 500 MMcf/d recently, though this varies greatly. This is because the changes are primarily localized to Texas, which saw only a moderate increase in our production estimates. The changes prior to 2025 predominantly stem from state data updates.

Introduction

Accurate, timely natural gas production estimates are essential for market participants, pipeline operators, and energy analysts. However, the industry faces a persistent challenge: state production data is comprehensive but slow, while pipeline measurements are fast but incomplete.

State regulatory agencies collect detailed production data from operators, but this information suffers from significant reporting lags—up to 12-14 months after the production month ends for Texas. By the time official state production numbers are received, field conditions may have shifted dramatically.

Interstate pipeline systems provide near-real-time flow schedules, but they only capture gas moving through their specific infrastructure. Production that never reaches an interstate pipeline, such as gas moved on the many major intrastate pipelines in Texas, remains invisible to these measurements.

The fundamental question we set out to answer: How can we improve our current pipe-scrape-derived production model to account for the shortcomings of the data collection process?

This article describes our solution: a new three-facet hybrid regression framework that addresses these challenges through a combination of corrections to the state reported production, an enhanced regression model, and substitution of the short term forecast for our pipe scrape model in specific locations.

Our Solution: A Three-Facet Hybrid Approach

Our new methodology breaks the production forecasting problem into three complementary facets:

  1. Report Rate Correction: Correct historical state production data for systematic underreporting, giving us more accurate training data.
  2. More advanced modeling techniques: Use machine learning to learn the relationship between county-level pipeline measurements and total subregional production.
  3. Hybrid Coverage: Integrate short-term forecasts for counties with very poor pipeline monitoring to ensure complete production coverage.

Each facet addresses a specific limitation of traditional approaches, and together they enable us to produce daily production estimates that align with eventually-reported state values while maintaining real-time responsiveness to actual pipeline flows.

Let's dive into each facet.

Facet 1: Correcting for Reporting Delays

The Problem

When state agencies first publish production data for a given month, those numbers are incomplete. Early reports may only capture a fraction of actual production because operators file their data at different times—some submit promptly, others take weeks or months. Over time, as late-filing operators submit their data, state agencies revise the production numbers upward.

For example, in West Texas, we observed that state-reported production was still being adjusted upward for production months 12-18 months before the current report date. (Note that the progressively brighter lines are from later reporting times.)

If a regression model is trained on these incomplete early reports without correction, it learns to underestimate production. So models are trained only on complete data, but as a consequence they are very unresponsive to recent changes.

Our Approach

We developed a statistical model that captures how production numbers are revised over time. By analyzing historical revisions, we can estimate what the "true" production should be in the present given historical revision rates.

The core insight: Production reporting across Texas follows a fairly consistent exponential approach to completeness. The production for the West Texas region for the month of October 2024 as a percentage of the latest reported production is shown below for successive state reports. The production continues to be revised upwards. 

Correcting the state data using our exponential approach model, we can plot an estimate of the final reported production. The solid lines are our estimates for the true production given the historical reporting rate, dashed lines are the original state reported production. Notice that the estimated production lines converge upon a single, corrected, production value per month. Though the estimated numbers still change substantially over time, their variation is similar to state reported production variation looking back 12-14 months, giving us a more-responsive estimate of production.
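The correction described above can be sketched in a few lines. This is a minimal illustration, not the production model: it assumes completeness follows c(t) = 1 - a * exp(-k * t), where t is the age of the report in months, and the function names, parameter names, and revision history below are all hypothetical.

```python
import math

def fit_completeness(ages, fractions):
    """Fit c(t) = 1 - a * exp(-k * t) via least squares on ln(1 - c) = ln(a) - k * t."""
    pts = [(t, math.log(1.0 - c)) for t, c in zip(ages, fractions) if c < 1.0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope  # (a, k)

def estimate_final(reported, age, a, k):
    """Scale a partially reported volume up to its estimated final value."""
    completeness = 1.0 - a * math.exp(-k * age)
    return reported / completeness

# Synthetic revision history (assumed): fraction of the final value
# that had been reported at each report age, in months.
ages = [2, 4, 6, 8, 10, 12]
fractions = [1.0 - 0.4 * math.exp(-0.25 * t) for t in ages]

a, k = fit_completeness(ages, fractions)
# A month reported at 5.0 Bcf/d after 3 months is scaled up to its estimated final value.
corrected = estimate_final(reported=5.0, age=3, a=a, k=k)
```

Dividing by the fitted completeness is what lets recent, still-incomplete months be used as training data instead of being discarded.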

Why This Matters

Corrected historical data lets us regress our pipeline model on more recent state production values. The relationship between pipeline flows and state production can change over time, so having more up-to-date data points to regress on significantly improves the regression's accuracy. This correction is essential in Texas, where there are both significant reporting lags and significant changes in production. We use this approach for every subregion in TX.

Facet 2: County-Level Pipeline Regression

The Spatial Challenge

The simple approach of summing all pipeline measurements to predict total sub-regional production works well when most production reaches pipelines within the subregion. However, in some regions with poor pipeline coverage, production changes in certain counties aren't captured by changes in pipeline flows. To address this, we use a model to identify and weight the counties whose pipeline measurements correlate most strongly with the state total, effectively discounting counties with poor correlation. For convenience, we aggregate at the county level.

Machine Learning with Lasso Regression

We use Lasso regression (Least Absolute Shrinkage and Selection Operator), a machine learning technique particularly well-suited to this problem. Unlike standard regression, Lasso performs automatic feature selection by shrinking the coefficients of less important predictors to exactly zero.

Here's what this means in practice: We feed the model pipeline measurements from all the counties in a sub-region. Lasso automatically figures out which counties have the strongest, most reliable relationships with total production. Counties with weak or noisy pipeline signals get assigned zero weight and effectively drop out of the model.
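To make the "exactly zero" behavior concrete, here is a small pure-Python sketch of Lasso fitted by coordinate descent. It is an illustration only: the data is synthetic, the two "counties" are hypothetical, the inputs are assumed to be already centered, and in practice a library implementation (for example scikit-learn's `Lasso`) would typically be used instead of hand-written code.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * w[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Synthetic, centered monthly data (assumed): column 0 tracks production,
# column 1 is a noisy county whose flows carry almost no signal.
X = [[-2.5, 0.3], [-1.5, -0.4], [-0.5, 0.2], [0.5, -0.1], [1.5, 0.1], [2.5, -0.1]]
y = [-5.0, -3.1, -0.9, 1.0, 3.1, 4.9]

lasso_weights = lasso_fit(X, y, alpha=0.1)  # noisy county is dropped
ols_weights = lasso_fit(X, y, alpha=0.0)    # no penalty: noisy county keeps a weight
```

With the penalty switched on, the noisy column receives a weight of exactly zero, while the unpenalized fit still assigns it a nonzero weight that mostly reflects noise.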

The technical approach:

  1. Data preparation: For each month, we have corrected state production (from Facet 1) and county-level pipeline measurements.
  2. Standardization: We normalize all values to have comparable scales, theoretically preventing counties with larger absolute flows from dominating (though in practice counties with larger flows often dominate because they represent so much of the result).
  3. Time series cross-validation: We train the model on progressively more historical data and validate on future periods (never training on future data to predict the past).
  4. Automatic regularization: The model selects the optimal level of sparsity through cross-validation.
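Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the number of splits, and the window sizes are hypothetical, and scaling statistics are taken from the training window only so that no future information leaks into the fit.

```python
import statistics

def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs where training always precedes validation."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for s in range(n_splits):
        test_start = first_test_start + s * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx

def standardize(train_values, values):
    """Scale values using the mean and standard deviation of the training window only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against constant series
    return [(v - mean) / std for v in values]

# 24 months of data, validated on three successive 4-month windows (assumed sizes).
splits = list(expanding_window_splits(n_samples=24, n_splits=3, test_size=4))
scaled = standardize([1.0, 2.0, 3.0], [4.0])
```

For step 4, one common pattern is to fit the model at several penalty strengths on each training window and keep the strength with the lowest average validation error.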

South Texas illustrates how Lasso regression addresses sparse pipeline coverage. The first chart shows the challenge: plotting total pipeline gas against state-reported production reveals that (A) pipeline measurements capture only a fraction of actual production (~1-3 Bcf/d versus 5.5-7.5 Bcf/d), and (B) the simple linear relationship is weak (R²=0.35).

The second chart shows the Lasso regression results. Our model achieves much stronger predictive power (R²=0.89) by selectively weighting counties whose pipeline data correlates with state totals. Note that the state-reported values in this chart are slightly lower than in the first chart because we've excluded certain counties (including DeWitt, McMullen, and Lavaca) from the regression. In these counties, production changes aren't visible in interstate pipeline measurements, so we replace them with short-term forecasts from Facet 3 instead.
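For readers less familiar with the R² figures quoted above, the metric can be computed as below. The numbers in the example are invented for illustration and are not taken from the charts.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical monthly values in Bcf/d: state-reported vs. modeled production.
state_reported = [6.0, 6.5, 7.0, 7.5]
modeled = [6.1, 6.4, 7.1, 7.4]
fit_quality = r_squared(state_reported, modeled)
```

A value near 1 means the model explains most of the month-to-month variation in state-reported production; a value near 0 means it does little better than predicting the average.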

Why Lasso Over Alternatives?

We considered several approaches:

  • Standard linear regression: Can't handle the fact that many counties contribute little signal, which leads to overfitting
  • Ridge regression: Shrinks coefficients but never sets them to exactly zero, making interpretation harder
  • Manual county selection: Requires expert judgment for each sub-region and doesn't adapt as infrastructure changes

Lasso gives us the best of both worlds: automatic, data-driven feature selection with interpretable results (specific counties and their weights). We use this for the South TX and West TX subregions.

Facet 3: Filling the Coverage Gaps

The Hybrid Strategy

Even with sophisticated regression, we face a fundamental limitation: some of the most productive counties have no production that ever hits an interstate pipeline. If we only use the regression model, we'd systematically underestimate total production by missing these counties. We need a way to account for production in counties without pipeline visibility.

Below, the median ratio of county pipe flow to county production is plotted per sub-region for the Lower 48. Ideally, we'd want 1.0 for every county. For some states, such as Ohio, far more gas flows through interstate pipes than is produced in the region, due to gas crossing from PA (which is handled by the county-level regression described in Facet 2). For other states, such as Texas, interstate pipeline flows are only a small percentage (~20-30%) of overall state production, and even less in some subregions. For some counties within Texas, such as DeWitt and its surrounding Eagle Ford counties, there are no interstate pipe flows, and the flows for the surrounding counties appear to be entirely insensitive to changes in production within the county.
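The coverage ratio described above is straightforward to compute. The sketch below is illustrative only: the row layout, subregion labels, county labels, and volumes are all hypothetical.

```python
import statistics
from collections import defaultdict

def median_coverage_by_subregion(rows):
    """rows: (subregion, county, pipe_flow, state_production) tuples."""
    ratios = defaultdict(list)
    for subregion, _county, pipe_flow, production in rows:
        if production > 0:  # skip counties with no reported production
            ratios[subregion].append(pipe_flow / production)
    return {sub: statistics.median(vals) for sub, vals in ratios.items()}

# Hypothetical county volumes in Bcf/d.
rows = [
    ("Sub X", "c1", 0.2, 1.0),
    ("Sub X", "c2", 0.0, 2.0),  # no interstate flows at all
    ("Sub X", "c3", 0.6, 2.0),
    ("Sub Y", "c4", 3.0, 2.0),  # more gas flows through than is produced
    ("Sub Y", "c5", 1.0, 1.0),
]
coverage = median_coverage_by_subregion(rows)
```

A median well below 1.0 flags a subregion where pipeline data sees only a small share of production; a median above 1.0 indicates gas passing through from elsewhere.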

Integration with Short-Term Forecasts

Our solution is to use a hybrid approach:

  • Counties with at least minimal pipeline coverage: Use the regression model (Facet 2) to estimate production based on real-time measurements
  • Counties without pipeline coverage: Use short-term forecast (STF) estimates based on observed completions and well-level decline curves.

We leverage real-time data where it's available and supplement with completion-based forecasts where it's not. The regression model learns to predict production from monitored counties, while STF fills in the unmonitored counties.

The technical implementation:

During model training, we explicitly exclude certain counties from “both sides” of the regression - the known production, and the pipeline data “input”. The model learns the relationship between pipeline flows and total production minus these excluded counties. During prediction, we:

  1. Apply the regression to current pipeline measurements (monitored counties)
  2. Add the STF contribution from excluded counties
  3. Sum to get total sub-regional production
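The training and prediction steps above can be sketched as follows. This is a simplified illustration, assuming a linear model with per-county weights; the function names, weights, intercept, and volumes are hypothetical, and only the DeWitt exclusion is taken from the text.

```python
def training_target(state_by_county, excluded):
    """State production summed over the counties that stay in the regression."""
    return sum(v for county, v in state_by_county.items() if county not in excluded)

def hybrid_estimate(pipe_flows, weights, intercept, stf_by_county, excluded):
    """Regression on monitored counties plus STF for excluded counties."""
    regression_part = intercept + sum(
        weights.get(county, 0.0) * flow
        for county, flow in pipe_flows.items()
        if county not in excluded
    )
    stf_part = sum(stf_by_county[county] for county in excluded)
    return regression_part + stf_part

excluded = {"DeWitt"}

# Training: the excluded county is removed from the known-production side.
target = training_target({"County A": 2.0, "County B": 1.0, "DeWitt": 1.2}, excluded)

# Prediction: regression on monitored counties, STF for the excluded one (Bcf/d, assumed).
total = hybrid_estimate(
    pipe_flows={"County A": 1.0, "County B": 0.5, "DeWitt": 0.0},
    weights={"County A": 1.5, "County B": 2.0},
    intercept=0.5,
    stf_by_county={"DeWitt": 1.2},
    excluded=excluded,
)
```

Excluding the same counties from both the target and the inputs keeps the regression from being asked to explain production it has no way of seeing.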

Why This Works

This hybrid approach is superior to alternatives:

  • Using STF for all counties: Would ignore valuable real-time pipeline signals
  • Extrapolating regression to all counties: Would risk overfitting to noise in counties without true coverage
  • Ignoring uncovered counties: Would systematically underestimate production

By combining observational data (pipeline) with model-based estimates (STF) in a principled way, we get both accuracy and completeness.

One disadvantage of this approach is that we lose the difference between STF and production model, which can detect wells being withheld. But, given that this will only be used selectively, in subregions with truly awful coverage, it's a worthwhile tradeoff.  Finally, we use this facet only in South TX, West TX, and Haynesville TX currently.

This Adds up to Improved Accuracy

Compared to simpler approaches (e.g., fixed scaling factors between pipeline flows and production), our three-facet framework delivers:

  • Better alignment with eventually-reported state values through report rate correction
  • More robust predictions through automatic feature selection and outlier handling
  • Complete coverage by integrating both pipeline and forecast data sources

Conclusion

Natural gas production monitoring requires balancing multiple imperfect data sources. State reports are complete but slow. Pipeline measurements are fast but incomplete. Short-term forecasts provide coverage but lack real-time signals.

Our new three-facet hybrid framework synthesizes these sources intelligently:

  1. Correct historical data for reporting delays to build accurate training datasets
  2. Learn county-specific relationships between pipeline flows and production using machine learning
  3. Combine regression and forecasts to ensure complete geographical coverage

The result is a production monitoring system that delivers daily estimates with the accuracy of eventually-reported state values and the timeliness of real-time pipeline data.

For natural gas market participants, pipeline operators, and analysts requiring up-to-date production intelligence, this methodology provides a significant improvement over traditional approaches. As we continue to refine and enhance the framework, we expect accuracy and responsiveness to improve further.

Questions?

We're always interested in discussing our methodology with clients and partners. If you have questions about how these models work, how to access the data, or how they might apply to your specific use case, please contact support@synmax.com.

Note: This article describes production monitoring methodology as of November 2025. Specific model parameters, training procedures, and validation metrics may evolve as we continue to enhance the framework.

Appendix

The table below shows subregions that use at least one of the three facets described above. All other subregions use a simpler regression model based on total pipeline measurements, with coefficients updated using the latest state-reported production data.

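Subregion        | County level regression | STF replacement for insensitive counties | Report rate correction applied
-----------------|-------------------------|------------------------------------------|-------------------------------
AL               | ✔️                      |                                          |
Central - TX     | ✔️                      |                                          |
Haynesville - TX |                         | ✔️                                       | ✔️
KS               | ✔️                      |                                          |
NE PA            | ✔️                      |                                          |
N LA             | ✔️                      |                                          |
North - TX       | ✔️                      |                                          |
South - TX       | ✔️                      | ✔️                                       | ✔️
SW PA            | ✔️                      |                                          |
West - TX        | ✔️                      | ✔️                                       | ✔️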