I recently conducted a statistical and econometric analysis of U.S. wine production and consumption and assessed the effects of natural disasters on these variables.

Nerd Alert! I’m sharing my recent research on wine production and the effects of natural disasters and extreme weather conditions over the past twenty years. My goals were to identify the natural pattern and see how it was disturbed by California droughts and fires. The data I used came from the Wine Institute and the Alcohol and Tobacco Tax and Trade Bureau, which report consumption and production at the national level. Bottom Line Up Front: at the national level, a change in production and consumption is not evident with statistical significance. With access to California data, I feel confident that effects on production and consumption would be evident. If I haven’t lost you yet or you’re a math geek like me…here’s an overview of my report:

U.S. wine production seasons are somewhat predictable. Grape growth begins in the spring and extends into the fall. The grape harvest occurs over a short period in the fall and leads directly into wine production. The timeline for wine production varies by varietal and type: it can be complete in as little as a few months or can extend over years. The seasonality of this process is evident in monthly production and storage data. Using this data, we can assess the effect that production and storage have on U.S. wine value by volume. However, when unpredictable factors such as natural disasters or extreme weather conditions occur in U.S. wine production regions, the effect on U.S. wine value by volume is uncertain. In this report, I implement econometric models to depict the relationship between wine production and storage and further assess the effect of several historical natural disasters and extreme weather conditions on U.S. wine value by volume.

Production and storage data for this report were obtained from the Alcohol and Tobacco Tax and Trade Bureau of the U.S. Department of the Treasury. Monthly reports are publicly available from October 1984 until one year prior to the current date. The reports provide information on wine production and storage categorized by bulk still wine, bottled still wine, effervescent wines, and other natural wines. I extracted total monthly production and storage values for bulk still wine, bottled still wine and effervescent wines, as these categories comprised the majority of total storage and production. Specifically, the production values were taken from the quantity reported as “Removed from Fermenters”, which is reported in gallons and which I converted to millions of gallons. The storage values were taken from the section “Stocks end-of-month, Total”, also reported in gallons and converted to millions of gallons.

Next, I incorporated data from The Wine Institute which maintains information regarding U.S. wine consumption (in millions of gallons per year), U.S. per capita wine consumption (in gallons per year), U.S. wine sales (in millions of gallons) and the total retail value of the U.S. wine sales (in billions of U.S. dollars). From this data, I calculated a value of U.S. Wine Retail Value per Gallon by dividing the total retail value of the U.S. wine sales by the gallons of U.S. wine sales. This data is reported annually from 2002 through 2016. Since the production and storage data is reported monthly, I chose to represent the annual consumption and value data as the set value evenly divided across all months in a given year. Therefore, this data appears to have a “step-wise” nature, though the recorded value is only assessed annually.

Value Per Gallon (U.S. Dollars) = 1,000 × Total Retail Value of U.S. Wine Sales (Billions of U.S. Dollars) / U.S. Wine Sales (Millions of Gallons)
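As a rough sketch of these two steps (computing the per-gallon value and spreading an annual figure across its months), the snippet below assumes the “step-wise” series simply repeats each year's value for all twelve months. The function names and the sample figures are made up for illustration and are not taken from the Wine Institute data.

```python
def value_per_gallon(retail_value_billion_usd, sales_million_gal):
    """Retail value per gallon, in U.S. dollars."""
    # Billions of dollars / millions of gallons = thousands of dollars per
    # gallon, so scale by 1,000 to express the result in dollars per gallon.
    return retail_value_billion_usd * 1000.0 / sales_million_gal


def annual_to_monthly(annual_values):
    """Repeat each annual value for 12 months (assumed step-wise expansion)."""
    monthly = []
    for year in sorted(annual_values):
        monthly.extend([annual_values[year]] * 12)
    return monthly


# Hypothetical figures, for illustration only.
example = {
    2002: value_per_gallon(20.0, 600.0),
    2003: value_per_gallon(22.0, 640.0),
}
series = annual_to_monthly(example)
```

Here `series` holds 24 monthly values: twelve copies of the 2002 figure followed by twelve copies of the 2003 figure, which is what produces the step shape when plotted.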

Since the Wine Institute’s data is only available from 2002 through 2016 (versus the Alcohol and Tobacco Tax and Trade Bureau’s data from 1984), I was forced to truncate my data set to the time frame of January 2002 through December 2016. Months are numbered from 1 (representing January 2002) to 180 (representing December 2016).
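The month numbering can be sketched as a small helper. The name `month_index` and its `start_year` parameter are hypothetical; the only thing taken from the text is that January 2002 is month 1 and December 2016 is month 180.

```python
def month_index(year, month, start_year=2002):
    """1-based month counter, with January of start_year as month 1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (year - start_year) * 12 + month


first = month_index(2002, 1)   # January 2002
last = month_index(2016, 12)   # December 2016
```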

The response variable for this analysis was “valpergal”, the U.S. Dollar retail value per gallon of wine. The independent variables in estimating “valpergal” include the Bulk Still Wine Production Totals (stillbulkprd), Bulk Still Wine End of Month (EOM) Stocks (stillbulkstock), Bottled Still Wine Production Totals (stillbottleprd), Bottled Still Wine EOM Stocks (stillbottlestock), Effervescent Wine Production Totals (effprd), Effervescent Wine EOM Stocks (effstock), U.S. Wine Consumption in Millions of Gallons (uswinecon) and Per Capita U.S. Wine Consumption in Gallons (percapcon). As discussed in the Introduction of this report, production and stock totals are seasonal by nature. By viewing these totals graphically, it is evident that the data is also trended and nonstationary. In the following section, Econometric Models, I will discuss how I handled these challenges in the data.

Other concerns included collinearity among the independent variables. For example, percapcon is a function of uswinecon. Additionally, one may suspect that the storage quantity of a particular type of wine is a function of the amount of that wine produced, or that the amount of one type of wine produced is highly correlated with the amount of another type produced. I visually assessed the variables for high linear correlation, which led me to calculate the correlation of the concerning pairs. From this, I determined that uswinecon and percapcon are highly correlated, and that the production and stock of bulk and bottled wines are moderately correlated with the corresponding values for effervescent wines. Based on this, I removed percapcon from the analysis, representing wine consumption through the uswinecon variable alone, and removed both variables associated with effervescent wines.
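A pairwise correlation screen along these lines might look like the sketch below. It uses the Pearson correlation coefficient; the 0.8 cutoff, the helper names, and the toy numbers are my own assumptions for illustration, not the values used in the report.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:  # threshold is an assumed cutoff
                flagged.append((a, b, r))
    return flagged


# Toy data, for illustration only.
demo = {
    "uswinecon": [600, 620, 640, 660],
    "percapcon": [2.1, 2.15, 2.2, 2.3],
    "stillbulkprd": [50, 10, 70, 20],
}
flagged = correlated_pairs(demo)
```

With this toy data only the percapcon/uswinecon pair is flagged, which mirrors the kind of pair that would be a candidate for dropping one variable.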

Plotting the variables as time series makes the presence of trend and seasonality visually evident.

Next, I developed several models to best predict valpergal values from the independent variables. I did this by developing a linear model, a model containing interaction terms, and a polynomial model. In comparing the models, the polynomial model had the highest R-squared value and statistically significant variables. I then refined the polynomial model to contain only the statistically significant variables with the statistically significant degrees of polynomials and compared this refined model to the original polynomial fit. Though the R-squared value decreased, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were both minimized with the refined polynomial model. Therefore, I use this model to estimate the coefficients associated with the independent variables, defined as:
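To make the AIC/BIC comparison concrete, here is a minimal sketch using the common least-squares form of both criteria, which assumes Gaussian errors and is correct only up to an additive constant that is the same for every model fit to the same data. Statistical packages may report different absolute values. The residual sums of squares and parameter counts below are invented for illustration.

```python
import math


def aic_bic(rss, n_obs, n_params):
    """AIC and BIC for a least-squares fit (Gaussian errors, up to a constant)."""
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    fit_term = n_obs * math.log(rss / n_obs)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_obs)
    return aic, bic


# Hypothetical fits: the refined model has a slightly worse fit (larger RSS)
# but far fewer parameters.
full = aic_bic(rss=10.0, n_obs=180, n_params=12)
refined = aic_bic(rss=10.4, n_obs=180, n_params=5)
refined_is_preferred = refined[0] < full[0] and refined[1] < full[1]
```

This illustrates how a model can lose a little R-squared yet still be preferred: both criteria penalize the number of parameters, and BIC penalizes it more heavily as the sample grows.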

poly model.JPG

I confirmed that the error term (epsilon) has a zero mean and is normally distributed. 

Next, I am interested in evaluating the impact of natural disasters and extreme weather conditions. I have chosen to evaluate this by imposing the following hypothesis test:

hypothesis test effect of event.JPG

I assumed that if an event did not have an effect on a given variable, I would not see a change in the trend, and that if there was an effect, a change in trend would occur. I assessed this change by fitting one linear model from the first time period until just prior to the event and another linear model from the first time period until just prior to the next event. If the confidence intervals for the trends before and after an event are disjoint, we can state that the trends are not equal and consider the event the reason for the difference in trend. If the confidence intervals overlap, we cannot state with confidence that there was a change in the trend, so we assume that the event did not cause a change in the overall trend of the variable.
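The interval comparison can be sketched as follows. This is not the code behind the report: it fits an ordinary least-squares line of each window against a time index 1..n and, as a simplifying assumption, uses a normal quantile rather than a t quantile for the interval (a t quantile would give slightly wider intervals for short windows). The function names and the two sample windows are hypothetical.

```python
import math
from statistics import NormalDist


def slope_confidence_interval(y, level=0.95):
    """Approximate CI for the OLS slope of y regressed on time index 1..n."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    # Normal approximation (assumption); exact small-sample work uses a t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope - z * std_err, slope + z * std_err


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical windows with clearly different trends.
window_1 = [1.0, 2.1, 2.9, 4.2, 5.0, 5.9]
window_2 = [1.0, 4.2, 6.9, 10.1, 13.0, 15.8]
ci_1 = slope_confidence_interval(window_1)
ci_2 = slope_confidence_interval(window_2)
trend_changed = not intervals_overlap(ci_1, ci_2)
```

In this toy case the intervals are disjoint, so `trend_changed` is true. With overlapping intervals the test would not support a change in trend.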

I conducted this analysis on both “uswinecon” and “stillbulkprd” as these variables are two statistically significant variables in the final model chosen to model “valpergal”.

The following graphs are of all original variables plotted as time series. The data is plotted in blue with the red line indicating the time trend.



In order to detrend and deseason the variables, I calculate the First Difference. The plots below depict the First Differences of each variable.
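For readers unfamiliar with the term, a first difference replaces each observation with its change from the previous one. A minimal sketch (the `lag` parameter is my addition: `lag=1` is the first difference described here, while `lag=12` would be a seasonal difference for monthly data, which the report does not say it used):

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [later - earlier for earlier, later in zip(series, series[lag:])]


# A toy series with a steady upward trend: differencing leaves a constant.
trended = [10, 12, 14, 16, 18]
first_diff = difference(trended)
```

Note that the differenced series is shorter than the original by `lag` observations.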


Since the First Differences data is stationary, I then developed regression models for “valpergal”. I developed three models (Model 1, Model 2, and Model 3) and developed Model 4 as a refinement of Model 3 based on its statistical summary.

Based on model comparison outputs, I adopted Model 4 to represent the data, defined as: 

model 4.JPG

I assumed that the error term in Model 4 was independently and identically distributed with a Normal distribution, zero mean and unknown variance. I confirmed this assumption by plotting the residuals of Model 4 against the quantiles of the Normal distribution and also created a density plot of the residuals.

Now that I have accepted a model for “valpergal”, I am interested in evaluating the impact of natural disasters and extreme weather conditions on “valpergal”. Specifically, I consider four events that occurred in highly concentrated wine production regions of the United States:

(1) the 2006 California heat wave (Month 48),

(2) the 2007-2009 California Drought (Months 60-84),

(3) the 2011-2014 California Drought (Months 108-132), and

(4) the September 2015 Valley Fire of Lake, Sonoma and Napa Counties (Month 164).

By visually assessing the trend lines for “valpergal”, “uswinecon” and “stillbulkprd” after these events, I determined that the trend lines of “valpergal” and “uswinecon” show fluctuations that warrant further assessment. I chose to assess the effect of natural disasters or extreme weather conditions by evaluating the trend line of the data. I assume that a change in the linear trend line from before the event to after the event is what qualifies the event as having an “effect” on the variable.



The calculated 95 percent confidence intervals for the slope of the linear trend line are depicted in the charts below by variable. If the intervals are disjoint, I conclude with statistical significance that the variable was “affected” by the event. If the intervals overlap, the difference in slope is not statistically significant, so I conclude that the variable was not “affected” by the event.



Because the 95 percent confidence intervals displayed above overlap, we lack evidence that, assuming a linear trend, the selected natural disasters and extreme weather conditions affected the uswinecon or valpergal variables.

Analysis of wine production, wine storage, and U.S. wine consumption data reveals the major contributing variables to Retail Value of U.S. Wine: production of Still Bulk Wine and U.S. Wine Consumption. All of the variables considered are generally controlled by producers and consumers. However, natural disasters and extreme weather conditions are beyond our control. In an attempt to understand the effects of these events on wine production and consumption, I compared the trends over time before and after events. Assuming a linear trend model, statistically significant changes in trend were not detected.

For future work with this data, it will be necessary to model trends more accurately than a linear model allows and to include quantitative data that may capture the effects of natural disasters and extreme weather conditions on production and consumption.
