[[ Check out my Wordpress blog Context/Earth for environmental and energy topics tied together in a semantic web framework ]]

Tuesday, February 21, 2012

Rainfall Variability Solved

I will preface this entry by suggesting that sometimes the statistics are so simple that they fall in your lap.

As a premise, we want to consider whether a simple stochastic model can generate statistical patterns of rainfall events.

We assume a maximum entropy probability distribution function which assumes an average energy of rainfall rate for a storm i:

$$ P(E \mid E_i) = e^{\frac{-E}{E_i}} $$

The rationale for this is that the rainfall's energy is proportional to the rate of the rainfall, since that amount of moisture had to be held aloft by gravity.

$$ Rate_i \sim E_i $$

Yet we know that the Ei can vary from storm to storm, so we leave it as a conditional, and then set that as a maximal entropy estimator as well

$$ p(E_i) = \alpha e^{- \alpha E_i} $$

then we integrate over the conditional's range.

$$ P(E) = \int_0^\infty P(E \mid E_i) p(E_i) \,dE_i $$

This results again in an integration formula lookup, which leads to the following solution:

$$ P(E) = 2 \sqrt{\frac{E}{\bar{E}}} K_1(2 \sqrt{\frac{E}{\bar{E}}})$$

where K1 is the modified BesselK function of the second kind, in this case of order 1, which is found in any spreadsheet program (such as Excel).

Note that this is the same function that we used last week for determining wind speed distributions and that we used in TOC to determine the distribution of terrain slopes for the entire planet as well!  Again this reduces to the simplest model one can imagine -- a function with only one adjustable parameter, that of the mean energy of a sampled rainfall rate measurement. In other words the expression contains a single number that represents an average rainfall rate.  That's all we need, as maximum entropy fills in the rest.

So let us see how it works.  The analysis was compared against this recent paper:
"Can a simple stochastic model generate rich patterns of rainfall events?" by S.-M.Papalexiou, D.Koutsoyiannis, and A.Montanari [2011] and graphed as shown below in Figure 1. The green points constitute the BesselK fit which lies right on top of the blue empirical data set.

Fig. 1: Cumulative rainfall distribution statistically gathered from several storm events.
The BesselK fit assumes a double stochastic process, a MaxEnt within a storm and a MaxEnt across the storms.

From the table below reproduced from the paper, we can see that the mean value used in the BesselK distribution was exactly the same as that from the standard statistical moment calculation

Event# 1 2 3 4 5 6 7 All
SampleSize 9697 4379 4211 3539 3345 3331 1034 29536
Mean(mm/h) 3.89 0.5 0.38 1.14 3.03 2.74 2.7 2.29
StandardDeviation(mm/h) 6.16 0.97 0.55 1.19 3.39 2.2 2 4.11
Skewness 4.84 9.23 5.01 2.07 3.95 1.47 0.52 6.54
Kurtosis 47.12 110.24 37.38 5.52 27.34 2.91 -0.59 91
HurstExponent 0.94 0.79 0.89 0.94 0.89 0.87 0.97 0.89

In contrast, Papalexio try to apply Hurst-Kolmogorov statistics to the problem in search of a power law solution. They claim that the power law tail of -3 is somehow significant. I would suggest instead that the slight deviation in the tail region is likely caused by insufficient sampling in the data set. A slight divergence starts to occur at the 1 part in 5,000 level and there are only 30,000 points in the merged data set, indicating that statistical fluctuations could account for the difference. See Figure 2 below which synthesizes a BesselK distribution from the same sample size, and can clearly duplicate the deviation in the tail.

Fig. 2: Monte Carlo sampling from a synthetic BesselK distribution reveals the same departure at low probability events.
Fluctuations of 1 in 30,000 are clearly seen at this level, so care must be taken to not read fat-tail behavior that may not exist.
Bottomline is that this simple stochastic model beats the other simple stochastic model, which doesn't seem as simple anymore once we apply an elementary uncertainty analysis to the data.


For the moments of the Bessel K solution, assuming the mean of 2.29, we have standard deviation of 4.00, skew of 5.01, and kurtosis of 50.35. These map fairly well to the table above. The higher moments differ because of the tail difference I believe.

Interesting if we add another composite PDF to the BesselK then we get the Meijer G function:
$$\int_0^\infty 2 K_0(2 \sqrt{y})\frac{1}{y} exp(-\frac{x}{y}) dy = G_{0,3}^{3,0}(x | 0,0,0)$$

Fig. 3: PDFs of progressively greater scaling composition. Notice the gradual fattening of the tail.

As a summary, I made an argument based on observations and intuition that rainfall buildup is a potential energy argument. This translates to a Boltzmann/Gibbs probability with a large spread in the activation energy. The large spread is modeled by maximum entropy -- we don't know what the activation energy is so we assume a mean and let the fluctuations about that mean vary to the maximum amount possible. That is the maximum entropy activation which is proportional to a mean rainfall rate -- the stronger the rate, the higher the energy That becomes the premise for my thesis, predicting the probability of a given rainfall rate within the larger ensemble defined by a mean. Scientists are always presenting theses of varying complexity. This one happens to be rather simple and straightforward. Papalexio et al, were on exactly the same track but decided to stop short of making the thesis quite this simple. They seem to imply that the PDFs can go beyond the BesselK and into the MiejerG territory, and beyond, of variability. That is fine as it will simply make it into a fatter tail distribution, and if a fatter tail matches the empirical observations on a grander ensemble scale, then that might be the natural process.

Friday, February 10, 2012

Wind speeds of the World

See if you can follow this argument for the neatest little derivation of a wind speed estimator. I believe that this thing can model the distribution of wind speeds over the entire planet.

In The Oil Conundrum, we modeled the wind speed for a single location over a span of time. This turned into a simple maximum entropy estimator, which assumed a kinetic energy for a wind volume

$$ E \sim v^2 $$

and then a probability distribution function which assumes an average energy over a region i:

$$ P(E \mid E_i) = e^{\frac{-E}{E_i}} $$

but we know that the Ei can vary from region to region, so we leave it as a conditional, and then set that as a maximal entropy estimator as well

$$ p(E_i) = \alpha e^{- \alpha E_i} $$

then we integrate over the conditional's range according to standard practice.

$$ P(E) = \int_0^\infty P(E \mid E_i) p(E_i) \,dE_i $$

This results in a simple lookup in your favorite comprehensive table of cataloged integration formulas, which leads to the following solution:

$$ P(E) = 2 \sqrt{\frac{E}{\bar{E}}} K_1(2 \sqrt{\frac{E}{\bar{E}}})$$

where K1 is the modified BesselK function of the second kind, in this case of order 1, which is found in any spreadsheet program (such as Excel).

First of all, note that this is the same function that we used in TOC to determine the distribution of terrain slopes for the entire planet as well!  There, I took a slightly different route and derived it based on a cumulative probability distribution function (CDF) instead of a probability density function (PDF). No matter, in both cases it reduces to the simplest model one can imagine -- a function with only one adjustable parameter, that of the mean energy of a sampled wind measurement. In other words the expression contains a single number that represents an average planetary wind speed.  That's all we need.

So let us see how it works.

I used wind data from Bonneville Power Administration, which has over 20 meteorological stations set up around northern Oregon. The download consisted of over 2.5 million data points collected at 5 minute intervals, archived over the span of a little less than 2 years.

Fig. 1: Cumulative distribution function of wind energies from Bonneville with model fit.
I actually didn't have to even vary the parameter, as the average energy was derived directly by computing  the mean over the complete set of data separately. This corresponded to a value of 12.4 MPH, and I placed a pair of positive and negative tolerances to give an idea of the sensitivity of the fit.

As this is a single parameter model, the only leeway we have is in shifting the curve horizontally along the energy axis, and since this is locked by an average, the fit becomes essentially automatic with no room for argument. The probabilities are automatically normalized.

I also show the log-log plot next, which reveals a departure at high wind speeds.

Fig. 2: Cumulative distribution function of wind energies on a log-log plot.
At large wind speeds, the fit diverges from the data as the peak 5-minute wind speed observed was 60 MPH.

I am not too worried about this slight divergence as it only shows that excessive gale force winds (greater than 60 MPH) did not occur over the extended region during a span of two years.  The next logical step is to average this over longer times and larger spatial regions.

I have to admit that this investigation into the Bonneville dataset was prompted by a provocation and a dare on another blog.  In my opinion, anti-renewable-energy advocates seem to crawl out of the woodwork, and tend to spread their misery however they can. This commenter was named P.E. (possibly the professional engineer moniker), and what set me off was his assertion that wind "power output is essentially random":
  • I did a little research on the Bonneville system a year or so ago, because they have all their SCADA data available for several years online with excellent resolution. The results were downright shocking. And not in a good way. And this is real live system-wide data, not modeled results. Anyway, be sure to check Bonneville’s website out; it’s low hanging fruit.
    Bonneville’s own words: “power output is essentially random”.
  • Predictability is not the issue. Even if we knew just when the wind would not blow (about 70% of the time) we would still have to have a fueled power plant to produce the needed power, either that or some humongous expensive power storage systems. Renewables reduce emissions by idling existing fueled power plants. That is the most they can do. They are not an alternative to fueled power plants.
  • P.E. 
    In the specific case of Bonneville, they have enough hydro in the system to where reserve/backup isn’t an issue. The windmills allow them to use less water when it’s available, making more hydro power available when it’s needed. That’s the ideal situation for wind (and solar). But not that many places in the world can do that.
  • PE, You are so full of it as to be embarrassing. I have done statistical analysis of wind energy and it follows maximum entropy principles. You can read all about it and more to come, as I seem to have the knack for environmental modeling. Too bad you missed the boat on this one.
  • P.E.
    Web, go do the analysis yourself, and then come back after you’ve looked at some real data. They seem to have changed their web site around, but here’s a place to start:
    And this:
    There used to be some huge excel spreadsheets as evidenced by the two articles above (that my computer would choke on), but they’re either moved to another part of the site, or no longer available to the public.
    Maybe I’ll do a FOIA. :twisted:
  • P.E.
    Correction: they’re right here:
    Item #5. Now go to work, or eat your words.
  • P.E.
    Oh, and Web – if you need any help with Excel, I hear Phil Jones is really good at it.
  • P.E., That’s the problem with your ilk. You are basically ashamed about not being able to do the analysis yourself so lash out at someone like me that knows something about probability and statistics. It is all pathetic, this transparently petty jealousy.
  • hunter 
    WHT, who do we believe? Someone who cites a third party that is actually running a windfarm, or a self declared genius like you who has not bothered to challenge the citation, but blames the person who referred to it?
  • Hunter, The problem is that he shows a spreadsheet with raw data of a few weeks. I have analyzed data from wind farms, compiling the data over the span of a year in terms of a probability distribution. I have it right here on my blog

    And it is also in my recent book which you can click on my handle to read for free.
    You have to keep up with renewable research, otherwise you will be left behind.
  • hunter
    Web, wind is crap , delivering z small fraction of its rated capacity.
  • P.E. 
    For the record, that’s not a few weeks, that’s five years. And just for the edification of the ignorant, Bonneville Power Authority is the largest federal hydropower agency in the US, including the Grand Coulee dam at something like 7 gigawatts. This isn’t some small-potatoes outfit.
    But keep kvetching.
  • Kiddies all jealous over the fact that I know how to do the analysis.
The larger point is that the randomness can be modeled, which the garden-variety dogmatists don't seem to understand. This is really no different than understanding black-body Planck's response for radiative emission. Sure, the radiation appears random, but it also must still follow a distribution function with respect to wave-number, which is proportional to energy and thus follows statistical mechanical laws.  The same thing holds for wind energy distributions, and we are seeing statistical mechanics, superstatistics, and maximum entropy principles hold sway. 

Bottomline is that these environmental skeptics are renowned for all talk and no action when it comes to discussing energy and climate. I admit that I was needling the guy, but that's what prompts them to spill their true agenda, which is hatred against climate scientists coupled with a strange mix of Malthusian, Luddite, and Cornucopian ideals. They actually lash out angrily when they see somebody trying to scientifically advance the yardstick.

A few days later and in another thread, P.E. wouldn't let go, and went out on a limb:
  • This scold coming from someone who refuses to download wind data from the Bonneville Power Authority because it’s too much like work.
  • What a wanker, P.E.
    I downloaded data from Ontario and Germany wind farms about a year and a half ago and did a comprehensive analysis of the time series data. I happen to be on vacation over the holiday, so am not going to bend over backwards to work the Bonneville data at your whim and call.
    I tell you what; I will analyze the Bonneville data set when I get back, and I will add it to a future revision of my tome.
    But then again, maybe not, because trained seal and organ-grinder monkey is not my forte.
As in Judo, we learn to use our opponents force and redirect it back at them. They have no intellectual curiosity, and think that just by asserting something it must be true.  Only by applying models to empirical data can we make headway in our understanding of our environment.  The last laugh is on P.E. and I have to thank him for providing motivation to do this analysis.

The pushback I see against this model from the skeptical trenches is that it doesn't predict chaotic disturbances. I view chaotic bifurcation as a no-nothing theory. We can't predict anything using it, but know that an external forcing will always overpower the minor chaotic fluctuations that may occur. In other words, chaos is annoying noise in the bigger scheme of things.

The only caveat is whether that noise will also scale with the external forcing -- in other words the fluctuations growing bigger with added heat to the system. This is understood with respect to added heat raising the kinetic energy of the system - (1) increasing the average atmospheric wind speed, and (2) added heat to the ocean allowing for greater occasional releases into the atmosphere. These are likely subtle effects, but they will only go in that direction. Remember, reducing heat leads to lower thermal excitation.

The BesselK wind model has a thermal excitation built-in, as the maximum entropy estimator for average energy is nothing but a thermalized Boltzmann partition function describing the system's entropy.

Entropy measures the thermal excitation of the system and that scales with temperature. Non-linearities and chaos is just a way for the disordered system to explore the ergodic range of states, thus making the statistical mechanical/information theory valid. Look at how a naive pseudo-random number generator is created -- by sticking a handful of non-linear functions together, you will explore the state space (see the topic of iterated maps). Voila, it will approach an ergodic limit. Then add in energy constraints, such as the mean energy of the system and you have the Boltzmann estimator, aka MaxEnt estimation.

See also the topic of K-distribution model which models the disorder, or clutter, in electromagnetic interference. This also uses the BesselK model.

Another interesting observation. In the semiconductor world, the transport regimes analogous to the stochastic and chaotic/deterministic talked about in climate circles are diffusive and ballistic transport. It is very hard to get to ballistic transport because the carriers thermalize so rapidly, which is the essential dissipation route to diffusive transport. The idea of stochastic resonance is all over the place in semiconductor physics and technology, with carrier population inversions, bandgap engineering, etc. but we don't talk about it as stochastic resonance because there is already a well established terminology.

On top of that, there is the concept of dispersive transport, which is diffusive transport with huge amounts of disorder in the mobilities and bandgaps. This is the concept that I have done some recent research on because all the cheap photovoltaic technology is built on amorphous or polycrystalline garbage.

On that level, what I think is happening in the climate system is closer to a mix of dispersion and diffusion than determinism.

For example, relating to wind speed distribution, we need to get a handle on what controls the dispersion. From other posts that I have done on wind speed, remembered that the wind power dissipation is the cube of the wind speed, while the distribution that has a skew (the third statistical moment) equal to the mean is just the BesselK distribution that matches the empirical results. This is to me is somewhat mysterious ... is it just that the mean wind speed matches statistically the dissipation due to friction? Could that be some universal scaling result? In terms of maximum entropy it would be the distribution that has a constraint of a finite mean with the mean equating to cube root of the third moment (i.e Mean = standardDeviation * Skew). This then determines all the fluctuations observed over the larger ensemble.