INCORPORATION OF WEIBULL DISTRIBUTION IN L-MOMENTS METHOD FOR REGIONAL FREQUENCY ANALYSIS OF PEAKS-OVER-THRESHOLD WAVE HEIGHTS

The L-moments of the Weibull distribution are derived and incorporated in the regional frequency analysis of peaks-over-threshold significant wave heights at eleven stations along the eastern coast of the Japan Sea. The effective duration of wave measurements varies from 18.0 to 37.2 years, with a mean rate of 10.4 to 15.1 events per year. The eleven stations are divided into three regions to assure homogeneity of the data. Both the Weibull and Generalized Pareto (GPA) distributions fit the observed data well. The 100-year wave height varies from 8.2 to 11.2 m by the Weibull and from 7.6 to 10.3 m by the GPA. The GPA distribution is not recommended for determination of design waves for these stations because it has an inherent upper limit and a tendency toward under-prediction.


INTRODUCTION
Extreme wave analysis is the first step in coastal structure design, as it provides the basis for selecting design wave heights. A variety of methodologies have been proposed for extreme analysis of environmental data such as flood discharge, storm wind speed, and storm wave height. An extreme data set is prepared from either the annual maximum (AM) data or the partial duration series data. The latter is also called the peaks-over-threshold (POT) data, because one peak value per storm above a preset threshold is employed to constitute the extreme data set.
There are many methods for fitting a distribution to the data set, such as the method of moments, the probability-weighted moment method, the least squares method, the maximum likelihood method, and the L-moments method, among which the L-moments method is the newest one, proposed by Hosking (1990). It was developed from the probability-weighted moment method by Greenwood et al. (1978) and has the merits of easy application and straightforward estimation of distribution parameters. The FORTRAN program "lmoments" by Hosking can be downloaded at http://lib.stat.cmu.edu. Hosking and Wallis (1997) have promoted the methodology of regional frequency analysis, which uses the L-moments method to estimate a common distribution to be fitted to multiple data sets within a homogeneous region. Application of regional frequency analysis to extreme waves has been reported by van Gelder et al. (2000) and Ma et al. (2008). Van Gelder et al. analyzed the wave data at nine stations off the Dutch coast, while Ma et al. examined the buoy data of the National Data Buoy Center at seven stations off the California coast of the U.S.A. Waves off the Dutch coast were fitted with the Generalized Pareto (GPA) distribution, while waves off the California coast were fitted with the Pearson type III distribution.
Neither study included the Weibull distribution, because Hosking (1990) did not derive the L-moments of the Weibull distribution and the formulas for parameter estimation. However, the Weibull has been one of the most favored distributions in coastal and ocean engineering since Petruaskas and Aagaard (1971) worked out the unbiased plotting position formulas. A number of extreme wave data sets have been fitted to the Weibull distribution. Its omission from the candidate distributions certainly causes discontinuity in the practice of extreme wave analysis. To prevent such discontinuity, the L-moments of the Weibull distribution are derived and the regional frequency analysis of extreme waves at the Japan Sea coast with large data sets is described in the present paper.

L-MOMENTS OF WEIBULL DISTRIBUTION AND PARAMETER ESTIMATE
The following functional form is given for the Weibull distribution to be employed in the present paper:
$$F(x) = 1 - \exp\left[-\left(\frac{x - B}{A}\right)^{k}\right] \tag{1}$$
where F(x) denotes the cumulative distribution of data x, and A, B, and k are the scale, location, and shape parameters, respectively. The peaks-over-threshold (POT) storm significant wave heights constitute the data x. The quantile of the Weibull distribution for a given non-exceedance probability F is derived as the inverse function of Eq. (1) as below.
$$x_F = B + A\left[-\ln(1 - F)\right]^{1/k} \tag{2}$$
In the regional frequency analysis, the first to fourth L-moments denoted by $\lambda_1$ to $\lambda_4$ are derived from a given distribution function, and the mutual ratios of $\tau = \lambda_2/\lambda_1$, $\tau_3 = \lambda_3/\lambda_2$, and $\tau_4 = \lambda_4/\lambda_2$ are calculated. These ratios are called the L-CV (coefficient of L-variation), L-skewness, and L-kurtosis, respectively. Please refer to Appendix A for the definition of L-moments. The first L-moment represents the mean of the distribution function.
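The L-moments and their ratios for the Weibull distribution are given by
$$\begin{aligned}
\lambda_1 &= B + A\,\Gamma(1 + 1/k), \qquad \lambda_2 = A\left(1 - 2^{-1/k}\right)\Gamma(1 + 1/k), \qquad \tau = \lambda_2/\lambda_1,\\
\tau_3 &= 3 - \frac{2\left(1 - 3^{-1/k}\right)}{1 - 2^{-1/k}}, \qquad
\tau_4 = \frac{5\left(1 - 4^{-1/k}\right) - 10\left(1 - 3^{-1/k}\right) + 6\left(1 - 2^{-1/k}\right)}{1 - 2^{-1/k}}
\end{aligned} \tag{3}$$
where $\Gamma(\cdot)$ denotes the Gamma function. Van Gelder (2000) lists a set of formulas of the L-moments for the Weibull distribution, but his L-skewness ($\tau_3$) is mistakenly given the sign opposite to that of Eq. (3).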
For a sample of extreme data, the L-moments are evaluated by algebraic calculations. The estimated values may deviate from Eq. (3) owing to sample variability. Hosking and Wallis (1997) use the symbols t, $t_3$, and $t_4$ for the sample values to differentiate them from the population values $\tau$, $\tau_3$, and $\tau_4$. Nevertheless, the latter symbols are used in the present paper for the sake of simplicity.
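The "algebraic calculations" for a sample can be sketched as follows. This is a minimal illustration and not the authors' program: it assumes the unbiased probability-weighted moment estimators of Hosking and Wallis (1997), and the function name `sample_lmoments` is hypothetical.

```python
def sample_lmoments(data):
    """Return (l1, t, t3, t4): sample mean, L-CV, L-skewness, L-kurtosis.

    Assumption: unbiased probability-weighted moments b0..b3 are used.
    """
    x = sorted(data)                      # ascending order statistics
    n = len(x)
    if n < 4:
        raise ValueError("need at least four data points")
    b0 = b1 = b2 = b3 = 0.0
    for j, xj in enumerate(x):            # j = rank - 1 = 0 .. n-1
        b0 += xj
        b1 += xj * j / (n - 1)
        b2 += xj * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += xj * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = (b / n for b in (b0, b1, b2, b3))
    l1 = b0                               # first L-moment = sample mean
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2 / l1, l3 / l2, l4 / l2
```

For a POT data set, `data` would be the list of storm peak heights of one station; a symmetric sample such as `[1, 2, 3, 4]` yields an L-skewness of zero, which is a quick sanity check.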
Once the L-moment ratios are evaluated from a sample, the parameters of the Weibull distribution are directly estimated: the shape parameter k follows from $\tau_3$ by Eq. (4), and A and B then follow from $\lambda_2$ and $\lambda_1$ through Eq. (3). Equation (4) is empirically obtained by fitting a polynomial function to the relation between $\tau_3$ and k; the fitting error is less than 0.3% for the range of 0.6 < k < 3.0.
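As an illustration of this estimation step, the sketch below replaces the polynomial of Eq. (4) with a numerical inversion by bisection, so it is not the authors' formula. It assumes the Weibull form of Eq. (1), for which the L-skewness is $3 - 2(1 - 3^{-1/k})/(1 - 2^{-1/k})$ and decreases monotonically with k; the function names and the search bracket for k are hypothetical.

```python
import math

def weibull_tau3(k):
    """L-skewness of the Weibull distribution with shape parameter k."""
    return 3.0 - 2.0 * (1.0 - 3.0 ** (-1.0 / k)) / (1.0 - 2.0 ** (-1.0 / k))

def fit_weibull(l1, l2, tau3, k_lo=0.3, k_hi=10.0):
    """Estimate (A, B, k) from lambda_1, lambda_2 and the L-skewness.

    Assumption: bisection on k stands in for the paper's polynomial fit.
    """
    # tau3 decreases as k grows, so the root is bracketed by [k_lo, k_hi].
    if not weibull_tau3(k_hi) <= tau3 <= weibull_tau3(k_lo):
        raise ValueError("tau3 outside the bracketed range of k")
    for _ in range(200):
        k_mid = 0.5 * (k_lo + k_hi)
        if weibull_tau3(k_mid) > tau3:
            k_lo = k_mid                  # k is too small
        else:
            k_hi = k_mid                  # k is too large
    k = 0.5 * (k_lo + k_hi)
    g = math.gamma(1.0 + 1.0 / k)
    A = l2 / ((1.0 - 2.0 ** (-1.0 / k)) * g)   # scale from lambda_2
    B = l1 - A * g                             # location from lambda_1
    return A, B, k
```

A round trip is an easy check: compute $\lambda_1$, $\lambda_2$, and $\tau_3$ from chosen values of A, B, and k, pass them to `fit_weibull`, and confirm that the same parameters are recovered.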

CHARACTERIZATION OF DISTRIBUTION FUNCTIONS
The present paper employs the Generalized Extreme Value (GEV), the Generalized Pareto (GPA), and the Weibull distributions as the candidates for data fitting. The GEV distribution is a synthesis of the Fisher-Tippett type I, II, and III distributions, while the GPA distribution is a non-asymptotic distribution function. These functions are well discussed in statistical textbooks such as Coles (2001). Their functional forms are given below.

GEV:
$$F(x) = \exp\left\{-\left[1 - \frac{k(x - B)}{A}\right]^{1/k}\right\}, \quad k \neq 0$$
GPA:
$$F(x) = 1 - \left[1 - \frac{k(x - B)}{A}\right]^{1/k}, \quad k \neq 0$$
When the shape parameter k of the GEV is positive, it becomes the FT-III distribution and has the upper limit at B + A/k. When k = 0, it is the FT-I or Gumbel distribution. When k < 0, it becomes the FT-II or Frechet distribution and has the lower limit at B + A/k, which lies below B because k is negative. The GPA distribution becomes the exponential distribution for k = 0. It has the lower limit at B and, for k > 0, the upper limit at B + A/k. The L-moments and the formulas for parameter estimation are well described in the book by Hosking and Wallis (1997) and are not listed here.
These distribution functions are characterized by the mutual relationship between L-kurtosis and L-skewness as shown in Fig. 1, which also includes the Pearson type III distribution. The GEV and GPA distributions are plotted with differentiation for k > 0, k = 0, and k < 0. Positions of the Weibull distribution with typical k-values are shown with square symbols, while positions of the GEV with typical negative k-values are marked with diamond symbols. Because the behavior of the Pearson type III distribution is not much different from that of the Weibull or GPA distribution, and because it is rather unfamiliar in coastal engineering applications, it is not included in the candidate distributions for extreme wave data.
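The curves of such an L-moment ratio diagram can be sketched numerically. The block below is an illustration under stated assumptions, not the code behind Fig. 1: the GPA and GEV relations are the closed forms given by Hosking and Wallis (1997) in their parameterization, and the Weibull values use the fact that a Weibull variate of Eq. (1) is a reflected GEV variate with GEV shape 1/k. The function names are hypothetical.

```python
def gpa_ratios(k):
    """(tau3, tau4) of the GPA distribution, valid for k > -1."""
    tau3 = (1.0 - k) / (3.0 + k)
    tau4 = (1.0 - k) * (2.0 - k) / ((3.0 + k) * (4.0 + k))
    return tau3, tau4

def gev_ratios(k):
    """(tau3, tau4) of the GEV distribution for k != 0 and k > -1."""
    d2 = 1.0 - 2.0 ** (-k)
    d3 = 1.0 - 3.0 ** (-k)
    d4 = 1.0 - 4.0 ** (-k)
    tau3 = 2.0 * d3 / d2 - 3.0
    tau4 = (5.0 * d4 - 10.0 * d3 + 6.0 * d2) / d2
    return tau3, tau4

def weibull_ratios(k):
    """(tau3, tau4) of the Weibull distribution with shape parameter k."""
    # Reflection changes the sign of L-skewness and keeps L-kurtosis.
    tau3, tau4 = gev_ratios(1.0 / k)
    return -tau3, tau4

if __name__ == "__main__":
    for k in (0.75, 1.0, 1.4, 2.0):       # typical Weibull shape parameters
        print("Weibull k =", k, weibull_ratios(k))
```

The Weibull distribution with k = 1 and the GPA distribution with k = 0 are both the exponential distribution, so the two functions should return the same point on the diagram for these arguments.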

PEAKS-OVER-THRESHOLD WAVE DATA FOR ANALYSIS
The port-administrating agencies under the Ministry of Land, Infrastructure, Transport and Tourism of Japan have been operating a nationwide wave observation network called NOWPHAS (Nationwide Ocean Wave information network for Ports and HArbourS) since 1970 (see Goda et al., 2000). Currently it has 61 seabed-mounted wave sensors and 11 GPS buoys, most of which send wave data to the Port and Airport Research Institute continuously, 24 hours a day.
The wave data are statistically and spectrally analyzed for a duration of 20 minutes every two hours. Some stations have wave records spanning nearly 40 years. Figure 2 shows the locations of major NOWPHAS stations. The data at eleven stations along the eastern coast of the Japan Sea are used for the regional frequency analysis. Table 1 lists the station names, locations, and other characteristics. The threshold level $H_c$ was selected to yield a mean occurrence rate in the range of 10 to 15 per year. Several levels of the threshold height were tried, but changes in the estimated return wave heights were slight.
All the POT data were normalized by dividing them by the mean value of each data set. The first L-moment $\lambda_1$ and the L-moment ratios of the 11 station data have been calculated as listed in Table 2. The first L-moment $\lambda_1$ is calculated before normalization and is equal to the mean of the POT data. The discordancy measure $D_i$ is calculated separately for the six stations from Rumoi to Niigata and the five stations from Wajima to Hamada. According to the criterion by Hosking and Wallis, the wave data at these stations are judged as homogeneous.
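The discordancy measure can be sketched as below. This is a hedged illustration that assumes the definition of Hosking and Wallis (1997), $D_i = (N/3)(u_i - \bar{u})^T S^{-1} (u_i - \bar{u})$ with $u_i = (t, t_3, t_4)$ of station i and S the sum of the outer products of the deviations; the helper names are hypothetical, and any station values passed in would come from Table 2.

```python
def solve3(m, b):
    """Solve the 3x3 linear system m y = b by Gauss-Jordan elimination."""
    a = [row[:] + [bi] for row, bi in zip(m, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))   # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(3):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[r][3] / a[r][r] for r in range(3)]

def discordancy(ratios):
    """ratios: list of (t, t3, t4), one tuple per station; returns all D_i."""
    n = len(ratios)
    mean = [sum(u[c] for u in ratios) / n for c in range(3)]
    dev = [[u[c] - mean[c] for c in range(3)] for u in ratios]
    # S = sum over stations of the outer product of the deviations
    s = [[sum(d[r] * d[c] for d in dev) for c in range(3)] for r in range(3)]
    out = []
    for d in dev:
        y = solve3(s, d)                  # y = S^{-1} (u_i - mean)
        out.append(n / 3.0 * sum(di * yi for di, yi in zip(d, y)))
    return out
```

By construction the $D_i$ values of a group sum to the number of stations N, which gives a simple check of an implementation; at least four stations with distinct ratios are needed for S to be invertible.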

The procedure of regional frequency analysis is summarized in Appendix B. As demonstrated in Figs. 3 and 4, the L-moment ratios are spread in relatively narrow ranges. Theoretical curves of $\tau_4$ versus $\tau_3$ for the Weibull and GPA distributions are also shown in Fig. 4, suggesting that both distributions would fit the wave data.

REGIONAL DIVISION AND HETEROGENEITY MEASURES
The regional frequency analysis requires homogeneity of the data within a region. Hosking and Wallis (1997) propose to check the homogeneity by using the following heterogeneity measure H:
$$H = \frac{V - \mu_V}{\sigma_V}$$
where V is the quantity defined below, with $\mu_V$ and $\sigma_V$ being the mean and standard deviation of V, respectively, to be evaluated by numerical simulation:
$$V = \left[\frac{\sum_{i=1}^{N} n_i \left(t^{(i)} - t^{R}\right)^2}{\sum_{i=1}^{N} n_i}\right]^{1/2}$$
where N is the number of stations, $n_i$ is the number of data at the i-th station, $t^{(i)}$ is a quantity related to the L-moment ratio of the i-th station, and $t^{R}$ is the regional mean of the quantity estimated by fitting a Kappa distribution having four parameters to the regional data.
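The arithmetic of the heterogeneity measure can be sketched as follows. The block is illustrative only: it takes the regional mean as the record-length-weighted mean of the station values, and it assumes that the simulated values of V (obtained from the Kappa-distribution simulation, which is not reproduced here) are supplied by the caller. The function names are hypothetical.

```python
import math

def weighted_dispersion(n, t):
    """V: record-length-weighted standard deviation of station values t."""
    total = sum(n)
    t_reg = sum(ni * ti for ni, ti in zip(n, t)) / total   # regional mean
    return math.sqrt(sum(ni * (ti - t_reg) ** 2 for ni, ti in zip(n, t)) / total)

def heterogeneity(v_obs, v_sim):
    """H = (V - mu_V) / sigma_V, with mu_V and sigma_V from simulated V."""
    m = len(v_sim)
    mu = sum(v_sim) / m
    sigma = math.sqrt(sum((v - mu) ** 2 for v in v_sim) / (m - 1))
    return (v_obs - mu) / sigma
```

For the first measure, `t` would hold the L-CV values of the stations in a region and `n` their numbers of POT data; `v_sim` would typically contain several hundred simulated values.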
The heterogeneity measure is checked with three quantities. The first measure H(1) concerns the spread of the L-CV values within a region. The second measure H(2) examines the distance of a data set from the center of gravity on the scatter diagram of L-CV and $\tau_3$. The third measure H(3) calculates the distance of a data set from the center of gravity on the scatter diagram of $\tau_4$ and $\tau_3$. Hosking and Wallis (1997) state that the region can be judged as homogeneous when H ≤ 1 and may be heterogeneous if H > 2. Several regional divisions of the 11 stations were tried as listed in Table 3. The three L-moment ratios are the regional weighted means with the weight of the number of data. According to the result of Table 3, the regions A, B, and C are regarded as heterogeneous as far as H(1) is concerned, and the regions D, E, and F are judged as homogeneous. The heterogeneity measures H(2) and H(3) remain below 2 for all the regions A to F.
Hosking and Wallis (1997) have proposed to judge the goodness of fit of a distribution by using the following quantity Z:
$$Z = \frac{\tau_4^{DIST} - t_4^{R}}{\sigma_4}$$
where $t_4^{R}$ is the regional mean of L-kurtosis, $\tau_4^{DIST}$ is the value of L-kurtosis estimated from the regional L-skewness for a particular distribution, and $\sigma_4$ is the regional standard deviation of L-kurtosis for the Kappa distribution estimated by numerical simulation. They have given an acceptance criterion of |Z| ≤ 1.64 in consideration of the 90% confidence interval of the normal distribution. The GEV distribution was found to have a Z value in excess of 6 for the regions A to F, and thus it was not accepted as the regional distribution. The GPA distribution had Z values from -0.37 to -1.12 and was accepted as the regional distribution. The Weibull distribution had Z values of 1.32 to 1.61 for the regions C to F and was also accepted. Table 4 lists the parameter values of the regional Weibull and GPA distributions for the regions C to F.
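The goodness-of-fit check can be illustrated for the GPA distribution, whose L-kurtosis is a closed-form function of the L-skewness. This is a sketch under assumptions: the bias correction that Hosking and Wallis (1997) add to the regional L-kurtosis is omitted because it is not defined in the text above, $\sigma_4$ is taken as given from the simulation, and the function names are hypothetical.

```python
def gpa_tau4(tau3):
    """L-kurtosis of the GPA distribution implied by a given L-skewness."""
    k = (1.0 - 3.0 * tau3) / (1.0 + tau3)   # inverts tau3 = (1 - k)/(3 + k)
    return (1.0 - k) * (2.0 - k) / ((3.0 + k) * (4.0 + k))

def z_measure(tau4_dist, t4_regional, sigma4):
    """Z = (tau4_DIST - t4_R) / sigma4; no bias correction is applied."""
    return (tau4_dist - t4_regional) / sigma4

def is_accepted(z, limit=1.64):
    """Acceptance test |Z| <= 1.64 for a candidate regional distribution."""
    return abs(z) <= limit
```

A negative Z means that the candidate distribution has a smaller L-kurtosis than the regional data at the same L-skewness, which is the situation reported above for the GPA distribution.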
Figures 5 to 7 show the comparison of non-dimensional wave heights with the estimated return wave heights for return periods from 0.1 to 1000 years. The return period of the m-th wave height in descending order is assigned from its non-exceedance probability, which is expressed with $n_i$, the number of data at the i-th station, and $\lambda_i$, the mean rate, because the L-moments of the wave data have been computed using the unbiased plotting position as recommended by Hosking and Wallis (1997). The return period of the largest data is equal to the effective duration.
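The return wave heights of the fitted distributions can be sketched as below. The block assumes the usual POT relation between return period R and non-exceedance probability, F = 1 - 1/(rate × R) with the mean rate in events per year, together with the Weibull form of Eq. (1) and the GPA form of Hosking and Wallis (1997). The function names are hypothetical and no parameter values from Table 4 are reproduced.

```python
import math

def _check(R, rate):
    # The POT relation is only meaningful when rate * R >= 1.
    if rate * R < 1.0:
        raise ValueError("rate * R must be at least 1")

def weibull_return_height(R, rate, A, B, k):
    """Weibull quantile at F = 1 - 1/(rate * R)."""
    _check(R, rate)
    return B + A * math.log(rate * R) ** (1.0 / k)

def gpa_return_height(R, rate, A, B, k):
    """GPA quantile at the same F; for k > 0 it never exceeds B + A/k."""
    _check(R, rate)
    if k == 0.0:                          # exponential limit
        return B + A * math.log(rate * R)
    return B + A / k * (1.0 - (rate * R) ** (-k))
```

With regional parameters fitted to normalized data, the result is a non-dimensional height that is multiplied by the mean POT height of a station to obtain its return wave height. For k > 0 the GPA result saturates at B + A/k however long the return period, which is the inherent upper limit noted in the abstract.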
As seen in Figs. 5 to 7, the return wave heights estimated by the GPA distribution are smaller than those by the Weibull distribution for return periods longer than about 3 years. Individual wave data are scattered around the fitted distributions, but the fitted distributions are regarded as representing the population distribution of each region. The wave data at Fukui shown in Fig. 7 exhibit some deviation from those of other stations, and the deviation is a cause of the high heterogeneity measure H(1) in Table 3; the distributions fitted to the other four stations in the region C are nearly the same as those shown in Fig. 7.
Choice of the Weibull or the GPA distribution for the coastal stations along the Japan Sea requires some engineering judgment, because both distributions fit the observed wave data well. However, the GPA distribution predicts a smaller return height for a long return period than the Weibull. Table 5 is a comparison of the 100-year significant wave heights estimated with the Weibull and GPA distributions as well as the maximum significant wave height observed during 18 to 37 effective years (see DISCUSSION for LSQ $H_{100}$). As exhibited in Table 5, the 100-year wave height estimated by the GPA distribution is less than the maximum observed height at four stations. It would not be wise to employ the GPA distribution from the viewpoint of safe structural design.
Previously, Goda et al. (2000) took another approach in search of regional population distributions of extreme wave heights around Japan, using the least squares method. The rejection criteria called REC and DOL were applied to POT wave data at multiple stations in a region, and the distribution function least rejected was recommended as the population distribution. For the eleven stations around the Japan Sea coast, the same as the present ones, the Weibull distribution with the shape parameter of k = 1.4 was judged to represent the population. The data duration was 10 years shorter than that of the present data.

DISCUSSIONS
The 100-year wave heights estimated by the least squares method (LSQ) with the present data set are listed in the rightmost column of Table 5. They are generally smaller than the Weibull estimates with k ≈ 1.2 but greater than the GPA estimates. The smaller prediction by LSQ may be caused by fitting of a larger shape parameter of k = 1.4, but the occurrence of large storm waves in the last ten years must have contributed to the increase of 100-year wave heights in the present analysis.
In the present paper, confidence intervals for the estimated return wave heights have not been worked out yet. Hosking and Wallis (1997) recommend use of the Monte Carlo simulation technique for estimation of the confidence interval once a best-fitting distribution function is selected. This is one of the tasks that the authors intend to pursue in the near future.

Figure 1. Relationship between L-kurtosis and L-skewness of several extreme distribution functions.

Figure 5. Fitting of Weibull and GPA distribution functions for the region D.

Figure 12. Return wave height of Fukui.
Figure 13. Return wave height of Hamada.

Table 4: Parameter values of regional Weibull and GPA distributions.