U-statistic

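A U-statistic is a member of a particular class of symmetric statistics that plays an important role in estimation theory. The letter "U" stands for "unbiased". In elementary statistics, U-statistics are closely related to uniformly minimum-variance unbiased estimators (UMVUE).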

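One reason U-statistics matter is that, for large classes of probability distributions, the minimum-variance unbiased estimator of an estimable parameter is a U-statistic.^[1]^[2] Studying the general properties of U-statistics therefore gives a systematic way to understand the statistical properties of these estimators.^[3]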

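U-statistics are especially important in nonparametric statistics, where many statistics used for estimation and hypothesis testing take the form of a U-statistic. U-statistics are usually asymptotically normal, which simplifies statistical inference based on them. In recent years they have also been used to study the random properties of complex stochastic processes and of random-network data.^[4]^[5]^[6]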

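What statisticians currently know about the properties of U-statistics rests almost entirely on Hoeffding's classic 1948 paper.^[7] In that paper Hoeffding gave the most important property of U-statistics: the ANOVA decomposition.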

Definition

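Let $h(x_{1},\ldots ,x_{r}):\mathbb {R} ^{r}\to \mathbb {R}$ be a symmetric function, that is, a function whose value is unchanged when any two arguments $x_{i},x_{j}$ are swapped. For random variables $X_{1},\ldots ,X_{n}$ with $n\geq r$ (typically assumed independent and identically distributed), the U-statistic based on $h$ is defined as:

$U_{n}={\frac {1}{\binom {n}{r}}}\sum _{1\leq i_{1}<\cdots <i_{r}\leq n}h(X_{i_{1}},\ldots ,X_{i_{r}})$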

Here $h(\cdot )$ is called the kernel function of the U-statistic, and the number of arguments $r$ of the kernel is called the degree of the U-statistic.^[8]
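The definition can be turned into code almost literally. The following is a minimal sketch, assuming a small sample held in a Python list; the function name `u_statistic` and the data are illustrative and not part of the source. It enumerates all $\binom{n}{r}$ subsets, so it is only practical for small $n$ and $r$.

```python
from itertools import combinations
from math import comb


def u_statistic(kernel, sample, r):
    """Average a symmetric kernel of degree r over all size-r subsets of the sample."""
    n = len(sample)
    if n < r:
        raise ValueError("the sample must contain at least r observations")
    total = sum(kernel(*subset) for subset in combinations(sample, r))
    return total / comb(n, r)


# Illustrative data, made up for this sketch.
data = [1.0, 2.0, 4.0, 7.0]

# Degree 1 with h(x) = x: the sample mean.
mean = u_statistic(lambda x: x, data, 1)

# Degree 2 with h(x1, x2) = |x1 - x2|: the mean pairwise absolute difference.
mean_pairwise_diff = u_statistic(lambda a, b: abs(a - b), data, 2)
```

Because the kernel is symmetric, it is enough to visit each unordered subset once, which is what `combinations` does.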

Two-sample U-statistics

Let $h(x_{1},\ldots ,x_{r};y_{1},\ldots ,y_{s}):\mathbb {R} ^{r+s}\to \mathbb {R}$ be a function that is symmetric in the $x$ arguments and in the $y$ arguments separately: its value is unchanged when any two $x_{i_{1}},x_{i_{2}}$ are swapped or when any two $y_{j_{1}},y_{j_{2}}$ are swapped (but an $x_{i}$ may not be freely swapped with a $y_{j}$). For random variables $X_{1},\ldots ,X_{m};Y_{1},\ldots ,Y_{n}$ , the two-sample U-statistic based on $h$ is defined as:

$U_{m,n}={\frac {1}{{\binom {m}{r}}{\binom {n}{s}}}}\sum _{1\leq i_{1}<\cdots <i_{r}\leq m}\sum _{1\leq j_{1}<\cdots <j_{s}\leq n}h(X_{i_{1}},\ldots ,X_{i_{r}};Y_{j_{1}},\ldots ,Y_{j_{s}})$

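In machine learning, the most common case at present is $r=s=1$ , which is the form taken by the cross-sample terms of the energy distance and of the maximum mean discrepancy (MMD).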
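A two-sample U-statistic can be computed in the same brute-force way. The sketch below assumes small samples stored in lists. The function name `two_sample_u_statistic`, the kernel $h(x;y)=|x-y|$, and the data are illustrative choices, not taken from the source.

```python
from itertools import combinations
from math import comb


def two_sample_u_statistic(kernel, xs, ys, r, s):
    """Average kernel(x_subset, y_subset) over all size-r subsets of xs and size-s subsets of ys."""
    m, n = len(xs), len(ys)
    if m < r or n < s:
        raise ValueError("need at least r observations in xs and s observations in ys")
    total = 0.0
    for x_subset in combinations(xs, r):
        for y_subset in combinations(ys, s):
            total += kernel(x_subset, y_subset)
    return total / (comb(m, r) * comb(n, s))


# Illustrative data, made up for this sketch.
xs = [0.0, 1.0, 3.0]
ys = [2.0, 5.0]

# Case r = s = 1 with h(x; y) = |x - y|: the average distance between the two samples.
cross_distance = two_sample_u_statistic(lambda x, y: abs(x[0] - y[0]), xs, ys, 1, 1)
```

The kernel receives the two subsets as tuples, so swapping elements within one tuple is allowed while mixing the two samples is not, mirroring the separate symmetry requirement.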

Hoeffding's ANOVA decomposition theorem

Statement of the theorem

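Hoeffding's ANOVA decomposition theorem is the foundation of the modern theory of U-statistics.^[9] To state it, let $X_{1},\ldots ,X_{n}$ be independent and identically distributed and define $\mu =\mathbb {E} [h(X_{1},\ldots ,X_{r})]$ . For every $1\leq k\leq r$ , define the projection function: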

$a_{k}(x_{1},\ldots ,x_{k})=\mathbb {E} [h(X_{1},\ldots ,X_{r})|X_{1}=x_{1},\ldots ,X_{k}=x_{k}]-\mu$

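Next define the orthogonalized projection functions:

$g_{1}(x_{1})=a_{1}(x_{1})$ , $g_{2}(x_{1},x_{2})=a_{2}(x_{1},x_{2})-g_{1}(x_{1})-g_{1}(x_{2})$ , and so on. Each $g_{k}$ is the corresponding $a_{k}$ minus all previously defined $g_{1},\ldots ,g_{k-1}$ evaluated on every proper subset of its arguments, up to the last function $g_{r}$ :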

$g_{r}(x_{1},\ldots ,x_{r})=a_{r}(x_{1},\ldots ,x_{r})-\sum _{j=1}^{r-1}\sum _{1\leq i_{1}<\cdots <i_{j}\leq r}g_{j}(x_{i_{1}},\ldots ,x_{i_{j}})$

Hoeffding's ANOVA decomposition theorem states that:

$U_{n}-\mu ={\binom {n}{r}}^{-1}\sum _{k=1}^{r}{\binom {n-k}{r-k}}\cdot \sum _{1\leq i_{1}<\cdots <i_{k}\leq n}g_{k}(X_{i_{1}},\ldots ,X_{i_{k}})$

Properties of the decomposition terms

Every orthogonalized projection function $g_{k}$ satisfies:

$\mathbb {E} [g_{k}(X_{1},\ldots ,X_{k})|X_{1},\ldots ,X_{k-1}]=0$

Consequently, the terms of the decomposition are mutually uncorrelated,^[9] and the averaged term of degree $k$ is of order $O_{p}\left(n^{-k/2}\right)$ .

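In most applications, the most important part of the ANOVA decomposition of a U-statistic is its first term or its first two terms. From the properties of the terms one obtains the following two-term ANOVA decomposition: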

$U_{n}-\mu ={\frac {r}{n}}\sum _{i=1}^{n}g_{1}(X_{i})+{\frac {r(r-1)}{n(n-1)}}\sum _{1\leq i<j\leq n}g_{2}(X_{i},X_{j})+O_{p}(n^{-3/2})$
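As a worked check, consider the product kernel $h(x_{1},x_{2})=x_{1}x_{2}$ with population mean $m=\mathbb {E} [X_{1}]$, so that $\mu =m^{2}$, $g_{1}(x)=m(x-m)$ and $g_{2}(x_{1},x_{2})=(x_{1}-m)(x_{2}-m)$. Since $r=2$, the two-term decomposition has no remainder and holds exactly for any data. The sketch below verifies this numerically; the kernel, the value of `m`, and the sample are assumptions chosen for illustration, not taken from the source.

```python
from itertools import combinations
from math import comb


def g1(x, m):
    """First-order projection for h(x1, x2) = x1 * x2 when E[X] = m."""
    return m * (x - m)


def g2(x1, x2, m):
    """Second-order (degenerate) term for the product kernel."""
    return (x1 - m) * (x2 - m)


def product_kernel_decomposition(sample, m):
    """Return (U_n - mu, linear term, quadratic term) for the product kernel."""
    n = len(sample)
    pairs = list(combinations(sample, 2))
    u_n = sum(a * b for a, b in pairs) / comb(n, 2)
    linear = (2 / n) * sum(g1(x, m) for x in sample)
    # For r = 2 the coefficient r(r-1)/(n(n-1)) equals 1 / C(n, 2).
    quadratic = sum(g2(a, b, m) for a, b in pairs) / comb(n, 2)
    return u_n - m ** 2, linear, quadratic


# Illustrative sample and an assumed population mean m.
centered, linear, quadratic = product_kernel_decomposition([0.5, 1.5, 2.0, 4.0], m=1.0)
# centered equals linear + quadratic up to floating-point rounding.
```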

Applications of the theorem

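The asymptotic normality of U-statistics is a simple corollary of Hoeffding's ANOVA decomposition theorem. Specifically, write $\xi _{1}^{2}=\mathrm {Var} (g_{1}(X_{1}))$ . If $\mathbb {E} [h^{2}(X_{1},\ldots ,X_{r})]<\infty$ and $\xi _{1}^{2}>0$ , then:

$n^{1/2}\left(U_{n}-\mu \right)\ {\stackrel {d}{\to }}\ N\left(0,r^{2}\xi _{1}^{2}\right)$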
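The limiting variance can be checked by simulation. The sketch below assumes standard normal data and the kernel $h(x_{1},x_{2})=(x_{1}-x_{2})^{2}/2$, whose U-statistic is the sample variance. In that setting $g_{1}(x)=(x^{2}-1)/2$, so $\xi _{1}^{2}=1/2$ and $r^{2}\xi _{1}^{2}=2$. The function names, sample size, replication count and seed are arbitrary illustrative choices.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance, i.e. the U-statistic with kernel (x1 - x2)^2 / 2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def scaled_variance_of_u_statistic(n, reps, seed=0):
    """Monte Carlo estimate of Var(sqrt(n) * (U_n - mu)) for N(0, 1) data."""
    rng = random.Random(seed)
    estimates = [
        sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return n * sample_variance(estimates)


# Should land near the theoretical limit r^2 * xi_1^2 = 2, up to Monte Carlo error.
simulated = scaled_variance_of_u_statistic(n=100, reps=2000)
```

The agreement is only approximate: the simulation has sampling noise, and for finite $n$ the exact value in this Gaussian case is $2n/(n-1)$ rather than 2.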

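The decomposition theorem also shows how to approximate the variance of a U-statistic correctly to first order, and how to t-standardize (studentize) it.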

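Starting from the theorem, and under assumptions of varying strength, one-term or two-term Edgeworth expansions can be used to approximate the distribution of a U-statistic with high accuracy.^[8]^[10]^[11]^[12]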

Examples

An example of degree 1: let $h(x)=x$ . Then the U-statistic ${\frac {1}{n}}\sum _{i=1}^{n}h(X_{i})={\bar {X}}_{n}$ is the sample mean.

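An example of degree 2: let $h(x_{1},x_{2})=|x_{1}-x_{2}|$ . Then the U-statistic

${\frac {1}{\binom {n}{2}}}\sum _{1\leq i<j\leq n}h(X_{i},X_{j})$

is called the "mean pairwise deviation" (also known as Gini's mean difference).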

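Another example of degree 2: let $h(x_{1},x_{2})=(x_{1}-x_{2})^{2}/2$ . Then the U-statistic can be rewritten as:

${\frac {1}{\binom {n}{2}}}\sum _{1\leq i<j\leq n}h(X_{i},X_{j})={\frac {1}{n-1}}\sum _{i=1}^{n}(X_{i}-{\bar {X}}_{n})^{2}$

This is exactly the familiar sample variance $S_{n}^{2}$ .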
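The identity between this U-statistic and the sample variance is easy to confirm numerically. The following sketch compares the pairwise form with `statistics.variance` from the Python standard library; the function name `variance_as_u_statistic` and the data are made up for illustration.

```python
from itertools import combinations
from math import comb
from statistics import variance


def variance_as_u_statistic(sample):
    """U-statistic with kernel h(x1, x2) = (x1 - x2)^2 / 2."""
    n = len(sample)
    total = sum((a - b) ** 2 / 2 for a, b in combinations(sample, 2))
    return total / comb(n, 2)


# Illustrative data, made up for this sketch.
data = [2.0, 3.0, 5.0, 10.0]

pairwise_form = variance_as_u_statistic(data)
textbook_form = variance(data)  # divides by n - 1
# The two values agree up to floating-point rounding.
```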

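An example of degree 3: the numerator in the definition of the sample skewness,

${\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\bar {X}}_{n})^{3}$

can, after expansion and multiplication by the factor $n^{2}/((n-1)(n-2))$ , be written as a U-statistic of degree 3, namely the unbiased estimator of the third central moment.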
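One way to make the degree 3 example concrete, offered as a sketch rather than as the source's own derivation: the third central moment equals $\mathbb {E} [X_{1}^{3}-3X_{1}^{2}X_{2}+2X_{1}X_{2}X_{3}]$, so symmetrizing that expression gives a kernel of degree 3. The resulting U-statistic coincides with $n^{2}/((n-1)(n-2))$ times the averaged cubed deviations. The function names and data below are illustrative.

```python
from itertools import combinations, permutations
from math import comb


def third_moment_kernel(x1, x2, x3):
    """Symmetrized version of x1^3 - 3*x1^2*x2 + 2*x1*x2*x3."""
    orderings = list(permutations((x1, x2, x3)))
    total = sum(a ** 3 - 3 * a * a * b + 2 * a * b * c for a, b, c in orderings)
    return total / len(orderings)


def third_moment_u_statistic(sample):
    """Degree 3 U-statistic estimating the third central moment."""
    n = len(sample)
    total = sum(third_moment_kernel(*triple) for triple in combinations(sample, 3))
    return total / comb(n, 3)


def mean_cubed_deviation(sample):
    """(1/n) * sum of (x_i - sample mean)^3, the skewness numerator."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 3 for x in sample) / n


# Illustrative data, made up for this sketch.
data = [0.0, 1.0, 1.5, 4.0, 6.0]
n = len(data)

u3 = third_moment_u_statistic(data)
rescaled = n ** 2 / ((n - 1) * (n - 2)) * mean_cubed_deviation(data)
# u3 and rescaled agree up to floating-point rounding.
```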

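In machine learning, when kernel methods are used for one-sample or two-sample nonparametric tests, the test statistic is an energy distance or a maximum mean discrepancy (MMD). Both are either U-statistics themselves or have expressions that contain two-sample U-statistics.^[13]^[14]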
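As an illustration of how such a statistic combines one-sample and two-sample averages, the sketch below computes the standard unbiased estimate of the squared MMD for scalar data: the within-sample terms are one-sample U-statistics of degree 2, and the cross term is a two-sample U-statistic with $r=s=1$. The Gaussian kernel, the bandwidth, and the function names are assumptions made for this example.

```python
from math import exp


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an illustrative choice."""
    return exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of squared MMD built from U-statistic averages."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two observations")
    # Within-sample averages over ordered pairs of distinct indices.
    xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # Cross-sample average: a two-sample U-statistic with r = s = 1.
    xy = sum(kernel(x, y) for x in xs for y in ys)
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2.0 * xy / (m * n)


# Well-separated illustrative samples give a clearly positive estimate.
separated = mmd2_unbiased([0.0, 0.1], [10.0, 10.1])

# Identical samples can give a slightly negative value, since the estimate is unbiased rather than nonnegative.
identical = mmd2_unbiased([0.0, 1.0], [0.0, 1.0])
```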

See also

V-statistic

References

[1] Cox & Hinkley (1974), p. 200, p. 258.
[2] Hoeffding (1948), between Eqs. (4.3) and (4.4).
[3] U-Statistics: Theory and Practice. Routledge. ISBN 9781351405850.
[4] Page 508 in Koroljuk, V. S.; Borovskich, Yu. V. Theory of U-statistics. Mathematics and its Applications 273. Translated by P. V. Malyshev and D. V. Malyshev from the 1989 Russian original. Dordrecht: Kluwer Academic Publishers Group. 1994: x+552. ISBN 0-7923-2608-3. MR 1472486.
[5] Pages 381–382 in Borovskikh, Yu. V. U-statistics in Banach spaces. Utrecht: VSP. 1996: xii+420. ISBN 90-6764-200-2. MR 1419498.
[6] Page xii in Kwapień, Stanisław; Woyczyński, Wojbor A. Random series and stochastic integrals: Single and multiple. Probability and its Applications. Boston, MA: Birkhäuser Boston, Inc. 1992: xvi+360. ISBN 0-8176-3572-6. MR 1167198.
[7] Hoeffding, Wassily. A Class of Statistics with Asymptotically Normal Distribution. The Annals of Mathematical Statistics. 1948-09, 19 (3): 293–325. doi:10.1214/aoms/1177730196.
[8] Bickel, P. J.; Götze, F.; van Zwet, W. R. The Edgeworth Expansion for U-Statistics of Degree Two. The Annals of Statistics. 1986-12, 14 (4): 1463–1484. doi:10.1214/aos/1176350170.
[9] Maesono, Yoshihiko. Edgeworth expansions of a studentized U-statistic and a jackknife estimator of variance. Journal of Statistical Planning and Inference. 1997-05, 61 (1): 61–84. doi:10.1016/S0378-3758(96)00148-6.
[10] Putter, Hein; van Zwet, Willem R. Empirical Edgeworth expansions for symmetric statistics. The Annals of Statistics. 1998-08, 26 (4): 1540–1569. doi:10.1214/aos/1024691253.
[11] Jing, Bing-Yi; Wang, Qiying. Edgeworth expansion for U-statistics under minimal conditions. The Annals of Statistics. 2003-08, 31 (4): 1376–1391. doi:10.1214/aos/1059655916.
[12] Zhang, Yuan; Xia, Dong. Edgeworth expansions for network moments. The Annals of Statistics. 2022-04-01, 50 (2): 726–753. doi:10.1214/21-AOS2125.
[13] Székely, Gábor J.; Rizzo, Maria L. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference. 2013-08, 143 (8): 1249–1272. doi:10.1016/j.jspi.2013.03.018.
[14] Gretton, Arthur; Borgwardt, Karsten M.; Rasch, Malte J.; Schölkopf, Bernhard; Smola, Alexander. A Kernel Two-Sample Test. Journal of Machine Learning Research. 2012, 13 (25): 723–773. Retrieved 2020-06-26. (Archived from the original on 2022-02-04.)

