User:Chen-Pan Liao/Fisher's exact test

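Fisher's exact test is a statistical significance test used to test for association in contingency tables. It was devised by Ronald Aylmer Fisher in 1935.^[1]^[2]^[3] In practice it is most often used when sample sizes are small, but it is not restricted to small samples. It is an exact test: its p-value is computed exactly from the distribution under the null hypothesis, rather than relying on an approximation to a particular probability distribution that only becomes accurate when the sample is large enough.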

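According to the well-known anecdote, Fisher designed the test after Muriel Bristol claimed she could tell whether the tea or the milk had been added to a cup first, and he applied the test in the lady tasting tea experiment.^[4]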

Purpose and scope

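The test is useful for categorical data that arise from classifying objects in two different ways. It examines whether the association (contingency) between the two classifications is significant. In Fisher's original example, one classification is how each cup of tea was actually prepared (milk first or tea first) and the other is how Muriel Bristol judged it to have been prepared. The test assesses whether the two classifications are associated, that is, whether she can really tell which was poured first. As in the lady tasting tea experiment, the test is most often applied to 2 × 2 contingency tables (described below). The resulting p-value is computed conditional on the margins of the table being fixed: Bristol knew that four of the eight cups had milk poured first, so she would necessarily pick exactly four. Under the null hypothesis of independence, the cell counts then follow a hypergeometric distribution.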
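To make the conditioning concrete, the following sketch works through the tea experiment using only the design stated above: eight cups, four with milk first, and exactly four picked. `prob_correct` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction
from math import comb

# Lady tasting tea: 8 cups, 4 with milk poured first; the taster picks exactly 4.
# Under the null hypothesis (no discriminating ability) every choice of 4 cups is
# equally likely, so the number k of correctly identified milk-first cups is hypergeometric.
CUPS, MILK_FIRST, PICKED = 8, 4, 4

def prob_correct(k: int) -> Fraction:
    """P(exactly k of the picked cups are truly milk-first) under the null."""
    return Fraction(comb(MILK_FIRST, k) * comb(CUPS - MILK_FIRST, PICKED - k),
                    comb(CUPS, PICKED))

distribution = {k: prob_correct(k) for k in range(PICKED + 1)}
print(distribution[4])  # 1/70: probability of identifying all four by pure guessing
```

Because the margins are fixed by the design, the whole null distribution has only five possible outcomes, and a perfect score has probability 1/70.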

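For large samples a chi-squared test or a G-test is normally used, since their test statistics approximately follow a chi-squared distribution. This large-sample approximation is inadequate when the sample is small or when the counts are very unevenly distributed among the cells. A common rule of thumb is to check beforehand that the expected count in every cell is at least 5 (or at least 10 when there is only one degree of freedom) before relying on the chi-squared approximation, although this check is now known to be overly conservative.^[5] In fact, for small, sparse or unbalanced data, p-values from the chi-squared approximation can differ greatly from the exact p-value and can even lead to opposite conclusions.^[6] ^[7] Fisher's exact test, by contrast, is exact, as its name says, as long as the experimental procedure keeps the row and column totals fixed, so it can be used regardless of the characteristics of the sample. It becomes computationally demanding for large samples or well-balanced tables, but fortunately these are exactly the conditions under which the chi-squared test is appropriate.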
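As a rough illustration of how far the approximation can drift, this sketch applies the Pearson chi-squared test (without continuity correction) to the 2 × 2 table from the example section. The helper names are hypothetical. The 1-degree-of-freedom tail probability uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def pearson_chi2_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) and its
    large-sample p-value from the chi-squared distribution with 1 df."""
    exp = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    # For 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

table = [[1, 9], [11, 3]]  # rows: studying / not studying; columns: men / women
print(expected_counts(table))  # [[5.0, 5.0], [7.0, 7.0]]
stat, p = pearson_chi2_2x2(table)
print(stat, p)
```

Every expected count here is at least 5, so the rule of thumb would allow the approximation. Even so, the approximate p-value (roughly 0.0009) is only about a third of the exact two-tailed p-value for the same table (about 0.0028).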

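For a 2 × 2 table the test can be carried out by hand. The method extends to general m × n tables,^[8] but the computation is much harder, so statistical software is normally used instead. Some programs approximate the p-value with Monte Carlo methods.^[9]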
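Here is a minimal Monte Carlo sketch for a general table. It generates random tables with the observed margins by shuffling one set of category labels. A simulated table counts as "at least as extreme" when its conditional probability is no larger than that of the observed table. This follows the idea described above but is not the algorithm of any particular package. The helper names, the seed, the number of simulations and the (hits + 1)/(n_sim + 1) estimator are all illustrative choices.

```python
import math
import random

def log_factorial(k):
    """log(k!) via the log-gamma function."""
    return math.lgamma(k + 1)

def log_table_prob(table):
    """Log-probability of an r x c table given its row and column totals:
    prod(row!) * prod(col!) / (n! * prod(cell!))."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return (sum(map(log_factorial, rows)) + sum(map(log_factorial, cols))
            - log_factorial(n) - sum(log_factorial(x) for r in table for x in r))

def fisher_monte_carlo(table, n_sim=20000, seed=0):
    """Monte Carlo estimate of the Fisher exact p-value for an r x c table."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row label, column label) pair per observation.
    pairs = [(i, j) for i in range(n_rows) for j in range(n_cols)
             for _ in range(table[i][j])]
    row_labels = [i for i, _ in pairs]
    col_labels = [j for _, j in pairs]
    observed = log_table_prob(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)  # random re-pairing keeps both margins fixed
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Count tables no more likely than the observed one (tiny tolerance for ties).
        if log_table_prob(sim) <= observed + 1e-7:
            hits += 1
    return (hits + 1) / (n_sim + 1)

print(fisher_monte_carlo([[1, 9], [11, 3]]))
print(fisher_monte_carlo([[3, 1, 0], [1, 3, 2], [0, 1, 4]], n_sim=5000))
```

For the 2 × 2 example table the estimate should land near the exact two-tailed value of about 0.0028. A larger `n_sim` narrows the Monte Carlo error.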

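The test can also be used to quantify the overlap between two sets. For example, in gene set enrichment analysis in statistical genetics, one set of genes (A) may be annotated for a particular phenotype, and a user may want to test how strongly another set of genes of interest (B) overlaps with A. The situation can be summarised as a 2 × 2 contingency table giving the number of genes that are:

in both A and B
in A only
in B only
in neither A nor B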

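The test assumes that the genes in either set are drawn from a broader set of genes (for example, all remaining genes). Fisher's exact test then assesses whether the overlap is significant.^[10]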
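A small sketch of the overlap test follows. It assumes a one-sided alternative (more overlap than expected by chance) and uses made-up gene identifiers. `overlap_table` and `overlap_p_value` are hypothetical helpers. The universe plays the role of the broader gene set, and the p-value is the upper tail of the hypergeometric distribution of the overlap size.

```python
from math import comb

def overlap_table(universe, set_a, set_b):
    """2 x 2 counts: (in A and B, in A only, in B only, in neither)."""
    a, b = set_a & universe, set_b & universe
    both = len(a & b)
    a_only = len(a - b)
    b_only = len(b - a)
    neither = len(universe) - both - a_only - b_only
    return both, a_only, b_only, neither

def overlap_p_value(universe, set_a, set_b):
    """One-sided Fisher p-value for an overlap at least as large as observed,
    conditioning on |A|, |B| and the size of the universe."""
    both, a_only, b_only, neither = overlap_table(universe, set_a, set_b)
    n = both + a_only + b_only + neither
    size_a, size_b = both + a_only, both + b_only
    # Hypergeometric: draw size_b genes from n, of which size_a belong to A.
    tail = sum(comb(size_a, k) * comb(n - size_a, size_b - k)
               for k in range(both, min(size_a, size_b) + 1))
    return tail / comb(n, size_b)

universe = {f"gene{i}" for i in range(20)}  # hypothetical background gene set
set_a = {f"gene{i}" for i in range(5)}      # hypothetical phenotype-annotated genes
set_b = {f"gene{i}" for i in range(3, 8)}   # hypothetical genes of interest
print(overlap_table(universe, set_a, set_b))  # (2, 3, 3, 12)
print(overlap_p_value(universe, set_a, set_b))
```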

Example

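Consider a sample of teenagers. On one hand they can be classified as male or female, and on the other as currently studying or not studying for a statistics exam. More of the women in the sample are studying than the men, and we want to test whether the observed difference in proportions is significant. The data are as follows: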

	Men	Women	Row total
Studying	1	9	10
Not studying	11	3	14
Column total	12	12	24

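The question we ask about these data is the following. We know that 10 of these 24 teenagers are studying and that 12 of the 24 are female. Under the null hypothesis that men and women are equally likely to study, does the sex distribution among the 10 studying teenagers differ from that among those not studying? More precisely, if we chose 10 teenagers at random, what is the probability that 9 or more of them would be among the 12 women and only 1 or fewer among the 12 men?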

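Before carrying out the test, we introduce some notation. We denote the cell counts by the letters a, b, c and d, call the totals across rows and columns the marginal totals, and denote the grand total by n. The table then becomes: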

	Men	Women	Row total
Studying	a	b	a + b
Not studying	c	d	c + d
Column total	a + c	b + d	a + b + c + d = n

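Fisher showed that, conditional on both the row and column totals of the table being fixed, a follows a hypergeometric distribution with a + c draws from a population containing a + b successes and c + d failures. The probability of obtaining such a set of values is: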

$$p=\frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}=\frac{\binom{a+b}{b}\binom{c+d}{d}}{\binom{n}{b+d}}=\frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

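where $\tbinom{n}{k}$ is the binomial coefficient and "!" denotes the factorial. The formula can be understood as follows. Once all the marginal totals (a + b, c + d, a + c and b + d) are known, only one degree of freedom remains: knowing a, for example, is enough to deduce the other three cells. Then $p = p(a)$ is the probability of drawing exactly a successes when a + c elements are chosen at random, without replacement, from a population of n elements of which a + b are successes. This is precisely the definition of the hypergeometric distribution. For the data above,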

$$p=\binom{10}{1}\binom{14}{11}\Big/\binom{24}{12}=\frac{10!\,14!\,12!\,12!}{1!\,9!\,11!\,3!\,24!}\approx 0.001346076$$
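The arithmetic can be checked with exact rational arithmetic. The sketch below (hypothetical helper names) evaluates both the binomial-coefficient form and the factorial form and confirms that they agree.

```python
from fractions import Fraction
from math import comb, factorial

def table_prob_binom(a, b, c, d):
    """C(a+b, a) * C(c+d, c) / C(n, a+c): hypergeometric probability of the table."""
    n = a + b + c + d
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))

def table_prob_factorial(a, b, c, d):
    """Equivalent factorial form (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!)."""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
    return Fraction(num, den)

p = table_prob_binom(1, 9, 11, 3)
assert p == table_prob_factorial(1, 9, 11, 3)  # the two forms agree exactly
print(p, float(p))  # 10/7429, about 0.001346076
```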

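The formula above gives the exact hypergeometric probability of observing this particular arrangement of the data. It assumes the null hypothesis that men and women are equally likely to be studying, and it treats the marginal totals as fixed. In other words, suppose that men and women each study with the same probability p, and that men and women enter the sample independently of whether or not they are studying. Then this hypergeometric formula gives the conditional probability of observing the counts a, b, c and d in the four cells, given the observed marginal totals (that is, taking as given the row and column totals shown in the margins of the table). This remains true even if men and women enter the sample with different probabilities (for example, if the sex ratio in the population is not 1:1). The only requirement is that the two classifying characteristics, sex and whether or not studying, are independent of each other. For example, suppose that P and Q are the marginal probabilities of being male and female, and p and q are the marginal probabilities of studying and not studying, so that P + Q = 1 and p + q = 1. If sex and studying are independent, the probabilities of the four cells are: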

Male, studying: Pp
Female, studying: Qp
Male, not studying: Pq
Female, not studying: Qq

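If we then compute the distribution of the cell counts conditional on the marginal totals, we obtain the formula above, in which neither p nor P appears. We can therefore calculate the exact probability of any arrangement of the 24 teenagers into the four cells of the table. Fisher showed that, to obtain a significance level, we need only consider tables with the same marginal totals as the observed table, and among those, only the arrangements that are as extreme as the observed one or more so. (Barnard's test relaxes the constraint on one set of marginal totals.) In this example there are 11 possible tables with these marginal totals, since a can take any value from 0 to 10. Of these, only one is more extreme than the observed data in the same direction: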

	Men	Women	Row total
Studying	0	10	10
Not studying	12	2	14
Column total	12	12	24

Under the same assumptions, the probability of this table is $p=\tbinom{10}{0}\tbinom{14}{12}/\tbinom{24}{12}\approx 0.000033652$.

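Under the null hypothesis, the one-tailed p-value is the total probability of the observed table and of the more extreme one: approximately 0.001346076 + 0.000033652 = 0.001379728. In R this value is given by fisher.test(rbind(c(1,9),c(11,3)),alternative="less")$p.value, and in Python it is the p-value returned by scipy.stats.fisher_exact(table=[[1,9],[11,3]], alternative="less"). The p-value measures the strength of the evidence that the observed data (or any more extreme table) provide against the null hypothesis that men and women are equally likely to be studying. The smaller the p-value, the stronger the evidence against the null hypothesis. The data in this example therefore strongly suggest that men and women are not equally likely to be studying.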
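The enumeration described above (11 possible tables, one of them more extreme than the data) and the one-tailed sum can be reproduced exactly with the standard library. This is a sketch with hypothetical helper names, independent of the R and SciPy functions just mentioned.

```python
from fractions import Fraction
from math import comb

def table_prob(a, row1, col1, n):
    """Probability of the 2x2 table with top-left cell a, given the row total
    row1 = a + b, the column total col1 = a + c and the grand total n."""
    return Fraction(comb(row1, a) * comb(n - row1, col1 - a), comb(n, col1))

def possible_tables(row1, col1, n):
    """All tables (a, b, c, d) with the given margins."""
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return [(a, row1 - a, col1 - a, n - row1 - col1 + a) for a in range(lo, hi + 1)]

def one_sided_less(a_obs, row1, col1, n):
    """P(A <= a_obs): the observed table plus the tables with even fewer
    studying men (the alternative="less" direction)."""
    return sum(table_prob(a, row1, col1, n)
               for a, _, _, _ in possible_tables(row1, col1, n) if a <= a_obs)

row1, col1, n = 10, 12, 24  # studying total, male total, grand total
for table in possible_tables(row1, col1, n):
    print(table, float(table_prob(table[0], row1, col1, n)))
p_less = one_sided_less(1, row1, col1, n)
print(float(p_less))  # about 0.001379728
```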

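For a two-tailed test we must also consider tables that are equally extreme in the opposite direction, that is, the rejection region on the other side of the distribution. However, there is no unique definition of which tables in the opposite tail count as "more extreme". R's fisher.test computes the p-value by summing the probabilities of all tables whose probability is less than or equal to that of the observed table. Consequently, unlike tests based on symmetric distributions, the two-tailed p-value is not necessarily twice the one-tailed p-value, especially for small samples.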
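The following sketch implements the rule "sum all tables no more likely than the observed one". It uses exact fractions, so ties between equally likely tables are detected exactly; floating-point implementations typically need a small relative tolerance instead. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-tailed p-value: total probability of all tables with the same margins
    whose probability does not exceed that of the observed table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = {x: Fraction(comb(row1, x) * comb(n - row1, col1 - x), comb(n, col1))
             for x in range(lo, hi + 1)}
    return sum(p for p in probs.values() if p <= probs[a])

p_two = fisher_two_sided(1, 9, 11, 3)
print(p_two, float(p_two))  # about 0.0028
```

For the example table the result is 7462/2704156, about 0.0028. The two column totals are equal here (12 men and 12 women), so the null distribution happens to be symmetric and the result is exactly twice the one-tailed value. With unequal margins that need not hold.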

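As noted above, most modern statistical software can compute the significance of Fisher's exact test. For very large samples the computation may fail, for example because the factorials become too large. In such cases one can fall back on the chi-squared approximation or work with the gamma function or the log-gamma function. Methods for computing hypergeometric and binomial probabilities exactly remain an active area of research.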
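Here is a sketch of the log-gamma approach. Probabilities are kept on the log scale with `math.lgamma`, and the tail sum uses the log-sum-exp trick, so no factorial is ever materialised as a float. Python integers are unbounded, but converting a large factorial such as `math.factorial(2000)` to a float raises `OverflowError`. The helper names are hypothetical.

```python
import math

def log_comb(n, k):
    """log C(n, k) computed with the log-gamma function."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_table_prob(a, b, c, d):
    """log of C(a+b, a) * C(c+d, c) / C(n, a+c) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return log_comb(a + b, a) + log_comb(c + d, c) - log_comb(n, a + c)

def one_sided_less_log(a_obs, b, c, d):
    """P(A <= a_obs) with fixed margins, summed on the log scale (log-sum-exp)."""
    row1, col1, n = a_obs + b, a_obs + c, a_obs + b + c + d
    lo = max(0, row1 + col1 - n)
    logs = [log_table_prob(a, row1 - a, col1 - a, n - row1 - col1 + a)
            for a in range(lo, a_obs + 1)]
    m = max(logs)  # factor out the largest term to avoid underflow
    return math.exp(m + math.log(sum(math.exp(x - m) for x in logs)))

print(one_sided_less_log(1, 9, 11, 3))         # the example table from the text
print(one_sided_less_log(400, 600, 600, 400))  # n = 2000: factorials far beyond float range
```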

Controversy

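Although Fisher's test gives exact p-values, some authors have argued that it is conservative: its actual rejection rate is below the nominal significance level, so it has lower power.^[11]^[12]^[13] The problem arises from combining a discrete test statistic with a fixed significance level.^[14]^[15] More precisely, Fisher's test takes as its p-value the sum of the null probabilities of all tables as extreme as or more extreme than the observed one. Because the set of possible tables is discrete, there may be no table whose p-value equals the nominal level exactly. If α_e is the largest p-value below 5% that some table can actually attain, then α_e is the effective significance level of the test, and it has been recommended that this effective level be determined before the test is carried out. For small sample sizes α_e may be considerably lower than 5%.^[11]^[12]^[13] This effect occurs with any discrete statistic, but it has been argued that Fisher's conditioning on the margins makes the problem worse.^[16] To avoid the problem, many authors discourage the use of fixed significance levels when dealing with discrete problems.^[14]^[15]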
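To see the discreteness problem on the example's margins (10 studying, 12 men, 24 in total), the sketch below lists every one-tailed p-value that some table can attain and picks the largest one not exceeding 5%. The helper names are hypothetical, and the one-tailed direction matches the example.

```python
from fractions import Fraction
from math import comb

def attainable_lower_p_values(row1, col1, n):
    """Every one-tailed p-value P(A <= a) that some table with these margins can produce."""
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total, cum, levels = comb(n, col1), 0, []
    for a in range(lo, hi + 1):
        cum += comb(row1, a) * comb(n - row1, col1 - a)
        levels.append(Fraction(cum, total))
    return levels

def effective_alpha(row1, col1, n, nominal=Fraction(5, 100)):
    """Largest attainable p-value that does not exceed the nominal level (alpha_e)."""
    return max((p for p in attainable_lower_p_values(row1, col1, n) if p <= nominal),
               default=Fraction(0))

alpha_e = effective_alpha(10, 12, 24)  # margins of the example table
print(float(alpha_e))  # about 0.018
```

For these margins the attainable one-tailed levels jump from about 0.018 straight to about 0.107. A nominal 5% test therefore actually operates at α_e ≈ 0.018.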

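The decision to condition on the margins of the table is also controversial.^[17]^[18] The p-value from Fisher's test is derived by treating both the row totals and the column totals as fixed. In this sense the test is exact only for the conditional distribution, not for the original table, whose marginal totals may vary from experiment to experiment; in that situation Fisher's test may not be appropriate. When the margins are not fixed, other methods can give exact p-values for 2 × 2 tables. For example, Barnard's test allows random marginal totals. However, some authors (including, later, Barnard himself) have criticised Barnard's test on the basis of this property.^[14]^[15]^[18]^[14] They argue that the marginal success total (a + b in the earlier table) is almost an ancillary statistic,^[15] containing almost no information about the property being tested.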

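Conditioning on the marginal success rate of a 2 × 2 table may ignore some of the information in the data about the unknown odds ratio.^[19] The argument that the marginal totals are (almost) ancillary implies that the appropriate likelihood function for inference about this odds ratio should be conditioned on the marginal success rate.^[19] Whether the ignored information matters for inference is still debated.^[19]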

Alternatives

Barnard's test can be used in place of Fisher's test,^[20] and for 2 × 2 tables in particular it offers higher power.^[21] Boschloo's test is another exact test that is also more powerful than Fisher's test.^[22]
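The next sketch shows the idea behind Barnard's test. It treats one set of margins (here the numbers of men and women) as fixed and models the two groups as independent binomial samples. The common success probability π under the null hypothesis is a nuisance parameter, and the p-value is the largest tail probability over π. The pooled z statistic and the simple grid search over π are simplifying assumptions. Library implementations (for example SciPy's `scipy.stats.barnard_exact` and `scipy.stats.boschloo_exact`) offer more options and more careful numerical optimisation. The function names below are hypothetical.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Pooled z statistic for p1 - p2; defined as 0 when the pooled variance is 0."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return 0.0 if var == 0 else (x1 / n1 - x2 / n2) / math.sqrt(var)

def barnard_two_sided(x1, n1, x2, n2, grid_size=1000):
    """Sketch of Barnard's unconditional test: maximise, over a grid of the
    nuisance parameter pi, the probability of tables at least as extreme."""
    t_obs = abs(pooled_z(x1, n1, x2, n2))
    # Tables (i, j) at least as extreme as the observed one (small tolerance for ties).
    extreme = [(i, j) for i in range(n1 + 1) for j in range(n2 + 1)
               if abs(pooled_z(i, n1, j, n2)) >= t_obs - 1e-9]
    best = 0.0
    for k in range(1, grid_size):
        pi = k / grid_size
        pmf1 = [math.comb(n1, i) * pi ** i * (1 - pi) ** (n1 - i) for i in range(n1 + 1)]
        pmf2 = [math.comb(n2, j) * pi ** j * (1 - pi) ** (n2 - j) for j in range(n2 + 1)]
        best = max(best, sum(pmf1[i] * pmf2[j] for i, j in extreme))
    return min(best, 1.0)

# Men: 1 of 12 studying; women: 9 of 12 studying (group sizes treated as fixed).
print(barnard_two_sided(1, 12, 9, 12))
```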

For stratified categorical data, a method that accounts for the sampling strata, such as the Cochran–Mantel–Haenszel test, must be used instead of Fisher's test.
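Below is a sketch of the Cochran–Mantel–Haenszel statistic for a list of 2 × 2 strata. It assumes the common form without continuity correction and a chi-squared reference distribution with 1 degree of freedom. The second stratum in the usage lines is invented purely for illustration, and the function names are hypothetical.

```python
import math

def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-squared statistic, without continuity
    correction, for a list of 2x2 tables [[a, b], [c, d]] (one per stratum)."""
    diff, var = 0.0, 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n                           # observed minus expected a
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))  # hypergeometric variance of a
    return diff * diff / var

def cmh_p_value(strata):
    """Large-sample p-value: P(chi-squared with 1 df > statistic)."""
    return math.erfc(math.sqrt(cmh_statistic(strata) / 2))

strata = [[[1, 9], [11, 3]],  # the example table as one stratum
          [[2, 8], [9, 5]]]   # a second, purely hypothetical stratum
print(cmh_statistic(strata), cmh_p_value(strata))
```

With a single stratum the statistic equals the Pearson chi-squared statistic multiplied by (n - 1)/n.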

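A p-value based on a likelihood-ratio test can be derived from the conditional distribution of the odds ratio given the marginal success rate.^[19] This p-value is inferentially consistent with classical tests for normally distributed data, as well as with likelihood ratios and support intervals based on this conditional likelihood function. It can be computed in R.^[23]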
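The sketch below shows the conditional likelihood of the odds ratio ψ given all margins (Fisher's noncentral hypergeometric distribution, in which P(A = a) is proportional to the central weight times ψ^a). It finds the conditional MLE by bisection and forms a generic likelihood-ratio test with the chi-squared approximation. It is not claimed to reproduce the exact procedure of the cited paper or of the ProfileLikelihood package, and all function names are hypothetical.

```python
import math
from math import comb

def _weights(row1, col1, n):
    """Central hypergeometric weights C(row1, a) * C(n - row1, col1 - a) over the support of a."""
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return {a: comb(row1, a) * comb(n - row1, col1 - a) for a in range(lo, hi + 1)}

def cond_log_lik(psi, a_obs, row1, col1, n):
    """Log conditional likelihood of the odds ratio psi given all margins."""
    w = _weights(row1, col1, n)
    log_norm = math.log(sum(wa * psi ** a for a, wa in w.items()))
    return math.log(w[a_obs]) + a_obs * math.log(psi) - log_norm

def cond_mean(psi, row1, col1, n):
    """E[A] under the conditional distribution with odds ratio psi."""
    w = _weights(row1, col1, n)
    total = sum(wa * psi ** a for a, wa in w.items())
    return sum(a * wa * psi ** a for a, wa in w.items()) / total

def cond_mle(a_obs, row1, col1, n, lo_log=-30.0, hi_log=30.0, iters=200):
    """Conditional MLE of psi by bisection on log(psi), solving E[A; psi] = a_obs.
    Assumes a_obs lies strictly inside its support, so the MLE is finite and positive."""
    for _ in range(iters):
        mid = (lo_log + hi_log) / 2
        if cond_mean(math.exp(mid), row1, col1, n) < a_obs:
            lo_log = mid
        else:
            hi_log = mid
    return math.exp((lo_log + hi_log) / 2)

a_obs, row1, col1, n = 1, 10, 12, 24  # the example table's a and margins
psi_hat = cond_mle(a_obs, row1, col1, n)
lr_stat = 2 * (cond_log_lik(psi_hat, a_obs, row1, col1, n)
               - cond_log_lik(1.0, a_obs, row1, col1, n))
p_lr = math.erfc(math.sqrt(lr_stat / 2))  # chi-squared (1 df) approximation
print(psi_hat, lr_stat, p_lr)
```

At ψ = 1 the conditional likelihood reduces to the central hypergeometric probability of the observed table used earlier in the example.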

References

[1] Fisher, R. A. On the Interpretation of χ² from Contingency Tables, and the Calculation of P. Journal of the Royal Statistical Society. 1922-01, 85 (1). doi:10.2307/2340521.
[2] Fisher, Ronald Aylmer, Sir. Statistical methods for research workers. 14th ed., rev. and enl. Darien, Conn.: Hafner Pub. Co. 1970. ISBN 0-05-002170-2. OCLC 135627.
[3] Agresti, Alan. A Survey of Exact Inference for Contingency Tables. Statistical Science. 1992-02-01, 7 (1). ISSN 0883-4237. doi:10.1214/ss/1177011454.
[4] Newman, James R. Mathematics of a Lady Tasting Tea. The World of Mathematics. Mineola, N.Y.: Dover Publications. 2000. ISBN 978-0-486-41153-8. OCLC 43555029.
[5] Larntz, Kinley. Small-Sample Comparisons of Exact Levels for Chi-Squared Goodness-of-Fit Statistics. Journal of the American Statistical Association. 1978-06, 73 (362). ISSN 0162-1459. doi:10.1080/01621459.1978.10481567.
[6] Mehta, Cyrus R.; Patel, Nitin R.; Tsiatis, Anastasios A. Exact Significance Testing to Establish Treatment Equivalence with Ordered Categorical Data. Biometrics. 1984-09, 40 (3). doi:10.2307/2530927.
[7] Patel, Nitin R.; SPSS Inc. SPSS exact tests 6.1 for Windows. Chicago, Ill.: SPSS Inc. 1995. ISBN 0-13-450891-2. OCLC 34436454.
[8] Mehta, Cyrus R.; Patel, Nitin R. A Network Algorithm for Performing Fisher's Exact Test in r × c Contingency Tables. Journal of the American Statistical Association. 1983-06, 78 (382). doi:10.2307/2288652.
[9] Mehta, Cyrus R.; Patel, Nitin R. ALGORITHM 643: FEXACT: a FORTRAN subroutine for Fisher's exact test on unordered r×c contingency tables. ACM Transactions on Mathematical Software. 1986-06, 12 (2). ISSN 0098-3500. doi:10.1145/6497.214326.
[10] Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D. Large-scale gene function analysis with the PANTHER classification system. Nature Protocols. 2013-08, 8 (8). ISSN 1754-2189. PMC 6519453. PMID 23868073. doi:10.1038/nprot.2013.092.
[11] Liddell, Douglas. Practical Tests of 2 × 2 Contingency Tables. The Statistician. 1976-12, 25 (4). doi:10.2307/2988087.
[12] Berkson, Joseph. In dispraise of the exact test. Journal of Statistical Planning and Inference. 1978-01, 2 (1). doi:10.1016/0378-3758(78)90019-8.
[13] D'Agostino, Ralph B.; Chase, Warren; Belanger, Albert. The Appropriateness of Some Common Procedures for Testing the Equality of Two Independent Binomial Populations. The American Statistician. 1988-08, 42 (3). doi:10.2307/2685002.
[14] Yates, F. Test of Significance for 2 × 2 Contingency Tables. Journal of the Royal Statistical Society. Series A (General). 1984, 147 (3). doi:10.2307/2981577.
[15] Little, Roderick J. A. Testing the Equality of Two Independent Binomial Proportions. The American Statistician. 1989-11, 43 (4). doi:10.2307/2685390.
[16] Mehta, Cyrus R.; Senchaudhuri, Pralay. Conditional versus unconditional exact tests for comparing two binomials (PDF). 4 September 2003 (accessed 20 November 2009).
[17] Barnard, G. A. A New Test for 2 × 2 Tables. Nature. 1945-08, 156 (3954). ISSN 0028-0836. doi:10.1038/156177a0.
[18] Fisher, R. A. A New Test for 2 × 2 Tables. Nature. 1945-09, 156 (3961). ISSN 0028-0836. doi:10.1038/156388a0.
[19] Choi, Leena; Blume, Jeffrey D.; Dupont, William D. Olivier, Jake (ed.). Elucidating the Foundations of Statistical Inference with 2 x 2 Tables. PLOS ONE. 2015-04-07, 10 (4). ISSN 1932-6203. PMC 4388855. PMID 25849515. doi:10.1371/journal.pone.0121263.
[20] Lydersen, Stian; Fagerland, Morten W.; Laake, Petter. Recommended tests for association in 2×2 tables. Statistics in Medicine. 2009-03-30, 28 (7). doi:10.1002/sim.3531.
[21] Berger R.L. Power comparison of exact unconditional tests for comparing two binomial proportions. Institute of Statistics Mimeo Series No. 2266. 1994: 1–19.
[22] Boschloo, R. D. Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities. Statistica Neerlandica. 1970-03, 24 (1). ISSN 0039-0402. doi:10.1111/j.1467-9574.1970.tb00104.x.
[23] Choi, Leena. ProfileLikelihood: profile likelihood for a parameter in commonly used statistical models. R package version 1.1. 2011.
