Byte


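The byte is a unit of digital information commonly used on devices such as computers, mobile phones, and smartwatches, independent of data type.^[1]^[2] In modern usage one byte consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text, and for this reason it is the smallest addressable unit of memory in many computer architectures. Byte sizes were once hardware-dependent and ranged from 1 to 48 bits, with 6-bit and 9-bit bytes common at first; the 8-bit byte is the standard today. To disambiguate bytes of arbitrary size from the common 8-bit definition, a group of eight bits is often called an octet in some specifications (for example industry standards, computer networking, and telecommunications). The Internet Protocol (RFC 791) refers to an 8-bit byte as an octet.^[3]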

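The International Electrotechnical Commission (IEC) and the Institute of Electrical and Electronics Engineers (IEEE) designate the uppercase letter B as the unit symbol for the byte. For example, MB denotes the megabyte. The bit can be abbreviated as a lowercase b, so Mb denotes the megabit, which keeps it distinct from the byte. Internationally, the unit octet (symbol o) explicitly defines a sequence of eight bits, eliminating the potential ambiguity of the term "byte".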
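As a small illustration of the B versus b distinction, the sketch below converts a quantity given in megabits (Mb) to megabytes (MB). It assumes the modern 8-bit byte and the decimal (SI) meaning of mega for both units; the function name is hypothetical and chosen only for this example.

```python
BITS_PER_BYTE = 8  # assumption: the modern 8-bit byte (octet)


def megabits_to_megabytes(megabits: float) -> float:
    """Convert megabits (Mb) to megabytes (MB).

    Both units use the same decimal prefix (mega = 10**6),
    so only the bit-to-byte factor matters.
    """
    return megabits / BITS_PER_BYTE


print(megabits_to_megabytes(100))  # 12.5
```

A figure of 100 Mb therefore corresponds to 12.5 MB, which is why confusing the two symbols changes a quantity by a factor of eight.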

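The size of the byte has historically depended on the hardware, and no definitive standard has mandated a size. Byte sizes from 1 to 48 bits have been used. Six-bit character codes were a common implementation in early encoding systems, and computers using six-bit and nine-bit bytes were common in the 1960s. These systems often had memory words of 12, 18, 24, 30, 36, 48, or 60 bits, corresponding to 2, 3, 4, 5, 6, 8, or 10 six-bit bytes. Before the term "byte" became common, groupings of bits in an instruction stream were often referred to as syllables^[a] or slabs.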
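The correspondence between those word sizes and six-bit bytes is plain division, which the following sketch checks. The helper name is hypothetical; the word sizes are the ones listed above.

```python
WORD_SIZES_BITS = [12, 18, 24, 30, 36, 48, 60]  # word sizes named in the text
SIX_BIT_BYTE = 6


def bytes_per_word(word_bits: int, byte_bits: int = SIX_BIT_BYTE) -> int:
    """Return how many whole bytes of byte_bits fit exactly in one word."""
    count, remainder = divmod(word_bits, byte_bits)
    if remainder:
        raise ValueError(f"{word_bits}-bit word is not a multiple of {byte_bits} bits")
    return count


counts = [bytes_per_word(w) for w in WORD_SIZES_BITS]
print(counts)  # [2, 3, 4, 5, 6, 8, 10]
```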

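The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two: 2 to the power of 8 is 256, so a byte can hold the binary-encoded values 0 through 255. The international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits, and processor designers commonly optimize for this usage. The popularity of major commercial computing architectures aided the widespread acceptance of the 8-bit byte. Modern architectures typically use 32-bit or 64-bit words, built from four or eight bytes respectively.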
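The arithmetic in this paragraph can be checked with a short sketch. It assumes 8-bit bytes; the sample value and the big-endian byte order are arbitrary choices made for illustration and are not taken from the text.

```python
BITS_PER_BYTE = 8

# An 8-bit byte has 2**8 distinct values, numbered 0 through 255.
distinct_values = 2 ** BITS_PER_BYTE
print(distinct_values)      # 256
print(distinct_values - 1)  # 255, the largest value one byte can hold

# A 32-bit word occupies four bytes, a 64-bit word eight.
word32 = 0x12345678  # arbitrary sample value
as_bytes = word32.to_bytes(32 // BITS_PER_BYTE, byteorder="big")
print(list(as_bytes))  # [18, 52, 86, 120]

word64_bytes = (2 ** 64 - 1).to_bytes(64 // BITS_PER_BYTE, byteorder="big")
print(len(word64_bytes))  # 8
```

Each element of the printed list is one byte of the word, and every element falls in the range 0 to 255.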

History

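The term byte was coined by Werner Buchholz in June 1956, during the early design phase of the IBM Stretch computer, which had bit addressing and variable field length (VFL) instructions with the byte size encoded in the instruction. It is a deliberate respelling of bite, chosen to avoid accidental mutation to bit.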

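Another origin of byte, denoting a group of bits smaller than a computer's word size and in particular a group of four bits, is on record from Louis G. Dooley. He claimed to have coined the term in 1956 or 1957 while working with Jules Schwartz and Dick Beeler at MIT Lincoln Laboratory on an air defense system called SAGE, which was developed jointly by the RAND Corporation, MIT, and IBM. Schwartz's language JOVIAL later did use the term, but its author vaguely recalled that it derived from the AN/FSQ-31.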

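Early computers used a variety of four-bit binary-coded decimal (BCD) representations and the six-bit codes for printable graphic patterns common in the U.S. Army (FIELDATA) and Navy. These representations included alphanumeric characters and special graphical symbols. The sets were expanded in 1963 to a seven-bit encoding called the American Standard Code for Information Interchange (ASCII), a Federal Information Processing Standard that replaced the incompatible teleprinter codes used by different branches of the U.S. government and by universities during the 1960s. ASCII included the distinction between uppercase and lowercase letters and a set of control characters to facilitate the transmission of written language, printing device functions such as page advance and line feed, and the physical or logical control of data flow over the transmission media. In the early 1960s, while actively taking part in ASCII standardization, IBM introduced the eight-bit EBCDIC in its System/360 product line, an expansion of the six-bit binary-coded decimal (BCDIC) representation used in its earlier card punches. The prominence of the System/360 led to the widespread adoption of the eight-bit storage size, although the EBCDIC and ASCII encoding schemes differ in detail.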

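In the early 1960s, AT&T introduced digital telephony on long-distance trunk lines. These lines used the eight-bit μ-law encoding. This large investment promised to reduce transmission costs for eight-bit data.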

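In Volume 1 of The Art of Computer Programming (first published in 1968), Donald Knuth uses byte in his hypothetical MIX computer to denote a unit which "contains an unspecified amount of information ... capable of holding at least 64 distinct values ... at most 100 distinct values. On a binary computer a byte must therefore be composed of six bits". He notes that "Since 1975 or so, the word byte has come to mean a sequence of precisely eight binary digits ... When we speak of bytes in connection with MIX we shall confine ourselves to the former sense of the word, harking back to the days when bytes were not yet standardized."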
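The step from "at least 64 and at most 100 distinct values" to "six bits" can be verified by enumeration. The sketch below is only an illustration of that arithmetic: the function name and the search limit are hypothetical, and the decimal case is shown for comparison rather than taken from the quotation.

```python
def possible_widths(base: int, min_values: int = 64, max_values: int = 100,
                    limit: int = 16) -> list:
    """List digit counts n for which base**n lies within [min_values, max_values]."""
    return [n for n in range(1, limit + 1)
            if min_values <= base ** n <= max_values]


# Binary machine: only 2**6 = 64 fits, since 2**5 = 32 is too few
# and 2**7 = 128 exceeds 100.
print(possible_widths(2))   # [6]

# Decimal machine, for comparison: only 10**2 = 100 fits.
print(possible_widths(10))  # [2]
```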

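The development of eight-bit microprocessors in the 1970s popularized this storage size. Microprocessors such as the Intel 8080, the direct predecessor of the 8086, could also perform a small number of operations on the pair of four-bit halves of a byte, such as the decimal-add-adjust (DAA) instruction. A four-bit quantity is often called a nibble, also spelled nybble, and is conveniently represented by a single hexadecimal digit.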
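A minimal sketch of the nibble idea follows: it splits a byte into its two four-bit halves and reads a byte as two packed decimal digits. The function names are hypothetical, and the code does not emulate the 8080's DAA instruction; it only shows the data layout that such an instruction works with.

```python
def split_nibbles(byte: int) -> tuple:
    """Return (high, low) four-bit halves of an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    return byte >> 4, byte & 0x0F


def packed_bcd_to_int(byte: int) -> int:
    """Interpret a byte as two decimal digits, one per nibble."""
    high, low = split_nibbles(byte)
    if high > 9 or low > 9:
        raise ValueError("nibble is not a decimal digit")
    return high * 10 + low


high, low = split_nibbles(0xA7)
print(high, low)            # 10 7
print(f"{high:X}{low:X}")   # A7, one hexadecimal digit per nibble
print(packed_bcd_to_int(0x42))  # 42
```

Because each nibble spans exactly 16 values, the two hexadecimal digits of a byte map one-to-one onto its two nibbles.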

The term octet unambiguously specifies a size of eight bits. It is used extensively in communication protocol definitions.

Notes

^ The term "syllable" was used for bytes containing instructions or constituents of instructions, not for data bytes.

References

^ Blaauw, Gerrit Anne; Brooks, Jr., Frederick Phillips; Buchholz, Werner, 4: Natural Data Units, Buchholz, Werner (ed.), Planning a Computer System – Project Stretch (PDF), McGraw-Hill Book Company, Inc. / The Maple Press Company, York, PA.: 39–40, 1962 [2017-04-03], LCCN 61-10466, (archived (PDF) from the original on 2017-04-03), […] Terms used here to describe the structure imposed by the machine design, in addition to bit, are listed below.
Byte denotes a group of bits used to encode a character, or the number of bits transmitted in parallel to and from input-output units. A term other than character is used here because a given character may be represented in different applications by more than one code, and different codes may use different numbers of bits (i.e., different byte sizes). In input-output transmission the grouping of bits may be completely arbitrary and have no relation to actual characters. (The term is coined from bite, but respelled to avoid accidental mutation to bit.)
A word consists of the number of data bits transmitted in parallel from or to memory in one memory cycle. Word size is thus defined as a structural property of the memory. (The term catena was coined for this purpose by the designers of the Bull computer.)
Block refers to the number of words transmitted to or from an input-output unit in response to a single input-output instruction. Block size is a structural property of an input-output unit; it may have been fixed by the design or left to be varied by the program. […]
^ Bemer, Robert William, A proposal for a generalized card code of 256 characters, Communications of the ACM, 1959, 2 (9): 19–23, doi:10.1145/368424.368435
^ Postel, J. Internet Protocol DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION. September 1981: p. 43 [28 August 2020]. RFC 791 (in English). octet An eight bit byte.

Further reading

Tafel, Hans Jörg. Written at RWTH, Aachen, Germany. Einführung in die digitale Datenverarbeitung [Introduction to digital information processing]. Munich, Germany: Carl Hanser Verlag. 1971: 300. ISBN 3-446-10569-7 (in German). Byte = zusammengehörige Folge von i.a. neun Bits; davon sind acht Datenbits, das neunte ein Prüfbit (NB. Defines a byte as a group of typically 9 bits; 8 data bits plus 1 parity bit.)
Programming with the PDP-10 Instruction Set (PDF). PDP-10 System Reference Manual 1. Digital Equipment Corporation (DEC). August 1969 [2017-04-05]. (Archived (PDF) from the original on 2017-04-05).
Computer History Museum – Exhibits – Internet History – 1964: Internet History 1962 to 1992. Computer History Museum. 2017 [2015] [2017-04-03]. (Archived from the original on 2017-04-03).
Jaffer, Aubrey. Metric-Interchange-Format. 2011 [2008] [2017-04-03]. (Archived from the original on 2017-04-03).
Kozierok, Charles M. The TCP/IP Guide – Binary Information and Representation: Bits, Bytes, Nibbles, Octets and Characters – Byte versus Octet. 3.0. 2005-09-20 [2001] [2017-04-03]. (Archived from the original on 2017-04-03).

See also

Octet

External links

ГОСТ 8.417-2002 | Страница 19 (in Russian)

