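Type punning
Type punning is a computer science term for any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.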
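In C and C++, constructs such as pointer type conversion and union, as well as the reinterpret_cast operator added in C++, are used to implement type punning.
Pascal uses records with variants to treat a particular data type in more than one way.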
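Sockets example
The Berkeley sockets interface uses type punning to handle IP addresses. The bind function, which binds an opened but uninitialized socket to an IP address, is declared as follows: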
int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
The bind function is typically called as follows:
struct sockaddr_in sa = {0};
int sockfd = ...;
sa.sin_family = AF_INET;
sa.sin_port = htons(port);
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);
This works because struct sockaddr_in and struct sockaddr share the same memory layout, so a pointer to one type can be converted to a pointer to the other.
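A minimal self-contained sketch of the same pattern, assuming a POSIX system on which a TCP socket may be bound to the loopback address. It binds to 127.0.0.1 with port 0 (letting the kernel choose a port), then uses getsockname, which also takes a struct sockaddr *, to read the chosen port back through the same kind of cast. The helper name bind_loopback_ephemeral is hypothetical.

```c
#include <arpa/inet.h>   /* htonl, htons, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <sys/socket.h>  /* socket, bind, getsockname, socklen_t */
#include <unistd.h>      /* close */

/* Hypothetical helper: binds a TCP socket to 127.0.0.1 on a kernel-chosen
   port and returns that port in host byte order, or -1 on error. */
int bind_loopback_ephemeral(void)
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    struct sockaddr_in sa = {0};
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);                       /* 0: let the kernel pick a port */
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    /* The IPv4-specific struct is passed through the generic pointer type. */
    if (bind(sockfd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(sockfd);
        return -1;
    }

    /* getsockname fills an IPv4 struct through the same generic pointer type. */
    struct sockaddr_in bound = {0};
    socklen_t len = sizeof bound;
    if (getsockname(sockfd, (struct sockaddr *)&bound, &len) < 0) {
        close(sockfd);
        return -1;
    }

    close(sockfd);
    return ntohs(bound.sin_port);
}
```

The casts are the only point where the type system is bypassed; every other line uses the IPv4-specific type directly.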
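Floating-point example
Type punning is not limited to structures. Consider a function that tests whether a floating-point number is negative: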
bool is_negative(float x) {
    return x < 0.0;
}
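If floating-point comparisons are assumed to be expensive, and float is assumed to follow the IEEE 754 standard, type punning can be used to extract the sign bit of the number and test it with integer operations only: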
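bool is_negative(float x) {
    /* Assumes IEEE 754 binary32 float and a 32-bit unsigned int.
       Reading a float through an unsigned int * violates the strict aliasing rule. */
    unsigned int *ui = (unsigned int *)&x;
    return *ui & 0x80000000;
}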
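Note that the two implementations differ in some special cases: for example, when x is negative zero, the first implementation returns false while the second returns true.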
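The negative-zero difference can be checked directly. The sketch below assumes IEEE 754 binary32 float; instead of the pointer cast, it copies the bits into a uint32_t with memcpy, a different technique that is well defined in both C and C++. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Comparison-based test: -0.0f compares equal to 0.0f, so this returns false for it. */
bool is_negative_cmp(float x)
{
    return x < 0.0f;
}

/* Sign-bit test: assumes IEEE 754 binary32, where bit 31 is the sign bit.
   memcpy copies the object representation without breaking aliasing rules. */
bool is_negative_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x80000000u) != 0;
}
```

For ordinary nonzero values both functions agree; only inputs whose sign bit is set without the value comparing below zero (negative zero, and NaNs with the sign bit set) separate them.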
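This kind of implementation may be appropriate for real-time computing in situations where the comparison cannot otherwise be optimized. In that case, all assumptions should be documented in comments, and static assertions should be added to verify that the portability expectations are met. Quake III Arena used this approach in its implementation of the fast inverse square root.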
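As an illustration of recording the assumptions as static assertions, here is a sketch of the fast inverse square root in the style attributed to Quake III Arena. It is not the game's original source: it assumes IEEE 754 binary32 float, uses C11 _Static_assert, and replaces the original pointer cast with memcpy so that it does not break strict aliasing. The constant 0x5f3759df is the widely published one; q_rsqrt is a hypothetical name.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assumptions, checked at compile time as far as the language allows: */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");
_Static_assert(FLT_MANT_DIG == 24, "float must have a 24-bit significand (IEEE 754 binary32)");

/* Approximates 1/sqrt(x) for positive, finite x. */
float q_rsqrt(float x)
{
    uint32_t i;
    float y = x;

    memcpy(&i, &y, sizeof i);      /* reinterpret the float's bits as an integer */
    i = 0x5f3759dfu - (i >> 1);    /* initial guess computed on the bit pattern */
    memcpy(&y, &i, sizeof y);      /* reinterpret the integer bits as a float again */

    y = y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson iteration */
    return y;
}
```

After the single Newton step the result is within a fraction of a percent of 1/sqrt(x); for example, q_rsqrt(4.0f) is close to 0.5.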
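Use of union
In C, a union can be used instead, which avoids violating the strict aliasing rule of C99.[1] C++ does not sanction this technique, since reading a union member other than the one most recently written is undefined behavior there. The C version looks like this: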
bool is_negative(float x) {
    union {
        unsigned int ui;
        float d;
    } my_union = { .d = x };
    return my_union.ui & 0x80000000;
}
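A variant of the union approach that makes its assumptions explicit: it uses uint32_t instead of unsigned int, so the width no longer depends on the platform, and checks the size of float with a static assertion. It is meant for C99 and later, not C++. float_bits is a hypothetical name.

```c
#include <stdint.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Returns the IEEE 754 bit pattern of x by writing one union member
   and reading the other (permitted in C, not in C++). */
uint32_t float_bits(float x)
{
    union {
        uint32_t u;
        float f;
    } pun = { .f = x };
    return pun.u;
}
```

With this helper, the sign test becomes (float_bits(x) & 0x80000000u) != 0.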
For other examples of type punning, see stride of an array.
Pascal
A variant record permits treating a data type as multiple kinds of data depending on which variant is being referenced. In the following example, integer is presumed to be 16 bits, longint and real 32 bits, and character 8 bits:
type variant_record = record
    case rec_type : longint of
        1: ( I : array [1..2] of integer );
        2: ( L : longint );
        3: ( R : real );
        4: ( C : array [1..4] of character );
    end;
Var V: Variant_record;
    K: Integer;
    LA: Longint;
    RA: Real;
    Ch: character;
...
V.I := 1;
Ch := V.C[1]; (* This would extract the first binary byte of V.I *)
V.R := 8.3;
LA := V.L; (* This would store a real into an integer *)
In Pascal, converting a real to an integer (with the standard function trunc) yields the truncated value. The variant-record method instead reinterprets the binary representation of the floating-point number as a 32-bit long integer, which is not the truncated value and may differ from one system to another.
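For comparison, a rough C analogue of the variant record above. It is a sketch, not a model of any particular Pascal compiler's layout: it assumes IEEE 754 binary32 float and uses fixed-width C types in place of Pascal's integer, longint, real and character. It contrasts value conversion with bit reinterpretation for the value 8.3; all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical C counterpart of variant_record: all members share storage. */
typedef union {
    int16_t I[2];   /* integer: 16 bits */
    int32_t L;      /* longint: 32 bits */
    float   R;      /* real: 32 bits */
    char    C[4];   /* character: 8 bits */
} variant_record;

_Static_assert(sizeof(float) == sizeof(int32_t), "real must be 32 bits wide");

/* Value conversion: truncates toward zero, like Pascal's trunc. */
int32_t real_to_longint_value(float r)
{
    return (int32_t)r;
}

/* Bit reinterpretation: the analogue of V.R := r; LA := V.L */
int32_t real_to_longint_bits(float r)
{
    variant_record v;
    v.R = r;
    return v.L;
}
```

For 8.3f the first function gives 8, while the second gives the IEEE 754 bit pattern 0x4104CCCD (1090833613), which illustrates why the two operations must not be confused.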
Such constructs can produce strange conversions, but in some cases they have legitimate uses, such as determining the location of particular pieces of data. In the following example, a pointer and a longint are both presumed to be 32 bits:
Type PA = ^Arec;
    Arec = record
        case rt : longint of
            1: (P: PA);
            2: (L: Longint);
    end;
Var PP: PA;
    K: Longint;
...
New(PP);
PP^.P := PP;
Writeln('Variable PP is located at address ', hex(PP^.L));
Here, New is the standard Pascal routine for allocating memory for a pointer, and hex is presumably a routine that returns the hexadecimal string describing the value of an integer. This allows the address held in a pointer to be displayed, which is not normally permitted: pointers cannot be read or written, only assigned. Assigning a value to the integer variant of a pointer would allow examining or writing to any location in system memory:
PP^.L := 0;
PP := PP^.P; (*PP now points to address 0 *)
K := PP^.L; (*K contains the value of word 0 *)
Writeln('Word 0 of this machine contains ',K);
This construct may cause a program check or protection violation if address 0 is protected against reading on the machine the program is running upon or the operating system it is running under.
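The address-revealing part of this trick can be sketched in C with a union of a pointer and an integer. The sketch assumes that uintptr_t (optional in <stdint.h>, but present on common platforms) has the same size as a pointer, and it only reads the address; it does not dereference arbitrary addresses. The usual C way to get the integer is the cast (uintptr_t)p, which on common flat-memory platforms yields the same value. address_via_union is a hypothetical name.

```c
#include <stdint.h>

_Static_assert(sizeof(void *) == sizeof(uintptr_t),
               "pointer and uintptr_t must have the same size");

/* Counterpart of the Arec record: P and L occupy the same storage. */
uintptr_t address_via_union(void *p)
{
    union {
        void *P;
        uintptr_t L;
    } rec;
    rec.P = p;       /* like PP^.P := PP */
    return rec.L;    /* like reading PP^.L */
}
```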
C#
In C# (and other .NET languages), type punning is harder to achieve because of the type system, but it can still be done using pointers or struct unions.
Pointers
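C# only allows pointers to so-called unmanaged types: the built-in numeric types, char and bool, enums, pointer types, and structs composed only of such types (string, arrays and other reference types are excluded). Pointers are only allowed in unsafe contexts, such as code blocks marked unsafe.
float pi = 3.14159f;
uint piAsRawData;
unsafe { piAsRawData = *(uint*)&pi; } // requires compiling with unsafe code enabled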
Struct unions
Struct unions do not require unsafe code, but they do require the definition of a new type.
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
struct FloatAndUIntUnion
{
    [FieldOffset(0)]
    public float DataAsFloat;

    [FieldOffset(0)]
    public uint DataAsUInt;
}
// ...
var union = new FloatAndUIntUnion();
union.DataAsFloat = 3.14159f;
uint piAsRawData = union.DataAsUInt;
Raw CIL code
Raw CIL can be used instead of C#, because it lacks most of C#'s type restrictions. For example, this allows combining two enum values of a generic type, which C# rejects:
TEnum a = ...;
TEnum b = ...;
TEnum combined = a | b; // illegal
This can be circumvented by the following CIL code:
.method public static hidebysig
    !!TEnum CombineEnums<valuetype .ctor ([mscorlib]System.ValueType) TEnum>(
        !!TEnum a,
        !!TEnum b
    ) cil managed
{
    .maxstack 2
    ldarg.0
    ldarg.1
    or // this will not cause an overflow, because a and b have the same type, and therefore the same size.
    ret
}
The cpblk CIL opcode allows for some other tricks, such as converting a struct to a byte array:
.method public static hidebysig
    uint8[] ToByteArray<valuetype .ctor ([mscorlib]System.ValueType) T>(
        !!T& v // 'ref T' in C#
    ) cil managed
{
    .locals init (
        [0] uint8[]
    )
    .maxstack 3
    // create a new byte array with length sizeof(T) and store it in local 0
    sizeof !!T
    newarr uint8
    dup // keep a copy on the stack for later (1)
    stloc.0
    ldc.i4.0
    ldelema uint8
    // memcpy(local 0, &v, sizeof(T));
    // <the address of element 0 of the copied array is now on the stack, see (1)>
    ldarg.0 // this is the *address* of 'v', because its type is '!!T&'
    sizeof !!T
    cpblk
    ldloc.0
    ret
}
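The C counterpart of this cpblk trick is copying an object's representation into an array of unsigned char with memcpy, which C permits for any object type. The struct and function names below are hypothetical, and the byte order of the result depends on the platform.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example struct. */
struct sample {
    uint32_t id;
    float value;
};

/* Copies the raw bytes of *s into out, which must hold sizeof *s bytes;
   this mirrors memcpy(local 0, &v, sizeof(T)) in the CIL above. */
void sample_to_bytes(const struct sample *s, unsigned char out[sizeof(struct sample)])
{
    memcpy(out, s, sizeof *s);
}
```

Copying the bytes back with memcpy into another struct sample restores the original values on the same platform.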
References
[1] ISO/IEC 9899:1999, §6.5/7.
[2] GCC: Non-Bugs. Retrieved 2017-11-20. Archived from the original on 2021-03-25.
External links
- Section of the GCC manual on -fstrict-aliasing, which defeats some type punning
- Defect Report 257 to the C99 standard, incidentally defining "type punning" in terms of union, and discussing the issues surrounding the implementation-defined behavior of the union example above
- Defect Report 283 on the use of unions for type punning