Hi all,
In this post I will try to discuss some inner details of OpenSmalltalk-VM immediate floats. Immediate floats are present only in 64 bits hence I won’t talk about 32 bits VM in the whole blog post. In addition, OpenSmalltalk-VM supports only double precision IEEE floating pointer, hence I won’t discuss single precision IEEE floating pointer.
Immediate objects
OpenSmalltalk-VM uses an immediate object scheme to represent object oriented pointers (oop) in memory. Basically, due to 64 bits alignment, the last 3 bits of all pointers to objects are 000. This is abused to encode in the oop itself specific objects, in our context, SmallIntegers, Characters and ImmediateFloats. This optimization allows to save memory and to improve performance by avoiding boxing allocation for common arithmetic operations. The last 3 bits of an oop are called a tag. The Immediate Float tag is 100 (4 in decimal). Objects encoded directly in the oop are in our terminology called immediate objects.

Immediate floats
OpenSmalltalk-VM and its clients use the double precision IEEE format to represent floating pointers, supported by most modern hardware.

The key idea to the immediate float design is to use an immediate representation of double precision floats to avoid boxing and save memory, while still being 100% compatible with the IEEE double precision format (Customers requirement).
Therefore, in 64 bits, OpenSmalltalk-VM use two implementations for floats. The most common floats are represented with immediate floats, where 3 bits of the exponents are abused to encode the tags. The rest of the floats are represented as boxed floats.

By design, immediate floats occupy just less than the middle 1/8th of the double range. They overlap the normal single-precision floats which also have 8 bit exponents, but exclude the single-precision denormals (exponent-127) and the single-precision NaNs (exponent +127). +/- zero is just a pair of values with both exponent and mantissa 0.
So the non-zero immediate doubles range from
+/- 0x3800,0000,0000,0001 / 5.8774717541114d-39
to +/- 0x47ff,ffff,ffff,ffff / 6.8056473384188d+38
Encoding and decoding
The encoded tagged form has the sign bit moved to the least significant bit, which allows for faster encode/decode because offsetting the exponent can’t overflow into the sign bit and because testing for +/- 0 is an unsigned compare for <= 0xf.
So given the tag is 4, the tagged non-zero bit patterns are
0x0000,0000,0000,001[c(8+4)]
to 0xffff,ffff,ffff,fff[c(8+4)]
and +/- 0d is 0x0000,0000,0000,000[c(8+4)]
Decoding of non-zero values in machine code is performed as follow:

Encoding of non-zero values in machine code is performed as follow:

Reading floats in general is fairly easy, the VM checks the class index, if the class index of immediate float is present, then the float is decoded from the oop, if the boxed float class index is present, the float is read from the boxed object.
Each primitive operation (arithmetic, comparison, etc.) has now to be implemented twice, once in both classes, where the first operand is expected to be an instance of the class where it is installed. In Smalltalk float primitive operations succeed if the second operand is one of the 2 float classes or a SmallInteger. It fails for large integers and arbitrary objects, in which case the VM takes a slow path to perform correctly the operation.
At the end of arithmetic operations, the resulting float has to be converted back from the unboxed format to either an immediate float or a boxed float. To do so, the VM checks the exponent of the float against the smallFloatExponentOffset, 896. 896 is 1023 – 127, where 1023 is the mid-point of the 11-bit double precision exponent range, and 127 is the mid-point of the 8-bit SmallDouble exponent range. If the exponent is in range, it can be converted to an immediate float. If not, one needs to check if the float is +/- 0, in which case it can still be converted to an immediate float, else it has to be converted to a boxed float. The code looks like that in Slang:
^exponent > self smallFloatExponentOffset
ifTrue: [exponent <= (255 + self smallFloatExponentOffset)]
ifFalse:
[(rawFloat bitAnd: (1 << self smallFloatMantissaBits - 1)) = 0
ifTrue: [exponent = 0]
ifFalse: [exponent = self smallFloatExponentOffset]]
x86_64 encoding/decoding
To conclude the post, here are the instructions generated in x86_64 to encode immediate floats. I put the instruction so that you can see how to encode efficiently using the theoretical design from the figures above and in addition quick checks for +/- 0.
000020da: rolq $1, %r9 : 49 D1 C1
000020dd: cmpq $0x1, %r9 : 49 83 F9 01
000020e1: jbe .+0xD (0x20f0=+@F0) : 76 0D
000020e3: movq $0x7000000000000000, %r8 : 4D B8 00 00 00 00 00 00 00 70
000020ed: subq %r8, %r9 : 4D 2B C8
000020f0: shlq $0x03, %r9 : 49 C1 E1 03
000020f4: addq $0x4, %r9 : 49 83 C1 04
Decoding is easier:
00002047: movq %rdx, %rax : 48 89 D0
0000204a: shrq $0x03, %rax : 48 C1 E8 03
0000204e: cmpq $0x1, %rax : 48 83 F8 01
00002052: jle .+0xD (0x2061=+@61) : 7E 0D
00002054: movq $0x7000000000000000, %r8 : 4D B8 00 00 00 00 00 00 00 70
0000205e: addq %r8, %rax : 49 03 C0
00002061: rorq $1, %rax : 48 D1 C8
00002064: movq %rax, %xmm0 : 66 48 0F 6E C0
Let me know if you have any question or you want me to expand this post with something else.
Note: Part of the blog post was extracted from the SpurMemoryManager class comment on immediate float, I thank Eliot Miranda and other OpenSmalltalk-VM contributors for writing it.
I think that it’s possible to do encoding and decoding slightly more simply than you’ve presented here. Possibly the “simpler” way is not faster, but it is fewer instructions, and does not need 64-bit immediate values so the machine code is more compact.
The exponent field of an IEEE double is eleven bits. That of the immediate float is eight bits. If I understand correctly, the doubles that can be represented as immediate floats are either zero, or have an exponent whose high-order bits are 011. And there is one float with those exponent bits that cannot be immediate, because it would be interpreted as zero.
When encoding, you have separate branches for the zero case and the non-zero case while detecting whether it *can* be an immediate, so it would be OK to use separate code for the actual encoding. And those cases are only two instructions each, a rotate and an add.
Encode when known in-range and non-zero:
rolq $4, %r9
addq $1, %r9 — we know that the low-order bits are 011, so correct that to 100.
Encode when known zero:
rolq $4, %r9
addq $4, %r9 — we know that the low-order bits are 000, so correct that to 100.
Decode could be similarly done with two instructions, but comes out to five instructions to handle both zero and non-zero cases.
Decode, zero or non-zero:
subq $1, %rax — we know that the bottom three bits were 100, correct them for the non-zero case to 011
cmpq $0xB, %rax
jg NONZERO
subq $3, %rax — if it is +-0, high-order bits of the exponent must become 000
NONZERO:
rorq $4, %rax
Or am I missing something?
Regards,
-Martin
…and I *was* missing something. I had the offset wrong. What I said would work if the offset were half what it is, but that would not center the immediate range around 1.0, which is what was designed.
Thanks for the opportunity to re-visit a design from quite a while ago!
-Martin
…but it looks like this approach can still be applied, just not quite as simply. I was thinking that the in-range values for 11-bit exponent had top four bits of 0110 and 0111, but I think it’s actually 0111 and 1000. And if I’m right about that:
Encode zero is still:
rolq $4, %r9
addq $4, %r9
Encode non-zero becomes (thanks to 100 as the desired tag value!):
rolq $5, %r9
addq $1, %r9
rorq $1, %r9
Decode:
rolq $1, %rax
subq $1, %rax
cmpq $0x18, %rax
jg NONZERO
andq $16r10, %rax
NONZERO:
rorq $5, %rax
Once again, I don’t know whether this is any faster, but it does make the code shorter by using 8-bit immediates instead of requiring some 64-bit immediates, and eliminates the use of the scratch register that the 64-bit immediates used.
If I’ve botched this one *again* please let me know.
Thanks,
-Martin