You are a Smalltalker, a Pharoer or a Squeaker. Word has it that Eliot Miranda has implemented a new memory manager (Spur) for the Cog VM. You are wondering how it will change your workflow with your favorite Smalltalk dialect. This one-page article was written for you.
Spur features
1) Fast become: As you know, Smalltalk features become:, which allows you to swap two objects in memory. This is used, for example, when you add an instance variable to a class: all the class's instances are migrated to support the extra instance variable field by becoming a new object with the same values plus an extra slot. This operation used to be very slow, making large images hard to work with; it will now be much faster.
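Pre-Spur, become: had to scan the entire heap to update every reference; Spur instead uses forwarding objects, which is what makes it cheap. A minimal sketch of what become: does:
| a b holder |
a := OrderedCollection with: 1.
b := OrderedCollection with: 2.
holder := Array with: a.
a become: b. "two-way swap: every reference to a now points to the old b, and vice versa"
holder first first "=> 2, since holder now refers to the collection that held 2"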
2) Ephemerons: You can now instantiate efficient ephemerons. An ephemeron is an object that refers strongly to its contents as long as the ephemeron's key is not garbage collected, and weakly from then on.
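A hypothetical sketch of how one might use this for caches or finalization (the Ephemeron class name and its key:value: / #mourn protocol are assumptions here; the concrete API differs across dialects and versions):
| eph |
"Ephemeron, key:value: and mourn are assumed names, not a confirmed API."
eph := Ephemeron key: anObject value: expensiveDataAboutAnObject.
"While anObject is reachable from elsewhere, the value is held strongly.
Once anObject is only reachable via the ephemeron, the GC notifies the
ephemeron (e.g. by sending it #mourn) so it can run finalization, and the
reference is treated weakly from then on."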
3) Pinned objects: You can now mark an object as having a fixed position in memory: the garbage collector will never move it. This simplifies some FFI (foreign function interface) implementations.
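For example, when handing a buffer to C code (a minimal sketch; #pinInMemory and #unpinInMemory are the Pharo selectors, and the names may differ in Squeak):
| buffer |
buffer := ByteArray new: 4096.
buffer pinInMemory. "the GC can still collect buffer, but will never move it, so its address stays valid for C"
"... pass the buffer's address to an FFI call here ..."
buffer unpinInMemory. "let the GC move and compact around buffer again"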
4) Variable-sized and segmented memory: The memory the virtual machine allocates from the operating system will now grow and shrink with the image size. Have you ever encountered an 'out of memory' error while reaching a 1 GB image on your 8 GB RAM computer? This will not happen any more. In addition, the heap's memory will be divided into several segments, making the allocation of big images simpler, especially on Windows.
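Concretely, a huge allocation such as the following now simply gets its own memory-mapped segment from the operating system instead of exhausting a fixed-size heap (see Eliot's description of segments in the comments below):
| big |
big := ByteArray new: 100 * 1024 * 1024. "a 100 MB object, allocated in its own old space segment"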
5) Object format 64-bit compatible: As you know, the main Pharo and Squeak releases are not yet 64-bit compatible. One main obstacle was the object format, and that problem is now solved. The 64-bit VM is not ready, but a huge step in that direction has been taken. This means that in the very near future you should be able to use a 64-bit Stack VM.
6) Incremental and efficient garbage collector: Sometimes, in big images (over 300 MB), the image becomes unresponsive for several seconds because the VM is performing a full garbage collection. Most of the time, garbage collection will now be split into multiple steps, each fast to execute and run on a regular basis. Therefore, you will no longer suffer these full-garbage-collection pauses. In addition, the garbage collector is more efficient, meaning that big images (say, up to 1 GB) will be much faster and more responsive.
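You can measure such a pause yourself by timing an explicit full collection (timeToRun answers the elapsed milliseconds in Squeak and classic Pharo):
[Smalltalk garbageCollect] timeToRun "milliseconds the image was paused by one full GC"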
7) Performance improvements: classic benchmarks run around 35% faster with Spur as Cog's memory manager. This is due to (from most to least important):
- the new GC has less overhead.
- the new object format speeds up some VM internal caches.
- characters are now immediate objects, which speeds up String accessing.
- the new object format has a larger identity hash, which speeds up big hashed collections such as big Sets and Dictionaries (see the micro-benchmark sketch after this list).
- become: is faster.
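To illustrate the larger-hash point: most objects hash by the identity hash stored in their header, which pre-Spur was only 12 bits wide (4096 possible values), so big identity-based collections degenerated into long collision chains; Spur widens the field to 22 bits. A micro-benchmark sketch (absolute numbers depend on your machine and image):
| s |
s := IdentitySet new.
[100000 timesRepeat: [s add: Object new]] timeToRun "much faster under Spur, thanks to far fewer hash collisions"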
Expected deadlines
When will you be able to use Spur in your favorite Smalltalk dialect?
- Pharo: Spur is currently planned for Pharo 4 (alpha version in April, beta version in December, release around April 2015). The first alpha version of Pharo 4 with Spur should be ready this summer.
- Squeak: Spur is planned for Squeak 5.
- Newspeak: the deployment of Spur on Newspeak is currently a work in progress.
Thanks Eliot for this new memory manager!
Nicolas Cellier said:
Very good summary. Maybe you could also add that Character will now be immediate, which should also speed up String-oriented benchmarks/applications.
clementbera said:
Right! I added it under performance improvements.
Ben Coman said:
In what way would the speed-up for Characters improve Unicode performance, and/or would it be worthwhile having an immediate representation of UnicodeCharacters? I found Roassal to go an order of magnitude slower when displaying labels with Unicode (but there may be many reasons for that).
Philippe Back (@philippeback) said:
Excellent news. What would be a usable 64-bit memory in terms of size?
Eliot Miranda said:
Hi Ben,
here are the current Squeak/Pharo definitions for wide string access:
WideString>>at: index
    "Answer the Character stored in the field of the receiver indexed by the argument."
    ^ Character value: (self wordAt: index)

Character class>>value: anInteger
    "Answer the Character whose value is anInteger."
    anInteger > 255 ifTrue: [^self basicNew setValue: anInteger].
    ^ CharacterTable at: anInteger + 1
Currently characters are non-immediate and only the first 256 are unique, so every access to a character in a wide string requires allocating a Character if its code is > 255. In Spur, with immediate characters, there is no instantiation. Further, because characters are simpler, the JIT can produce machine code implementing the primitives behind WideString>>at: and WideString>>at:put:. So access is very much faster.
Here are some numbers measured on my 2.2 GHz Core i7 MacBook Pro running 10.6.8.
In current Cog
| ws |
ws := 'Hello world!' asWideString.
[1 to: 100 * 1000 * 1000 do: [:i| ws at: 1]] timeToRun "=> 2693"
and
| ws |
ws := 'Hello world!' asWideString.
ws at: 1 put: (Character value: 256).
[1 to: 100 * 1000 * 1000 do: [:i| ws at: 1]] timeToRun "=> 6100"
If you redefine WideString>>at: in Spur to read
WideString>>at: index
    "Primitive. Answer the Character stored in the field of the receiver
    indexed by the argument. Fail if the index argument is not an Integer or
    is out of bounds. Essential. See Object documentation whatIsAPrimitive."
    <primitive: 63>
    ^ Character value: (self wordAt: index)
then
| ws |
ws := 'Hello world!' asWideString.
[1 to: 100 * 1000 * 1000 do: [:i| ws at: 1]] timeToRun "=> 711"
and
| ws |
ws := 'Hello world!' asWideString.
ws at: 1 put: (Character value: 256).
[1 to: 100 * 1000 * 1000 do: [:i| ws at: 1]] timeToRun "=> 761"
These numbers are very noisy because I've got two images, Safari, Chrome and other things running. But you can see that in the case where the character's code is > 8 bits, the Spur code is > 8 times faster.
Sebastian Sastre said:
Amazing work!
Thanks and hats off
Carl Gundel said:
Nice work, as always.
louis michel said:
Morning, I want to know all about Pharo's virtual machine: how it is implemented, and what is specific about it compared with the Java virtual machine.
Clement Bera said:
Hi Louis,
Ask this question on the VM mailing list (subscribe here: http://lists.squeakfoundation.org/mailman/listinfo/vm-dev); many people there can answer it.
The Pharo VM is a fork of the Squeak VM, which is described in the Back to the Future paper by Dan Ingalls et al. (you may google it). The paper explains how the VM is implemented in a restricted Smalltalk that is translated to C and compiled to machine code.
There are many differences with the Java VM; I can't list them all. Pharo supports features not present in the JVM, such as non-local returns, a different reification of the execution stack, the become: operations to exchange references, etc.
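For instance, a non-local return lets a block exit its enclosing method, something a Java lambda cannot express directly (a minimal sketch):
firstEvenIn: aCollection
    "Answer the first even element, or nil. The ^ inside the block
    returns from this whole method, not just from the block."
    aCollection do: [:each | each even ifTrue: [^each]].
    ^nil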
louis michel said:
OK, thanks. Can you give me some books to learn more about the Pharo VM and the Java VM, so that I will be able to explain and give the differences?
Kenly said:
Curious if it uses the buddy-system memory technique, or similar? Not sure if running Linux takes care of this. It seems Squeak should run much faster on Linux.
Eliot Miranda said:
Hi Kenly,
no, Spur does not use any kind of buddy algorithm. The primary allocator/collector is a conventional generation scavenger. Old space is managed by a conventional mark-sweep-compact collector with free lists/free trees. Both collectors are extended with support for weak references and ephemerons.
The old space free list is a vector with one entry per bit of the word size. Indexes 2 to word size minus one (2 to 31/63) are the heads of free lists of all chunks of that size in words. Free chunks of size 1 are invalid in Spur: all objects must contain at least two words, a header and space for an indirection pointer, so the index 1 entry is always empty. The zeroth element is the root of the tree of free chunks of word size or larger, organized as an ordered binary tree sorted on size.
Each node in the tree is the head of a list of all free chunks of the same size as the node. The left subtree points to all nodes smaller than it, the right to nodes larger than it. Allocation is exact fit or best fit, preferentially from the element next to the node whose size is the best fit. We do not spend significant effort to keep the tree balanced. If a node has to be split we perform at most one rotation to keep some semblance of balance in the tree. We do not go as far as AVL trees.
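A pseudo-code sketch, in Smalltalk, of the allocation policy just described (all names here are hypothetical, modelling the idea rather than quoting the actual VM source):
allocateChunkOfWords: nWords
    "Exact fit: small requests are served from the per-size free list heads;
    requests of word-size words or more fall back to best fit in the size-ordered tree."
    | head |
    (nWords >= 2 and: [nWords < self wordSize]) ifTrue: [
        head := freeLists at: nWords.
        head ifNotNil: [
            freeLists at: nWords put: head nextFreeChunk.
            ^head]].
    ^self allocateBestFitFromTree: nWords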
In terms of the interface between the memory manager and the operating system, old space is allocated in memory-mapped segments. At startup we allocate the initial old space segment, making it large enough to contain the entire image, and one empty segment. The size of an empty segment is a parameter one can set and defaults to 16 MB. Allocations of huge objects typically involve allocating a segment big enough to contain the object itself. I was running the system recently using the VM simulator and allocated a 17 GB (!!) ByteArray, itself the result of several doublings from a ByteArray in the 20 MB range.
On compacting old space, empty segments are freed and given back to the OS via unmapping, freeing enough segments to bring free memory down to the value of another parameter one can set. After compaction the free lists are typically empty and the free tree consists of one node for each free segment and for the remaining free space in each occupied segment.
We plan to extend the old space collector to make it incremental. Spur is one of the few memory architectures that can support incremental compaction efficiently and naturally. Clément's paper here describes this scheme:
Lazy pointer update for low heap compaction pause times
Clément Béra, Eliot Miranda, Elisa Gonzalez Boix.
15th ACM SIGPLAN International Symposium on Dynamic Languages, October 2019
DOI: 10.1145/3359619.3359741
“To keep applications highly responsive, garbage collectors (GCs) try to minimize interruptions of the application threads. While pauses due to non moving GCs can be drastically reduced through concurrent or incremental strategies, compaction pauses remain a big problem. A strategy to decrease stop the world compaction pauses is to compact subsets of the heap at any one time. But this only reduces the time spent in moving compacted objects, not the time spent updating all references to those objects, which may be significant in large heaps. In this paper, we propose to only move compacted objects during the compaction pause, replacing moved objects by low-overhead forwarding objects. References to compacted objects are lazily updated while the application is running and during the next GC marking phase, outside of the compaction pause. We evaluate our technique on a suite of high workload (2 to 14Gb) benchmarks built from a real industrial application. Results show that not updating pointers during the compaction pause decreases the median pause up to 31% and the longest pause up to 71% on these benchmarks, while the forwarding objects slow down execution time without GC by no more than 1%.”
Does this answer your question adequately?