Improved Tile Culling for a CCSpriteBatchNode

Trisector uses tile maps for the Base terrain layer and the Collision terrain layer. The tiles for each layer are managed using a CCSpriteBatchNode which includes the tiles being displayed in the current viewport. This article will discuss two methods of tile culling for a CCSpriteBatchNode, where the first is a node based approach (slower) and the second is a quad based approach (faster).

There are a bunch of tiny parts that make the engine work, but the tile creation and culling is very important from a performance perspective. The tile structure and culling was heavily refactored during development, and the following is the most efficient and fastest out of 6 or so refactorings.

Before discussing the culling, it’s important to understand how the tile layer data is stored in memory. Each tile layer is represented by a NSMutableArray that includes additional NSMutableArrays for each column in the tile map. These additional NSMutableArrays include the CCSprites for each tile in that column. By using column arrays, it was simply a matter of finding the column array, and then adding or removing those CCSprites to/from the CCSpriteBatchNode. Here is a diagram that shows the structure of the baseColumnArray:

Layer Array to Column Array to Sprite Structure

In the above example, the sprites for the base layer in the first column of the map may be found by finding the column array 1 in position 1 of the baseColumnArray, and then iterating on the sprites in the column array 1, which includes sprites for tile positions 1, 2, 3, 13, and 14. Additionally, column 5 has sprites created for tile positions 3, 4, 7 and 13, and column 9 has sprites created for tile positions 5 and 10. Here is a visualization of the tiles from the above example:

Layer Array to Column Array to Sprite Example

The same process is repeated for the Collision tile layer. Once both the Base and Collision layer arrays are created and are full of sprites, the initial set of tiles, which match the device viewport tile width (e.g. 15-18 tiles wide depending on device), are added to their respective CCSpriteBatchNode. After the game tiles are fully loaded, the game is started and the interval timer starts ticking.

The culling is performed each major tick and is responsible for adding and removing columns of sprites from the CCSpriteBatchNode for each layer. Since Trisector uses maps that are 15 tiles high, that means the culling adds a maximum of 15 new tile sprites and removes a maximum of 15 old tile sprites from the CCSpriteBatchNode each major tick.

First Culling Method:

The first culling method here follows the node parent/child structure that is used for in cocos2d. In this example, new tiles are added to the CCSpriteBatchNode as children, and old tiles are removed from the CCSpriteBatchNode after they scroll off screen. The performance was good for all devices except the iPhone4,1; iPod4,1; and iPad1,1. These devices were not always rendering at a smooth 60fps when using this first culling method.

-(void) updateCollisionTiles;
{
    // collision: cull those now just offscreen
    if (collisionColumnFirst < [collisionColumnArray count] && collisionColumnFirst >= 0){
        [self removeColumnArraySpritesFromLayer:collisionLayer array:collisionColumnArray first:collisionColumnFirst];
    }
    // update the current position for the first column
    collisionColumnFirst++;
    
    // collision: show those about to come on screen
    if (collisionColumnLast < [collisionColumnArray count] && collisionColumnLast >= 0){
        [self addColumnArraySpritesToLayer:collisionLayer array:collisionColumnArray last:collisionColumnLast];
    }
    // update the current position for the last column
    collisionColumnLast++;
}

The following method named addColumnArraySpritesToLayer is responsible for adding a set of sprites for a particular column to the CCSpriteBatchNode for a given layer (e.g. CCSpriteBatchNode for the Collision layer):

-(void) addColumnArraySpritesToLayer:(CCSpriteBatchNode *) targetLayer array:(NSMutableArray *) sourceArray last:(NSInteger) targetColumn;
{
    // take a column array from the source array 
    NSMutableArray *targetColumnSingle = [sourceArray objectAtIndex:targetColumn];
    
    // since the layer is a Batch Node, add each ccsprite in column to parent layer
    for (int i=0; i<[targetColumnSingle count]; i++){
        // add sprite to batch node if it's not already added
        CCSprite *tileSprite = [targetColumnSingle objectAtIndex:i];
        
        // make sure parent is nil so this sprite isn't already added
        if (tileSprite.parent == nil){
            [targetLayer addChild:tileSprite z:0 tag:targetColumn]
        }
        tileSprite = nil;
    }
    targetColumnSingle = nil;
}

The following method named removeColumnArraySpritesFromLayer is responsible for removing a set of sprites for a particular column from the CCSpriteBatchNode for a given layer (e.g. CCSpriteBatchNode for the Collision layer):

-(void) removeColumnArraySpritesFromLayer:(CCSpriteBatchNode *) targetLayer array:(NSMutableArray *) sourceArray first:(NSInteger) targetColumn;
{
    // take a column array from the source array
    NSMutableArray *targetColumnSingle = [sourceArray objectAtIndex:targetColumn];
    
    // since the layer is a Batch Node, remove each ccsprite in this column from parent layer
    for (int i=0; i<[targetColumnSingle count]; i++){
        // add sprite to batch node if it's not already added
        CCSprite *tileSprite = [targetColumnSingle objectAtIndex:i];

        // make sure parent is not nil so that means it exists in the layer
        if (tileSprite.parent != nil){
            [targetLayer removeChild:tileSprite cleanup:NO];
        }
        tileSprite = nil;
    }
    targetColumnSingle = nil;
}

Note that the CCSprite memory is not being released when the CCSprite is removed from the targetLayer, since the associated sourceArray is still the true owner of that CCSprite. It’s important to make sure that all of the CCSprite objects are properly released from all of the owning source arrays and column arrays during dealloc as to not leak a ton of memory.

About 90% of Trisector’s code is designed to adhere to the ARC memory conventions, but some of these cocos2d objects needed to be specifically alloc’d and dealloc’d in order to be property retained during run time. It’s important to figure out what allocates the object and when to release that object after the level attempt has completed.

Second Culling Method:

I started investigating what could be done with adding the quads directly to the quad array of the CCTextureAtlas in the CCSpriteBatchNode. The second culling method adds the quads for the sprites directly to the CCTextureAtlas, and then performs culling by overwriting the quads for the tile sprites that are off-screen with new sprites that are coming on-screen. One benefit of the first method is that the CCTextureAtlas of the CCSpriteBatchNode only has quads for the sprites that are on-screen and does not have any null quads for the empty tile positions in the layer. Even with some quad memory wasted to null quads, the second culling method is still much faster than the first node based culling method.

Here’s an example of the quad array for the CCTextureAtlas of the CCSpriteBatchNode that is going to be described below. In this example, there are 18 tile columns (0..17) with 15 tiles per column. The shaded positions indicate a quad associated with a displayed tile sprite, and the white positions indicate a zero’d quad.

CCSpriteBatchNode Tile Quad Culling

In the above, the quad array for the CCTextureAtlas of the CCSpriteBatchNode is loaded with a quad for each tile sprite for columns 0..17. At the next major tick, column 0 needs to be culled since it’s now off-screen and column 18 needs to be added since it’s about to come on-screen. Thus, the quads corresponding to column 0 are overwritten with quads corresponding to column 18 and then filled with blank quads to fully clear it out. I didn’t notice a significant difference in performance between zeroing the old non-overwritten quads vs just leaving the old non-overwritten quads in place to be culled by the GPU.

The setupInitialTileQuads method inserts quads for each tile in the initial columns shown in the viewport into the quad array. The purpose of this is to ensure that the quad array is full of quads matching the viewport (e.g. viewport width * map height, which is somewhere in the range of 240 to 285 quads depending on the device) and to ensure that quad array for the CCTextureAtlas of the CCSpriteBatchNode is at max capacity so there is not any resizing during gameplay.

-(void) setupInitialTileQuads;
{
    // add a column strip to batch node. setup the initial set of 18 or so columns (device dependent)
    for (NSInteger col=0; col<GAME_BOARD_VIEWPORT_COLS; col++){
        // collisionLayer is the batch node
        // collisionColumnArray is the array of tile column arrays, which are full of formed sprites
        [self insertFullColumnQuads:collisionLayer tileArray:collisionColumnArray colNumber:col];
    }
    
    // reset the first column.
    collisionColumnFirst = 0;
}

The insertFullColumnQuads method inserts a quad for each tile in the target column into the quad array for the CCTextureAtlas of the CCSpriteBatchNode. In order to setup the quad array for the CCTextureAtlas of the CCSpriteBatchNode, a maximum number of initial quads are added for each column, where the quads correspond to the tiles in the column array and the remainder are zero’d quads to finish out the column.

-(void) insertFullColumnQuads:(CCSpriteBatchNode *) targetLayer tileArray:(NSMutableArray *) sourceArray colNumber:(NSInteger) targetColumn;
{
    // make sure the column is in range
    if (targetColumn < [sourceArray count]){ 

         // get the column array of sprites to add (0..N sprites)
        NSMutableArray *spritesForColArray = [sourceArray objectAtIndex:targetColumn];
        
        // determine the first quad position in the quad array for this column
        // e.g. given a map height of 15: Col 0*15=0, Col 1*15=15, Col 2*15=30... 2*N=whatever
        NSInteger firstQuadIndex = targetColumn * mapDimensions.y;
        
        // add each ccsprite in this column array to the batch node as a quad (not added as a child node)
        // there is one quad for each tile in the y axis for this column
        for (int y=0; y<mapDimensions.y; y++){
 
            // if there is a quad left to add from the column
            if (y<[spritesForColArray count]){
                
                // the tileSprite is already fully formed with the quad in the spritesForColArray
                CCSprite *tileSprite = [spritesForColArray objectAtIndex:y];
                
                // set the position in the quad list relative to the first quad
                [tileSprite setAtlasIndex:firstQuadIndex+y];
                
                // now add the quad to the batch node
                [targetLayer insertQuadFromSprite:tileSprite];
                
                // cleanup
                tileSprite = nil;
            }
            // insert the remaining quads in this column with the zero quad
            else {
                // add this quad as zeros
                [targetLayer updateZeroQuadAtIndex:firstQuadIndex+y];
            }
        } // end for each y in the column 
        
        // cleanup
        spritesForColArray = nil;
    } // end in array range
}

Now that the initial set of quads are setup in the quad array for the CCTextureAtlas of the CCSpriteBatchNode, now it’s time to perform the actual culling that happens during the game. The cullColTileVisibility method is called each major tick and is responsible for culling the Collision layer tiles that have gone off-screen and adding new tiles that are about to come on-screen.

-(void) cullColTileVisibility;
{
    // range check that the end is still inside the sprite col array and that the start valid
    if (collisionColumnFirst+GAME_BOARD_VIEWPORT_COLS < [collisionColumnArray count] && collisionColumnFirst >= 0){

        // replace quads for the column that just went off screen with those that are about to come on screen
        [self addRemoveColumnArraySpritesToFromLayer:collisionLayer
                                           tileArray:collisionColumnArray
                                           removeCol:collisionColumnFirst
                                              addCol:collisionColumnFirst+GAME_BOARD_CUTOFF_DEF];
    }
    collisionColumnFirst++;    
}

This cullColTileVisibility method makes a call to the addRemoveColumnArraySpritesToFromLayer method that performs all of the culling work. The addRemoveColumnArraySpritesToFromLayer method is replacing mapDimensions.y number of quads each major tick. For example, in Trisector, the maps are 15 tiles high, and thus this method replaces 15 (e.g. mapDimensions.y) quads per major tick.

The addRemoveColumnArraySpritesToFromLayer method first takes a column array from the map source array (which is mapDimensions.x wide). Next, the position for the first quad in the quad array is determined by calculating the current offset for the column using: (current column % viewport width) * mapDimensions.y. After the quad position is calculated, the sprites for the column array replace the quads that are being removed in the quad array via the updateQuadFromSprite method. Then, the remaining quads that were not replaced in the quad array for that column are updated with zero’d quads via the updateZeroQuadAtPosition method.

-(void) addRemoveColumnArraySpritesToFromLayer:(CCSpriteBatchNode *) targetLayer tileArray:(NSMutableArray *) sourceArray removeCol:(NSInteger) removeColumn addCol:(NSInteger) addColumn;
{
    // get the column sprite array full of sprites to add
    NSMutableArray *spritesForColArray = [sourceArray objectAtIndex:addColumn];
    
    // get starter quad index for the first column that is going to have it's quads replaced
    NSInteger firstQuadIndex = (addColumn % GAME_BOARD_VIEWPORT_COLS) * mapDimensions.y;
    
    //NSLog(@"addRemoveColumnArraySpritesToFromLayer: Replace quads for old Col[%i] with quads for new Col[%i] starting at Index[%i]", removeColumn, addColumn, firstQuadIndex;
    
    // update/overwrite each quad for the old column in the batch node with a quad from the new column (not added as a child node)
    // there is one quad for each tile in the y axis for this column
    for (int i=0; i<mapDimensions.y; i++){
        
        // if there is a quad left to add from the column
        if (i<[spritesForColArray count]){
            
            // the tileSprite is already fully formed with the quad in the spritesForColArray
            CCSprite *tileSprite = [spritesForColArray objectAtIndex:i];

            // set the position in the quad list relative to the first quad
            [tileSprite setAtlasIndex:firstQuadIndex+i]; 
            
            // now update/overwrite the existing quad (the old quad) with the new quad.
            [targetLayer updateQuadFromSprite:tileSprite];
            
            // cleanup
            tileSprite = nil;
        }
        // overwrite the remaining quads in this column with a zero quad
        else {
            // set this quad as zeros
            [targetLayer updateZeroQuadAtIndex:firstQuadIndex+i];
        }
    }
    
    // cleanup
    spritesForColArray = nil;
}

I added the methods insertQuadFromSprite, updateQuadFromSprite and updateZeroQuadAtIndex to the “CCSpriteBatchNode Extension” in CCSpriteBatchNode.m in order to be able to update the quads directly. The insertQuadFromSprite method takes a sprite that already has a quad and a target atlasIndex and inserts the quad into the quad array at the target atlasIndex. The updateQuadFromSprite method takes a sprite that already has a quad and a target atlasIndex and updates the quad array with the quad at the target atlasIndex. The updateZeroQuadAtIndex method updates the atlasIndex with a zero’d quad at the target atlasIndex.

// -------------------------------------------------------------------------------------------------
// ADDED the sprite should have the quad generated and the atlasIndex already set 
// -------------------------------------------------------------------------------------------------
-(void) insertQuadFromSprite:(CCSprite *) sprite;
{
	NSAssert( sprite != nil, @"Argu ment must be non-nil");
	NSAssert( [sprite isKindOfClass:[CCSprite class]], @"CCSpriteBatchNode only supports CCSprites as children");
	
	// allow for texture atlas resizing.
	while(sprite.atlasIndex >= textureAtlas_.capacity || textureAtlas_.capacity == textureAtlas_.totalQuads )
		[self increaseAtlasCapacity];
	
    // update the quad directly. Don't add the sprite to the scene graph
	[sprite setBatchNode:self];
	
    // use the sprite's pre-generated quad and insert the quad at the specified index.
    // Unless setdirty=yes and updateTransform has been called, this quad will be empty.
    // Upon spirte creation, set dirty and call updateTransform.
	ccV3F_C4B_T2F_Quad quad = [sprite quad];
	[textureAtlas_ insertQuad:&quad atIndex:sprite.atlasIndex];
}

// -------------------------------------------------------------------------------------------------
// ADDED the sprite should have the quad generated and the atlasIndex already set 
// -------------------------------------------------------------------------------------------------
-(void) updateQuadFromSprite:(CCSprite *) sprite;
{
	NSAssert( sprite != nil, @"Argument must be non-nil");
	NSAssert( [sprite isKindOfClass:[CCSprite class]], @"CCSpriteBatchNode only supports CCSprites as children");
	
	// will never increase quad capacity during update. no new memory allocations.
    if (sprite.atlasIndex < textureAtlas_.capacity){
        
        // update the quad directly. Don't add the sprite to the scene graph
        [sprite setBatchNode:self];
        
        // use the sprite's pre-generated quad and update the quad at the specified index.
        // Unless setdirty=yes and updateTransform has been called, this quad will be empty.
        // Upon spirte creation, set dirty and call updateTransform.
        ccV3F_C4B_T2F_Quad quad = [sprite quad];
        [textureAtlas_ updateQuad:&quad atIndex:sprite.atlasIndex];
    }
}

// -------------------------------------------------------------------------------------------------
// ADDED inserts a zero quad at the specified index
// -------------------------------------------------------------------------------------------------
-(void) updateZeroQuadAtIndex:(NSInteger) atlasIndex;
{
	// will never increase quad capacity during update. no new memory allocations.
    if (atlasIndex < textureAtlas_.capacity){
        
        // generate the zero'd quad and update it at the specified index
        ccV3F_C4B_T2F_Quad quad;
        bzero( &quad, sizeof(quad) );
        [textureAtlas_ updateQuad:&quad atIndex:atlasIndex];
    }
}

Also, for the above insert/update methods to work (without calling updateTransform for each quad update/insert), the sprite’s quad is created at level load time. This ensures the quad is ready to go and doesn’t need to be created during culling. NOTE, the tile sprites don’t change rotation, scale, etc during play so the updateTransform is NOT called before inserting/updating here. If the sprites did change in some way, they would need to be set dirty and updateTransform would need to be called before inserting/updating the quad.

Here is an example of generating the quad for the tile sprite when the tile sprite is created:

CCSprite *baseSprite = getSpriteFromSpritesheet(tileTextureSheet, baseID);
[baseSprite setPosition:ccp(xPos,yPos)]; 
[baseSprite setAnchorPoint:ccp(0,0)];
// create the quad
[baseSprite setDirty:YES];  
[baseSprite updateTransform];

With this second culling method, I was finally able to get smooth performance on the older devices (e.g. iPad1,1; iPhone4,1; iPod4,1). Those devices are pretty close to having all of the graphical effects turned on such as the Base layer and/or the background, but is still a little choppy (~55fps) with those turned on. The second culling method provides a much smoother experience on these older devices than the first culling method. The performance difference on the iPad1,1 is very noticeable. However, the more modern devices worked fine using either method and I didn’t notice a significant difference in the Xcode profiler.

Hope you found this article on Trisector’s Culling interesting,

Jesse from Smash/Riot