Advice for working with complex kernel encoding computations:
- Account for every commandBuffer instance and its .commit() call. This matters because any commandBuffer that is created in a draw loop and never committed is a memory leak. The leak may not show up in an Xcode Instruments stack trace, but it will show up in the Debug Memory Graph while running in Debug mode (see https://medium.com/@xcadaverx/locating-the-source-of-a-memory-leak-712667bf8cd5).
It is important to enable malloc stack logging so that you can see which line of code a leak comes from. If a commandBuffer is created in the loop but never committed, the trace will inevitably point back to the line that creates it as the culprit. The remedy is simple: commit that commandBuffer inside the loop.
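A minimal sketch of the "one commit per created commandBuffer" rule, assuming a plain compute setup. The FrameRunner class and its drawFrame method are hypothetical names used only for illustration; the Metal calls themselves (makeCommandQueue, makeCommandBuffer, commit) are the standard ones.

```swift
import Metal

// Hypothetical helper (not from the original post): one queue, one new
// command buffer per frame, and exactly one commit() for each buffer.
final class FrameRunner {
    let queue: MTLCommandQueue   // created once, outside the draw loop

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        self.queue = queue
    }

    /// Creates a command buffer for a single frame, lets the caller encode
    /// into it, and always commits it before returning.
    @discardableResult
    func drawFrame(encode: (MTLCommandBuffer) -> Void) -> MTLCommandBuffer? {
        guard let commandBuffer = queue.makeCommandBuffer() else { return nil }
        encode(commandBuffer)     // encoders are opened and ended in here
        commandBuffer.commit()    // no path leaves the buffer uncommitted
        return commandBuffer
    }
}
```

Keeping creation and commit in the same function makes it hard to add a second makeCommandBuffer() call somewhere in the loop and forget its commit().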
Why re-instance the commandBuffer?
As it turns out, if you are doing complex kernel computation work with multiple encoding passes run sequentially, the device buffers do not need the same treatment. You can create any device buffers that must be passed to the GPU a single time, when you set up the kernel (usually outside the loop, in your view controller's initialization), and then rewrite their contents with memcpy().
For instance:
// Refill buffers that were created once, outside the draw loop.
let hillbufferptr = hilllockbuffer?.contents()
memcpy(hillbufferptr, &(hilllockLabelMap!), hilllockByteLength)
let relabelbufferptr = relabelbuffer?.contents()
memcpy(relabelbufferptr, &(relabelMap!), relabelByteLength)
let maxhbufferptr = maxhbuffer?.contents()
memcpy(maxhbufferptr, &(maxHMap!), maxhByteLength)
// Reset plist to -1.0, one value per texel of the texture.
plist = [Float32](repeating: -1.0, count: (hillockmatrixtexture?.width)! * (hillockmatrixtexture?.height)!)
let plistbufferptr = plistbuffer?.contents()
memcpy(plistbufferptr, &(plist!), plistByteLength)
With that in place, the buffers need not be created inside the loop. It is also important to create textures outside the draw loop; otherwise they can turn into memory leaks if they are not properly deallocated.
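The same idea can be sketched in a self-contained form. ReusableFloatBuffer is a hypothetical wrapper, not something from the original code: it allocates one shared-storage MTLBuffer up front and only overwrites its contents afterwards.

```swift
import Foundation
import Metal

// Hypothetical wrapper: one shared-storage buffer, allocated once and then
// refilled in place on every pass through the loop.
final class ReusableFloatBuffer {
    let buffer: MTLBuffer
    let capacity: Int   // number of Float32 elements the buffer can hold

    init?(device: MTLDevice, capacity: Int) {
        let byteLength = capacity * MemoryLayout<Float32>.stride
        guard capacity > 0,
              let buffer = device.makeBuffer(length: byteLength,
                                             options: .storageModeShared)
        else { return nil }
        self.buffer = buffer
        self.capacity = capacity
    }

    /// Overwrites the buffer contents; no new MTLBuffer is created.
    func update(with values: [Float32]) {
        precondition(values.count <= capacity, "values exceed buffer capacity")
        values.withUnsafeBytes { source in
            guard let base = source.baseAddress else { return }
            memcpy(buffer.contents(), base, source.count)
        }
    }
}
```

In a draw loop, only update(with:) would be called per frame; the initializer would run once during setup.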
Any time you retrieve data from a kernel encoding, you need a commandBuffer.commit() followed by waitUntilCompleted(). As far as I can tell, this means creating a new commandBuffer afterwards, since a committed commandBuffer cannot be reused; the memory held by the old commandBuffer is freed in the process.
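A minimal sketch of a readback, assuming a trivial kernel that doubles each value. The kernel name doubleValues and the function runAndReadBack are hypothetical, and the one-thread-per-threadgroup dispatch is chosen for simplicity, not speed.

```swift
import Metal

// Hypothetical kernel source, compiled at run time for this example only.
let doubleSource = """
#include <metal_stdlib>
using namespace metal;
kernel void doubleValues(device float *data [[buffer(0)]],
                         uint id [[thread_position_in_grid]]) {
    data[id] = data[id] * 2.0f;
}
"""

/// Runs the kernel once, waits for the GPU, then copies the result out.
func runAndReadBack(_ input: [Float]) -> [Float]? {
    let byteLength = input.count * MemoryLayout<Float>.stride
    guard !input.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: doubleSource, options: nil),
          let function = library.makeFunction(name: "doubleValues"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let buffer = device.makeBuffer(bytes: input, length: byteLength,
                                         options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer()
    else { return nil }

    guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
        commandBuffer.commit()   // never leave a created buffer uncommitted
        return nil
    }
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    // One thread per element; single-thread threadgroups keep the grid exact.
    encoder.dispatchThreadgroups(MTLSize(width: input.count, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: 1, height: 1, depth: 1))
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()   // results are only valid after this

    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
    return Array(UnsafeBufferPointer(start: pointer, count: input.count))
}
```

Any further GPU work after the readback would start from a fresh makeCommandBuffer() call on the same queue.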
Strategy for complex kernel pass encoding, where multiple kernels pass a given data buffer from one kernel to the next: my advice (which follows Apple's direct advice) is to avoid running a series of commandBuffer.commit() and waitUntilCompleted() calls that copy a buffer back to an array, only to write that array back into a buffer again. Instead, use a single buffer and pass that same buffer from one kernel encoding to the next. Reading memory back with byte copies to an array is slow and will likely produce GPU hang error messages. As an engineer described it, this merely serializes the data flow between the CPU and the GPU. It is slow and cumbersome.
I personally found only one case where processing the data on the CPU was needed: sorting an array and eliminating duplicate values, that is, building a set from the array data. That result in turn determined the iterative structure of the additional encoding the algorithm needed.
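A minimal sketch of passing one buffer through two kernels with a single commit, assuming two toy kernels. The names addOne, timesTwo, and runChain are hypothetical; the point is only that the CPU does not touch the data between passes.

```swift
import Metal

// Hypothetical kernels: both read and write the same device buffer.
let chainSource = """
#include <metal_stdlib>
using namespace metal;
kernel void addOne(device float *data [[buffer(0)]],
                   uint id [[thread_position_in_grid]]) {
    data[id] += 1.0f;
}
kernel void timesTwo(device float *data [[buffer(0)]],
                     uint id [[thread_position_in_grid]]) {
    data[id] *= 2.0f;
}
"""

/// Encodes both passes into one command buffer and commits once.
func runChain(_ input: [Float]) -> [Float]? {
    let byteLength = input.count * MemoryLayout<Float>.stride
    guard !input.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: chainSource, options: nil),
          let buffer = device.makeBuffer(bytes: input, length: byteLength,
                                         options: .storageModeShared)
    else { return nil }

    // Build every pipeline before creating the command buffer.
    var pipelines: [MTLComputePipelineState] = []
    for name in ["addOne", "timesTwo"] {
        guard let function = library.makeFunction(name: name),
              let pipeline = try? device.makeComputePipelineState(function: function)
        else { return nil }
        pipelines.append(pipeline)
    }

    guard let commandBuffer = queue.makeCommandBuffer() else { return nil }
    for pipeline in pipelines {
        guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
            commandBuffer.commit()   // still commit what was created
            return nil
        }
        encoder.setComputePipelineState(pipeline)
        encoder.setBuffer(buffer, offset: 0, index: 0)   // same buffer each pass
        encoder.dispatchThreadgroups(MTLSize(width: input.count, height: 1, depth: 1),
                                     threadsPerThreadgroup: MTLSize(width: 1, height: 1, depth: 1))
        encoder.endEncoding()
    }
    commandBuffer.commit()               // one commit for the whole chain
    commandBuffer.waitUntilCompleted()   // one wait, only for the final result

    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
    return Array(UnsafeBufferPointer(start: pointer, count: input.count))
}
```

For an input of [1, 2, 3], the first pass produces [2, 3, 4] and the second produces [4, 6, 8], with no copy back to a Swift array in between.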
I haven't found a recipe, nor have I been able to construct an adequate one, for passing something like float array data inside a struct and pointing the kernel at that struct. An array held in a struct on the GPU side (unless it is in pointer form) has a fixed, explicitly declared size. Instead, I have just passed array pointers directly to the kernel functions as needed. There are plenty of code recipes for this.
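A minimal sketch of the pointer approach, assuming a toy scaling kernel. The variable-length array goes to the kernel as its own device pointer, and only fixed-size scalars travel in a struct. The names Params, scaleValues, and runScale are hypothetical.

```swift
import Metal

// Hypothetical kernel: the array is a plain device pointer in buffer 0;
// the struct in buffer 1 holds only fixed-size scalar fields.
let scaleSource = """
#include <metal_stdlib>
using namespace metal;
struct Params { uint count; float factor; };
kernel void scaleValues(device float *values [[buffer(0)]],
                        constant Params &params [[buffer(1)]],
                        uint id [[thread_position_in_grid]]) {
    if (id >= params.count) return;   // guard threads past the array end
    values[id] *= params.factor;
}
"""

// Swift mirror of the Metal struct: two 4-byte fields, same order.
struct Params {
    var count: UInt32
    var factor: Float
}

func runScale(_ input: [Float], factor: Float) -> [Float]? {
    let byteLength = input.count * MemoryLayout<Float>.stride
    guard !input.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: scaleSource, options: nil),
          let function = library.makeFunction(name: "scaleValues"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let buffer = device.makeBuffer(bytes: input, length: byteLength,
                                         options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer()
    else { return nil }

    guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
        commandBuffer.commit()
        return nil
    }
    var params = Params(count: UInt32(input.count), factor: factor)
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)   // the array, as a pointer
    encoder.setBytes(&params, length: MemoryLayout<Params>.stride, index: 1)

    // Round the grid up to whole threadgroups; the kernel bounds-checks.
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, input.count)
    let groups = (input.count + width - 1) / width
    encoder.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
    return Array(UnsafeBufferPointer(start: pointer, count: input.count))
}
```

Because the array length is passed as a scalar in the struct, the same kernel works for any array size without a fixed-size array member on the GPU side.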