Parallel Implementation of a Ray Tracer
Field
Parallel Computing
Institution
RIT
Year
2023
Project Type
Group Project
Course
Multiple Processing Systems
Keywords
Parallel Computing, Ray Tracing, Static Block, Cyclical Row, Dynamic Block, Image Stitching, C-to-C Ratios, Speedup, Load Balancing, Dynamic Allocation
Responsible for implementing the static block, cyclical and dynamic portion of the partitioning
The objective of this assignment was to create parallel implementations of an existing ray tracing program. These parallel implementations included a static column division method, a static block method, a cyclical row method and a dynamic block method. The methods were implemented and tested by rendering two scenes, one simple and one complex, to determine proper functionality of each implementation. Tests also included varying parameters, such as the number of computing nodes and size of block partitions. The master node in these programs (with the exception of dynamic) was responsible for an even chunk of work, including the stitching of the image back together. Computation and communication time were measured independently, to calculate C-to-C ratios for each implementation. Ultimately, results presented correct rendering for all implementations tested, and viable speedups for increased numbers of nodes.
Static Partitioning
Each of these static methods focuses on balancing the rendering workload among multiple processing nodes while minimizing communication overhead. They also involve strategic decisions on how to handle uneven workloads and ensure efficient stitching of the rendered image by the master node. The choice of method depends on the specific requirements of image size, node count, and desired balance between computation and communication.
-
Static Strips Method: Divides rendering into equal column strips, with workload evenly distributed among processing nodes. The master node stitches the completed image.
-
Static Blocks Model: Splits both rows and columns for partitioning, assigning any leftover data to the master node to maintain constant slave node block sizes.
-
Cyclical Model: Cycles through rows in strips; the master node handles any remainder. This method may require sending the entire partially rendered image for stitching, affecting communication efficiency.
-
Load Balancing: Addresses uneven workload by assigning extra work to the first few nodes or the master node (implementation dependent), minimizing communication overhead.
Load balancing example (clean division)
Load balancing example (uneven division)
Dynamic partitioning flow chart
Dynamic Partitioning
The dynamic block allocation in the ray tracing program is a sophisticated approach, differing significantly from the static models. Overall, this dynamic allocation method enhances scalability and flexibility in task distribution and image rendering, albeit with increased communication overhead.
-
Queue-like Design: Implements a dynamic system where the master node continuously assigns and collects image segments, without engaging in rendering itself.
-
Flexible Task Allocation: Blocks are dynamically sized and distributed in a cyclical order among worker nodes, adapting to varying image parts.
-
Task Stretching and Stealing: Allows for flexibility in rendering, where slave nodes can stretch their assigned block to cover unallocated areas, optimizing the use of available resources.
-
Increased Communication: Involves more frequent communication between master and slave nodes for task allocation and image data transfer.
-
Efficient Handling of Load Imbalance: The dynamic model adapts better to changes in block size and load distribution, making it more scalable and responsive to varying workloads.
-
Profiling Methodology: Profiling includes incremented tracking of communication and computation times with each cycle, providing a comprehensive view of the model's performance.
Result Analysis
We tested the speedup of each partitioning scheme in two different scenarios (simple and complex scenes) compared to their sequential runtime and overheads. Additionally, we examined the implementation's performance across various factors, such as changing work unit size and the number of processors utilized, presenting a comprehensive overview of the effectiveness of our implementation.
-
Factors considered:
-
​effect of work unit size on computational efficiency
-
effect of increasing the number of processors
-
effect of work unit shape on computation efficiency (dynamic block allocation)
-
-
Able to achieve:
-
8.84 speed up on average
-
up to 14 times faster compared to sequential calculation.​
-
C-to-C ratio around 0.9980 given dynamic allocation with the complex scene (measured on dedicated SPORC cluster)
-
Simple scene
Complex scene
Example: Effect of Work Unit Size on Static Cyclical Allocation Visualized (Simple)
In conclusion, this project successfully explored the balance between communication and computation in parallel computing through ray tracing. By testing different models, we discovered how static and dynamic work allocations can impact performance. While static models benefited from fewer nodes, dynamic models showed the challenges of uneven distribution. Ultimately, this project highlighted the real-world advantages of parallel programming in image rendering and provided hands-on experience with the complexities of dividing work in high-performance applications.