Abstract The ray tracing rendering algorithm can produce high-fidelity images of 3-D scenes, including shadow effects, as well as reflections and transparencies. This is currently done at a processing speed of 30 frames per second. Therefore, current implementations of the algorithm are not yet suitable for interactive real-time rendering, which is required in games and virtual reality based applications. Nonetheless, the algorithm allows for massive parallelization of its computations, which is a strong reason of further improvements. Also, we present a parallel architecture for ray tracing based on a uniform spatial subdivision of the scene that exploits an embedded computation of ray-triangle intersections. This approach allows for a significant acceleration of intersection computations, as well as a reduction of the total number of the required intersections checks. Furthermore, it allows for these checks to be performed in parallel and in advance for each ray. In this paper we discuss and analyze an ASIP-based implementation using FPGAs and a GPGPU-based parallel implementation of the proposed architecture, both running different 3-D scenes. The performance of both implementations are reported and compared. Furthermore, a second GPU has been included in the GPGPU-based implementation, running the same parallel architecture. Thus, primary rays are split among both GPUs for parallel execution and their performance are also presented and compared.