Humans are inherently social beings who benefit from the perceptual capability to adopt another point of view, typically referred to as perspective taking. Perspective taking is an essential feature of our daily interactions and is pivotal for human development. However, much remains unknown about the precise mechanisms that underlie it. Here, we show that formalizing perspective taking in a computational model can detail the embodied mechanisms humans employ. The model's main building block is a set of action primitives that are passed through a forward model. To reduce response time, the model employs a selection process that passes only a subset of the action primitives through the forward model. The model reproduces results that mimic those captured in human data, including response-time differences caused by the angular disparity between the perspective taker and the other agent, the impact of task-irrelevant variations in body posture, and individual differences in perspective-taking strategy. Our results support the hypothesis that perspective taking is a mental simulation of the physical movements required to match another person's visual viewpoint. Furthermore, the model yields several testable predictions, including that forced early responses lead to an egocentric bias and that the selection process introduces dependencies between consecutive trials. Our results indicate potential links between perspective taking and other essential perceptual and cognitive mechanisms, such as active vision and autobiographical memory.
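The core loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the primitive set, the greedy selection rule, the per-step response-time cost, and all function names (`forward_model`, `simulate_perspective_taking`) are illustrative assumptions. It shows only the qualitative claim that larger angular disparities require more simulated rotation steps and therefore longer response times.

```python
def forward_model(angle, primitive):
    """Hypothetical forward model: predict the viewpoint (in degrees)
    reached after mentally executing one rotation primitive."""
    return (angle + primitive) % 360

def simulate_perspective_taking(own_angle, target_angle,
                                primitives=(-45, -15, 15, 45),
                                step_cost=1.0):
    """Greedy mental simulation: at each step, select only the subset of
    primitives whose predicted outcome reduces the angular disparity to
    the target viewpoint, pick the best, and accumulate a response-time
    cost per simulated step. All parameters are illustrative."""
    def disparity(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)

    rt = 0.0
    angle = own_angle
    while disparity(angle, target_angle) > 0:
        # Selection process: keep only primitives that reduce disparity.
        candidates = [p for p in primitives
                      if disparity(forward_model(angle, p), target_angle)
                      < disparity(angle, target_angle)]
        if not candidates:
            break  # no primitive helps; respond from the current viewpoint
        best = min(candidates,
                   key=lambda p: disparity(forward_model(angle, p),
                                           target_angle))
        angle = forward_model(angle, best)
        rt += step_cost
    return angle, rt

# A larger angular disparity takes more simulated steps, hence a longer RT.
print(simulate_perspective_taking(0, 60))   # → (60, 2.0)
print(simulate_perspective_taking(0, 180))  # → (180, 4.0)
```

Forcing the loop to stop early (e.g., capping `rt`) would leave `angle` closer to `own_angle` than to `target_angle`, which is one way to read the predicted egocentric bias under forced early responses.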