Despite growing progresses in recent years, cross-scenario person re-identification remains challenging, mainly due to the pedestrians commonly surrounded by highly-complex environment contexts. In reality, the human perception mechanism could adaptively find proper contextualized spatial-temporal clues towards pedestrian recognition. However, conventional methods fall short in adaptively leveraging the long-term spatial-temporal information due to ever-increasing computational cost. Moreover, CNN-based deep learning methods are hard to conduct optimization due to the non-differentiable property of the built-in context search operation. To ameliorate, this paper proposes a novel Context-Interactive CNN (CI-CNN) to dynamically find both spatial and temporal contexts by embedding multi-task Reinforcement Learning (MTRL). The CI-CNN streamlines the multi-task reinforcement learning by using an actor-critic agent to capture the temporal-spatial context simultaneously, which comprises a context-policy network and a context-critic network. The former network learns policies to determine the optimal spatial context region and temporal sequence range. Based on the inferred temporal-spatial cues, the latter one focuses on the identification task and provides feedback for the policy network. Thus, CI-CNN can simultaneously zoom in/out the perception field in spatial and temporal domain for the context interaction with the environment. By fostering the collaborative interaction between the person and context, our method could achieve outstanding performance on various public benchmarks, which confirms the rationality of our hypothesis, and verifies the effectiveness of our CI-CNN framework.