Integrating visual and tactile information in the temporal domain is critical for active perception. To accomplish this, coordinated timing is required. Here, we study perceived duration within and across these two modalities. Specifically, we examined how duration comparisons within and across vision and touch were influenced by temporal context and presentation order using a two-interval forced choice task. We asked participants to compare the duration of two temporal intervals defined by tactile or visual events. Two constant standard durations (700 ms and 1,000 ms in ‘shorter’ sessions; 1,000 ms and 1,500 ms in ‘longer’ sessions) were compared to variable comparison durations in different sessions. In crossmodal trials, standard and comparison durations were presented in different modalities, whereas in intramodal trials, the two durations were presented in the same modality. The standard duration was either presented first (<sc>) or followed the comparison duration (<cs>). In both crossmodal and intramodal conditions, we found that the longer standard duration was overestimated in <cs> trials and underestimated in <sc> trials, whereas the estimation of the shorter standard duration was unbiased. Importantly, the estimation of 1,000 ms was biased when it was the longer standard duration within the shorter sessions but not when it was the shorter standard duration within the longer sessions, indicating an effect of temporal context. The effects of presentation order can be explained by a central tendency effect applied in different ways to different presentation orders. Both crossmodal and intramodal conditions showed better discrimination performance for <sc> trials than for <cs> trials, supporting the Type B effect for both crossmodal and intramodal duration comparison. Moreover, these results did not depend on whether the standard duration was defined using tactile or visual stimuli.
Overall, our results indicate that duration comparison between vision and touch is dependent on presentation order and temporal context, but not modality.
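As a rough illustration of how a central tendency effect could produce opposite biases for the two presentation orders, the sketch below uses a toy model that is an assumption made here, not the model fitted in this study. It supposes that only the first interval of a trial is pulled toward a context mean, with a hypothetical weight `w`, while the second interval is perceived veridically. The function name `pse`, the weight of 0.3, and the choice of 850 ms (the mean of the two standards in the shorter sessions) as the context mean are all illustrative. The toy model addresses only the direction of the bias for the longer standard; it does not account for the unbiased estimation of the shorter standard.

```python
def pse(order, standard_ms, context_mean_ms, w):
    """Comparison duration judged equal to the standard under the toy model.

    Assumption (hypothetical): only the first interval is shrunk toward the
    context mean with weight w; the second interval is perceived as presented.
    """
    if order == "sc":
        # Standard comes first, so its percept is pulled toward the context mean.
        return (1 - w) * standard_ms + w * context_mean_ms
    if order == "cs":
        # Comparison comes first and is pulled toward the mean; solve
        # (1 - w) * c + w * context_mean = standard for c.
        return (standard_ms - w * context_mean_ms) / (1 - w)
    raise ValueError("order must be 'sc' or 'cs'")


# Illustrative values: 1,000 ms as the longer standard in a 'shorter' session.
standard, context_mean, weight = 1000.0, 850.0, 0.3
for order in ("sc", "cs"):
    print(order, round(pse(order, standard, context_mean, weight), 1))
```

With these illustrative values, the point of subjective equality falls below 1,000 ms for <sc> (the standard is underestimated) and above 1,000 ms for <cs> (the standard is overestimated), which matches the direction of the reported bias for the longer standard.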