Deep learning in computer vision has made significant progress and could achieve high accuracy on image object detection task in recent years. Video object detection, a similar but more challenging task, has also been proposed and investigated by many researchers and practitioners. However, different from the image object detection task, video is more complicated since it carries richer contextual and temporal information. Directly applying state-of-the-art still-image object detection frameworks usually results in poor performance. In the blog, I will introduce two papers and discuss how they address this problem.