Several modern network embedding methods learn vector representations from sampled context nodes. The sampling strategies are often carefully designed and controlled by specific parameters that allow them to adapt to different networks. However, a fundamental question remains: what key factor causes some sampled contexts to yield better vectors than others on a given network? We attempted to answer this question from the perspective of information theory. First, we defined the weighted entropy of the sampled context matrix, which quantifies the amount of information the matrix carries, and we found that context matrices with higher weighted entropy generally produce better vectors. Second, we proposed maximum weighted entropy sampling methods that select more informative context nodes and can therefore be used to produce more informative vectors. Extensive experiments on link prediction and node classification tasks confirm the effectiveness of the proposed methods.
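To make the central quantity concrete, the sketch below computes one plausible form of weighted entropy for a sampled context matrix: the Shannon entropy of each node's context distribution, averaged with per-node weights. The choice of row weights (here, total context count per node) and the base-2 logarithm are illustrative assumptions; the paper's exact definition may differ.

```python
import numpy as np

def weighted_entropy(context_matrix, weights=None):
    """Illustrative weighted entropy of a sampled context matrix.

    Assumption: each row holds one node's context co-occurrence counts.
    Each row's Shannon entropy is combined using normalized per-row
    weights (default: the row's total count). This is a sketch of the
    general idea, not the paper's exact formula.
    """
    M = np.asarray(context_matrix, dtype=float)
    if weights is None:
        weights = M.sum(axis=1)                 # weight rows by sample count
    weights = weights / weights.sum()           # normalize weights to sum to 1
    probs = M / M.sum(axis=1, keepdims=True)    # per-row context distribution
    logp = np.where(probs > 0, np.log2(np.where(probs > 0, probs, 1.0)), 0.0)
    row_entropy = -(probs * logp).sum(axis=1)   # Shannon entropy of each row
    return float((weights * row_entropy).sum())
```

Under this definition, a context matrix whose rows are spread uniformly over many context nodes scores higher than one whose rows concentrate on a few nodes, matching the intuition that higher-entropy contexts are more informative.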