"The concepts of message and probability enable one, for a definite source of N messages, to define Shannon's information. If p_i, \quad i = 1, 2, ..., N, is the relative probability of message i and \log p_i is its base-2 logarithm, then the information I of the given source is

(1) \quad I = - \sum_{i=1}^N p_i \log p_i.

The minus sign makes I positive because all probabilities, which are necessarily greater than or equal to zero, are less than unity (their sum being \textstyle \sum_{i=1}^N p_i = 1), so that their logarithms are all negative."
https://en.wikiquote.org/wiki/Julian_Barbour
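The formula in the quote can be sketched directly in code. This is a minimal illustration, not anything from the quoted source: the function name `shannon_information` and the example distribution are my own choices, and zero-probability messages are skipped since they contribute nothing to the sum.

```python
import math

def shannon_information(probs):
    """Shannon information I = -sum(p_i * log2(p_i)) of a message source.

    probs: relative probabilities of the N messages; must sum to 1.
    """
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    # Terms with p_i = 0 are omitted (lim p log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform source of N = 4 equally likely messages gives
# I = log2(4) = 2 bits, the maximum for four messages.
print(shannon_information([0.25, 0.25, 0.25, 0.25]))  # → 2.0
```

Because each log p_i is negative (every p_i < 1), the leading minus sign makes I non-negative, exactly as the quote explains.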