A key element in modern text retrieval systems is the weighting of individual words for importance. Early in the development of document retrieval methods it was recognized that performance could be improved if weights were based, at least in part, on the frequencies of individual terms in the database. This observation led investigators to propose inverse document frequency (IDF) weighting, which has become the most commonly used approach. IDF weighting can be given some justification on probabilistic grounds; however, many different formulas have been tried, and it is difficult to distinguish among them on a purely theoretical basis. Witten, Moffat and Bell have proposed a monotonicity condition as fundamental: ‘a term that appears in many documents should not be regarded as more important than a term that appears in a few’. Based on this monotonicity assumption and probabilistic arguments, we show here how the TREC data can be used to learn ideal global weights. Using cross-validation, we show that these learned weights yield a modest but statistically significant improvement over IDF weights. One conclusion is that IDF weights are close to optimal under the probabilistic assumptions that are commonly made.
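For reference, a commonly used form of inverse document frequency weighting (the particular variant studied here may differ) assigns to a term $t$ the global weight
$$\mathrm{idf}(t) = \log\frac{N}{n_t},$$
where $N$ is the total number of documents in the collection and $n_t$ is the number of documents containing $t$. This weight decreases monotonically as a term appears in more documents, consistent with the monotonicity condition quoted above.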