Perplexity is a metric from natural language processing, particularly language modeling, that measures how well a language model (e.g., GPT) predicts the next word (or token) in a sequence. Technically, it is the exponential of the average negative log-likelihood that the model assigns to a given dataset. Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k equally likely options at each step. The lower the perplexity, the better the model predicts the data; a high perplexity indicates uncertainty or a poor fit. In practice, perplexity is widely used to evaluate and compare the performance of language models.
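
As a concrete illustration, here is a minimal Python sketch of this computation: perplexity = exp(-(1/N) * Σ log p(token_i | context)). The function name and the probability values are hypothetical, chosen only to show the arithmetic, not taken from any particular model or library.

```python
import math

def perplexity(token_log_probs):
    """Compute perplexity from per-token natural-log probabilities.

    Perplexity is the exponential of the average negative
    log-likelihood: the lower it is, the better the model
    predicts the sequence.
    """
    avg_neg_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_likelihood)

# Hypothetical example: the probabilities a model assigned to the
# actual next token at each of four positions in a sequence.
probs = [0.25, 0.10, 0.50, 0.05]
log_probs = [math.log(p) for p in probs]

print(perplexity(log_probs))  # ~6.32: on average, the model is about as
                              # uncertain as a uniform choice among ~6 tokens
```

Note that the result equals the inverse of the geometric mean of the assigned probabilities, which is why it can be read as an effective number of equally likely choices per step.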