Tuesday, February 1st @ 11:00 AM-12:30 PM (ONLINE)
The Universality of Zipf's Law
Ricardo Fernholz, Claremont McKenna College
Abstract: A set of data with positive values follows a Pareto distribution if the log–log plot of value versus rank is approximately a straight line. A Pareto distribution satisfies Zipf's law if the log–log plot has a slope of −1. Since many types of ranked data follow Zipf's law, it is considered a form of universality. We show that time-dependent systems with growth and variance parameters that are constant across ranks will follow Zipf's law if and only if two natural conditions, conservation and completeness, are satisfied. We also show that conservative and complete systems that have constant growth parameters but variance parameters that increase with rank are quasi-Zipfian, with a log–log plot that is concave and has a tangent line of slope −1 at some point. Our results explain the universality of Zipf's law for data generated by time-dependent rank-based systems, but ranked data generated by other means frequently follow non-Zipfian Pareto distributions. Our analysis explains why Zipf's law holds for, for example, word frequency, firm size, household wealth, and city size, while it does not hold for earthquake magnitude, cumulative book sales, the intensity of solar flares, and the intensity of wars, all of which follow non-Zipfian Pareto distributions.
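As a rough illustration of the definitions in the abstract, the sketch below fits a least-squares line to log(value) against log(rank) and reports its slope. It is not taken from the talk: the function name `log_log_slope` and both data sets are hypothetical, and the data are synthetic, constructed to lie exactly on a power law. A slope near −1 corresponds to Zipf's law, while any other roughly constant slope corresponds to a non-Zipfian Pareto distribution.

```python
import math


def log_log_slope(values):
    """Least-squares slope of log(value) against log(rank); rank 1 is the largest value."""
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(value) for value in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic Zipfian data (an assumption for illustration): value proportional to 1 / rank.
zipf_values = [1000.0 / rank for rank in range(1, 201)]

# Synthetic non-Zipfian Pareto data (also an assumption): value proportional to rank ** -0.5.
pareto_values = [1000.0 * rank ** -0.5 for rank in range(1, 201)]

zipf_slope = log_log_slope(zipf_values)
pareto_slope = log_log_slope(pareto_values)

print(f"Zipfian sample slope:     {zipf_slope:.3f}")
print(f"Non-Zipfian sample slope: {pareto_slope:.3f}")
```

Real ranked data would only be approximately linear on the log–log plot, so the fitted slope should be read as a summary of the plot rather than a formal test of Zipf's law.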