# Dictionary Definition

surprisal n : the act of surprising someone [syn: surprise]

# Extensive Definition

In information theory (elaborated by Claude E. Shannon, 1948), self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits, nats, or hartleys (also known as digits, dits, or bans), depending on the base of the logarithm used in its definition.

By definition, the amount of self-information contained in a probabilistic event depends only on the probability of that event: the smaller its probability, the larger the self-information associated with receiving the information that the event indeed occurred.

Further, by definition, the measure of self-information has the following property: if an event C is composed of two mutually independent events A and B, then the amount of information conveyed by the proclamation that C has happened equals the sum of the amounts of information conveyed by the proclamations of event A and event B respectively.

Taking into account these properties, the self-information I(\omega_n) (measured in bits) associated with outcome \omega_n is:

- I(\omega_n) = \log_2\left(\frac{1}{\Pr(\omega_n)}\right) = -\log_2(\Pr(\omega_n))
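The defining formula can be evaluated directly. A minimal Python sketch (the function name `self_information` is our own, not from the source):

```python
import math

def self_information(p, base=2):
    """Self-information -log_base(p) of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(p, base)

# Rarer outcomes carry more information:
print(self_information(0.5))   # fair-coin outcome: 1 bit
print(self_information(1/6))   # fair-die outcome: about 2.585 bits
```

With the default `base=2` the result is in bits; passing `base=math.e` or `base=10` yields nats or hartleys instead.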

This definition, using the binary logarithm function, complies with the above conditions. Because the logarithm of base 2 is used, the unit of I(\omega_n) is the bit. With the logarithm of base e, the unit is the nat; with the logarithm of base 10, the unit is the hartley.
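The three units differ only by a change-of-base constant, which a short sketch can verify (the example probability 0.5 is our own choice):

```python
import math

p = 0.5  # an example outcome probability

bits = -math.log2(p)       # base-2 logarithm  -> bits
nats = -math.log(p)        # natural logarithm -> nats
hartleys = -math.log10(p)  # base-10 logarithm -> hartleys

# The units are related by constant factors (change of base):
assert abs(nats - bits * math.log(2)) < 1e-12
assert abs(hartleys - bits * math.log10(2)) < 1e-12
```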

This measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a highly probable outcome is not surprising). The term was coined by Myron Tribus in his 1961 book Thermostatics and Thermodynamics.

The information entropy of a random event is the expected value of its self-information.
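Concretely, the entropy is the probability-weighted average of the surprisals of the outcomes. A minimal sketch (the function name `entropy` is assumed, not from the source):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the expected value of the self-information."""
    return sum(p * -math.log2(p) for p in probs if p > 0)

# Fair coin: each outcome has surprisal 1 bit, so the expectation is 1 bit.
print(entropy([0.5, 0.5]))
# A biased coin is less uncertain, hence has lower entropy:
print(entropy([0.9, 0.1]))
```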

Self-information is an example of a proper
scoring rule.
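Propriety means that a forecaster minimizes the expected log score by reporting the true probabilities honestly. A small numerical illustration (the two-outcome distribution here is our own assumption, chosen for the example):

```python
import math

p = [0.7, 0.3]  # assumed true distribution over two outcomes

def expected_score(q):
    """Expected log score E_p[-log2 q(x)] when reporting distribution q."""
    return sum(pi * -math.log2(qi) for pi, qi in zip(p, q))

honest = expected_score([0.7, 0.3])     # report the true distribution
dishonest = expected_score([0.5, 0.5])  # report a different distribution

# The honest report achieves the smaller expected score:
assert honest < dishonest
```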

## Examples

- On tossing a coin, the chance of 'tail' is 0.5. When it is proclaimed that indeed 'tail' occurred, this amounts to

- I('tail') = log2 (1/0.5) = log2 2 = 1 bit of information.

- When throwing a fair die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown, the amount of self-information is

- I('four') = log2 (1/(1/6)) = log2 (6) = 2.585 bits.

- When, independently, two dice are thrown, the amount of information associated with the joint outcome equals

- I('throw 1 is two & throw 2 is four') = log2 (1/P(throw 1 = 'two' & throw 2 = 'four')) = log2 (1/(1/36)) = log2 (36) = 5.170 bits. This equals the sum of the individual amounts of self-information associated with 'throw 1 is two' and 'throw 2 is four'; namely 2.585 + 2.585 = 5.170 bits.

- Suppose that the average probability of finding a survivor in a large evolving population is P. Then, when a survivor has been found, the amount of self-information is -ln(P) nats (equivalently, -log2(P) bits).
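The examples above can be checked numerically; this sketch (the helper name `I` is our own) confirms the additivity property for the two independent dice:

```python
import math

def I(p):
    """Surprisal in bits of an event with probability p (assumed helper)."""
    return -math.log2(p)

# Independent events: the surprisal of the joint outcome is the sum.
joint = I(1/36)          # both dice together: P = 1/36
parts = I(1/6) + I(1/6)  # each die separately: P = 1/6
assert abs(joint - parts) < 1e-9
print(round(joint, 3))   # about 5.17 bits
```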

## References

- C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, Vol. 27, pp. 379–423 (Part I), 1948.
