Entropy and Probability Analysis of Phone Call Data

James Canavan
E-Mail: jaco@donotenter.com
WWW: www.donotenter.com
© May, 1993


Introduction

My task for this project was to select an occurrence that I could model in an analyzable form. The process I analyzed had to produce a large number (more than 80) of recorded data points. I selected the occurrence of my long distance phone calls over the course of a year. In particular, I wanted to find patterns in my calls; for example, I wanted to know whether there were high probabilities that I would make certain calls after calling particular individuals. At first, I thought I knew my own calling patterns, but when I looked at all the data collected in one table, I could not see the expected patterns. Therefore I created several joint matrices that, when processed using probability and entropy analysis, would reveal patterns.


Data Collection Process

Interestingly, when data is tabulated together in one table, it takes on a new form: subtle patterns that were not noticeable before become detectable, and so does the absence of certain expected patterns. In my analysis, this absence prompted me to re-tabulate the data. As a result, I sifted through my phone call data three times before arriving at a collection of bits that could be transferred into analyzable matrices.

My first task was to put my phone bills in order and write down the calls I made, sorted into four categories: calls to Laura, calls to Anne, calls to Texas, and other calls. This first table went back more than 14 months. Since I wanted to simplify my analysis and express the table in binary form, I made a second table that recorded, in binary, only the days on which a call occurred; this second table was also limited to the past year. But it only gave information when a call was made. I needed more: specifically, I wanted to know on which days during the year I made a call and on which I did not. As my professor pointed out, this added a dimension to the data. So I returned to the original phone bills and created a third table that also accounted for the days on which no calls were made.
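The third table can be thought of as one binary row per calendar day. The Python sketch below is a hypothetical illustration, not the procedure I actually followed by hand: it assumes the calls are available as (date, category) pairs, uses the analysis table's column labels (Other, Home, Laura, Anne) as category names, and sets a bit to 1 if at least one call in that category was made that day. The key step is that every day in the range gets a row, so days with no calls appear as 0000 instead of being left out.

```python
from collections import Counter
from datetime import date, timedelta

# Column order assumed to match the analysis table: Other, Home, Laura, Anne.
CATEGORIES = ("Other", "Home", "Laura", "Anne")


def day_bits(calls, start, end):
    """Map every day in [start, end] to a 4-bit pattern string.

    calls: iterable of (date, category) pairs, one per call (hypothetical format).
    A bit is "1" if at least one call in that category was made that day.
    Days with no calls still get a row, "0000".
    """
    called = {}
    for call_day, category in calls:
        called.setdefault(call_day, set()).add(category)

    table = {}
    day = start
    while day <= end:
        cats = called.get(day, set())
        table[day] = "".join("1" if c in cats else "0" for c in CATEGORIES)
        day += timedelta(days=1)
    return table


# Hypothetical example: three calls over one week (not the author's data).
calls = [
    (date(1992, 3, 2), "Laura"),
    (date(1992, 3, 2), "Anne"),
    (date(1992, 3, 5), "Home"),
]
table = day_bits(calls, date(1992, 3, 1), date(1992, 3, 7))
print(Counter(table.values()))
```

Counting the patterns with Counter gives exactly the kind of Totals column used in the analysis below.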


Analysis

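Tabulating the third table gives the following. Each row is one combination of the four categories (1 = at least one call in that category on a given day, 0 = none); Totals is the number of days with that combination, out of 363 days in all, and Probability is Totals divided by 363.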

Other  Home  Laura  Anne   Totals   Probability   Entropy
  0     0     0      0      210      0.5785123     1.389
  0     0     0      1       30      0.0826446     0.4122
  0     0     1      0       49      0.1349862     0.5394
  0     0     1      1       14      0.0385674     0.1894
  0     1     0      0       16      0.0440771     0.2467
  0     1     0      1        0      0             0
  0     1     1      0       13      0.0358126     0.1675
  0     1     1      1        5      0.0137741     0.0890
  1     0     0      0       17      0.0468319     0.0245
  1     0     0      1        0      0             0
  1     0     1      0        3      0.0082644     0.0090
  1     0     1      1        1      0.0027548     0.0016
  1     1     0      0        2      0.0055096     0.0030
  1     1     0      1        1      0.0027548     0.0016
  1     1     1      0        1      0.0027548     0.0016
  1     1     1      1        1      0.0027548     0.0016
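The Probability column can be reproduced by dividing each count by the 363 days tabulated. The minimal Python sketch below does that and also computes the standard Shannon terms -p log2 p, in bits. This is an assumption on my part: the text does not state how the Entropy column was computed, and the Shannon terms printed here do not reproduce that column, so treat the sketch as an illustration of the technique rather than a reconstruction of the values in the table.

```python
import math

# Day counts for each "Other Home Laura Anne" bit pattern,
# copied from the Totals column of the table above.
COUNTS = {
    "0000": 210, "0001": 30, "0010": 49, "0011": 14,
    "0100": 16,  "0101": 0,  "0110": 13, "0111": 5,
    "1000": 17,  "1001": 0,  "1010": 3,  "1011": 1,
    "1100": 2,   "1101": 1,  "1110": 1,  "1111": 1,
}


def probabilities(counts):
    """Relative frequency of each pattern: count / total number of days."""
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def shannon_terms(probs):
    """Per-pattern Shannon term -p * log2(p) in bits (0 by convention when p == 0)."""
    return {k: (-p * math.log2(p) if p > 0 else 0.0) for k, p in probs.items()}


if __name__ == "__main__":
    probs = probabilities(COUNTS)
    terms = shannon_terms(probs)
    for pattern in sorted(COUNTS):
        print(pattern, COUNTS[pattern], round(probs[pattern], 7), round(terms[pattern], 4))
    print("total days:", sum(COUNTS.values()))
    # The joint entropy H is the sum of the per-pattern terms (at most 4 bits).
    print("H (bits):", round(sum(terms.values()), 4))
```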


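Group analysis: for each group of consecutive days on which calls were made, the number of consecutive days is multiplied by the number of calls. This produces a single value that reflects both how many calls were made and how consecutive they were. Totals is the number of groups with each function value, out of 74 groups in all, and Probability is Totals divided by 74.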

Function Value (Days * # of calls)   Totals   Probability   Entropy
 1                                   26       0.35135       0.9348
 2                                   10       0.13513       0.5394
 3                                    1       0.01351       0.0860
 4                                   13       0.17567       0.6443
 6                                    2       0.02702       0.1456
 8                                    1       0.01351       0.0860
 9                                    5       0.06756       0.3678
10                                    1       0.01351       0.0860
12                                    2       0.02702       0.1456
15                                    4       0.05405       0.2723
16                                    1       0.01351       0.0860
18                                    2       0.02702       0.1456
24                                    1       0.01351       0.0860
35                                    2       0.02702       0.1456
36                                    1       0.01351       0.0860
42                                    1       0.01351       0.0860
66                                    1       0.01351       0.0860
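The grouping function can be sketched in Python as follows. This is an assumption-laden illustration: the text does not say exactly how a group was delimited, so the sketch treats a group as a run of consecutive days with at least one call, and multiplies the run's length in days by the total number of calls made during it. The input is a hypothetical list of daily call counts. Each value's probability is its count divided by the number of groups, which is how the Probability column above relates to Totals (for example, 26 / 74 = 0.35135).

```python
from collections import Counter


def group_values(daily_calls):
    """Days * calls for each run of consecutive days with at least one call.

    daily_calls: call counts, one entry per day, in date order (hypothetical input).
    Assumption: "number of calls" means the total calls made during the run.
    """
    values = []
    run_days = run_calls = 0
    for n in daily_calls:
        if n > 0:
            run_days += 1
            run_calls += n
        elif run_days:
            # A day without calls closes the current run.
            values.append(run_days * run_calls)
            run_days = run_calls = 0
    if run_days:  # close a run that lasts until the final day
        values.append(run_days * run_calls)
    return values


def tabulate(values):
    """Function value -> (count, probability), sorted by function value."""
    counts = Counter(values)
    total = len(values)
    return {v: (c, c / total) for v, c in sorted(counts.items())}


# Hypothetical ten days of call counts (not the author's data).
daily = [1, 0, 0, 1, 1, 0, 2, 0, 1, 1]
print(tabulate(group_values(daily)))
```

On this hypothetical input the runs give the function values 1, 4, 2 and 4.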


Conclusion

Although I was not able to analyze the data as fully as I would have liked, certain conclusions can be drawn from the tabulated data. Within the last year, on any given day there was a 57.85 % chance I would make no call, an 8.26 % chance I would call only Anne, a 13.49 % chance I would call only Laura, a 4.40 % chance I would call only home, and a 4.68 % chance I would call only someone else. More interestingly, the chance I would call Laura and Anne (and no one else) on the same day is 3.85 %, and the chance I would call in all four categories is only 0.275 %.
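The percentages above are single-row (exclusive) probabilities: 8.26 % is the share of days on which Anne was the only call. The short Python sketch below, using the first table's counts, shows how each figure is read off a single row, and how summing rows instead gives the probability of a combination regardless of the other categories. For Laura and Anne together, that sum covers 21 of 363 days rather than 14. The helper names are hypothetical.

```python
# Day counts per "Other Home Laura Anne" bit pattern, from the first table.
COLUMNS = ("Other", "Home", "Laura", "Anne")
COUNTS = {
    "0000": 210, "0001": 30, "0010": 49, "0011": 14,
    "0100": 16,  "0101": 0,  "0110": 13, "0111": 5,
    "1000": 17,  "1001": 0,  "1010": 3,  "1011": 1,
    "1100": 2,   "1101": 1,  "1110": 1,  "1111": 1,
}
TOTAL = sum(COUNTS.values())  # 363 days


def exclusive(pattern):
    """Probability of exactly this pattern, i.e. a single table row."""
    return COUNTS[pattern] / TOTAL


def at_least(*names):
    """Probability that every named category was called, whatever the others did."""
    idx = [COLUMNS.index(name) for name in names]
    hits = sum(n for pattern, n in COUNTS.items()
               if all(pattern[i] == "1" for i in idx))
    return hits / TOTAL


print(f"only Anne:                  {exclusive('0001'):.2%}")
print(f"only Laura and Anne:        {exclusive('0011'):.2%}")
print(f"Laura and Anne, any others: {at_least('Laura', 'Anne'):.2%}")
```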

I next created a function that multiplied the number of consecutive days of calling by the number of calls, in order to analyze the grouping of calls and the associated probabilities. From this tabulation I notice that I either make a single call or only a few calls in a row (in 35 % of the groups I made just one call on one day), or I make many calls consecutively (the function values climb quickly from around 12 to 35).

[Figure: graph of the total data]

References

1) Personal phone records of the author, March 1992 to March 1993.
2) Class notes from CSC 541 Systems Theory, Instructor: Roger Cavallo, Spring Term, 1993.