Entropy and Probability Analysis of
E-Mail: jaco@donotenter.com
WWW: www.donotenter.com
Introduction
My task for this project was to select an occurrence that I could model in an analyzable form. The process I analyzed had to yield a large number of recorded data points (more than 80). I selected my long-distance phone calls over the course of a year. In particular, I wanted to find patterns in the calls. For example, I wanted to know whether there was a high probability that I would make certain calls after calling particular individuals. At first I thought I knew my own calling patterns, but when I looked at all the collected data in one table I could not see the patterns I expected. I therefore created several joint matrices that, when processed using probability and entropy analysis, would reveal patterns.
Data Collection Process
Interestingly, data takes on a new form when it is tabulated together in one table: subtle patterns that were not noticeable before become detectable. The absence of certain patterns becomes noticeable too, which in my analysis prompted me to re-tabulate the data. Because of this, I had to sift through my phone call data three times before I arrived at a collection of bits that could be transferred into analyzable matrices. My first task was to put my phone bills in order and write down the calls I made, sorted into four categories: calls to Laura, calls to Anne, calls to Texas, and other calls. This first table went back more than 14 months. Because I wanted to simplify the analysis and express the table in binary format, I made a second table that listed the data in binary form only for days on which a call occurred. This second table was also restricted to calls going back one year. But it only gave information when a call was made. I needed more information: specifically, I wanted to know when, during the year, I made a call and when I did not. As my professor pointed out, this added a dimension to the data. So I returned to the original phone bills and created a third table that also accounted for the days on which no calls were made.
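The third table described above can be pictured as one binary row per day. The following is a minimal sketch of that layout and of how daily probabilities could be read off it. The column order, the name `pattern_probabilities`, and the sample rows are assumptions made for illustration; they are not the recorded data.

```python
from collections import Counter

# Hypothetical toy records: one tuple per day, in the assumed order
# (Laura, Anne, Texas, other); 1 means a call was made that day.
# These rows are made up for illustration, not the author's data.
days = [
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 0, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (0, 0, 0, 1),
]

def pattern_probabilities(records):
    """Return the relative frequency of each daily call pattern."""
    counts = Counter(records)
    total = len(records)
    return {pattern: n / total for pattern, n in counts.items()}

probs = pattern_probabilities(days)
for pattern, p in sorted(probs.items()):
    print(pattern, f"{p:.3f}")
```

Keeping the all-zero rows is what lets the probability of not making a call be estimated at all, which is the extra dimension the third table adds.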
Analysis
A tabulation of the data gives:
Group analysis: the number of consecutive days multiplied by the number of calls. This produces a value that carries information about both the number of calls and their consecutive nature.
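One possible reading of the group analysis is sketched below: for each run of consecutive days with at least one call, multiply the length of the run in days by the total number of calls in that run. The exact definition used in the project is not given, so this interpretation, the name `group_values`, and the sample counts are all assumptions.

```python
from itertools import groupby

def group_values(calls_per_day):
    """For each run of consecutive days with at least one call,
    return (run length in days) * (total calls in the run)."""
    values = []
    for has_call, run in groupby(calls_per_day, key=lambda n: n > 0):
        if has_call:
            counts = list(run)
            values.append(len(counts) * sum(counts))
    return values

# Hypothetical number of calls on each of ten consecutive days.
daily_counts = [0, 1, 0, 2, 3, 1, 0, 0, 1, 1]
print(group_values(daily_counts))
```

Under this reading an isolated single call scores 1, while a few busy days in a row score much higher, so the value grows quickly once calls cluster.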
Conclusion
Although I was not able to analyze the data as much as I would have liked, certain conclusions can be drawn from the tabulated data. Within the last year, on any given day there was a 57.85% chance that I would not make a call, an 8.26% chance that I would call Anne, a 13.49% chance that I would call Laura, a 4.40% chance that I would call home, and a 4.68% chance that I would call someone else. More interestingly, the chance that I would call both Laura and Anne was 3.85%, and the chance that I would make calls in all four categories was only 0.275%. I next created a function that multiplied the number of consecutive days of calls by the number of calls, in order to analyze the grouping of calls and the associated probability. From this tabulation I notice that I either make only separate or limited consecutive calls (35% for making only one call a day) or I make many calls consecutively (the function values jump from around 12 to 35 very quickly).
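As a small illustration of the entropy side of the analysis, the sketch below computes the Shannon entropy of the simplest event in the data: whether or not any call is made on a given day. The 57.85% figure comes from the text; treating "at least one call" as its complement, and the function name `entropy_bits`, are assumptions of this sketch.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

p_no_call = 0.5785          # reported in the text
p_any_call = 1 - p_no_call  # assumption: the complement of "no call"

h_day = entropy_bits([p_no_call, p_any_call])
print(f"Entropy of call / no call: {h_day:.3f} bits")
```

The result is a little under 1 bit, the maximum for a two-outcome event, which fits the observation that a call day is only somewhat less likely than a day with no call.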
References
1) Personal phone records of the author, dating back more than one year, between March 1992 and March 1993.
2) Class notes from CSC 541 Systems Theory, Instructor: Roger Cavallo, Spring Term, 1993.