EnronMailUSC1 {netdata}                                        R Documentation

Network of Enron E-mail Communication Based on USC Enron Dataset (version 1)

Description

This is the network of e-mail communication among selected employees of Enron. The nodes are the 151 Enron employees included in the University of Southern California dataset. A directed edge exists from a sender to a receiver if the sender has sent at least one e-mail message to the receiver. All message counts are pooled over the years represented in the dataset.

Usage

data(EnronMailUSC1)

Format

The dataset is a network object. It has the following attributes:

Attribute    Type    Class      Description
messages     edge    numeric    total number of messages sent from the sender to the receiver employee
size         edge    numeric    total number of bytes sent from the sender to the receiver employee
employeeID   nodal   numeric    arbitrary employee ID used by the source dataset
firstName    nodal   character  first name of the employee
lastName     nodal   character  last name of the employee
email        nodal   character  e-mail address of the employee used by the source dataset
fullName     nodal   character  full name of the employee
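
As an illustration of how edge and nodal attributes like these are read from a network object, here is a minimal sketch using the network package. It builds a small toy object rather than loading the real data; the three employees and their message counts are hypothetical, and only the attribute names messages, size, and fullName are taken from the table above.

```r
library(network)

# Toy stand-in for the real object: three hypothetical employees.
net <- network.initialize(3, directed = TRUE)
net %v% "fullName" <- c("Ann Example", "Bob Example", "Cy Example")

# Three directed edges: 1 -> 2, 1 -> 3, 2 -> 3.
net <- add.edges(net, tail = c(1, 1, 2), head = c(2, 3, 3))

# Edge attributes, assigned in the order the edges were added.
net %e% "messages" <- c(2, 1, 1)
net %e% "size"     <- c(2000, 300, 500)

# Read the attributes back.
full_names <- net %v% "fullName"   # nodal attribute, one value per vertex
msg_counts <- net %e% "messages"   # edge attribute, one value per edge

# Sender-by-receiver matrix of message counts (0 where no edge exists).
msg_matrix <- as.sociomatrix(net, attrname = "messages")
```

The same %v% and %e% operators should apply to the object loaded by data(EnronMailUSC1), assuming it carries the attributes listed in the table.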

Details

The following steps were taken to produce this dataset:

  1. The source dataset was loaded into a MySQL database.
  2. An SQL cross join was used to produce lists of edges, message counts, and message sizes sent from one e-mail address (listed in the employee list) to another. All messages were counted, including those where the recipient was CC-ed. Vertex attributes were also exported.
  3. The edge and vertex attributes were imported into R, and a network object was constructed.
  4. Some minor corrections were made in spellings and order of the names.
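
The pooling in step 2 can be sketched in base R. This is not the SQL that was actually used, which is not reproduced here; the msgs table, its column names, and the addresses are hypothetical. The sketch only shows how per-message rows collapse into one row per ordered sender-receiver pair with a message count and a byte total, and how the directed edge indicator follows from that.

```r
# Hypothetical message log: one row per (message, recipient) pair,
# with CC-ed recipients included like any other recipient.
msgs <- data.frame(
  sender   = c("a@example.com", "a@example.com", "b@example.com", "a@example.com"),
  receiver = c("b@example.com", "b@example.com", "c@example.com", "c@example.com"),
  bytes    = c(1200, 800, 500, 300),
  stringsAsFactors = FALSE
)

# Pool over the whole period: count and total size per ordered pair.
count <- aggregate(bytes ~ sender + receiver, data = msgs, FUN = length)
size  <- aggregate(bytes ~ sender + receiver, data = msgs, FUN = sum)
edges <- merge(count, size, by = c("sender", "receiver"))
names(edges) <- c("sender", "receiver", "messages", "size")

# A directed edge exists wherever at least one message was sent.
people <- sort(unique(c(edges$sender, edges$receiver)))
adj <- matrix(0L, nrow = length(people), ncol = length(people),
              dimnames = list(people, people))
adj[cbind(edges$sender, edges$receiver)] <- 1L
```

A table shaped like edges, together with the exported vertex attributes, is the kind of input from which a network object could be constructed in step 3.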

Some employees had more than one e-mail address during their tenures at Enron. While those working on the source dataset (and those working on its source) made some attempt to coalesce these addresses, there are reasons to believe that the effort was only partly successful: some e-mail messages have not been attributed to their sender, and some e-mail addresses belonging to different people have been erroneously coalesced. For this reason, the dataset is versioned.

Use data(package="netdata") to get a full list of networks.

Licenses and Citation

If the source of the data set does not specify otherwise, this data set is protected by the Creative Commons License http://creativecommons.org/licenses/by-nc-nd/2.5/.

When publishing results obtained using this data set, the original authors should be cited. In addition, this package should be cited as:

Mark S. Handcock, David Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris. 2003. statnet: An R package for the Statistical Modeling of Social Networks.
http://www.csde.washington.edu/statnet.

Source

http://www.isi.edu/~adibi/Enron/Enron.htm

References

Shetty, Jitesh and Adibi, Jafar. "The Enron Email Dataset Database Schema and Brief Statistical Report." Online. Available http://www.isi.edu/~adibi/Enron/Enron_Dataset_Report.pdf. 26 April 2006.

See Also

network, sna

Examples

data(EnronMailUSC1)

[Package netdata version 0.5-1 Index]