Network of Enron E-mail Communication Based on USC Enron Dataset (V1)

Description

This is the network of e-mail communication among selected employees of Enron. The nodes are the 151 Enron employees included in the University of Southern California (USC) dataset. A directed edge exists from one employee to another if the sender has sent at least one e-mail message to the receiver. All message counts are pooled over the years represented in the dataset.
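
As a minimal sketch of the edge rule described above, the following Python snippet pools hypothetical (sender, receiver, year) message records over all years and keeps one directed edge per ordered pair of listed employees. The record layout, the function name `directed_edges`, and the toy names are assumptions made for illustration; none of them come from the dataset itself.

```python
def directed_edges(records, employees):
    """Return the set of directed (sender, receiver) edges.

    records   -- iterable of (sender, receiver, year) tuples (hypothetical layout)
    employees -- collection of node names; messages to or from anyone else are ignored
    """
    nodes = set(employees)
    edges = set()
    for sender, receiver, _year in records:
        # Pool over years: the year is ignored, and one message is enough for an edge.
        if sender in nodes and receiver in nodes:
            edges.add((sender, receiver))
    return edges


# Toy records, not taken from the Enron data.
records = [
    ("alice", "bob", 2000),
    ("alice", "bob", 2001),       # repeated pair, still a single edge
    ("bob", "carol", 2001),
    ("carol", "outsider", 2001),  # receiver is not a listed employee
]
edges = directed_edges(records, ["alice", "bob", "carol"])
print(sorted(edges))
```

Because the edges are directed, the presence of ("alice", "bob") says nothing about ("bob", "alice"), which is absent in this toy example.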

Attributes

The dataset is a network object. It has the following attributes:

Collection

This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains about 0.5 million messages in total. The data were originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation.

Versions

The following steps were taken to produce this dataset:

  1. The source dataset was loaded into a MySQL database.

  2. An SQL cross join was used to produce lists of edges, with message counts and message sizes, for messages sent from one e-mail address (listed in the employee list) to another. All messages were counted, including those where the recipient was CC-ed. Vertex attributes were also exported.

  3. The edge and vertex attributes were imported into R, and a network object was constructed.

  4. Some minor corrections were made to the spelling and order of the names.
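
The edge-list step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the query actually used: it runs against an in-memory SQLite database rather than MySQL, and the table names (`employee`, `message`, `recipient`), column names, and all rows are hypothetical.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database holding a tiny, made-up example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee  (email TEXT PRIMARY KEY);
        CREATE TABLE message   (mid INTEGER PRIMARY KEY, sender TEXT, size INTEGER);
        CREATE TABLE recipient (mid INTEGER, address TEXT, rtype TEXT);

        INSERT INTO employee VALUES ('a@enron.com'), ('b@enron.com'), ('c@enron.com');
        INSERT INTO message VALUES (1, 'a@enron.com', 100),
                                   (2, 'a@enron.com', 50),
                                   (3, 'b@enron.com', 10);
        INSERT INTO recipient VALUES (1, 'b@enron.com', 'TO'),
                                     (1, 'c@enron.com', 'CC'),
                                     (2, 'b@enron.com', 'TO'),
                                     (3, 'a@enron.com', 'CC'),
                                     (3, 'x@example.com', 'TO');
    """)
    return conn


def build_edge_list(conn):
    """Return (from_addr, to_addr, n_messages, total_size) rows.

    The employee list is cross joined with itself to enumerate ordered pairs
    of addresses; each pair is then matched to the messages sent from the
    first address to the second, whether the recipient was TO or CC.
    """
    query = """
        SELECT s.email AS from_addr,
               r.email AS to_addr,
               COUNT(*) AS n_messages,
               SUM(m.size) AS total_size
        FROM employee AS s
        CROSS JOIN employee AS r
        JOIN message AS m ON m.sender = s.email
        JOIN recipient AS rc ON rc.mid = m.mid AND rc.address = r.email
        GROUP BY s.email, r.email
        ORDER BY s.email, r.email
    """
    return conn.execute(query).fetchall()


edge_rows = build_edge_list(demo_connection())
for row in edge_rows:
    print(row)
```

In this toy data the recipient `x@example.com` is not in the employee list, so the message sent to that address contributes no edge.
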

Some employees had more than one e-mail address during their tenure at Enron. Those working on the source dataset (and those working on its source) made some attempt to coalesce these addresses, but there is reason to believe that the effort was only partly successful: some e-mail messages have not been attributed to their sender, and some e-mail addresses belonging to different people have been erroneously coalesced. For this reason, the dataset is versioned.
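
To make the coalescing problem concrete, here is a minimal sketch assuming a hypothetical alias table that maps each known e-mail address to an employee identifier. The function name `attribute_messages`, the addresses, and the identifiers are invented for illustration, and the sketch simplifies by treating any message with an unknown address as unattributed.

```python
def attribute_messages(messages, alias_to_employee):
    """Map sender and receiver addresses to employees via an alias table.

    Returns (edges, unattributed): the pooled directed edges between
    employees, and the number of messages dropped because one of the
    addresses is missing from the alias table.
    """
    edges = set()
    unattributed = 0
    for sender_addr, receiver_addr in messages:
        sender = alias_to_employee.get(sender_addr)
        receiver = alias_to_employee.get(receiver_addr)
        if sender is None or receiver is None:
            unattributed += 1
            continue
        edges.add((sender, receiver))
    return edges, unattributed


# Hypothetical alias table: two addresses coalesced into one employee.
aliases = {
    "j.doe@enron.com": "doe",
    "jane.doe@enron.com": "doe",
    "r.roe@enron.com": "roe",
}
messages = [
    ("j.doe@enron.com", "r.roe@enron.com"),
    ("jane.doe@enron.com", "r.roe@enron.com"),  # same person, second address
    ("jdoe2@enron.com", "r.roe@enron.com"),     # address missing from the table
]
edges, unattributed = attribute_messages(messages, aliases)
print(sorted(edges), unattributed)
```

The third message illustrates the first failure mode (a message not attributed to its sender). The second failure mode would correspond to an alias entry that points to the wrong employee, which silently merges two people into one node.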

Licenses and Citation

Enron Email USC1 Data (2006) https://www.network.data.ics.uci.edu
If the source of the data set does not specify otherwise, this data set is protected by the Creative Commons License
http://creativecommons.org/licenses/by-nc-nd/2.5/.

Source

http://www.isi.edu/~adibi/Enron/Enron.htm

References

The original source should be cited as:
Shetty, Jitesh and Adibi, Jafar. “The Enron Email Dataset Database Schema and Brief Statistical Report.” Online. Available http://www.isi.edu/~adibi/Enron/Enron_Dataset_Report.pdf. 26 April 2006.

When publishing results obtained using this data set, the original authors should be cited. In addition, this package should be cited as:
Christopher L. DuBois, Emma S. Spiro, Zack Almquist, Mark S. Handcock, David Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris. 2003 netdata: A Collection of Network Data
http://www.csde.washington.edu/statnet