Every Move That You Make: Internet Privacy at Risk

Privacy advocates have good reason to worry about a recent flurry of activity related to Internet data retention by ISPs. (In this context, data retention means keeping records about subscribers and their Internet activities beyond what is required for normal business purposes.)

In late April, Colorado Representative Diana DeGette, a Democrat, drafted legislation that would require ISPs to retain data about their subscribers until one year after their accounts were closed (see "Congress May Consider Mandatory ISP Snooping" and "Backer of ISP Snooping Slams Industry").

Then, in mid-May, it was reported that Wisconsin Representative F. James Sensenbrenner, the Republican chairman of the House Judiciary Committee, was drafting legislation to mandate Internet data retention (see "Congress May Make ISPs Snoop on You"). The Judiciary Committee’s Communications Director backpedaled a few days after this revelation, issuing a statement that said: "Staff sometimes starts working on issues—throwing around ideas, doing oversight—and (they) get ahead of where the members are and what they want to tackle" (see "ISP Snooping Plans Take Backseat").

In late May, the Attorney General was reported to be privately asking major ISPs to "retain subscriber information and network data for two years" (see "Gonzales Pressures ISPs on Data Retention").

What does data retention mean for reader privacy in an era when users are increasingly turning to Internet-based information resources instead of print resources? It depends on what data is retained, whether the user is authenticated (some libraries, for example, provide unauthenticated public Internet access), and under what circumstances the retained data can be revealed. Let’s assume for the moment that there is fairly detailed data retention (e.g., user A went to URL B), but not total data retention (e.g., user A went to URL B, where the content of B is also retained).

Determining what the user saw at a particular URL may depend on how static the content is. Formally published material is presumably static. Access barriers may temporarily prevent the disclosure of licensed and other protected content until such barriers can be overcome by legal means, but nothing stops the immediate disclosure of freely available, formally published static material. Dynamic information, formally published or not, may have changed since the user accessed it, but by how much? Information that is not formally published could have simply vanished, but the Internet Archive may permit reconstruction of what the user saw, and, for freely available material, it may also overcome the problem of changing content. In short, it may now be possible, for the length of any mandated retention period, to determine every e-article, e-book, or other e-resource that a reader has used, down to the level of specificity that a URL represents (e.g., page views within an HTML-based e-book).

Stepping back, you might ask: How is this different from the familiar library check-out record privacy problem? The difference is that libraries do not check out journal articles and a variety of other materials, such as reference books. Moreover, libraries are not required to retain circulation records, and readers always have the option of unrecorded in-library use. In the digital age, if it’s online, its use can be recorded.

Consequently, reader privacy may be going the way of the dinosaur. Stay tuned.