"Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization"

Paul Ohm, Associate Professor of Law at the University of Colorado Law School, has self-archived "Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization" at SSRN.

Here's an excerpt:

Computer scientists have recently undermined our faith in the privacy-protecting power of anonymization, the name for techniques for protecting the privacy of individuals in large databases by deleting information like names and social security numbers. These scientists have demonstrated they can often 'reidentify' or 'deanonymize' individuals hidden in anonymized data with astonishing ease. By understanding this research, we will realize we have made a mistake, labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention. We must respond to the surprising failure of anonymization, and this Article provides the tools to do so.
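
To make the reidentification idea in the excerpt concrete, here is a minimal sketch of a linkage attack. It is my own illustration, not something taken from Ohm's article: the records, the field names, and the `reidentify` helper are all hypothetical. It assumes an "anonymized" table with names removed and a separate public list that still carries names, and it joins the two on shared quasi-identifiers (ZIP code, birth date, and sex).

```python
# Hypothetical data: an "anonymized" release with names removed, and a
# public list (say, a voter roll) that still carries names.
anonymized = [
    {"zip": "80302", "birth": "1970-01-15", "sex": "F", "diagnosis": "asthma"},
    {"zip": "80302", "birth": "1982-06-02", "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Alice Example", "zip": "80302", "birth": "1970-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "80302", "birth": "1982-06-02", "sex": "M"},
    {"name": "Carol Example", "zip": "80303", "birth": "1970-01-15", "sex": "F"},
]

# Assumed quasi-identifiers: fields that are not names but can single people out.
QUASI_IDENTIFIERS = ("zip", "birth", "sex")


def reidentify(anonymized_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, anonymized_row) pairs where the quasi-identifiers
    match exactly one person in the public list."""
    index = {}
    for person in public_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in anonymized_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the row is reidentified
            matches.append((names[0], row))
    return matches


for name, row in reidentify(anonymized, public):
    print(name, "->", row["diagnosis"])
```

Deleting the name column does nothing here, because the remaining fields are enough to pick out one person in the public list.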

Read more about it at "What Information Is 'Personally Identifiable'?"

The Google Books Settlement and the Future of Information Access Conference

The University of California School of Information's Google Books Settlement and the Future of Information Access Conference was held on August 28, 2009. Below is a selection of articles and posts about the conference.

EFF Raises Concerns over Privacy Issues in Google Book Search

In "Warrants Required: EFF and Google's Big Disagreement about Google Book Search," Cindy Cohn discusses the Electronic Frontier Foundation's concerns over privacy issues in Google Book Search.

Here's an excerpt:

One of the most important of those protections is the assurance that your browsing and reading habits are safe from fishing expeditions by the government or lawyers in civil cases. In order to maintain freedom of inquiry and thought, the books we search for, browse, and read should simply be unavailable for use against us in a court of law except in the rarest of circumstances. We have other concerns about Google Book Search as well—concerns and data collection, retention, and reader anonymity—so this won't end the debate, but safeguards against disclosure are a central point of concern for us. . . .

Given this backdrop, we asked Google to promise that it would fight for those same standards to be applied to its Google Book Search product. . . .

Unfortunately, Google has refused. It is insisting on keeping broad discretion to decide when and where it will actually stand up for user privacy, and saying that we should just trust the company to do so. So, if Bob looks like a good guy, maybe they'll stand up for him. But if standing up for Alice could make Google look bad, complicate things for the company, or seem ill-advised for some other reason, then Google insists on having the leeway to simply hand over her reading list after a subpoena or some lesser legal process. As Google Book Search grows, the pressure on Google to compromise readers' privacy will likely grow too, whether from government entities that have to approve mergers or investigate antitrust complaints, or subpoenas from companies where Google has a business relationship, or for some other reason that emerges over time.

EFF Releases Letter to Google about Reader Privacy and Google Book Search

The Electronic Frontier Foundation has released a letter to Google about reader privacy and Google Book Search.

Here's an excerpt:

  1. Protection Against Disclosure: Readers should be able to use Google books without worrying that the government or a third party is reading over their shoulder. Google needs to promise that it will protect reader records by responding only to properly-issued warrants from law enforcement and court orders from third parties. It also must promise that it will let readers know if anyone has demanded access to information about them.
  2. Limited Tracking: Just as readers can anonymously browse books in a library or bookstore, they should also be able to search, browse, and preview Google books without being forced to register or provide any personal information to Google. And for any of its Google Book Search services, Google must not keep logging information longer than 30 days. Google should also not link any information it collects about reader use of Google Book Search to that reader’s usage of any other Google services without specific, affirmative consent.
  3. User Control: Readers should have complete control of their purchases and purchasing data. Readers should be able to delete their records and have extensive permissions controls for their "bookshelves" or any other reading displays to prevent others from seeing their reading activities. Readers should be able to “give” books to anyone, including to themselves, without tracking. Google also should not reveal any information about Google book use to credit card processors or any other third parties.
  4. User Transparency: Readers should know what information is being collected and maintained about them and when and why reader information has been disclosed. Google needs to develop a robust, enforceable privacy policy and publish the number and type of demands for reader information that are received on an annual basis.
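
The 30-day logging limit in item 2 of the letter can be sketched as a simple retention check. This is only an illustration under assumed names: the `purge_expired` function and the log entry fields are hypothetical and say nothing about how Google actually stores or deletes its logs.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # the limit proposed in the EFF letter


def purge_expired(entries, now, retention=RETENTION):
    """Keep only log entries whose timestamp falls within the retention window."""
    cutoff = now - retention
    return [entry for entry in entries if entry["when"] >= cutoff]


# Hypothetical log entries; the field names are illustrative only.
log = [
    {"when": datetime(2009, 7, 1), "query": "book A"},
    {"when": datetime(2009, 8, 20), "query": "book B"},
]
kept = purge_expired(log, now=datetime(2009, 8, 28))
print([entry["query"] for entry in kept])
```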

Read more about it at "Don't Let Google Close the Book on Reader Privacy!."

ISPs Allow Ad Agencies to Conduct Massive Deep-Packet Inspection of Customers' Internet Traffic

As many as 10% of all U.S. ISP customers may be subject to deep-packet inspection of their Internet traffic. Unnamed ISPs have contracts with ad agencies, such as Front Porch and NebuAd, that permit the agencies to monitor customers' activities. Front Porch can track 100,000 U.S. users and NebuAd can track up to 10% of all U.S. users.

How can ISPs do this? By accepting the terms of their ISP's service contract, customers agreed to let the ISP monitor their Internet activity.

Read more about it at "Can an Eavesdropper Protect Your Privacy?," "Every Click You Make: Internet Providers Quietly Test Expanded Tracking of Web Use to Target Advertising," "I.S.P. Tracking: The Mother of All Privacy Battles," and "Web Service Contracts Say ISPs Can Block Sites, Snoop."

Italian Agency Says Tracking File Sharing Activity without Permission Violates Privacy Rights

The Italian agency in charge of protecting personal data has ruled that Logistep violated the privacy rights of Italian file sharers by tracking their activity and ordered that these tracking records be destroyed. Previously, the Swiss data protection commissioner made a similar ruling against Logistep.

Read more about it at "Anti-Piracy Company Breaches Privacy, Ordered to Shut Down"; "Anti-Piracy Company Illegally Spied on P2P Users"; and "Italian File-Sharers Let Off The Hook."

MPAA Takes Down the MPA University Toolkit Because of GNU GPL Legal Issues

Slashdot reports that the Motion Picture Association of America has removed the MPA University Toolkit software from the software's website after Matthew Garrett contacted the MPAA's ISP indicating that the software violated the GNU GPL. Garrett had attempted to contact the MPAA directly, but it was unresponsive. Currently, only Toolkit documentation remains on the website.

MPAA Toolkit May Allow Internet Users to See Internal University Network Traffic

The Washington Post reports that the Motion Picture Association of America is trying to persuade universities to use its new MPA University Toolkit, which uses Snort and ntop to provide detailed internal network use statistics that may identify possible copyright infringers.

Security experts have determined that, in its default configuration, the MPA University Toolkit sets up a Web server that provides use statistics to any Internet user unless it is blocked from doing so by a firewall. There is a user/password option, but network administrators are not prompted to set it. Moreover, the software "phones home" to the MPAA upon setup, providing the organization with the IP address of the server.

Read more about it at "MPAA University 'Toolkit' Raises Privacy Concerns."

DigitalPreservationEurope Publishes Report on Copyright and Privacy Issues for Cooperating Repositories

DigitalPreservationEurope has published PO3.4: Report on the Legal Framework on Repository Infrastructure Impacting on Cooperation Across Member States.

Here's an excerpt from the "Introduction":

The focus of this paper is the legal framework for the management of content of cooperating repositories. The focus will be on the regulation of copyright and protection of personal data. That copyright is important when managing data repositories is common knowledge. However, there is an increasing tendency among authors not only to deposit their published scientific work, scientific articles, dissertations or books, but also the underlying data. In addition to this ordinary publicly available sources like internet web pages contain personal data, often of a sensitive nature. Due to this emergent trend repositories will have to comply with the rules governing the use and protection of personal data, especially in the medical and social sciences.

The scenario is the following:

  • National repositories acquire material from different sources and in different formats.
  • The repositories cooperate with repositories in other countries in the preservation of data.
  • There is some degree of specialisation, some repositories specialise on preserving certain formats and other repositories on the preservation of other formats.

This paper describes the legal framework regulating the two decisive actions which have to take place if this scenario is to become a reality:

  1. The reproduction of data
  2. The transfer of data to other repositories

Other copyright issues like the rules concerning communication with the public and the protection of databases will also be touched upon.

Creative Commons Sued

The Creative Commons, along with Virgin Mobile, has been sued by Susan Chang and Justin Ho-Wee Wong over the "unauthorized and exploitive use of Alison Chang's image in an advertising campaign launched in June 2007 to promote free text messaging and other mobile services."

Here's an excerpt from Lawrence Lessig's posting:

Slashdot has an entry about a lawsuit filed this week by parents of a Texas minor whose photograph was used by Virgin Australia in an advertising campaign. The photograph was taken by an adult. He posted it to Flickr under a CC-Attribution license. The parents of the minor are complaining that Virgin violated their daughter's right to privacy (by using a photograph of her for commercial purposes without her or her parents permission). The photographer is also a plaintiff. He is complaining that Creative Commons failed "to adequately educate and warn him . . . of the meaning of commercial use and the ramifications and effects of entering into a license allowing such use." (Count V of the complaint).

The comments on the Slashdot thread are very balanced and largely accurate. (The story itself is a bit misleading, as the photographer also complains that Virgin did not give him attribution, thereby violating the CC license). As comment after comment rightly notes, CC licenses have not (yet) tried to deal with the complexity of any right of privacy. The failure of Virgin to get a release before commercially exploiting the photograph thus triggers the question of whether the minor's right to privacy has been violated.

Source: Lessig, Lawrence. "On the Texas Suit against Virgin and Creative Commons." Lessig 2.0, 22 September 2007.

Digital Rights Management and Consumer Privacy: An Assessment of DRM Applications under Canadian Privacy Law

The Canadian Internet Policy and Public Interest Clinic of the University of Ottawa Faculty of Law has released Digital Rights Management and Consumer Privacy: An Assessment of DRM Applications under Canadian Privacy Law.

Here's an excerpt from the report's "Executive Summary":

This report confirms that DRM is currently being used in the Canadian marketplace in ways that violate Canadian privacy laws. DRM is being used to collect, use and disclose consumers’ personal information, often for secondary purposes, without adequate notice to the consumer, and without giving the consumer an opportunity to opt-out of unnecessary collection, use or disclosure of their personal information, as required under Canadian privacy law.

Top Five Technology Trends

As usual, the LITA top 10 technology trends session at ALA produced some thought-provoking results. And, as usual, I have a somewhat different take on this question.

I’ll whittle my list down to five.

  • Digital Copyright Wars: Big media and publishers are far from finished changing copyright laws to broaden, strengthen, and lengthen the rights of copyright holders. And they are not yet done protecting their digital turf with punitive lawsuits either. One big copyright impact on libraries is digitization: you can only safely digitize what’s in the public domain or what you have permission for (and the permission process can be difficult or impossible). There’s always fair use of course, if you have the deep pockets and institutional backing needed to defend yourself (like Google does) or if your efforts are tolerated (like e-reserves has been so far, except for a few sub rosa publisher objections). In opposition to this trend is a movement by the Creative Commons and others to persuade authors, musicians, and other copyright holders to license their works in ways that permit liberal use and reuse of them.
  • DRM: The Sony BMG rootkit fiasco was a blow, but think again if you believe that this will stop DRM from controlling your digital content in the future. The trick is to get DRM embedded in your operating system, and to have every piece of computer hardware and every consumer digital device that can access and/or manipulate content to support it (or to refuse access to material protected by unsupported DRM schemes). That’s a tall order, but incremental progress is likely to continue to be made towards this goal. Big media will continue to try to pass laws that mandate certain types of DRM and, like the DMCA, protect its use.
  • Internet Privacy: If you believe this still exists on the Internet, you are either using anonymous surfing services or you haven’t been paying attention. Net monitoring will become far more effective if ISPs can be persuaded or required to retain user-specific Internet activity logs. Would you be upset if every licensed e-document that your library users read could be traced back to them? Unless you still offer unauthenticated Internet access in your library, that may depend upon your retention of login records and whether you are legally compelled to reveal them.
  • Net Neutrality: If ISPs can create Internet speed lanes, you don’t want your library or digital content provider to be in the slow one. Hope you (or they) can pay for the fast one. But Net neutrality issues don’t end there: there are issues of content/service blockage and differential service based on fees as well.
  • Open Access: If there is a glimmer of hope on the horizon for the scholarly communication crisis, it’s open access. Efforts to produce alternative low-cost journals are important and deserve full support, but the open access movement’s impact is far greater, and it offers global access to scholars whose institutions may not be able to pay even modest subscription fees and to unaffiliated individuals.

Every Move That You Make: Internet Privacy at Risk

Privacy advocates have good reason to worry about a recent flurry of activity related to Internet data retention by ISPs. (In this context, data retention means keeping records about subscribers and their Internet activities beyond what is required for normal business purposes.)

In late April, Colorado Representative Diana DeGette, a Democrat, drafted legislation that would require ISPs to retain data about their subscribers until one year after their accounts were closed (see "Congress May Consider Mandatory ISP Snooping" and "Backer of ISP Snooping Slams Industry").

Then, in mid-May, it was reported that Wisconsin Representative F. James Sensenbrenner, the Republican chairman of the House Judiciary Committee, was drafting legislation to mandate Internet data retention (see "Congress May Make ISPs Snoop on You"). The Judiciary Committee’s Communications Director backpedaled a few days after this revelation, issuing a statement that said: "Staff sometimes starts working on issues—throwing around ideas, doing oversight—and (they) get ahead of where the members are and what they want to tackle" (see "ISP Snooping Plans Take Backseat").

In late May, the Attorney General was reported to be privately asking major ISPs to "retain subscriber information and network data for two years" (see "Gonzales Pressures ISPs on Data Retention").

What does data retention mean for reader privacy in an era where users are increasingly turning to Internet-based information resources instead of print resources? It depends on what data is retained, whether the user is authenticated (e.g., some libraries provide unauthenticated public Internet access), and under what circumstances it can be revealed. Let’s assume for the moment that there is fairly detailed data retention (e.g., user A went to URL B), but not total data retention (e.g., user A went to URL B, where the content of B is also retained).

Determining what the user saw at a particular URL may be dependent on how static the content is. Formally published material is presumably static. Access barriers may temporarily prevent the disclosure of licensed and other protected content until such barriers can be overcome by legal means, but nothing stops the immediate disclosure of freely available, formally published static material. Dynamic information, formally published or not, may have changed since the user accessed it, but how much? Information that is not formally published could have simply vanished, but the Internet Archive may permit reconstruction of what the user saw, and, for freely available material, it may also overcome the problem of changing content. In short, it may now be possible, for mandated retention periods, to determine every e-article, e-book, or other e-resource that a reader has used down to the level of specificity that a URL represents (e.g., page views within an HTML-based e-book).
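
The scenario above (user A went to URL B, with the content of B not retained) can be sketched in a few lines. This is an illustration under assumed names only: the log records, the `reading_history` function, and the example.org URLs are hypothetical and do not describe any real ISP's log format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical retained records: (subscriber, timestamp, URL). No page
# content is stored, matching the "detailed but not total" assumption.
retained_log = [
    ("user-a", "2006-05-01T09:15:00", "http://example.org/ebook/ch1.html"),
    ("user-b", "2006-05-01T09:20:00", "http://example.org/journal/article7"),
    ("user-a", "2006-05-01T09:40:00", "http://example.org/ebook/ch2.html"),
]


def reading_history(log):
    """Group retained records by subscriber, with URLs ordered by access time."""
    history = defaultdict(list)
    for subscriber, stamp, url in log:
        history[subscriber].append((datetime.fromisoformat(stamp), url))
    return {user: [url for _, url in sorted(visits)] for user, visits in history.items()}


for user, urls in reading_history(retained_log).items():
    print(user, urls)
```

Even without stored content, the grouped URLs amount to a per-reader list of chapters and articles, which is the level of specificity described above.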

Stepping back, you might ask: How is this different from the familiar library check-out record privacy problem? The difference is that libraries do not check out journal articles and a variety of other materials, such as reference books. Moreover, libraries are not required to retain circulation records, and readers always have the option for unrecorded in-library use. In the digital age, if it’s online, its use can be recorded.

Consequently, reader privacy may be going the way of the dinosaur. Stay tuned.

On the Internet Everyone Knows You’re a Dog (Bark Carefully)

A recent BusinessWeek article ("You Are What You Post") by Michelle Conlin may give Millennials (and everyone else) pause.

The article leads with a story about then 22-year-old Josh Santangelo’s 2001 posting on a dusty corner of the Internet about a bad drug trip. This caught the eye of super blogger Jason Kottke, and, after he linked to it, it became very popular. Now, Santangelo’s name pops up in about 92,600 Google hits. Unfortunately, as the article states:

That was back when Santangelo was an up-all-night raver in giant pants and flame-red hair. Today he’s a Web development guy with a shaved head who shows up at meetings on time and in khakis. Clients have included such family-friendly enterprises as Walt Disney and Nickelodeon, as well as Starbucks, AT&T, and Microsoft.

And the business world is now tuned in to search engines as a rich source of information about potential employees:

Google is an end run around discrimination laws, inasmuch as employers can find out all manner of information—some of it for a nominal fee—that is legally off limits in interviews: your age, your marital status, the value of your house (along with an aerial photograph of it), the average net worth of your neighbors, fraternity pranks, stuff you wrote in college, liens, bankruptcies, political affiliations, and the names and ages of your children.

So, this could be trouble for those pouring out the intimate details of their personal and work lives on blogs, vlogs, social networking sites (e.g., MySpace), and other cool sites.

The article gives several amusing examples of employees fired for revealing too much on the Internet (amusing, that is, unless you’re the one fired).

Of course, there is the counter-notion that any publicity is good publicity. Consider, for example, celebrity sex tapes. A recent New York Times article ("Sex, Lawsuits and Celebrities Caught on Tape") by Lola Ogunnaike says about the Paris Hilton tape:

Ms. Hilton tried to stop distribution of the tape, although its notoriety paradoxically catapulted her to an even higher orbit of fame, establishing her as a kind of postmodern celebrity, leading to perfume deals, a memoir and the covers of Vanity Fair and W.

And, after discussing the latest round of celebrity sex tapes threatening to emerge, it says:

Celebrity sex tapes surface with such regularity that cynics question whether the stars themselves may be complicit, despite their efforts to suppress them in court, because of the publicity they bring.

However, this counter-notion may only apply to the already famous.

No doubt we’ll find out as part of a generation that’s digitally exposed itself on the Internet increasingly enters the workplace and competes within it.

An Important Partial Win for Google and Privacy

U.S. District Court Judge James Ware ruled on Friday that Google does not have to turn over 5,000 search queries to the Justice Department; however, it does have to turn over 50,000 random Web URLs.

The Google Blog posting ("Google Wins!") was ecstatic, stating that:

This is a victory for both online rights activists and users of Google. Google may not always be perfect, but this time they stood up for what is right.

According to an article in Red Herring ("Judge Limits US Data Hunt"):

The government’s subpoena originally told Google it must turn over massive amounts of data in two broad categories: all the URLs available on the company’s search engine as of last July 31, and all search queries entered into Google’s search engine during June and July of 2005. That likely would have included tens of millions of data points.

A San Francisco Chronicle article ("Google Must Divulge Data Judge Cuts Amount of Info Company Has to Give Feds") noted that:

Google, along with privacy advocates, argued that sometimes users can reveal personal information in search queries, including their Social Security Numbers. Or they can suggest the sexual preferences of public officials or use inflammatory phrases such as "bomb-making equipment," which would pique the interest of law enforcement. The privacy advocates said that the Justice Department couldn’t be trusted with access to such sensitive data, despite the administration’s promises to use the queries only for its online pornography case.

Judge Ware expressed concern about the impact of search-term disclosure on Google due to user privacy issues:

The expectation of privacy by some Google users may not be reasonable, but may nonetheless have an appreciable impact on the way in which Google is perceived, and consequently the frequency with which users use Google. Such an expectation does not rise to the level of privilege, but does indicate that there is a potential burden as to Google’s loss of goodwill if Google is forced to disclose search queries to the government.