Data, Data Everywhere

As the Big Data beast fattens, will privacy and ethics get gobbled up?

By Joseph Janes | April 16, 2012

The Michigan Theater is at 603 East Liberty St. in Ann Arbor. Athletes Tom Brady and Cal Ripken have the same body mass index, 27—lower than Dr. Phil’s but higher than Abraham Lincoln’s. Austria’s fertility rate peaked in 1963 and has been falling steadily ever since. Q Lending Inc., of Coral Gables, Florida, received the smallest bailout from the TARP program, at $10,000.

I’m sure you found all of these as fascinating as I did, undoubtedly also wondering where this was going. These facts and a few gazillion others come to you courtesy of Factual, the brainchild of mathematician Gilad Elbaz, who gave us the company that is now Google’s AdSense. In Factual’s 500 terabytes of storage, there’s data from sources governmental and private, on topics broad and narrow, profound and trivial. It’s worth a wander through the website and its featured data sets to see just what it’s been vacuuming up.

A feature article in the March 24 New York Times tells us the company’s plan is “to build the world’s chief reference point for thousands of interconnected supercomputing clouds,” and goes on to describe Factual’s clientele and how they use the product. It also names a few competitors, including Infochimps, Gnip, and of course Wolfram Alpha, which partially powers Siri. Factual, by the way, is hiring; its “data specialist” jobs sound more than a little familiar, even if the page describing them lists 2010–2011 internship opportunities. Oops. I guess bad data can creep in everywhere.

This came hard on the heels of the announcement that the Statistical Abstract of the United States had been saved at the last moment by ProQuest. I’m glad of that; it seemed a shame that the government no longer felt it was worth publishing. I should be clear: I’ve never been a fan of the Abstract. (I’m a World Almanac sort of guy.) While its various elements are valuable and come in handy, the way in which it was organized—particularly the index that gave table numbers rather than pages—seemed stubbornly user-hostile to me. And the web version, consisting of large PDF slabs of tables, has gone from understandably simple to gratingly low-tech. Adding Excel versions was nice, though the whole thing still comes off as antediluvian.

Maybe ProQuest will attend to these shortcomings. In any event, Factual and the Abstract make for a sharp and illustrative contrast. One way of thinking about compiling Lots of Data is to organize it by category (which perhaps yields some context and texture), add some metadata and a search mechanism, all in the service of providing access, so individual people can find a specific fact or set of facts in answer to a question.

Another way, only now feasible, is to mush it all together and see what can be learned. Not by an individual, necessarily, but rather by throwing tons of computing power at it to see what emerges. Both are attempts to somehow wrap our arms and minds around the vertiginous scope and complexity of data being generated and stored every second.

The name “Big Data” gets thrown around a lot, to denote this massive-data-conglomeration phenomenon. We’re told this will be an opportunity for information-focused people to collect, curate, manage, organize. All likely true, and all worth pursuing as extensions of work we’re familiar with.

Go one step further, though. How about professionals who work to humanize this field? Those who think about questions of privacy, authority, quality, authenticity, rationality, and ethicality. Who center these processes in efforts to better the human condition and the lives of individuals. Who build tools to gyre and gimbal in the taffeta of data to find just the right thread for a person in need. Somebody like, I don’t know, a reference librarian . . . but that’s another story.

JOE JANES is associate professor at the Information School of the University of Washington.

