Big Data’s Promise and Limitations [New Yorker]:
In reality, most computational models of most things have, historically speaking, been wrong—or at least incomplete, effective in some circumstances, not all. Even Google, which likely has the biggest data of anyone, still uses humans to hand-curate some of it, because unanalyzed gobs of information are no guarantee of anything, and giant servers still can’t serve as fully trustworthy replacements for human judgment.
For perspective, it might help to consider the challenge of inferring the structure of protein from its underlying DNA sequence, a problem with an enormous number of applications in medicine, and throughout biology. Hundreds if not thousands of researchers have worked on the problem for fifty years, and for the last decade have had large databases to help; yet, in the words of a review published a few months ago in Science, “no single group [of researchers] yet consistently produces accurate models,” especially with more complex DNA sequences that don’t closely resemble genes that are already well understood. The more complex a problem is, and the more particular instances differ from those that came before, the less likely Big Data is to be a sure thing.
They used to say that the best algorithm for inferring the structure of a protein was a researcher at M.I.T., whose name I’ve forgotten; he could look at a sequence and make a pretty darn good sketch of the structure. These days computational models can approximate much of that work, yet despite the advances they still leave much to be desired. For example, look at the history of Rosetta@home [Wiki] and the subsequent creation of the protein prediction game Foldit [Wiki]:
Some users of Rosetta@home became frustrated with the program when they realised they could see ways of solving the protein structures themselves but could not interact with the program. When Baker realised that humans could have considerable potential over computers attempting to solve protein structures, he approached David Salesin, a fellow computer scientist, and Zoran Popović, a game designer studying at the same university, to help conceptualize and build an interactive program that would both appeal to the public and assist in their efforts to find the native structures of proteins - a game.