Databases can have birthdays and anniversaries too. The Protein Data Bank (PDB), a gigantic repository that allows anyone with an internet connection to view the atomic structures of more than 77,000 biological molecules in all their three-dimensional glory, just celebrated its 40th anniversary.
The occasion was marked by a special symposium at the end of October at Cold Spring Harbor Laboratory, where a meeting held 40 years ago sparked the discussions that gave birth to the PDB.
By 1971, the structures of about a dozen proteins had been “solved.” Because a protein’s function depends on its ability to coil into a specific three-dimensional shape, structural biologists had begun to chart the precise architectural coordinates of every atom within a protein (or a nucleic acid) to learn how these molecules work and how alterations in their structure can affect their function.
As recounted in this historical perspective on the PDB, scientists at the time were keeping track of structural coordinates using a system that seems unimaginably complicated and labor-intensive when viewed from the fast-moving front of the cloud-computing internet era. In this system, each atom in a protein was represented by a punched card. So if one lab wanted to send structural information about a protein such as hemoglobin to another lab, this could mean a transfer of more than 1,000 cards.
At the 1971 CSH symposium titled “Structure and Function of Proteins at the Three Dimensional Level,” Max Perutz, who had won the Nobel Prize in 1962 for discovering the structure of hemoglobin, gathered his colleagues for an informal meeting on how best to collect and distribute structural data. The meeting, which drew many notable protein scientists, including fellow Nobel Laureates Dorothy Hodgkin, Aaron Klug, and William Lipscomb, spawned intense discussions that took place all over the campus – Blackford Bar, the lawns, the beach – and spilled past official meeting hours.
With everyone agreeing on the need for a public data bank, the task of actually establishing and maintaining one in the US (another would be set up in the UK) fell to Walter Hamilton, a chemist who was developing graphics technologies and remote computing at Brookhaven National Laboratory, which is just a few highway exits away from CSHL. With support from several leading scientists of the structural biology community, plans for the PDB took shape with astonishing speed and the archive was officially launched in October 1971.
“The PDB is an early example of a research community developing an ‘open access’ model of data sharing long before it became fashionable,” says David Stewart, Executive Director of CSHL’s Meetings and Courses program. “It neatly showcases the impact of the CSHL symposium on science.”
CSHL structural biologist Leemor Joshua-Tor agrees. “PDB is a great model for how a community comes together and sets up a repository for sharing information,” she says. “And it’s still evolving – there are constant discussions of, for example, what to share, what new tools are needed to validate structures, when to release data and how to make this information readily useful to the non-expert. All these factors have improved the quality of information in a profound way. I don’t think there is anything like it.”
These discussions continued at the 40th anniversary, which served as a reunion for many of the scientists who were part of the early efforts to establish the PDB, including Nobel laureate Johann Deisenhofer, an instructor in the first CSHL structural biology course in 1988. The CSHL Archives will soon make public oral histories and other historical items related to the PDB, so stay tuned!