With Little Training, Machine-Learning Algorithms Can Uncover Hidden Scientific Knowledge



Sure, computers can be used to play grandmaster-level chess, but can they make scientific discoveries? Researchers at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge.

A team led by Anubhav Jain, a scientist in Berkeley Lab’s Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and to suggest as-yet-unknown materials as thermoelectric candidates.
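To illustrate what "analyzing relationships between words" can mean in practice, here is a minimal Python sketch. It assumes, as word-embedding methods such as Word2vec allow, that each word is represented as a vector, and that candidate materials are ranked by cosine similarity to the vector for "thermoelectric". The vectors below are hypothetical toy values, not output from the Berkeley Lab model, and training Word2vec itself is not shown.

```python
import math

# Hypothetical toy embeddings: hand-picked 3-D vectors standing in for the
# much higher-dimensional vectors a trained Word2vec model would produce.
EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "Bi2Te3": [0.8, 0.2, 0.35],
    "PbTe": [0.7, 0.1, 0.4],
    "NaCl": [0.1, 0.9, 0.2],
    "SiO2": [0.2, 0.7, 0.6],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(concept, candidates, embeddings):
    """Return (name, score) pairs sorted from most to least similar to `concept`."""
    target = embeddings[concept]
    scored = [(name, cosine_similarity(embeddings[name], target)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    materials = ["Bi2Te3", "PbTe", "NaCl", "SiO2"]
    for name, score in rank_candidates("thermoelectric", materials, EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```

Cosine similarity is used here because it compares the direction of two vectors regardless of their length. With real embeddings, a material that scores highly but has not yet been studied as a thermoelectric would be the kind of research gap the team describes.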

“Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals,” said Jain. “That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven’t studied so far.”

Read more at: DOE/Lawrence Berkeley National Laboratory

Vahe Tshitoyan, Anubhav Jain, Leigh Weston, and John Dagdelen were among the participants in a text-mining project that used machine learning to analyze 3.3 million abstracts from materials science papers. (Photo Credit: Marilyn Chung/Berkeley Lab)