Machine-learning system based on light could yield more powerful, efficient large language models
ChatGPT has made headlines around the globe for its ability to write essays, emails, and even computer code from a few user prompts. Now an MIT-led team has demonstrated a system that could lead to machine-learning models several orders of magnitude more powerful than the one behind ChatGPT. Their approach could also consume orders of magnitude less energy than the state-of-the-art supercomputers that power today's machine-learning models.
In the July 17 issue of Nature Photonics, the researchers report the first demonstration of a system that computes with the movement of light, rather than electrons, using thousands of micron-scale lasers. The researchers report a more than 100-fold improvement in energy efficiency and an increase of 25 percent in compute density, a measure of the system's performance, compared with state-of-the-art digital computers used for machine learning today.
To the future
In the paper, the team also notes that the approach leaves “substantially several more orders of magnitude for future improvement.” As the authors write, the method “opens an avenue to large-scale optoelectronic processors to accelerate machine-learning tasks from data centers to decentralized edge devices.” In other words, smartphones and other small devices could become capable of running programs that can currently only be computed in large data centers.
Further, because the components of the system can be created using fabrication processes already in use today, “we expect that it could be scaled for commercial use in a few years. For example, the laser arrays involved are widely used in cell-phone face ID and data communication,” says Zaijun Chen, first author, who conducted the work while a postdoc at MIT in the Research Laboratory of Electronics (RLE) and is now an assistant professor at the University of Southern California.
Says Dirk Englund, an associate professor in MIT’s Department of Electrical Engineering and Computer Science and leader of the work, “ChatGPT is limited in its size by the power of today’s supercomputers. It’s just not economically viable to train models that are much bigger. Our new technology could make it possible to leap to machine-learning models that otherwise would not be reachable in the near future.”
He continues, “We don’t know what capabilities the next-generation ChatGPT will have if it is 100 times more powerful, but that’s the regime of discovery that this kind of technology can allow.” Englund is also the director of MIT’s Quantum Photonics Laboratory and is affiliated with the RLE and the Materials Research Laboratory.
A drumbeat of progress
The current work is the latest achievement in a steady drumbeat of progress over the last few years by Englund and several of the same colleagues. In 2019, for example, an Englund team reported the theoretical work that led to the current demonstration. The first author of that paper, Ryan Hamerly, now of RLE and NTT Research Inc., is also a co-author of the present paper.
Co-authors on the current Nature Photonics paper include Alexander Sludds, Ronald Davis, Ian Christen, Liane Bernstein, and Lamia Ateshian, all of RLE, as well as Tobias Heuser, Niels Heermeier, James A. Lott, and Stephan Reitzenstein of Technische Universität Berlin.
Deep neural networks (DNNs), like the one behind ChatGPT, are based on huge machine-learning models that simulate how the brain processes information. However, the digital technologies behind today’s DNNs are reaching their limits even as the field of machine learning continues to grow. In addition, they require vast amounts of energy and are largely confined to massive data centers. That is motivating the development of new computing paradigms.
Using light rather than electrons to run DNN computations could break through these bottlenecks. Computations performed with optics, for example, can consume far less energy than those done with electronics. Further, with light, “you can have much larger bandwidths,” or compute density, says Chen. Light can transfer much more information over a much smaller area.
Existing optical neural networks (ONNs), however, face significant challenges. They use a great deal of energy because they are inefficient at converting incoming electrical data into light. Further, the components involved are bulky and take up significant space. And although ONNs are quite good at linear calculations like addition, they are not great at nonlinear calculations like multiplication and “if” statements.
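The linear/nonlinear split described above can be made concrete with a minimal sketch of a single neural-network layer. The code below is purely illustrative NumPy, not the authors' optical system: the matrix-vector multiply and addition form the linear part, which maps naturally onto optical hardware, while the activation function (here ReLU, an "if"-style operation) is the nonlinear part that has historically been difficult to perform optically.

```python
import numpy as np

def dense_layer(x, W, b):
    # Linear part: matrix-vector multiply plus a bias add.
    # These are the operations optical hardware handles efficiently.
    z = W @ x + b
    # Nonlinear part: a ReLU activation, effectively an "if" per element.
    # Steps like this have traditionally been hard to do with optics.
    return np.maximum(z, 0.0)

# Tiny example: 3 inputs mapped to 2 outputs.
x = np.array([1.0, -2.0, 0.5])
W = np.array([[0.2, -0.1, 0.4],
              [-0.3, 0.5, 0.1]])
b = np.array([0.1, -0.2])
print(dense_layer(x, W, b))  # one linear step, then one nonlinear step
```

A full DNN simply stacks many such layers, so a hardware platform must handle both kinds of operation well to be useful end to end.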
In the current work, the researchers introduce a compact architecture that, for the first time, solves all of these challenges simultaneously. The architecture is based on state-of-the-art arrays of vertical-cavity surface-emitting lasers (VCSELs), a relatively new technology used in applications including lidar remote sensing and laser printing. The VCSELs reported in the Nature Photonics paper were developed by the Reitzenstein group at Technische Universität Berlin. “This was a collaborative project that would not have been possible without them,” Hamerly says.