Chatbots and zero sales resistance
Sauro Succi
DOI: https://doi.org/10.3389/fphy.2024.1484701
IF: 3.718
2024-10-15
Frontiers in Physics
Abstract: Not a day goes by without our hearing of the latest AI breakthroughs, such as chatbots that write texts or generate images that are increasingly hard to tell apart from their human-made counterparts. These headlines come with a heavy load of hype, but even with the hype factored out, a highly seductive promise stands tall: the promise to capture levels of complexity largely beyond the grasp of our best theories, models and simulations. Briefly, AI would supplant the time-honored Scientific Method as we have known it since Galileo's time [1,2].

While heavily pumped up, this promise is not empty, addressing as it does, among others, one of the most vexing Achilles' heels of the scientific method, the infamous Curse of Dimensionality (CoD) [3]. Indeed, the CoD compounds with a profound hallmark of Complexity, namely the fact that complex systems are sneaky: they inhabit ultra-dimensional spaces but don't fill them up [4,5,6]. On the contrary, "interesting things" take place in ultrathin and often highly scattered portions of the huge state space available to them. Nature likes to play hide and seek, and big time so. An illuminating example can be found in the book by Frenkel and Smit [7], where we learn that the chance of making a sensible Monte Carlo move in the state space of a hundred hard spheres (please note, a hundred, not Avogadro's number) is about $10^{-260}$! The golden nuggets are well hidden indeed.

Computational science has devised a number of clever techniques to visit the regions hosting these precious and rare golden nuggets without waiting many ages of the Universe [8]. Yet the CoD remains a very tough cookie for the scientific method to the present day.

Artificial Intelligence, mostly powered by Machine Learning (ML), promises a new and unprecedentedly powerful angle of attack on Complexity in science and society.
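The sneakiness of high-dimensional state spaces can be felt in a toy setting. The sketch below is an illustrative assumption, not the Frenkel and Smit calculation: it swaps their hundred three-dimensional hard spheres for a handful of two-dimensional hard disks of diameter `sigma` (a hypothetical value of 0.1, in units of the box side) in a periodic unit square, and estimates by plain random sampling the fraction of configurations in which no two disks overlap. The function names `all_separated` and `overlap_free_fraction` are hypothetical. It does not attempt to reproduce the $10^{-260}$ figure, only the trend that the "allowed" fraction of state space shrinks rapidly as particles are added.

```python
import math
import random


def all_separated(pts, sigma):
    """True if every pair of centres is at least sigma apart (minimum-image distance)."""
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dx = abs(pts[i][0] - pts[j][0])
            dy = abs(pts[i][1] - pts[j][1])
            dx, dy = min(dx, 1.0 - dx), min(dy, 1.0 - dy)  # periodic wrap-around
            if dx * dx + dy * dy < sigma * sigma:
                return False
    return True


def overlap_free_fraction(n_disks, sigma, trials=20000, seed=0):
    """Estimate the probability that n_disks hard disks of diameter sigma,
    placed uniformly at random in a periodic unit square, have no overlaps."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n_disks)]
        if all_separated(pts, sigma):
            accepted += 1
    return accepted / trials


if __name__ == "__main__":
    sigma = 0.1  # hypothetical disk diameter, in units of the box side
    for n in (1, 2, 4, 8, 12):
        p = overlap_free_fraction(n, sigma)
        # log10 is shown only when at least one configuration was accepted
        print(n, p, math.log10(p) if p > 0 else "-inf")
```

For two disks the exact answer is $1 - \pi\sigma^2$, which gives a simple sanity check on the sampler. Plain random sampling is deliberately naive here: it is precisely the kind of blind search that the clever techniques cited in [8] are designed to avoid.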
And again, the promise is largely overblown but not empty, as witnessed by a number of success stories: wins at chess and Go, self-driving cars, and DeepFold mapping of protein structure stand out as some of the most spectacular(ized) cases in point [9].

It is worth discussing where this "magic" comes from in a little more detail. The basic idea of ML is to represent a given D-dimensional output y (target) through the recursive application of a simple nonlinear map [10]. For a neural network (NN) consisting of an input layer x, L hidden layers $z_1, \ldots, z_L$, each containing N neurons, and an output layer y, the update chain $x \to z_1 \to \cdots \to z_L \to y$ reads symbolically as follows:

$$x = \text{input} \qquad (1)$$
$$z_1 = f(W_1 x - b_1), \quad \ldots, \quad z_L = f(W_L z_{L-1} - b_L) \qquad (2)$$
$$y = f(W_{L+1} z_L - b_{L+1}) \qquad (3)$$

where the $W_l$ are $N \times N$ matrices of weights, the $b_l$ are N-dimensional arrays of biases and f is a nonlinear activation function, to be chosen from a large palette of options. The output y is then compared with a given training target $y_T$ ("Truth"), and the weights are recursively updated so as to minimize the discrepancy between y and $y_T$ (the Loss function), down to the desired tolerance. This latter task is pursued by changing the weights along the direction of steepest descent of the Loss function, that is, opposite to its gradient. In equations:

$$W_{ij} \leftarrow W_{ij} - \gamma \frac{\partial \mathcal{L}}{\partial W_{ij}} \qquad (4)$$

where $\mathcal{L}[W]$ is the loss function, which depends on the full set of weights W, and γ is a numerical parameter (the learning rate) controlling the convergence of the overall process. The idea is that, with enough (big) data for training, the combination of Eqs. (1)-(3) and (4) can reach any target, with no need for any model or theory aimed at capturing the causal structure of the problem at hand. Whence the alleged demise of the scientific method [1,2]. Put in such simplistic and bombastic terms, the idea is readily debunked on the basis of well-known properties of complex systems [11,12].
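The forward chain of Eqs. (1)-(3) and the update rule (4) can be made concrete with a minimal pure-Python sketch. It rests on several assumptions not fixed by the text: f is `tanh`, every layer (input, hidden and output) has the same width N so that every $W_l$ is $N \times N$ as stated above, the loss is the squared error $\mathcal{L} = \tfrac{1}{2}\sum_j (y_j - y_{T,j})^2$, biases are held fixed, and the derivatives in Eq. (4) are obtained by central finite differences rather than by backpropagation, which is what practical ML libraries use. The names `make_network`, `forward`, `loss` and `gradient_step`, and the input and target vectors, are hypothetical.

```python
import math
import random


def make_network(n, n_hidden, seed=0):
    """Random weights W_1..W_{L+1} (all n x n, as in the text) and zero biases."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
               for _ in range(n_hidden + 1)]
    biases = [[0.0] * n for _ in range(n_hidden + 1)]
    return weights, biases


def forward(x, weights, biases, f=math.tanh):
    """Eqs. (1)-(3): z_l = f(W_l z_{l-1} - b_l), starting from z_0 = x; the last layer is y."""
    z = x
    for W, b in zip(weights, biases):
        z = [f(sum(W[i][k] * z[k] for k in range(len(z))) - b[i]) for i in range(len(W))]
    return z


def loss(x, y_target, weights, biases):
    """Squared-error loss, one common choice: 1/2 * sum_j (y_j - yT_j)^2."""
    y = forward(x, weights, biases)
    return 0.5 * sum((yj - tj) ** 2 for yj, tj in zip(y, y_target))


def gradient_step(x, y_target, weights, biases, gamma=0.1, h=1e-6):
    """Eq. (4): W_ij <- W_ij - gamma * dL/dW_ij for every weight (biases untouched)."""
    grads = []
    for W in weights:
        g = [[0.0] * len(row) for row in W]
        for i, row in enumerate(W):
            for j in range(len(row)):
                old = row[j]
                row[j] = old + h
                up = loss(x, y_target, weights, biases)
                row[j] = old - h
                down = loss(x, y_target, weights, biases)
                row[j] = old  # restore before probing the next weight
                g[i][j] = (up - down) / (2.0 * h)  # central finite difference
        grads.append(g)
    # apply all updates at once, so every derivative refers to the same W
    for W, g in zip(weights, grads):
        for i, row in enumerate(W):
            for j in range(len(row)):
                row[j] -= gamma * g[i][j]


if __name__ == "__main__":
    x, y_T = [0.5, -0.2, 0.1], [0.3, 0.0, -0.4]  # hypothetical input and target
    W, b = make_network(n=3, n_hidden=2)
    for step in range(201):
        if step % 50 == 0:
            print(step, loss(x, y_T, W, b))
        gradient_step(x, y_T, W, b, gamma=0.2)
```

Computing all derivatives before touching any weight keeps the update a faithful instance of Eq. (4), where every partial derivative is evaluated at the same W. Finite differences cost two full forward passes per weight, which is hopeless at the scale of $10^8$ weights but keeps the sketch short and transparent.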
Yet it is true that neural nets sometimes prove capable of representing "sneaky" functions in hyperdimensional spaces that would be extremely hard to attain by any other method. So, where does such magic come from?

The key point is that for a DNN (deep neural network) of depth L (number of layers) and width N (number of neurons per layer), there are $P = N^L$ possible paths connecting any single item $x_i$ in the input layer to any other single item $y_j$ in the output layer. Hence a DNN with $N = 10^3$ and $L = 10^2$ features $N^2 L = 10^8$ weights and $P = 10^{300}$ paths. Moreover, the search for the target can proceed in parallel across all of these paths. If you think this is sci-fi, please think again: current leading-edge ML applications, such as DeepFold or the Large Language Models driving the most powerful "ask-me-anything" chatbots, are using up to 100 billion weights, basically the number of neurons in our brain. Except that our brain runs on 20 watts, while the largest ML models are now sucking up at least ten million times more power, a point to which we shall return shortly.

These numbers unveil the magic behind ML: DNNs duel the CoD head-on, by unleashing an exponential number of paths and adjusting them in such a … [Abstract truncated]
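The path count $P = N^L$ and the weight estimate $N^2 L$ can be checked with exact integer arithmetic. The sketch below, with hypothetical function names `count_paths` and `count_weights`, counts paths by propagating path counts layer by layer through L fully connected hidden layers (a simple dynamic-programming argument, assumed here to match the "any single input item to any single output item" reading of the text) and compares the result with the closed form. The last line only multiplies the text's own figures, 20 W for the brain and a factor of at least $10^7$.

```python
def count_paths(n, depth):
    """Count input-to-output paths through `depth` (>= 1) fully connected hidden
    layers of n neurons each, by propagating path counts layer by layer."""
    # paths[k]: number of distinct paths from the chosen input unit x_i to neuron k
    paths = [1] * n  # one edge from x_i to each neuron of the first hidden layer
    for _ in range(depth - 1):
        total = sum(paths)
        paths = [total] * n  # each next-layer neuron is fed by every neuron below it
    return sum(paths)  # every last-layer neuron feeds the chosen output unit y_j


def count_weights(n, depth):
    """The text's estimate N^2 L: one n x n weight matrix per layer."""
    return n * n * depth


if __name__ == "__main__":
    N, L = 10**3, 10**2
    P = count_paths(N, L)
    assert P == N**L  # the layer-by-layer count agrees with the closed form P = N^L
    print("weights:", count_weights(N, L))  # 100000000, i.e. 10^8
    print("paths: 10 **", len(str(P)) - 1)  # exponent 300
    brain_watts = 20
    print("model power >=", brain_watts * 10**7 / 1e6, "MW")  # 200.0 MW
```

Python's arbitrary-precision integers make it possible to hold $10^{300}$ exactly, which is a vivid reminder that the number of paths dwarfs the number of weights ($10^8$) by nearly 300 orders of magnitude.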
physics, multidisciplinary