What is High Performance AI? | Pipeline magazine
By Francisco Webber
Despite their popularity, steam cars had a very short lifespan. Why? Because they needed 30 minutes to start. They were quickly replaced by automobiles with internal combustion engines and electric starters. Motorists abandoned their beloved steam cars because they wanted efficiency. The shift to electric cars that we are witnessing in the first decades of the 21st century is likewise driven by their higher energy efficiency compared with conventional cars: 60% versus 20%. The pattern is the same: it’s all about efficiency. Efficiency is a key driver of innovation.
Surprisingly, one whole segment of the economy is developing in exactly the opposite direction: information and communication technology (ICT) in general, and artificial intelligence (AI) in particular. While all major industries strive to reduce their carbon dioxide emissions and become more energy efficient, the power consumption of computing devices keeps growing. Its share is already on a par with that of global air transport (4%) and is expected to reach that of global road transport (8%) by 2030. This trend may seem irrational, but it must be put into perspective.
1,000,000,000,000,000,000,000: a 1 followed by 21 zeros. That is how many bytes make up a zettabyte, a number our brains cannot grasp. Yet this is the order of magnitude of the data produced today: sensor data, simulations and measurements produced by machines, but also human-generated texts such as e-mails, articles, reports and social media posts. Numbers are easy to process because they leave no room for interpretation, but human language does: it makes jokes, expresses opinions and uses stylistic devices such as metaphors and allegories. On top of the sheer volume of content, this poses an overwhelming problem for computer systems.
Language can be seen as an open system in which new words constantly enrich the existing vocabulary. According to the Oxford English Dictionary, the English language has 171,476 words in current use, and several thousand are added each year. Some terms are common (the, are, big), while others are extremely rare (biblioklept, acnestis, meldrop). According to Stuart Webb, professor of applied linguistics at the University of Western Ontario, learning just the 800 most frequently used word families in English is enough to understand 75% of the language as it is spoken in everyday life. But to read a novel or a newspaper, you need to know 8,000 to 9,000 word families.
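Figures like Webb’s are usually expressed as lexical coverage: the share of running words in a text accounted for by the N most frequent items. The following is a minimal sketch of that calculation under simplifying assumptions: it counts plain lowercase word forms rather than true word families (grouping inflected and derived forms would require a lemmatizer, which is left out), and the sample sentence and the function names `tokenize` and `top_n_coverage` are hypothetical illustrations.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase, then keep runs of letters and apostrophes.
    # Assumption: word forms stand in for word families (no lemmatization).
    return re.findall(r"[a-z']+", text.lower())


def top_n_coverage(text: str, n: int) -> float:
    """Fraction of running tokens covered by the n most frequent word forms."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Sum the occurrences of the n most frequent forms.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


sample = "the cat sat on the mat and the dog sat on the rug"
for n in (1, 3, 5):
    print(n, round(top_n_coverage(sample, n), 2))
```

On the toy sentence, the single word “the” already accounts for 4 of the 13 tokens, which illustrates how a small set of very frequent words carries a large share of any text.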
It’s not much different with computer systems: the more demanding the text, the more vocabulary the system has to learn. In other words, the larger the models have to be. There is one major difference, though: while humans can infer the meaning of a new term from its context, computers cannot understand vocabulary they have never seen. In business applications, this means that statistical systems essentially ignore new terms, with a devastating impact on the quality of results.
Cortical.io conducted a real-world experiment to determine how many training examples a model would need to cover 100% of the vocabulary found in 5,000 business e-mails. The results show that even extensive annotation does not guarantee complete vocabulary coverage: with 20,000 examples, only 70% of the vocabulary was covered. Since enterprise datasets rarely contain more than a few hundred examples, the limitations of current approaches are obvious. Statistics cannot fully describe language. Nevertheless, current AI models are still being fed ever more statistics, in the mistaken belief that the bigger the model, the better the quality.
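The idea behind such a coverage measurement can be pictured with a small sketch: collect the distinct words (the vocabulary) of a target collection, then check what share of them already appears in a growing set of training examples. This is not Cortical.io’s actual methodology; the e-mail snippets, the tokenizer and the function names `vocabulary` and `vocabulary_coverage` are hypothetical illustrations.

```python
import re


def vocabulary(texts: list[str]) -> set[str]:
    # Distinct lowercase word forms across all texts (crude tokenizer, an assumption).
    vocab: set[str] = set()
    for text in texts:
        vocab.update(re.findall(r"[a-z']+", text.lower()))
    return vocab


def vocabulary_coverage(training_examples: list[str], target_texts: list[str]) -> float:
    """Share of the target vocabulary that occurs somewhere in the training examples."""
    target_vocab = vocabulary(target_texts)
    if not target_vocab:
        # An empty target vocabulary is trivially covered.
        return 1.0
    seen = vocabulary(training_examples)
    return len(target_vocab & seen) / len(target_vocab)


# Hypothetical training examples and target e-mails.
training = [
    "please send the invoice",
    "the shipment is delayed",
    "invoice attached for the order",
]
emails = [
    "please confirm the order",
    "the invoice for the delayed shipment",
    "escalate the warranty claim",
]

# Coverage as the number of training examples grows.
for k in range(1, len(training) + 1):
    print(k, round(vocabulary_coverage(training[:k], emails), 2))
```

Even with all three training examples, words such as “warranty” and “escalate” remain unseen, which is the kind of gap the experiment describes at a much larger scale.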
The variety, variability and ambiguity of language are real challenges for computer systems, which can only do one thing: perform calculations. It has long been assumed that feeding AI systems with numbers derived from language (so-called statistical approaches) would compensate for their lack of understanding of actual meaning. This led to monster models such as GPT-3 and BERT, whose ballooning sizes, from hundreds of millions of parameters at first to hundreds of billions today, have raised concerns about their sustainability.
These approaches have produced what could be called a “universe of a million models”: each model is painstakingly trained to solve one very specific problem, in one specific context and in one given language. In other words, these models are local. Every new problem requires another model, which leads to a highly fragmented landscape in which no network effects can arise. In terms of reusability, their effectiveness is nil.
This, however, is not what MIT Technology Review had in mind when it described these models as the “exhilarating, dangerous world of language AI”. Instead, experts point to the models’ enormous demand for computing power and their very high carbon footprint. They also warn of negative impacts at the societal level: biases in language models lead to discrimination in how bank loans are granted or jobs are assigned; the difficulty of finding reliable information in an ocean of data makes it easier to spread fake news; and the proliferation of consumer data, eagerly collected to refine models, invites populists and demagogues to abuse it.