
AI briefing

The classical way of getting Large Language Models (LLMs) to reliably perform certain tasks is training. To train an LLM, a large number of inputs and expected outputs is typically compiled into a data set. In a training run, this data is integrated into the AI's "knowledge" by computing new weights from it. This step requires very significant computational power: high-end GPUs with sufficient RAM.
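To make the workflow concrete, here is a minimal sketch of such a supervised fine-tuning run in Python, assuming PyTorch and the Hugging Face transformers library. The model name, the two toy input/output pairs and the hyperparameters are illustrative assumptions, not part of this white paper.

{{{#!python
# Minimal fine-tuning sketch: compile input/output pairs into a data set
# and update the model's weights on it. All names and values are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; real projects use far larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Each training example is an input plus the expected output, flattened
# into one text sequence for causal language modeling.
pairs = [
    ("Translate to German: Hello", "Hallo"),
    ("Translate to German: Thank you", "Danke"),
]
texts = [f"{prompt}\n{completion}" for prompt, completion in pairs]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
input_ids = batch["input_ids"].to(device)
attention_mask = batch["attention_mask"].to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):  # toy loop; real runs iterate over a large data set
    optimizer.zero_grad()
    # Passing labels=input_ids trains the model to reproduce the completions.
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
    outputs.loss.backward()  # compute gradients for all weights
    optimizer.step()         # update the weights
}}}

A real run would stream a far larger data set in batches and checkpoint the model; holding the weights, gradients and optimizer state in memory at once is exactly where the GPU and RAM requirements described below come from.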

To give an impression of the dimensions: realistic hardware for training could be a server with 192 GB of RAM and 5 to 10 NVIDIA A100 boards with 80 GB of video RAM each, each board costing around $15,000. Such servers can be rented as dedicated machines for significant monthly fees, but since training is only performed occasionally and takes a few hours to days, it is usually more economical to rent cloud hardware for this purpose when it is needed and to run inference on smaller, cheaper hardware.

Renting such a server for a training run sounds more expensive than it actually is: given the short time it is needed, the cost is typically in the hundreds of US dollars. The more expensive part is gathering and preparing the training data.

But is this really how users have experienced LLMs since ChatGPT was released in November 2022? Who has actually trained an AI to achieve their goals?

Towards AGI

AI solutions that were in common use before ChatGPT appeared, though largely unnoticed by the general public, were trained for a specific problem class, such as optimizing the pictures taken by a smartphone camera or machine translation.
