AI briefing
The classical way to make Large Language Models (LLMs) reliably perform certain tasks is training. To train an LLM, a large number of inputs and the outputs expected from the AI are typically compiled into a data set. In a training run, this data is integrated into the AI's "knowledge" by recomputing the model's weights based on the new data. This step requires substantial computing power: high-end GPUs with sufficient video RAM.
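To make the idea of "inputs and expected outputs compiled into a data set" more concrete, here is a minimal sketch that serializes a few input/output pairs as JSON Lines, a format many fine-tuning tools accept. The example pairs and the field names `prompt` and `completion` are assumptions chosen for illustration; the exact schema depends on the training framework used.

```python
import json

# Hypothetical examples: each pair is an input and the output we expect the model to produce.
examples = [
    ("Translate to German: Good morning", "Guten Morgen"),
    ("Translate to German: Thank you", "Danke"),
]

def to_jsonl(pairs):
    """Serialize (input, expected output) pairs as JSON Lines, one record per line."""
    # The field names "prompt" and "completion" are an assumption; frameworks differ.
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}, ensure_ascii=False)
        for p, c in pairs
    )

dataset = to_jsonl(examples)
print(dataset)
```

A real training set would contain thousands of such records rather than two, which is why compiling it is laborious.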
To give an impression of the scale: realistic hardware for training could be a server with 192 GB of RAM and 5 to 10 NVIDIA A100 boards with 80 GB of video RAM each, each board costing around $15,000. Such servers can be rented as dedicated machines for significant monthly fees. However, since training is performed only occasionally and takes a few hours to a few days, it makes more sense to rent cloud hardware when it is needed and to use smaller, cheaper hardware for inference.
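The figures above can be turned into a quick back-of-the-envelope estimate. This sketch only multiplies the numbers from the text (80 GB and about $15,000 per A100 board); it ignores the rest of the server, so it is a rough lower bound on the GPU cost, not a price quote.

```python
# Figures taken from the text: 80 GB of video RAM and about $15,000 per A100 board.
VRAM_PER_BOARD_GB = 80
PRICE_PER_BOARD_USD = 15_000

def training_server_estimate(boards):
    """Return (total video RAM in GB, total board cost in USD) for a given board count."""
    return boards * VRAM_PER_BOARD_GB, boards * PRICE_PER_BOARD_USD

for boards in (5, 10):
    vram, cost = training_server_estimate(boards)
    print(f"{boards} boards: {vram} GB video RAM, about ${cost:,} in GPUs alone")
```

For 5 to 10 boards this gives 400 to 800 GB of video RAM and roughly $75,000 to $150,000 for the GPUs, which illustrates why renting such hardware for a short training run is attractive.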
Renting such a server for a training run is cheaper than it sounds, since it is only needed for a short time; the cost is typically in the hundreds of US dollars. The more expensive part is gathering and preparing the training data.
But is this really the way users have experienced LLMs since ChatGPT was released in November 2022? Who has actually trained an AI to achieve their goals?