
Trying out LLaMA-30B 4-bit weights (on a 3090 24G)


While the talk going around was at the level of ChatGPT API integration,

this post is about LLaMA.

To cut to the chase: it's the real deal.

You can spin it up casually on a 3090 at home, and it carries a distinct ChatGPT flavor.

Of course,

 

#1 https://rentry.org/llama-tard-v2#install-text-generation-webui

#2 4 bits quantization of LLaMa using GPTQ (https://github.com/qwopqwop200/GPTQ-for-LLaMa)

#3 https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model

I made heavy use of the collective intelligence chained through the links above.

 

LLaMA-7B int4 DDL: https://huggingface.co/decapoda-research/llama-7b-hf-int4/resolve/main
LLaMA-13B int4 DDL: https://huggingface.co/decapoda-research/llama-13b-hf-int4/tree/main
LLaMA-30B int4 DDL: https://huggingface.co/decapoda-research/llama-30b-hf-int4/tree/main
LLaMA-65B int4 DDL: https://huggingface.co/decapoda-research/llama-65b-hf-int4/tree/main
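If you'd rather script the download than click through, something like the following works with the `huggingface_hub` client. The repo ids come straight from the links above; the helper names and the destination path are mine, purely for illustration.

```python
# Hedged sketch (helper names are mine) of pulling one of the int4
# checkpoints listed above with the huggingface_hub client.

def repo_id_for(size: str) -> str:
    """Map a model size ('7b', '13b', '30b', '65b') to its HF repo id."""
    assert size in {"7b", "13b", "30b", "65b"}
    return f"decapoda-research/llama-{size}-hf-int4"

def download(size: str, dest: str = "models") -> str:
    # Imported lazily so repo_id_for() works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id_for(size), cache_dir=dest)
```

For example, `download("30b")` pulls roughly 16GB of weight shards, so make sure the disk (and the RAM/swap noted further down) can take it.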

 

A user named ZoidBB opened up the 7B and 13B models early on, which is what let people catch up so quickly.

 

A recent notice even read:

11-3-23 There's a new torrent version of the 4bit weights called "LLaMA-HFv2-4bit". The old "LLaMA-4bit" torrent may be fine. But if you have any issues with it, it's recommended to update to the new 4bit torrent or use the decapoda-research versions off of HuggingFace or produce your own 4bit weights.

which tells you how many people got stuck along the way. In any case, a quick first taste was at least possible... ^^;;;
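On "produce your own 4bit weights": the sketch below shows the shape of the artifact that quantization produces, int4 codes plus per-group scale/offset. To be clear, this is plain round-to-nearest quantization, not GPTQ itself (GPTQ additionally corrects rounding error using second-order information); the function names are mine.

```python
import numpy as np

def quantize_rtn_4bit(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with per-group scale/offset.

    A deliberately simplified stand-in for GPTQ's output format:
    uint8-held int4 codes plus a (scale, offset) pair per group.
    """
    w = w.reshape(-1, group_size)                 # one row per group
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                      # 16 levels -> 4 bits
    scale = np.where(scale == 0, 1.0, scale)      # guard flat groups
    q = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

w = np.random.randn(4096 * 128).astype(np.float32)
q, scale, lo = quantize_rtn_4bit(w)
err = np.abs(dequantize(q, scale, lo).ravel() - w).max()
# round-to-nearest bounds the error by half a quantization step
assert err <= scale.max() / 2 + 1e-6
```

The 8x shrink versus fp32 (4x versus fp16) is exactly why 30B fits on a 24G card at all.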

 

That was long-winded, but the conclusion is as above: it's the real deal. 7B definitely feels like a toy-chatbot (SimSimi-grade) AI, while 13B and 30B are genuinely usable.

They have that ChatGPT flavor. I couldn't run 65B, so I can't speak to it, but running 30B with max tokens set around 800 takes roughly 15 minutes, and the results are solid. It occupies about 20G of video memory and draws about 400W.

 

Sun Mar 12 18:24:44 2023      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
|100%   76C    P2   402W / 420W |  19310MiB / 24576MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

 

Because of the speed, I plan to keep testing with 13B.

 

When this comes up in work conversations in related circles, a lot of discussion follows. I hope more of this gets shared around.

 

A world is coming where a single million-won card... replaces quite a few things.

 

Update (my swap was at 50G while testing 13B; I should probably bump it up a bit more ^^;;)

4-bit Model Requirements for LLaMA

Model     | Model Size | Minimum Total VRAM | Card examples                                              | RAM/Swap to Load*
LLaMA-7B  | 3.5GB      | 6GB                | RTX 1660, 2060, AMD 5700xt, RTX 3050, 3060                 | 16 GB
LLaMA-13B | 6.5GB      | 10GB               | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080, A2000          | 32 GB
LLaMA-30B | 15.8GB     | 20GB               | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100  | 64 GB
LLaMA-65B | 31.2GB     | 40GB               | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada | 128 GB

*System RAM (not VRAM) required to load the model, in addition to having enough VRAM. NOT required to RUN the model. You can use swap space if you do not have enough RAM.
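The "Model Size" column is roughly parameters times 4 bits. A quick back-of-envelope check, using the published parameter counts from the LLaMA paper (6.7B, 13.0B, 32.5B, 65.2B); the raw figure lands a bit under the table because quantization metadata and file overhead aren't counted.

```python
def quantized_weight_gib(params_billion: float, bits: int = 4) -> float:
    """Back-of-envelope size of the quantized weights alone, in GiB.

    Ignores per-group scales/offsets and runtime overhead (activations,
    KV cache), so it comes in under the table's "Model Size" column.
    """
    return params_billion * 1e9 * bits / 8 / 2**30

for name, b in [("7B", 6.7), ("13B", 13.0), ("30B", 32.5), ("65B", 65.2)]:
    print(f"LLaMA-{name}: ~{quantized_weight_gib(b):.1f} GiB of int4 weights")
```

The same arithmetic explains why 30B leaves a few GB of headroom on a 24G card once the runtime's working memory is added on top.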

 

 

 

For the 13B result, I'm leaving a partial screenshot of the DeepL translation below, along with the original output. This is the result of feeding in only the first two paragraphs. Plenty of runs came out even better.

 

 

Getting started with machine learning model monitoring is easier now than ever. Several businesses have produced tools to simplify the process of monitoring machine learning systems in production. Reinventing the wheel is not necessary.

The tooling to leverage for monitoring your system depends on the specific items you want to monitor. It is worth browsing to find what best works for you before finalizing your decision. A few solutions you may wish to start with are listed below.

* Apache's Metron (open-source)
* Machine Metrics (commercial)
* MLOps (also Machine Metrics) (commercial)
* Prometheus Operator (open-source)
* OpenMetrics (open-source)

## Metrics

Metrics that need to be gathered and stored from a machine learning model can be generally broken down into 5 groups:

* Model Performance (A summary of the model's effectiveness)
* Evaluation Metrics (metrics created from model evaluation data)
* Validation (how the model behaves under a specific set of circumstances)
* Stability (how well the model will hold up over time)
* Interaction (how the output is affecting the input)

### Training Metrics

Model Performance

* Aggregate metrics: mean, median, max and min values
  * Model accuracy after training
  * Model stability after training
  * Training loss
  * Training loss after training
  * Evaluation metrics after training
  * Inference accuracy
* Metrics related to training data: data size, data type, data format, number of dimensions, and the data source

Evaluation Metrics:

* Mean squared error (MSE)
* Mean absolute error (MAE)
* Root mean squared error (RMSE)
* Root mean absolute error (RMAE)
* Predictions loss
* Actual loss
* Predictions accuracy
* Actual accuracy
* Predicted value
* Actual value

Validation:

* Error rate
* Cross-entropy
* Accuracy
* R2

Stability:

* Prediction stability
* Evaluation stability
* Overfitting

Interaction:

Interaction with training data is not generally monitored, but should be noted when applicable. Model interaction can be defined by how a trained model affects inputs (prediction) and how inputs affect outputs (inference)

### Production Metrics

Model Performance

* Aggregate metrics: mean, median, max and min values
  * Model accuracy
  * Model stability
  * Model performance error rate
  * Evaluation metrics

Evaluation Metrics:

* Mean absolute error (MAE)
  * Predictions accuracy
  * Actual accuracy

Validation:

* Cross-entropy
* Accuracy
* Overfitting

Stability:

* Prediction stability
* Evaluation stability
* Overfitting

Interaction:

Interaction with inputs is not generally monitored, but should be noted when applicable. Input interaction can be defined by how an input affects outputs.

## See also

* Computer programming: https://softwareengineering.stackexchange.com/
* Statistics: https://stats.stackexchange.com/
* R Programming: https://www.r-bloggers.com/
* Machine Learning: https://www.coursera.org/specializations/machine-learning

 

 

 

 

(Addendum 1)

You can find more AI-related tips in other posts

at the link below:

https://open-support.tistory.com/search/AI

 

 

 

 

 

Well then,

     sharing this.

 
