
Trying out LLaMA-30B 4-bit weights (on a 3090 24G)


While the talk going around was at the level of ChatGPT API integration,

this post is about LLaMA.

To cut to the chase: it's the real deal.

You can spin it up casually on a 3090 at home, and it carries a distinct ChatGPT flavor.

Of course,

 

#1 https://rentry.org/llama-tard-v2#install-text-generation-webui

#2 4 bits quantization of LLaMa using GPTQ (https://github.com/qwopqwop200/GPTQ-for-LLaMa)

#3 https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model

I made heavy use of the collective intelligence chained through the links above.

 

LLaMA-7B int4 DDL: https://huggingface.co/decapoda-research/llama-7b-hf-int4/resolve/main
LLaMA-13B int4 DDL: https://huggingface.co/decapoda-research/llama-13b-hf-int4/tree/main
LLaMA-30B int4 DDL: https://huggingface.co/decapoda-research/llama-30b-hf-int4/tree/main
LLaMA-65B int4 DDL: https://huggingface.co/decapoda-research/llama-65b-hf-int4/tree/main
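If you'd rather script the download than click through, something like the following works with the `huggingface_hub` client. The repo ids come straight from the links above; the helper names and the destination path are mine, purely for illustration.

```python
# Hedged sketch (helper names are mine) of pulling one of the int4
# checkpoints listed above with the huggingface_hub client.

def repo_id_for(size: str) -> str:
    """Map a model size ('7b', '13b', '30b', '65b') to its HF repo id."""
    assert size in {"7b", "13b", "30b", "65b"}
    return f"decapoda-research/llama-{size}-hf-int4"

def download(size: str, dest: str = "models") -> str:
    # Imported lazily so repo_id_for() works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id_for(size), cache_dir=dest)
```

For example, `download("30b")` pulls roughly 16GB of weight shards, so make sure the disk (and the RAM/swap noted further down) can take it.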

 

A user named ZoidBB opened up the 7B and 13B models early on, which is what let people catch up so quickly.

 

A recent notice even read:

11-3-23 There's a new torrent version of the 4bit weights called "LLaMA-HFv2-4bit". The old "LLaMA-4bit" torrent may be fine. But if you have any issues with it, it's recommended to update to the new 4bit torrent or use the decapoda-research versions off of HuggingFace or produce your own 4bit weights.

which tells you how many people got stuck along the way. In any case, a quick first taste was at least possible... ^^;;;
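On "produce your own 4bit weights": the sketch below shows the shape of the artifact that quantization produces, int4 codes plus per-group scale/offset. To be clear, this is plain round-to-nearest quantization, not GPTQ itself (GPTQ additionally corrects rounding error using second-order information); the function names are mine.

```python
import numpy as np

def quantize_rtn_4bit(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with per-group scale/offset.

    A deliberately simplified stand-in for GPTQ's output format:
    uint8-held int4 codes plus a (scale, offset) pair per group.
    """
    w = w.reshape(-1, group_size)                 # one row per group
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                      # 16 levels -> 4 bits
    scale = np.where(scale == 0, 1.0, scale)      # guard flat groups
    q = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

w = np.random.randn(4096 * 128).astype(np.float32)
q, scale, lo = quantize_rtn_4bit(w)
err = np.abs(dequantize(q, scale, lo).ravel() - w).max()
# round-to-nearest bounds the error by half a quantization step
assert err <= scale.max() / 2 + 1e-6
```

The 8x shrink versus fp32 (4x versus fp16) is exactly why 30B fits on a 24G card at all.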

 

That was long-winded, but the conclusion is as above: it's the real deal. 7B definitely feels like a toy-chatbot (SimSimi-grade) AI, while 13B and 30B are genuinely usable.

They have that ChatGPT flavor. I couldn't run 65B, so I can't speak to it, but running 30B with max tokens set around 800 takes roughly 15 minutes, and the results are solid. It occupies about 20G of video memory and draws about 400W.

 

Sun Mar 12 18:24:44 2023      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
|100%   76C    P2   402W / 420W |  19310MiB / 24576MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

 

Because of the speed, I plan to keep testing with 13B.

 

When this comes up in work conversations in related circles, a lot of discussion follows. I hope more of this gets shared around.

 

A world is coming where a single million-won card... replaces quite a few things.

 

Update (my swap was at 50G while testing 13B; I should probably bump it up a bit more ^^;;)

4-bit Model Requirements for LLaMA

Model     | Model Size | Minimum Total VRAM | Card examples                                              | RAM/Swap to Load*
LLaMA-7B  | 3.5GB      | 6GB                | RTX 1660, 2060, AMD 5700xt, RTX 3050, 3060                 | 16 GB
LLaMA-13B | 6.5GB      | 10GB               | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080, A2000          | 32 GB
LLaMA-30B | 15.8GB     | 20GB               | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100  | 64 GB
LLaMA-65B | 31.2GB     | 40GB               | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada | 128 GB

*System RAM (not VRAM) required to load the model, in addition to having enough VRAM. NOT required to RUN the model. You can use swap space if you do not have enough RAM.
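The "Model Size" column is roughly parameters times 4 bits. A quick back-of-envelope check, using the published parameter counts from the LLaMA paper (6.7B, 13.0B, 32.5B, 65.2B); the raw figure lands a bit under the table because quantization metadata and file overhead aren't counted.

```python
def quantized_weight_gib(params_billion: float, bits: int = 4) -> float:
    """Back-of-envelope size of the quantized weights alone, in GiB.

    Ignores per-group scales/offsets and runtime overhead (activations,
    KV cache), so it comes in under the table's "Model Size" column.
    """
    return params_billion * 1e9 * bits / 8 / 2**30

for name, b in [("7B", 6.7), ("13B", 13.0), ("30B", 32.5), ("65B", 65.2)]:
    print(f"LLaMA-{name}: ~{quantized_weight_gib(b):.1f} GiB of int4 weights")
```

The same arithmetic explains why 30B leaves a few GB of headroom on a 24G card once the runtime's working memory is added on top.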

 

 

 

For the 13B result, I'm leaving a partial screenshot of the DeepL translation below, along with the original output. This is the result of feeding in only the first two paragraphs. Plenty of runs came out even better.

 

 

Getting started with machine learning model monitoring is easier now than ever. Several businesses have produced tools to simplify the process of monitoring machine learning systems in production. Reinventing the wheel is not necessary.

The tooling to leverage for monitoring your system depends on the specific items you want to monitor. It is worth browsing to find what best works for you before finalizing your decision. A few solutions you may wish to start with are listed below.

* Apache's Metron (open-source)
* Machine Metrics (commercial)
* MLOps (also Machine Metrics) (commercial)
* Prometheus Operator (open-source)
* OpenMetrics (open-source)

## Metrics

Metrics that need to be gathered and stored from a machine learning model can be generally broken down into 5 groups:

* Model Performance (A summary of the model's effectiveness)
* Evaluation Metrics (metrics created from model evaluation data)
* Validation (how the model behaves under a specific set of circumstances)
* Stability (how well the model will hold up over time)
* Interaction (how the output is affecting the input)

### Training Metrics

Model Performance

* Aggregate metrics: mean, median, max and min values
  * Model accuracy after training
  * Model stability after training
  * Training loss
  * Training loss after training
  * Evaluation metrics after training
  * Inference accuracy
* Metrics related to training data: data size, data type, data format, number of dimensions, and the data source

Evaluation Metrics:

* Mean squared error (MSE)
* Mean absolute error (MAE)
* Root mean squared error (RMSE)
* Root mean absolute error (RMAE)
* Predictions loss
* Actual loss
* Predictions accuracy
* Actual accuracy
* Predicted value
* Actual value

Validation:

* Error rate
* Cross-entropy
* Accuracy
* R2

Stability:

* Prediction stability
* Evaluation stability
* Overfitting

Interaction:

Interaction with training data is not generally monitored, but should be noted when applicable. Model interaction can be defined by how a trained model affects inputs (prediction) and how inputs affect outputs (inference)

### Production Metrics

Model Performance

* Aggregate metrics: mean, median, max and min values
  * Model accuracy
  * Model stability
  * Model performance error rate
  * Evaluation metrics

Evaluation Metrics:

* Mean absolute error (MAE)
  * Predictions accuracy
  * Actual accuracy

Validation:

* Cross-entropy
* Accuracy
* Overfitting

Stability:

* Prediction stability
* Evaluation stability
* Overfitting

Interaction:

Interaction with inputs is not generally monitored, but should be noted when applicable. Input interaction can be defined by how an input affects outputs.

## See also

* Computer programming: https://softwareengineering.stackexchange.com/
* Statistics: https://stats.stackexchange.com/
* R Programming: https://www.r-bloggers.com/
* Machine Learning: https://www.coursera.org/specializations/machine-learning

 

 

 

 

(Addendum 1)

You can find more AI-related tips in other posts

at the link below:

https://open-support.tistory.com/search/AI

 

 

 

 

 

Well then,

     sharing this.

 
