Perplexity
- Build:
$ make LLAMA_CUBLAS=1
- Convert:
$ python convert.py /models/42dot_LLM-SFT-1.3B-gguf/ --vocab-type bpe
$ python convert-hf-to-gguf.py /models/gemma-7b/
- All experiments were run in float16.
- In the results below, #N is the number of evaluation chunks/windows and ETA the tool's estimated runtime; the Perplexity: tensor(...) lines are the matching HuggingFace (PyTorch) measurements.
42dot_LLM-PLM-1.3B
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/42dot_LLM-PLM-1.3B/ggml-model-f16.gguf -f wiki.test.raw
Aligned added_token.json with vocab.json, then converted with --vocab-type bpe. If the flag is omitted, the conversion crashes with a core dump, so take care. One plausible way to do the alignment is sketched below.
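The alignment step is not spelled out in these notes; this is a minimal sketch of one way it could be done, assuming the problem is added_token.json entries missing from vocab.json. The merge logic (copying tokens over at their recorded ids) is an assumption, not the recorded procedure.

import json

model_dir = "/models/42dot_LLM-SFT-1.3B-gguf"

with open(f"{model_dir}/vocab.json") as f:
    vocab = json.load(f)
with open(f"{model_dir}/added_token.json") as f:
    added = json.load(f)  # assumed to be a {token: id} mapping

# Assumption: add any tokens missing from vocab.json at their recorded
# ids so convert.py sees a complete BPE vocab.
for tok, idx in added.items():
    vocab.setdefault(tok, idx)

with open(f"{model_dir}/vocab.json", "w") as f:
    json.dump(vocab, f, ensure_ascii=False)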
- wiki.test.raw #569 ETA 10.37 min
- Final estimate: PPL = 12.7406 +/- 0.09403
- #142 Perplexity: tensor(9.5208, device='cuda:0')
- ai-book.raw (Korean, chapters 1-5) #118 ETA 1.87 min
- Final estimate: PPL = 11.1382 +/- 0.16895
- #30 Perplexity: tensor(9.1947, device='cuda:0')
- bfloat16: Perplexity: tensor(9.1999, device='cuda:0')
- float32 (much slower): Perplexity: tensor(9.1944, device='cuda:0')
# Load the Korean evaluation text for the HF-side runs
f = open("./ai-book.raw")
lines = f.readlines()
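The Perplexity: tensor(...) numbers throughout these notes are the HuggingFace (PyTorch) side of the comparison. Below is a minimal sliding-window sketch in the style of the HF perplexity guide; the repo id, context length, and stride are assumptions, not values recorded here.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "42dot/42dot_LLM-PLM-1.3B"   # assumed HF repo id
device = "cuda:0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16  # notes say all runs use float16
).to(device)
model.eval()

text = open("./ai-book.raw").read()
encodings = tokenizer(text, return_tensors="pt")

max_length = 2048  # assumed context window
stride = 512       # assumed window stride
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end           # score only tokens new to this window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100    # mask overlapping context from the loss

    with torch.no_grad():
        out = model(input_ids, labels=target_ids)
        nlls.append(out.loss * trg_len)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print("Perplexity:", ppl)  # e.g. Perplexity: tensor(9.5208, device='cuda:0')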
42dot_LLM-SFT-1.3B
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/42dot_LLM-SFT-1.3B-gguf/ggml-model-F16.gguf -f wiki.test.raw
- wiki.test.raw #569 ETA 6.42 min
- Final estimate: PPL = 13.2594 +/- 0.09938
- #142 Perplexity: tensor(10.3943, device='cuda:0')
- ai-book.raw #118 ETA 1.75 min
- Final estimate: PPL = 12.6709 +/- 0.20299
- #30 Perplexity: tensor(10.2340, device='cuda:0')
Llama-2-7b
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/Llama-2-7b-gguf/ggml-model-f16.gguf -f wiki.test.raw
- wiki.test.raw #655 ETA 28.80 min
- Final estimate: PPL = 5.7984 +/- 0.03236
- #164 Perplexity: tensor(4.9073, device='cuda:0')
- ai-book.raw #371 ETA 15.38 min
- Final estimate: PPL = 2.7178 +/- 0.01404
- #94 Perplexity: tensor(2.4290, device='cuda:0')
Llama-2-7b-chat
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/Llama-2-7b-chat-gguf/ggml-model-f16.gguf -f wiki.test.raw
- wiki.test.raw #655 ETA 26.85 min
- Final estimate: PPL = 7.6338 +/- 0.05164
- #164 Perplexity: tensor(6.2608, device='cuda:0')
- ai-book.raw #371 ETA 15.82 min
- Final estimate: PPL = 3.9855 +/- 0.03160
- #94 Perplexity: tensor(3.4516, device='cuda:0')
gemma-7b
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/gemma-7b/ggml-model-f16.gguf -f wiki.test.raw
- wiki.test.raw #569 ETA 25.75 min
- Final estimate: PPL = 7.8009 +/- 0.05042
- #142 Perplexity: tensor(5.9794, device='cuda:0')
- ai-book.raw #172 ETA 8.85 min
- Final estimate: PPL = 6.5314 +/- 0.07056
- #44 Perplexity: tensor(5.2072, device='cuda:0')
gemma-7b-it
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/gemma-7b-it/ggml-model-f16.gguf -f wiki.test.raw
- wiki.test.raw #569 ETA 18.68 min
- Final estimate: PPL = 28.2536 +/- 0.31980
- #142 Perplexity: tensor(18.9067, device='cuda:0')
- ai-book.raw #172 ETA 7.42 min
- Final estimate: PPL = 48.1784 +/- 1.16831
- #44 Perplexity: tensor(31.1553, device='cuda:0')
gemma-ko-7b
$ CUDA_VISIBLE_DEVICES=2 ./perplexity -m /models/gemma-ko-7b/ggml-model-f32.gguf -f wiki.test.raw
In float16 the evaluation produces NaN and cannot proceed (underflow?). The GGUF run therefore uses float32, and the HF run uses bfloat16 (see the loading sketch after the results).
- wiki.test.raw #569 ETA 30.58 min
- Final estimate: PPL = 10.1615 +/- 0.06924
- #142 Perplexity: tensor(8.5561, device='cuda:0')
- ai-book.raw #172 ETA 12.13 min
- Final estimate: PPL = 4.7414 +/- 0.04712
- #44 Perplexity: tensor(4.1694, device='cuda:0')
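A minimal sketch of the bfloat16 loading mentioned above; the HF repo id is an assumption.

import torch
from transformers import AutoModelForCausalLM

# bfloat16 keeps float32's exponent range, so values that overflow or
# underflow to NaN in float16 stay finite
model = AutoModelForCausalLM.from_pretrained(
    "beomi/gemma-ko-7b",          # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda:0")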
Llama 7B PPL by quantization type
Type | Size | PPL |
---|---|---|
Llama 7B F32 | 25.10 GiB | 7.4924 |
Llama 7B F16 | 12.55 GiB | 7.4924 |
Llama 7B Q8_0 | 6.67 GiB | 7.4933 |
Llama 7B Q6_K | 5.15 GiB | 7.4950 |
Llama 7B Q5_1 | 4.72 GiB | 7.5084 |
Llama 7B Q5_K_M | 4.45 GiB | 7.5099 |
Llama 7B Q5_K_S | 4.33 GiB | 7.5180 |
Llama 7B Q4_1 | 3.95 GiB | 7.5913 |
Llama 7B Q4_K_M | 3.80 GiB | 7.5692 |
Llama 7B Q4_K_S | 3.59 GiB | 7.6066 |
Llama 7B Q4_0 | 3.57 GiB | 7.6261 |
Llama 7B IQ4_NL | 3.56 GiB | 7.5392 |
Llama 7B IQ4_XS | 3.37 GiB | 7.5231 |
Llama 7B Q3_K_L | 3.35 GiB | 7.6491 |
Llama 7B Q3_K_M | 3.07 GiB | 7.6854 |
Llama 7B IQ3_M | 2.90 GiB | 7.7695 |
Llama 7B IQ3_S | 2.75 GiB | 7.7904 |
Llama 7B Q3_K_S | 2.75 GiB | 8.0321 |
Llama 7B IQ3_XS | 2.60 GiB | 7.8787 |
Llama 7B IQ3_XXS | 2.41 GiB | 8.2039 |
Llama 7B Q2_K_M | 2.36 GiB | 8.6501 |
Llama 7B IQ2_M | 2.20 GiB | 8.6002 |
Llama 7B Q2_K_S | 2.16 GiB | 9.1756 |
Llama 7B IQ2_S | 2.05 GiB | 9.1459 |
Llama 7B IQ2_XS | 1.89 GiB | 9.7873 |
Llama 7B IQ2_XXS | 1.73 GiB | 11.0326 |
Llama 7B IQ1_S | 1.42 GiB | 28.7926 |
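For reference, each quantized row above corresponds to a GGUF produced with llama.cpp's quantize tool from the f16 model; the file names here are illustrative:
$ ./quantize ggml-model-f16.gguf ggml-model-Q4_K_M.gguf Q4_K_M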
Last Modified: 2024/03/17 14:53:15