> Even with same random seed?

Not the one who was asked, but I don't think the seed (or temperature, for that matter) affects speed. It only changes/randomizes the output; what matters is the (VRAM) memory bandwidth. Regarding my earlier text-to-text suggestion: if you run the benchmark, please report the result in tokens per second. The answer can indeed change between runs, but to get the same answer every time, reduce the temperature to the lowest possible value. A rough tokens-per-second estimate is bandwidth / LLM size = 1792 GB/s / 27 GB (for the suggested Qwen2.5-32B-Instruct-Q6_K.gguf) ≈ 66 tokens/s. Of course, context size (= amount of input text) matters, but as long as it's no more than a couple of sentences, the VRAM consumption won't exceed 32 GB.
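The estimate above can be sketched in a few lines of Python. This is just the back-of-the-envelope formula from the post (decode speed is roughly bandwidth divided by model size, since every generated token streams all weights from VRAM once); the function name is my own, and real throughput will be somewhat lower due to compute and KV-cache overhead.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode speed for a memory-bandwidth-bound LLM.

    Each generated token reads all model weights from VRAM once,
    so tokens/s ~= memory bandwidth / model size on disk.
    Ignores compute time and KV-cache traffic.
    """
    return bandwidth_gb_s / model_size_gb

# Numbers from the post: 1792 GB/s bandwidth, 27 GB Q6_K model file
print(round(estimate_tokens_per_second(1792, 27)))  # -> 66
```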
PS: Power scaling benchmark would be nice too.