I'm using the fairydreaming/T5-branch; I'm not sure the current llama-cpp-python server supports T5.
Model-Q6_K-GGUF, Reference1
Select the AI model to use for chat
Define the AI assistant's personality and behavior
Maximum response length in tokens (higher = longer replies)
Creativity level (higher = more creative, lower = more focused)
Nucleus sampling threshold (higher = sample from a wider slice of the probability mass)
Limit vocabulary choices to the top K most likely tokens
Penalize repeated words (higher = less repetition)
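The settings above map naturally onto the fields of an OpenAI-style chat completion request, which is the interface the llama-cpp-python server exposes. Below is a minimal sketch of assembling such a request payload; the helper name `build_chat_request`, the default values, and the use of the "Model-Q6_K-GGUF" name as the model identifier are assumptions for illustration, not confirmed details of this setup.

```python
import json

def build_chat_request(
    model: str,
    system_prompt: str,
    user_message: str,
    max_tokens: int = 256,       # maximum response length (higher = longer replies)
    temperature: float = 0.7,    # creativity level (higher = more creative)
    top_p: float = 0.95,         # nucleus sampling threshold
    top_k: int = 40,             # limit vocabulary choices to top K tokens
    repeat_penalty: float = 1.1, # penalize repeated words (higher = less repetition)
) -> dict:
    """Assemble an OpenAI-style chat request dict (hypothetical helper).

    Field names mirror the sampling options described above; top_k and
    repeat_penalty are llama.cpp extensions, not standard OpenAI fields.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "repeat_penalty": repeat_penalty,
    }

# Placeholder model name taken from the note above; adjust to your server's
# loaded model alias.
payload = build_chat_request(
    "Model-Q6_K-GGUF",
    "You are a concise assistant.",
    "Hello!",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the server's `/v1/chat/completions` endpoint; whether the T5 branch honors all of these sampling fields is something to verify against that branch.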