Ollama is an open-source application that lets users run large language models (LLMs) directly on their local machines. Because prompts and responses never leave hardware you control, this approach improves data privacy and can reduce latency, which makes it particularly useful for developers, researchers, and businesses concerned with data security.
Reach out to Brent Fife or another member of the AI team.
You'll need to grab an authorization token from your user account on the site itself and send it as a Bearer token with each request.
https://gpt.clarityhosts.com/ollama/api/generate
{
  "model": "llama3.1:8b",
  "prompt": "Hey whats up?",
  "stream": false
}
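The request above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive client: the function names `build_generate_request` and `generate` are made up for illustration, the token is assumed to be the one copied from the site, and the sketch assumes the server returns Ollama's usual non-streaming reply, a single JSON object with the text in its `response` field.

```python
import json
import urllib.request

# URL taken from the text above.
GENERATE_URL = "https://gpt.clarityhosts.com/ollama/api/generate"


def build_generate_request(token, prompt, model="llama3.1:8b"):
    """Build (but do not send) a POST request for the generate endpoint.

    Hypothetical helper: `token` is the authorization token from the site.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(token, prompt, model="llama3.1:8b", timeout=120):
    """Send the request and return the generated text."""
    request = build_generate_request(token, prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: with "stream": false the reply is one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

Calling `generate(my_token, "Hey whats up?")` would then return the model's answer as a string. Keeping request construction separate from sending makes it easy to inspect the headers and body without touching the network.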
https://gpt.clarityhosts.com/ollama/api/chat
{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "stream": false
}
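The chat endpoint differs from generate in that it takes a list of messages rather than a single prompt, so a conversation is continued by resending the history. The following is a rough Python sketch under stated assumptions: `build_chat_request` and `chat` are hypothetical helper names, the token is the one copied from the site, and the reply is assumed to follow Ollama's non-streaming chat format, where the assistant's turn is returned under `message`.

```python
import json
import urllib.request

# URL taken from the text above.
CHAT_URL = "https://gpt.clarityhosts.com/ollama/api/chat"


def build_chat_request(token, messages, model="llama3.2"):
    """Build (but do not send) a POST request for the chat endpoint.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    """
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(token, messages, model="llama3.2", timeout=120):
    """Send the conversation and return the assistant's reply message."""
    request = build_chat_request(token, messages, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the non-streaming reply carries the assistant turn
    # as a {"role": "assistant", "content": ...} dict under "message".
    return body["message"]
```

To ask a follow-up question, append the returned message and a new `{"role": "user", ...}` entry to the same list and call `chat` again; the model only sees what is in `messages`.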
There are more endpoints, and each endpoint supports far more options than shown above. For further information, see the Ollama API documentation: ollama/docs/api.md at main · ollama/ollama
Ollama runs its models with llama.cpp. llama.cpp is an open-source library designed to run large language models (LLMs) like Meta’s LLaMA on local hardware, including systems without powerful GPUs. Written in pure C/C++, it has minimal dependencies, making it lightweight. It supports quantization, which reduces model size and memory use and speeds up inference, and it includes GPU acceleration options. llama.cpp also enables hybrid CPU+GPU inference, so models that exceed VRAM capacity can still run, with part of the model kept on the CPU.