Custom APIs & Local LLMs
Connect Meetily to LM Studio, Open WebUI, Ollama, vLLM, and any OpenAI-compatible endpoint for 100% local meeting summarization.
Last updated: May 22, 2025
TL;DR
Meetily can send meeting transcripts to any OpenAI-compatible API for summarization. This means you can use a locally-running LLM server — like LM Studio, Ollama, vLLM, or Open WebUI — instead of cloud APIs. Everything stays on your machine. The Custom Server (OpenAI) provider is a Pro feature; Ollama and Built-in AI are free.
How Custom Endpoints Work
Meetily's summarization system sends your meeting transcript to an LLM via the OpenAI Chat Completions API format. When you select Custom Server (OpenAI) as your provider, Meetily will:
- Take your configured endpoint URL (e.g., http://localhost:1234/v1)
- Append /chat/completions to it
- Send a POST request with the transcript as a chat message
- Parse the response and display the summary
This means any server that implements the /v1/chat/completions endpoint will work with Meetily.
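Concretely, the request is an ordinary Chat Completions call. The sketch below is illustrative rather than Meetily's exact code; the base URL, model name, and transcript text are placeholders you would substitute with your own:

```shell
# Placeholder endpoint and model; substitute your own server's values.
BASE_URL="http://localhost:1234/v1"
MODEL="llama-3.1-8b-instruct"

# The transcript travels as a regular chat message in the
# OpenAI Chat Completions request body.
PAYLOAD=$(cat <<EOF
{
  "model": "${MODEL}",
  "messages": [
    {"role": "system", "content": "Summarize this meeting transcript."},
    {"role": "user", "content": "Alice: Let's ship Friday. Bob: Agreed."}
  ]
}
EOF
)

# Meetily appends /chat/completions to the configured base URL.
curl -s "${BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -d "${PAYLOAD}" \
  || echo "Is the server running at ${BASE_URL}?"
```

Any server that answers this request with a standard Chat Completions JSON response will work.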
Supported Providers
Meetily has built-in support for these providers:
| Provider | Type | License Required |
|---|---|---|
| Built-in AI | Offline, no API needed | Community (Free) |
| Ollama | Local LLM server | Community (Free) |
| OpenAI | Cloud (GPT-4, GPT-4o) | Community (Free) |
| Claude | Cloud (Anthropic) | Community (Free) |
| Groq | Cloud (fast inference) | Community (Free) |
| OpenRouter | Cloud (model aggregator) | Community (Free) |
| Custom Server (OpenAI) | Any OpenAI-compatible API | Pro |
Pro Feature
The Custom Server (OpenAI) provider requires a Meetily Pro license ($10/user/month billed annually). If you don't have Pro, you can still use Ollama or Built-in AI for local processing at no cost.
Configuration Fields
When you select Custom Server (OpenAI), these fields are available in Settings > Model Settings:
| Field | Required | Description |
|---|---|---|
| Endpoint URL | Yes | Base URL of the OpenAI-compatible server (must include /v1) |
| Model Name | Yes | Model identifier as the server knows it (e.g., llama-3.1-8b-instruct) |
| API Key | No | Authentication key — only needed if your server requires it |
| Max Tokens | No | Maximum response length (1–32,000). Leave empty for server default. |
| Temperature | No | Randomness (0.0–2.0). Lower = more focused, higher = more creative. |
| Top P | No | Nucleus sampling (0.0–1.0). Controls output diversity. |
Important: Include /v1 in Your Endpoint URL
Common Mistake
Meetily appends /chat/completions to whatever URL you provide. Most OpenAI-compatible servers expect the full path to be /v1/chat/completions, so you must include /v1 at the end of your endpoint URL.
Correct:
http://localhost:1234/v1
Wrong:
http://localhost:1234
The final request URL will be: http://localhost:1234/v1/chat/completions
If you're getting 404 Not Found errors, this is almost certainly the cause.
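The difference is easiest to see by building the final request URL both ways (port 1234 here is just an example):

```shell
# Meetily concatenates /chat/completions onto whatever URL you enter.
WITH_V1="http://localhost:1234/v1"
WITHOUT_V1="http://localhost:1234"

echo "${WITH_V1}/chat/completions"     # correct: path ends in /v1/chat/completions
echo "${WITHOUT_V1}/chat/completions"  # missing /v1: most servers return 404
```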
LM Studio
LM Studio is a desktop app for running local LLMs with a built-in OpenAI-compatible server. It's the easiest way to get started.
Step 1: Set Up the LM Studio Server
- Download and install LM Studio
- Download a model from the Discover tab (e.g., Llama 3, Mistral, Phi-3)
- Navigate to the Local Server tab (developer tab)
- Select and load your model from the Model dropdown
- Click Start Server — it runs on http://localhost:1234 by default
Step 2: Configure Meetily
- Open Meetily → Settings → Model Settings
- Select Custom Server (OpenAI) from the provider dropdown
- Set the endpoint URL:
http://localhost:1234/v1
- Enter the model name exactly as shown in LM Studio (e.g., llama-3.2-3b-instruct)
- API Key: Leave empty or enter any placeholder like lm-studio — LM Studio does not require authentication for local connections
- Click Save
LM Studio Tips
- The /v1 path is required — LM Studio uses the OpenAI REST API format
- For meeting summarization, models with 7B+ parameters work best
- Make sure the server is running before starting a meeting in Meetily
- If you change models in LM Studio, update the model name in Meetily settings too
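To see the exact identifier LM Studio expects (a common source of mismatches), you can query its models route. This assumes the default port 1234:

```shell
# GET /v1/models lists the identifiers the server accepts in the
# "model" field; copy one verbatim into Meetily's Model Name.
LMSTUDIO_URL="http://localhost:1234/v1"
curl -s "${LMSTUDIO_URL}/models" \
  || echo "Is the LM Studio server started?"
```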
Ollama
Ollama is a lightweight local LLM runner. Meetily has a dedicated Ollama provider that works with the free Community Edition — you do not need to use the Custom Server option.
Free — No Pro License Needed
Ollama has a dedicated provider in Meetily and works with the free Community Edition. Meetily will auto-detect available models from your Ollama server.
Step 1: Install & Run Ollama
- Install Ollama from ollama.com
- Pull a model: ollama pull llama3.1
- Ollama runs automatically on http://localhost:11434
Step 2: Configure Meetily
- Open Meetily → Settings → Model Settings
- Select Ollama from the provider dropdown (not Custom Server)
- Meetily will auto-detect available models from your Ollama server
- Select your model and save
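If auto-detection comes back empty, you can confirm Ollama is up and see which models it has pulled. This uses Ollama's native model-listing route on the default port:

```shell
# /api/tags is Ollama's model-listing endpoint; it returns JSON
# describing every model you have pulled locally.
OLLAMA_URL="http://localhost:11434"
curl -s "${OLLAMA_URL}/api/tags" \
  || echo "Is Ollama running?"
```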
Open WebUI
Open WebUI is a self-hosted web interface for LLMs that also provides an OpenAI-compatible API endpoint.
Step 1: Set Up Open WebUI
- Install Open WebUI from openwebui.com
- Start with Docker: docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:main
- Open WebUI provides an OpenAI-compatible endpoint at http://localhost:3000/api
- Create an account in the web interface and generate an API key from Settings → Account
Step 2: Configure Meetily
- Select Custom Server (OpenAI) in Meetily settings
- Set the endpoint URL:
http://localhost:3000/api/v1
- Enter your Open WebUI API key (required for Open WebUI)
- Enter the model name available on your server
- Save
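A quick way to verify the endpoint and key together is a minimal chat request. In the sketch below, sk-xxxx and llama3.1 are placeholders for your actual API key and a model available on your server:

```shell
# Open WebUI requires a Bearer token even for local connections.
OWUI_BASE="http://localhost:3000/api/v1"
curl -s "${OWUI_BASE}/chat/completions" \
  -H "Authorization: Bearer sk-xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "ping"}]}' \
  || echo "Check that Open WebUI is running and the key is valid."
```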
vLLM
vLLM is a high-throughput inference engine with native OpenAI-compatible API support. Best for users with dedicated GPU hardware.
Step 1: Start the vLLM Server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000

The server starts at http://localhost:8000 with an OpenAI-compatible API.
Step 2: Configure Meetily
| Field | Value |
|---|---|
| Endpoint | http://localhost:8000/v1 |
| Model | meta-llama/Llama-3.1-8B-Instruct |
| API Key | (leave empty) |
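Before pointing Meetily at the server, you can confirm it answers a Chat Completions request (the prompt here is just a placeholder):

```shell
# vLLM serves the OpenAI-compatible API natively; the "model" field
# must match the --model flag used to launch the server.
VLLM_BASE="http://localhost:8000/v1"
curl -s "${VLLM_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Say hello"}]}' \
  || echo "Is the vLLM server running?"
```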
Other Compatible Tools
Any tool that provides an OpenAI-compatible /v1/chat/completions endpoint will work with Meetily:
Jan.ai
Desktop app for running LLMs locally with a built-in server.
Endpoint: http://localhost:1337/v1
API Key: (not required)
LocalAI
Drop-in OpenAI API replacement running locally with Docker.
Endpoint: http://localhost:8080/v1
API Key: (not required)
Text Generation WebUI (oobabooga)
Feature-rich web UI for running LLMs. Enable the OpenAI extension first.
Endpoint: http://localhost:5000/v1
API Key: (not required)
llama.cpp Server
Lightweight C++ inference server with OpenAI-compatible mode.
Endpoint: http://localhost:8080/v1
API Key: (not required)
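The same sanity check works for any of these tools: query the models route under the base URL you plan to give Meetily. The port below is a placeholder; substitute the one for your tool from the list above:

```shell
# Any OpenAI-compatible server should answer GET <base>/models with
# JSON; if this returns 404, the base URL (often a missing /v1) is wrong.
BASE="http://localhost:8080/v1"
curl -s "${BASE}/models" \
  || echo "No server reachable at ${BASE}"
```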
Quick Reference: All Endpoint URLs
| Tool | Default Endpoint URL | API Key |
|---|---|---|
| LM Studio | http://localhost:1234/v1 | Not required |
| Ollama* | http://localhost:11434 | Not required |
| Open WebUI | http://localhost:3000/api/v1 | Required |
| vLLM | http://localhost:8000/v1 | Not required |
| Jan.ai | http://localhost:1337/v1 | Not required |
| LocalAI | http://localhost:8080/v1 | Not required |
| llama.cpp | http://localhost:8080/v1 | Not required |
| text-gen-webui | http://localhost:5000/v1 | Not required |
*Ollama has a dedicated provider in Meetily — use the "Ollama" option instead of "Custom Server"
Recommended Models for Meeting Summarization
The quality of your meeting summaries depends on the model you use. Here are community-tested recommendations by hardware tier:
8GB+ RAM (entry-level)
- Llama 3.2 3B Instruct
- Phi-3 Mini (3.8B)
- Gemma 2 2B
16GB+ RAM (recommended)
- Llama 3.1 8B Instruct — best balance of speed and quality
- Mistral 7B Instruct
- Qwen 2.5 7B Instruct
32GB+ RAM (best quality)
- Llama 3.1 70B (Q4 quantization)
- Mixtral 8x7B
- DeepSeek V2 Lite
Model Size Tip
For meeting summarization, instruction-tuned models with 7B+ parameters give the best results. Smaller models (3B and below) may produce shorter or less detailed summaries. If you're on limited hardware, quantized versions (Q4, Q5) offer a good quality-to-speed tradeoff.
Troubleshooting
"Custom OpenAI endpoint not configured"
Make sure you have entered both the endpoint URL and model name in Settings → Model Settings. Both fields are required.
Connection errors or timeouts
- Verify your LLM server is running
- Try opening the endpoint URL in your browser — e.g., http://localhost:1234/v1/models should return a JSON response
- Check that no firewall is blocking the port
Summaries are empty or malformed
Your model may be too small for meeting summarization. Try a larger model (7B+ parameters). Also check the temperature setting — values above 1.0 can produce inconsistent output.
"Custom Server (OpenAI)" option is disabled / grayed out
This feature requires a Meetily Pro license. If you have a license, ensure it is activated and not expired. Alternatively, use the free Ollama or Built-in AI providers.
LM Studio works but model name is wrong
The model name in Meetily must match exactly what LM Studio shows. Check the model identifier in LM Studio's Local Server tab — it's displayed near the top when the model is loaded.
404 Not Found errors
You likely forgot to include /v1 in the endpoint URL. Meetily appends /chat/completions to your URL, so the final path must be /v1/chat/completions. Change your endpoint from http://localhost:1234 to http://localhost:1234/v1.