Custom APIs & Local LLMs

Connect Meetily to LM Studio, Open WebUI, Ollama, vLLM, and any OpenAI-compatible endpoint for 100% local meeting summarization.

Last updated: May 22, 2025

TL;DR

Meetily can send meeting transcripts to any OpenAI-compatible API for summarization. This means you can use a locally-running LLM server — like LM Studio, Ollama, vLLM, or Open WebUI — instead of cloud APIs. Everything stays on your machine. The Custom Server (OpenAI) provider is a Pro feature; Ollama and Built-in AI are free.

How Custom Endpoints Work

Meetily's summarization system sends your meeting transcript to an LLM via the OpenAI Chat Completions API format. When you select Custom Server (OpenAI) as your provider, Meetily will:

  1. Take your configured endpoint URL (e.g., http://localhost:1234/v1)
  2. Append /chat/completions to it
  3. Send a POST request with the transcript as a chat message
  4. Parse the response and display the summary

This means any server that implements the /v1/chat/completions endpoint will work with Meetily.
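The four steps above can be sketched in Python. This is a minimal illustration of the request flow, not Meetily's actual code; the endpoint, model name, and system prompt are placeholders:

```python
import json
import urllib.request

def build_request(endpoint: str, model: str, transcript: str):
    """Steps 1-3: build the final URL and the Chat Completions payload."""
    url = endpoint.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize this meeting transcript."},
            {"role": "user", "content": transcript},
        ],
    }
    return url, payload

def summarize(endpoint: str, model: str, transcript: str, api_key=None) -> str:
    """Step 4: send the POST request and return the summary text."""
    url, payload = build_request(endpoint, model, transcript)
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(url, data=json.dumps(payload).encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

url, _ = build_request("http://localhost:1234/v1", "llama-3.1-8b-instruct", "…")
print(url)  # http://localhost:1234/v1/chat/completions
```

Any server that answers this request shape — LM Studio, vLLM, llama.cpp, and the rest — is interchangeable from Meetily's point of view.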

Supported Providers

Meetily has built-in support for these providers:

Provider                 Type                        License Required
Built-in AI              Offline, no API needed      Community (Free)
Ollama                   Local LLM server            Community (Free)
OpenAI                   Cloud (GPT-4, GPT-4o)       Community (Free)
Claude                   Cloud (Anthropic)           Community (Free)
Groq                     Cloud (fast inference)      Community (Free)
OpenRouter               Cloud (model aggregator)    Community (Free)
Custom Server (OpenAI)   Any OpenAI-compatible API   Pro

Pro Feature

The Custom Server (OpenAI) provider requires a Meetily Pro license ($10/user/month billed annually). If you don't have Pro, you can still use Ollama or Built-in AI for local processing at no cost.

Configuration Fields

When you select Custom Server (OpenAI), these fields are available in Settings > Model Settings:

Field         Required   Description
Endpoint URL  Yes        Base URL of the OpenAI-compatible server (must include /v1)
Model Name    Yes        Model identifier as the server knows it (e.g., llama-3.1-8b-instruct)
API Key       No         Authentication key — only needed if your server requires it
Max Tokens    No         Maximum response length (1–32,000). Leave empty for server default.
Temperature   No         Randomness (0.0–2.0). Lower = more focused, higher = more creative.
Top P         No         Nucleus sampling (0.0–1.0). Controls output diversity.
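These fields map directly onto standard Chat Completions request parameters. A request body using all of them might look like the following sketch; the values are illustrative placeholders, not Meetily defaults:

```python
# Illustrative Chat Completions payload. Values are placeholders,
# not Meetily's defaults.
payload = {
    "model": "llama-3.1-8b-instruct",  # Model Name
    "messages": [
        {"role": "user", "content": "Summarize this meeting transcript: ..."},
    ],
    "max_tokens": 1024,   # Max Tokens: cap on response length
    "temperature": 0.3,   # Temperature: low value keeps summaries focused
    "top_p": 0.9,         # Top P: nucleus sampling cutoff
}
```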

Important: Include /v1 in Your Endpoint URL

Common Mistake

Meetily appends /chat/completions to whatever URL you provide. Most OpenAI-compatible servers expect the full path to be /v1/chat/completions, so you must include /v1 at the end of your endpoint URL.

Correct:

http://localhost:1234/v1

Wrong:

http://localhost:1234

The final request URL will be: http://localhost:1234/v1/chat/completions

If you're getting 404 Not Found errors, this is almost certainly the cause.
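Two lines of Python make the failure mode concrete. The function mirrors the documented append behavior (plain string concatenation, not Meetily's actual implementation):

```python
def final_url(endpoint: str) -> str:
    # Mirrors the documented behavior: /chat/completions is appended verbatim
    return endpoint.rstrip("/") + "/chat/completions"

print(final_url("http://localhost:1234/v1"))  # .../v1/chat/completions -> works
print(final_url("http://localhost:1234"))     # .../chat/completions -> 404 on most servers
```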


LM Studio

LM Studio is a desktop app for running local LLMs with a built-in OpenAI-compatible server. It's the easiest way to get started.

Step 1: Set Up the LM Studio Server

  1. Download and install LM Studio from lmstudio.ai
  2. Download a model from the Discover tab (e.g., Llama 3, Mistral, Phi-3)
  3. Navigate to the Local Server tab (developer tab)
  4. Select and load your model from the Model dropdown
  5. Click Start Server — it runs on http://localhost:1234 by default

Step 2: Configure Meetily

  1. Open Meetily → Settings → Model Settings
  2. Select Custom Server (OpenAI) from the provider dropdown
  3. Set the endpoint URL:
http://localhost:1234/v1
  4. Enter the model name exactly as shown in LM Studio (e.g., llama-3.2-3b-instruct)
  5. API Key: Leave empty or enter any placeholder like lm-studio — LM Studio does not require authentication for local connections
  6. Click Save

LM Studio Tips

  • The /v1 path is required — LM Studio uses the OpenAI REST API format
  • For meeting summarization, models with 7B+ parameters work best
  • Make sure the server is running before starting a meeting in Meetily
  • If you change models in LM Studio, update the model name in Meetily settings too

Ollama

Ollama is a lightweight local LLM runner. Meetily has a dedicated Ollama provider that works with the free Community Edition — you do not need to use the Custom Server option.

Free — No Pro License Needed

Ollama has a dedicated provider in Meetily and works with the free Community Edition. Meetily will auto-detect available models from your Ollama server.

Step 1: Install & Run Ollama

  1. Install Ollama from ollama.com
  2. Pull a model:
ollama pull llama3.1
  3. Ollama runs automatically on http://localhost:11434

Step 2: Configure Meetily

  1. Open Meetily → Settings → Model Settings
  2. Select Ollama from the provider dropdown (not Custom Server)
  3. Meetily will auto-detect available models from your Ollama server
  4. Select your model and save
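How Meetily performs this auto-detection isn't documented here, but Ollama itself lists installed models at GET /api/tags, so you can query that endpoint yourself to confirm which models are available. A sketch (requires a running Ollama server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address

def tags_url(base_url: str) -> str:
    """Ollama's model-listing endpoint."""
    return base_url.rstrip("/") + "/api/tags"

def installed_models(base_url: str = OLLAMA_URL) -> list:
    """Return the names of models pulled onto a local Ollama server."""
    with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]
```

If `installed_models()` returns an empty list, pull a model first (e.g., `ollama pull llama3.1`).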

Open WebUI

Open WebUI is a self-hosted web interface for LLMs that also provides an OpenAI-compatible API endpoint.

Step 1: Set Up Open WebUI

  1. Install Open WebUI from openwebui.com
  2. Start with Docker:
docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:main
  3. Open WebUI provides an OpenAI-compatible endpoint at http://localhost:3000/api
  4. Create an account in the web interface and generate an API key from Settings → Account

Step 2: Configure Meetily

  1. Select Custom Server (OpenAI) in Meetily settings
  2. Set the endpoint URL:
http://localhost:3000/api/v1
  3. Enter your Open WebUI API key (required for Open WebUI)
  4. Enter the model name available on your server
  5. Save

vLLM

vLLM is a high-throughput inference engine with native OpenAI-compatible API support. Best for users with dedicated GPU hardware.

Step 1: Start the vLLM Server

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --port 8000

The server starts at http://localhost:8000 with an OpenAI-compatible API.

Step 2: Configure Meetily

Field      Value
Endpoint   http://localhost:8000/v1
Model      meta-llama/Llama-3.1-8B-Instruct
API Key    (leave empty)

Other Compatible Tools

Any tool that provides an OpenAI-compatible /v1/chat/completions endpoint will work with Meetily:

Jan.ai

Desktop app for running LLMs locally with a built-in server.

Endpoint: http://localhost:1337/v1
API Key:  (not required)

LocalAI

Drop-in OpenAI API replacement running locally with Docker.

Endpoint: http://localhost:8080/v1
API Key:  (not required)

Text Generation WebUI (oobabooga)

Feature-rich web UI for running LLMs. Enable the OpenAI extension first.

Endpoint: http://localhost:5000/v1
API Key:  (not required)

llama.cpp Server

Lightweight C++ inference server with OpenAI-compatible mode.

Endpoint: http://localhost:8080/v1
API Key:  (not required)

Quick Reference: All Endpoint URLs

Tool             Default Endpoint URL           API Key
LM Studio        http://localhost:1234/v1       Not required
Ollama*          http://localhost:11434         Not required
Open WebUI       http://localhost:3000/api/v1   Required
vLLM             http://localhost:8000/v1       Not required
Jan.ai           http://localhost:1337/v1       Not required
LocalAI          http://localhost:8080/v1       Not required
llama.cpp        http://localhost:8080/v1       Not required
text-gen-webui   http://localhost:5000/v1       Not required

*Ollama has a dedicated provider in Meetily — use the "Ollama" option instead of "Custom Server"


Recommended Models

The quality of your meeting summaries depends on the model you use. Here are community-tested recommendations by hardware tier:

8GB+ RAM (entry-level)

  • Llama 3.2 3B Instruct
  • Phi-3 Mini (3.8B)
  • Gemma 2 2B

16GB+ RAM (recommended)

  • Llama 3.1 8B Instruct — best balance of speed and quality
  • Mistral 7B Instruct
  • Qwen 2.5 7B Instruct

32GB+ RAM (best quality)

  • Llama 3.1 70B (Q4 quantization)
  • Mixtral 8x7B
  • DeepSeek V2 Lite

Model Size Tip

For meeting summarization, instruction-tuned models with 7B+ parameters give the best results. Smaller models (3B and below) may produce shorter or less detailed summaries. If you're on limited hardware, quantized versions (Q4, Q5) offer a good quality-to-speed tradeoff.


Troubleshooting

"Custom OpenAI endpoint not configured"

Make sure you have entered both the endpoint URL and model name in Settings → Model Settings. Both fields are required.

Connection errors or timeouts

  1. Verify your LLM server is running
  2. Try opening the endpoint URL in your browser — e.g., http://localhost:1234/v1/models should return a JSON response
  3. Check that no firewall is blocking the port
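Step 2 can also be scripted. This sketch probes the /models listing that OpenAI-compatible servers expose; the endpoint value is a placeholder for whatever you configured:

```python
import json
import urllib.error
import urllib.request

def models_url(endpoint: str) -> str:
    """The model-listing path OpenAI-compatible servers serve."""
    return endpoint.rstrip("/") + "/models"

def check_server(endpoint: str) -> bool:
    """Return True if the server answers with JSON at <endpoint>/models."""
    try:
        with urllib.request.urlopen(models_url(endpoint), timeout=5) as resp:
            json.load(resp)
        return True
    except (urllib.error.URLError, ValueError):
        return False
```

For example, `check_server("http://localhost:1234/v1")` should return True while your LM Studio server is running.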

Summaries are empty or malformed

Your model may be too small for meeting summarization. Try a larger model (7B+ parameters). Also check the temperature setting — values above 1.0 can produce inconsistent output.

"Custom Server (OpenAI)" option is disabled / grayed out

This feature requires a Meetily Pro license. If you have a license, ensure it is activated and not expired. Alternatively, use the free Ollama or Built-in AI providers.

LM Studio works but model name is wrong

The model name in Meetily must match exactly what LM Studio shows. Check the model identifier in LM Studio's Local Server tab — it's displayed near the top when the model is loaded.

404 Not Found errors

You likely forgot to include /v1 in the endpoint URL. Meetily appends /chat/completions to your URL, so the final path must be /v1/chat/completions. Change your endpoint from http://localhost:1234 to http://localhost:1234/v1.

Frequently Asked Questions

What is a custom API endpoint in Meetily?

A custom API endpoint lets you connect Meetily to any OpenAI-compatible server for meeting summarization. Instead of relying on cloud services, you point Meetily at a locally-running LLM server — such as LM Studio, Ollama, vLLM, or Open WebUI — and all transcript processing stays on your machine. Meetily sends your transcript to the configured endpoint using the standard OpenAI Chat Completions API format and displays the returned summary.
Do I need a Pro license to use local LLMs?

Not necessarily. The "Custom Server (OpenAI)" provider requires Meetily Pro ($10/user/month billed annually), but there are free alternatives. Ollama has a dedicated built-in provider that works with the free Community Edition and auto-detects your installed models. The Built-in AI option also works without a license. Only the generic Custom Server connector — which supports any OpenAI-compatible endpoint — is a Pro feature.
Which local LLM servers work with Meetily?

Any server that implements the OpenAI-compatible /v1/chat/completions endpoint works with Meetily. Tested and documented servers include LM Studio, Ollama, Open WebUI, vLLM, Jan.ai, LocalAI, llama.cpp server, and Text Generation WebUI (oobabooga). If your tool exposes that standard endpoint, it will work — no special integration required.
How do I connect Meetily to LM Studio?

First, download and install LM Studio from lmstudio.ai and download a model from the Discover tab (e.g. Llama 3, Mistral, Phi-3). Go to the Local Server tab, select your model, and click Start Server — it defaults to http://localhost:1234. In Meetily, open Settings → Model Settings, choose "Custom Server (OpenAI)", set the endpoint to http://localhost:1234/v1, enter the exact model name shown in LM Studio, leave the API key empty, and click Save. Make sure the LM Studio server is running before starting a meeting.
Why am I getting 404 errors with my custom endpoint?

This is the most common setup mistake. Meetily appends /chat/completions to whatever URL you provide, so the final request path must be /v1/chat/completions. If your endpoint URL is http://localhost:1234 (without /v1), the request goes to http://localhost:1234/chat/completions, which does not exist on most servers. The fix is to add /v1 to the end of your endpoint URL: http://localhost:1234/v1.
Which models work best for meeting summarization?

For 8 GB RAM: Llama 3.2 3B Instruct, Phi-3 Mini (3.8B), or Gemma 2 2B. For 16 GB RAM (recommended): Llama 3.1 8B Instruct offers the best balance of speed and quality, along with Mistral 7B Instruct and Qwen 2.5 7B Instruct. For 32 GB+ RAM: Llama 3.1 70B (Q4 quantization), Mixtral 8x7B, or DeepSeek V2 Lite. Instruction-tuned models with 7B+ parameters produce the most detailed and accurate meeting summaries.
What's the difference between cloud and local providers?

Cloud providers (OpenAI, Claude, Groq, OpenRouter) send your transcript to remote servers for processing — fast and high quality, but your meeting data leaves your machine. Local providers (Ollama, LM Studio, vLLM, Custom Server) process everything on your hardware — fully private and offline-capable, but require sufficient RAM and CPU/GPU. Both use the same OpenAI-compatible API format, so switching between them only requires changing the endpoint URL and model name in Meetily settings.
Can Meetily work completely offline?

Yes. Once you have Meetily installed and a local LLM server running (such as Ollama or LM Studio), the entire workflow is offline: system audio capture, Whisper transcription, and LLM summarization all happen on your device with zero internet connectivity required. Only the initial downloads of Meetily and the AI models require internet. This makes it ideal for air-gapped environments, confidential meetings, or locations with unreliable connectivity.
How do I set up Ollama with Meetily for free?

Install Ollama from ollama.com, then pull a model by running "ollama pull llama3.1" in your terminal. Ollama runs automatically on http://localhost:11434. In Meetily, go to Settings → Model Settings and select "Ollama" from the provider dropdown (not Custom Server). Meetily auto-detects all models available on your Ollama server. Select your model and save. That is it — no Pro license needed, no API keys, completely free.
What hardware do I need to run local LLMs?

Minimum: 8 GB RAM with a modern CPU (Intel i5 / AMD Ryzen 5 or better) for small models like Llama 3.2 3B or Phi-3 Mini. Recommended: 16 GB RAM for 7B–8B parameter models, which give the best meeting summarization quality. For larger models (70B), you need 32 GB+ RAM or a dedicated GPU. GPU acceleration (NVIDIA CUDA or Apple Metal) significantly improves inference speed but is not strictly required. Quantized model variants (Q4, Q5) reduce memory needs while maintaining good quality.
Can I switch between providers after setup?

Yes. Meetily lets you change your provider and model at any time in Settings → Model Settings. You can use Ollama for quick local summaries, switch to OpenAI or Claude for higher-quality results when online, or point to a Custom Server running a specialized fine-tuned model. Each provider's configuration is saved, so switching is instant. Your existing transcripts and summaries are not affected when you change providers.
My custom endpoint isn't working — how do I troubleshoot it?

First, verify your server is running by opening the endpoint URL in a browser (e.g. http://localhost:1234/v1/models should return JSON). Check that both the endpoint URL and model name are filled in — both are required. Confirm /v1 is included in the URL. Make sure no firewall is blocking the port. If summaries are empty or malformed, try a larger model (7B+ parameters) and set temperature below 1.0. If the Custom Server option is grayed out, confirm your Pro license is activated and not expired.

Ready to get started?

Download Meetily and start transcribing your meetings locally with full privacy.

Have questions? Join our GitHub community.