Add scripts and documentation for local inference configuration with Ollama and mlx-omni-server

- Introduced `configure_local_inference.sh` to automatically set `.dev.vars` based on which local inference services are active (a rough sketch of the idea follows this list).
- Updated `start_inference_server.sh` to handle both Ollama and mlx-omni-server.
- Enhanced `package.json` to include new commands for starting and configuring inference servers.
- Refined the README with updated instructions for running local inference and adding models.
- Minor cleanup in `MessageBubble.tsx`.
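The script itself is not shown in this diff; purely as an illustration, a port-probing approach like the one below would match the behavior described above. The default ports (Ollama: 11434, mlx-omni-server: 10240) and the `.dev.vars` values come from the README; everything else, including the exact detection checks, is assumed.

~~~bash
#!/usr/bin/env bash
# Illustrative sketch only -- not the actual configure_local_inference.sh.
# Assumes the active service is detected by probing the default ports
# documented in the README (Ollama: 11434, mlx-omni-server: 10240).
set -euo pipefail

endpoint=""
if curl -sf http://localhost:11434/ >/dev/null 2>&1; then
  endpoint="http://localhost:11434"        # Ollama default port
elif curl -sf http://localhost:10240/v1/models >/dev/null 2>&1; then
  endpoint="http://localhost:10240"        # mlx-omni-server default port
else
  echo "No local inference server detected" >&2
  exit 1
fi

# Drop any stale entries, then append the local-inference overrides.
touch .dev.vars
sed -i '' '/^OPENAI_API_KEY=/d;/^OPENAI_API_ENDPOINT=/d' .dev.vars
printf 'OPENAI_API_KEY=required-but-not-used\nOPENAI_API_ENDPOINT=%s\n' "$endpoint" >> .dev.vars
echo "Configured .dev.vars for $endpoint"
~~~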
Commit 9e8b427826 by geoffsee, 2025-06-02 12:38:50 -04:00; committed by Geoff Seemueller. Parent: f2d91e2752. 5 changed files with 93 additions and 32 deletions.


@@ -6,17 +6,18 @@
<img src="https://github.com/user-attachments/assets/620d2517-e7be-4bb0-b2b7-3aa0cba37ef0" width="250" />
</p>
## Project Status: Testing
## Table of Contents
- [Stack](#stack)
- [Installation](#installation)
- [Deployment](#deployment)
- [Local Inference](#local-inference)
  - [Ollama](#ollama)
    - [Adding models for local inference (ollama)](#adding-models-for-local-inference-ollama)
  - [mlx-omni-server (Apple Silicon Only)](#mlx-omni-server-apple-silicon-only)
    - [Adding models for local inference (Apple Silicon)](#adding-models-for-local-inference-apple-silicon)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [History](#history)
- [License](#license)
@@ -51,26 +52,33 @@
> Local inference is achieved by overriding the `OPENAI_API_KEY` and `OPENAI_API_ENDPOINT` environment variables. See below.
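Concretely, after configuration `.dev.vars` contains entries like these (endpoint shown for Ollama; mlx-omni-server uses port 10240):

~~~bash
# .dev.vars after enabling local inference (Ollama example)
OPENAI_API_KEY=required-but-not-used
OPENAI_API_ENDPOINT=http://localhost:11434
~~~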
### Ollama
~~~bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama ## Run Ollama (Can also be installed natively)
-bun run openai:local # Start OpenAI-compatible server
-sed -i '' '/^OPENAI_API_KEY=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_KEY=required-but-not-used' >> .dev.vars # Reset API key
-sed -i '' '/^OPENAI_API_ENDPOINT=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_ENDPOINT=http://localhost:11434' >> .dev.vars # Reset endpoint
-bun run server:dev # Start dev server
+bun run openai:local ollama # Start ollama server
+bun run openai:local:enable # Configure connection
+bun run server:dev # Restart server
~~~
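If the dev server can't reach Ollama, a quick sanity check is to hit Ollama's root endpoint, which replies with a short status message:

~~~bash
curl http://localhost:11434/   # expected response: "Ollama is running"
~~~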
#### Adding models for local inference (ollama)
~~~bash
# See https://ollama.com/library for available models
MODEL_TO_ADD=gemma3
docker exec -it ollama ollama run ${MODEL_TO_ADD}
~~~
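To confirm the model was pulled, list what the Ollama container currently has:

~~~bash
docker exec -it ollama ollama list   # the new model should appear here
~~~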
### mlx-omni-server (Apple Silicon Only)
~~~bash
-brew tap seemueller-io/tap # Add seemueller-io tap
-brew install seemueller-io/tap/mlx-omni-server # Install mlx-omni-server
-bun run openai:local # Start OpenAI-compatible server
-sed -i '' '/^OPENAI_API_KEY=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_KEY=required-but-not-used' >> .dev.vars # Reset API key
-sed -i '' '/^OPENAI_API_ENDPOINT=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_ENDPOINT=http://localhost:10240' >> .dev.vars # Reset endpoint
-bun run server:dev # Start dev server
+# (prereq) install mlx-omni-server
+brew tap seemueller-io/tap
+brew install seemueller-io/tap/mlx-omni-server
+bun run openai:local mlx-omni-server # Start mlx-omni-server
+bun run openai:local:enable # Configure connection
+bun run server:dev # Restart server
~~~
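To verify mlx-omni-server is up before restarting the dev server, the models route can be queried (assuming the server exposes the standard OpenAI-compatible `/v1/models` endpoint):

~~~bash
curl http://localhost:10240/v1/models   # should return a JSON list of available models
~~~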
#### Adding models for local inference (Apple Silicon)
~~~bash
-# ensure mlx-omni-server is running in the background
+# ensure mlx-omni-server is running
# See https://huggingface.co/mlx-community for available models
MODEL_TO_ADD=mlx-community/gemma-3-4b-it-8bit
curl http://localhost:10240/v1/chat/completions \
@@ -81,15 +89,20 @@ curl http://localhost:10240/v1/chat/completions \
}"
~~~
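The request body is elided by the hunk boundary above; a complete call, assuming the standard OpenAI-compatible chat completions payload, looks roughly like this:

~~~bash
# Illustrative request; the exact payload in the README may differ.
MODEL_TO_ADD=mlx-community/gemma-3-4b-it-8bit
curl http://localhost:10240/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"${MODEL_TO_ADD}\",
    \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]
  }"
~~~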
## Testing
Tests are located in `__tests__` directories next to the code they test. Testing is incomplete at this time.
> `bun run test` will run all tests
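If the `test` script uses Bun's built-in test runner, a single suite can also be run during development by passing a path filter (the path below is illustrative):

~~~bash
bun test app/components/__tests__   # hypothetical path; runs only that directory's tests
~~~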
## Troubleshooting
1. `bun run clean`
1. `bun i`
1. `bun server:dev`
1. `bun client:dev`
1. Submit an issue
## History
A high-level overview of the development history of the parent repository, [geoff-seemueller-io](https://geoff.seemueller.io), is provided in [LEGACY.md](./LEGACY.md).