
localai

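LocalAI is an open-source tool that runs LLMs, image generation, and other models locally behind an OpenAI-compatible API. It works offline and privately, even without a GPU, and this Skill helps developers add AI features to their applications with little effort.

📜 Original English description (for reference)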

Expert guidance for LocalAI, the open-source drop-in replacement for OpenAI's API that runs locally. Helps developers self-host LLMs, image generators, audio transcription, and text-to-speech models with an OpenAI-compatible API — no GPU required, completely offline and private.

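🇯🇵 Notes for Japanese creators

In a nutshell

※ This commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.

⚡ Recommended: one-line install (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.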

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o localai.zip https://jpskill.com/download/15083.zip && unzip -o localai.zip && rm localai.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/15083.zip -OutFile "$d\localai.zip"; Expand-Archive "$d\localai.zip" -DestinationPath $d -Force; ri "$d\localai.zip"

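When it finishes, restart Claude Code. The Skill then activates automatically when you ask for something in its area, for example "Set up LocalAI with Docker".

💾 Manual download (if the command line feels difficult)

1. Click the blue button below to download localai.zip
2. Double-click the ZIP file to extract it; this creates a localai folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) | ⬇ .skill format (advanced users) | Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.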

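📦 Installation (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.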


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the text Claude reads)

LocalAI — Self-Hosted OpenAI Alternative

Overview

LocalAI is an open-source drop-in replacement for OpenAI's API that runs locally. It helps developers self-host LLMs, image generators, audio transcription, and text-to-speech models behind an OpenAI-compatible API, with no GPU required and everything kept offline and private.

Instructions

Quick Start with Docker

# Run LocalAI with Docker (CPU-only, no GPU needed)
docker run -p 8080:8080 \
  -v ./models:/build/models \
  localai/localai:latest-cpu

# With GPU support (NVIDIA CUDA)
docker run -p 8080:8080 --gpus all \
  -v ./models:/build/models \
  localai/localai:latest-gpu-nvidia-cuda-12

# Docker Compose for production

# docker-compose.yml — Production LocalAI setup
version: "3.8"
services:
  localai:
    image: localai/localai:latest-cpu
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models
    environment:
      - THREADS=4                    # CPU threads for inference
      - CONTEXT_SIZE=4096            # Default context window
      - GALLERIES=[{"name":"model-gallery","url":"github:mudler/LocalAI/gallery/index.yaml@master"}]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 30s
      timeout: 10s
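
The Compose healthcheck above polls /readyz from inside the container. Client code that starts right after the container can do the same from outside. The following is a minimal sketch, assuming /readyz answers with a 2xx status once the server is ready; the function name, attempt count, and delay are illustrative choices, not LocalAI defaults.

```typescript
// Anything that behaves like fetch for a simple GET; the global fetch satisfies this.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Poll `${baseUrl}/readyz` until it succeeds or the attempts run out.
// Returns true when the server reported ready, false otherwise.
async function waitForReady(
  baseUrl: string,
  fetchImpl: FetchLike,
  maxAttempts = 10,
  delayMs = 1000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/readyz`);
      if (res.ok) return true;
    } catch {
      // Connection errors are expected while the container is still starting.
    }
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

On Node 18 or later this can be called as `await waitForReady("http://localhost:8080", fetch)`. Passing the fetch implementation in keeps the function easy to exercise without a running server.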

Model Installation

# Install models from the gallery (via API)
curl -X POST http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q5_K_M.gguf"}'

# Or download GGUF files directly into the models directory
wget -P ./models/ \
  https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf

# Create a model configuration
cat > ./models/mistral.yaml << 'EOF'
name: mistral
backend: llama-cpp
parameters:
  model: mistral-7b-instruct-v0.2.Q5_K_M.gguf
  temperature: 0.7
  top_p: 0.9
  top_k: 40
  context_size: 8192
template:
  chat_message: |
    {{.RoleName}}: {{.Content}}
  chat: |
    [INST] {{.Input}} [/INST]
EOF

# List available models
curl http://localhost:8080/v1/models | jq '.data[].id'
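
The same listing can be checked from application code before the first request, so that a missing model fails early with a clear message. This is a small sketch that assumes only the response shape implied by the jq filter above (a `data` array whose items have an `id`); `missingModels` is a hypothetical helper name.

```typescript
// Fields of the /v1/models response used here (assumed from `.data[].id`).
interface ModelList {
  data: { id: string }[];
}

// Return the required model names that the server does not list.
function missingModels(list: ModelList, required: string[]): string[] {
  const available = new Set(list.data.map((model) => model.id));
  return required.filter((name) => !available.has(name));
}
```

With a running server, the input can come from `await (await fetch("http://localhost:8080/v1/models")).json()`, and a non-empty result means a model config or download is still missing.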

OpenAI-Compatible API

// src/local-ai.ts — Use LocalAI with OpenAI SDK
import fs from "node:fs";
import OpenAI from "openai";

const ai = new OpenAI({
  apiKey: "not-needed",
  baseURL: "http://localhost:8080/v1",
});

// Chat completions
async function chat(prompt: string) {
  const response = await ai.chat.completions.create({
    model: "mistral",                      // Model name from config
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    temperature: 0.7,
  });
  return response.choices[0].message.content;
}

// Embeddings
async function embed(texts: string[]) {
  const response = await ai.embeddings.create({
    model: "text-embedding-ada-002",       // Mapped to local embedding model
    input: texts,
  });
  return response.data.map(d => d.embedding);
}

// Image generation (Stable Diffusion backend)
async function generateImage(prompt: string) {
  const response = await ai.images.generate({
    model: "stablediffusion",
    prompt,
    n: 1,
    size: "512x512",
  });
  return response.data[0].url;
}

// Audio transcription (Whisper backend)
async function transcribe(audioPath: string) {
  const response = await ai.audio.transcriptions.create({
    model: "whisper-1",
    file: fs.createReadStream(audioPath),
  });
  return response.text;
}

// Text-to-speech
async function textToSpeech(text: string) {
  const response = await ai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: text,
  });
  const buffer = Buffer.from(await response.arrayBuffer());
  fs.writeFileSync("output.mp3", buffer);
}
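
The client above hard-codes the LocalAI base URL. One way to keep the same code usable against both LocalAI and the hosted OpenAI API is to derive the constructor options from environment variables. The sketch below is illustrative only: `LOCALAI_BASE_URL` and `LOCALAI_API_KEY` are hypothetical variable names, and the placeholder key mirrors the "not-needed" value used above.

```typescript
// The two OpenAI SDK constructor options that matter for this switch.
interface ClientOptions {
  apiKey: string;
  baseURL?: string;
}

// Build client options from environment-style variables.
// LOCALAI_BASE_URL and LOCALAI_API_KEY are hypothetical names chosen for this sketch.
function clientOptions(env: Record<string, string | undefined>): ClientOptions {
  const localUrl = env.LOCALAI_BASE_URL;
  if (localUrl) {
    // Use a placeholder key when none is configured, as in the client above.
    return { apiKey: env.LOCALAI_API_KEY ?? "not-needed", baseURL: localUrl };
  }
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) throw new Error("Set LOCALAI_BASE_URL or OPENAI_API_KEY");
  return { apiKey };
}
```

The client would then be created with `new OpenAI(clientOptions(process.env))`, so switching backends is a deployment setting rather than a code change.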

Multi-Model Configuration

# models/chat-model.yaml — Chat model
name: chat
backend: llama-cpp
parameters:
  model: llama-3.1-8b-instruct.Q5_K_M.gguf
  context_size: 8192
  threads: 4
  gpu_layers: 0                            # 0 = CPU only, increase for GPU offloading

---
# models/code-model.yaml — Code completion model
name: code
backend: llama-cpp
parameters:
  model: codellama-7b-instruct.Q5_K_M.gguf
  context_size: 16384
  threads: 4

---
# models/embedding-model.yaml — Embedding model
name: embedding
backend: sentencetransformers
parameters:
  model: all-MiniLM-L6-v2

---
# models/whisper-model.yaml — Audio transcription
name: whisper-1
backend: whisper
parameters:
  model: whisper-base.bin
  language: en

Function Calling

// LocalAI supports function calling with compatible models
async function chatWithFunctions(prompt: string) {
  const response = await ai.chat.completions.create({
    model: "mistral",
    messages: [{ role: "user", content: prompt }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_current_weather",
          description: "Get the weather for a location",
          parameters: {
            type: "object",
            properties: {
              location: { type: "string" },
              unit: { type: "string", enum: ["celsius", "fahrenheit"] },
            },
            required: ["location"],
          },
        },
      },
    ],
    tool_choice: "auto",
  });
  return response;
}
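
`chatWithFunctions` returns the raw response, so the caller still has to run whatever tool the model asked for. The following sketch shows one way to do that, assuming the response follows the OpenAI tool-call shape (`choices[0].message.tool_calls`, with `arguments` as a JSON string). The weather handler is a hypothetical stub, not a real service call.

```typescript
// One tool call as it appears in an OpenAI-compatible chat completion message.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is JSON-encoded
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical stub; a real handler would query a weather service.
const handlers: Record<string, ToolHandler> = {
  get_current_weather: (args) =>
    JSON.stringify({ location: args.location, unit: args.unit ?? "celsius", note: "stub result" }),
};

// Run each requested tool and build the `tool` role messages to send back to the model.
function runToolCalls(toolCalls: ToolCall[]) {
  return toolCalls.map((call) => {
    let content: string;
    try {
      const handler = handlers[call.function.name];
      if (!handler) throw new Error(`unknown tool: ${call.function.name}`);
      content = handler(JSON.parse(call.function.arguments));
    } catch (err) {
      // Unknown tools and malformed arguments are reported to the model instead of thrown.
      content = JSON.stringify({ error: String(err) });
    }
    return { role: "tool" as const, tool_call_id: call.id, content };
  });
}
```

The returned messages are appended to the conversation after the assistant message that contained the tool calls, and the chat completion is requested again so the model can produce its final answer.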

Installation

# Docker (recommended)
docker pull localai/localai:latest-cpu

# Binary (Linux/macOS)
curl -Lo local-ai https://github.com/mudler/LocalAI/releases/latest/download/local-ai-$(uname -s)-$(uname -m)
chmod +x local-ai
./local-ai --models-path ./models

# Homebrew (macOS)
brew install localai

Examples

Example 1: Integrating LocalAI into an existing application

User request:

Add LocalAI to my Next.js app for the AI chat feature. I want streaming responses.

The agent installs the SDK, creates an API route that initializes the LocalAI client, configures streaming, selects an appropriate model, and wires up the frontend to consume the stream. It handles error cases and sets up proper environment variable management for the base URL and API key.
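
The streaming part of this example can be sketched independently of Next.js. The helper below consumes any async iterable of chat completion chunks, forwards each token as it arrives, and returns the full text. It assumes the OpenAI-compatible chunk shape (`choices[0].delta.content`); `collectStream` is a hypothetical name.

```typescript
// Minimal shape of a streamed chat completion chunk (OpenAI-compatible).
interface ChatChunk {
  choices: { delta: { content?: string | null } }[];
}

// Forward each token to `onToken` as it arrives and return the accumulated text.
async function collectStream(
  chunks: AsyncIterable<ChatChunk>,
  onToken: (token: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    // Some chunks carry no text (role-only deltas, the final chunk), so default to "".
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) {
      onToken(token);
      text += token;
    }
  }
  return text;
}
```

With the OpenAI SDK, the iterable comes from `await ai.chat.completions.create({ model: "mistral", messages, stream: true })`, and `onToken` is where an API route would write to the HTTP response.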

Example 2: Optimizing LocalAI performance

User request:

My LocalAI calls are slow and expensive. Help me optimize the setup.

The agent reviews the current implementation, identifies issues (wrong model selection, missing caching, inefficient prompting, no batching), and applies optimizations specific to LocalAI's capabilities: adjusting model parameters, adding response caching, and implementing retry logic with exponential backoff.
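
Retry logic with exponential backoff, as mentioned in this example, can look roughly like the sketch below. The base delay, cap, and retry count are illustrative values, not LocalAI recommendations, and the helper retries on every error; real code would usually retry only on timeouts or 5xx responses.

```typescript
// Delay before each retry: baseMs * 2^n, capped at maxMs.
function backoffDelays(retries: number, baseMs = 500, maxMs = 8000): number[] {
  return Array.from({ length: retries }, (_, n) => Math.min(baseMs * 2 ** n, maxMs));
}

// Call `call` until it succeeds, sleeping between failures with growing delays.
// The sleep function is injectable so the helper can run without real waiting.
async function withRetry<T>(
  call: () => Promise<T>,
  retries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const delays = backoffDelays(retries);
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= delays.length) throw err; // out of retries, surface the last error
      await sleep(delays[attempt]);
    }
  }
}
```

A chat request would be wrapped as `await withRetry(() => chat("Hello"))`, where `chat` is the helper defined in the OpenAI-Compatible API section.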

Guidelines

- CPU is fine for most use cases: 7B models run well on CPU; a GPU helps for 13B+ models and image generation
- Q5_K_M quantization: the best balance of quality and speed; use Q4_K_M for faster inference, Q6_K for higher quality
- One model per purpose: run separate models for chat, embedding, and code; don't force one model to do everything
- Docker for production: use Docker Compose with health checks and restart policies; don't run the binary directly
- OpenAI SDK compatibility: existing OpenAI code works with LocalAI; just change the base URL
- Context size = memory: memory use grows linearly with context_size, on top of the model weights; as a rough rule of thumb, budget about context_size × 2 MB of RAM per model, and set context_size based on available memory
- Thread count = physical cores: set THREADS to your physical CPU core count; hyperthreading rarely helps inference
- Gallery for easy setup: use the model gallery for one-click model installation instead of manual GGUF downloads
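
The "context size = memory" guideline can be made more concrete with the standard key/value cache estimate for transformer models. This sketch covers only the KV cache with 16-bit values; model weights and runtime buffers come on top, and actual usage depends on the backend. The Llama-2-7B shape (32 layers, 32 KV heads, head dimension 128) is the published architecture, and the helper name is hypothetical.

```typescript
interface ModelShape {
  layers: number;        // transformer blocks
  kvHeads: number;       // key/value heads (fewer than attention heads with grouped-query attention)
  headDim: number;       // dimension of each head
  bytesPerValue: number; // 2 for a 16-bit cache
}

// KV cache size in bytes: one K and one V tensor per layer for every token in the context.
function kvCacheBytes(shape: ModelShape, contextSize: number): number {
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerValue * contextSize;
}

const llama2_7b: ModelShape = { layers: 32, kvHeads: 32, headDim: 128, bytesPerValue: 2 };

const perTokenMiB = kvCacheBytes(llama2_7b, 1) / 2 ** 20;
const at4096GiB = kvCacheBytes(llama2_7b, 4096) / 2 ** 30;
console.log(perTokenMiB, at4096GiB); // 0.5 2
```

Under these assumptions the cache costs 0.5 MiB per token, or 2 GiB at a 4096-token context, so a budget of about 2 MB per token leaves headroom for a model of this size. Models with grouped-query attention have fewer KV heads and need proportionally less.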