
data-lakehouse

Design and implement data lakehouse architectures for scalable big data storage and analytics.

⚡ Recommended: one-line command install (60 seconds)

Copy the command below and paste it into your terminal (Mac/Linux) or PowerShell (Windows). Download → extract → place, all fully automatic.

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o data-lakehouse.zip https://jpskill.com/download/22180.zip && unzip -o data-lakehouse.zip && rm data-lakehouse.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/22180.zip -OutFile "$d\data-lakehouse.zip"; Expand-Archive "$d\data-lakehouse.zip" -DestinationPath $d -Force; ri "$d\data-lakehouse.zip"

When it finishes, restart Claude Code → then just make a normal request in this Skill's area (for example, "design a data lakehouse") and it activates automatically.

💾 Manual download (if the command feels difficult)
  1. Click the blue button below to download data-lakehouse.zip
  2. Double-click the ZIP file to extract it → a data-lakehouse folder is created
  3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
  4. Restart Claude Code

⚠️ Download and use at your own risk. This site accepts no responsibility for the Skill's content, behavior, or safety.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you give Claude a request in this area.

📦 Installation (3 steps)

  1. Click the "Download" button above to get the .skill file
  2. Rename the extension from .skill to .zip and extract it (macOS can extract it automatically)
  3. Place the extracted folder in .claude/skills/ under your home folder
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you're done. You don't need to say "Use this Skill…"; it is invoked automatically for related requests.

See the detailed usage guide →
Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1
📖 Original SKILL.md read by Claude (contents below)

This text is the original (English or Chinese) that the AI (Claude) reads. Japanese translations are being added over time.

data-lakehouse

Purpose

This skill enables the design and implementation of data lakehouse architectures, combining data lakes with warehouse features for scalable big data storage and analytics. Use it to manage petabyte-scale data with ACID transactions, schema evolution, and optimized query performance on platforms like Delta Lake or Iceberg.

When to Use

  • When handling large-scale data ingestion from sources like S3 or Kafka, requiring both raw storage and structured querying.
  • For analytics workloads needing real-time updates, such as ETL pipelines in e-commerce or IoT data processing.
  • If you're integrating with Spark or Presto for SQL analytics on unstructured data.

Key Capabilities

  • Architecture Design: Generate blueprints for lakehouse setups, including partitioning strategies and metadata management (e.g., using Iceberg for table formats).
  • Data Ingestion: Support for batch and streaming ingestion with tools like Apache Spark, handling formats like Parquet or ORC.
  • Query Optimization: Implement caching and indexing for faster queries, such as creating Delta Lake tables with Z-order clustering (see the sketch after this list).
  • Scalability: Auto-scale storage and compute resources via cloud APIs, e.g., AWS Glue for ETL jobs.
  • Security: Enforce row-level access controls using policies like AWS Lake Formation grants.
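
The partitioning and Z-order clustering mentioned above can be expressed directly in PySpark. The following is a minimal sketch, assuming the delta-spark package (Delta Lake 2.0 or later, which provides optimize().executeZOrderBy) is installed and S3 credentials are configured; the bucket paths and column names are illustrative.

    # Minimal sketch: write a partitioned Delta table, then Z-order cluster it.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.appName("lakehouse-capabilities").getOrCreate()

    # Write raw events as a Delta table partitioned by date (illustrative paths).
    events = spark.read.parquet("s3://my-bucket/raw/events/")
    (events.write.format("delta")
           .partitionBy("date")
           .mode("overwrite")
           .save("s3://my-bucket/lakehouse/events"))

    # Cluster data files on a frequently filtered column so queries skip more files.
    DeltaTable.forPath(spark, "s3://my-bucket/lakehouse/events") \
        .optimize() \
        .executeZOrderBy("customer_id")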

Usage Patterns

  • Pattern 1: For new lakehouse setup, invoke the skill to generate a configuration file, then use it to initialize storage. Example: Create a Delta Lake table from CSV data (see the sketch after this list).
  • Pattern 2: In analytics workflows, use the skill to optimize queries by adding indexes, then execute via Spark SQL.
  • Pattern 3: For maintenance, periodically run checks for schema evolution and merge operations on existing tables.
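
Pattern 1's "Delta table from CSV" step might look like the following in PySpark. This is a minimal sketch with illustrative paths and partition column; it assumes the delta-spark package is available in the Spark session.

    # Minimal sketch of Pattern 1: load CSV data and initialize a Delta table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lakehouse-init").getOrCreate()

    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("s3://my-bucket/raw/orders.csv"))

    # The first write creates the table; later loads can use mode("append").
    (orders.write.format("delta")
           .partitionBy("date")
           .mode("overwrite")
           .save("s3://my-bucket/lakehouse/orders"))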

Common Commands/API

Use the OpenClaw CLI or API for this skill. Authentication requires setting $DATA_LAKEHOUSE_API_KEY as an environment variable.

  • CLI Command: Initialize a lakehouse project:

    openclaw skill data-lakehouse init --project my-lakehouse --storage s3://my-bucket --engine delta

    This creates a basic configuration file with S3 bucket and Delta Lake engine.

  • API Endpoint: Create a table via POST request:

    curl -H "Authorization: Bearer $DATA_LAKEHOUSE_API_KEY" \
         -d '{"table_name": "sales_data", "format": "parquet", "partition_by": ["date"]}' \
         https://api.openclaw.ai/data-lakehouse/tables

    Response includes table metadata for immediate use.

  • Code Snippet: In Python, integrate with Spark:

    from pyspark.sql import SparkSession

    # Build (or reuse) a Spark session; Delta support comes from the delta-spark package.
    spark = SparkSession.builder.appName("lakehouse").getOrCreate()

    # Read an existing Delta table from S3 and append a DataFrame to the same path.
    df = spark.read.format("delta").load("s3://my-bucket/sales_data")
    df.write.format("delta").mode("append").save("s3://my-bucket/sales_data")

    This appends data to a Delta table; ensure Spark is configured with AWS credentials and the Delta Lake package (e.g. delta-spark) before running it.

  • Config Format: Use JSON for lakehouse configs, e.g.:

    {
      "storage": "s3://my-bucket",
      "engine": "iceberg",
      "auth": {"key": "$DATA_LAKEHOUSE_API_KEY"}
    }

    Load this via CLI: openclaw skill data-lakehouse apply --config path/to/config.json.

Integration Notes

  • With Other Skills: Link to "big-data" skills by passing outputs, e.g., pipe data from a Kafka ingestion skill into this one using Spark streaming (a sketch follows this list).
  • External Tools: Integrate with AWS S3 by setting bucket policies; use --aws-region us-west-2 in CLI commands. For Spark, ensure dependencies like spark.delta are in your environment.
  • API Integration: When calling from other services, handle retries for rate limits; example: Use the same $DATA_LAKEHOUSE_API_KEY in chained API calls.
  • Dependency Management: Always specify versions, e.g., require Spark 3.0+ for Delta Lake compatibility.
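
A Kafka-to-lakehouse hand-off with Spark Structured Streaming could look like the sketch below. It assumes the spark-sql-kafka connector and delta-spark are on the Spark classpath; the broker address, topic name, and S3 paths are illustrative.

    # Minimal sketch: stream records from a Kafka ingestion skill into a Delta table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "iot-events")
              .load())

    # Persist the raw payload; a downstream job can parse it into typed columns.
    query = (stream.selectExpr("CAST(value AS STRING) AS payload")
             .writeStream
             .format("delta")
             .option("checkpointLocation", "s3://my-bucket/checkpoints/iot-events")
             .start("s3://my-bucket/lakehouse/iot_events_raw"))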

Error Handling

  • Common Errors: Handle authentication failures by checking if $DATA_LAKEHOUSE_API_KEY is set; use os.environ.get('DATA_LAKEHOUSE_API_KEY') in scripts.
  • Prescriptive Steps: For storage access errors (e.g., S3 permissions), add try-except blocks:
    try:
        spark.read.format("delta").load("s3://my-bucket/data")
    except Exception as e:
        print(f"Error: {e}. Check bucket permissions and retry.")

    Retry transient errors like network issues with exponential backoff in API calls; a sketch follows this list.

  • Debugging: Use CLI flag --verbose for detailed logs, e.g., openclaw skill data-lakehouse init --verbose. Validate configs with openclaw skill data-lakehouse validate --file config.json.
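
For the exponential backoff mentioned above, a minimal Python sketch is shown below. The endpoint and payload mirror the curl example earlier; the retry count, timeout, and backoff base are illustrative choices, not values mandated by the API.

    # Minimal sketch: retry the table-creation call with exponential backoff.
    import os
    import time
    import requests

    def create_table_with_retries(payload, max_attempts=5):
        url = "https://api.openclaw.ai/data-lakehouse/tables"
        headers = {"Authorization": f"Bearer {os.environ.get('DATA_LAKEHOUSE_API_KEY', '')}"}
        for attempt in range(max_attempts):
            try:
                resp = requests.post(url, json=payload, headers=headers, timeout=30)
                # Treat rate limits and server errors as transient.
                if resp.status_code == 429 or resp.status_code >= 500:
                    raise RuntimeError(f"Transient API error: {resp.status_code}")
                resp.raise_for_status()
                return resp.json()
            except (requests.RequestException, RuntimeError):
                if attempt == max_attempts - 1:
                    raise
                time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...

    table = create_table_with_retries(
        {"table_name": "sales_data", "format": "parquet", "partition_by": ["date"]}
    )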

Concrete Usage Examples

  • Example 1: Building a sales analytics lakehouse: First, initialize: openclaw skill data-lakehouse init --project sales-ana --storage s3://sales-data. Then, ingest data: use the API to create a table and append via Spark as shown above. Finally, query with spark.sql("SELECT * FROM sales_data WHERE date > '2023-01-01'") (the sketch after these examples shows the full query step).

  • Example 2: Optimizing an existing lakehouse for IoT data: Run: openclaw skill data-lakehouse optimize --table iot_metrics --add-index date. This adds an index; verify with a query snippet such as df = spark.read.format("iceberg").load("s3://iot-bucket/metrics").filter(col("date") > current_date()), where col and current_date come from pyspark.sql.functions. Monitor performance post-optimization.
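
The query step in Example 1 needs the Delta path registered under a SQL-visible name before spark.sql can reference sales_data. A minimal sketch, with an illustrative path under the s3://sales-data bucket:

    # Minimal sketch of Example 1's query step: expose the Delta table to Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sales-analytics").getOrCreate()

    # Register the path-based Delta table as a temporary view named sales_data.
    spark.read.format("delta") \
        .load("s3://sales-data/sales_data") \
        .createOrReplaceTempView("sales_data")

    recent = spark.sql("SELECT * FROM sales_data WHERE date > '2023-01-01'")
    recent.show()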

Graph Relationships

  • Related to: "big-data" (shares tags for data processing pipelines), "data-engineering" (cluster affiliation for ETL workflows).
  • Connected via: Tags like "data-lakehouse" for cross-skill queries, and "data-engineering" cluster for sequential task flows.