scrape-webpage

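A Skill that scrapes a web page, extracts and organizes the information you need, downloads the images, and produces an analysis JSON that prepares the page for migration to AEM Edge Delivery Services.

📜 Original English description (for reference)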

Scrape webpage content, extract metadata, download images, and prepare for import/migration to AEM Edge Delivery Services. Returns analysis JSON with paths, metadata, cleaned HTML, and local images.

⚡ Recommended: install with a single command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, unzips, and places the files automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o scrape-webpage.zip https://jpskill.com/download/9688.zip && unzip -o scrape-webpage.zip && rm scrape-webpage.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9688.zip -OutFile "$d\scrape-webpage.zip"; Expand-Archive "$d\scrape-webpage.zip" -DestinationPath $d -Force; ri "$d\scrape-webpage.zip"

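When it finishes, restart Claude Code. The Skill then triggers automatically when you ask for a related task in plain language, for example asking Claude to scrape a web page for import.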

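💾 Manual download (if you prefer not to use the command line)

1. Download scrape-webpage.zip using the site's download button
2. Double-click the ZIP file to extract it; this creates a scrape-webpage folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code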

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for a task in this area.

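📦 Installation (3 steps)

1. Use the download button to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.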

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English text that Claude reads)

Scrape Webpage

Extract content, metadata, and images from a webpage for import/migration.

When to Use This Skill

Use this skill when:

You are starting a page import and need to extract content from a source URL
You need webpage analysis with local image downloads
You want metadata extraction (Open Graph, JSON-LD, etc.)

Invoked by: page-import skill (Step 1)

Prerequisites

Before using this skill, ensure:

✅ Node.js is available
✅ The playwright npm package is installed (npm install playwright)
✅ Chromium browser is installed (npx playwright install chromium)
✅ Sharp image library is installed (cd .claude/skills/scrape-webpage/scripts && npm install)
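
One way to confirm the two npm dependencies before running the scraper is to try resolving them from Node. The sketch below is an illustration and is not shipped with the skill: the helper name `checkModules` is hypothetical, and it assumes the packages are published as `playwright` and `sharp` and are installed somewhere reachable from the directory passed in. It does not check whether the Chromium browser binary has been downloaded.

```javascript
// Hypothetical helper (not part of the skill): reports whether each named
// package can be resolved starting from a given directory.
const path = require('node:path');

function checkModules(names, fromDir = process.cwd()) {
  const result = {};
  for (const name of names) {
    try {
      // require.resolve throws when the package cannot be found.
      require.resolve(name, { paths: [path.resolve(fromDir)] });
      result[name] = true;
    } catch {
      result[name] = false;
    }
  }
  return result;
}

// Assumed location: the skill's scripts folder, where `npm install` is run.
const scriptsDir = '.claude/skills/scrape-webpage/scripts';
console.log(checkModules(['playwright', 'sharp'], scriptsDir));
```

A `false` entry points to the matching install command in the list above.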

Related Skills

page-import - Orchestrator that invokes this skill
identify-page-structure - Uses this skill's output (screenshot, HTML, metadata)
generate-import-html - Uses image mapping and paths from this skill

Scraping Workflow

Step 1: Run Analysis Script

Command:

node .claude/skills/scrape-webpage/scripts/analyze-webpage.js "https://example.com/page" --output ./import-work

What the script does:

Sets up network interception to capture all images
Loads page in headless Chromium
Scrolls through entire page to trigger lazy-loaded images
Downloads all images locally (converts WebP/AVIF/SVG to PNG)
Captures full-page screenshot for visual reference
Extracts metadata (title, description, Open Graph, JSON-LD, canonical)
Fixes images in DOM (background-image→img, picture elements, srcset→src, relative→absolute, inline SVG→img)
Extracts cleaned HTML (removes scripts/styles)
Replaces image URLs in HTML with local paths (./images/...)
Generates document paths (sanitized, lowercase, no .html extension)
Saves complete analysis with image mapping to metadata.json

For detailed explanation: See resources/web-page-analysis.md
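
The "sanitized, lowercase, no .html extension" step can be pictured with a small sketch. The real rules live in scripts/analyze-webpage.js and are not shown here, so everything below is an assumption: the function name `derivePaths`, the choice to turn other characters into hyphens, and the use of `index` for a bare `/`. The returned shape mirrors the `paths` object in the output JSON.

```javascript
// Hypothetical sketch of document-path generation; the actual script may
// apply different sanitization rules.
const safeDecode = (s) => {
  try { return decodeURIComponent(s); } catch { return s; }
};

function derivePaths(pageUrl) {
  const { pathname } = new URL(pageUrl);
  const segments = pathname
    .split('/')
    .filter(Boolean)
    .map((segment) =>
      safeDecode(segment)
        .toLowerCase()
        .replace(/\.html?$/, '')      // drop a trailing .html / .htm
        .replace(/[^a-z0-9]+/g, '-')  // assumed rule: other characters become hyphens
        .replace(/^-+|-+$/g, '')      // trim leading/trailing hyphens
    )
    .filter(Boolean);

  // Assumption: a bare "/" maps to a document named "index".
  if (segments.length === 0) segments.push('index');

  const relative = segments.join('/');
  return {
    documentPath: `/${relative}`,
    htmlFilePath: `${relative}.plain.html`,
    mdFilePath: `${relative}.md`,
    dirPath: segments.slice(0, -1).join('/'),
    filename: segments[segments.length - 1],
  };
}

console.log(derivePaths('https://example.com/US/en/About.html'));
```

Under these assumptions, `https://example.com/US/en/About.html` yields `documentPath` `/us/en/about` and `htmlFilePath` `us/en/about.plain.html`.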

Step 2: Verify Output

Output files:

./import-work/metadata.json - Complete analysis with paths and image mapping
./import-work/screenshot.png - Visual reference for layout comparison
./import-work/cleaned.html - Main content HTML with local image paths
./import-work/images/ - All downloaded images (WebP/AVIF/SVG converted to PNG)

Verify files exist:

ls -lh ./import-work/metadata.json ./import-work/screenshot.png ./import-work/cleaned.html
ls -lh ./import-work/images/ | head -5
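
If the check needs to run from a Node script rather than a shell (on Windows, for instance), the same verification can be sketched with the standard `fs` module. The helper name `findMissingOutputs` is hypothetical; the file names are the ones listed above, and treating an empty file as missing is an added assumption.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: returns the expected outputs that are missing
// (or empty) under outputDir.
function findMissingOutputs(outputDir) {
  const missing = [];
  for (const name of ['metadata.json', 'screenshot.png', 'cleaned.html']) {
    const file = path.join(outputDir, name);
    if (!fs.existsSync(file) || fs.statSync(file).size === 0) missing.push(name);
  }
  const imagesDir = path.join(outputDir, 'images');
  if (!fs.existsSync(imagesDir) || !fs.statSync(imagesDir).isDirectory()) {
    missing.push('images/');
  }
  return missing;
}

const missing = findMissingOutputs('./import-work');
console.log(missing.length === 0 ? 'All outputs present' : `Missing: ${missing.join(', ')}`);
```

An empty images/ folder is not reported, since a page may legitimately contain no images.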

Step 3: Review Metadata JSON

Output JSON structure:

{
  "url": "https://example.com/page",
  "timestamp": "2025-01-12T10:30:00.000Z",
  "paths": {
    "documentPath": "/us/en/about",
    "htmlFilePath": "us/en/about.plain.html",
    "mdFilePath": "us/en/about.md",
    "dirPath": "us/en",
    "filename": "about"
  },
  "screenshot": "./import-work/screenshot.png",
  "html": {
    "filePath": "./import-work/cleaned.html",
    "size": 45230
  },
  "metadata": {
    "title": "Page Title",
    "description": "Page description",
    "og:image": "https://example.com/image.jpg",
    "canonical": "https://example.com/page"
  },
  "images": {
    "count": 15,
    "mapping": {
      "https://example.com/hero.jpg": "./images/a1b2c3d4e5f6.jpg",
      "https://example.com/logo.webp": "./images/f6e5d4c3b2a1.png"
    },
    "stats": {
      "total": 15,
      "converted": 3,
      "skipped": 12,
      "failed": 0
    }
  }
}

Key fields:

paths.documentPath - Used for browser preview URL
paths.htmlFilePath - Where to save final HTML file
images.mapping - Original URLs → local paths
metadata - Extracted page metadata
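
To illustrate how `images.mapping` can be consumed, the sketch below rewrites `src` attributes in an HTML string using the mapping. It is not taken from the skill: `applyImageMapping` is a hypothetical name, and the regular expression is deliberately simple, handling only quoted `src` attributes (not `srcset`, CSS backgrounds, or HTML-escaped URLs).

```javascript
// Hypothetical helper: replace src values that appear as keys in
// images.mapping with their local paths; leave all other URLs untouched.
function applyImageMapping(html, mapping) {
  // Captures the leading whitespace, the quote character, and the URL.
  return html.replace(/(\s)src=(["'])(.*?)\2/g, (match, space, quote, url) => {
    const known = Object.prototype.hasOwnProperty.call(mapping, url);
    return known ? `${space}src=${quote}${mapping[url]}${quote}` : match;
  });
}

const mapping = {
  'https://example.com/hero.jpg': './images/a1b2c3d4e5f6.jpg',
  'https://example.com/logo.webp': './images/f6e5d4c3b2a1.png',
};
const html = '<img src="https://example.com/hero.jpg" alt="Hero"><img src="/untracked.gif">';
console.log(applyImageMapping(html, mapping));
```

For anything beyond a quick check, a real HTML parser would be a safer choice than a regular expression.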

Output

This skill provides:

✅ metadata.json with paths, metadata, image mapping
✅ screenshot.png for visual reference
✅ cleaned.html with local image references
✅ images/ folder with all downloaded images

Next step: Pass these outputs to identify-page-structure skill

Troubleshooting

Browser not installed:

npx playwright install chromium

Sharp not installed:

cd .claude/skills/scrape-webpage/scripts && npm install

Image download failures:

Check images.stats.failed count in metadata.json
Some images may require authentication or be blocked by CORS
Failed images will be noted but won't stop the scraping process
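
A short sketch of checking the counters programmatically follows. It works on an already parsed metadata.json object; the helper name `summarizeImageStats` is hypothetical, and the consistency check assumes that `converted`, `skipped`, and `failed` add up to `total`, which matches the sample JSON but is not confirmed by the skill's documentation.

```javascript
// Hypothetical helper: summarize images.stats from a parsed metadata.json.
function summarizeImageStats(analysis) {
  const stats = (analysis.images && analysis.images.stats) || {};
  const { total = 0, converted = 0, skipped = 0, failed = 0 } = stats;
  return {
    total,
    converted,
    skipped,
    failed,
    ok: failed === 0,
    // Assumption: the three counters sum to the total, as in the sample JSON.
    consistent: converted + skipped + failed === total,
  };
}

const sample = {
  images: { count: 15, stats: { total: 15, converted: 3, skipped: 12, failed: 0 } },
};
console.log(summarizeImageStats(sample));
```

With a real run, the input would come from `JSON.parse(fs.readFileSync('./import-work/metadata.json', 'utf8'))`.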

Lazy-loaded images not captured:

Script scrolls through page to trigger lazy loading
Some advanced lazy-loading may need customization in scripts/analyze-webpage.js
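
As a starting point for such customization, here is a generic incremental-scroll sketch. It is not the code in scripts/analyze-webpage.js: the function name `scrollToBottom` and its options (`step`, `pauseMs`, `maxSteps`) are hypothetical. It relies only on the Playwright `Page` methods `evaluate` and `waitForTimeout`, and it checks for the bottom after each pause so that pages which grow while loading are still scrolled to the end.

```javascript
// Hypothetical sketch: scroll a Playwright page in steps so lazy-loaded
// images have time to start loading. `page` is a Playwright Page object.
async function scrollToBottom(page, { step = 800, pauseMs = 250, maxSteps = 200 } = {}) {
  let performed = 0;
  while (performed < maxSteps) {
    // Runs in the browser: scroll down by `step` pixels.
    await page.evaluate((dy) => window.scrollBy(0, dy), step);
    performed += 1;

    // Give lazy loaders time to react before measuring the page again.
    await page.waitForTimeout(pauseMs);

    // Runs in the browser: has the viewport reached the end of the document?
    const atBottom = await page.evaluate(() => {
      const doc = document.documentElement;
      return window.innerHeight + window.scrollY >= doc.scrollHeight - 1;
    });
    if (atBottom) break;
  }
  return performed; // number of scroll steps actually taken
}
```

It would typically be called right after `await page.goto(url)`. Raising `pauseMs` or lowering `step` trades speed for a better chance of triggering slow loaders, and `maxSteps` keeps infinite-scroll pages from looping forever.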