jpskill.com
🛠️ 開発・MCP コミュニティ

Natural Language Processing

Build NLP applications using transformers library, BERT, GPT, text classification, named entity recognition, and sentiment analysis

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o natural-language-processing.zip https://jpskill.com/download/21480.zip && unzip -o natural-language-processing.zip && rm natural-language-processing.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/21480.zip -OutFile "$d\natural-language-processing.zip"; Expand-Archive "$d\natural-language-processing.zip" -DestinationPath $d -Force; ri "$d\natural-language-processing.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)
  1. 1. 下の青いボタンを押して natural-language-processing.zip をダウンロード
  2. 2. ZIPファイルをダブルクリックで解凍 → natural-language-processing フォルダができる
  3. 3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
  4. 4. Claude Code を再起動

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

  1. 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
  2. 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
  3. 3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
    • · macOS / Linux: ~/.claude/skills/
    • · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →
最終更新
2026-05-18
取得日時
2026-05-18
同梱ファイル
2

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

自然言語処理

概要

このスキルは、最新のTransformers、BERT、GPT、およびテキスト分類、固有表現認識、感情分析などの古典的なNLP技術を使用して、NLPアプリケーションを構築するための包括的なツールを提供します。

使用する場面

  • 感情分析、トピック分類、または意図検出のためのテキスト分類システムを構築する場合
  • 非構造化テキストから固有表現(人名、地名、組織名)を抽出する場合
  • 機械翻訳、テキスト要約、または質問応答システムを実装する場合
  • 大量のテキストデータを処理・分析して洞察を得る場合
  • チャットボット、バーチャルアシスタント、または会話型AIアプリケーションを作成する場合
  • ドメイン固有のNLPタスクのために事前学習済みTransformerモデルをファインチューニングする場合

NLPのコアタスク

  • テキスト分類: 感情、トピック、意図の分類
  • 固有表現認識: 人名、地名、組織名の識別
  • 機械翻訳: 言語間のテキスト翻訳
  • テキスト要約: 主要な情報の抽出
  • 質問応答: ドキュメント内の回答の検索
  • テキスト生成: 首尾一貫したテキストの生成

人気のモデルとライブラリ

  • Transformers: BERT, GPT, RoBERTa, T5
  • spaCy: 産業用NLPパイプライン
  • NLTK: 古典的なNLPツールキット
  • Hugging Face: 事前学習済みモデルのハブ
  • PyTorch/TensorFlow: ディープラーニングフレームワーク

Pythonによる実装

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
import re
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                         AutoModelForTokenClassification, pipeline,
                         TextClassificationPipeline)
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import warnings
warnings.filterwarnings('ignore')

# Download required NLTK resources
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

print("=== 1. Text Preprocessing ===")

def preprocess_text(text, remove_stopwords=True, lemmatize=True):
    """Complete text preprocessing pipeline"""
    # Lowercase
    text = text.lower()

    # Remove special characters and digits
    text = re.sub(r'[^a-zA-Z\s]', '', text)

    # Tokenize
    tokens = word_tokenize(text)

    # Remove stopwords
    if remove_stopwords:
        stop_words = set(stopwords.words('english'))
        tokens = [t for t in tokens if t not in stop_words]

    # Lemmatize
    if lemmatize:
        lemmatizer = WordNetLemmatizer()
        tokens = [lemmatizer.lemmatize(t) for t in tokens]

    return tokens, ' '.join(tokens)

sample_text = "The quick brown foxes are jumping over the lazy dogs! Amazing performance."
tokens, processed = preprocess_text(sample_text)
print(f"Original: {sample_text}")
print(f"Processed: {processed}")
print(f"Tokens: {tokens}\n")

# 2. Text Classification with sklearn
print("=== 2. Traditional Text Classification ===")

# Sample data
texts = [
    "I love this product, it's amazing!",
    "This movie is fantastic and entertaining.",
    "Best purchase ever, highly recommended.",
    "Terrible quality, very disappointed.",
    "Worst experience, waste of money.",
    "Horrible service and poor quality.",
    "The food was delicious and fresh.",
    "Great atmosphere and friendly staff.",
    "Bad weather today, very gloomy.",
    "The book was boring and uninteresting."
]

labels = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]  # 1: positive, 0: negative

# TF-IDF vectorization
tfidf = TfidfVectorizer(max_features=100, ngram_range=(1, 2))
X_tfidf = tfidf.fit_transform(texts)

# Train classifier
clf = MultinomialNB()
clf.fit(X_tfidf, labels)

# Evaluate
predictions = clf.predict(X_tfidf)
print(f"Accuracy: {accuracy_score(labels, predictions):.4f}")
print(f"Precision: {precision_score(labels, predictions):.4f}")
print(f"Recall: {recall_score(labels, predictions):.4f}")
print(f"F1: {f1_score(labels, predictions):.4f}\n")

# 3. Transformer-based text classification
print("=== 3. Transformer-based Classification ===")

try:
    # Use Hugging Face transformers for sentiment analysis
    sentiment_pipeline = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english"
    )

    test_sentences = [
        "This is a wonderful movie!",
        "I absolutely hate this product.",
        "It's okay, nothing special.",
        "Amazing quality and fast delivery!"
    ]

    print("Sentiment Analysis Results:")
    for sentence in test_sentences:
        result = sentiment_pipeline(sentence)
        print(f"  Text: {sentence}")
        print(f"  Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")

except Exception as e:
    print(f"Transformer model not available: {str(e)}\n")

# 4. Named Entity Recognition (NER)
print("=== 4. Named Entity Recognition ===")

try:
    ner_pipeline = pipeline(
        "ner",
        model="distilbert-base-uncased",
        aggregation_strategy="simple"
    )

    text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
    entities = ner_pipeline(text)

    print(f"Text: {text}")
    print("Entities:")
    for entity in entities:
        print(f"  {entity['word']}: {entity['entity_group']} (score: {entity['score']:.4f})")

except Exception as e:
    print(f"NER model not available: {str(e)}\n")

# 5. Word embeddings and similarity
print("\n=== 5. Word Embeddings and Similarity ===")

from sklearn.metrics.pairwise import cosine_similarity

# Simple bag-of-words embeddings
vectorizer = CountVectorizer(max_features=50)
docs = [
    "machine learning is great",
    "deep learning uses neural networks",
📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

Natural Language Processing

Overview

This skill provides comprehensive tools for building NLP applications using modern transformers, BERT, GPT, and classical NLP techniques for text classification, named entity recognition, sentiment analysis, and more.

When to Use

  • Building text classification systems for sentiment analysis, topic categorization, or intent detection
  • Extracting named entities (people, places, organizations) from unstructured text
  • Implementing machine translation, text summarization, or question answering systems
  • Processing and analyzing large volumes of textual data for insights
  • Creating chatbots, virtual assistants, or conversational AI applications
  • Fine-tuning pre-trained transformer models for domain-specific NLP tasks

NLP Core Tasks

  • Text Classification: Sentiment, topic, intent classification
  • Named Entity Recognition: Identifying people, places, organizations
  • Machine Translation: Text translation between languages
  • Text Summarization: Extracting key information
  • Question Answering: Finding answers in documents
  • Text Generation: Generating coherent text

Popular Models and Libraries

  • Transformers: BERT, GPT, RoBERTa, T5
  • spaCy: Industrial NLP pipeline
  • NLTK: Classic NLP toolkit
  • Hugging Face: Pre-trained models hub
  • PyTorch/TensorFlow: Deep learning frameworks

Python Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
import re
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                         AutoModelForTokenClassification, pipeline,
                         TextClassificationPipeline)
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import warnings
warnings.filterwarnings('ignore')

# Download required NLTK resources
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

print("=== 1. Text Preprocessing ===")

def preprocess_text(text, remove_stopwords=True, lemmatize=True):
    """Complete text preprocessing pipeline"""
    # Lowercase
    text = text.lower()

    # Remove special characters and digits
    text = re.sub(r'[^a-zA-Z\s]', '', text)

    # Tokenize
    tokens = word_tokenize(text)

    # Remove stopwords
    if remove_stopwords:
        stop_words = set(stopwords.words('english'))
        tokens = [t for t in tokens if t not in stop_words]

    # Lemmatize
    if lemmatize:
        lemmatizer = WordNetLemmatizer()
        tokens = [lemmatizer.lemmatize(t) for t in tokens]

    return tokens, ' '.join(tokens)

sample_text = "The quick brown foxes are jumping over the lazy dogs! Amazing performance."
tokens, processed = preprocess_text(sample_text)
print(f"Original: {sample_text}")
print(f"Processed: {processed}")
print(f"Tokens: {tokens}\n")

# 2. Text Classification with sklearn
print("=== 2. Traditional Text Classification ===")

# Sample data
texts = [
    "I love this product, it's amazing!",
    "This movie is fantastic and entertaining.",
    "Best purchase ever, highly recommended.",
    "Terrible quality, very disappointed.",
    "Worst experience, waste of money.",
    "Horrible service and poor quality.",
    "The food was delicious and fresh.",
    "Great atmosphere and friendly staff.",
    "Bad weather today, very gloomy.",
    "The book was boring and uninteresting."
]

labels = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]  # 1: positive, 0: negative

# TF-IDF vectorization
tfidf = TfidfVectorizer(max_features=100, ngram_range=(1, 2))
X_tfidf = tfidf.fit_transform(texts)

# Train classifier
clf = MultinomialNB()
clf.fit(X_tfidf, labels)

# Evaluate
predictions = clf.predict(X_tfidf)
print(f"Accuracy: {accuracy_score(labels, predictions):.4f}")
print(f"Precision: {precision_score(labels, predictions):.4f}")
print(f"Recall: {recall_score(labels, predictions):.4f}")
print(f"F1: {f1_score(labels, predictions):.4f}\n")

# 3. Transformer-based text classification
print("=== 3. Transformer-based Classification ===")

try:
    # Use Hugging Face transformers for sentiment analysis
    sentiment_pipeline = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english"
    )

    test_sentences = [
        "This is a wonderful movie!",
        "I absolutely hate this product.",
        "It's okay, nothing special.",
        "Amazing quality and fast delivery!"
    ]

    print("Sentiment Analysis Results:")
    for sentence in test_sentences:
        result = sentiment_pipeline(sentence)
        print(f"  Text: {sentence}")
        print(f"  Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")

except Exception as e:
    print(f"Transformer model not available: {str(e)}\n")

# 4. Named Entity Recognition (NER)
print("=== 4. Named Entity Recognition ===")

try:
    ner_pipeline = pipeline(
        "ner",
        model="distilbert-base-uncased",
        aggregation_strategy="simple"
    )

    text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
    entities = ner_pipeline(text)

    print(f"Text: {text}")
    print("Entities:")
    for entity in entities:
        print(f"  {entity['word']}: {entity['entity_group']} (score: {entity['score']:.4f})")

except Exception as e:
    print(f"NER model not available: {str(e)}\n")

# 5. Word embeddings and similarity
print("\n=== 5. Word Embeddings and Similarity ===")

from sklearn.metrics.pairwise import cosine_similarity

# Simple bag-of-words embeddings
vectorizer = CountVectorizer(max_features=50)
docs = [
    "machine learning is great",
    "deep learning uses neural networks",
    "machine learning and deep learning"
]

embeddings = vectorizer.fit_transform(docs).toarray()

# Compute similarity
similarity_matrix = cosine_similarity(embeddings)
print("Document Similarity Matrix:")
print(pd.DataFrame(similarity_matrix, columns=[f"Doc{i}" for i in range(len(docs))],
                  index=[f"Doc{i}" for i in range(len(docs))]).round(3))

# 6. Tokenization and vocabulary
print("\n=== 6. Tokenization Analysis ===")

corpus = " ".join(texts)
tokens, _ = preprocess_text(corpus)

# Vocabulary
vocab = Counter(tokens)
print(f"Vocabulary size: {len(vocab)}")
print("Top 10 most common words:")
for word, count in vocab.most_common(10):
    print(f"  {word}: {count}")

# 7. Advanced Transformer pipeline
print("\n=== 7. Advanced NLP Tasks ===")

try:
    # Zero-shot classification
    zero_shot_pipeline = pipeline(
        "zero-shot-classification",
        model="facebook/bart-large-mnli"
    )

    sequence = "Apple is discussing the possibility of acquiring startup for 1 billion dollars"
    candidate_labels = ["business", "sports", "technology", "politics"]

    result = zero_shot_pipeline(sequence, candidate_labels)
    print("Zero-shot Classification Results:")
    for label, score in zip(result['labels'], result['scores']):
        print(f"  {label}: {score:.4f}")

except Exception as e:
    print(f"Advanced pipeline not available: {str(e)}\n")

# 8. Text statistics and analysis
print("\n=== 8. Text Statistics ===")

sample_texts = [
    "Natural language processing is fascinating.",
    "Machine learning enables artificial intelligence.",
    "Deep learning revolutionizes computer vision."
]

stats_data = []
for text in sample_texts:
    words = text.split()
    chars = len(text)
    avg_word_len = np.mean([len(w) for w in words])

    stats_data.append({
        'Text': text[:40] + '...' if len(text) > 40 else text,
        'Words': len(words),
        'Characters': chars,
        'Avg Word Len': avg_word_len
    })

stats_df = pd.DataFrame(stats_data)
print(stats_df.to_string(index=False))

# 9. Visualization
print("\n=== 9. NLP Visualization ===")

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Word frequency
word_freq = vocab.most_common(15)
words, freqs = zip(*word_freq)
axes[0, 0].barh(range(len(words)), freqs, color='steelblue')
axes[0, 0].set_yticks(range(len(words)))
axes[0, 0].set_yticklabels(words)
axes[0, 0].set_xlabel('Frequency')
axes[0, 0].set_title('Top 15 Most Frequent Words')
axes[0, 0].invert_yaxis()

# Sentiment distribution
sentiments = ['Positive', 'Negative', 'Positive', 'Negative', 'Positive']
sentiment_counts = Counter(sentiments)
axes[0, 1].pie(sentiment_counts.values(), labels=sentiment_counts.keys(),
              autopct='%1.1f%%', colors=['green', 'red'])
axes[0, 1].set_title('Sentiment Distribution')

# Document similarity heatmap
im = axes[1, 0].imshow(similarity_matrix, cmap='YlOrRd', aspect='auto')
axes[1, 0].set_xticks(range(len(docs)))
axes[1, 0].set_yticks(range(len(docs)))
axes[1, 0].set_xticklabels([f'Doc{i}' for i in range(len(docs))])
axes[1, 0].set_yticklabels([f'Doc{i}' for i in range(len(docs))])
axes[1, 0].set_title('Document Similarity Heatmap')
plt.colorbar(im, ax=axes[1, 0])

# Text length distribution
text_lengths = [len(t.split()) for t in texts]
axes[1, 1].hist(text_lengths, bins=5, color='coral', edgecolor='black')
axes[1, 1].set_xlabel('Number of Words')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Text Length Distribution')
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('nlp_analysis.png', dpi=100, bbox_inches='tight')
print("\nNLP visualization saved as 'nlp_analysis.png'")

# 10. Summary
print("\n=== NLP Summary ===")
print(f"Texts processed: {len(texts)}")
print(f"Unique vocabulary: {len(vocab)} words")
print(f"Average text length: {np.mean([len(t.split()) for t in texts]):.2f} words")
print(f"Classification accuracy: {accuracy_score(labels, predictions):.4f}")

print("\nNatural language processing setup completed!")

Common NLP Tasks and Models

  • Classification: DistilBERT, RoBERTa, ELECTRA
  • NER: BioBERT, SciBERT, spaCy models
  • Translation: MarianMT, M2M-100
  • Summarization: BART, Pegasus, T5
  • QA: BERT, RoBERTa, DeBERTa

Text Preprocessing Pipeline

  1. Lowercasing and cleaning
  2. Tokenization
  3. Stopword removal
  4. Lemmatization/Stemming
  5. Vectorization

Best Practices

  • Use pre-trained models when available
  • Fine-tune on task-specific data
  • Handle out-of-vocabulary words
  • Batch process for efficiency
  • Monitor for bias in models

Deliverables

  • Trained NLP model
  • Text classification results
  • Named entities extracted
  • Performance metrics
  • Visualization dashboard
  • Inference API

同梱ファイル

※ ZIPに含まれるファイル一覧。`SKILL.md` 本体に加え、参考資料・サンプル・スクリプトが入っている場合があります。