Llama 3 - Meta's Open-Source Large Language Model

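What Is Llama 3?

Llama 3 is an open-source large language model developed and released by Meta (formerly Facebook). It is published under the Meta Llama 3 Community License, which permits commercial use subject to the license terms, and it can be run locally and fine-tuned.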

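Model Variants

Model	Parameters	Use case
Llama 3 8B	8 billion	Lightweight; suited to local execution
Llama 3 70B	70 billion	High performance; suited to servers
Llama 3 8B Instruct	8 billion	Tuned for dialogue and instruction following
Llama 3 70B Instruct	70 billion	Tuned for dialogue and instruction following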

Benchmarks

MMLU (knowledge):
- GPT-4: 86.4%
- Llama 3 70B: 82.0%
- Llama 3 8B: 68.4%

HumanEval (coding):
- GPT-4: 67.0%
- Llama 3 70B: 62.5%
- Llama 3 8B: 45.8%

Running Locally

Ollama

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Run Llama 3 8B
ollama run llama3:8b

# Interactive mode
>>> What is the capital of France?
The capital of France is Paris.

Python (transformers)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the tokenizer and the weights in bfloat16;
# device_map="auto" places them on the available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing simply."}
]

# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
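
To make the apply_chat_template step less opaque, the sketch below renders the same messages into a prompt string by hand. It follows the Llama 3 Instruct prompt format as I understand it from Meta's published documentation; render_llama3_prompt is a hypothetical helper written for illustration, and the tokenizer's own template remains the authoritative source.

```python
# Illustrative re-implementation of the Llama 3 Instruct prompt layout.
# render_llama3_prompt is a hypothetical helper, not part of transformers.

def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn: a role header, a blank line, the content, then an end-of-turn token
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing simply."},
]

prompt = render_llama3_prompt(messages)
print(prompt)
```

In practice it is safer to keep calling tokenizer.apply_chat_template, since it stays in sync with whatever template ships with the model.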

llama.cpp (C++ implementation)

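# Build llama.cpp (recent checkouts build with CMake; older ones used `make`)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Download a GGUF model
# Get a quantized version from Hugging Face

# Run (the binary was named ./main in older versions)
./build/bin/llama-cli -m models/llama-3-8b-instruct.Q4_K_M.gguf \
  -p "What is machine learning?" \
  -n 256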

Serving as an API

vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = [
    "Explain the theory of relativity",
    "Write a Python function to sort a list"
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)

OpenAI-Compatible API (Ollama)

# Start the server
ollama serve

# OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

// Use from JavaScript/TypeScript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama'  // The client requires a value; Ollama does not check it
});

const response = await openai.chat.completions.create({
  model: 'llama3:8b',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);
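
The same endpoint can also be called from Python without any third-party client. The sketch below uses only the standard library and assumes the Ollama server from the curl example is listening on localhost:11434; build_chat_request, extract_reply, and chat are hypothetical helper names chosen for this illustration.

```python
import json
import urllib.request

# Assumption: same local Ollama endpoint as in the curl example
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model, user_message):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of a chat completions response dict."""
    return response_body["choices"][0]["message"]["content"]


def chat(model, user_message):
    """Send one user message and return the model's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(model, user_message)) as resp:
        return extract_reply(json.load(resp))


# Usage, with `ollama serve` running:
#   print(chat("llama3:8b", "Hello!"))
```

Separating request construction from response parsing keeps both halves easy to check without a live server.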

Fine-Tuning

Using LoRA

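import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load the base (non-Instruct) model in bfloat16
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16
)

# Rank-16 adapters on the attention query and value projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

# Wrap the model so that only the LoRA adapter weights are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Run training...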
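
To give a sense of how small such an adapter is, the sketch below counts the parameters LoRA adds for r=16 on q_proj and v_proj. The layer dimensions are assumptions based on the published Llama 3 8B configuration as I understand it (hidden size 4096, 32 layers, 32 attention heads, 8 key/value heads); lora_params and total_lora_params are hypothetical helpers for this illustration.

```python
# Assumed Llama 3 8B dimensions (check the model's config.json to confirm)
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8  # grouped-query attention: fewer key/value heads than query heads
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS


def lora_params(d_in, d_out, r):
    """LoRA adds two matrices per target layer: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def total_lora_params(r=16):
    """Count adapter parameters for q_proj and v_proj across all layers."""
    q_proj = lora_params(HIDDEN_SIZE, NUM_HEADS * HEAD_DIM, r)
    v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, r)
    return NUM_LAYERS * (q_proj + v_proj)


print(f"{total_lora_params():,} trainable LoRA parameters")
# prints "6,815,744 trainable LoRA parameters"
```

Under these assumptions the adapter is well under 0.1% of the 8 billion base parameters, which is why LoRA fine-tuning fits on far smaller hardware than full fine-tuning.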

Use Cases

RAG Application

from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

llm = Ollama(model="llama3:8b")
vectorstore = Chroma(...)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

result = qa_chain.invoke("What is our return policy?")
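
Conceptually, a "stuff" chain retrieves the most relevant documents and pastes them into a single prompt ahead of the question. The toy sketch below illustrates that idea with word-overlap scoring in place of a real vector store. It is not LangChain's implementation, and the sample documents, the prompt wording, and the helpers tokenize, retrieve, and build_stuff_prompt are all hypothetical.

```python
import re

# Hypothetical sample documents standing in for a vector store's contents
DOCS = [
    "Items can be sent back within 30 days of purchase.",
    "Our return policy requires the original receipt.",
    "Shipping takes 3 to 5 business days.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, docs, k=1):
    """Rank documents by the number of words they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return ranked[:k]


def build_stuff_prompt(query, docs, k=1):
    """'Stuff' the top-k documents into one prompt ahead of the question."""
    context = "\n\n".join(retrieve(query, docs, k))
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


print(build_stuff_prompt("What is our return policy?", DOCS))
```

A real retriever would rank by embedding similarity rather than shared words, but the prompt-assembly step has the same overall shape.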

Code Generation

prompt = """
Write a Python function that:
1. Takes a list of numbers
2. Filters out negative numbers
3. Returns the sum of remaining numbers

Include docstring and type hints.
"""

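# Assumes `llm` is the LangChain Ollama instance from the RAG example,
# whose invoke() takes a single prompt string
response = llm.invoke(prompt)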

Hardware Requirements

Model	VRAM	RAM
8B (FP16)	16GB	32GB
8B (Q4)	6GB	16GB
70B (FP16)	140GB	256GB
70B (Q4)	40GB	64GB
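
The FP16 VRAM figures in the table follow directly from parameter count times bytes per parameter, as the sketch below shows. The 0.5 bytes per parameter used for Q4 is a nominal assumption (4 bits per weight); real 4-bit GGUF files store extra scaling data, and inference also needs room for the KV cache and activations, which is presumably why the Q4 rows in the table are higher than the weights-only estimate. weight_memory_gb is a hypothetical helper for this illustration.

```python
# Nominal bytes per parameter; the Q4 value is an approximation
BYTES_PER_PARAM = {"FP16": 2.0, "Q4": 0.5}

# Parameter counts rounded as in the model table
MODELS = {"8B": 8_000_000_000, "70B": 70_000_000_000}


def weight_memory_gb(n_params, precision):
    """Estimate the memory needed for the weights alone, in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for name, n_params in MODELS.items():
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(n_params, precision)
        print(f"{name} ({precision}): about {gb:.0f} GB for weights")
```

Treat these numbers as a lower bound when sizing hardware, and leave headroom for context length and batch size.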

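Summary

Llama 3 is a capable open-source large language model that can be used commercially. It can be run locally, fine-tuned, and served as an API, which makes it a good fit for privacy-sensitive applications and for use cases that need customization.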
