
The Complete Guide to the Meta Llama API

Llama is Meta's open-source large language model; its openness and strong performance have made it a benchmark for open-source AI

Overview

Llama (Large Language Model Meta AI) is Meta's family of open-source large language models. Its openness, strong performance, and support for local deployment have made it a benchmark in the open-source AI community. This tutorial walks you through using the Llama API end to end.

Why Choose Llama?

| Advantage | Description |
|---|---|
| Fully open source | Model weights are completely open and free to use |
| Local deployment | Can run on your own servers, keeping data secure |
| High performance | Performance rivals closed-source models |
| Active community | Backed by a large open-source community |

Llama Model Overview

The Llama model family:

Llama 3.3 series (latest)
├── Llama-3.3-70B-Instruct    # 70B parameters, strong performance
└── Llama-3.3-70B             # base version

Llama 3.2 series (multimodal)
├── Llama-3.2-90B-Vision      # large multimodal model
├── Llama-3.2-11B-Vision      # mid-size multimodal model
├── Llama-3.2-3B-Instruct     # lightweight model
└── Llama-3.2-1B-Instruct     # smallest model

Llama 3.1 series (classic)
├── Llama-3.1-405B-Instruct   # largest open-source model
├── Llama-3.1-70B-Instruct    # balanced choice
└── Llama-3.1-8B-Instruct     # lightweight model

Basic Concepts

Getting Model Access

php
<?php
// 1. Apply for access on Meta's official site
// https://llama.meta.com/llama-downloads/

// 2. Request access on Hugging Face
// https://huggingface.co/meta-llama

// 3. Use a third-party API service (e.g. Together AI, Groq)
$apiKey = getenv('LLAMA_API_KEY');
$baseUrl = 'https://api.together.xyz/v1'; // Together AI

Choosing a Deployment Method

Llama deployment options:

1. Cloud API services
   ├── Together AI
   ├── Groq
   ├── Fireworks AI
   └── Replicate

2. Local deployment
   ├── Ollama
   ├── vLLM
   ├── llama.cpp
   └── Hugging Face Transformers

3. Private cloud deployment
   ├── AWS Bedrock
   ├── Azure AI
   └── Google Cloud

Environment Setup

Using the Together AI API

php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

class LlamaClient
{
    private $client;
    private $apiKey;
    private $baseUrl;

    public function __construct(string $apiKey, string $provider = 'together')
    {
        $this->apiKey = $apiKey;

        $providers = [
            'together' => 'https://api.together.xyz/v1',
            'groq' => 'https://api.groq.com/openai/v1',
            'fireworks' => 'https://api.fireworks.ai/inference/v1',
        ];

        $this->baseUrl = $providers[$provider] ?? $providers['together'];

        $this->client = new Client([
            'base_uri' => $this->baseUrl,
            'timeout' => 120,
            'headers' => [
                'Authorization' => 'Bearer ' . $this->apiKey,
                'Content-Type' => 'application/json',
            ],
        ]);
    }

    public function chat(
        array $messages,
        string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
        array $options = []
    ): array {
        $params = [
            'model' => $model,
            'messages' => $messages,
        ];

        if (isset($options['temperature'])) {
            $params['temperature'] = $options['temperature'];
        }

        if (isset($options['max_tokens'])) {
            $params['max_tokens'] = $options['max_tokens'];
        }

        if (isset($options['top_p'])) {
            $params['top_p'] = $options['top_p'];
        }

        try {
            // Build the full URL: with a leading "/", Guzzle drops the "/v1" path from base_uri
            $response = $this->client->post($this->baseUrl . '/chat/completions', [
                'json' => $params,
            ]);

            return json_decode($response->getBody(), true);
        } catch (RequestException $e) {
            $errorBody = $e->getResponse() ? $e->getResponse()->getBody()->getContents() : 'Unknown error';
            throw new Exception('Llama API Error: ' . $errorBody);
        }
    }
}

// Usage example
$apiKey = getenv('TOGETHER_API_KEY');
$client = new LlamaClient($apiKey, 'together');

$result = $client->chat([
    ['role' => 'user', 'content' => 'Introduce the PHP language in one sentence']
]);

echo $result['choices'][0]['message']['content'];

Sample output:

PHP is an open-source, server-side scripting language that is particularly well suited to web development and can be embedded directly in HTML.

Deploying Llama Locally

Deploying with Ollama

bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull Llama models
ollama pull llama3.3:70b
ollama pull llama3.2:3b

# Start the server
ollama serve

Calling Local Ollama from PHP

php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

class OllamaClient
{
    private $client;
    private $baseUrl;

    public function __construct(string $baseUrl = 'http://localhost:11434')
    {
        $this->baseUrl = $baseUrl;
        $this->client = new Client([
            'base_uri' => $this->baseUrl,
            'timeout' => 120,
        ]);
    }

    public function chat(string $model, string $prompt): string
    {
        $response = $this->client->post('/api/generate', [
            'json' => [
                'model' => $model,
                'prompt' => $prompt,
                'stream' => false,
            ],
        ]);

        $result = json_decode($response->getBody(), true);
        return $result['response'];
    }

    public function chatWithHistory(string $model, array $messages): string
    {
        $response = $this->client->post('/api/chat', [
            'json' => [
                'model' => $model,
                'messages' => $messages,
                'stream' => false,
            ],
        ]);

        $result = json_decode($response->getBody(), true);
        return $result['message']['content'];
    }

    public function streamChat(string $model, string $prompt): Generator
    {
        $response = $this->client->post('/api/generate', [
            'json' => [
                'model' => $model,
                'prompt' => $prompt,
                'stream' => true,
            ],
            'stream' => true,
        ]);

        $body = $response->getBody();
        $buffer = '';

        while (!$body->eof()) {
            // Buffer chunks so a JSON line split across two reads is not lost
            $buffer .= $body->read(1024);

            while (($pos = strpos($buffer, "\n")) !== false) {
                $line = trim(substr($buffer, 0, $pos));
                $buffer = substr($buffer, $pos + 1);

                if ($line === '') {
                    continue;
                }

                $data = json_decode($line, true);
                if (isset($data['response'])) {
                    yield $data['response'];
                }
            }
        }

        // Flush any trailing data that arrived without a final newline
        $line = trim($buffer);
        if ($line !== '') {
            $data = json_decode($line, true);
            if (isset($data['response'])) {
                yield $data['response'];
            }
        }
    }

    public function listModels(): array
    {
        $response = $this->client->get('/api/tags');
        $result = json_decode($response->getBody(), true);
        return $result['models'] ?? [];
    }
}

// Usage example
$ollama = new OllamaClient();

// List available models
$models = $ollama->listModels();
print_r($models);

// Simple chat
$response = $ollama->chat('llama3.2:3b', 'What is PHP?');
echo $response;

// Conversation with history
$messages = [
    ['role' => 'user', 'content' => 'Hello'],
    ['role' => 'assistant', 'content' => 'Hello! How can I help you?'],
    ['role' => 'user', 'content' => 'Please introduce PHP'],
];
];
$response = $ollama->chatWithHistory('llama3.2:3b', $messages);
echo $response;

Advanced Parameter Configuration

Full Parameter Example

php
<?php
class LlamaClient
{
    // ... code from earlier ...

    public function chatAdvanced(
        array $messages,
        string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
        array $options = []
    ): array {
        $params = [
            'model' => $model,
            'messages' => $messages,
        ];

        $optionalParams = [
            'temperature',
            'max_tokens',
            'top_p',
            'top_k',
            'repetition_penalty',
            'stop',
            'seed',
        ];

        foreach ($optionalParams as $param) {
            if (isset($options[$param])) {
                $params[$param] = $options[$param];
            }
        }

        try {
            // Build the full URL: with a leading "/", Guzzle drops the "/v1" path from base_uri
            $response = $this->client->post($this->baseUrl . '/chat/completions', [
                'json' => $params,
            ]);

            return json_decode($response->getBody(), true);
        } catch (RequestException $e) {
            $errorBody = $e->getResponse() ? $e->getResponse()->getBody()->getContents() : 'Unknown error';
            throw new Exception('Llama API Error: ' . $errorBody);
        }
    }
}

// Usage example
$result = $client->chatAdvanced(
    [['role' => 'user', 'content' => 'Write a poem about programmers']],
    'meta-llama/Llama-3.3-70B-Instruct-Turbo',
    [
        'temperature' => 0.8,
        'max_tokens' => 500,
        'top_p' => 0.9,
        'repetition_penalty' => 1.1,
    ]
);

Parameter Reference

| Parameter | Range | Default | Description |
|---|---|---|---|
| temperature | 0-2 | 0.7 | Controls randomness |
| max_tokens | 1 to model limit | unlimited | Maximum output tokens |
| top_p | 0-1 | 0.9 | Nucleus sampling parameter |
| top_k | 1-100 | 40 | Consider only the top K candidate tokens |
| repetition_penalty | 1-2 | 1.0 | Penalty for repetition |
| stop | string or array | - | Stop sequences |
| seed | integer | - | Random seed |
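When these options come from user input, it helps to validate them against the ranges above before sending a request. A minimal sketch (`validateLlamaOptions` is a hypothetical helper, not part of any provider SDK; the exact bounds may differ per provider):

```php
<?php
// Hypothetical helper: check sampling options against the ranges in the
// table above and return a list of human-readable validation errors.
function validateLlamaOptions(array $options): array
{
    $errors = [];

    $ranges = [
        'temperature'        => [0.0, 2.0],
        'top_p'              => [0.0, 1.0],
        'top_k'              => [1, 100],
        'repetition_penalty' => [1.0, 2.0],
    ];

    foreach ($ranges as $name => [$min, $max]) {
        if (isset($options[$name]) && ($options[$name] < $min || $options[$name] > $max)) {
            $errors[] = "{$name} must be between {$min} and {$max}";
        }
    }

    if (isset($options['max_tokens']) && $options['max_tokens'] < 1) {
        $errors[] = 'max_tokens must be at least 1';
    }

    return $errors;
}
```

Rejecting bad values locally gives clearer messages than waiting for a 400 response from the API.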

Handling Streaming Responses

php
<?php
class LlamaClient
{
    // ... code from earlier ...

    public function chatStream(
        array $messages,
        string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo'
    ): Generator {
        // Build the full URL: with a leading "/", Guzzle drops the "/v1" path from base_uri
        $response = $this->client->post($this->baseUrl . '/chat/completions', [
            'json' => [
                'model' => $model,
                'messages' => $messages,
                'stream' => true,
            ],
            'stream' => true,
        ]);

        $body = $response->getBody();
        $buffer = '';

        while (!$body->eof()) {
            $chunk = $body->read(1024);
            $buffer .= $chunk;

            while (($pos = strpos($buffer, "\n")) !== false) {
                $line = substr($buffer, 0, $pos);
                $buffer = substr($buffer, $pos + 1);

                $line = trim($line);
                if (empty($line) || $line === 'data: [DONE]') {
                    continue;
                }

                if (strpos($line, 'data: ') === 0) {
                    $json = substr($line, 6);
                    $data = json_decode($json, true);

                    if (isset($data['choices'][0]['delta']['content'])) {
                        yield $data['choices'][0]['delta']['content'];
                    }
                }
            }
        }
    }
}

// Usage example
echo "Llama: ";
foreach ($client->chatStream([['role' => 'user', 'content' => 'Tell me a programmer joke']]) as $chunk) {
    echo $chunk;
    flush();
}
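The per-line handling inside chatStream can be factored into a pure helper, which makes it possible to unit-test the SSE parsing without a live connection. A sketch (`extractSseContent` is a hypothetical name):

```php
<?php
// Parse one SSE line from a chat-completions stream and return the text
// delta, or null for blanks, the [DONE] sentinel, and non-data lines.
function extractSseContent(string $line): ?string
{
    $line = trim($line);
    if ($line === '' || $line === 'data: [DONE]') {
        return null;
    }
    if (strpos($line, 'data: ') !== 0) {
        return null;
    }

    $data = json_decode(substr($line, 6), true);
    return $data['choices'][0]['delta']['content'] ?? null;
}
```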

Multi-Turn Conversations

php
<?php
class LlamaChatSession
{
    private LlamaClient $client;
    private array $messages = [];
    private string $model;

    public function __construct(LlamaClient $client, string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo')
    {
        $this->client = $client;
        $this->model = $model;
    }

    public function setSystemPrompt(string $prompt): void
    {
        $this->messages = [
            ['role' => 'system', 'content' => $prompt]
        ];
    }

    public function chat(string $userMessage): string
    {
        $this->messages[] = ['role' => 'user', 'content' => $userMessage];

        $response = $this->client->chat($this->messages, $this->model);
        $assistantMessage = $response['choices'][0]['message']['content'];

        $this->messages[] = ['role' => 'assistant', 'content' => $assistantMessage];

        return $assistantMessage;
    }

    public function getHistory(): array
    {
        return $this->messages;
    }

    public function clearHistory(): void
    {
        $systemMessage = $this->messages[0] ?? null;
        $this->messages = $systemMessage ? [$systemMessage] : [];
    }
}

// Usage example
$session = new LlamaChatSession($client);
$session->setSystemPrompt('You are a professional PHP engineer. Answer questions concisely.');

echo "User: What are the advantages of PHP?\n";
echo "Llama: " . $session->chat('What are the advantages of PHP?') . "\n";

Common Errors and Pitfalls

Mistake 1: Wrong Model Name

php
<?php
// ❌ Wrong: an incorrect model name
$result = $client->chat($messages, 'llama-3');

// ✅ Right: use the full model name
$result = $client->chat($messages, 'meta-llama/Llama-3.3-70B-Instruct-Turbo');

Mistake 2: Insufficient Resources for Local Deployment

php
<?php
// ❌ Wrong: deploying a large model on under-resourced hardware
// The 70B model needs roughly 140GB of VRAM

// ✅ Right: pick a model that matches your hardware
// 8B model: ~16GB VRAM
// 70B model: ~140GB VRAM (or use a quantized version)

// Use a quantized model to reduce resource usage (run in a shell, not PHP):
// ollama pull llama3.3:70b-instruct-q4_K_M

Mistake 3: Ignoring the Context Limit

php
<?php
// ❌ Wrong: sending an overly long context
$longText = file_get_contents('large_file.txt');
$messages = [['role' => 'user', 'content' => $longText]];

// ✅ Right: check and cap the context length
function truncateText(string $text, int $maxTokens = 8000): string
{
    $approxChars = $maxTokens * 4; // rough rule of thumb: ~4 characters per token
    if (strlen($text) > $approxChars) {
        return substr($text, 0, $approxChars) . '...';
    }
    return $text;
}
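The four-characters-per-token rule of thumb used above is very rough, especially for Chinese text, where a single character is often one token or more. A slightly finer heuristic (still only an estimate; use the model's real tokenizer when accuracy matters):

```php
<?php
// Rough token estimate: count CJK characters as ~1 token each, and other
// characters as ~1 token per 4 characters. Purely heuristic.
function estimateTokens(string $text): int
{
    // Count characters in the main CJK Unified Ideographs block
    $cjk = preg_match_all('/[\x{4e00}-\x{9fff}]/u', $text);
    $otherChars = mb_strlen($text, 'UTF-8') - $cjk;

    return $cjk + (int) ceil($otherChars / 4);
}
```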

Common Use Cases

Use Case 1: A Local, Private AI Assistant

php
<?php
class PrivateAIAssistant
{
    private OllamaClient $client;
    private string $model;

    public function __construct(OllamaClient $client, string $model = 'llama3.2:3b')
    {
        $this->client = $client;
        $this->model = $model;
    }

    public function ask(string $question, string $context = ''): string
    {
        $prompt = $context
            ? "Background:\n{$context}\n\nQuestion: {$question}"
            : $question;

        return $this->client->chat($this->model, $prompt);
    }

    public function analyzeDocument(string $document): string
    {
        $prompt = "Please analyze the following document:\n\n{$document}";
        return $this->client->chat($this->model, $prompt);
    }
}

Use Case 2: Code Assistant

php
<?php
class LlamaCodeAssistant
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function generateCode(string $description, string $language = 'PHP'): string
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a professional programmer. Generate concise, efficient code.'
            ],
            [
                'role' => 'user',
                'content' => "Generate {$language} code for: {$description}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function explainCode(string $code): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Explain what the following code does:\n```\n{$code}\n```"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Use Case 3: Document Q&A System

php
<?php
class DocumentQA
{
    private LlamaClient $client;
    private string $document;

    public function __construct(LlamaClient $client, string $document)
    {
        $this->client = $client;
        $this->document = $document;
    }

    public function ask(string $question): string
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a document Q&A assistant. Answer questions based on the provided document.'
            ],
            [
                'role' => 'user',
                'content' => "Document:\n{$this->document}\n\nQuestion: {$question}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Use Case 4: Content Generation

php
<?php
class ContentGenerator
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function generateArticle(string $topic, int $wordCount = 800): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Write an article about {$topic}, roughly {$wordCount} words long."
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function generateSummary(string $content, int $maxLength = 200): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Summarize the following content in no more than {$maxLength} characters:\n\n{$content}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Use Case 5: Multilingual Translation

php
<?php
class TranslationService
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function translate(string $text, string $from, string $to): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Translate the following text from {$from} to {$to}:\n\n{$text}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Enterprise-Grade Use Cases

Use Case 1: A Private Knowledge Base Q&A System

php
<?php
class PrivateKnowledgeBase
{
    private OllamaClient $client;
    private array $documents = [];
    private string $model;

    public function __construct(OllamaClient $client, string $model = 'llama3.2:3b')
    {
        $this->client = $client;
        $this->model = $model;
    }

    public function addDocument(string $id, string $content): void
    {
        $this->documents[$id] = $content;
    }

    public function query(string $question): string
    {
        $context = implode("\n\n", $this->documents);

        $prompt = <<<PROMPT
Here are some documents:

{$context}

Based on the content above, answer this question: {$question}

If the documents contain no relevant information, say so honestly.
PROMPT;

        return $this->client->chat($this->model, $prompt);
    }
}
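Concatenating every document into one prompt, as query() does, quickly exhausts the context window as the knowledge base grows. A naive relevance filter based on keyword overlap can select only the most relevant documents first (a sketch under that assumption; a production system would use embeddings and a vector index, and `selectRelevantDocuments` is a hypothetical helper, not part of the class above):

```php
<?php
// Score each document by how many words of the question it contains,
// then keep the IDs of the $k best-scoring documents.
function selectRelevantDocuments(array $documents, string $question, int $k = 3): array
{
    $words = preg_split('/\s+/', mb_strtolower($question), -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($documents as $id => $content) {
        $haystack = mb_strtolower($content);
        $score = 0;
        foreach ($words as $word) {
            if (mb_strpos($haystack, $word) !== false) {
                $score++;
            }
        }
        $scores[$id] = $score;
    }

    arsort($scores); // highest score first
    return array_slice(array_keys($scores), 0, $k);
}
```

query() could then build its context from only the selected documents instead of all of them.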

Use Case 2: A Code Review System

php
<?php
class CodeReviewSystem
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function review(string $code, string $language = 'PHP'): array
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a senior code reviewer. Review the code along three dimensions: quality, security, and performance.'
            ],
            [
                'role' => 'user',
                'content' => "Review the following {$language} code:\n```\n{$code}\n```\n\nReturn the results as JSON."
            ]
        ]);

        // Note: the model may wrap the JSON in markdown fences or return invalid
        // JSON; validate the decoded result (it can be null) before using it
        return json_decode($result['choices'][0]['message']['content'], true);
    }
}

FAQ

Q1: Which Llama model should I choose?

Answer

| Scenario | Recommended model | Why |
|---|---|---|
| Local development and testing | Llama 3.2 3B | Low resource requirements |
| Production | Llama 3.3 70B | Excellent performance |
| Edge devices | Llama 3.2 1B | Smallest model |
| Multimodal workloads | Llama 3.2 Vision | Supports images |

Q2: What hardware does local deployment require?

Answer

| Model | Parameters | VRAM (FP16) | VRAM (4-bit quantized) |
|---|---|---|---|
| Llama 3.2 1B | 1B | ~2GB | ~1GB |
| Llama 3.2 3B | 3B | ~6GB | ~2GB |
| Llama 3.1 8B | 8B | ~16GB | ~6GB |
| Llama 3.3 70B | 70B | ~140GB | ~40GB |
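Based on the 4-bit figures in the table above, a small helper can suggest the largest model that fits in available VRAM (a hypothetical sketch; the thresholds are the table's rough estimates, not guarantees):

```php
<?php
// Suggest the largest 4-bit-quantized model that fits the given VRAM,
// using the approximate requirements from the table above.
function suggestModel(float $vramGb): ?string
{
    $models = [ // model => approximate 4-bit VRAM requirement in GB
        'Llama 3.3 70B' => 40,
        'Llama 3.1 8B'  => 6,
        'Llama 3.2 3B'  => 2,
        'Llama 3.2 1B'  => 1,
    ];

    foreach ($models as $model => $required) {
        if ($vramGb >= $required) {
            return $model;
        }
    }

    return null; // not enough VRAM even for the smallest model
}
```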

Q3: Cloud API or local deployment?

Answer

| Factor | Cloud API | Local deployment |
|---|---|---|
| Data privacy | Data is sent to the cloud | Data stays fully local |
| Cost | Pay per use | Hardware plus electricity |
| Latency | Network latency | Fast local processing |
| Customizability | Limited | Full control |

Q4: How should I handle API errors?

Answer

php
<?php
function handleLlamaError(Exception $e): string
{
    $message = $e->getMessage();

    if (strpos($message, '401') !== false) {
        return 'Invalid API key';
    }
    if (strpos($message, '429') !== false) {
        return 'Too many requests';
    }
    if (strpos($message, '500') !== false) {
        return 'Server error';
    }

    return 'Service temporarily unavailable';
}
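For 429 responses, the usual remedy is to retry with exponential backoff rather than just reporting the error. A minimal delay schedule (a sketch; `backoffDelays` is a hypothetical helper, and the base and cap should be tuned to your provider's rate limits):

```php
<?php
// Exponential backoff schedule: 1s, 2s, 4s, ..., capped at $maxSeconds.
// Call sleep() on each value between retry attempts.
function backoffDelays(int $attempts, int $maxSeconds = 30): array
{
    $delays = [];
    for ($i = 0; $i < $attempts; $i++) {
        $delays[] = min(2 ** $i, $maxSeconds);
    }
    return $delays;
}
```

Adding a little random jitter to each delay also helps avoid many clients retrying in lockstep.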

Q5: How do I speed up inference?

Answer

php
<?php
// 1. Use a quantized model
// ollama pull llama3.3:70b-q4_K_M

// 2. Reduce max_tokens
$result = $client->chat($messages, $model, ['max_tokens' => 500]);

// 3. Use a smaller model
$model = 'meta-llama/Llama-3.2-3B-Instruct';

// 4. Process prompts in batches (sequentially here)
function batchProcess(LlamaClient $client, array $prompts): array
{
    $results = [];
    foreach ($prompts as $key => $prompt) {
        $results[$key] = $client->chat([['role' => 'user', 'content' => $prompt]]);
    }
    return $results;
}

Q6: How do I fine-tune the model?

Answer

bash
# Fine-tune with Hugging Face PEFT
pip install peft transformers accelerate

# Example fine-tuning command (finetune.py stands in for your own training script)
python finetune.py \
    --model_name meta-llama/Llama-3.2-3B \
    --dataset your_dataset.json \
    --output_dir ./finetuned_model

Hands-On Exercises

Basic Exercise

Exercise 1: Write a simple Llama chat program.

Reference solution

php
<?php
$apiKey = getenv('TOGETHER_API_KEY');
$client = new LlamaClient($apiKey, 'together');

echo "Llama chat assistant (type 'quit' to exit)\n";

while (true) {
    echo "\nYou: ";
    $input = trim(fgets(STDIN));

    if ($input === 'quit') {
        break;
    }

    $result = $client->chat([['role' => 'user', 'content' => $input]]);
    echo "Llama: " . $result['choices'][0]['message']['content'] . "\n";
}

Intermediate Exercise

Exercise 2: Build a local document Q&A system.

Reference solution

php
<?php
class LocalDocumentQA
{
    private OllamaClient $client;
    private string $document;

    public function __construct(OllamaClient $client)
    {
        $this->client = $client;
    }

    public function loadDocument(string $filePath): void
    {
        $this->document = file_get_contents($filePath);
    }

    public function ask(string $question): string
    {
        $prompt = "Document:\n{$this->document}\n\nQuestion: {$question}";
        return $this->client->chat('llama3.2:3b', $prompt);
    }
}

Challenge Exercise

Exercise 3: Build an intelligent code assistant.

Reference solution

php
<?php
class IntelligentCodeAssistant
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function generate(string $description, string $language = 'PHP'): string
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a professional programmer. Generate concise, efficient code.'
            ],
            [
                'role' => 'user',
                'content' => "Generate {$language} code for: {$description}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function explain(string $code): string
    {
        $result = $this->client->chat([
            ['role' => 'user', 'content' => "Explain the following code:\n```\n{$code}\n```"]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function debug(string $code, string $error): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "The following code fails; please fix it:\nCode:\n```\n{$code}\n```\nError: {$error}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function review(string $code): array
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Review the following code and suggest improvements:\n```\n{$code}\n```"
            ]
        ]);

        return ['review' => $result['choices'][0]['message']['content']];
    }
}

Key Takeaways

Core Points

  1. Open source: model weights are fully open and free to use
  2. Local deployment: run on your own servers, keeping data secure
  3. Multiple deployment options: cloud API, local deployment, private cloud
  4. Model selection: choose a model size that fits your needs
  5. Community support: backed by a large open-source community

Pitfalls Recap

| Pitfall | Correct approach |
|---|---|
| Wrong model name | Use the full model name |
| Insufficient resources | Pick a model that matches your hardware |
| Ignoring the context limit | Check and cap the context length |
| Skipping quantization | Use quantized models to reduce resource usage |

Further Reading

Official documentation

Learning path

  1. This topic → Llama API basics
  2. Next: DeepSeek API
  3. Intermediate: error handling and retries
  4. Advanced: security and authentication

💡 Remember: Llama's open-source nature makes it ideal for private deployments and data-sensitive scenarios; local deployment lets you build fully controlled AI applications.