
The Complete Guide to the Meta Llama API

Llama is Meta's open-source large language model; its openness and strong performance have made it a benchmark for open-source AI

Overview

Llama (Large Language Model Meta AI) is Meta's family of open-source large language models. Its openness, strong performance, and support for local deployment have made it a benchmark in the open-source AI community. This tutorial walks you through using the Llama API end to end.

Why Choose Llama?

| Advantage | Description |
|---|---|
| Fully open source | Model weights are completely open and free to use |
| Local deployment | Can run on your own servers, keeping data secure |
| High performance | Performance rivals closed-source models |
| Active community | Backed by a large open-source community |

Llama Model Overview

The Llama model family:

Llama 3.3 series (latest)
├── Llama-3.3-70B-Instruct    # 70B parameters, strong performance
└── Llama-3.3-70B             # base version

Llama 3.2 series (multimodal)
├── Llama-3.2-90B-Vision      # large multimodal model
├── Llama-3.2-11B-Vision      # mid-size multimodal model
├── Llama-3.2-3B-Instruct     # lightweight model
└── Llama-3.2-1B-Instruct     # smallest model

Llama 3.1 series (classic)
├── Llama-3.1-405B-Instruct   # largest open-source model
├── Llama-3.1-70B-Instruct    # balanced choice
└── Llama-3.1-8B-Instruct     # lightweight model

Basic Concepts

Getting Model Access

php
<?php
// 1. Apply for access on Meta's official site
// https://llama.meta.com/llama-downloads/

// 2. Request access on Hugging Face
// https://huggingface.co/meta-llama

// 3. Use a third-party API service (e.g. Together AI, Groq)
$apiKey = getenv('LLAMA_API_KEY');
$baseUrl = 'https://api.together.xyz/v1'; // Together AI

Choosing a Deployment Method

Llama deployment options:

1. Cloud API services
   ├── Together AI
   ├── Groq
   ├── Fireworks AI
   └── Replicate

2. Local deployment
   ├── Ollama
   ├── vLLM
   ├── llama.cpp
   └── Hugging Face Transformers

3. Private cloud deployment
   ├── AWS Bedrock
   ├── Azure AI
   └── Google Cloud

Environment Setup

Using the Together AI API

php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

class LlamaClient
{
    private $client;
    private $apiKey;
    private $baseUrl;

    public function __construct(string $apiKey, string $provider = 'together')
    {
        $this->apiKey = $apiKey;

        $providers = [
            'together' => 'https://api.together.xyz/v1',
            'groq' => 'https://api.groq.com/openai/v1',
            'fireworks' => 'https://api.fireworks.ai/inference/v1',
        ];

        $this->baseUrl = $providers[$provider] ?? $providers['together'];

        $this->client = new Client([
            'base_uri' => $this->baseUrl,
            'timeout' => 120,
            'headers' => [
                'Authorization' => 'Bearer ' . $this->apiKey,
                'Content-Type' => 'application/json',
            ],
        ]);
    }

    public function chat(
        array $messages,
        string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
        array $options = []
    ): array {
        $params = [
            'model' => $model,
            'messages' => $messages,
        ];

        if (isset($options['temperature'])) {
            $params['temperature'] = $options['temperature'];
        }

        if (isset($options['max_tokens'])) {
            $params['max_tokens'] = $options['max_tokens'];
        }

        if (isset($options['top_p'])) {
            $params['top_p'] = $options['top_p'];
        }

        try {
            // Build the full URL: with a leading "/", Guzzle drops the "/v1" path from base_uri
            $response = $this->client->post($this->baseUrl . '/chat/completions', [
                'json' => $params,
            ]);

            return json_decode($response->getBody(), true);
        } catch (RequestException $e) {
            $errorBody = $e->getResponse() ? $e->getResponse()->getBody()->getContents() : 'Unknown error';
            throw new Exception('Llama API Error: ' . $errorBody);
        }
    }
}

// Usage example
$apiKey = getenv('TOGETHER_API_KEY');
$client = new LlamaClient($apiKey, 'together');

$result = $client->chat([
    ['role' => 'user', 'content' => 'Introduce the PHP language in one sentence']
]);

echo $result['choices'][0]['message']['content'];

Sample output:

PHP is an open-source, server-side scripting language that is particularly well suited to web development and can be embedded directly in HTML.

Deploying Llama Locally

Deploying with Ollama

bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull Llama models
ollama pull llama3.3:70b
ollama pull llama3.2:3b

# Start the server
ollama serve

Calling Local Ollama from PHP

php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

class OllamaClient
{
    private $client;
    private $baseUrl;

    public function __construct(string $baseUrl = 'http://localhost:11434')
    {
        $this->baseUrl = $baseUrl;
        $this->client = new Client([
            'base_uri' => $this->baseUrl,
            'timeout' => 120,
        ]);
    }

    public function chat(string $model, string $prompt): string
    {
        $response = $this->client->post('/api/generate', [
            'json' => [
                'model' => $model,
                'prompt' => $prompt,
                'stream' => false,
            ],
        ]);

        $result = json_decode($response->getBody(), true);
        return $result['response'];
    }

    public function chatWithHistory(string $model, array $messages): string
    {
        $response = $this->client->post('/api/chat', [
            'json' => [
                'model' => $model,
                'messages' => $messages,
                'stream' => false,
            ],
        ]);

        $result = json_decode($response->getBody(), true);
        return $result['message']['content'];
    }

    public function streamChat(string $model, string $prompt): Generator
    {
        $response = $this->client->post('/api/generate', [
            'json' => [
                'model' => $model,
                'prompt' => $prompt,
                'stream' => true,
            ],
            'stream' => true,
        ]);

        $body = $response->getBody();
        $buffer = '';

        while (!$body->eof()) {
            // Buffer chunks so a JSON line split across two reads is not lost
            $buffer .= $body->read(1024);

            while (($pos = strpos($buffer, "\n")) !== false) {
                $line = trim(substr($buffer, 0, $pos));
                $buffer = substr($buffer, $pos + 1);

                if ($line === '') {
                    continue;
                }

                $data = json_decode($line, true);
                if (isset($data['response'])) {
                    yield $data['response'];
                }
            }
        }

        // Flush any trailing data that arrived without a final newline
        $line = trim($buffer);
        if ($line !== '') {
            $data = json_decode($line, true);
            if (isset($data['response'])) {
                yield $data['response'];
            }
        }
    }

    public function listModels(): array
    {
        $response = $this->client->get('/api/tags');
        $result = json_decode($response->getBody(), true);
        return $result['models'] ?? [];
    }
}

// Usage example
$ollama = new OllamaClient();

// List available models
$models = $ollama->listModels();
print_r($models);

// Simple chat
$response = $ollama->chat('llama3.2:3b', 'What is PHP?');
echo $response;

// Conversation with history
$messages = [
    ['role' => 'user', 'content' => 'Hello'],
    ['role' => 'assistant', 'content' => 'Hello! How can I help you?'],
    ['role' => 'user', 'content' => 'Please introduce PHP'],
];
];
$response = $ollama->chatWithHistory('llama3.2:3b', $messages);
echo $response;

Advanced Parameter Configuration

Full Parameter Example

php
<?php
class LlamaClient
{
    // ... code from earlier ...

    public function chatAdvanced(
        array $messages,
        string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
        array $options = []
    ): array {
        $params = [
            'model' => $model,
            'messages' => $messages,
        ];

        $optionalParams = [
            'temperature',
            'max_tokens',
            'top_p',
            'top_k',
            'repetition_penalty',
            'stop',
            'seed',
        ];

        foreach ($optionalParams as $param) {
            if (isset($options[$param])) {
                $params[$param] = $options[$param];
            }
        }

        try {
            // Build the full URL: with a leading "/", Guzzle drops the "/v1" path from base_uri
            $response = $this->client->post($this->baseUrl . '/chat/completions', [
                'json' => $params,
            ]);

            return json_decode($response->getBody(), true);
        } catch (RequestException $e) {
            $errorBody = $e->getResponse() ? $e->getResponse()->getBody()->getContents() : 'Unknown error';
            throw new Exception('Llama API Error: ' . $errorBody);
        }
    }
}

// Usage example
$result = $client->chatAdvanced(
    [['role' => 'user', 'content' => 'Write a poem about programmers']],
    'meta-llama/Llama-3.3-70B-Instruct-Turbo',
    [
        'temperature' => 0.8,
        'max_tokens' => 500,
        'top_p' => 0.9,
        'repetition_penalty' => 1.1,
    ]
);

Parameter Reference

| Parameter | Range | Default | Description |
|---|---|---|---|
| temperature | 0-2 | 0.7 | Controls randomness |
| max_tokens | 1 to model limit | unlimited | Maximum output tokens |
| top_p | 0-1 | 0.9 | Nucleus sampling parameter |
| top_k | 1-100 | 40 | Consider only the top K candidate tokens |
| repetition_penalty | 1-2 | 1.0 | Penalty for repetition |
| stop | string or array | - | Stop sequences |
| seed | integer | - | Random seed |
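When these options come from user input, it helps to validate them against the ranges above before sending a request. A minimal sketch (`validateLlamaOptions` is a hypothetical helper, not part of any provider SDK; the exact bounds may differ per provider):

```php
<?php
// Hypothetical helper: check sampling options against the ranges in the
// table above and return a list of human-readable validation errors.
function validateLlamaOptions(array $options): array
{
    $errors = [];

    $ranges = [
        'temperature'        => [0.0, 2.0],
        'top_p'              => [0.0, 1.0],
        'top_k'              => [1, 100],
        'repetition_penalty' => [1.0, 2.0],
    ];

    foreach ($ranges as $name => [$min, $max]) {
        if (isset($options[$name]) && ($options[$name] < $min || $options[$name] > $max)) {
            $errors[] = "{$name} must be between {$min} and {$max}";
        }
    }

    if (isset($options['max_tokens']) && $options['max_tokens'] < 1) {
        $errors[] = 'max_tokens must be at least 1';
    }

    return $errors;
}
```

Rejecting bad values locally gives clearer messages than waiting for a 400 response from the API.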

Handling Streaming Responses

php
<?php
class LlamaClient
{
    // ... code from earlier ...

    public function chatStream(
        array $messages,
        string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo'
    ): Generator {
        // Build the full URL: with a leading "/", Guzzle drops the "/v1" path from base_uri
        $response = $this->client->post($this->baseUrl . '/chat/completions', [
            'json' => [
                'model' => $model,
                'messages' => $messages,
                'stream' => true,
            ],
            'stream' => true,
        ]);

        $body = $response->getBody();
        $buffer = '';

        while (!$body->eof()) {
            $chunk = $body->read(1024);
            $buffer .= $chunk;

            while (($pos = strpos($buffer, "\n")) !== false) {
                $line = substr($buffer, 0, $pos);
                $buffer = substr($buffer, $pos + 1);

                $line = trim($line);
                if (empty($line) || $line === 'data: [DONE]') {
                    continue;
                }

                if (strpos($line, 'data: ') === 0) {
                    $json = substr($line, 6);
                    $data = json_decode($json, true);

                    if (isset($data['choices'][0]['delta']['content'])) {
                        yield $data['choices'][0]['delta']['content'];
                    }
                }
            }
        }
    }
}

// Usage example
echo "Llama: ";
foreach ($client->chatStream([['role' => 'user', 'content' => 'Tell me a programmer joke']]) as $chunk) {
    echo $chunk;
    flush();
}
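The per-line handling inside chatStream can be factored into a pure helper, which makes it possible to unit-test the SSE parsing without a live connection. A sketch (`extractSseContent` is a hypothetical name):

```php
<?php
// Parse one SSE line from a chat-completions stream and return the text
// delta, or null for blanks, the [DONE] sentinel, and non-data lines.
function extractSseContent(string $line): ?string
{
    $line = trim($line);
    if ($line === '' || $line === 'data: [DONE]') {
        return null;
    }
    if (strpos($line, 'data: ') !== 0) {
        return null;
    }

    $data = json_decode(substr($line, 6), true);
    return $data['choices'][0]['delta']['content'] ?? null;
}
```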

Multi-Turn Conversations

php
<?php
class LlamaChatSession
{
    private LlamaClient $client;
    private array $messages = [];
    private string $model;

    public function __construct(LlamaClient $client, string $model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo')
    {
        $this->client = $client;
        $this->model = $model;
    }

    public function setSystemPrompt(string $prompt): void
    {
        $this->messages = [
            ['role' => 'system', 'content' => $prompt]
        ];
    }

    public function chat(string $userMessage): string
    {
        $this->messages[] = ['role' => 'user', 'content' => $userMessage];

        $response = $this->client->chat($this->messages, $this->model);
        $assistantMessage = $response['choices'][0]['message']['content'];

        $this->messages[] = ['role' => 'assistant', 'content' => $assistantMessage];

        return $assistantMessage;
    }

    public function getHistory(): array
    {
        return $this->messages;
    }

    public function clearHistory(): void
    {
        $systemMessage = $this->messages[0] ?? null;
        $this->messages = $systemMessage ? [$systemMessage] : [];
    }
}

// Usage example
$session = new LlamaChatSession($client);
$session->setSystemPrompt('You are a professional PHP engineer. Answer questions concisely.');

echo "User: What are the advantages of PHP?\n";
echo "Llama: " . $session->chat('What are the advantages of PHP?') . "\n";

Common Errors and Pitfalls

Mistake 1: Wrong Model Name

php
<?php
// ❌ Wrong: an incorrect model name
$result = $client->chat($messages, 'llama-3');

// ✅ Right: use the full model name
$result = $client->chat($messages, 'meta-llama/Llama-3.3-70B-Instruct-Turbo');

Mistake 2: Insufficient Resources for Local Deployment

php
<?php
// ❌ Wrong: deploying a large model on under-resourced hardware
// The 70B model needs roughly 140GB of VRAM

// ✅ Right: pick a model that matches your hardware
// 8B model: ~16GB VRAM
// 70B model: ~140GB VRAM (or use a quantized version)

// Use a quantized model to reduce resource usage (run in a shell, not PHP):
// ollama pull llama3.3:70b-instruct-q4_K_M

Mistake 3: Ignoring the Context Limit

php
<?php
// ❌ Wrong: sending an overly long context
$longText = file_get_contents('large_file.txt');
$messages = [['role' => 'user', 'content' => $longText]];

// ✅ Right: check and cap the context length
function truncateText(string $text, int $maxTokens = 8000): string
{
    $approxChars = $maxTokens * 4; // rough rule of thumb: ~4 characters per token
    if (strlen($text) > $approxChars) {
        return substr($text, 0, $approxChars) . '...';
    }
    return $text;
}
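The four-characters-per-token rule of thumb used above is very rough, especially for Chinese text, where a single character is often one token or more. A slightly finer heuristic (still only an estimate; use the model's real tokenizer when accuracy matters):

```php
<?php
// Rough token estimate: count CJK characters as ~1 token each, and other
// characters as ~1 token per 4 characters. Purely heuristic.
function estimateTokens(string $text): int
{
    // Count characters in the main CJK Unified Ideographs block
    $cjk = preg_match_all('/[\x{4e00}-\x{9fff}]/u', $text);
    $otherChars = mb_strlen($text, 'UTF-8') - $cjk;

    return $cjk + (int) ceil($otherChars / 4);
}
```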

Common Use Cases

Use Case 1: A Local, Private AI Assistant

php
<?php
class PrivateAIAssistant
{
    private OllamaClient $client;
    private string $model;

    public function __construct(OllamaClient $client, string $model = 'llama3.2:3b')
    {
        $this->client = $client;
        $this->model = $model;
    }

    public function ask(string $question, string $context = ''): string
    {
        $prompt = $context
            ? "Background:\n{$context}\n\nQuestion: {$question}"
            : $question;

        return $this->client->chat($this->model, $prompt);
    }

    public function analyzeDocument(string $document): string
    {
        $prompt = "Please analyze the following document:\n\n{$document}";
        return $this->client->chat($this->model, $prompt);
    }
}

Use Case 2: Code Assistant

php
<?php
class LlamaCodeAssistant
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function generateCode(string $description, string $language = 'PHP'): string
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a professional programmer. Generate concise, efficient code.'
            ],
            [
                'role' => 'user',
                'content' => "Generate {$language} code for: {$description}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function explainCode(string $code): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Explain what the following code does:\n```\n{$code}\n```"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Use Case 3: Document Q&A System

php
<?php
class DocumentQA
{
    private LlamaClient $client;
    private string $document;

    public function __construct(LlamaClient $client, string $document)
    {
        $this->client = $client;
        $this->document = $document;
    }

    public function ask(string $question): string
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a document Q&A assistant. Answer questions based on the provided document.'
            ],
            [
                'role' => 'user',
                'content' => "Document:\n{$this->document}\n\nQuestion: {$question}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Use Case 4: Content Generation

php
<?php
class ContentGenerator
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function generateArticle(string $topic, int $wordCount = 800): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Write an article about {$topic}, roughly {$wordCount} words long."
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function generateSummary(string $content, int $maxLength = 200): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Summarize the following content in no more than {$maxLength} characters:\n\n{$content}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Use Case 5: Multilingual Translation

php
<?php
class TranslationService
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function translate(string $text, string $from, string $to): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Translate the following text from {$from} to {$to}:\n\n{$text}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }
}

Enterprise-Grade Use Cases

Use Case 1: A Private Knowledge Base Q&A System

php
<?php
class PrivateKnowledgeBase
{
    private OllamaClient $client;
    private array $documents = [];
    private string $model;

    public function __construct(OllamaClient $client, string $model = 'llama3.2:3b')
    {
        $this->client = $client;
        $this->model = $model;
    }

    public function addDocument(string $id, string $content): void
    {
        $this->documents[$id] = $content;
    }

    public function query(string $question): string
    {
        $context = implode("\n\n", $this->documents);

        $prompt = <<<PROMPT
Here are some documents:

{$context}

Based on the content above, answer this question: {$question}

If the documents contain no relevant information, say so honestly.
PROMPT;

        return $this->client->chat($this->model, $prompt);
    }
}
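Concatenating every document into one prompt, as query() does, quickly exhausts the context window as the knowledge base grows. A naive relevance filter based on keyword overlap can select only the most relevant documents first (a sketch under that assumption; a production system would use embeddings and a vector index, and `selectRelevantDocuments` is a hypothetical helper, not part of the class above):

```php
<?php
// Score each document by how many words of the question it contains,
// then keep the IDs of the $k best-scoring documents.
function selectRelevantDocuments(array $documents, string $question, int $k = 3): array
{
    $words = preg_split('/\s+/', mb_strtolower($question), -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($documents as $id => $content) {
        $haystack = mb_strtolower($content);
        $score = 0;
        foreach ($words as $word) {
            if (mb_strpos($haystack, $word) !== false) {
                $score++;
            }
        }
        $scores[$id] = $score;
    }

    arsort($scores); // highest score first
    return array_slice(array_keys($scores), 0, $k);
}
```

query() could then build its context from only the selected documents instead of all of them.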

Use Case 2: A Code Review System

php
<?php
class CodeReviewSystem
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function review(string $code, string $language = 'PHP'): array
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a senior code reviewer. Review the code along three dimensions: quality, security, and performance.'
            ],
            [
                'role' => 'user',
                'content' => "Review the following {$language} code:\n```\n{$code}\n```\n\nReturn the results as JSON."
            ]
        ]);

        // Note: the model may wrap the JSON in markdown fences or return invalid
        // JSON; validate the decoded result (it can be null) before using it
        return json_decode($result['choices'][0]['message']['content'], true);
    }
}

FAQ

Q1: Which Llama model should I choose?

Answer

| Scenario | Recommended model | Why |
|---|---|---|
| Local development and testing | Llama 3.2 3B | Low resource requirements |
| Production | Llama 3.3 70B | Excellent performance |
| Edge devices | Llama 3.2 1B | Smallest model |
| Multimodal workloads | Llama 3.2 Vision | Supports images |

Q2: What hardware does local deployment require?

Answer

| Model | Parameters | VRAM (FP16) | VRAM (4-bit quantized) |
|---|---|---|---|
| Llama 3.2 1B | 1B | ~2GB | ~1GB |
| Llama 3.2 3B | 3B | ~6GB | ~2GB |
| Llama 3.1 8B | 8B | ~16GB | ~6GB |
| Llama 3.3 70B | 70B | ~140GB | ~40GB |
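Based on the 4-bit figures in the table above, a small helper can suggest the largest model that fits in available VRAM (a hypothetical sketch; the thresholds are the table's rough estimates, not guarantees):

```php
<?php
// Suggest the largest 4-bit-quantized model that fits the given VRAM,
// using the approximate requirements from the table above.
function suggestModel(float $vramGb): ?string
{
    $models = [ // model => approximate 4-bit VRAM requirement in GB
        'Llama 3.3 70B' => 40,
        'Llama 3.1 8B'  => 6,
        'Llama 3.2 3B'  => 2,
        'Llama 3.2 1B'  => 1,
    ];

    foreach ($models as $model => $required) {
        if ($vramGb >= $required) {
            return $model;
        }
    }

    return null; // not enough VRAM even for the smallest model
}
```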

Q3: Cloud API or local deployment?

Answer

| Factor | Cloud API | Local deployment |
|---|---|---|
| Data privacy | Data is sent to the cloud | Data stays fully local |
| Cost | Pay per use | Hardware plus electricity |
| Latency | Network latency | Fast local processing |
| Customizability | Limited | Full control |

Q4: How should I handle API errors?

Answer

php
<?php
function handleLlamaError(Exception $e): string
{
    $message = $e->getMessage();

    if (strpos($message, '401') !== false) {
        return 'Invalid API key';
    }
    if (strpos($message, '429') !== false) {
        return 'Too many requests';
    }
    if (strpos($message, '500') !== false) {
        return 'Server error';
    }

    return 'Service temporarily unavailable';
}
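For 429 responses, the usual remedy is to retry with exponential backoff rather than just reporting the error. A minimal delay schedule (a sketch; `backoffDelays` is a hypothetical helper, and the base and cap should be tuned to your provider's rate limits):

```php
<?php
// Exponential backoff schedule: 1s, 2s, 4s, ..., capped at $maxSeconds.
// Call sleep() on each value between retry attempts.
function backoffDelays(int $attempts, int $maxSeconds = 30): array
{
    $delays = [];
    for ($i = 0; $i < $attempts; $i++) {
        $delays[] = min(2 ** $i, $maxSeconds);
    }
    return $delays;
}
```

Adding a little random jitter to each delay also helps avoid many clients retrying in lockstep.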

Q5: How do I speed up inference?

Answer

php
<?php
// 1. Use a quantized model
// ollama pull llama3.3:70b-q4_K_M

// 2. Reduce max_tokens
$result = $client->chat($messages, $model, ['max_tokens' => 500]);

// 3. Use a smaller model
$model = 'meta-llama/Llama-3.2-3B-Instruct';

// 4. Process prompts in batches (sequentially here)
function batchProcess(LlamaClient $client, array $prompts): array
{
    $results = [];
    foreach ($prompts as $key => $prompt) {
        $results[$key] = $client->chat([['role' => 'user', 'content' => $prompt]]);
    }
    return $results;
}

Q6: How do I fine-tune the model?

Answer

bash
# Fine-tune with Hugging Face PEFT
pip install peft transformers accelerate

# Example fine-tuning command (finetune.py stands in for your own training script)
python finetune.py \
    --model_name meta-llama/Llama-3.2-3B \
    --dataset your_dataset.json \
    --output_dir ./finetuned_model

Hands-On Exercises

Basic Exercise

Exercise 1: Write a simple Llama chat program.

Reference solution

php
<?php
$apiKey = getenv('TOGETHER_API_KEY');
$client = new LlamaClient($apiKey, 'together');

echo "Llama chat assistant (type 'quit' to exit)\n";

while (true) {
    echo "\nYou: ";
    $input = trim(fgets(STDIN));

    if ($input === 'quit') {
        break;
    }

    $result = $client->chat([['role' => 'user', 'content' => $input]]);
    echo "Llama: " . $result['choices'][0]['message']['content'] . "\n";
}

Intermediate Exercise

Exercise 2: Build a local document Q&A system.

Reference solution

php
<?php
class LocalDocumentQA
{
    private OllamaClient $client;
    private string $document;

    public function __construct(OllamaClient $client)
    {
        $this->client = $client;
    }

    public function loadDocument(string $filePath): void
    {
        $this->document = file_get_contents($filePath);
    }

    public function ask(string $question): string
    {
        $prompt = "Document:\n{$this->document}\n\nQuestion: {$question}";
        return $this->client->chat('llama3.2:3b', $prompt);
    }
}

Challenge Exercise

Exercise 3: Build an intelligent code assistant.

Reference solution

php
<?php
class IntelligentCodeAssistant
{
    private LlamaClient $client;

    public function __construct(LlamaClient $client)
    {
        $this->client = $client;
    }

    public function generate(string $description, string $language = 'PHP'): string
    {
        $result = $this->client->chat([
            [
                'role' => 'system',
                'content' => 'You are a professional programmer. Generate concise, efficient code.'
            ],
            [
                'role' => 'user',
                'content' => "Generate {$language} code for: {$description}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function explain(string $code): string
    {
        $result = $this->client->chat([
            ['role' => 'user', 'content' => "Explain the following code:\n```\n{$code}\n```"]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function debug(string $code, string $error): string
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "The following code fails; please fix it:\nCode:\n```\n{$code}\n```\nError: {$error}"
            ]
        ]);

        return $result['choices'][0]['message']['content'];
    }

    public function review(string $code): array
    {
        $result = $this->client->chat([
            [
                'role' => 'user',
                'content' => "Review the following code and suggest improvements:\n```\n{$code}\n```"
            ]
        ]);

        return ['review' => $result['choices'][0]['message']['content']];
    }
}

Key Takeaways

Core Points

  1. Open source: model weights are fully open and free to use
  2. Local deployment: run on your own servers, keeping data secure
  3. Multiple deployment options: cloud API, local deployment, private cloud
  4. Model selection: choose a model size that fits your needs
  5. Community support: backed by a large open-source community

Pitfalls Recap

| Pitfall | Correct approach |
|---|---|
| Wrong model name | Use the full model name |
| Insufficient resources | Pick a model that matches your hardware |
| Ignoring the context limit | Check and cap the context length |
| Skipping quantization | Use quantized models to reduce resource usage |

Further Reading

Official documentation

Learning path

  1. This topic → Llama API basics
  2. Next: DeepSeek API
  3. Intermediate: error handling and retries
  4. Advanced: security and authentication

💡 Remember: Llama's open-source nature makes it ideal for private deployments and data-sensitive scenarios; local deployment lets you build fully controlled AI applications.