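Generation and Prompt Engineering

Once high-quality document chunks have been retrieved, the final step is to have the LLM answer the user's question based on them.

1. Building the Prompt Template

Go's text/template standard library package is well suited to building dynamic prompts.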

package main

import (
	"bytes"
	"fmt"
	"text/template"
)

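// The {{- ... }} trim markers keep the rendered context list free of blank lines.
const ragTemplate = `
You are a helpful assistant. Answer the user's question using the background information below.
If you do not know the answer, say so directly and do not make one up.

[Background]:
{{- range .Contexts }}
- {{ . }}
{{- end }}

[Question]: {{ .Query }}

[Answer]:
`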

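// ragTmpl is parsed once at startup; template.Must panics if ragTemplate is malformed.
var ragTmpl = template.Must(template.New("rag").Parse(ragTemplate))

// BuildPrompt renders the RAG prompt from the user query and the retrieved context snippets.
func BuildPrompt(query string, contexts []string) (string, error) {
	var buf bytes.Buffer
	data := map[string]interface{}{
		"Query":    query,
		"Contexts": contexts,
	}

	if err := ragTmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render prompt: %w", err)
	}
	return buf.String(), nil
}

func main() {
	ctx := []string{
		"Golang was released in 2009.",
		"Golang's authors include Rob Pike.",
	}
	prompt, err := BuildPrompt("When was Go released?", ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt)
}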

2. Calling the LLM to Generate

Send the prompt with langchaingo or go-openai.

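2.1 Streaming Output

A RAG system typically has a few seconds of latency (retrieval plus generation), so streaming the output is essential for a good user experience.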

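package main

import (
	"context"
	"fmt"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
)

// ChatStream sends the prompt to the model and prints the answer as it arrives.
func ChatStream(prompt string) error {
	ctx := context.Background()

	// openai.New reads the API key from the OPENAI_API_KEY environment variable by default.
	llm, err := openai.New()
	if err != nil {
		return fmt.Errorf("create OpenAI client: %w", err)
	}

	// Register a callback that handles each streamed chunk.
	_, err = llm.Call(ctx, prompt,
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			fmt.Print(string(chunk)) // print in real time
			return nil
		}),
	)
	if err != nil {
		return fmt.Errorf("generate: %w", err)
	}
	return nil
}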

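3. Managing Chat History

Multi-turn RAG is more complex than single-turn RAG. A follow-up question often depends on earlier turns, so the current question must first be combined with the chat history and rewritten into a standalone question (Standalone Query), which is then used for retrieval.

Flow:

User input: "What features does it have?" (history: "User: Tell me about Golang")
LLM rewrite: "What features does Golang have?"
Retrieval: search the vector store with the rewritten question.
Generation: answer the user.

Rewrite logic in Go: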

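const rewriteTemplate = `
Given the conversation history and the follow-up question below, rewrite the follow-up question as a standalone search query.

Conversation history:
{{ .History }}

Follow-up question: {{ .Question }}

Standalone query:
`
// Call the LLM once to perform the rewrite, then feed the result into the regular RAG flow.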

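Summary

The key points of the generation stage are:

Prompt quality: clear instructions that limit hallucination.
Streaming responses: reduce perceived latency.
Multi-turn conversation: introduce a query rewrite step (Query Rewrite).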
