{ "cells": [ { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 2008, "status": "ok", "timestamp": 1710126277679, "user": { "displayName": "Yoshihisa Nitta", "userId": "15888006800030996813" }, "user_tz": -540 }, "id": "UnSA6Jvg4Nhk", "outputId": "d3ae32e6-c9ab-4abd-adde-2b2a6fe81edf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ], "source": [ "# by nitta\n", "import os\n", "is_colab = 'google.colab' in str(get_ipython()) # for Google Colab\n", "\n", "if is_colab:\n", " from google.colab import drive\n", " drive.mount('/content/drive')\n", " SAVE_PATH='/content/drive/MyDrive/DeepLearning/llm_book2'\n", "else:\n", " SAVE_PATH='.'\n", "\n", "# # variable definition in 'account.py' to access HuggingFace Hub\n", "# MyAccount = {\n", "# 'HuggingFace': {\n", "# 'user': 'YourUserName',\n", "# 'token': 'YourTokenWithWritePermission'\n", "# },\n", "# 'OpenAI': {\n", "# 'api-key': 'YourOpenAI_API_Key'\n", "# }\n", "# }\n", "ACCOUNT_FILE = os.path.join(SAVE_PATH, 'account.py')\n", "%run {ACCOUNT_FILE} # set 'MyAccount' variable" ] }, { "cell_type": "markdown", "metadata": { "id": "FN_ayuoyQ8wp" }, "source": [ "# LangChain\n", "\n", "
\n", "export LANGCHAIN_TRACING_V2=\"true\"\n", "export LANGCHAIN_API_KEY=\"...\"\n", "" ] }, { "cell_type": "markdown", "metadata": { "id": "H_QV_U0QljpQ" }, "source": [ "# LangChain を用いた構築\n", "\n", "LangChain を使うと、LLMを外部のデータソースや計算に接続するアプリケーションを構築できる。\n", "このクィックスタートでは、そのためのいくつかの方法を示す。\n", "返答のためのプロンプト・テンプレートのみに依存する単純な LLM chain から始める。\n", "次に、別のデータベースからデータを取得し、それをプロンプト・テンプレートに渡す chain を構築する。\n", "それから、チャット履歴を追加して、会話検索 chain を構築する。\n", "これにより 前の質問を記憶されているので、LLM とチャット形式で対話できる。\n", "最後に、質問に答えるためにデータを取得する必要があるかどうかを決定するために LLM を使う agent を構築する。\n", "大まかに説明するが、それぞれには詳細がある。\n", "関連するリンクを貼っておくので必要ならば参照してほしい。" ] }, { "cell_type": "markdown", "metadata": { "id": "-KmFir53n604" }, "source": [ "# LLM Chain: OpenAI" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 13607, "status": "ok", "timestamp": 1710126304539, "user": { "displayName": "Yoshihisa Nitta", "userId": "15888006800030996813" }, "user_tz": -540 }, "id": "JOpWamKDoBn0", "outputId": "f56e75bc-ccb7-4c37-e167-c6370cd69a32" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting langchain-openai\n", " Downloading langchain_openai-0.0.8-py3-none-any.whl (32 kB)\n", "Requirement already satisfied: langchain-core<0.2.0,>=0.1.27 in /usr/local/lib/python3.10/dist-packages (from langchain-openai) (0.1.30)\n", "Collecting openai<2.0.0,>=1.10.0 (from langchain-openai)\n", " Downloading openai-1.13.3-py3-none-any.whl (227 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m227.4/227.4 kB\u001b[0m \u001b[31m1.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting tiktoken<1,>=0.5.2 (from langchain-openai)\n", " Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.8/1.8 MB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (6.0.1)\n", "Requirement already satisfied: anyio<5,>=3 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (3.7.1)\n", "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (1.33)\n", "Requirement already satisfied: langsmith<0.2.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (0.1.23)\n", "Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (23.2)\n", "Requirement already satisfied: pydantic<3,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (2.6.3)\n", "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (2.31.0)\n", "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.2.0,>=0.1.27->langchain-openai) (8.2.3)\n", "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai<2.0.0,>=1.10.0->langchain-openai) (1.7.0)\n", "Collecting httpx<1,>=0.23.0 (from openai<2.0.0,>=1.10.0->langchain-openai)\n", " Downloading 
httpx-0.27.0-py3-none-any.whl (75 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m2.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain-openai) (1.3.1)\n", "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain-openai) (4.66.2)\n", "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain-openai) (4.10.0)\n", "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken<1,>=0.5.2->langchain-openai) (2023.12.25)\n", "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3->langchain-core<0.2.0,>=0.1.27->langchain-openai) (3.6)\n", "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3->langchain-core<0.2.0,>=0.1.27->langchain-openai) (1.2.0)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain-openai) (2024.2.2)\n", "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain-openai)\n", " Downloading httpcore-1.0.4-py3-none-any.whl (77 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.8/77.8 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain-openai)\n", " Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.10/dist-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.2.0,>=0.1.27->langchain-openai) (2.4)\n", "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in /usr/local/lib/python3.10/dist-packages (from langsmith<0.2.0,>=0.1.0->langchain-core<0.2.0,>=0.1.27->langchain-openai) (3.9.15)\n", "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain-core<0.2.0,>=0.1.27->langchain-openai) (0.6.0)\n", "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain-core<0.2.0,>=0.1.27->langchain-openai) (2.16.3)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain-core<0.2.0,>=0.1.27->langchain-openai) (3.3.2)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain-core<0.2.0,>=0.1.27->langchain-openai) (2.0.7)\n", "Installing collected packages: h11, tiktoken, httpcore, httpx, openai, langchain-openai\n", "Successfully installed h11-0.14.0 httpcore-1.0.4 httpx-0.27.0 langchain-openai-0.0.8 openai-1.13.3 tiktoken-0.6.0\n" ] } ], "source": [ "#\n", "! 
pip install langchain-openai" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "executionInfo": { "elapsed": 13, "status": "ok", "timestamp": 1710126304539, "user": { "displayName": "Yoshihisa Nitta", "userId": "15888006800030996813" }, "user_tz": -540 }, "id": "831ZJQA24JCv" }, "outputs": [], "source": [ "import os\n", "os.environ[\"OPENAI_API_KEY\"] = MyAccount[\"OpenAI\"][\"api-key\"]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "executionInfo": { "elapsed": 1751, "status": "ok", "timestamp": 1710126306277, "user": { "displayName": "Yoshihisa Nitta", "userId": "15888006800030996813" }, "user_tz": -540 }, "id": "bzkQ8A4p8dnT" }, "outputs": [], "source": [ "from langchain_openai import ChatOpenAI\n", "\n", "llm = ChatOpenAI()\n", "# llm = ChatOpenAI(openai_api_key=MyAccount[\"OpenAI\"][\"api-key\"])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 1552, "status": "ok", "timestamp": 1710126307827, "user": { "displayName": "Yoshihisa Nitta", "userId": "15888006800030996813" }, "user_tz": -540 }, "id": "0H_zesRh8zA-", "outputId": "b8efa767-150b-49f8-a6aa-ab728135e0c0" }, "outputs": [ { "data": { "text/plain": [ "AIMessage(content='Langsmith can help with testing by providing automated testing capabilities, such as unit testing and integration testing. It can also help by generating test cases based on grammar and syntax rules, and by providing tools for analyzing test coverage and identifying potential issues in the code. Additionally, Langsmith can be used to simulate various scenarios and test edge cases, helping to ensure the reliability and quality of the software being developed.')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "llm.invoke(\"how can langsmith help with testing?\")" ] }, { "cell_type": "markdown", "metadata": { "id": "K8k6FTXurNKU" }, "source": [ "
We can still use the create_retrieval_chain function, but two things need to change.\n",
"\n",
"The new chain receives the latest question as input and the conversation history as chat_history,\n",
"and uses the LLM to generate a search query."
]
},
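{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells below assume that ChatPromptTemplate, create_stuff_documents_chain, create_retrieval_chain, and a retriever over the LangSmith user guide were already set up in earlier cells that are not shown in this excerpt. The next cell is a minimal sketch of that assumed setup, following the LangChain quickstart; it requires langchain, faiss-cpu, and beautifulsoup4 to be installed, and the variable names docs, documents, and vector are only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the setup assumed by the following cells (skip if these objects are already defined).\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain.chains.combine_documents import create_stuff_documents_chain\n",
"from langchain.chains import create_retrieval_chain\n",
"\n",
"# Index the LangSmith user guide in a FAISS vector store and expose it as a retriever.\n",
"docs = WebBaseLoader(\"https://docs.smith.langchain.com/user_guide\").load()\n",
"documents = RecursiveCharacterTextSplitter().split_documents(docs)\n",
"vector = FAISS.from_documents(documents, OpenAIEmbeddings())\n",
"retriever = vector.as_retriever()"
]
},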
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1710126345145,
"user": {
"displayName": "Yoshihisa Nitta",
"userId": "15888006800030996813"
},
"user_tz": -540
},
"id": "qilJRVJXjH_G"
},
"outputs": [],
"source": [
"from langchain.chains import create_history_aware_retriever\n",
"from langchain_core.prompts import MessagesPlaceholder\n",
"\n",
"# 最初に、LLM に渡して検索クエリを生成させるための、プロンプトが必要となる。\n",
"\n",
"prompt = ChatPromptTemplate.from_messages([\n",
" MessagesPlaceholder(variable_name=\"chat_history\"),\n",
" (\"user\", \"{input}\"),\n",
" (\"user\", \"Given the above conversation, generate a search query to look up in order to get information relevant to the conversation\")\n",
"])\n",
"\n",
"retriever_chain = create_history_aware_retriever(llm, retriever, prompt)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 964,
"status": "ok",
"timestamp": 1710126346108,
"user": {
"displayName": "Yoshihisa Nitta",
"userId": "15888006800030996813"
},
"user_tz": -540
},
"id": "QLSAElgipNfc",
"outputId": "325e46ba-0033-421f-c8c8-590fad3388f8"
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Skip to main content\\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith DocsLangChain Python DocsLangChain JS/TS DocsLangSmith API DocsSearchGo to AppLangSmithUser GuideSetupPricing (Coming Soon)Self-HostingTracingEvaluationMonitoringPrompt HubProxyUser GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.Prototyping‚ÄãPrototyping LLM applications often involves quick experimentation between prompts, model types, retrieval strategy and other parameters.\\nThe ability to rapidly understand how the model is performing ‚Äî and debug where it is failing ‚Äî is incredibly important for this phase.Debugging‚ÄãWhen developing new LLM applications, we suggest having LangSmith tracing enabled by default.\\nOftentimes, it isn‚Äôt necessary to look at every single trace. However, when things go wrong (an unexpected end result, infinite agent loop, slower than expected execution, higher than expected token usage), it‚Äôs extremely helpful to debug by looking through the application traces. LangSmith gives clear visibility and debugging information at each step of an LLM sequence, making it much easier to identify and root-cause issues.\\nWe provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set‚ÄãWhile many developers still ship an initial version of their application based on ‚Äúvibe checks‚Äù, we‚Äôve seen an increasing number of engineering teams start to adopt a more test driven approach. LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications.\\nThese test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results.Comparison View‚ÄãWhen prototyping different versions of your applications and making changes, it‚Äôs important to see whether or not you‚Äôve regressed with respect to your initial test cases.\\nOftentimes, changes in the prompt, retrieval strategy, or model choice can have huge implications in responses produced by your application.\\nIn order to get a sense for which variant is performing better, it‚Äôs useful to be able to view results for different configurations on the same datapoints side-by-side. We‚Äôve invested heavily in a user-friendly comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of your application.Playground‚ÄãLangSmith provides a playground environment for rapid iteration and experimentation.\\nThis allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. 
We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'}),\n",
" Document(page_content='LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'}),\n",
" Document(page_content=\"This allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.\\nEvery playground run is logged in the system and can be used to create test cases or compare with other runs.Beta Testing‚ÄãBeta testing allows developers to collect more data on how their LLM applications are performing in real-world scenarios. In this phase, it‚Äôs important to develop an understanding for the types of inputs the app is performing well or poorly on and how exactly it‚Äôs breaking down in those cases. Both feedback collection and run annotation are critical for this workflow. This will help in curation of test cases that can help track regressions/improvements and development of automatic evaluations.Collecting Feedback‚ÄãWhen launching your application to an initial set of users, it‚Äôs important to gather human feedback on the responses it‚Äôs producing. This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic responses. LangSmith allows you to attach feedback scores to logged traces (oftentimes, this is hooked up to a feedback button in your app), then filter on traces that have a specific feedback tag and score. A common workflow is to filter on traces that receive a poor user feedback score, then drill down into problematic points using the detailed trace view.Annotating Traces‚ÄãLangSmith also supports sending runs to annotation queues, which allow annotators to closely inspect interesting traces and annotate them with respect to different criteria. Annotators can be PMs, engineers, or even subject matter experts. This allows users to catch regressions across important evaluation criteria.Adding Runs to a Dataset‚ÄãAs your application progresses through the beta testing phase, it's essential to continue collecting data to refine and improve its performance. LangSmith enables you to add runs as examples to datasets (from both the project page and within an annotation queue), expanding your test coverage on real-world scenarios. This is a key benefit in having your logging system and your evaluation/testing system in the same platform.Production‚ÄãClosely inspecting key data points, growing benchmarking datasets, annotating traces, and drilling down into important data in trace view are workflows you‚Äôll also want to do once your app hits production. However, especially at the production stage, it‚Äôs crucial to get a high-level overview of application performance with respect to latency, cost, and feedback scores. This ensures that it's delivering desirable results at scale.Monitoring and A/B Testing‚ÄãLangSmith provides monitoring charts that allow you to track key metrics over time. You can expand to view metrics for a given period and drill down into a specific data point to get a trace table for that time period ‚Äî this is especially handy for debugging production issues.LangSmith also allows for tag and metadata grouping, which allows users to mark different versions of their applications with different identifiers and view how they are performing side-by-side within each chart. 
This is helpful for A/B testing changes in prompt, model, or retrieval strategy.Was this page helpful?PreviousLangSmithNextSetupPrototypingBeta TestingProductionCommunityDiscordTwitterGitHubDocs CodeLangSmith SDKPythonJS/TSMoreHomepageBlogCopyright ¬© 2024 LangChain, Inc.\", metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'})]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# ユーザーが引き続く質問をしたインスタンスを渡して、テストする。\n",
"from langchain_core.messages import HumanMessage, AIMessage\n",
"\n",
"chat_history = [\n",
" HumanMessage(content=\"Can LangSmith help test my LLM applications?\"),\n",
" AIMessage(content=\"Yes!\")\n",
"]\n",
"retriever_chain.invoke({\n",
" \"chat_history\": chat_history,\n",
" \"input\": \"Tell me how\"\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1710126346108,
"user": {
"displayName": "Yoshihisa Nitta",
"userId": "15888006800030996813"
},
"user_tz": -540
},
"id": "RTKJ7Cn5pNuT"
},
"outputs": [],
"source": [
"# 新しい retriever を得て、検索済みの文書を記憶しながら、会話を続けるための新しい chain を生成することができる。\n",
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"\"\"\n",
" Answer the user's questions based on the below context:\n",
"\n",
" {context}\n",
" \"\"\"),\n",
" MessagesPlaceholder(variable_name=\"chat_history\"),\n",
" (\"user\", \"{input}\"),\n",
"])\n",
"document_chain = create_stuff_documents_chain(llm, prompt)\n",
"\n",
"retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 3908,
"status": "ok",
"timestamp": 1710126350015,
"user": {
"displayName": "Yoshihisa Nitta",
"userId": "15888006800030996813"
},
"user_tz": -540
},
"id": "0AQN38-duQHu",
"outputId": "cf8c0164-dbd6-4f5e-a2a9-08a97d2dac10"
},
"outputs": [
{
"data": {
"text/plain": [
"{'chat_history': [HumanMessage(content='Can LangSmith help test my LLM applications?'),\n",
" AIMessage(content='Yes!')],\n",
" 'input': 'Tell me how',\n",
" 'context': [Document(page_content='Skip to main content\\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith DocsLangChain Python DocsLangChain JS/TS DocsLangSmith API DocsSearchGo to AppLangSmithUser GuideSetupPricing (Coming Soon)Self-HostingTracingEvaluationMonitoringPrompt HubProxyUser GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.Prototyping‚ÄãPrototyping LLM applications often involves quick experimentation between prompts, model types, retrieval strategy and other parameters.\\nThe ability to rapidly understand how the model is performing ‚Äî and debug where it is failing ‚Äî is incredibly important for this phase.Debugging‚ÄãWhen developing new LLM applications, we suggest having LangSmith tracing enabled by default.\\nOftentimes, it isn‚Äôt necessary to look at every single trace. However, when things go wrong (an unexpected end result, infinite agent loop, slower than expected execution, higher than expected token usage), it‚Äôs extremely helpful to debug by looking through the application traces. LangSmith gives clear visibility and debugging information at each step of an LLM sequence, making it much easier to identify and root-cause issues.\\nWe provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set‚ÄãWhile many developers still ship an initial version of their application based on ‚Äúvibe checks‚Äù, we‚Äôve seen an increasing number of engineering teams start to adopt a more test driven approach. LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications.\\nThese test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results.Comparison View‚ÄãWhen prototyping different versions of your applications and making changes, it‚Äôs important to see whether or not you‚Äôve regressed with respect to your initial test cases.\\nOftentimes, changes in the prompt, retrieval strategy, or model choice can have huge implications in responses produced by your application.\\nIn order to get a sense for which variant is performing better, it‚Äôs useful to be able to view results for different configurations on the same datapoints side-by-side. We‚Äôve invested heavily in a user-friendly comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of your application.Playground‚ÄãLangSmith provides a playground environment for rapid iteration and experimentation.\\nThis allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. 
We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'}),\n",
" Document(page_content='LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'}),\n",
" Document(page_content=\"This allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.\\nEvery playground run is logged in the system and can be used to create test cases or compare with other runs.Beta Testing‚ÄãBeta testing allows developers to collect more data on how their LLM applications are performing in real-world scenarios. In this phase, it‚Äôs important to develop an understanding for the types of inputs the app is performing well or poorly on and how exactly it‚Äôs breaking down in those cases. Both feedback collection and run annotation are critical for this workflow. This will help in curation of test cases that can help track regressions/improvements and development of automatic evaluations.Collecting Feedback‚ÄãWhen launching your application to an initial set of users, it‚Äôs important to gather human feedback on the responses it‚Äôs producing. This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic responses. LangSmith allows you to attach feedback scores to logged traces (oftentimes, this is hooked up to a feedback button in your app), then filter on traces that have a specific feedback tag and score. A common workflow is to filter on traces that receive a poor user feedback score, then drill down into problematic points using the detailed trace view.Annotating Traces‚ÄãLangSmith also supports sending runs to annotation queues, which allow annotators to closely inspect interesting traces and annotate them with respect to different criteria. Annotators can be PMs, engineers, or even subject matter experts. This allows users to catch regressions across important evaluation criteria.Adding Runs to a Dataset‚ÄãAs your application progresses through the beta testing phase, it's essential to continue collecting data to refine and improve its performance. LangSmith enables you to add runs as examples to datasets (from both the project page and within an annotation queue), expanding your test coverage on real-world scenarios. This is a key benefit in having your logging system and your evaluation/testing system in the same platform.Production‚ÄãClosely inspecting key data points, growing benchmarking datasets, annotating traces, and drilling down into important data in trace view are workflows you‚Äôll also want to do once your app hits production. However, especially at the production stage, it‚Äôs crucial to get a high-level overview of application performance with respect to latency, cost, and feedback scores. This ensures that it's delivering desirable results at scale.Monitoring and A/B Testing‚ÄãLangSmith provides monitoring charts that allow you to track key metrics over time. You can expand to view metrics for a given period and drill down into a specific data point to get a trace table for that time period ‚Äî this is especially handy for debugging production issues.LangSmith also allows for tag and metadata grouping, which allows users to mark different versions of their applications with different identifiers and view how they are performing side-by-side within each chart. 
This is helpful for A/B testing changes in prompt, model, or retrieval strategy.Was this page helpful?PreviousLangSmithNextSetupPrototypingBeta TestingProductionCommunityDiscordTwitterGitHubDocs CodeLangSmith SDKPythonJS/TSMoreHomepageBlogCopyright ¬© 2024 LangChain, Inc.\", metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'})],\n",
" 'answer': 'LangSmith can help test your LLM applications by providing features such as:\\n\\n1. Prototyping: Allows quick experimentation between prompts, model types, and retrieval strategies.\\n2. Debugging: Offers clear visibility and debugging information at each step of an LLM sequence.\\n3. Initial Test Set: Enables developers to create datasets for running tests on their applications.\\n4. Comparison View: Helps track and diagnose regressions in test scores across multiple revisions of your application.\\n5. Playground: Provides a playground environment for rapid iteration and experimentation with different prompts and models.\\n6. Beta Testing: Allows developers to collect more data on how their applications are performing in real-world scenarios.\\n7. Monitoring and A/B Testing: Provides monitoring charts to track key metrics over time and compare different versions of applications.\\n\\nThese features in LangSmith facilitate testing and evaluation of LLM applications at various stages of development and deployment.'}"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# end-to-end でテストできるようになった。\n",
"\n",
"chat_history = [\n",
" HumanMessage(content=\"Can LangSmith help test my LLM applications?\"),\n",
" AIMessage(content=\"Yes!\")\n",
"]\n",
"retrieval_chain.invoke({\n",
" \"chat_history\": chat_history,\n",
" \"input\": \"Tell me how\"\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ry7cHDHzvsDg"
},
"source": [
"LangSimth でのテストに関する文書を、これが返すことがわかる。\n",
"これは、続く質問を会話履歴に結合して、LLM が新しい質問を生成したおかげである。\n",
"\n",
"この新しい retriver を得たので、検索した文書を心にとめつつ、会話を続ける新しい chain を生成できる。"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"executionInfo": {
"elapsed": 12,
"status": "ok",
"timestamp": 1710126350015,
"user": {
"displayName": "Yoshihisa Nitta",
"userId": "15888006800030996813"
},
"user_tz": -540
},
"id": "KqZpKzTyunEL"
},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"\"\"\n",
" Answer the user's questions based on the below context:\n",
"\n",
" {context}\n",
" \"\"\"),\n",
" MessagesPlaceholder(variable_name=\"chat_history\"),\n",
" (\"user\", \"{input}\"),\n",
"])\n",
"\n",
"document_chain = create_stuff_documents_chain(llm, prompt)\n",
"\n",
"retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 3330,
"status": "ok",
"timestamp": 1710126353334,
"user": {
"displayName": "Yoshihisa Nitta",
"userId": "15888006800030996813"
},
"user_tz": -540
},
"id": "AhjdG-HExZ6E",
"outputId": "85c91d40-362b-4331-b2b5-464e3e4027c8"
},
"outputs": [
{
"data": {
"text/plain": [
"{'chat_history': [HumanMessage(content='Can LangSmith help test my LLM applications?'),\n",
" AIMessage(content='Yes!')],\n",
" 'input': 'Tell me how',\n",
" 'context': [Document(page_content='Skip to main content\\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith DocsLangChain Python DocsLangChain JS/TS DocsLangSmith API DocsSearchGo to AppLangSmithUser GuideSetupPricing (Coming Soon)Self-HostingTracingEvaluationMonitoringPrompt HubProxyUser GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.Prototyping‚ÄãPrototyping LLM applications often involves quick experimentation between prompts, model types, retrieval strategy and other parameters.\\nThe ability to rapidly understand how the model is performing ‚Äî and debug where it is failing ‚Äî is incredibly important for this phase.Debugging‚ÄãWhen developing new LLM applications, we suggest having LangSmith tracing enabled by default.\\nOftentimes, it isn‚Äôt necessary to look at every single trace. However, when things go wrong (an unexpected end result, infinite agent loop, slower than expected execution, higher than expected token usage), it‚Äôs extremely helpful to debug by looking through the application traces. LangSmith gives clear visibility and debugging information at each step of an LLM sequence, making it much easier to identify and root-cause issues.\\nWe provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set‚ÄãWhile many developers still ship an initial version of their application based on ‚Äúvibe checks‚Äù, we‚Äôve seen an increasing number of engineering teams start to adopt a more test driven approach. LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications.\\nThese test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results.Comparison View‚ÄãWhen prototyping different versions of your applications and making changes, it‚Äôs important to see whether or not you‚Äôve regressed with respect to your initial test cases.\\nOftentimes, changes in the prompt, retrieval strategy, or model choice can have huge implications in responses produced by your application.\\nIn order to get a sense for which variant is performing better, it‚Äôs useful to be able to view results for different configurations on the same datapoints side-by-side. We‚Äôve invested heavily in a user-friendly comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of your application.Playground‚ÄãLangSmith provides a playground environment for rapid iteration and experimentation.\\nThis allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. 
We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'}),\n",
" Document(page_content='LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'}),\n",
" Document(page_content=\"This allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.\\nEvery playground run is logged in the system and can be used to create test cases or compare with other runs.Beta Testing‚ÄãBeta testing allows developers to collect more data on how their LLM applications are performing in real-world scenarios. In this phase, it‚Äôs important to develop an understanding for the types of inputs the app is performing well or poorly on and how exactly it‚Äôs breaking down in those cases. Both feedback collection and run annotation are critical for this workflow. This will help in curation of test cases that can help track regressions/improvements and development of automatic evaluations.Collecting Feedback‚ÄãWhen launching your application to an initial set of users, it‚Äôs important to gather human feedback on the responses it‚Äôs producing. This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic responses. LangSmith allows you to attach feedback scores to logged traces (oftentimes, this is hooked up to a feedback button in your app), then filter on traces that have a specific feedback tag and score. A common workflow is to filter on traces that receive a poor user feedback score, then drill down into problematic points using the detailed trace view.Annotating Traces‚ÄãLangSmith also supports sending runs to annotation queues, which allow annotators to closely inspect interesting traces and annotate them with respect to different criteria. Annotators can be PMs, engineers, or even subject matter experts. This allows users to catch regressions across important evaluation criteria.Adding Runs to a Dataset‚ÄãAs your application progresses through the beta testing phase, it's essential to continue collecting data to refine and improve its performance. LangSmith enables you to add runs as examples to datasets (from both the project page and within an annotation queue), expanding your test coverage on real-world scenarios. This is a key benefit in having your logging system and your evaluation/testing system in the same platform.Production‚ÄãClosely inspecting key data points, growing benchmarking datasets, annotating traces, and drilling down into important data in trace view are workflows you‚Äôll also want to do once your app hits production. However, especially at the production stage, it‚Äôs crucial to get a high-level overview of application performance with respect to latency, cost, and feedback scores. This ensures that it's delivering desirable results at scale.Monitoring and A/B Testing‚ÄãLangSmith provides monitoring charts that allow you to track key metrics over time. You can expand to view metrics for a given period and drill down into a specific data point to get a trace table for that time period ‚Äî this is especially handy for debugging production issues.LangSmith also allows for tag and metadata grouping, which allows users to mark different versions of their applications with different identifiers and view how they are performing side-by-side within each chart. 
This is helpful for A/B testing changes in prompt, model, or retrieval strategy.Was this page helpful?PreviousLangSmithNextSetupPrototypingBeta TestingProductionCommunityDiscordTwitterGitHubDocs CodeLangSmith SDKPythonJS/TSMoreHomepageBlogCopyright ¬© 2024 LangChain, Inc.\", metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \\uf8ffü¶úÔ∏è\\uf8ffüõ†Ô∏è LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we‚Äôll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they‚Äôre just starting their journey.', 'language': 'en'})],\n",
" 'answer': 'LangSmith can help test your LLM applications by providing features such as tracing, test case creation, custom evaluations, comparison view, playground environment for rapid iteration and experimentation, beta testing support, feedback collection, annotation queues, adding runs to datasets, monitoring charts for tracking key metrics, and A/B testing capabilities. These features allow you to assess and improve the performance of your LLM applications at various stages of development and production.'}"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# end-to-end でテストする\n",
"chat_history = [\n",
" HumanMessage(content=\"Can LangSmith help test my LLM applications?\"),\n",
" AIMessage(content=\"Yes!\")\n",
"]\n",
"\n",
"retrieval_chain.invoke({\n",
" \"chat_history\": chat_history,\n",
" \"input\": \"Tell me how\"\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "P26qPgYPyJhk"
},
"source": [
"一貫した回答が得られたことがわかる。retrieval chain (質問チェイン)を chat bot へと調整することができた。"
]
},
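{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small usage sketch (not part of the original notebook), the conversation can be carried forward by appending each exchange to chat_history before the next call. The follow-up question below is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical continuation: record the previous turn in chat_history, then ask a follow-up question.\n",
"response = retrieval_chain.invoke({\n",
"    \"chat_history\": chat_history,\n",
"    \"input\": \"Tell me how\"\n",
"})\n",
"\n",
"chat_history.append(HumanMessage(content=\"Tell me how\"))\n",
"chat_history.append(AIMessage(content=response[\"answer\"]))\n",
"\n",
"next_response = retrieval_chain.invoke({\n",
"    \"chat_history\": chat_history,\n",
"    \"input\": \"Does LangSmith also help once my app is in production?\"\n",
"})\n",
"print(next_response[\"answer\"])"
]
},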
{
"cell_type": "markdown",
"metadata": {
"id": "OEoE-BQwyYwb"
},
"source": [
"# Agent\n",
"\n",
"各ステップが事前にわかっている chain 例を作成してきた。\n",
"最後に作成するのは agent で、agent では LLM がどのステップを取るかを決定する。\n",
"\n",
"[注意] ローカルなモデルはまだ信頼性がないので、OpenAI モデルを使った agent の作り方のみを説明する。\n",
"\n",
"agent を構築するときにさいしょにすべきなのは、どのツールにアクセスできるかを決定することである。\n",
"この例では、agent が2つのツールにアクセスできるものとする。\n",
"\n",
"