Sync latest code

2024-08-05 15:34:30 +08:00
parent f3fc5b38db
commit 63a32d2b93
6 changed files with 282 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,67 @@
 English | [Chinese](README_ZH.md)
 # Whisper Plugin for Maubot
 Whisper Plugin is a Maubot plugin that transcribes audio messages in Matrix rooms to text using OpenAI's Whisper API.
 ## Features
 - Automatically responds to audio and file messages in rooms
 - Calls OpenAI Whisper API for audio transcription
 - Supports user and room whitelists
 - Supports replying in threads
 ![Example of the plugin replying with a transcription](sample1.png)
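
In Matrix, replies and thread membership are expressed through the message's `m.relates_to` field. The sketch below shows, as plain dictionaries, roughly what relation a transcription reply could carry in each case. It is an illustration under assumptions: `build_relation` is a hypothetical helper, and the plugin itself delegates this work to maubot's `event.respond(content, in_thread=True)` for threads and to mautrix's `RelatesTo`/`InReplyTo` types for plain replies, so the exact fields it sends may differ.

```python
from typing import Any, Dict, Optional


def build_relation(event_id: str, incoming: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the m.relates_to object for a reply to the audio event `event_id`.

    `incoming` is the audio message's own m.relates_to (or None).
    Hypothetical helper for illustration; not part of the plugin.
    """
    if incoming and incoming.get("rel_type") == "m.thread":
        return {
            "rel_type": "m.thread",
            # A thread relation points at the thread root, not at the audio message.
            "event_id": incoming["event_id"],
            # Reply fallback to the audio message for clients without thread support.
            "is_falling_back": True,
            "m.in_reply_to": {"event_id": event_id},
        }
    # Outside a thread: an ordinary rich reply to the audio message.
    return {"m.in_reply_to": {"event_id": event_id}}


# An audio message "$audio" that was posted in the thread rooted at "$root".
content = {
    "msgtype": "m.text",
    "body": "transcribed text",
    "m.relates_to": build_relation("$audio", {"rel_type": "m.thread", "event_id": "$root"}),
}
```

The key design point is that a threaded answer references two events: the thread root (so it stays in the thread) and the audio message (as a reply fallback).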
 ## Installation
 1. **Clone or download the plugin code**:
    ```bash
    git clone <repository_url>
    cd <repository_directory>
    zip -9r maubot-stt.mbp *
    ```
 2. **Configure Maubot**:
    Make sure you have installed and set up Maubot. Refer to [Maubot's official documentation](https://docs.mau.fi/maubot/usage/basic.html) for detailed steps.
 3. **Upload the plugin**:
    Upload `maubot-stt.mbp` in the Maubot management interface, then create an instance of the plugin.
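
The `zip` command in step 1 works because a maubot plugin file (`.mbp`) is an ordinary ZIP archive with `maubot.yaml` at its root. Where the `zip` tool is unavailable, the following standard-library sketch does roughly the same thing. The function name `build_mbp` is made up for this example, and skipping hidden files and directories (such as `.git`) is an assumption chosen to keep repository metadata out of the archive.

```python
import os
import zipfile
from typing import List


def build_mbp(source_dir: str, output_path: str) -> List[str]:
    """Pack the plugin sources in source_dir into a .mbp (ZIP) archive.

    Roughly equivalent to `zip -9r maubot-stt.mbp *`: maximum deflate
    compression, with paths stored relative to source_dir so that
    maubot.yaml ends up at the root of the archive.
    """
    added = []
    out_abs = os.path.abspath(output_path)
    with zipfile.ZipFile(output_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as archive:
        for root, dirs, files in os.walk(source_dir):
            # Do not descend into hidden directories such as .git.
            dirs[:] = sorted(sub for sub in dirs if not sub.startswith("."))
            for name in sorted(files):
                path = os.path.join(root, name)
                # Skip hidden files and the archive that is being written.
                if name.startswith(".") or os.path.abspath(path) == out_abs:
                    continue
                arcname = os.path.relpath(path, source_dir)
                archive.write(path, arcname)
                added.append(arcname)
    return added
```

Run from the repository directory as `build_mbp(".", "maubot-stt.mbp")`; the return value lists the files that were packed.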
 ## Configuration
 After uploading the plugin and creating an instance, configure the instance. The available options are:
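 - `whisper_endpoint`: OpenAI Whisper API endpoint; defaults to `https://api.openai.com/v1/audio/transcriptions`.
 - `openai_api_key`: Your OpenAI API key.
 - `allowed_users`: List of users allowed to use this plugin. If empty, all users are allowed.
 - `allowed_rooms`: List of rooms allowed to use this plugin. If empty, all rooms are allowed.
 - `prompt`: Optional text that guides the model's style or continues from a previous audio segment; it should match the language of the audio. The default is a Mandarin prompt.
 - `language`: Optional ISO-639-1 code of the audio language (for example `zh`), which improves accuracy and latency. Leave it empty to let the model detect the language.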
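
The two lists combine with AND: a message is transcribed only if its sender passes the user list and its room passes the room list, and an empty list disables that check. The standalone function below restates the checks from `should_respond` in `openai-whisper.py` using plain strings instead of mautrix types; `should_transcribe` and its parameters are illustrative names, not part of the plugin.

```python
from typing import Sequence


def should_transcribe(
    sender: str,
    room_id: str,
    msgtype: str,
    bot_mxid: str,
    allowed_users: Sequence[str],
    allowed_rooms: Sequence[str],
) -> bool:
    """Mirror the plugin's filtering rules with plain strings."""
    if sender == bot_mxid:
        return False  # never react to the bot's own messages
    if allowed_users and sender not in allowed_users:
        return False  # a non-empty user list acts as a whitelist
    if allowed_rooms and room_id not in allowed_rooms:
        return False  # same rule for rooms
    # Only audio and generic file messages are transcribed.
    return msgtype in ("m.audio", "m.file")


users = ["@user1:matrix.org", "@user2:matrix.org"]
rooms = ["!roomid:matrix.org"]
print(should_transcribe("@user1:matrix.org", "!roomid:matrix.org", "m.audio", "@bot:matrix.org", users, rooms))  # True
print(should_transcribe("@user3:matrix.org", "!roomid:matrix.org", "m.audio", "@bot:matrix.org", users, rooms))  # False
print(should_transcribe("@user3:matrix.org", "!other:matrix.org", "m.audio", "@bot:matrix.org", [], []))  # True
```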
 ### Configuration Example
 In the Maubot management interface, open the instance's configuration editor and enter, for example:
 ```json
 {
  "whisper_endpoint": "https://api.openai.com/v1/audio/transcriptions",
  "openai_api_key": "your_openai_api_key",
  "allowed_users": ["@user1:matrix.org", "@user2:matrix.org"],
  "allowed_rooms": ["!roomid:matrix.org"]
 }
 ```
 The plugin reads these settings only when it starts, so after saving, disable and re-enable the instance for the changes to take effect.
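
Options you leave out of the instance configuration, such as `prompt` and `language`, keep their values from `base-config.yaml`. This is roughly what mautrix's `ConfigUpdateHelper.copy` does inside `Config.do_update`: each listed key is copied from the user's configuration onto the defaults when present. The dictionary-based sketch below is a simplification under that assumption; `merge_config` is a hypothetical helper, and the real implementation operates on YAML documents.

```python
from typing import Any, Dict, Iterable


def merge_config(base: Dict[str, Any], user: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Start from the defaults and copy over each listed key the user has set."""
    merged = dict(base)
    for key in keys:
        if key in user:
            merged[key] = user[key]
    return merged


# Defaults as in base-config.yaml (an empty `language:` loads as None).
base = {
    "openai_api_key": "sk-",
    "whisper_endpoint": "https://api.openai.com/v1/audio/transcriptions",
    "allowed_users": [],
    "allowed_rooms": [],
    "prompt": "以下是普通话录制的会议记录：",
    "language": None,
}
user = {"openai_api_key": "your_openai_api_key", "allowed_rooms": ["!roomid:matrix.org"]}
keys = ["whisper_endpoint", "openai_api_key", "allowed_users", "allowed_rooms", "prompt", "language"]

config = merge_config(base, user, keys)
print(config["allowed_rooms"])  # ['!roomid:matrix.org']
print(config["whisper_endpoint"])  # https://api.openai.com/v1/audio/transcriptions
```

One practical consequence: unless you override `prompt`, the Mandarin default prompt is sent with every request.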
 ## Usage
 The plugin listens to messages in the rooms it is allowed to serve. When it receives an audio message or a file message (any `m.file` message is treated as audio), it:
 1. Downloads the file.
 2. Sends it to the OpenAI Whisper API for transcription.
 3. Replies to the original message with the transcription as a text message, inside the same thread if the message was posted in one.
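
Step 2 sends a multipart/form-data request: the audio under the `file` field, `model` set to `whisper-1`, and `prompt` and `language` only when they are configured. The plugin always uploads the audio as `audio.mp3` with type `audio/mpeg`; since clients such as Element typically record voice messages as Ogg/Opus, deriving the file name from the event's MIME type may be more robust. The helpers `upload_filename` and `form_fields` and the MIME-to-extension table below are assumptions for illustration, not part of the plugin.

```python
from typing import List, Optional, Tuple

# Assumed mapping from common audio MIME types to upload file extensions.
EXTENSIONS = {
    "audio/ogg": ".ogg",
    "audio/mpeg": ".mp3",
    "audio/mp4": ".m4a",
    "audio/x-m4a": ".m4a",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/webm": ".webm",
    "audio/flac": ".flac",
}


def upload_filename(mimetype: Optional[str]) -> str:
    """Pick a filename whose extension matches the audio's MIME type.

    Falls back to the plugin's hard-coded "audio.mp3" for unknown types.
    """
    # Ignore parameters such as "; codecs=opus".
    base = (mimetype or "").split(";")[0].strip().lower()
    return "audio" + EXTENSIONS.get(base, ".mp3")


def form_fields(prompt: Optional[str], language: Optional[str]) -> List[Tuple[str, str]]:
    """Text fields sent next to the file, in the same order as transcribe_audio."""
    fields = [("model", "whisper-1")]
    if prompt:
        fields.append(("prompt", prompt))
    if language:  # empty/None lets the API detect the language itself
        fields.append(("language", language))
    return fields


print(upload_filename("audio/ogg; codecs=opus"))  # audio.ogg
print(form_fields("", "zh"))  # [('model', 'whisper-1'), ('language', 'zh')]
```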
 ## License
 This project is licensed under the MIT License.
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -0,0 +1,69 @@
 [English](README.md) | Chinese
 # Whisper Plugin for Maubot
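 Whisper Plugin is a Maubot plugin that transcribes audio messages to text in Matrix clients using OpenAI's Whisper API.
 ## Features
 - Automatically responds to audio and file messages in rooms
 - Calls the OpenAI Whisper API for audio transcription
 - Supports user and room whitelists
 - Supports replying in threads
 ![Example of the plugin replying with a transcription](sample1.png)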
 ## Installation
 1. **Clone or download the plugin code**:
    ```bash
    git clone <repository_url>
    cd <repository_directory>
    zip -9r maubot-stt.mbp *
    ```
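 2. **Configure Maubot**:
    Make sure you have installed and set up Maubot. For detailed steps, see [Maubot's official documentation](https://docs.mau.fi/maubot/usage/basic.html).
 3. **Upload the plugin**:
    Upload the plugin in the Maubot management interface.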
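 ## Configuration
 After uploading the plugin, you need to configure it. The configuration options are:
 - `whisper_endpoint`: OpenAI Whisper API endpoint; defaults to `https://api.openai.com/v1/audio/transcriptions`.
 - `openai_api_key`: Your OpenAI API key.
 - `allowed_users`: List of users allowed to use this plugin. An empty list allows all users.
 - `allowed_rooms`: List of rooms allowed to use this plugin. An empty list allows all rooms.
 - `prompt`: Optional text that guides the model's style; it should match the language of the audio. The default is a Mandarin prompt.
 - `language`: Optional ISO-639-1 code of the audio language (for example `zh`). Leave it empty to let the model detect the language.
 ### Configuration Example
 In the Maubot management interface, open the plugin's configuration page and enter the following: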
 ```json
 {
  "whisper_endpoint": "https://api.openai.com/v1/audio/transcriptions",
  "openai_api_key": "your_openai_api_key",
  "allowed_users": ["@user1:matrix.org", "@user2:matrix.org"],
  "allowed_rooms": ["!roomid:matrix.org"]
 }
 ```
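 The plugin reads its settings only at startup, so after saving the configuration, disable and re-enable the instance.
 ## Usage
 The plugin listens to messages in the rooms it is allowed to serve. When it receives a voice message or audio file, it:
 1. Downloads the audio file.
 2. Sends it to the OpenAI Whisper API for transcription.
 3. Replies to the original message with the transcription as a text message, inside the same thread if the message was posted in one.
 ## License
 This project is licensed under the MIT License.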
--- a/base-config.yaml
+++ b/base-config.yaml
@@ -0,0 +1,18 @@
 openai_api_key: sk-
 # API endpoint to connect to
 whisper_endpoint: https://api.openai.com/v1/audio/transcriptions
 # List of allowed users
 # [] means no restriction
 allowed_users: []
 # List of allowed rooms
 # [] means no restriction
 allowed_rooms: []
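 # Optional text to guide the model's style or to continue from a previous audio segment. The prompt should match the language of the audio.
 # The default below is Mandarin for "The following are meeting minutes recorded in Mandarin:"; replace or clear it for other languages.
 prompt: "以下是普通话录制的会议记录："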
 # The language of the input audio. Providing it in ISO-639-1 format (e.g. Chinese: "zh") improves accuracy and latency. Empty by default, which lets the model detect the language.
 language:
--- a/maubot.yaml
+++ b/maubot.yaml
@@ -0,0 +1,11 @@
 maubot: 0.1.0
 id: nigzu.com.maubot-stt
 version: 0.2.6
 license: MIT
 modules:
 - openai-whisper
 main_class: WhisperPlugin
 config: true
 extra_files:
 - base-config.yaml
 database: false
--- a/openai-whisper.py
+++ b/openai-whisper.py
@@ -0,0 +1,117 @@
 import aiohttp
 from typing import Type
 from maubot.handlers import event
 from maubot import Plugin, MessageEvent
 from mautrix.types import EventType, MessageType, RelationType, TextMessageEventContent, Format, RelatesTo, InReplyTo
 from mautrix.util.config import BaseProxyConfig, ConfigUpdateHelper
 class Config(BaseProxyConfig):
    def do_update(self, helper: ConfigUpdateHelper) -> None:
        helper.copy("whisper_endpoint")
        helper.copy("openai_api_key")
        helper.copy("allowed_users")
        helper.copy("allowed_rooms")
        helper.copy("prompt")
        helper.copy("language")
 class WhisperPlugin(Plugin):
    async def start(self) -> None:
        await super().start()
        self.config.load_and_update()
        self.whisper_endpoint = self.config['whisper_endpoint']
        self.api_key = self.config['openai_api_key']
        self.prompt = self.config['prompt']
        self.language = self.config['language']
        self.allowed_users = self.config['allowed_users']
        self.allowed_rooms = self.config['allowed_rooms']
        self.log.debug("Whisper plugin started")
    async def should_respond(self, event: MessageEvent) -> bool:
        if event.sender == self.client.mxid:
            return False
        if self.allowed_users and event.sender not in self.allowed_users:
            return False
        if self.allowed_rooms and event.room_id not in self.allowed_rooms:
            return False
        return event.content.msgtype == MessageType.AUDIO or event.content.msgtype == MessageType.FILE
    @event.on(EventType.ROOM_MESSAGE)
    async def on_message(self, event: MessageEvent) -> None:
        if not await self.should_respond(event):
            return
        try:
            await event.mark_read()
            # Show a typing indicator while downloading and transcribing (timeout in milliseconds).
            await self.client.set_typing(event.room_id, timeout=99999)
            audio_bytes = await self.client.download_media(url=event.content.url)
            transcription = await self.transcribe_audio(audio_bytes)
            # The transcription is plain text, so send it without an HTML formatted_body;
            # otherwise characters such as "<" would be interpreted as markup.
            content = TextMessageEventContent(
                msgtype=MessageType.TEXT,
                body=transcription
            )
            if event.content.relates_to and event.content.relates_to.rel_type == RelationType.THREAD:
                # The audio was posted in a thread: answer inside the same thread.
                await event.respond(content, in_thread=True)
            else:
                # Otherwise send a regular reply to the audio message.
                content.relates_to = RelatesTo(
                    in_reply_to=InReplyTo(event_id=event.event_id)
                )
                await event.respond(content)
        except Exception as e:
            self.log.exception(f"Something went wrong: {e}")
            await event.respond(f"Something went wrong: {e}")
        finally:
            # Always clear the typing indicator, even when an error occurred.
            await self.client.set_typing(event.room_id, timeout=0)
    async def transcribe_audio(self, audio_bytes: bytes) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }
        data = aiohttp.FormData()
        data.add_field('file', audio_bytes, filename='audio.mp3', content_type='audio/mpeg')
        data.add_field('model', 'whisper-1')
        if self.prompt:
            data.add_field('prompt', f"{self.prompt}")
        if self.language:
            data.add_field('language', f"{self.language}")
        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(self.whisper_endpoint, headers=headers, data=data) as response:
                    if response.status != 200:
                        # Read the error body once and reuse it for the log and the reply.
                        error_text = await response.text()
                        self.log.error(f"Error response from API: {error_text}")
                        return f"Error: {error_text}"
                    response_json = await response.json()
                    return response_json.get("text", "Sorry, I can't transcribe the audio.")
        except Exception as e:
            self.log.exception(f"Failed to transcribe audio, msg: {e}")
            return "Sorry, an error occurred while transcribing the audio."
    @classmethod
    def get_config_class(cls) -> Type[BaseProxyConfig]:
        return Config
    def save_config(self) -> None:
        self.config.save()
    async def update_config(self, new_config: dict) -> None:
        self.config.update(new_config)
        self.save_config()
        self.log.debug("Configuration updated and saved")
--- a/sample1.png
+++ b/sample1.png