Sync latest code

This commit is contained in:
namgal
2024-08-05 15:34:30 +08:00
parent f3fc5b38db
commit 63a32d2b93
6 changed files with 282 additions and 0 deletions

67
README.md Normal file
View File

@@ -0,0 +1,67 @@
English | [中文](README_ZH.md)
# Whisper Plugin for Maubot
Whisper Plugin is a plugin for Maubot, Support for transcriptions of audio messages to text in Matrix clients using OpenAI's Whisper API
## Features
- Automatically responds to audio and file messages in rooms
- Calls OpenAI Whisper API for audio transcription
- Supports user and room whitelists
- Support reply in thread
![alt text](sample1.png)
## Installation
1. **Clone or download the plugin code**:
```bash
git clone <repository_url>
cd <repository_directory>
zip -9r maubot-stt.mbp *
```
2. **Configure Maubot**:
Make sure you have installed and set up Maubot. Refer to [Maubot's official documentation](https://docs.mau.fi/maubot/usage/basic.html) for detailed steps.
3. **Upload the plugin**:
Upload the plugin in the Maubot management interface.
## Configuration
After uploading the plugin, you need to configure it. Here are the configuration items:
- `api_endpoint`: OpenAI Whisper API endpoint, default is `https://api.openai.com/v1/audio/transcriptions`.
- `openai_api_key`: Your OpenAI API key.
- `allowed_users`: List of users allowed to use this plugin. If empty, all users are allowed.
- `allowed_rooms`: List of rooms allowed to use this plugin. If empty, all rooms are allowed.
### Configuration Example
In the Maubot management interface, go to the plugin's configuration page and fill in the following content:
```json
{
"api_endpoint": "https://api.openai.com/v1/audio/transcriptions",
"openai_api_key": "your_openai_api_key",
"allowed_users": ["@user1:matrix.org", "@user2:matrix.org"],
"allowed_rooms": ["!roomid:matrix.org"]
}
```
It is recommended to close and reopen the instance configuration after saving.
## Usage
The plugin will automatically listen to messages in the room and perform the following actions upon receiving a voice message or audio file:
1. Download the audio file.
2. Call the OpenAI Whisper API for transcription.
3. Send the transcription result as a text message to the corresponding room.
## License
This project is licensed under the MIT License.

69
README_ZH.md Normal file
View File

@@ -0,0 +1,69 @@
[English](README.md) | 中文
# Whisper Plugin for Maubot
Whisper Plugin 是一个用于 Maubot 的插件,支持在 Matrix 客户端用 OpenAI 的 Whisper API 将音频消息转录为文本。
## 功能
- 自动响应房间中的音频和文件消息
- 调用 OpenAI Whisper API 进行音频转录
- 支持用户和房间白名单
- 支持在线程中回复
![alt text](sample1.png)
## 安装
1. **克隆或下载插件代码**
```bash
git clone <repository_url>
cd <repository_directory>
zip -9r maubot-stt.mbp *
```
2. **配置 Maubot**
请确保你已经安装并设置好了 Maubot。具体步骤可以参考 [Maubot 的官方文档](https://docs.mau.fi/maubot/usage/basic.html)。
3. **上传插件**
在 Maubot 管理界面上传插件。
## 配置
上传插件后,你需要进行一些配置。以下是配置项的说明:
- `api_endpoint`OpenAI Whisper API 端点,默认为 `https://api.openai.com/v1/audio/transcriptions`。
- `openai_api_key`:你的 OpenAI API 密钥。
- `allowed_users`:允许使用此插件的用户列表。如果为空列表,则允许所有用户使用。
- `allowed_rooms`:允许使用此插件的房间列表。如果为空列表,则允许所有房间使用。
### 配置示例
在 Maubot 管理界面,进入插件的配置页面,填写以下内容:
```json
{
"api_endpoint": "https://api.openai.com/v1/audio/transcriptions",
"openai_api_key": "your_openai_api_key",
"allowed_users": ["@user1:matrix.org", "@user2:matrix.org"],
"allowed_rooms": ["!roomid:matrix.org"]
}
```
建议关闭实例配置保存后再打开。
## 使用
插件会自动监听房间中的消息,当在接收到语音消息或音频文件时进行以下操作:
1. 下载音频文件。
2. 调用 OpenAI Whisper API 进行转录。
3. 将转录结果以文本消息的形式发送到相应的房间。
## 许可证
本项目采用 MIT 许可证。

18
base-config.yaml Normal file
View File

@@ -0,0 +1,18 @@
openai_api_key: sk-
# API endpoint to connect to
whisper_endpoint: https://api.openai.com/v1/audio/transcriptions
# List of allowed users
# [] means no restriction
allowed_users: []
# List of allowed rooms
# [] means no restriction
allowed_rooms: []
# Optional text to guide the model's style or to continue from a previous audio segment. The prompt should match the language of the audio.
prompt: "以下是普通话录制的会议记录:"
# The language of the input audio. Providing the input language in ISO-639-1 format (Chinese: "zh") will improve accuracy and latency, leave empty by default
language:

11
maubot.yaml Normal file
View File

@@ -0,0 +1,11 @@
maubot: 0.1.0
id: nigzu.com.maubot-stt
version: 0.2.6
license: MIT
modules:
- openai-whisper
main_class: WhisperPlugin
config: true
extra_files:
- base-config.yaml
database: false

117
openai-whisper.py Normal file
View File

@@ -0,0 +1,117 @@
import asyncio
import json
import os
import re
import aiohttp
from datetime import datetime
from typing import Type
from mautrix.client import Client
from maubot.handlers import event
from maubot import Plugin, MessageEvent
from mautrix.errors import MatrixRequestError
from mautrix.types import EventType, MessageType, RelationType, TextMessageEventContent, Format,RelatesTo,InReplyTo
from mautrix.util.config import BaseProxyConfig, ConfigUpdateHelper
class Config(BaseProxyConfig):
def do_update(self, helper: ConfigUpdateHelper) -> None:
helper.copy("whisper_endpoint")
helper.copy("openai_api_key")
helper.copy("allowed_users")
helper.copy("allowed_rooms")
helper.copy("prompt")
helper.copy("language")
class WhisperPlugin(Plugin):
async def start(self) -> None:
await super().start()
self.config.load_and_update()
self.whisper_endpoint = self.config['whisper_endpoint']
self.api_key = self.config['openai_api_key']
self.prompt = self.config['prompt']
self.language = self.config['language']
self.allowed_users = self.config['allowed_users']
self.allowed_rooms = self.config['allowed_rooms']
self.log.debug("Whisper plugin started")
async def should_respond(self, event: MessageEvent) -> bool:
if event.sender == self.client.mxid:
return False
if self.allowed_users and event.sender not in self.allowed_users:
return False
if self.allowed_rooms and event.room_id not in self.allowed_rooms:
return False
return event.content.msgtype == MessageType.AUDIO or event.content.msgtype == MessageType.FILE
@event.on(EventType.ROOM_MESSAGE)
async def on_message(self, event: MessageEvent) -> None:
if not await self.should_respond(event):
return
try:
await event.mark_read()
await self.client.set_typing(event.room_id, timeout=99999)
audio_bytes = await self.client.download_media(url=event.content.url)
transcription = await self.transcribe_audio(audio_bytes)
await self.client.set_typing(event.room_id, timeout=0)
content = TextMessageEventContent(
msgtype=MessageType.TEXT,
body=transcription,
format=Format.HTML,
formatted_body=transcription
)
in_reply_to = InReplyTo(event_id=event.event_id)
if event.content.relates_to and event.content.relates_to.rel_type == RelationType.THREAD:
await event.respond(content, in_thread=True)
else:
content.relates_to = RelatesTo(
in_reply_to=in_reply_to
)
await event.respond(content)
except Exception as e:
self.log.exception(f"Something went wrong: {e}")
await event.respond(f"Something went wrong: {e}")
async def transcribe_audio(self, audio_bytes: bytes) -> str:
headers = {
"Authorization": f"Bearer {self.api_key}"
}
data = aiohttp.FormData()
data.add_field('file', audio_bytes, filename='audio.mp3', content_type='audio/mpeg')
data.add_field('model', 'whisper-1')
if self.prompt:
data.add_field('prompt', f"{self.prompt}")
if self.language:
data.add_field('language', f"{self.language}")
try:
async with aiohttp.ClientSession() as session:
async with session.post(self.whisper_endpoint, headers=headers, data=data) as response:
if response.status != 200:
self.log.error(f"Error response from API: {await response.text()}")
return f"Error: {await response.text()}"
response_json = await response.json()
return response_json.get("text", "Sorry, I can't transcribe the audio.")
except Exception as e:
self.log.exception(f"Failed to transcribe audio, msg: {e}")
return "Sorry, an error occurred while transcribing the audio."
@classmethod
def get_config_class(cls) -> Type[BaseProxyConfig]:
return Config
def save_config(self) -> None:
self.config.save()
async def update_config(self, new_config: dict) -> None:
self.config.update(new_config)
self.save_config()
self.log.debug("Configuration updated and saved")

BIN
sample1.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 228 KiB