### @ztimson/ai-utils

AI Utility Library - Unified interface for multiple AI providers

[![Version](https://img.shields.io/badge/dynamic/json.svg?label=Version&style=for-the-badge&url=https://git.zakscode.com/api/v1/repos/ztimson/ai-utils/tags&query=$[0].name)](https://git.zakscode.com/ztimson/ai-utils/tags)
[![Pull Requests](https://img.shields.io/badge/dynamic/json.svg?label=Pull%20Requests&style=for-the-badge&url=https://git.zakscode.com/api/v1/repos/ztimson/ai-utils&query=open_pr_counter)](https://git.zakscode.com/ztimson/ai-utils/pulls)
[![Issues](https://img.shields.io/badge/dynamic/json.svg?label=Issues&style=for-the-badge&url=https://git.zakscode.com/api/v1/repos/ztimson/ai-utils&query=open_issues_count)](https://git.zakscode.com/ztimson/ai-utils/issues)

---
Documentation • Release Notes • Report a Bug • Request a Feature
---
## Table of Contents

- [@ztimson/ai-utils](#top)
  - [About](#about)
    - [Features](#features)
    - [Built With](#built-with)
  - [Setup](#setup)
    - [Production](#production)
    - [Development](#development)
  - [Documentation](https://ai-utils.docs.zakscode.com/)
  - [License](#license)

## About

A TypeScript library that provides a unified interface for working with multiple AI providers, making it easy to integrate various AI capabilities into your applications.

### Features

- **Multi-Provider LLM Support**: Seamlessly work with OpenAI, Anthropic (Claude), and self-hosted (Ollama) models
- **Automatic Speech Recognition (ASR)**: Convert audio to text using Whisper models
- **Optical Character Recognition (OCR)**: Extract text from images using Tesseract
- **Semantic Similarity**: Compare text similarity using tensor-based cosine similarity
- **Provider Abstraction**: Switch between AI providers without changing your code

### Built With

[![Anthropic](https://img.shields.io/badge/Anthropic-de7356?style=for-the-badge&logo=anthropic&logoColor=white)](https://anthropic.com/)
[![llama](https://img.shields.io/badge/llama.cpp-fff?style=for-the-badge&logo=ollama&logoColor=black)](https://github.com/ggml-org/llama.cpp)
[![OpenAI](https://img.shields.io/badge/OpenAI-000?style=for-the-badge&logo=openai-gym&logoColor=white)](https://openai.com/)
[![Pyannote](https://img.shields.io/badge/Pyannote-458864?style=for-the-badge&logo=python&logoColor=white)](https://github.com/pyannote)
[![TensorFlow](https://img.shields.io/badge/TensorFlow-fff?style=for-the-badge&logo=tensorflow&logoColor=ff6f00)](https://tensorflow.org/)
[![Tesseract](https://img.shields.io/badge/Tesseract-B874B2?style=for-the-badge&logo=hack-the-box&logoColor=white)](https://tesseract-ocr.github.io/)
[![Transformers.js](https://img.shields.io/badge/Transformers.js-000?style=for-the-badge&logo=hugging-face&logoColor=yellow)](https://huggingface.co/docs/transformers.js/en/index)
[![TypeScript](https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white)](https://typescriptlang.org/)
[![Whisper](https://img.shields.io/badge/Whisper.cpp-000?style=for-the-badge&logo=openai-gym&logoColor=white)](https://github.com/ggerganov/whisper.cpp)

## Setup

### Production

#### Prerequisites

- [Node.js](https://nodejs.org/en/download)

#### Instructions

1. Install the package: `npm i @ztimson/ai-utils`
2. For speaker diarization: `pip install pyannote.audio`

### Development

#### Prerequisites

- [Node.js](https://nodejs.org/en/download)
- _[Whisper.cpp](https://github.com/ggml-org/whisper.cpp/releases/tag) (ASR)_
- _[Pyannote](https://github.com/pyannote) (ASR Diarization):_ `pip install pyannote.audio`

#### Instructions

1. Install the dependencies: `npm i`
2. For speaker diarization: `pip install pyannote.audio`
3. Build library: `npm run build`
4. Run unit tests: `npm test`

## Documentation

### Setup

```javascript
const ai = new Ai({
  path: '/ai-models',

  // Setup audio
  whisper: '/path/to/binary', // Required for ASR
  hfToken: '...', // Required for diarization
  asr: 'ggml-base.en.bin', // Override default ASR model

  // Setup LLM
  embedder: 'bge-small-en-v1.5', // Override default embedder model
  llm: {
    system: 'You are a helpful assistant.',
    compress: {max: 90_000, min: 50_000}, // Compress chat history to min tokens when max is reached
    temperature: 0.8,
    max_tokens: 100_000,
    memoryModel: 'gpt-4o', // Cheap model for managing memories in background, defaults to current model
    models: {
      'claude-3-5-sonnet': {proto: 'anthropic', token: process.env.ANTHROPIC_TOKEN},
      'gpt-4o': {proto: 'openai', token: process.env.OPENAI_TOKEN},
      'llama3': {proto: 'ollama', host: 'http://localhost:11434'},
    },
    mcp: [
      {name: 'files', url: 'https://mcp.example.com', token: process.env.MCP_TOKEN}
    ],
    skills: [
      {name: 'Tone of voice', description: 'Brand writing guidelines', content: '# Tone of Voice\n\nAlways be concise and friendly...'}
    ],
    tools: [{
      name: 'Marco?',
      description: 'Where is marco polo?',
      args: {
        shout: {type: 'boolean', description: 'Shout into the void?', default: false, required: false}
      },
      // Receives the parsed arguments, the request's stream callback (LLMRequest['stream']) & the Ai instance
      fn: (args, stream, ai) => {
        const {shout} = args;
        return shout ? 'Polo!' : 'Polo';
      }
    }],
  },

  // Setup Vision
  ocr: 'eng' // Override default OCR model
});
```

### Audio

```javascript
// Create audio transcript
const transcript = await ai.audio.asr('./path/to/audio.mp3');
console.log(transcript);

// Break transcript into speakers
const speakers = await ai.audio.asr('./path/to/audio.mp3', {diarization: true});
console.log(speakers);

// Break transcript into named speakers
const named = await ai.audio.asr('./path/to/audio.mp3', {diarization: 'llm'});
console.log(named);
```

### Language

```javascript
const history = [], memory = [];

// Wait for entire response
const text = await ai.language.ask('My favorite color is blue, what\'s yours?', {history, memory});
console.log(text);

// Stream response
let chunks = ''; // `let`, since each streamed chunk is appended
await ai.language.ask('Write me a poem', {
  history,
  memory,
  stream: chunk => chunks += chunk,
});
console.log(chunks);

// Manually compile history into memories at end of conversation
// Happens automatically when conversations are compressed
await ai.language.updateMemory(history, memory);

// Summarize text (longText is the string you want summarized)
const summary = await ai.language.summarize(longText, 200);

// Code-only response (no conversation or extra commentary)
const code = await ai.language.code('Write a fibonacci function');

// Structured JSON response
const data = await ai.language.json('Extract the name and age', `{
  "name": "string",
  "age": "number"
}`, {system: 'Extract from user input'});
```

#### Premade LLM Tools:

- `cli`: Run a shell command, returns its output
- `get_datetime`: Returns local date/time
- `get_datetime_utc`: Returns current UTC date/time
- `exec`: Execute code in cli, node, or python
- `fetch`: Make HTTP requests (GET/POST/PUT/DELETE)
- `exec_javascript`: Execute CommonJS JavaScript
- `exec_python`: Execute Python via `python -c`
- `read_webpage`: Scrape & clean content from a URL, handles HTML, JSON, CSV, media, PDFs etc.
- `web_search`: Anonymous DuckDuckGo search, returns a list of URLs
- `wikipedia_lookup`: Fetch a Wikipedia article (intro or full)
- `wikipedia_search`: Search Wikipedia and return matching articles
- `get_weather`: Fetch current weather + forecast for a location (just built!)

### Vision

```javascript
// Extract text from image
const text = await ai.vision.ocr('./path/to/image.png');
console.log(text);
```

## License

Copyright © 2023 Zakary Timson | Available under MIT Licensing

See the [license](_media/LICENSE) for more information.
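
As a supplementary note on the **Semantic Similarity** feature listed under Features: cosine similarity scores two vectors by the angle between them, so embeddings pointing in the same direction score close to 1. The sketch below is a standalone illustration of that measure only, not the library's implementation. The `cosineSimilarity` helper and the toy vectors are hypothetical stand-ins, and real embeddings would come from an embedder model rather than hand-written arrays.

```javascript
// Hypothetical helper (not part of @ztimson/ai-utils):
// cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];     // running dot product
    normA += a[i] * a[i];   // squared magnitude of a
    normB += b[i] * b[i];   // squared magnitude of b
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors standing in for text embeddings (assumed values)
const a = [1, 2, 3];
const b = [2, 4, 6];   // same direction as `a`
const c = [3, 0, -1];  // orthogonal to `a`

console.log(cosineSimilarity(a, b)); // approximately 1
console.log(cosineSimilarity(a, c)); // 0
```

Because the score depends only on direction, scaling a vector (as with `b` above) does not change the result, which is why the measure is commonly used to compare embeddings of different magnitudes.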