<!-- Header -->
<div id="top" align="center">
<br />

<!-- Logo -->
<img alt="Logo" width="200" height="200" src="https://git.zakscode.com/repo-avatars/a82d423674763e7a0c1c945bdbb07e249b2bb786d3c9beae76d5b196a10f5c0f">

<!-- Title -->
### @ztimson/ai-utils

<!-- Description -->
AI Utility Library - Unified interface for multiple AI providers

<!-- Repo badges -->
[Tags](https://git.zakscode.com/ztimson/ai-utils/tags)
• [Pull Requests](https://git.zakscode.com/ztimson/ai-utils/pulls)
• [Issues](https://git.zakscode.com/ztimson/ai-utils/issues)

<!-- Links -->

---
<div>
<a href="https://ai-utils.docs.zakscode.com" target="_blank">Documentation</a>
• <a href="https://git.zakscode.com/ztimson/ai-utils/releases" target="_blank">Release Notes</a>
• <a href="https://git.zakscode.com/ztimson/ai-utils/issues/new?template=.github%2fissue_template%2fbug.md" target="_blank">Report a Bug</a>
• <a href="https://git.zakscode.com/ztimson/ai-utils/issues/new?template=.github%2fissue_template%2fenhancement.md" target="_blank">Request a Feature</a>
</div>

---
</div>

## Table of Contents
- [@ztimson/ai-utils](#top)
  - [About](#about)
    - [Features](#features)
    - [Built With](#built-with)
  - [Setup](#setup)
    - [Production](#production)
    - [Development](#development)
  - [Documentation](https://ai-utils.docs.zakscode.com/)
  - [License](#license)

## About

A TypeScript library that provides a unified interface to multiple AI providers, so you can add language, audio, and vision capabilities to an application without writing provider-specific code.

### Features

- **Multi-Provider LLM Support**: Work with OpenAI, Anthropic (Claude), and self-hosted (Ollama) models
- **Automatic Speech Recognition (ASR)**: Convert audio to text using Whisper models
- **Optical Character Recognition (OCR)**: Extract text from images using Tesseract
- **Semantic Similarity**: Compare text similarity using tensor-based cosine similarity
- **Provider Abstraction**: Switch between AI providers without changing your code

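For readers unfamiliar with the metric behind the semantic similarity feature, here is a minimal sketch of cosine similarity between two embedding vectors. It is plain JavaScript rather than the tensor-based code the library uses, and `cosineSimilarity` is a hypothetical helper written for illustration, not part of the package API.

```javascript
// Hypothetical helper (not part of @ztimson/ai-utils):
// cosine similarity between two equal-length numeric vectors
function cosineSimilarity(a, b) {
	if (a.length !== b.length) throw new Error('Vectors must have the same length');
	let dot = 0, normA = 0, normB = 0;
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i];
		normA += a[i] * a[i];
		normB += b[i] * b[i];
	}
	// A zero vector has no direction, so treat it as unrelated
	if (normA === 0 || normB === 0) return 0;
	return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1]));       // Orthogonal vectors: 0
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // Same direction: approximately 1
```

Scores near 1 mean the two texts point in nearly the same direction in embedding space; scores near 0 mean they are unrelated.
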
### Built With
- [Anthropic](https://anthropic.com/)
- [llama.cpp](https://github.com/ggml-org/llama.cpp)
- [OpenAI](https://openai.com/)
- [Pyannote](https://github.com/pyannote)
- [TensorFlow](https://tensorflow.org/)
- [Tesseract](https://tesseract-ocr.github.io/)
- [Transformers.js](https://huggingface.co/docs/transformers.js/en/index)
- [TypeScript](https://typescriptlang.org/)
- [Whisper.cpp](https://github.com/ggerganov/whisper.cpp)

## Setup

<details>
<summary>
<h3 id="production" style="display: inline">
Production
</h3>
</summary>

#### Prerequisites
- [Node.js](https://nodejs.org/en/download)

#### Instructions
1. Install the package: `npm i @ztimson/ai-utils`
2. For speaker diarization: `pip install pyannote.audio`

</details>

<details>
<summary>
<h3 id="development" style="display: inline">
Development
</h3>
</summary>

#### Prerequisites
- [Node.js](https://nodejs.org/en/download)
- _[Whisper.cpp](https://github.com/ggml-org/whisper.cpp/releases/tag) (ASR)_
- _[Pyannote](https://github.com/pyannote) (ASR Diarization):_ `pip install pyannote.audio`

#### Instructions
1. Install the dependencies: `npm i`
2. For speaker diarization: `pip install pyannote.audio`
3. Build the library: `npm run build`
4. Run the unit tests: `npm test`

</details>

## Documentation

### Setup
```javascript
import {Ai} from '@ztimson/ai-utils'; // Assumption: Ai is a named export of the package

const ai = new Ai({
	path: '/ai-models',

	// Setup audio
	whisper: '/path/to/binary', // Required for ASR
	hfToken: '...', // Required for diarization
	asr: 'ggml-base.en.bin', // Override default ASR model

	// Setup LLM
	embedder: 'bge-small-en-v1.5', // Override default embedder model
	llm: {
		system: 'You are a helpful assistant.',
		compress: {max: 90_000, min: 50_000}, // Compress chat history to min tokens when max is reached
		temperature: 0.8,
		max_tokens: 100_000,
		memoryModel: 'gpt-4o', // Cheap model for managing memories in background, defaults to current model
		models: {
			'claude-3-5-sonnet': {proto: 'anthropic', token: process.env.ANTHROPIC_TOKEN},
			'gpt-4o': {proto: 'openai', token: process.env.OPENAI_TOKEN},
			'llama3': {proto: 'ollama', host: 'http://localhost:11434'},
		},
		mcp: [
			{name: 'files', url: 'https://mcp.example.com', token: process.env.MCP_TOKEN}
		],
		skills: [
			{name: 'Tone of voice', description: 'Brand writing guidelines', content: '# Tone of Voice\n\nAlways be concise and friendly...'}
		],
		tools: [{
			name: 'Marco?',
			description: 'Where is marco polo?',
			args: {
				shout: {type: 'boolean', description: 'Shout into the void?', default: false, required: false}
			},
			// args: parsed tool arguments, stream: the LLMRequest['stream'] callback, ai: the Ai instance
			fn: (args, stream, ai) => {
				const {shout} = args;
				return shout ? 'Polo!' : 'Polo';
			}
		}],
	},

	// Setup Vision
	ocr: 'eng' // Override default OCR model
});
```

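The `compress: {max, min}` option in the configuration above is documented only by its inline comment. To illustrate the threshold pair, here is a minimal sketch of one way such a policy could behave. It is not the library's implementation: `estimateTokens` and `compressHistory` are hypothetical names, the four-characters-per-token estimate is a rough assumption, and this sketch simply drops the oldest messages where the library may condense them some other way.

```javascript
// Hypothetical helper: rough token estimate (assumes ~4 characters per token)
function estimateTokens(text) {
	return Math.ceil(text.length / 4);
}

// Hypothetical sketch: once the history exceeds `max` tokens,
// keep only the most recent messages that fit within `min` tokens
function compressHistory(history, {max, min}) {
	const total = history.reduce((sum, msg) => sum + estimateTokens(msg.content), 0);
	if (total <= max) return history; // Under the limit, nothing to do
	const kept = [];
	let tokens = 0;
	// Walk backwards so the newest messages are kept first
	for (let i = history.length - 1; i >= 0; i--) {
		const cost = estimateTokens(history[i].content);
		if (tokens + cost > min) break;
		tokens += cost;
		kept.unshift(history[i]);
	}
	return kept;
}

// Ten messages of 40 characters each: roughly 10 tokens per message, 100 in total
const sampleHistory = Array.from({length: 10}, () => ({role: 'user', content: 'x'.repeat(40)}));
const compressed = compressHistory(sampleHistory, {max: 90, min: 50});
console.log(compressed.length); // 5
```

Using two thresholds rather than one means compression runs only occasionally: the history is allowed to grow from `min` back up to `max` before it is trimmed again.
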
### Audio

```javascript
// Create audio transcript
const transcript = await ai.audio.asr('./path/to/audio.mp3');
console.log(transcript);

// Break transcript into speakers
const bySpeaker = await ai.audio.asr('./path/to/audio.mp3', {diarization: true});
console.log(bySpeaker);

// Break transcript into named speakers
const byName = await ai.audio.asr('./path/to/audio.mp3', {diarization: 'llm'});
console.log(byName);
```

### Language

```javascript
const history = [], memory = [];

// Wait for entire response
const text = await ai.language.ask("My favorite color is blue, what's yours?", {history, memory});
console.log(text);

// Stream response
let chunks = ''; // `let`, because each streamed chunk is appended
await ai.language.ask('Write me a poem', {
	history, memory,
	stream: chunk => chunks += chunk,
});
console.log(chunks);

// Manually compile history into memories at the end of a conversation
// Happens automatically when conversations are compressed
await ai.language.updateMemory(history, memory);

// Summarize text (`longText` is any long string you want condensed)
const summary = await ai.language.summarize(longText, 200);

// Code response (no conversation or extra filler)
const code = await ai.language.code('Write a fibonacci function');

// Structured JSON response
const data = await ai.language.json('Extract the name and age', `{
	"name": "string",
	"age": "number"
}`, {system: 'Extract from user input'});
```

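Model output can drift from the requested schema, so it may be worth checking the result of `ai.language.json` before using it. Here is a minimal sketch of such a check, assuming the call resolves to a parsed object (the return type is not stated in this README). `matchesShape` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper (not part of @ztimson/ai-utils):
// verify every key in `shape` exists on `data` with the expected primitive type
function matchesShape(data, shape) {
	if (data === null || typeof data !== 'object') return false;
	return Object.entries(shape).every(([key, type]) => typeof data[key] === type);
}

const shape = {name: 'string', age: 'number'};
console.log(matchesShape({name: 'Ada', age: 36}, shape));   // true
console.log(matchesShape({name: 'Ada', age: '36'}, shape)); // false, age is a string
```
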
#### Premade LLM Tools:
- `cli`: Run a shell command, returns its output
- `get_datetime`: Returns current local date/time
- `get_datetime_utc`: Returns current UTC date/time
- `exec`: Execute code in cli, node, or python
- `fetch`: Make HTTP requests (GET/POST/PUT/DELETE)
- `exec_javascript`: Execute CommonJS JavaScript
- `exec_python`: Execute Python via `python -c`
- `read_webpage`: Scrape & clean content from a URL, handles HTML, JSON, CSV, media, PDFs, etc.
- `web_search`: Anonymous DuckDuckGo search, returns a list of URLs
- `wikipedia_lookup`: Fetch a Wikipedia article (intro or full)
- `wikipedia_search`: Search Wikipedia and return matching articles
- `get_weather`: Fetch current weather + forecast for a location

### Vision

```javascript
// Extract text from image
const text = await ai.vision.ocr('./path/to/image.png');
console.log(text);
```

## License

Copyright © 2023 Zakary Timson | Available under MIT Licensing

See the [license](_media/LICENSE) for more information.