<!-- Header -->
<div id="top" align="center">
  <br />

  <!-- Logo -->
  <img alt="Logo" width="200" height="200" src="https://git.zakscode.com/repo-avatars/a82d423674763e7a0c1c945bdbb07e249b2bb786d3c9beae76d5b196a10f5c0f">

  <!-- Title -->
### @ztimson/ai-utils

  <!-- Description -->
AI Utility Library - Unified interface for multiple AI providers

  <!-- Repo badges -->
[![Version](https://img.shields.io/badge/dynamic/json.svg?label=Version&style=for-the-badge&url=https://git.zakscode.com/api/v1/repos/ztimson/ai-utils/tags&query=$[0].name)](https://git.zakscode.com/ztimson/ai-utils/tags)
[![Pull Requests](https://img.shields.io/badge/dynamic/json.svg?label=Pull%20Requests&style=for-the-badge&url=https://git.zakscode.com/api/v1/repos/ztimson/ai-utils&query=open_pr_counter)](https://git.zakscode.com/ztimson/ai-utils/pulls)
[![Issues](https://img.shields.io/badge/dynamic/json.svg?label=Issues&style=for-the-badge&url=https://git.zakscode.com/api/v1/repos/ztimson/ai-utils&query=open_issues_count)](https://git.zakscode.com/ztimson/ai-utils/issues)

  <!-- Links -->

  ---
  <div>
    <a href="https://ai-utils.docs.zakscode.com" target="_blank">Documentation</a>
    • <a href="https://git.zakscode.com/ztimson/ai-utils/releases" target="_blank">Release Notes</a>
    • <a href="https://git.zakscode.com/ztimson/ai-utils/issues/new?template=.github%2fissue_template%2fbug.md" target="_blank">Report a Bug</a>
    • <a href="https://git.zakscode.com/ztimson/ai-utils/issues/new?template=.github%2fissue_template%2fenhancement.md" target="_blank">Request a Feature</a>
  </div>

  ---
</div>

## Table of Contents
- [@ztimson/ai-utils](#top)
	- [About](#about)
		- [Features](#features)
		- [Built With](#built-with)
	- [Setup](#setup)
		- [Production](#production)
		- [Development](#development)
	- [Documentation](https://ai-utils.docs.zakscode.com/)
	- [License](#license)

## About

A TypeScript library that provides a unified interface for working with multiple AI providers, making it easy to integrate various AI capabilities into your applications.

### Features

- **Multi-Provider LLM Support**: Work with OpenAI, Anthropic (Claude), and self-hosted (Ollama) models through one interface
- **Automatic Speech Recognition (ASR)**: Convert audio to text using Whisper models
- **Optical Character Recognition (OCR)**: Extract text from images using Tesseract
- **Semantic Similarity**: Compare text similarity using tensor-based cosine similarity
- **Provider Abstraction**: Switch between AI providers without changing your code
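
As a rough illustration of what cosine similarity measures, here is a minimal sketch that uses plain arrays in place of tensors. The `cosineSimilarity` helper is hypothetical and is not the library's API; it only shows the underlying calculation.

```javascript
// Hypothetical helper, not part of @ztimson/ai-utils
function cosineSimilarity(a, b) {
    if (a.length !== b.length) throw new Error('Vectors must have the same length');
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0; // A zero vector has no direction
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1  (same direction)
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0  (unrelated)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite)
```

In practice the vectors would be text embeddings, so scores closer to 1 indicate more similar meaning.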

### Built With
[![Anthropic](https://img.shields.io/badge/Anthropic-de7356?style=for-the-badge&logo=anthropic&logoColor=white)](https://anthropic.com/)
[![llama](https://img.shields.io/badge/llama.cpp-fff?style=for-the-badge&logo=ollama&logoColor=black)](https://github.com/ggml-org/llama.cpp)
[![OpenAI](https://img.shields.io/badge/OpenAI-000?style=for-the-badge&logo=openai-gym&logoColor=white)](https://openai.com/)
[![Pyannote](https://img.shields.io/badge/Pyannote-458864?style=for-the-badge&logo=python&logoColor=white)](https://github.com/pyannote)
[![TensorFlow](https://img.shields.io/badge/TensorFlow-fff?style=for-the-badge&logo=tensorflow&logoColor=ff6f00)](https://tensorflow.org/)
[![Tesseract](https://img.shields.io/badge/Tesseract-B874B2?style=for-the-badge&logo=hack-the-box&logoColor=white)](https://tesseract-ocr.github.io/)
[![Transformers.js](https://img.shields.io/badge/Transformers.js-000?style=for-the-badge&logo=hugging-face&logoColor=yellow)](https://huggingface.co/docs/transformers.js/en/index)
[![TypeScript](https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white)](https://typescriptlang.org/)
[![Whisper](https://img.shields.io/badge/Whisper.cpp-000?style=for-the-badge&logo=openai-gym&logoColor=white)](https://github.com/ggerganov/whisper.cpp)

## Setup

<details>
<summary>
  <h3 id="production" style="display: inline">
    Production
  </h3>
</summary>

#### Prerequisites
- [Node.js](https://nodejs.org/en/download)

#### Instructions
1. Install the package: `npm i @ztimson/ai-utils`
2. For speaker diarization: `pip install pyannote.audio`

</details>

<details>
<summary>
  <h3 id="development" style="display: inline">
    Development
  </h3>
</summary>

#### Prerequisites
- [Node.js](https://nodejs.org/en/download)
- _[Whisper.cpp](https://github.com/ggml-org/whisper.cpp/releases/tag) (ASR)_
- _[Pyannote](https://github.com/pyannote) (ASR Diarization):_ `pip install pyannote.audio`

#### Instructions
1. Install the dependencies: `npm i`
2. For speaker diarization: `pip install pyannote.audio`
3. Build the library: `npm run build`
4. Run unit tests: `npm test`

</details>

## Documentation

### Setup
```javascript
import {Ai} from '@ztimson/ai-utils'; // Assumes Ai is a named export of the package

const ai = new Ai({
    path: '/ai-models',

    // Setup audio
    whisper: '/path/to/binary', // Required for ASR
    hfToken: '...', // Required for diarization
    asr: 'ggml-base.en.bin', // Override default ASR model

    // Setup LLM
    embedder: 'bge-small-en-v1.5', // Override default embedder model
    llm: {
        system: 'You are a helpful assistant.',
        compress: {max: 90_000, min: 50_000}, // Compress chat history to min tokens when max is reached
        temperature: 0.8,
        max_tokens: 100_000,
        memoryModel: 'gpt-4o', // Cheap model for managing memories in background, defaults to current model
        models: {
            'claude-3-5-sonnet': {proto: 'anthropic', token: process.env.ANTHROPIC_TOKEN},
            'gpt-4o':            {proto: 'openai',    token: process.env.OPENAI_TOKEN},
            'llama3':            {proto: 'ollama',    host: 'http://localhost:11434'},
        },
        mcp: [
            {name: 'files', url: 'https://mcp.example.com', token: process.env.MCP_TOKEN}
        ],
        skills: [
            {name: 'Tone of voice', description: 'Brand writing guidelines', content: '# Tone of Voice\n\nAlways be concise and friendly...'}
        ],
        tools: [{
            name: 'Marco?',
            description: 'Where is marco polo?',
            args: {
                shout: {type: 'boolean', description: 'Shout into the void?', default: false, required: false}
            },
            // args: parsed arguments, stream: the request's stream callback (LLMRequest['stream']), ai: the Ai instance
            fn: (args, stream, ai) => {
                const {shout} = args;
                return shout ? 'Polo!' : 'Polo';
            }
        }],
    },

    // Setup Vision
    ocr: 'eng' // Override default OCR model
});

```
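
The `compress` option in the setup above describes a two-threshold scheme: nothing happens until the history passes `max`, then it is reduced to roughly `min`. The sketch below illustrates that idea only. `estimateTokens` and `trimHistory` are hypothetical names, the 4-characters-per-token estimate and the `{role, content}` message shape are assumptions, and this version simply drops the oldest messages, whereas the library may compress history differently (for example by summarizing).

```javascript
// Rough token estimate: ~4 characters per token (an assumption, not the library's tokenizer)
const estimateTokens = text => Math.ceil(text.length / 4);

// Hypothetical helper: once history exceeds `max` tokens, drop the oldest messages until it fits within `min`
function trimHistory(history, {max, min}) {
    const total = msgs => msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);
    if (total(history) <= max) return history; // Under the limit, nothing to do
    const trimmed = [...history]; // Copy so the caller's array is left untouched
    while (trimmed.length > 1 && total(trimmed) > min) trimmed.shift();
    return trimmed;
}

// Ten messages of 40 characters each: about 10 tokens per message, 100 in total
const history = Array.from({length: 10}, () => ({role: 'user', content: 'x'.repeat(40)}));

console.log(trimHistory(history, {max: 200, min: 50}).length); // 10 (below max, unchanged)
console.log(trimHistory(history, {max: 90, min: 50}).length);  // 5  (above max, trimmed down to min)
```

Keeping `min` well below `max` means compression runs occasionally rather than on every message once the limit is reached.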

### Audio

```javascript
// Create audio transcript
const transcript = await ai.audio.asr('./path/to/audio.mp3');
console.log(transcript);

// Break transcript into speakers
const bySpeaker = await ai.audio.asr('./path/to/audio.mp3', {diarization: true});
console.log(bySpeaker);

// Break transcript into named speakers
const byName = await ai.audio.asr('./path/to/audio.mp3', {diarization: 'llm'});
console.log(byName);
```

### Language

```javascript
const history = [], memory = [];

// Wait for entire response
const text = await ai.language.ask("My favorite color is blue, what's yours?", {history, memory});
console.log(text);

// Stream response
let chunks = ''; // Must be `let`, the stream callback reassigns it
await ai.language.ask('Write me a poem', {
	history, memory,
	stream: chunk => chunks += chunk,
});
console.log(chunks);

// Manually compile history into memories at end of conversation
// Happens automatically when conversations are compressed
await ai.language.updateMemory(history, memory);

// Summarize text
const summary = await ai.language.summarize(longText, 200);

// Code response (code only, no conversational filler)
const code = await ai.language.code('Write a fibonacci function');

// Structured JSON response
const data = await ai.language.json('Extract the name and age', `{
    "name": "string",
    "age": "number"
}`, {system: 'Extract from user input'});
```
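
The schema string passed to `ai.language.json` maps field names to type names. If you want to double-check a result against that shape on your side, a minimal validator might look like the sketch below. `matchesSchema` is a hypothetical helper, not part of the library, and it assumes the result is an already-parsed object with flat, primitive fields.

```javascript
// Hypothetical helper, not part of @ztimson/ai-utils
function matchesSchema(data, schema) {
    if (data === null || typeof data !== 'object') return false;
    // Every field named in the schema must exist with the expected primitive type
    return Object.entries(schema).every(([key, type]) => typeof data[key] === type);
}

const schema = JSON.parse(`{
    "name": "string",
    "age": "number"
}`);

console.log(matchesSchema({name: 'Ada', age: 36}, schema));   // true
console.log(matchesSchema({name: 'Ada', age: '36'}, schema)); // false (age is a string)
```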

#### Premade LLM Tools:
- `cli`: Run a shell command, returns its output
- `get_datetime`: Returns current local date/time
- `get_datetime_utc`: Returns current UTC date/time
- `exec`: Execute code in the CLI, Node.js, or Python
- `fetch`: Make HTTP requests (GET/POST/PUT/DELETE)
- `exec_javascript`: Execute CommonJS JavaScript
- `exec_python`: Execute Python via `python -c`
- `read_webpage`: Scrape & clean content from a URL, handles HTML, JSON, CSV, media, PDFs, etc.
- `web_search`: Anonymous DuckDuckGo search, returns a list of URLs
- `wikipedia_lookup`: Fetch a Wikipedia article (intro or full)
- `wikipedia_search`: Search Wikipedia and return matching articles
- `get_weather`: Fetch current weather + forecast for a location
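
Custom tools supplied through the `tools` option pair an `args` schema with an `fn` handler. The sketch below shows one way such a definition could be dispatched: fill in defaults, reject missing required arguments, then call `fn`. `callTool` is a hypothetical function written for this example and is not the library's implementation; the real handler also receives `stream` and the `Ai` instance, which are omitted here.

```javascript
// Tool definition in the same shape as the setup example
const marco = {
    name: 'marco',
    description: 'Where is Marco Polo?',
    args: {
        shout: {type: 'boolean', description: 'Shout into the void?', default: false, required: false}
    },
    fn: args => args.shout ? 'Polo!' : 'Polo'
};

// Hypothetical dispatcher: apply defaults, check required arguments, then run the handler
function callTool(tool, input = {}) {
    const args = {};
    for (const [key, spec] of Object.entries(tool.args ?? {})) {
        if (input[key] !== undefined) args[key] = input[key];
        else if (spec.required) throw new Error(`Missing required argument: ${key}`);
        else args[key] = spec.default;
    }
    return tool.fn(args);
}

console.log(callTool(marco));                // Polo
console.log(callTool(marco, {shout: true})); // Polo!
```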

### Vision

```javascript
// Extract text from image
const text = await ai.vision.ocr('./path/to/image.png');
console.log(text);
```

## License

Copyright © 2023 Zakary Timson | Available under MIT Licensing

See the [license](_media/LICENSE) for more information.