PickSkill
← Back

memory_baidu_embedding_db

README.md
Rendered from GitHub raw
View raw ↗

Memory Baidu Embedding DB - Semantic Memory for Clawdbot

Vector-Based Memory Storage and Retrieval Using Baidu Embedding Technology

A semantic memory system for Clawdbot that uses Baidu's Embedding-V1 model to store and retrieve memories based on meaning rather than keywords. Designed as a secure, locally-stored replacement for traditional vector databases like LanceDB.

🚀 Features

Semantic Memory Search - Find memories based on meaning, not just keywords
Baidu Embedding Integration - Uses Baidu's powerful Embedding-V1 model
SQLite Persistence - Local, secure storage without external dependencies
Zero Data Leakage - All processing happens locally with your API credentials
Flexible Tagging System - Organize memories with custom tags and metadata
High Performance - Optimized vector similarity calculations
Easy Migration - Drop-in replacement for memory-lancedb systems ✅ Robust Error Handling - Comprehensive error handling with user-friendly messages

🔧 Why Choose This Over Traditional Memory Systems?

Traditional Keyword-Based Systems:

  • Only match exact words or phrases
  • Miss conceptually related information
  • Require precise search terms
  • Limited context understanding

Our Semantic Memory System:

  • Understands meaning and context
  • Finds conceptually related memories
  • Works with natural language queries
  • Learns semantic relationships

📋 Prerequisites

  • Clawdbot installation
  • Baidu Qianfan API credentials (API Key and Secret Key)
  • Python 3.8+

🛠️ Installation

Method 1: Manual Installation

# Navigate to your Clawdbot workspace
cd ~/clawd/skills  # or your workspace directory
 
# Clone or copy this skill
# (Assuming you have the skill files in place)
 
# The skill is ready to use with your existing Clawdbot setup

Method 2: From ClawdHub (Coming Soon)

# Once published to ClawdHub
clawdhub install memory-baidu-embedding-db

⚙️ Configuration

1. Get Baidu Qianfan API Credentials

  1. Sign up at Baidu Qianfan Console
  2. Create an API Key and Secret Key
  3. Ensure you have access to the Embedding-V1 model

2. Set Environment Variables

# Add to your ~/.bashrc or ~/.zshrc
export BAIDU_API_STRING='${BAIDU_API_STRING}'
export BAIDU_SECRET_KEY='${BAIDU_SECRET_KEY}'

Or set them directly before starting Clawdbot:

export BAIDU_API_STRING='${BAIDU_API_STRING}'
export BAIDU_SECRET_KEY='${BAIDU_SECRET_KEY}'

🚀 Usage

Basic Usage

from memory_baidu_embedding_db import MemoryBaiduEmbeddingDB
 
# Initialize the memory system
memory_db = MemoryBaiduEmbeddingDB()
 
# Add a memory
memory_db.add_memory(
    content="The user prefers concise responses and enjoys technical discussions",
    tags=["user-preference", "communication-style"],
    metadata={"importance": "high", "source": "conversation-2026-01-30"}
)
 
# Search for related memories using natural language
related_memories = memory_db.search_memories("What does the user prefer?", limit=3)
 
# Retrieve similar memories
similar_memories = memory_db.retrieve_similar_memories("User likes short answers", limit=5)

Advanced Usage

# Add multiple memories efficiently
examples = [
    {
        "content": "User's favorite programming languages are Python and JavaScript",
        "tags": ["tech-preference", "programming"],
        "metadata": {"confidence": 0.95}
    },
    {
        "content": "The user works as a software engineer with 5 years of experience",
        "tags": ["professional-info", "background"],
        "metadata": {"verified": True}
    }
]
 
for example in examples:
    memory_db.add_memory(example["content"], example["tags"], example["metadata"])
 
# Search with tag filtering
recent_python_memories = memory_db.search_memories(
    query="programming languages",
    tags=["tech-preference"],
    limit=5
)
 
# Get statistics about stored memories
stats = memory_db.get_statistics()
print(f"Total memories: {stats['total_memories']}")

Integration with Clawdbot

To integrate with your Clawdbot instance, modify your bot configuration to use this memory system:

// In your Clawdbot configuration
const { MemoryBaiduEmbeddingDB } = require('./skills/memory-baidu-embedding-db');
 
// Initialize the memory system
const memorySystem = new MemoryBaiduEmbeddingDB();
 
// Use in your message handlers
app.on('message', async (msg) => {
  // Store important information from the conversation
  if (isImportantInformation(msg.text)) {
    memorySystem.add_memory(msg.text, ['conversation'], {
      timestamp: new Date(),
      userId: msg.from
    });
  }
  
  // Retrieve relevant context before responding
  const relevantMemories = memorySystem.search_memories(msg.text, 5);
  const context = formatMemoriesForPrompt(relevantMemories);
  
  // Include context in your response generation
  const response = await generateResponse(msg.text, context);
  await msg.reply(response);
});

📊 Performance Metrics

  • Vector Dimension: 384 (Baidu Embedding-V1 output)
  • Storage: SQLite database (~1MB per 1000 memories)
  • Search Speed: ~50ms for 1000 memories (on typical hardware)
  • API Latency: Depends on Baidu API response time (typically <500ms)

🔐 Security Features

  • Local Storage: All memories stored in local SQLite database
  • Encrypted API Keys: Credentials stored securely in environment variables
  • No External Sharing: Memories never leave your system
  • Selective Access: Granular control over what gets stored

🔄 Migration Guide

From memory-lancedb to memory-baidu-embedding-db:

  1. Backup your existing memories (if applicable)
  2. Install this skill in your skills/ directory
  3. Configure your Baidu API credentials
  4. Initialize the new system: python3 memory_baidu_embedding_db.py
  5. Test search functionality
  6. Update your bot configuration to use the new memory system
  7. Verify data integrity and performance

Example Migration Script:

# migration_helper.py
import json
from memory_baidu_embedding_db import MemoryBaiduEmbeddingDB
 
def migrate_from_old_system():
    # Initialize new memory system
    new_memory = MemoryBaiduEmbeddingDB()
    
    # Load old memories (adjust this based on your old system)
    old_memories = load_old_memories()  # Implement this based on your old system
    
    migrated = 0
    for old_memory in old_memories:
        success = new_memory.add_memory(
            content=old_memory.get('content'),
            tags=old_memory.get('tags', []),
            metadata=old_memory.get('metadata', {})
        )
        if success:
            migrated += 1
    
    print(f"Migrated {migrated} memories successfully!")
 
if __name__ == "__main__":
    migrate_from_old_system()

🧪 Testing

Run the built-in tests to verify functionality:

cd /root/clawd/skills/memory-baidu-embedding-db
python3 memory_baidu_embedding_db.py

This will run a complete demonstration of all features including:

  • Database initialization
  • Memory addition
  • Semantic search
  • Similarity calculations
  • Statistics reporting

🛡️ Error Handling and Robustness

Our system includes comprehensive error handling to prevent crashes and provide helpful feedback:

Error Types Handled

  • API Credential Validation: Checks for missing or invalid environment variables
  • Input Validation: Validates content, tags, metadata types and formats
  • Database Operations: Handles connection failures, permission errors, and disk space issues
  • API Calls: Manages network timeouts and service unavailability
  • JSON Parsing: Safely handles malformed JSON data

User-Friendly Messages

All errors provide clear, actionable feedback:

  • Specific error descriptions
  • Root cause identification
  • Recommended solutions
  • Preventive measures

Example error message:

❌ 错误: 缺少必要的API凭据!
   请设置以下环境变量:
   export BAIDU_API_STRING='your_bce_v3_api_string'
   export BAIDU_SECRET_KEY='${BAIDU_SECRET_KEY}'
   您可以从 https://console.bce.baidu.com/qianfan/ 获取API凭据

Safe Defaults

  • Methods return appropriate default values when errors occur
  • No unexpected program termination
  • Graceful degradation of functionality

🤝 Contributing

We welcome contributions! Here are some ways you can help:

  • Report bugs and suggest features
  • Improve documentation
  • Add support for additional embedding models
  • Optimize performance for large memory sets
  • Create integration examples for different bot frameworks

Development Setup

# Fork and clone the repository
git clone https://github.com/your-username/memory-baidu-embedding-db.git
cd memory-baidu-embedding-db
 
# Install dependencies (if any Python packages are needed)
pip install -r requirements.txt  # if exists
 
# Run tests
python3 memory_baidu_embedding_db.py

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

If you encounter issues:

  1. Check that your Baidu API credentials are correct
  2. Verify that you have internet connectivity for API calls
  3. Ensure the SQLite database has proper write permissions
  4. Review the error messages for specific details

For additional support, please open an issue in the repository or contact the maintainers.

🙏 Acknowledgments

  • Thanks to Baidu for providing the Embedding-V1 model
  • Inspired by modern vector database implementations
  • Built for the Clawdbot ecosystem