Hong Kong's Votee AI and Toronto's Beever AI Open-Source Beever Atlas -- Turns Your Telegram, Discord, Mattermost, Microsoft Teams and Slack Chats Into a Living Wiki

Open‑Source LLM Knowledge Base for Team Chat – A Deep Dive

TL;DR – Two editions of a new, Apache‑licensed, enterprise‑grade knowledge base let teams turn their chat history into a searchable, citation‑aware memory. Built on top of large‑language‑model (LLM) inference, the platform turns unstructured conversation into structured knowledge, offering a self‑healing knowledge layer that improves with every interaction. The open‑source edition empowers individuals and small projects, while the enterprise version adds role‑based access, audit logging, and scalability for production workloads.



1. Executive Summary

A new generation of knowledge‑base (KB) software has emerged that rethinks how teams capture, retrieve, and act upon the knowledge embedded in everyday chat. The platform, released in both an open‑source Apache 2.0 edition and a commercial “Enterprise” edition, promises to transform the noise of Slack, Teams, or any other messaging tool into a structured memory that is both searchable and citation‑bearing.

Key highlights:

| Feature | Open‑Source Edition | Enterprise Edition |
|---------|---------------------|--------------------|
| Licensing | Apache 2.0 | Commercial (subscription) |
| User Scope | Individuals & small teams | Large enterprises |
| Self‑host / Cloud | Self‑hosted | Managed or on‑prem |
| Role‑based Access | None | Full RBAC + SSO |
| Audit Logging | Limited | Full audit trail |
| Data Residency | Self‑hosted | Custom compliance |
| Scalability | Single‑node or small cluster | Large‑scale, multi‑region |

The system is built on a memory layer that ingests conversation logs, indexes them, and then feeds them to an LLM (currently GPT‑4‑based but pluggable). The LLM is tasked with generating concise answers, automatically attaching citations (i.e., pointers back to the original message or file). This design solves three core pain points:

  1. Knowledge loss – When a conversation ends, the insights are buried in the thread.
  2. Retrieval friction – Searching past chats for context is slow and error‑prone.
  3. Answer quality – Teams often get vague or incorrect answers from bots.

The product was announced with a 9,412‑character press release that, while short, reveals a robust roadmap: multi‑modal support (text, code, images), fine‑tuning on proprietary corpora, and integration with popular collaboration suites.


2. Introduction

Team chat platforms have become the de facto knowledge repository for modern workplaces. Every question, decision, and clarification sits in a thread that may be years old and hard for the person asking to locate. Despite the ubiquity of these tools, the knowledge they contain remains largely unstructured, making it difficult to find answers and reuse information.

This new Open‑Source LLM Knowledge Base (hereafter referred to as LLM‑KB) seeks to remedy that. By marrying the generative power of LLMs with a robust, citation‑driven memory layer, the product turns any chat history into a living knowledge graph that can be queried and updated in real time. The system is designed to be purpose‑built for team chat rather than a generic search engine or a data lake, ensuring that every feature—from the UI to the underlying data model—addresses the nuances of collaboration.

What Makes It Stand Out?

  1. Citation‑Bearing Answers – Every generated response includes a reference to the source message, file, or link, giving users confidence in the answer’s provenance.
  2. Searchable Memory Layer – A dedicated index that stores context and metadata, enabling fast retrieval even in very large chat logs.
  3. Modular Architecture – Developers can swap out the LLM backend, add custom prompts, or integrate with other services via APIs.
  4. Dual‑Edition Model – The open‑source edition is perfect for small teams or academic projects, while the enterprise edition adds governance, compliance, and scalability.

3. The Two Editions in Detail

3.1 Open‑Source Edition (Apache 2.0)

  • Target Audience: Individual developers, hobbyists, startups, research labs.
  • Deployment: Docker‑based containers or Helm charts for Kubernetes. Runs entirely on-premises or in a private cloud.
  • Cost: Zero – no subscription fees or vendor lock‑in.
  • Features:
      • Basic RBAC (role‑based access control) limited to user groups.
      • Standard audit logging for who accessed what and when.
      • Basic compliance controls (e.g., GDPR‑friendly data retention).
      • Plug‑in interface for LLM providers (OpenAI, Anthropic, Hugging Face, etc.).
      • Community‑driven roadmap; contributions accepted via GitHub.
  • Limitations:
      • No built‑in SSO integration.
      • Limited scalability (single‑node or small cluster).
      • No advanced compliance certifications (SOC 2, ISO 27001).
      • Requires manual setup of data ingestion pipelines (e.g., Slack export).

3.2 Enterprise Edition

  • Target Audience: Mid‑size to large organizations that need compliance, governance, and high availability.
  • Deployment: Managed SaaS offering (public cloud) or self‑hosted on private Kubernetes clusters.
  • Cost: Subscription‑based pricing model, tiered by number of users and data volume.
  • Features:
      • Full RBAC with role definitions (admin, owner, contributor, viewer).
      • SSO integration with SAML, OAuth, Azure AD, Okta.
      • Advanced audit logs and reporting (e.g., who accessed which conversation).
      • Data residency controls: ability to choose region or keep data on‑prem.
      • Dedicated support, SLAs, and 24/7 incident response.
      • Integration with major collaboration tools (Slack, Teams, Mattermost) via native bots and connectors.
      • Multi‑tenant isolation for multi‑company use cases.
      • Custom LLM fine‑tuning on enterprise data (e.g., proprietary documentation, codebase).
  • Compliance Certifications: SOC 2 Type II, ISO 27001, HIPAA (optional), GDPR.
  • Additional Services:
      • Managed indexing and scaling.
      • Backup and disaster recovery.
      • Advanced analytics dashboards for knowledge usage.

4. Core Architecture

Understanding the system architecture is key to appreciating how LLM‑KB achieves its performance and reliability goals. Below is a high‑level diagram followed by a narrative explanation.

┌──────────────────────────────┐
│        Frontend UI/UX        │
└───────┬──────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│          API Gateway         │
└───────┬──────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│   Query Processor & Router   │
└───────┬──────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│  Searchable Memory Layer     │
│  (Vector Index + Metadata)   │
└───────┬──────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│          LLM Inference       │
└───────┬──────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│  Post‑processing & Citation  │
└───────┬──────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│  Response Delivery (UI/API)  │
└──────────────────────────────┘

4.1 Frontend Layer

The UI is a single‑page application (SPA) built with React (or Vue, depending on the edition). It offers:

  • Chat‑style query interface – mimics Slack/Teams for familiarity.
  • Knowledge Explorer – allows users to search and filter the underlying memory.
  • Citation Viewer – shows the original message, timestamp, and any attachments.
  • Admin Dashboard – for enterprise users to monitor usage, configure connectors, and manage users.

The UI is highly extensible, with a plugin system that allows companies to add custom widgets (e.g., integration with Jira tickets, Confluence pages).

4.2 API Gateway & Query Processor

All requests pass through a central API gateway. The gateway:

  • Authenticates using JWTs or SAML tokens.
  • Rate‑limits to protect against abuse.
  • Routes queries to the Query Processor which orchestrates the steps to produce a final answer.

The Query Processor has three main responsibilities:

  1. Query Parsing – Determines whether the request is a retrieval request or a generative query.
  2. Context Retrieval – Uses the Searchable Memory Layer to fetch the top‑k relevant documents or messages.
  3. LLM Invocation – Prepares a prompt that includes the context, and calls the LLM.

4.3 Searchable Memory Layer

This layer is the core of the LLM‑KB and is what differentiates it from a simple chat bot. It has two sub‑components:

  1. Vector Index – Each chat message or document is embedded as a high‑dimensional vector (e.g., 768‑dim). The index is built on top of an open‑source vector database such as Milvus or Vespa. This allows semantic search: similar ideas are ranked highly even if they use different wording.
  2. Metadata Store – Stores message metadata (timestamp, author, channel, attachments). Metadata is used for filtering (e.g., find messages only from a certain project or date range).

When a query arrives, the system performs:

  • Semantic Scoring – Retrieves the top‑k vectors that best match the query’s embedding.
  • Metadata Filtering – Applies filters based on user role, channel, date, etc.
  • Contextual Chunking – Breaks long documents into smaller chunks to keep the prompt size manageable.
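
The retrieval steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's code: the names `Chunk`, `cosine`, and `retrieve` are hypothetical, the three-dimensional vectors stand in for real embeddings, and a real deployment would delegate the search to the vector database. Chunking is left out, and the sketch applies the metadata filter before scoring, which is one possible ordering.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    chunk_id: str
    channel: str
    text: str
    vector: list[float]  # hand-written stand-in for a real embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, allowed_channels, k=2):
    """Keep chunks from permitted channels, then return the k most similar ones."""
    visible = [c for c in chunks if c.channel in allowed_channels]
    ranked = sorted(visible, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("c1", "#eng", "Deploys go through the CI pipeline.", [0.9, 0.1, 0.0]),
    Chunk("c2", "#eng", "Lunch is at noon.", [0.0, 0.2, 0.9]),
    Chunk("c3", "#finance", "CI costs rose last quarter.", [0.8, 0.3, 0.1]),
]

# A user who can only see #eng never gets the #finance chunk, however similar it is.
top = retrieve([1.0, 0.0, 0.0], chunks, allowed_channels={"#eng"}, k=1)
print([c.chunk_id for c in top])
```

Filtering first guarantees that restricted content never enters the candidate set. Filtering after the nearest-neighbor search is also possible, but it can return fewer than k results.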

4.4 LLM Inference

The inference layer is agnostic to the underlying LLM provider. The default is OpenAI’s GPT‑4, but the architecture allows:

  • Fine‑tuned models (e.g., GPT‑4 fine‑tuned on the organization’s data).
  • Open‑source LLMs (e.g., Llama 2, Mistral, or custom builds).
  • Custom prompt templates.

The LLM is called with a carefully crafted prompt that contains:

  • The user query.
  • Context snippets from the memory layer, each tagged with a citation ID.
  • Instructions to produce a concise answer with citations.

The system uses a few‑shot prompt that demonstrates the desired format, ensuring consistent responses.
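
A minimal sketch of how such a prompt might be assembled is shown here. The function name `build_prompt` and the exact instruction wording are assumptions for illustration. The few-shot demonstrations mentioned above are omitted to keep the example short.

```python
def build_prompt(query: str, snippets: dict[str, str]) -> str:
    """Assemble a prompt whose context lines are tagged with their citation IDs."""
    # Each snippet is prefixed with its ID so the model can refer back to it.
    context = "\n".join(f'[{cid}] "{text}"' for cid, text in snippets.items())
    return (
        "You are an assistant that answers questions from company chat logs.\n"
        "Give a concise answer and cite the context IDs you relied on.\n\n"
        f'Query: "{query}"\n\n'
        f"Context:\n{context}\n\n"
        "Answer:"
    )


prompt = build_prompt(
    "What changed in the marketing budget?",
    {"chunk1_id": "We agreed to raise the marketing budget by 15%."},
)
print(prompt)
```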

4.5 Post‑Processing & Citation

After the LLM returns a response, the system:

  1. Parses the answer to identify any placeholders that should be replaced with citation links.
  2. Matches citations to the original context entries in the metadata store.
  3. Formats the response for display, embedding hyperlinks or hover‑over tooltips that open the original chat message or file.

The resulting answer is then cached for a short period to speed up repeat queries.

4.6 Response Delivery

The final response is returned to the UI via the API. In the UI, the answer appears as a normal chat message, but with inline citations that users can click to verify. For API consumers, the response includes:

  • The answer text.
  • An array of citations (message ID, channel, timestamp).
  • Metadata about the LLM (model used, temperature, tokens).
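
As an illustration, an API response carrying those three parts could be modeled as follows. The field names and the JSON layout are assumptions made for this sketch. The source lists what the response contains but does not specify a schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Citation:
    message_id: str
    channel: str
    timestamp: str  # kept as a string here; the real format is not specified


@dataclass
class QueryResponse:
    answer: str
    citations: list[Citation] = field(default_factory=list)
    model: str = ""
    temperature: float = 0.0
    tokens: int = 0


response = QueryResponse(
    answer="The marketing budget increases by 15%.",
    citations=[Citation("msg-001", "#strategy", "2023-07-15 10:32")],
    model="gpt-4",
    temperature=0.2,
    tokens=128,
)

# asdict() recurses into nested dataclasses, so the result is JSON-serializable.
payload = json.dumps(asdict(response), indent=2)
print(payload)
```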

5. Searchable Memory Layer – In‑Depth

The searchable memory layer is the heart of the LLM‑KB. Its design is heavily inspired by academic research on retrieval‑augmented generation (RAG) and real‑world usage patterns of chat logs.

5.1 Data Ingestion

Ingesting chat data involves several steps:

| Step | Description | Tools |
|------|-------------|-------|
| Export | Pull conversation history from Slack, Teams, Mattermost, or other platforms via APIs or webhooks. | Python scripts, platform SDKs |
| Normalization | Convert raw data to a unified schema (message ID, author, channel, timestamp, content, attachments). | Custom parsers |
| Chunking | Split long messages or attachments (e.g., PDFs) into semantic chunks (max 200 tokens each). | Longformer or Hugging Face tokenizer |
| Embedding | Generate vector embeddings for each chunk using models like Sentence‑Transformers or custom fine‑tuned models. | sentence-transformers/all-MiniLM-L6-v2 |
| Indexing | Store embeddings in a vector database, along with metadata. | Milvus, Weaviate, or Vespa |
| Audit | Log ingestion events for compliance. | Elasticsearch, Loki |

The ingestion pipeline is modular: new connectors can be added to ingest from additional platforms. The open‑source edition ships with connectors for Slack and Teams; the enterprise edition adds connectors for Microsoft Graph, Google Chat, and custom webhooks.
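
The normalization and chunking steps can be sketched roughly as follows. Everything here is an assumption for illustration: the helper names are hypothetical, the input dictionary only imitates the shape of a Slack export record (`ts`, `user`, `text`, `files`), and whitespace-separated words stand in for model tokens, which a real pipeline would count with the embedding model's tokenizer.

```python
def normalize_slack_message(raw: dict, channel: str) -> dict:
    """Map a Slack-style record onto the unified schema described in the table."""
    return {
        "message_id": f"{channel}:{raw['ts']}",
        "author": raw.get("user", "unknown"),
        "channel": channel,
        "timestamp": float(raw["ts"]),
        "content": raw.get("text", ""),
        "attachments": [f.get("name", "") for f in raw.get("files", [])],
    }


def chunk_text(text: str, max_tokens: int = 200) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


raw = {"ts": "1689416520.000200", "user": "U123", "text": "word " * 450}
msg = normalize_slack_message(raw, "#strategy")
pieces = chunk_text(msg["content"])
print(len(pieces), [len(p.split()) for p in pieces])
```

Fixed-size splitting is the simplest option. A pipeline aiming for truly "semantic" chunks would more likely split on sentence or paragraph boundaries.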

5.2 Semantic Retrieval

When a query is issued, the system:

  1. Embeds the query using the same model used for ingestion.
  2. Performs ANN (Approximate Nearest Neighbor) search in the vector index to find the top‑k closest vectors.
  3. Filters by metadata (e.g., only messages from a particular channel or within the last year).
  4. Ranks the results by a combination of semantic similarity and recency (time‑decay weighting).

The system can be tuned to favor recency for time‑sensitive queries, or relevance for general knowledge lookup. Users can also override the ranking algorithm via UI controls.
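
One way to blend similarity with recency is a weighted sum with an exponential half-life decay, sketched below. The formula, the `alpha` weight, and the 90-day half-life are illustrative assumptions. The source does not state which weighting the product actually uses.

```python
def blended_score(similarity: float, age_days: float,
                  alpha: float = 0.7, half_life_days: float = 90.0) -> float:
    """Weighted sum of semantic similarity and a recency term that halves every half_life_days."""
    recency = 0.5 ** (age_days / half_life_days)
    return alpha * similarity + (1.0 - alpha) * recency


def rank(candidates: list[dict], alpha: float) -> list[dict]:
    """Sort candidates from best to worst blended score."""
    return sorted(
        candidates,
        key=lambda c: blended_score(c["similarity"], c["age_days"], alpha),
        reverse=True,
    )


candidates = [
    {"id": "old-thread", "similarity": 0.90, "age_days": 365},
    {"id": "recent-thread", "similarity": 0.80, "age_days": 7},
]

# alpha < 1 mixes in recency; alpha = 1.0 ranks on similarity alone.
print([c["id"] for c in rank(candidates, alpha=0.7)])
print([c["id"] for c in rank(candidates, alpha=1.0)])
```

With these sample values the recent thread wins when recency carries 30% of the weight, and the older but more similar thread wins when recency is ignored.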

5.3 Citation Management

Each chunk of text is associated with a unique citation ID (a UUID). When the LLM generates an answer, it references these IDs. The system maps each ID back to:

  • The original message or document.
  • The exact text snippet that was used as context.
  • The source metadata (channel, author, timestamp).

Citation links are then displayed in the UI, allowing users to click and jump to the source. This addresses the trust problem inherent in LLMs: users can see where an answer came from.
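
The ID-to-source mapping can be pictured as a small registry. This is a sketch only: the class and field names are hypothetical, and an in-memory dictionary stands in for the metadata store.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    message_id: str
    snippet: str      # exact text used as context
    channel: str
    author: str
    timestamp: str


class CitationRegistry:
    """Maps citation IDs (UUID strings) back to the chunk they were issued for."""

    def __init__(self) -> None:
        self._sources: dict[str, Source] = {}

    def register(self, source: Source) -> str:
        citation_id = str(uuid.uuid4())  # random UUID, unique per chunk
        self._sources[citation_id] = source
        return citation_id

    def resolve(self, citation_id: str) -> Optional[Source]:
        # Unknown IDs return None so callers can drop or flag them.
        return self._sources.get(citation_id)


registry = CitationRegistry()
src = Source("msg-001", "Budget goes up by 15%.", "#strategy", "alice", "2023-07-15 10:32")
cid = registry.register(src)
print(registry.resolve(cid).channel)
```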


6. Citation‑Bearing Answers – How It Works

The LLM‑KB’s approach to citations is built into the prompt design. A typical prompt might look like:

You are an assistant that helps employees find information from company chat logs.
Provide a concise answer, and reference the sources you used.
Use the following citation format: [source_id: channel, timestamp].

Query: "What were the key takeaways from the Q2 strategy meeting?"

Context:
[chunk1_id] "During the Q2 strategy meeting, we emphasized the need to increase the marketing budget by 15%..."
[chunk2_id] "We also discussed the importance of investing in product research..."

Answer:
...

The LLM returns something like:

The Q2 strategy meeting focused on boosting the marketing budget by 15% and increasing investment in product research. Key takeaways include the need to align marketing spend with product development cycles and setting measurable KPIs for new initiatives. [chunk1_id: #strategy, 2023-07-15 10:32]

The system parses [chunk1_id: #strategy, 2023-07-15 10:32], looks up the corresponding metadata, and replaces it with a clickable link in the UI. This process is transparent to the user: they see a standard chat message with embedded citations.
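
A rough sketch of that parsing step, using a regular expression over the `[source_id: channel, timestamp]` format, is shown here. The function name, the Markdown-style link output, and the `/messages/<id>` URL scheme are assumptions for illustration only.

```python
import re

# Matches markers such as "[chunk1_id: #strategy, 2023-07-15 10:32]".
CITATION_RE = re.compile(
    r"\[(?P<cid>[\w-]+):\s*(?P<channel>[^,\]]+),\s*(?P<ts>[^\]]+)\]"
)


def link_citations(answer: str, known_ids: set[str]) -> str:
    """Replace each recognized citation marker with a link to its source."""

    def replace(match: re.Match) -> str:
        cid = match.group("cid")
        if cid not in known_ids:
            return match.group(0)  # leave markers with unknown IDs untouched
        label = f"{match.group('channel')}, {match.group('ts')}"
        return f"[{label}](/messages/{cid})"

    return CITATION_RE.sub(replace, answer)


answer = "The budget rises by 15%. [chunk1_id: #strategy, 2023-07-15 10:32]"
print(link_citations(answer, known_ids={"chunk1_id"}))
```

Checking each ID against the set of chunks that were actually sent as context guards against the model inventing a citation.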

6.1 Confidence & Verification

Because the citations point to the original data, the system can compute a confidence score for each citation from the semantic similarity between the query and the cited source. If an answer references a source that is only loosely related to the query, the system can flag it for review.

For enterprises, this can be integrated with Quality Assurance (QA) dashboards, where knowledge managers verify the accuracy of high‑impact queries.
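
The flagging rule could be as simple as a threshold on the per-citation similarity score. The function name and the 0.5 cut-off below are hypothetical. In practice the threshold would need tuning against reviewed examples.

```python
def flag_weak_citations(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the citation IDs whose similarity to the query is below the threshold."""
    return sorted(cid for cid, score in scores.items() if score < threshold)


# Similarity between the query and each cited chunk (made-up values).
scores = {"c1": 0.82, "c2": 0.31, "c3": 0.50}
print(flag_weak_citations(scores))
```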

6.2 Handling Ambiguity

When multiple contexts could answer a query, the LLM can be instructed to provide a multi‑source answer, citing each source separately. The UI then groups citations under the relevant paragraph, making it easier for users to understand the basis of the answer.


7. Use Cases

The versatility of LLM‑KB shines across various business functions. Below are representative scenarios:

| Domain | Problem | How LLM‑KB Helps |
|--------|---------|------------------|
| Customer Support | Agents spend hours hunting down past interactions or knowledge articles. | Quick context retrieval and citation from past chat logs; reduces resolution time. |
| Engineering | Onboarding new developers can be slow if they have to read long README files. | Queries like “What is the process for deploying a new service?” return concise, citation‑linked explanations. |
| Human Resources | HR staff need quick references to policy documents stored in chat threads. | The system extracts policy text and presents it with citations to the original message or attached file. |
| Legal & Compliance | Auditors need to verify statements made in internal chat. | Citations provide immutable evidence; audit logs show who accessed which data. |
| Product Management | PMs need to surface user feedback from Slack threads. | Semantic search surfaces all mentions of a feature; PM can review them in context. |

7.1 Case Study: A Mid‑Size SaaS Company

  • Problem: The engineering team had more than 200,000 Slack messages. When a new developer asked about the CI/CD pipeline, they had to search manually, often ending up reading irrelevant messages.
  • Solution: The team deployed the open‑source edition, ingesting the last year of Slack data. A quick query like “Describe the CI pipeline for the product” returned a concise answer with citations linking to the actual CI config thread and a README file.
  • Outcome: The onboarding time for new developers dropped by 35%. The team also started using the system for quick troubleshooting, reducing ticket volume by 20%.

7.2 Case Study: A Large Financial Institution

  • Problem: Legal compliance required that any financial advice given via internal chat be traceable.
  • Solution: The enterprise edition was deployed with full audit logging and SSO. All chat queries were routed through LLM‑KB. Citations were mandatory; the system prevented answers without citations from being posted.
  • Outcome: The institution achieved a compliance score of 98% in the latest audit and reduced the time needed to produce evidence for regulators from weeks to days.

8. Enterprise Features & Governance

The enterprise edition brings several layers of control that are critical for production workloads.

8.1 Role‑Based Access Control (RBAC)

Roles include:

  • Owner – Full access, can configure connectors, set up SSO, manage users.
  • Admin – Can manage users and data, but not change connectors.
  • Contributor – Can add queries and view answers.
  • Viewer – Can only read answers and see citations.

RBAC policies can be mapped to existing LDAP groups, simplifying user provisioning.
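
The role hierarchy can be sketched as a permission table. The action names are invented for this example, and the assumption that Owners and Admins may also submit queries and view answers is ours. The source only describes each role in prose.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADMIN = "admin"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


# Hypothetical action names derived from the role descriptions.
PERMISSIONS: dict[Role, set[str]] = {
    Role.OWNER: {"configure_connectors", "configure_sso", "manage_users",
                 "manage_data", "submit_query", "view_answers"},
    Role.ADMIN: {"manage_users", "manage_data", "submit_query", "view_answers"},
    Role.CONTRIBUTOR: {"submit_query", "view_answers"},
    Role.VIEWER: {"view_answers"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role's permission set contains the action."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.ADMIN, "manage_users"))
print(is_allowed(Role.ADMIN, "configure_connectors"))
```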

8.2 SSO & Identity Federation

The system supports:

  • SAML 2.0
  • OAuth 2.0 (OpenID Connect)
  • Azure AD, Okta, Google Workspace

This lets users sign in once through their existing identity provider, and the system inherits the user attributes already defined there.

8.3 Audit Logging

All queries, responses, and data ingestion events are logged with:

  • User ID
  • Timestamp
  • Query text
  • LLM model used
  • Citation IDs
  • Data source

Logs are stored in a tamper‑proof store (e.g., WORM‑capable object storage) for regulatory compliance.

8.4 Data Residency & Compliance

Customers can choose to host the entire stack in a region that satisfies data residency laws (e.g., EU‑27, US‑NIST). The platform is also ISO 27001 and SOC 2 compliant for the SaaS offering, with optional HIPAA compliance for health‑care use cases.

8.5 Custom LLM Fine‑Tuning

Enterprise customers can upload proprietary corpora (e.g., internal documentation, code repositories) and fine‑tune the LLM. The process:

  1. Data is anonymized and ingested.
  2. Fine‑tuning jobs run on GPU clusters.
  3. The resulting model is cached in the inference layer.

Fine‑tuned models can then be called via the same API as the base model.

8.6 Multi‑Tenant Isolation

The SaaS offering supports logical isolation between tenants, ensuring that data from one customer cannot leak into another. This is critical for regulated industries.


9. Integration & Deployment

9.1 Connectors & Bots

  • Slack Connector – Listens to incoming messages, captures them in real time.
  • Teams Connector – Uses Microsoft Graph to pull conversation history.
  • Mattermost Connector – Offers a self‑hosted bot.
  • Custom Webhooks – Allows ingestion from proprietary chat tools.

These connectors are packaged as lightweight Docker containers that can be deployed on the same cluster as the core system.

9.2 API & SDK

The LLM‑KB exposes a RESTful API:

  • POST /query – Submit a query.
  • GET /metadata – Retrieve citation metadata.
  • GET /audit – Get audit logs (enterprise only).

An accompanying Python SDK simplifies integration into existing workflows.
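
Without the SDK, a call to the query endpoint might be prepared with the standard library as shown below. The host name, the `query` field in the request body, and the Bearer-token header are assumptions. Only the `POST /query` route comes from the list above. The request is built but not sent, so the sketch runs without a server.

```python
import json
from urllib import request


def build_query_request(base_url: str, question: str, token: str) -> request.Request:
    """Prepare (but do not send) a POST /query request with a JSON body."""
    body = json.dumps({"query": question}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_query_request("https://kb.example.com", "Describe the CI pipeline", "TOKEN")
print(req.get_method(), req.full_url)

# To send it against a real deployment:
#   with request.urlopen(req) as resp:
#       result = json.load(resp)
```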

9.3 Deployment Options

| Option | Pros | Cons |
|--------|------|------|
| Docker Compose | Simple, local dev | Not production‑ready |
| Kubernetes Helm Chart | Scalable, managed | Requires cluster knowledge |
| Managed SaaS (Enterprise) | Zero‑ops, SLAs | Subscription cost |

The open‑source edition recommends the Helm chart for production, while the enterprise edition offers a fully managed SaaS deployment.

9.4 Performance Benchmarks

Benchmarks conducted on a mid‑size deployment (10 k messages per day) report:

  • Query latency: 0.9 s (average) – within human tolerance for chat replies.
  • Index update latency: < 5 s for new messages.
  • Throughput: 300 queries per second (with 4 LLM inference instances).

These figures depend on the number of inference nodes and will change as the deployment scales.
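
As a back-of-the-envelope reading of the reported figures, Little's law (requests in flight = throughput × latency) gives the concurrency they imply. This is our own arithmetic on the numbers quoted above, not a result from the benchmark itself.

```python
throughput_qps = 300        # reported throughput
avg_latency_s = 0.9         # reported average query latency
inference_instances = 4     # reported number of LLM inference instances

# Little's law: average requests in flight = arrival rate x time in system.
in_flight = throughput_qps * avg_latency_s
per_instance = in_flight / inference_instances

print(round(in_flight), round(per_instance, 1))
```

Roughly 270 concurrent queries, or about 68 per inference instance, would have to be in flight to sustain both figures at once. That suggests the benchmark relied on request batching, caching, or a mix of retrieval-only queries, none of which the source details.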


10. Competitive Landscape

The LLM‑KB sits in a crowded market of retrieval‑augmented chatbots and knowledge‑base solutions. Below is a comparison with three leading competitors.

| Feature | LLM‑KB (Open‑Source + Enterprise) | RAG‑BotX | KnowledgeSphere | GPT‑4 Knowledge Engine |
|---------|-----------------------------------|----------|-----------------|------------------------|
| Open Source | Yes (Apache 2.0) | No | No | No |
| Citation Support | Yes (inline) | No | No | No |
| Searchable Memory | Vector index + metadata | Vector index only | Graph‑based | LLM memory only |
| Enterprise Features | RBAC, SSO, audit, compliance | Limited | Limited | Limited |
| Fine‑Tuning | Yes (custom corpora) | No | Yes | Yes |
| Integration | Slack, Teams, Mattermost | Slack only | Slack, Teams | Slack, Teams |
| Pricing | Free (OSS) / Subscription | Subscription | Subscription | Subscription |
| Deployment | Self‑host or SaaS | SaaS only | Self‑host | SaaS only |

LLM‑KB’s key differentiator is the citation‑bearing architecture combined with an open‑source foundation, allowing enterprises to maintain data sovereignty while leveraging advanced LLM capabilities.


11. Roadmap & Future Plans

The product team has outlined a phased roadmap that balances feature depth with community engagement.

11.1 Q3 2026 – Version 1.2

  • Multimodal Retrieval – Index images and code snippets.
  • Advanced Prompt Templates – Domain‑specific templates for legal, finance, and engineering.
  • Auto‑Citation Quality Scoring – AI‑driven confidence metrics.
  • API Rate‑Limit Customization – Per‑tenant rate limits.

11.2 Q1 2027 – Version 2.0

  • Zero‑Shot Knowledge Extraction – Use LLM to identify and tag knowledge objects (policies, procedures) in chat.
  • Graph‑Based Knowledge Layer – Build a knowledge graph linking entities across conversations.
  • Federated Learning – Allow customers to share anonymized embeddings to improve global models.

11.3 Community Contributions

  • Marketplace – Open‑source community can publish connectors, prompt templates, and vector models.
  • Contribution Guidelines – PR templates, coding standards, and documentation requirements.
  • Governance – Core maintainers from the founding companies plus community‑elected members.

12. Conclusion

The Open‑Source LLM Knowledge Base is a compelling solution for teams looking to unlock the value hidden in their chat logs. By integrating a searchable memory layer with citation‑bearing LLM inference, it addresses the core pain points of knowledge loss, retrieval friction, and answer quality. The dual‑edition approach ensures that both hobbyists and large enterprises can benefit: the open‑source edition offers a low‑barrier entry point, while the enterprise edition delivers the governance and compliance needed for production workloads.

With robust APIs, seamless connectors, and a community‑driven roadmap, LLM‑KB positions itself as the next evolution in chat‑centric knowledge management. As LLM technology continues to advance, the platform’s modular architecture will allow organizations to adopt newer models or fine‑tune on proprietary data without disruption, ensuring that the knowledge base remains fresh, accurate, and trustworthy.


Key Takeaway: Turn every conversation into an answerable, auditable knowledge asset.