
    Earn 45,000 ($450.00)

Due 4 months ago
Canceled

    RAG/Internal LLM connected via Slack to Databricks

ankur38
    Posted 4 months ago

    Bounty Description

    Project Overview
    We need an internal AI assistant—accessed via Slack—that automatically queries Databricks for historical data. Non-technical teams (marketing, product, operations) can ask natural-language questions about user behaviors, cohorts, or individuals. For example:

    “Show me the top 5 users by total time spent for yesterday.”
    "which new users are likely to churn in the next 7 Days"
    "What offer/promotion can i give to users to prevent churn over the next 7 days"

    The system interprets such queries, fetches data, and replies in plain text.
    Eventually, it should handle predictive questions (e.g., “Which user cohort is most likely to churn next month?”). Users give feedback (thumbs up/down), and every 6 hours the assistant refines its logic and LLM prompts, continually improving. The goal is a simple, seamless experience where no SQL or specialized data knowledge is required.

    1. Technical Requirements
      A. Architecture & Core Components

Slack Integration: A Slack App with a bot user that listens to messages in channels or via @mention and sends results back in Slack.
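A minimal sketch of the Slack side, assuming Slack Bolt for Python running in Socket Mode; `answer_question()` is a hypothetical helper standing in for the backend pipeline sketched in the RAG Flow item below.

```python
import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def handle_mention(event, say):
    # Acknowledge immediately so the user knows the bot is working.
    say(text="Processing your request...", thread_ts=event["ts"])
    question = event["text"]
    answer = answer_question(question)  # hypothetical helper; see the RAG sketch below
    say(text=answer, thread_ts=event["ts"])

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```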

RAG Flow (a sketch of the full flow follows this list):
    NLP/Query Understanding – Convert natural language to data lookups or ML tasks.
    Data Retrieval – Query Databricks for user behavior data.
    LLM Generation – Use a Large Language Model (e.g., OpenAI) to craft a natural-language response.
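A sketch of the three steps end to end, under assumptions: the `openai` and `databricks-sql-connector` packages, a `gpt-4o` model, and an illustrative `analytics.user_activity` table. A production version would validate or allow-list the generated SQL before executing it.

```python
import os
from databricks import sql
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_question(question: str) -> str:
    # 1. NLP/Query Understanding: ask the LLM to translate the question to SQL.
    sql_resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Translate the question into a single "
             "Databricks SQL query against analytics.user_activity. Return SQL only."},
            {"role": "user", "content": question},
        ],
    )
    query = sql_resp.choices[0].message.content

    # 2. Data Retrieval: run the generated query on Databricks.
    # NOTE: raw LLM-generated SQL should be validated/allow-listed in production.
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn, conn.cursor() as cur:
        cur.execute(query)
        rows = cur.fetchall()

    # 3. LLM Generation: summarize the rows as a plain-text answer.
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize this query result for a "
             "non-technical Slack user."},
            {"role": "user", "content": f"Question: {question}\nRows: {rows}"},
        ],
    )
    return summary.choices[0].message.content
```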

    Self-Learning Feedback Loop: Slack reactions (thumbs up/down) are logged; every 6 hours, negative feedback prompts improvements in query interpretation, data retrieval logic, and LLM prompting.
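Feedback capture could look like the following sketch (Slack Bolt again, reusing the `app` from the listener sketch above); `log_feedback()` and its storage table are assumptions, not part of this spec.

```python
@app.event("reaction_added")
def handle_reaction(event):
    # Slack's reaction names for the thumbs emoji are "+1" and "-1".
    if event["reaction"] in ("+1", "-1"):
        log_feedback(  # assumed helper that writes to the feedback store
            message_ts=event["item"]["ts"],
            channel=event["item"]["channel"],
            positive=(event["reaction"] == "+1"),
        )
```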
    B. Databricks Integration

    Data Storage: Connect to various Databricks catalogs containing historical user/cohort data and predictive model outputs.

    Queries: Generate appropriate SQL/ML queries that handle large datasets efficiently.

    Security/Access: Use secure tokens/credentials. Only authorized Slack requests can trigger queries.
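Slack Bolt verifies the request signature automatically; a per-user allow-list on top of that is one way to satisfy the authorization requirement. A sketch, with `AUTHORIZED_SLACK_USERS` as an assumed environment variable:

```python
import os

# Assumed comma-separated env var of Slack user IDs, e.g. "U01ABC,U02DEF".
AUTHORIZED_USERS = set(os.environ.get("AUTHORIZED_SLACK_USERS", "").split(","))

def is_authorized(event: dict) -> bool:
    # Bolt already checks Slack's request signature; this adds a per-user
    # gate so only approved people can trigger Databricks queries.
    return event.get("user") in AUTHORIZED_USERS
```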

    C. Data Types & Use Cases

    User-Level Data: Deposits, logins, activity, etc., plus the ability to pinpoint an individual user’s stats on demand.
    Cohort-Level Data: Segment users (VIPs, new signups, churned, etc.) to see aggregated metrics or predictions (e.g., “Which cohort is at highest churn risk?”).
    Any Combination of Metrics: Combine filters, time frames, or advanced ML outputs for more complex analytics.
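As an illustration of a cohort-level lookup, a parameterized query of this shape could be templated or LLM-generated; the table and column names are placeholders, not the real Databricks layout.

```python
# Illustrative cohort-level metrics query with placeholder schema.
COHORT_METRICS_SQL = """
    SELECT cohort,
           COUNT(DISTINCT user_id) AS users,
           SUM(deposit_amount)     AS total_deposits,
           AVG(session_minutes)    AS avg_time_spent
    FROM analytics.user_activity
    WHERE activity_date BETWEEN :start_date AND :end_date
    GROUP BY cohort
    ORDER BY total_deposits DESC
"""

def cohort_metrics(cursor, start_date: str, end_date: str):
    # Named parameters (:name) are supported by databricks-sql-connector 3.x.
    cursor.execute(COHORT_METRICS_SQL, {"start_date": start_date, "end_date": end_date})
    return cursor.fetchall()
```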

    D. Response & Visualization

    Natural Language Answers: The LLM generates easy-to-read summaries (e.g., “The top 5 depositing users contributed a total of $12,500 yesterday…”).
    Optional Graphs/Charts: For trends or distributions, the system can generate simple visuals (posted as images or Slack files).
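A sketch of the optional chart path, assuming matplotlib for rendering and `slack_sdk`'s `files_upload_v2` for posting the image back to the channel:

```python
import os
import matplotlib
matplotlib.use("Agg")  # headless rendering for a server/Lambda environment
import matplotlib.pyplot as plt
from slack_sdk import WebClient

def post_trend_chart(channel: str, dates, values, title: str):
    fig, ax = plt.subplots()
    ax.plot(dates, values)
    ax.set_title(title)
    fig.savefig("/tmp/chart.png")
    plt.close(fig)
    WebClient(token=os.environ["SLACK_BOT_TOKEN"]).files_upload_v2(
        channel=channel, file="/tmp/chart.png", title=title
    )
```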

    E. System Components & Workflow

    Slack Bot: Receives messages via Slack Events API, sends acknowledgments like “Processing your request...”
    Backend Orchestration (AWS Lambda or similar):
    Parses user text, determines data or ML retrieval, queries Databricks, calls the LLM, and replies to Slack.
    Databricks: Stores historical data, possibly ML models or integrated model outputs.
    LLM Service: Summarizes data, returning domain-specific insights.
    Feedback Storage & Retraining: Logs user reactions; every 6 hours, negative feedback triggers adjustments (e.g., synonyms, prompt tuning, data logic fixes).
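The 6-hour loop could be a scheduled job (cron, or an EventBridge rule if the backend is Lambda). In this sketch, `fetch_negative_feedback()` and `update_prompt_examples()` are assumed helpers over the feedback store, and the reviewed `fixed_sql` field is an assumption about how corrections get in.

```python
# Runs every 6 hours via the scheduler of your choice.
def run_feedback_review() -> None:
    failures = fetch_negative_feedback(hours=6)  # thumbs-down since last run (assumed helper)
    for item in failures:
        # Turn each reviewed failure into a few-shot example so the
        # NL-to-SQL prompt improves on the next run.
        update_prompt_examples(question=item.question, corrected_sql=item.fixed_sql)
```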

2. What Success Looks Like

    Business Impact: Non-technical staff quickly get data-driven answers (cohort trends, churn predictions, user stats) with minimal friction. The bot improves steadily via feedback.

Technical Performance: Slack queries resolve within a few seconds, the system handles concurrent requests, and sensitive data is properly protected.
    Self-Improvement: The 6-hour feedback loop reduces mistakes over time, refining both query logic and LLM responses.