Open AI emebdding, vector database, semantic tagging, website scraping PoC Project
$250-750 USD
Відкрито
Опублікований 15 minutes ago
•
Закінчується за 23 годин(-и)
$250-750 USD
Оплачується при отриманні
Critical requirements : Open AI, Embedding, Vector databases, Scraping
*** If you are interested you must be able to show me a previous related (Open AI embedding) project in your portolfio. Please send me a notification with a link to this project****
General Concept
We are building a data-driven content creation system for our agency / clients. The project involves extracting content from client websites, structuring it into JSON packets, tagging it with metadata via a user-friendly web interface, and using OpenAI's embedding with a vector database and GPT APIs to retrieve and generate new content.
This will be a new independent prorotype project to get the integrations and workflow set up and working correctly.
Full-Stack Developer (Preferred One-Dev Solution)
Responsibilities:
Develop and maintain a backend pipeline to:
- Scrape and preprocess data from client websites.
- Chunk content into JSON packets and prepare it for tagging.
- Integrate with Supabase for data storage.
Build a simple web app for manual metadata tagging:
- Design a clean UI for reviewing and tagging content.
- Integrate pre-defined tag options for consistency.
Set up a vector database for semantic search.
Integrate OpenAI’s text-embedding-ada-002 and GPT models to:
- Generate embeddings for tagged data.
- Retrieve and pass relevant data to GPT for content generation.
- Ensure scalability for multi-client support.
Requirements:
Essential experience with:
- APIs: OpenAI API, vector database APIs (Pinecone, Weaviate).
- Proficiency in web scraping tools (BeautifulSoup, Puppeteer).
- Familiarity with embedding and vector database concepts.
- Understanding of token limits and prompt engineering.
Other requirements:
- Backend: Python (FastAPI), Node.js (Express).
- Frontend: React, Vue, or Next.js.
- Databases: Supabase, PostgreSQL.
===============================