Looking for the Python version? Check out Saiku.py.
- About
- Features
- Prerequisites
- 1. Using Saiku in Your Own Projects
- 2. Using the Project Itself
- 3. Global Installation (Not Recommended Yet)
- Demo
- Setting Up Environment Variables
- Available Commands
- Use Cases
- Use Case 1: Transcribe Audio to Text
- Use Case 2: Extract Text from an Image
- Use Case 3: Summarize a Long Article
- Use Case 4: HTML to PDF Conversion
- Use Case 5: Take a Screenshot of a Webpage
- Use Case 6: Text to Speech
- Use Case 7: Create a Simple Chart
- Use Case 8: Parse PDF Content
- Use Case 9: Perform a Database Query
- Use Case 10: File Actions (Read/Write)
- Future Features
- Contributing
- Support Saiku
- Feedback and Issues
- API Rate Limits/Cost
- Note
- License
This project aims to create a robust, intelligent AI Agent capable of automating various tasks. Our agent is designed following the PEAS (Performance measure, Environment, Actuators, Sensors) framework to ensure it's robust, scalable, and efficient.
"Saiku" (細工) in Japanese refers to detailed or delicate work, symbolizing the intricate and intelligent workings of our AI agent.
- S: Smart
- A: Artificial
- I: Intelligent
- K: Knowledgeable
- U: Unmatched
We chose a Japanese name to symbolize precision, innovation, and advanced technology, attributes highly respected in Japanese culture. Even though we are based in Tunisia, we believe in global collaboration and the universal appeal and understanding of technology.
PEAS stands for Performance measure, Environment, Actuators, and Sensors. It's a framework used to describe the various components of an intelligent agent:
- Performance Measure: How well is the agent doing in its environment
- Environment: Where the agent operates
- Actuators: What actions the agent can take
- Sensors: How the agent perceives its environment
- Modular Design
- OpenAI GPT-4 Integration
- Extensible and Customizable
- Node.js installed
- OpenAI API key
- Google Cloud SDK installed and configured with a project:
- Install Google Cloud SDK
- Authenticate with Google Cloud:
gcloud auth login
- Set your project ID:
gcloud config set project <your-project-id>
- Enable the Google Vision API for your project:
- Visit the Google Cloud Console
- Navigate to the 'APIs & Services > Dashboard'
- Click on '+ ENABLE APIS AND SERVICES', search for 'Vision API' and enable it.
- Download the service account JSON file from your GCP project page
Saiku is a versatile tool that enhances projects with advanced functionalities. This guide will help you integrate Saiku into your applications, covering the installation, configuration, and usage.
- Step: Run
npm install saiku
in your project directory to add Saiku as a dependency.
- Code:
import Agent from 'saiku';
- Example:
async function main(opts) { const agent = new Agent(opts); // Initialize the agent // Additional initialization code }
-
AgentOptions:
- actionsPath (
string
|optional
): Path to custom action scripts. - systemMessage (
string
|optional
): Default system message or instructions. - allowCodeExecution (
boolean
|optional
): Flag to enable/disable code execution. - interactive (
boolean
|string
|optional
): Interactive mode setting. - speech (
'input' | 'output' | 'both' | 'none'
): Configures speech functionality. - llm (
'openai' | 'vertexai' | 'ollama' | 'huggingface'
): Specifies the language learning model. Default is'openai'
. - [key: string]: any (
optional
): Allows additional custom properties for unique project requirements.
- actionsPath (
-
Example Configuration:
let opts = { actionsPath: "../actions", systemMessage: "Welcome to Saiku", allowCodeExecution: true, interactive: true, speech: "both", llm: "openai", // Custom options };
-
Process:
- Listening for User Input: Implement input mechanisms for user interaction.
- Processing Queries: The agent processes and performs actions based on queries.
- Generating Responses: Generates responses or results from actions.
- Speaking Output: For speech-enabled applications, configure spoken output.
-
Example Interaction:
do { let userQuery = await getUserInput(); // Get user input agent.messages.push({ role: "user", content: userQuery }); await agent.interact(); // Process and perform actions // Additional code } while (userQuery.toLowerCase() !== "quit");
-
Clone the Repository:
git clone https://github.com/nooqta/saiku.git
-
Navigate to Project Folder:
cd saiku
-
Install Dependencies:
npm install
-
Run the Project Locally:
Before starting Saiku locally, build the project using the following command:
npm run build
To start the agent:
npm start
For automated building during development, use:
npm run watch
This will automatically build the project whenever files are changed, helping streamline the development process.
Saiku is available globally but is still in early development. Local installation is recommended.
npm install -g saiku
For detailed documentation and API usage, refer to the upcoming Saiku documentation, which will provide comprehensive guidance for advanced uses.
This guide is designed to provide clarity and ease of use for integrating Saiku into various projects, catering to a wide range of developers.
Although Saiku is available as an npm package, we are still in the early stages of development, and drastic changes to the architecture will occur. We don't recommend installing it globally yet. However, if you still wish to do so:
npm install -g saiku
saiku-browser.mp4
Before running Saiku, configure the necessary environment variables. Copy the example environment file and then fill in the details.
cp .env.example .env
Edit the .env
file to include your specific information:
# OpenAI
OPENAI_API_KEY=
OPENAI_MODEL=gpt-3.5-turbo
# Eleven Labs
ELEVENLABS_API_KEY=
# Database related
DB_HOST=
DB_USER=
DB_PASSWORD=
# Email related
EMAIL_SERVICE=
EMAIL_USER=
DISPLAY_FROM_EMAIL=
EMAIL_PASS=
# User related
USER=
COMPANY=
COUNTRY=
CITY=
PHONE=
LATITUDE=
LONGITUDE=
# Twilio
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
# Weather API
WEATHER_API_KEY=
# Stability AI
STABILITY_API_KEY=
# GITLAB
GITLAB_GRAPHQL_ENDPOINT=
GITLAB_PERSONAL_ACCESS_TOKEN=
GITLAB_USERNAME=
GITLAB_VERSION=
GITLAB_API_VERSION=
Use Saiku with various options to tailor its operation to your needs:
AI agent to help automate your tasks
Options:
-v, --version Output the current version.
-exec, --allowCodeExecution Execute the code without prompting the user.
-s, --speech <type> Receive voice input from the user and/or output responses as speech.
Possible values: input, output, both, none. Default is "none".
-role, --systemMessage The model system role message.
-m, --llm <model> Specify the language model to use.
Possible values: openai, vertexai, ollama, and huggingface. Default is "openai".
-h, --help Display help for command.
Commands:
action [options] Manage actions: create an new action using AI, list available actions and activate an action.
autopilot [options] AI agent to help automate your tasks on autopilot mode (in progress).
serve Chat with the Saiku agent in the browser.
To allow code execution without prompting:
saiku -exec
or
npm start -- -exec
To enable voice input and output:
saiku -s both
or
npm start -- --speech both
To specify a language model:
saiku -m huggingface
or
npm start -- --llm huggingface
To chat with Saiku in the browser
saiku serve
# or
npm start -- serve
To create a new action
saiku action create
# or
npm start -- action create
Prompt Example: "Please transcribe the audio from interview.mp3." Description: Saiku will use the speech_to_text function to transcribe the audio file interview.mp3 and provide the user with the text content.
Prompt Example: "Extract text from this photo image_of_document.jpg." Description: Saiku will use google_vision with the DOCUMENT_TEXT_DETECTION feature to analyze the image image_of_document.jpg and return any readable text found in the image.
Prompt Example: "Summarize the following article content for me: ...(article text)..." Description: Saiku utilizes the text_summarizer function to produce a concise summary of the provided article text.
Prompt Example: "Convert this HTML code to a PDF file and save it as report.pdf." Description: Saiku employs the html_to_pdf tool to transform the given HTML code into a PDF document and saves it with the filename report.pdf.
Prompt Example: "Take a full-page screenshot of the website at http://example.com and name the file screenshot.png." Description: Saiku uses the take_screenshot feature, set to capture the full page, to create an image file screenshot.png of the URL provided.
Prompt Example: "Please convert the following text to speech: Hello World!." Description: Saiku runs the text_to_speech function to convert the text "Hello World!" into an audio file and will play it if the user requests.
Prompt Example: "Make a pie chart with this data: { 'Data A': 30, 'Data B': 70 }." Description: Using the d3_chart_generation, Saiku will generate a pie chart image based on the data provided.
Prompt Example: "Extract the text from the PDF file named report.pdf." Description: Saiku will apply the parse_pdf function to read the PDF file report.pdf and extract its text content.
Prompt Example: "Perform a SQL query SELECT * FROM users on the local userDB database." Description: Saiku executes the database_query action, running the provided SQL query on the specified database.
Prompt Example: "Create a text file named notes.txt with the following content: Meeting notes...." Description: Saiku will utilize the file_action function to write the provided content into a new text file called notes.txt.
Incorporation of Diverse Models: While currently relying on OpenAI and its code interpreter, future versions of Saiku aim to incorporate various other AI and LLM models to enhance its capabilities and versatility- Web Compatible Version: Development of a web-compatible version of Saiku to ensure easy accessibility and integration into web-based platforms.
- Python Version: Creation of a Python version of Saiku to cater to Python developers and AI enthusiasts, allowing seamless integration into Python-centric projects.
- Configuration Management: Implementation of a robust configuration management system to ensure Saiku’s smooth and efficient operation in diverse environments.
- Enhanced Debugging and Logging: Improvement in debugging and logging capabilities for easier identification and resolution of issues, ensuring Saiku's robust performance.
- Comprehensive Tests: Development of comprehensive tests to continuously evaluate and ensure Saiku's functionality, reliability, and performance.
Voice Commands: Integration with technologies like Whisper for efficient and user-friendly voice command functionalities.Speaking Agent: Implementation of Text-to-Speech technologies like Elevenlabs, enabling Saiku to interact using voice, enhancing user experience.- Enhanced Memory Handling: Upgrades in memory handling for optimal and consistent performance.
- Document Summarization: Integration of document summarization features for effective handling of large textual data.
- Advanced Actions: Inclusion of computer vision and image interpretation capabilities, broadening the spectrum of tasks Saiku can adeptly handle.
- OpenAI Cost Tracking: Incorporating features to track and analyze the costs associated with OpenAI API usage, enabling better budget management and cost-efficiency.
- Budget Settings: Implementation of budget settings to allow users to set and manage spending limits on AI resources, ensuring cost-effective operation.
- Multi-Agent Systems: Exploration and integration of multi-agent systems to promote collaborative problem-solving and to enrich the PEAS framework within Saiku, potentially elevating the project's ability to handle complex, dynamic environments.
- PEAS Enhancement: Further refining the existing PEAS framework to accommodate a wider range of environments, actuator capabilities, and sensor inputs, aiming for a more versatile and adaptive AI agent.
We welcome contributions from the community. If you'd like to contribute, please follow these steps:
- Fork the repository
- Create your feature branch (
git checkout -b feature/YourFeature
) - Commit your changes (
git commit -m 'Add some feature'
) - Push to the branch (
git commit push origin feature/YourFeature
) - Create a new Pull Request
We are actively seeking sponsors and contributors. If you believe in the potential of Saiku, support the project in any way you can. Your support will help us make Saiku a reality.
We value your feedback. If you encounter any issues or have suggestions for improvements, please open an issue on our GitHub repository.
Please be aware of the rate limits and costs associated with the APIs used by Saiku. Each service provider may have different policies, and it's essential to stay informed to avoid unexpected charges.
Please note that we are in the experimental stage. The architecture and features are subject to significant changes.
This project is licensed under the MIT License - see the LICENSE.md file for details.