Start Kula changes, add new endpoints, and remove the old ones (#275)
* Start Kula changes, add new endpoints, and remove the old ones

* Dh 5073/agents generating sql queries (#276)

* DH-5073/text-to-SQL agents

* DH-5073/removing the intermediate steps

* DH-5073/removing the intermediate steps

* DH-5073/update tests

* Create services and improve models

* DH-5080/refactor_engine_with_new_resources (#277)

* DH-5080/refactor_engine_with_new_resources

* DH-5080/update the tests

* DH-5080/update the tests

* DH-5080/updating the evaluator

* Dh 5080/sql generations services (#278)

* DH-5080/sql_generations service

* DH-5080/reformat with black

* DH-5080/removing generator from env example

* DH-5082/nl_generations implementation (#279)

* DH-5080/removing db connection validation (#280)

* Create Prompt service (#281)

* DH-5086/endpoints for nl and sql generation (#282)

* DH-5085 Re-name golden-records as golden-sqls (#283)

* [DH-5068] Add metadata and created_at fields (#284)

* [DH-5087] Add Prompts GET endpoints (#285)

* DH-5089/fixing_the_new_agent (#286)

* DH-5089/fixing_the_new_agent

* DH-5089/the get endpoints

* DH-5089/refactor with black

* DH-5089/renaming golden records

* DH-5090/execute sql generation endpoint (#287)

* DH-5090/execute sql generation endpoint

* DH-5090/do not initiate the repo

* DH-5099/save sql_generation on initial then update (#288)

* [DH-5088] Add endpoints to update metadata (#289)

* DH-5103/nl-generation bug (#290)

* [DH-5100] Script to migrate old collections (#291)

* Replace PATCH endpoint for PUT (#292)

* Fix migrate script (#293)

* solving the issue with PUT sql_generations (#294)

* solving the issue with PUT sql_generations

* reformat

* DH-5104/update the fine-tuning metadata (#295)

* DH-5104/fixing the issue of finetuning (#297)

* Fix datetime responses (#298)

* Fix metadata in table-descriptions (#299)

* Dh 5104/metadata issue for endpoints (#300)

* DH-5104/fixing the golden sqls endpoint

* DH-5401/reformat

* Replace nl_answer by text (#301)

* DH-5110/fixing evaluator issue (#302)

* DH-5110/fixing evaluator issue

* DH-5110/fixing the 500 error for sql execution

* Improve resources requests (#303)

* DH-5109/csv export endpoint (#304)

* DH-5094/fix the ARRAY issue (#309)

* [DH-5113] Return datetimes with explicit UTC (#311)

* Fix migration script (#306)

* Dh-5094/reverting_the_fixes (#310)

* Dh-5094/reverting_the_fixes

* Update the states of the scanner

* change the states of the scanner

* DH-5094/add exception handling

* DH-5094/add exception handling

* DH-5094/exception handling

* DH-3135/Nulltype handling plus script for scanner (#312)

* DH-5137/removing the markdown from queries (#313)

* DH-5147/rename the csv endpoint (#314)

* DH-5146/add get finetunings to engine (#316)

* DH-5149/add delete endpoint and make basellm and alias optional (#317)

* DH-5149/add delete endpoint and make basellm and alias optional

* DH-5149/update states

* DH-5158/update the finetuning statuses (#318)

* DH-5158/update the finetuning statuses

* fixing the openai states

* Add value

* DH-5153 Add golden sqls in batches (#319)

* DH-5167/update the finetuning statuses (#320)

* DH-5167/update the finetuning statuses

* DH-5167/reformat with black

* Dh 5168/raise exception for not available finetuning (#321)

* DH-5168/better exception handling for SQL generation

* DH-5168/reformat with black

* DH-5166/the parser to extract SQL queries (#322)

* [DH-5173] Fix table-descriptions PUT endpoint response (#323)

* DH-5177/fixing the empty sql queries (#325)

* DH-5175/fix_the_status_update_on_engine (#324)

* DH-5181/fixing the prompts of finetuning agent (#326)

* DH-5184/fix (#327)

* [DH-5187] Fix table-descriptions filter by table_name field (#328)

* DH-5184/fix the query parameters of finetuning (#329)

* DH-5184/fix the query parameters of finetuning

* DH-5184/reformat

* DH-5204/sql generation error (#330)

* DH-5205/chromadb script fix (#331)

* [DH-5212] Fix UTC timezone (#332)

* Fix datetime fields (#333)

* DH-5225/fix the sql injection (#336)

* Fix datetime timezone format for the responses (#337)

* DH-5232/changing the error message for no sql (#338)

* Rename migration file (#339)

* Dh 5228/docs for new endpoints (#340)

* DH-5228/add finetuning module

* DH-5228/add finetuning endpoints

* DH-5225/adding the docs for prompts

* DH-5228/add nl generations docs

* DH-5228/adding the docs for sql generations

* DH-5228/add sql generations resource

* [DH-5238] Fix migration script (#341)

* DH-5225/persist nl generations and return id (#343)

* [DH-5117] Sync-schema returns a Table descriptions list (#342)

* [DH-5239] Fix datetime format (#344)

* DH-5251/avoid question rephrasing (#346)

* DH-5253/fix the finetuning get endpoint (#347)

---------

Co-authored-by: Mohammadreza Pourreza <71866535+MohammadrezaPourreza@users.noreply.github.com>
jcjc712 and MohammadrezaPourreza authored Jan 16, 2024
1 parent 0f95783 commit a226a9d
Showing 88 changed files with 3,665 additions and 1,557 deletions.
Binary file removed .DS_Store
3 changes: 1 addition & 2 deletions .env.example
@@ -15,14 +15,13 @@ UPPER_LIMIT_QUERY_RETURN_ROWS = #The upper limit on number of rows returned from
ENCRYPT_KEY =


- GOLDEN_RECORD_COLLECTION = 'my-golden-records'
+ GOLDEN_SQL_COLLECTION = 'my-golden-records'
#Pinecone info. These fields are required if the vector store used is Pinecone
PINECONE_API_KEY =
PINECONE_ENVIRONMENT =

# Module implementations to be used names for each required component. You can use the default ones or create your own
API_SERVER = "dataherald.api.fastapi.FastAPI"
- SQL_GENERATOR = "dataherald.sql_generator.dataherald_sqlagent.DataheraldSQLAgent"
EVALUATOR = "dataherald.eval.simple_evaluator.SimpleEvaluator"
DB = "dataherald.db.mongo.MongoDB"
VECTOR_STORE = 'dataherald.vector_store.chroma.Chroma'
208 changes: 139 additions & 69 deletions dataherald/api/__init__.py
@@ -1,26 +1,39 @@
import io
from abc import ABC, abstractmethod
from typing import List

from fastapi import BackgroundTasks
from fastapi.responses import FileResponse

from dataherald.api.types import Query
from dataherald.api.types.query import Query
from dataherald.api.types.requests import (
NLGenerationRequest,
NLGenerationsSQLGenerationRequest,
PromptRequest,
PromptSQLGenerationNLGenerationRequest,
PromptSQLGenerationRequest,
SQLGenerationRequest,
UpdateMetadataRequest,
)
from dataherald.api.types.responses import (
DatabaseConnectionResponse,
GoldenSQLResponse,
InstructionResponse,
NLGenerationResponse,
PromptResponse,
SQLGenerationResponse,
TableDescriptionResponse,
)
from dataherald.config import Component
from dataherald.db_scanner.models.types import QueryHistory, TableDescription
from dataherald.sql_database.models.types import DatabaseConnection, SSHSettings
from dataherald.db_scanner.models.types import QueryHistory
from dataherald.sql_database.models.types import DatabaseConnection
from dataherald.types import (
CancelFineTuningRequest,
CreateResponseRequest,
DatabaseConnectionRequest,
Finetuning,
FineTuningRequest,
GoldenRecord,
GoldenRecordRequest,
Instruction,
GoldenSQL,
GoldenSQLRequest,
InstructionRequest,
Question,
QuestionRequest,
Response,
ScannerRequest,
TableDescriptionRequest,
UpdateInstruction,
@@ -36,43 +49,13 @@ def heartbeat(self) -> int:
@abstractmethod
def scan_db(
self, scanner_request: ScannerRequest, background_tasks: BackgroundTasks
) -> bool:
pass

@abstractmethod
def answer_question(
self,
run_evaluator: bool = True,
generate_csv: bool = False,
question_request: QuestionRequest = None,
) -> Response:
pass

@abstractmethod
def answer_question_with_timeout(
self,
run_evaluator: bool = True,
generate_csv: bool = False,
question_request: QuestionRequest = None,
) -> Response:
pass

@abstractmethod
def update_response(self, response_id: str) -> Response:
pass

@abstractmethod
def get_questions(self, db_connection_id: str | None = None) -> list[Question]:
pass

@abstractmethod
def get_question(self, question_id: str) -> Question:
) -> list[TableDescriptionResponse]:
pass

@abstractmethod
def create_database_connection(
self, database_connection_request: DatabaseConnectionRequest
) -> DatabaseConnection:
) -> DatabaseConnectionResponse:
pass

@abstractmethod
@@ -92,75 +75,85 @@ def update_table_description(
self,
table_description_id: str,
table_description_request: TableDescriptionRequest,
) -> TableDescription:
) -> TableDescriptionResponse:
pass

@abstractmethod
def list_table_descriptions(
self, db_connection_id: str, table_name: str | None = None
) -> list[TableDescription]:
) -> list[TableDescriptionResponse]:
pass

@abstractmethod
def get_table_description(self, table_description_id: str) -> TableDescription:
def get_table_description(
self, table_description_id: str
) -> TableDescriptionResponse:
pass

@abstractmethod
def add_golden_records(
self, golden_records: List[GoldenRecordRequest]
) -> List[GoldenRecord]:
def create_prompt(self, prompt_request: PromptRequest) -> PromptResponse:
pass

@abstractmethod
def execute_sql_query(self, query: Query) -> tuple[str, dict]:
def get_prompt(self, prompt_id) -> PromptResponse:
pass

@abstractmethod
def create_response(
self,
run_evaluator: bool = True,
sql_response_only: bool = False,
generate_csv: bool = False,
query_request: CreateResponseRequest = None,
) -> Response:
def update_prompt(
self, prompt_id: str, update_metadata_request: UpdateMetadataRequest
) -> PromptResponse:
pass

@abstractmethod
def get_query_history(self, db_connection_id: str) -> list[QueryHistory]:
def get_prompts(self, db_connection_id: str | None = None) -> List[PromptResponse]:
pass

@abstractmethod
def get_responses(self, question_id: str | None = None) -> list[Response]:
def add_golden_sqls(
self, golden_sqls: List[GoldenSQLRequest]
) -> List[GoldenSQLResponse]:
pass

@abstractmethod
def get_response(self, response_id: str) -> Response:
def execute_sql_query(
self, sql_generation_id: str, max_rows: int = 100
) -> tuple[str, dict]:
pass

@abstractmethod
def get_response_file(
self, response_id: str, background_tasks: BackgroundTasks
) -> FileResponse:
def export_csv_file(self, sql_generation_id: str) -> io.StringIO:
pass

@abstractmethod
def delete_golden_record(self, golden_record_id: str) -> dict:
def get_query_history(self, db_connection_id: str) -> list[QueryHistory]:
pass

@abstractmethod
def get_golden_records(
def delete_golden_sql(self, golden_sql_id: str) -> dict:
pass

@abstractmethod
def get_golden_sqls(
self, db_connection_id: str = None, page: int = 1, limit: int = 10
) -> List[GoldenRecord]:
) -> List[GoldenSQL]:
pass

@abstractmethod
def update_golden_sql(
self, golden_sql_id: str, update_metadata_request: UpdateMetadataRequest
) -> GoldenSQL:
pass

@abstractmethod
def add_instruction(self, instruction_request: InstructionRequest) -> Instruction:
def add_instruction(
self, instruction_request: InstructionRequest
) -> InstructionResponse:
pass

@abstractmethod
def get_instructions(
self, db_connection_id: str = None, page: int = 1, limit: int = 10
) -> List[Instruction]:
) -> List[InstructionResponse]:
pass

@abstractmethod
@@ -172,7 +165,7 @@ def update_instruction(
self,
instruction_id: str,
instruction_request: UpdateInstruction,
) -> Instruction:
) -> InstructionResponse:
pass

@abstractmethod
@@ -187,6 +180,83 @@ def cancel_finetuning_job(
) -> Finetuning:
pass

@abstractmethod
def get_finetunings(self, db_connection_id: str | None = None) -> list[Finetuning]:
pass

@abstractmethod
def delete_finetuning_job(self, finetuning_job_id: str) -> dict:
pass

@abstractmethod
def get_finetuning_job(self, finetuning_job_id: str) -> Finetuning:
pass

@abstractmethod
def update_finetuning_job(
self, finetuning_job_id: str, update_metadata_request: UpdateMetadataRequest
) -> Finetuning:
pass

@abstractmethod
def create_sql_generation(
self, prompt_id: str, sql_generation_request: SQLGenerationRequest
) -> SQLGenerationResponse:
pass

@abstractmethod
def create_prompt_and_sql_generation(
self, prompt_sql_generation_request: PromptSQLGenerationRequest
) -> SQLGenerationResponse:
pass

@abstractmethod
def get_sql_generations(
self, prompt_id: str | None = None
) -> list[SQLGenerationResponse]:
pass

@abstractmethod
def get_sql_generation(self, sql_generation_id: str) -> SQLGenerationResponse:
pass

@abstractmethod
def update_sql_generation(
self, sql_generation_id: str, update_metadata_request: UpdateMetadataRequest
) -> SQLGenerationResponse:
pass

@abstractmethod
def create_nl_generation(
self, sql_generation_id: str, nl_generation_request: NLGenerationRequest
) -> NLGenerationResponse:
pass

@abstractmethod
def create_sql_and_nl_generation(
self,
prompt_id: str,
nl_generation_sql_generation_request: NLGenerationsSQLGenerationRequest,
) -> NLGenerationResponse:
pass

@abstractmethod
def create_prompt_sql_and_nl_generation(
self, request: PromptSQLGenerationNLGenerationRequest
) -> NLGenerationResponse:
pass

@abstractmethod
def get_nl_generations(
self, sql_generation_id: str | None = None
) -> list[NLGenerationResponse]:
pass

@abstractmethod
def get_nl_generation(self, nl_generation_id: str) -> NLGenerationResponse:
pass

@abstractmethod
def update_nl_generation(
self, nl_generation_id: str, update_metadata_request: UpdateMetadataRequest
) -> NLGenerationResponse:
pass
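The new abstract methods replace the old question/response flow with three chained resources: a prompt, a SQL generation tied to that prompt, and an NL generation tied to that SQL generation — the chain that create_prompt_sql_and_nl_generation composes in one call. A simplified sketch of that chain using stand-in dataclasses; the field values and the run_pipeline helper are illustrative, not the repository's actual models:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stand-ins for the three chained resources; metadata and created_at
# mirror the fields added in DH-5068, but these are not the repo's models.
@dataclass
class Prompt:
    id: str
    text: str
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class SQLGeneration:
    id: str
    prompt_id: str  # links back to the Prompt
    sql: str

@dataclass
class NLGeneration:
    id: str
    sql_generation_id: str  # links back to the SQLGeneration
    text: str

def run_pipeline(question: str) -> NLGeneration:
    """Chain the resources the way the combined endpoint implies:
    prompt -> SQL generation -> NL generation."""
    prompt = Prompt(id="p1", text=question)
    sql_gen = SQLGeneration(id="s1", prompt_id=prompt.id, sql="SELECT 1")
    return NLGeneration(id="n1", sql_generation_id=sql_gen.id, text="The answer is 1.")
```

Because each resource only stores the id of its parent, the GET endpoints above can filter by prompt_id or sql_generation_id without loading the whole chain.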