by tsdocode from CoE
Use your KMS email to access resources.
.
├── README.md
├── Training
│ ├── Example.ipynb # Example notebook for Colab training
│ ├── README.md
│ ├── __init__.py
│ ├── build-dataset.py # Apply preprocessing and build txt dataset from JSON
│ ├── cloud.py # Using OwnCloud services
│ ├── model.py # GPT model
│ ├── preprocess.py # Apply preprocessing
│ ├── requirements.txt # Required Python packages
│ └── train.py # Train GPT model
├── WebApp
│ ├── API
│ │ ├── main.py
│ │ ├── requirements.txt
│ │ └── server
│ │ ├── copy-saved-model-here
│ │ ├── app.py
│ │ ├── text2sql
│ │ │ ├── __init__.py
│ │ │ ├── gpt_model.py # GPT model
│ │ │ ├── postprocess.py # Postprocess for inference
│ │ │ ├── preprocess.py # Preprocess for inference
│ │ │ └── text2sql.py # Text2SQL model
│ │ ├── models # API model
│ │ │ ├── GPT.py # GPT request format
│ │ │ └── reponse.py # Response format for API
│ │ └── routes
│ │ ├── GPT.py # GPT routes
│ │ └── __init__.py
│ └── UI
│ ├── app.py # UI with Streamlit
│ ├── log.txt # log file
│ └── utils.py # Helper functions for the UI
└── requirements.txt
1.1 Install Miniconda
Follow these instructions to install Miniconda. Skip this step if you are using Google Colab or another hosted environment.
If you are using Google Colab, add the ! symbol before each command below.
Example:
!pip install -r requirements.txt
1.2 Install Libraries
git clone
cd
pip install -r requirements.txt
Sample data format
{
  "data": [
    {
      "schema": "",
      "question": "",
      "sql": ""
    },
    {
      "schema": "",
      "question": "",
      "sql": ""
    }
  ]
}
Collect and annotate your data with this format.
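As a sanity check before building the dataset, the annotated file can be validated with a short script. This is an illustrative helper, not part of the repo:

```python
import json

REQUIRED_KEYS = {"schema", "question", "sql"}

def validate_dataset(path):
    """Check that a JSON file follows the {"data": [...]} format above."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    assert isinstance(payload.get("data"), list), "top-level 'data' must be a list"
    for i, record in enumerate(payload["data"]):
        missing = REQUIRED_KEYS - record.keys()
        assert not missing, f"record {i} is missing keys: {missing}"
    return len(payload["data"])
```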
cd Training
python build-dataset.py -i <input_file> -o <output_file>
If you don't provide --output-file, the output file name will be the same as --input-file.
You can also customize the txt dataset by modifying the make_prompt() function in dataset.py.
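For reference, a make_prompt() along these lines turns one annotated record into a single training example. The tag-delimited layout below is an assumption for illustration; the actual format in dataset.py may differ:

```python
def make_prompt(record):
    """Flatten one {"schema", "question", "sql"} record into a text prompt.

    The <|schema|>/<|question|>/<|sql|> tags are illustrative separators,
    not necessarily the ones used by the repository's dataset.py.
    """
    return (
        f"<|schema|> {record['schema']}\n"
        f"<|question|> {record['question']}\n"
        f"<|sql|> {record['sql']}"
    )
```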
python train.py -i <input_data> -o <output_model_folder> -m <model_name> -p <pretrained_model_path> -e <epochs> -l <learning_rate>
default:
- model_name: 125M
- epochs: 1
- learning_rate: 5e-5
If you want to continue training from a previous model, use -p to specify the path of the previous model.
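Assuming the flags behave as listed above, the CLI and its defaults can be modelled with argparse like this. This is a sketch of the assumed interface, not the repo's actual train.py:

```python
import argparse

def build_arg_parser():
    """Mirror the train.py flags described above, with their stated defaults."""
    parser = argparse.ArgumentParser(description="Train a GPT text2sql model")
    parser.add_argument("-i", "--input-data", required=True)
    parser.add_argument("-o", "--output-model-folder", required=True)
    parser.add_argument("-m", "--model-name", default="125M")
    parser.add_argument("-p", "--pretrained-model-path", default=None)
    parser.add_argument("-e", "--epochs", type=int, default=1)
    parser.add_argument("-l", "--learning-rate", type=float, default=5e-5)
    return parser
```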
cd WebApp
- Copy your model into
WebApp/API/server/copy-saved-model-here
- Set your model path in WebApp/API/server/text2sql/text2sql.py
model = GPTModel(model_path="./API/server/copy-saved-model-here")
- In a terminal, run:
python API/main.py & streamlit run UI/app.py
API port: 8001, UI port: 8000
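Once both processes are up, the API can be exercised from Python. The /gpt endpoint path and payload field names below are guesses based on the routes and models folders in the tree above; check WebApp/API/server/routes/GPT.py and models/GPT.py for the real ones:

```python
import json
from urllib import request

def build_payload(question, schema):
    """Assemble the JSON request body; field names are assumptions."""
    return {"question": question, "schema": schema}

def query_api(question, schema, base_url="http://localhost:8001"):
    """POST a question to the running API and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/gpt",  # hypothetical route; verify against routes/GPT.py
        data=json.dumps(build_payload(question, schema)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```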