Adding new tidy table article (#1204)
ivanleomk authored Nov 21, 2024
1 parent 7485747 commit 0c6de0e
Showing 13 changed files with 286 additions and 121 deletions.
2 changes: 2 additions & 0 deletions docs/blog/posts/google-openai-client.md
@@ -22,6 +22,8 @@ If you're unfamiliar with instructor, we provide a simple interface to get struc

This makes it easy to switch between providers, get reliable outputs from language models and ultimately build production grade LLM applications.

<!-- more -->

## The current state

The new integration works directly with the OpenAI client, which means that using function calling with Gemini models has become much easier. We no longer need a Gemini-specific library like `vertexai` or `google.generativeai` to define response models.
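As a minimal sketch of what this looks like, here is how a structured extraction call might be set up through the OpenAI client. The base URL, model name, and `User` model are illustrative assumptions, not the post's exact code:

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


def extract_user() -> User:
    # Requires `pip install instructor openai` and a GEMINI_API_KEY;
    # imports are deferred so the model above works without those packages.
    import os

    import instructor
    from openai import OpenAI

    # Gemini's OpenAI-compatible endpoint (an assumption to verify
    # against Google's current documentation)
    client = instructor.from_openai(
        OpenAI(
            api_key=os.environ["GEMINI_API_KEY"],
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        )
    )
    return client.chat.completions.create(
        model="gemini-1.5-flash",  # illustrative model name
        messages=[{"role": "user", "content": "Ivan is 28 years old"}],
        response_model=User,
    )
```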
Binary file added docs/blog/posts/img/untidy_table.png
@@ -1,23 +1,24 @@
 ---
 authors:
-- ivanleomk
-- sarahchieng
+  - ivanleomk
+  - sarahchieng
 categories:
-- API Development
-- Pydantic
-- Performance Optimization
+  - API Development
+  - Pydantic
+  - Performance Optimization
 comments: true
 date: 2024-10-15
-description: Learn how to use Cerebras Inference for structured outputs, faster model
+description:
+  Learn how to use Cerebras Inference for structured outputs, faster model
   inference, and seamless integration with Pydantic models.
 draft: false
 slug: introducing-structured-outputs-with-cerebras-inference
 tags:
-- Cerebras Inference
-- Pydantic
-- API Integration
-- Fast Inference
-- Structured Outputs
+  - Cerebras Inference
+  - Pydantic
+  - API Integration
+  - Fast Inference
+  - Structured Outputs
 ---

# Introducing structured outputs with Cerebras Inference
@@ -32,6 +33,8 @@ Sign up for a Cerebras Inference API key here at [cloud.cerebras.ai](http://clou

To get guaranteed structured outputs with Cerebras Inference, you:

<!-- more -->

1. Create a new Instructor client with the `from_cerebras` method
2. Define a Pydantic model to pass into the `response_model` parameter
3. Get back a validated response exactly as you would expect
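The three steps above can be sketched as follows. The `from_cerebras` method is real, but the model name and prompt here are illustrative assumptions:

```python
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


def extract_person(text: str) -> Person:
    # Requires `pip install "instructor[cerebras]"` and a CEREBRAS_API_KEY;
    # imports are deferred so the model above works without the SDK.
    import instructor
    from cerebras.cloud.sdk import Cerebras

    # 1. Create a new Instructor client with the `from_cerebras` method
    client = instructor.from_cerebras(Cerebras())
    # 2. Pass a Pydantic model into the `response_model` parameter
    # 3. The result comes back as a validated `Person` instance
    return client.chat.completions.create(
        model="llama3.1-70b",  # illustrative model name
        messages=[{"role": "user", "content": text}],
        response_model=Person,
    )
```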
@@ -125,4 +128,4 @@ for person in resp:
# > Person(name='Jessica', age=26)
```

And that’s it! We're excited to see what you build with Instructor and Cerebras! If you have any questions about Cerebras or need to get off the API key waitlist, please reach out to sarah.chieng@cerebras.net.
22 changes: 12 additions & 10 deletions docs/blog/posts/llm-as-reranker.md
@@ -1,19 +1,19 @@
 ---
 authors:
-- jxnl
+  - jxnl
 categories:
-- LLM
-- Pydantic
+  - LLM
+  - Pydantic
 comments: true
 date: 2024-10-23
 description: Learn how to use Instructor and Pydantic to create an LLM-based reranker for improving search results relevance.
 draft: false
 tags:
-- LLM
-- Pydantic
-- Instructor
-- Search Relevance
-- Reranking
+  - LLM
+  - Pydantic
+  - Instructor
+  - Search Relevance
+  - Reranking
 ---

# Building an LLM-based Reranker for your RAG pipeline
@@ -30,6 +30,8 @@ In this blog post, we'll show you how to create an LLM-based reranker using Inst

By the end of this tutorial, you'll be able to implement an LLM reranker to label your synthetic data for fine-tuning a traditional reranker, or to build out an evaluation pipeline for your RAG system. Let's dive in!

<!-- more -->

## Setting Up the Environment

First, let's set up our environment with the necessary imports:
@@ -167,7 +169,7 @@ If you want to extend this example, you could use the `rerank_results` function

We could also add a validator to the `Label.chunk_id` field to ensure that the chunk_id is present in the `chunks` list. This is useful if the ids are `uuids` or complex strings and we want to ensure that each chunk_id is a valid reference into the chunks list.

Here's an example:

```python
class Label(BaseModel):
@@ -184,4 +186,4 @@ class Label(BaseModel):
return v
```

This will automatically check that the `chunk_id` is present in the `chunks` list and raise a `ValueError` if it is not, where `context` is the context dictionary that we passed into the `rerank_results` function.
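A self-contained sketch of such a validator follows. The field names and chunk ids here are illustrative assumptions, not the post's exact code:

```python
from pydantic import BaseModel, ValidationInfo, field_validator


class Label(BaseModel):
    chunk_id: str
    relevancy: int

    @field_validator("chunk_id")
    @classmethod
    def validate_chunk_id(cls, v: str, info: ValidationInfo) -> str:
        # `info.context` is the dict passed via `context=` at validation time
        context = info.context or {}
        chunks = context.get("chunks", [])
        if v not in {c["id"] for c in chunks}:
            raise ValueError(f"chunk_id {v!r} not found in chunks")
        return v


chunks = [{"id": "chunk-1", "text": "..."}, {"id": "chunk-2", "text": "..."}]
label = Label.model_validate(
    {"chunk_id": "chunk-1", "relevancy": 8},
    context={"chunks": chunks},
)
```

Validating a `chunk_id` that is not in the context raises a `ValidationError`, which Instructor can surface back to the model for a retry.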
47 changes: 25 additions & 22 deletions docs/blog/posts/multimodal-gemini.md
@@ -1,19 +1,19 @@
 ---
 authors:
-- ivanleomk
+  - ivanleomk
 categories:
-- Gemini
-- Multimodal
+  - Gemini
+  - Multimodal
 comments: true
 date: 2024-10-23
 description: Learn how to use Google's Gemini model for multimodal structured extraction of YouTube videos, extracting structured recommendations for tourist destinations.
 draft: false
 tags:
-- Gemini
-- Multimodal AI
-- Travel Recommendations
-- Pydantic
-- Python
+  - Gemini
+  - Multimodal AI
+  - Travel Recommendations
+  - Pydantic
+  - Python
 ---

# Structured Outputs with Multimodal Gemini
@@ -30,6 +30,8 @@ import instructor
import google.generativeai as genai
```

<!-- more -->

## Defining Our Data Models

We'll use Pydantic to define our data models for tourist destinations and recommendations:
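Based on the fields visible in the output further below, the models might look roughly like this (a sketch, not the post's exact definitions; the `Recomendations` spelling follows the output shown):

```python
from pydantic import BaseModel


class TouristDestination(BaseModel):
    name: str
    description: str
    location: str


class Recomendations(BaseModel):  # spelling matches the printed output below
    chain_of_thought: str
    description: str
    destinations: list[TouristDestination]
```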
@@ -86,27 +88,27 @@ print(resp)

```python
Recomendations(
chain_of_thought='The video recommends visiting Takayama city, in the Hida Region, Gifu Prefecture. The
video suggests visiting the Miyagawa Morning Market, to try the Sarubobo good luck charms, and to enjoy the
cookie cup espresso, made by Koma Coffee. Then, the video suggests visiting a traditional Japanese Cafe,
called Kissako Katsure, and try their matcha and sweets. Afterwards, the video suggests to visit the Sanmachi
Historic District, where you can find local crafts and delicious foods. The video recommends trying Hida Wagyu
beef, at the Kin no Kotte Ushi shop, or to have a sit-down meal at the Kitchen Hida. Finally, the video
recommends visiting Shirakawa-go, a World Heritage Site in Gifu Prefecture.',
description='This video recommends a number of places to visit in Takayama city, in the Hida Region, Gifu
Prefecture. It shows some of the local street food and highlights some of the unique shops and restaurants in
the area.',
destinations=[
TouristDestination(
name='Takayama',
description='Takayama is a city at the base of the Japan Alps, located in the Hida Region of
Gifu.',
location='Hida Region, Gifu Prefecture'
),
TouristDestination(
name='Miyagawa Morning Market',
description="The Miyagawa Morning Market, or the Miyagawa Asai-chi in Japanese, is a market that
has existed officially since the Edo Period, more than 100 years ago. It's open every single day, rain or
shine, from 7am to noon.",
location='Hida Takayama'
),
@@ -117,19 +119,19 @@ print(resp)
),
TouristDestination(
name='Koma Coffee',
description="Koma Coffee is a shop that has been in business for about 50 or 60 years, and they
serve coffee in a cookie cup. They've been serving coffee for about 10 years.",
location='Hida Takayama'
),
TouristDestination(
name='Kissako Katsure',
description='Kissako Katsure is a traditional Japanese style cafe, called Kissako, and the name
means would you like to have some tea. They have a variety of teas and sweets.',
location='Hida Takayama'
),
TouristDestination(
name='Sanmachi Historic District',
description='Sanmachi Dori is a Historic Merchant District in Takayama, all of the buildings here
have been preserved to look as they did in the Edo Period.',
location='Hida Takayama'
),
@@ -146,7 +148,7 @@ print(resp)
),
TouristDestination(
name='Kin no Kotte Ushi',
description='Kin no Kotte Ushi is a shop known for selling Beef Sushi, especially Hida Wagyu Beef
Sushi. Their sushi is medium rare.',
location='Hida Takayama'
),
@@ -202,6 +204,7 @@ To address these limitations and expand the capabilities of our video analysis s
2. **Speaker Diarization**: Implement speaker recognition to attribute recommendations to specific individuals. This could be particularly useful for videos featuring multiple hosts or interviewees.

3. **Segment-based Analysis**: Process longer videos in segments to maintain accuracy and capture all relevant information. This approach could involve:

- Splitting the video into smaller chunks
- Analyzing each chunk separately
- Aggregating and deduplicating results
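The segment-based approach could be sketched as follows. The chunking and deduplication logic here is illustrative, and `analyze_chunk` stands in for a per-segment model call:

```python
from typing import Callable


def dedupe_by_name(destinations: list[dict]) -> list[dict]:
    """Keep the first destination seen for each name."""
    seen: set[str] = set()
    unique: list[dict] = []
    for d in destinations:
        if d["name"] not in seen:
            seen.add(d["name"])
            unique.append(d)
    return unique


def analyze_video(
    chunks: list[str], analyze_chunk: Callable[[str], list[dict]]
) -> list[dict]:
    # Analyze each chunk separately, then aggregate and deduplicate results
    results: list[dict] = []
    for chunk in chunks:
        results.extend(analyze_chunk(chunk))
    return dedupe_by_name(results)
```

Deduplicating by name is a simple heuristic; a production pipeline might instead merge descriptions or use fuzzy matching across segments.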
16 changes: 9 additions & 7 deletions docs/blog/posts/openai-multimodal.md
@@ -1,24 +1,26 @@
 ---
 authors:
-- jxnl
+  - jxnl
 categories:
-- OpenAI
-- Audio
+  - OpenAI
+  - Audio
 comments: true
 date: 2024-10-17
 description: Explore the new audio capabilities in OpenAI's Chat Completions API using the gpt-4o-audio-preview model.
 draft: false
 tags:
-- OpenAI
-- Audio Processing
-- API
-- Machine Learning
+  - OpenAI
+  - Audio Processing
+  - API
+  - Machine Learning
 ---

# Audio Support in OpenAI's Chat Completions API

OpenAI has recently introduced audio support in their Chat Completions API, opening up exciting new possibilities for developers working with audio and text interactions. This feature is powered by the new `gpt-4o-audio-preview` model, which brings advanced voice capabilities to the familiar Chat Completions API interface.

<!-- more -->
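A request to the new model follows the familiar Chat Completions shape, with audio sent as a base64-encoded content part. This is a sketch of the payload only; the voice and format values are illustrative:

```python
import base64


def build_audio_request(wav_bytes: bytes, prompt: str) -> dict:
    """Build a Chat Completions payload with a base64-encoded audio input part."""
    encoded = base64.b64encode(wav_bytes).decode("utf-8")
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "wav"},  # illustrative settings
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {"data": encoded, "format": "wav"},
                    },
                ],
            }
        ],
    }
```

The resulting dict can be passed as keyword arguments to `client.chat.completions.create`.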

## Key Features

The new audio support in the Chat Completions API offers several compelling features:
18 changes: 10 additions & 8 deletions docs/blog/posts/pairwise-llm-judge.md
@@ -1,19 +1,19 @@
 ---
 authors:
-- jxnl
+  - jxnl
 categories:
-- LLM
-- Pydantic
+  - LLM
+  - Pydantic
 comments: true
 date: 2024-10-17
 description: Explore how to use Instructor and Pydantic to create a pairwise LLM judge for evaluating text relevance.
 draft: false
 tags:
-- LLM
-- Pydantic
-- Instructor
-- Text Relevance
-- AI Evaluation
+  - LLM
+  - Pydantic
+  - Instructor
+  - Text Relevance
+  - AI Evaluation
 ---

# Building a Pairwise LLM Judge with Instructor and Pydantic
@@ -24,6 +24,8 @@ In this blog post, we'll explore how to create a pairwise LLM judge using Instru

Evaluating text relevance is a common task in natural language processing and information retrieval. By leveraging large language models (LLMs) and structured outputs, we can create a system that judges the similarity or relevance between a question and a given text.

<!-- more -->
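Such a judge's response model might be sketched like this (the field names are illustrative assumptions, not the post's exact code):

```python
from pydantic import BaseModel, Field


class Judgment(BaseModel):
    chain_of_thought: str = Field(
        description="Reasoning about whether the text answers the question"
    )
    relevant: bool = Field(
        description="Whether the text is relevant to the question"
    )
```

Passing a model like this as `response_model` forces the LLM to reason before committing to a boolean verdict.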

## Setting Up the Environment

First, let's set up our environment with the necessary imports:
32 changes: 16 additions & 16 deletions docs/blog/posts/parea.md
@@ -1,20 +1,21 @@
 ---
 authors:
-- jxnl
-- joschkabraun
+  - jxnl
+  - joschkabraun
 categories:
-- LLM Observability
+  - LLM Observability
 comments: true
 date: 2024-07-17
-description: Explore how Parea enhances the OpenAI instructor, enabling better monitoring,
+description:
+  Explore how Parea enhances the OpenAI instructor, enabling better monitoring,
   collaboration, and error tracking for LLM applications.
 draft: false
 tags:
-- Parea
-- OpenAI
-- LLM
-- instructor
-- validation
+  - Parea
+  - OpenAI
+  - LLM
+  - instructor
+  - validation
 ---

# Parea for Observing, Testing & Fine-tuning of Instructor
@@ -29,10 +30,11 @@ tags:

Before starting this tutorial, make sure that you've registered for a [Parea](https://www.parea.ai) account. You'll also need to create an [API key](https://docs.parea.ai/api-reference/authentication).


## Example: Writing Emails with URLs from Instructor Docs

We will demonstrate Parea by using `instructor` to write emails which only contain URLs from the `instructor` docs. We'll need to install our dependencies before proceeding, so simply run the command below.

<!-- more -->

```bash
pip install -U parea-ai instructor
@@ -133,13 +135,11 @@ To take a look at the trace of this execution, check out the screenshot below. Noticea

![](./img/parea/trace.png)


Above we can see that while the email was successfully created, there was a validation error, which introduced additional cost and latency because the initial validation failed.
Below we can see a visualization of the average validation error count for our instructor usage over time.

![](./img/parea/validation-error-chart.png)


## Label Responses for Fine-Tuning

Sometimes you may want to let subject-matter experts (SMEs) label responses to use them for fine-tuning. Parea provides a way to do this via an annotation queue. Editing raw JSON objects to correct tool use and function calling responses can be error-prone, especially for non-devs. For that purpose, Parea has a [Form Mode](https://docs.parea.ai/manual-review/overview#labeling-function-calling-tool-use-responses) which allows the user to safely fill out a form instead of editing the JSON object. The labeled data can then be exported and used for fine-tuning.
@@ -152,9 +152,9 @@ Sometimes you may want to let subject-matter experts (SMEs) label responses to u

```python hl_lines="5 6"
from parea import Parea

p = Parea(api_key=os.getenv("PAREA_API_KEY"))

dataset = p.get_collection(DATASET_ID) #(1)!
dataset.write_to_finetune_jsonl("finetune.jsonl") #(2)!
```
@@ -166,4 +166,4 @@ Sometimes you may want to let subject-matter experts (SMEs) label responses to u

```bash
instructor jobs create-from-file finetune.jsonl
```
