OpenAI representation fails to produce output when response content is None #2176

jeaninejuliettes · 2024-10-11T06:15:14Z

Have you searched existing issues? 🔎

I have searched and found no existing issues

Describe the bug

I ran into issues when using the OpenAI representation as it sometimes produces a content of None, which then produced an error when trying to run:
label = response.choices[0].message.content.strip().replace("topic: ", "")

Which makes sense, since the content is not a string.
I'm unable to generate a minimal example since this is due to the output of OpenAI GPT.

I see two ways to work around this, but both have their own downsides/impact on the results. Maybe someone else sees a better option:

Set the content to type string before processing it any further, with the major downside that the label will then be set to the string 'None'.
Use a try/except to extract the content, strip it, and replace the 'topic: ' part of the string. If this fails, the label is set to a fixed value such as an empty string (and a warning is produced that this has happened).

For now I fixed it by creating an inherited customOpenAI representation class within my script where I used the second option as a solution.
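A minimal sketch of that second option, written as a standalone helper rather than the actual subclass. The names extract_label and fake_response are hypothetical, and fake_response only imitates the shape of the OpenAI response object, so treat this as an illustration under those assumptions and not as BERTopic's code:

```python
import warnings
from types import SimpleNamespace


def extract_label(response, fallback=""):
    """Return the cleaned label, or `fallback` when content cannot be stripped."""
    try:
        return response.choices[0].message.content.strip().replace("topic: ", "")
    except AttributeError:
        # Raised when content is None (NoneType has no .strip())
        # or when the message has no content attribute at all.
        warnings.warn("OpenAI returned no usable content; using fallback label.")
        return fallback


def fake_response(content):
    # Hypothetical stand-in for the OpenAI response object (illustration only)
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


print(extract_label(fake_response("topic: pricing complaints")))
print(extract_label(fake_response(None)))
```

The fallback is an empty string here, so affected topics end up unlabeled instead of being labeled 'None'; the warning makes it visible that this happened.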

Reproduction

from bertopic import BERTopic

BERTopic Version

0.16.4

MaartenGr · 2024-10-11T06:29:21Z

Thank you for sharing this. I see that you opened a similar issue (#2177). Are you alright with closing that one? To me, they seem like duplicates.

With respect to your issue, the idea of content violation was mentioned in earlier issues and addressed with the following:

BERTopic/bertopic/representation/_openai.py

Lines 232 to 237 in 9518035

    
        # Check whether content was actually generated
        # Addresses #1570 for potential issues with OpenAI's content filter
        if hasattr(response.choices[0].message, "content"):
            label = response.choices[0].message.content.strip().replace("topic: ", "")
        else:
            label = "No label returned"

Which makes it rather surprising that you get this issue. It may be that the API of OpenAI was updated and now always returns "content" but I'm not sure. Either way, simply doing an additional check here makes sense to me.

jeaninejuliettes · 2024-10-11T08:39:20Z

No, I'm sorry this was unclear. For this specific issue I don't get any errors regarding content violation. It simply seems that the result of response.choices[0].message returns None, which then produces an error, since you can't use strip on a NoneType object. I don't know when or why this happens, but it doesn't seem to be the result of an error produced by the API, since the response object exists.

That is also the reason why I created a separate "issue" (discussion/question) for the content violation: I gathered from the code that it was supposed to have been fixed, but unfortunately I'm still running into it. But that is a discussion for #2177 as far as I'm concerned. They don't seem to be related (as far as I can tell).

MaartenGr · 2024-10-11T10:03:19Z

I think that this:

I ran into issues when using the OpenAI representation as it sometimes produces a content of None, which then produced an error when trying to run:
label = response.choices[0].message.content.strip().replace("topic: ", "")

and this:

response.choices[0].message returns None

contradict one another. The reason I think so is that you shouldn't be able to reach label = ... at all, because there is this check (which is used for content violation):

BERTopic/bertopic/representation/_openai.py

Lines 232 to 237 in 9518035

    
        # Check whether content was actually generated
        # Addresses #1570 for potential issues with OpenAI's content filter
        if hasattr(response.choices[0].message, "content"):
            label = response.choices[0].message.content.strip().replace("topic: ", "")
        else:
            label = "No label returned"

Thus, response.choices[0].message returns None cannot be the case because there is a check to see whether it contains the attribute "content", right? Or did you mean that "content" returns None? If so, then the OpenAI API might have changed, since it didn't show that behavior before.
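To illustrate the second case: if the attribute exists but its value is None, the hasattr check passes and .strip() is still called on None. A small sketch, where the SimpleNamespace object is a hypothetical stand-in for response.choices[0].message and the isinstance check is only one possible form of the additional check:

```python
from types import SimpleNamespace

# Hypothetical stand-in for response.choices[0].message
message = SimpleNamespace(content=None)

# The attribute exists, so the existing hasattr check passes ...
print(hasattr(message, "content"))  # True

# ... but the value is None, so calling .strip() on it would raise
# AttributeError. Checking the value itself avoids that.
content = getattr(message, "content", None)
if isinstance(content, str):
    label = content.strip().replace("topic: ", "")
else:
    label = "No label returned"

print(label)  # No label returned
```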

Looking through the issues, it seems that this was mentioned before, and there is a PR for it that hasn't been updated in a couple of months. API changes might be relevant here, but so might the reason why you get a None, which typically is a content violation issue. Based on what I see, I'm convinced they relate to one another, since the None you get is typically some sort of content violation issue.

jeaninejuliettes · 2024-10-11T11:09:32Z

Yes, I mean that the content returns None. The response exists and the content element does exist in the response object, but the content it returns is empty.
Ah, I didn't see that issue (apologies), but it is the exact error message I'm seeing. And reading through the issue, it looks quite similar. But the PR is inactive?

Funny thing is, I'm still also getting content violation errors, but let's keep that out of this discussion for now ;)

MaartenGr · 2024-10-15T13:50:41Z

It does seem to be inactive and unfortunately, I currently do not have the time to look it over. I would also be alright with a small PR just making sure it gives no error. Any additional work can be done later.

jeaninejuliettes · 2024-10-16T06:25:32Z

Ok, I can look into that!

jeaninejuliettes added the "bug" label (Something isn't working) on Oct 11, 2024
