• 0 Posts
  • 1 Comment
Joined 1 year ago
cake
Cake day: June 12th, 2023

help-circle
  • Wait a second here… I skimmed the paper and GitHub and didn’t find an answer to a very important question: is this GPT3.5 or 4? There’s a huge difference in code quality between the two and either they made a giant accidental omission or they are being intentionally misleading. Please correct me if I missed where they specified that. I’m assuming they were using GPT3.5, so yeah those results would be as expected. On the HumanEval benchmark, GPT4 gets 67% and that goes up to 90% with reflexion prompting. GPT3.5 gets 48.1%, which is exactly what this paper is saying. (source).