TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

dgerard@awful.systems

Apple: ‘Reasoning’ AIs fail hard if they actually have to think (pivot-to-ai.com)

submitted 2 days ago by dgerard@awful.systems to c/techtakes@awful.systems

Video version

[–] YourNetworkIsHaunted@awful.systems 7 points 1 day ago (1 children)

That would be the best way to actively catch the cheating happening here, given that the training datasets remain confidential. But I also don't know that it would be conclusive or convincing unless you could be certain that the problems in the private set were similar to the public set.

In any case, either you're double-dipping for credit in multiple places or you absolutely should get more credit for the scoop here.

[–] diz@awful.systems 6 points 1 day ago

I’d just write the list, then assign randomly. Or perhaps pseudorandomly, like sorting by hash and then splitting in two.
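A minimal sketch of the sort-by-hash split, assuming each puzzle is identified by a unique text string. The function name `split_by_hash` and the choice of SHA-256 are illustrative assumptions, not something specified in the thread.

```python
import hashlib


def split_by_hash(puzzles):
    """Split puzzle names into two halves, ordered by the SHA-256 of each name.

    The result depends only on the names themselves, not on the order in
    which they were written down, so the author cannot steer the assignment.
    """
    ordered = sorted(
        puzzles,
        key=lambda name: hashlib.sha256(name.encode("utf-8")).hexdigest(),
    )
    mid = len(ordered) // 2
    # First half goes public, second half is held back (hypothetical labels).
    return ordered[:mid], ordered[mid:]


if __name__ == "__main__":
    names = [f"puzzle-{i}" for i in range(20)]  # placeholder names
    public_set, held_back_set = split_by_hash(names)
    print(len(public_set), len(held_back_set))
```

Because the hash is deterministic, anyone with the full list can later verify that the split was not hand-picked.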

One problem is that it is hard to come up with 20 or more completely unrelated puzzles.

Although I don’t think we need a large number for statistical significance here, if it’s something like 8/10 solved in the cheating set and 2/10 in the hold-back set.
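As a rough check on that intuition, here is a sketch of a one-sided Fisher's exact test on the hypothetical 8/10 versus 2/10 outcome, using only the standard library. Fisher's test is my choice of test here, not one named in the thread, and the function name `fisher_one_sided` is made up for illustration.

```python
from math import comb


def fisher_one_sided(solved_a, n_a, solved_b, n_b):
    """One-sided Fisher's exact test for two groups of puzzles.

    Returns the probability, under the null hypothesis that both sets are
    equally hard, of set A having at least `solved_a` of the solved puzzles.
    """
    total = n_a + n_b
    solved = solved_a + solved_b
    denom = comb(total, n_a)
    upper = min(n_a, solved)
    # Sum the hypergeometric tail: k solved puzzles landing in set A.
    tail = sum(
        comb(solved, k) * comb(total - solved, n_a - k)
        for k in range(solved_a, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical outcome from the comment: 8/10 public, 2/10 held back.
    print(fisher_one_sided(8, 10, 2, 10))
```

For these numbers the one-sided p-value comes out at roughly 0.012, so ten puzzles per set would be enough to clear the usual 0.05 threshold if the gap really were that large.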