One thing I found really interesting was the LLM's ability to inspect the COM files of the ZEXALL / ZEXCOM tests for the Z80, easily spot the CP/M syscalls they use (three in total), and implement them for the extended Z80 test (executed by make fulltest). So, at this point, why not implement a full CP/M environment? Same process again, same good result in a matter of minutes. This time I interacted with it a bit more for the VT100 / ADM3 terminal escape conversions and reported things that initially did not work in WordStar, and within a few minutes everything I tested was working well enough. There are still fixes to do, like simulating a 2 MHz clock: right now it runs at full speed, which makes CP/M games impossible to use.
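To give an idea of how small that syscall layer can be, here is a minimal sketch in Python, not the code from the emulator itself. The text above does not say which three calls are involved, so this assumes BDOS functions 2 (console output) and 9 (print a '$'-terminated string), plus program termination modelled as function 0. The names bdos_call, memory and out are hypothetical. The emulator would call the handler whenever the Z80 program counter reaches address 0x0005, the CP/M BDOS entry point, with the function number in register C and the argument in DE.

```python
def bdos_call(c, de, memory, out):
    """Handle one CP/M BDOS call (hypothetical helper, illustration only).

    c      -- function number from register C
    de     -- 16-bit value of register pair DE (E is its low byte)
    memory -- 64 KiB bytearray holding the emulated RAM
    out    -- list that collects the characters written to the console

    Returns False when the program asked to terminate, True otherwise.
    """
    if c == 0:
        # Assumed termination call: stop running the test program.
        return False
    if c == 2:
        # Console output: the character is in register E.
        out.append(chr(de & 0xFF))
        return True
    if c == 9:
        # Print string: DE points to text terminated by '$'.
        addr = de & 0xFFFF
        for _ in range(0x10000):  # bound the scan to one pass over RAM
            byte = memory[addr]
            if byte == ord("$"):
                break
            out.append(chr(byte))
            addr = (addr + 1) & 0xFFFF  # wrap around the 16-bit address space
        return True
    raise NotImplementedError(f"BDOS function {c} is not handled in this sketch")
```

After handling the call, the emulator would also need to simulate a RET so that execution resumes in the test program. That part depends on the CPU core and is left out here.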
Testing LLM reasoning abilities with SAT is not an original idea; a recent study thoroughly tested models such as GPT-4o and found that, for hard enough problems, every model degrades to random guessing. But I couldn't find any research that used newer models like the ones I used. It would be nice to see that kind of thorough testing repeated with newer models.