Vinayak Vikram

Writing & Training a Science Bowl Set Generator

vinayak-vikram/bowler vinayak-vikram/scibowlsetsannotated

I began by writing a simple Rust program that extracts the text contents of a PDF, to speed things up. I then used it to export a batch of sets to .txt files (see the scibowlsetsannotated repo).
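
The batch export step could look roughly like the sketch below. This is not the code from the bowler repo. The names `export_sets` and `extract` are hypothetical, and `extract` stands in for whatever PDF-to-text routine is actually used, so the sketch covers only the directory walk and the .txt writing.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write one `.txt` file into `out_dir` for every `.pdf` file found in `src_dir`.
/// `extract` is a hypothetical placeholder for the real PDF-to-text step.
/// Returns the sorted list of files that were written.
fn export_sets<F>(src_dir: &Path, out_dir: &Path, extract: F) -> io::Result<Vec<PathBuf>>
where
    F: Fn(&Path) -> io::Result<String>,
{
    fs::create_dir_all(out_dir)?;
    let mut written = Vec::new();

    for entry in fs::read_dir(src_dir)? {
        let path = entry?.path();

        // Only regular files with a .pdf extension (case-insensitive) are exported.
        let is_pdf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"));
        if !path.is_file() || !is_pdf {
            continue;
        }

        let stem = match path.file_stem() {
            Some(stem) => stem,
            None => continue,
        };

        // Append ".txt" to the stem so names containing dots (e.g. "set.2") stay intact.
        let mut name = stem.to_os_string();
        name.push(".txt");
        let out_path = out_dir.join(name);

        fs::write(&out_path, extract(&path)?)?;
        written.push(out_path);
    }

    written.sort();
    Ok(written)
}
```

Taking the extractor as a closure keeps the sketch independent of any particular PDF crate. A call would pass the input folder, the output folder, and the real extraction function.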

My first thought was a simple pipeline: an LLM generates around 3 questions per page, those questions pass through a filtering and classification algorithm/model, and multiple-choice questions (MCQs) then go back into an LLM for option generation. However, I realized that a fixed per-page count could go badly wrong when a page is overly dense or overly sparse. My current idea is something like: