Discussion about this post

Tyler Sloan:

Thanks for this article, very interesting!

A major takeaway I've kept with me since the Good Research Code Handbook (despite a complete overhaul of the way we code since then) has been the philosophy of externalizing as much as possible with good structure, so there is less decision fatigue and you can spend as much of your working memory as possible on the task at hand.

I've been working with a different set of tools, but brushing up against some fairly similar challenges. I mostly use VS Code via Copilot agents (usually ChatGPT), but much of the structure is similar. My codebase was getting out of hand, so I found it very useful to codify the rules for which kinds of functions go into which submodules as a set of agent instructions, and to have specific agents for certain kinds of tasks (core, exploratory, hygiene, etc.). I also keep a document with all of the function docstrings, so every time an agent is prompted to do something, it first checks the existing codebase for reusable code, decides based on our rules where a new function should go, and updates the code index whenever the core codebase changes. I finally have my working memory back.
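As a rough illustration of the "code index" idea, here is a minimal sketch of how such an index could be regenerated mechanically. All names here are made up for illustration, and it assumes a plain Python package; the workflow described above may differ:

```python
# Hypothetical sketch: walk a package, pull every function's name and first
# docstring line into one document that an agent can read before writing
# new code. File and function names are illustrative, not an existing tool.
import ast
from pathlib import Path

def build_docstring_index(package_dir: str, out_file: str = "CODE_INDEX.md") -> None:
    entries = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                doc = ast.get_docstring(node) or "(no docstring)"
                # Record module path, function name, and the docstring summary.
                entries.append(f"- {path}::{node.name}: {doc.splitlines()[0]}")
    Path(out_file).write_text("\n".join(entries), encoding="utf-8")

if __name__ == "__main__":
    build_docstring_index("src")  # rerun after each change to the core codebase
```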

I'm curious to hear your thoughts on a 'chain of prompts': logging user prompts to be able to reconstruct the thought process that went into whatever was generated. Ultimately that's what a scientist is being trained to do: ask the right questions. If I were a PI and my students were using LLMs for scientific coding, I would want to be able to check that - as if it were a lab book. It would also incentivize students to prompt as clearly and unambiguously as possible - a net positive for good outcomes, and good thinking.
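For concreteness, here is a minimal sketch of what such prompt logging could look like, assuming nothing beyond the standard library; the file name, fields, and helper are hypothetical, not an existing tool:

```python
# Hedged sketch of a "chain of prompts" lab book: append every prompt, with a
# timestamp and a pointer to what it produced, to a JSONL log so the reasoning
# trail can be reconstructed later.
import json
import time
from pathlib import Path

LOG = Path("prompt_labbook.jsonl")  # hypothetical log file name

def log_prompt(prompt: str, artifact: str | None = None) -> None:
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "prompt": prompt,
        "artifact": artifact,  # e.g. a commit hash or file the prompt produced
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# Example: log_prompt("Refactor the loaders into core/io", artifact="abc1234")
```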

Chetan Kandpal:

Using text-based notebooks was a great idea! Context matters. But have you ever worried about "taste" in some sense? I share your metacognition concerns and see them as good safety rails, but more and more, Claude Code inserts design decisions into scientific code, which looks fine to some extent but carries "code smell" and a very synthetic taste. Since code quality feeds into downstream inference during metacognition, do you propose any methods to tackle taste? How do we protect taste as usage increases? Senior folks develop their research and code taste over time, but what about the junior population? Do you foresee serious homogeneity (which may not always be great) in code infrastructure and critical thinking in scientific workspaces?
