People who deal with large amounts of data and code –
1. Where do you get your data from, and where do you store it? Locally? In a database in the cloud?
2. What are you guys using to clean the data? Is it a manual process for you?
3. What about writing code? Do you use Claude or one of the other LLMs to help you write code? Does that work well?
4. Are you always using your university’s cluster to run the code?
I assume you spend a significant amount of your time on this process; have LLMs reduced that time?
Comments
Hey there. Much data starts locally or on shared lab drives, then moves to cluster storage or to repositories that funding agencies may require. Cleaning is rarely purely manual; it's mostly scripts you constantly refine, since spotting weird artifacts takes code, not just staring at the data. LLMs can definitely draft code snippets or fix basic errors, which saves time, but they don't grasp your core research question or experimental nuances. You'll likely run big jobs on the cluster only after debugging locally on test data first.
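To make the "scripts, not staring" point concrete, here's a minimal sketch of what one such cleaning pass can look like. It assumes a list of numeric sensor readings; the function name, the example values, and the 3.5 cutoff are all illustrative, not anyone's actual pipeline. It uses the modified z-score (median and MAD) rather than mean and standard deviation, since a robust statistic isn't thrown off by the very outliers it's hunting.

```python
import statistics

def flag_artifacts(values, cutoff=3.5):
    """Return indices of readings that look like artifacts.

    Uses the modified z-score: 0.6745 * |x - median| / MAD,
    where MAD is the median absolute deviation. A cutoff of
    3.5 is a common rule of thumb, not a universal constant.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # degenerate case: most values are identical
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > cutoff]

readings = [9.8, 9.9, 10.0, 10.1, 10.2, 250.0]  # 250.0 is a sensor glitch
print(flag_artifacts(readings))  # → [5]
```

In practice you'd wrap checks like this in a script you rerun on every new data batch, refining the rules as new kinds of artifacts turn up, which is exactly why the process never stays "manual" for long.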