🌲 Organizing and segregating data is still useful

- [<] Status Log
    - created:: 2025-05-23
    - status-updated::
    - status:: sapling
    - type:: thoughts
- [S] Marketing
    - purpose:: Obsidian / AI content! To go live when Readwise's themed reviews do, maybe?
    - desc:: On structured data, bounded datasets, and focused AI models.
- connections::
    - [[2021-09-24 Yet Another Hot Take on “Folders vs Tags”]]
    - [[2025-02-06 AI cannot replace genuine human connection]]
    - [[2022-11-30 When an AI is so wrong it's actually helpful]]
- references::
    - [initial conversation audio transcript](https://chatgpt.com/c/6830c207-2514-8010-a94d-7c1b0c263a8c)

It's been a couple of years now since I first fired off my personal take on the folders vs. tags debate, and I've gotta say, the more I use AI -- especially MCP servers -- the more vindicated I feel. Structured, local files make it easier for AI to deliver meaningful, cost-effective results. Trying my hand at dumping everything into a large language model and hoping for results has really highlighted the value of a well-organized, curated data structure.

Although Claude is kind of the worst model for it (of the ones that actually support MCP... still waiting for OpenAI to support this!), the advantage of having most of my notes organized with folders is that when I use the "local files" MCP server for interacting with my notes, I can control which notes go into the LLM. This is useful for privacy purposes, of course, but the real advantage is that it helps me avoid overwhelming the AI's context window.

As I type this in Obsidian, there are 1,230,817 words in my Obsidian vault. This does not count words contained in PDFs or images (which are indexed and searchable thanks to Omnisearch), and does not count my Readwise database, which contains the _vast_ majority of my highlights, annotations, and archived full-text articles. That is an order of magnitude more than even the most badass models can handle as part of their context window right now.
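To make the context-window point concrete, here is a rough sketch of how I could check whether a single folder of notes is small enough to hand to a model. The function names, the `.md` filter, and the 1.3 tokens-per-word ratio are all assumptions for illustration; real tokenizers vary by model, and this is not how Obsidian or any MCP server counts words.

```python
from pathlib import Path

# Assumption: a rough words-to-tokens ratio; real tokenizers vary by model.
TOKENS_PER_WORD = 1.3


def count_words(folder: Path) -> int:
    """Count whitespace-separated words in every Markdown note under folder."""
    total = 0
    for note in folder.rglob("*.md"):
        total += len(note.read_text(encoding="utf-8", errors="ignore").split())
    return total


def fits_in_context(folder: Path, context_tokens: int) -> bool:
    """Estimate whether a folder's notes fit in a context window of that size."""
    return count_words(folder) * TOKENS_PER_WORD <= context_tokens
```

Pointing `fits_in_context` at one project folder instead of the vault root is the whole idea: the folder is the bound.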
One of the perennial arguments is tags versus folders. In a lot of cases, tags act as plain text: they just exist inside a file, so the AI has to actually read the file before it finds the tag. With folders, you automatically eliminate anything outside that folder before a single file is read. Now, this isn't always the case. An index, for example, already has the tags, so those sets are bounded too. But that is harder to rely on if the tool you are using is not set up for it, and almost all AI tools already account for folders in a way that not all of them account for tags. MCP servers are a good example.

Leaving aside my Obsidian notes for the moment, I want to argue for the value of using not just "the right" tool for the job, but _a different_ tool for most jobs. It's entirely **possible** to manage tasks in Notion -- but Linear is built for software developers, just as Trello is built for project managers and Plan2Eat is built for home cooks. I've discovered that I don't really *want* my recipes mixed in with my academic research. I like having them stored in a different place. Tana lends itself toward a different kind of note-taking than Notion or Obsidian. I have an easier time staying focused on long-form writing when I'm in a restaurant than in my office, because my office is for work... in much the same way that I have an easier time writing essays in Obsidian or the Substack interface than in a Google Keep note. I get kind of stressed out when I have my ebooks in the same list as my short-form and long-form articles.

Bounded sets and organized data make me happy _and more productive_. AI may not have emotions, but it's definitely more effective when given organized, clear, neat data instead of a brain-dump of messy thoughts -- even (nay, especially!) when it seems like it can handle precisely that.

Focusing on bounded data sets like "just my Linear tickets" or "search my Readwise highlights" gets me precise and relevant results. This is what I love about having MCP servers hooked up to [Raycast](https://www.raycast.com/core-features/ai), which is what I personally use for most of my work with AI models (it's got everything except image generation and deep research). I like that it has fine-tuned presets and useful extensions and integrations. Raycast even has [an Obsidian integration](https://www.raycast.com/marcjulian/obsidian).

This approach leverages the strengths of AI without overwhelming it -- or the user -- with noise. It's a way to maintain control and ensure that your AI interactions remain efficient and meaningful, especially for power users with large collections of files and notes.

Another advantage of local files is that things like MCP servers can actually interact with them in a meaningful way, whereas most normal people don't really understand how to use a vector database or all of those things. So I feel like having local files, where I can very easily bound what I'm giving the AI and MCP servers, is good.

In the ongoing debate between tags and folders, the case for bounded data sets has never been stronger. While tags offer flexibility, they often rely on the AI parsing the full content of files to find relevance. Folders, on the other hand, inherently create a structured, bounded set that streamlines AI operations.

One of the fundamental advantages of folders is that they predefine the scope of the data set. When you organize your information into folders, you automatically eliminate anything that doesn't belong to that category. This means that AI tools connected through MCP servers can interact with these sets more meaningfully and efficiently. You're not asking the AI to sift through an ocean of data; instead, you're giving it a well-defined pond.

Tags, while powerful, often function like plaintext labels within files. To leverage them, the AI must parse the entire file, which can add noise and inefficiency. This becomes especially apparent in tools not optimized for tag-based organization. In contrast, folders are a universal construct that most AI tools, including MCP servers, readily support.

Take Readwise as an example: it excels at handling a specific subset of information -- highlights and notes -- making it easier to retrieve and utilize insights. This bounded approach ensures that the AI focuses on a curated data set, leading to more relevant and actionable results.

In essence, while tags can be powerful in certain contexts, the structured nature of folders offers a significant advantage in managing large volumes of information. It's a strategy that not only saves time but also enhances the quality of the AI's output, making your digital life a whole lot easier.
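The folder-versus-tag difference can be sketched in a few lines. This is a minimal illustration under assumptions, not how any particular MCP server or Obsidian plugin works: the function names are hypothetical, tags are assumed to be inline `#tag` tokens separated by whitespace, and there is no prebuilt tag index. Selecting by folder only looks at paths, while selecting by tag has to open every note in the vault.

```python
from pathlib import Path


def notes_in_folder(vault: Path, folder: str) -> list[Path]:
    """Select notes by location alone; no file contents are read."""
    return sorted((vault / folder).rglob("*.md"))


def notes_with_tag(vault: Path, tag: str) -> tuple[list[Path], int]:
    """Select notes containing an inline #tag; every note must be opened.

    Returns the matching notes and the number of files that were read.
    Naive matching: the tag must appear as its own whitespace-separated token.
    """
    matches, files_read = [], 0
    for note in sorted(vault.rglob("*.md")):
        text = note.read_text(encoding="utf-8", errors="ignore")
        files_read += 1
        if f"#{tag}" in text.split():
            matches.append(note)
    return matches, files_read
```

With a tag index, as noted earlier, the second function would not need to read every file; the sketch only covers tools that lack one.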