
Journal Notes: A Better Organizational Plan for my Digital Files

Organizing My Digital Files

I have come to the realization that it is time for a better, more logical organization system and process for my digital files. Let's first look at some of the challenges we need to overcome at present.

The Main Challenge

Whenever one is problem-solving, the first step is to clearly identify the problem and then break it down into smaller challenges. Here are some of the challenges we are trying to solve today:

It is difficult to quickly find ALL my photos, all my graphic designs, all my writings, all my blog posts, etc. This is because I am constantly creating new creative content and, hence, new files. As a result, I have run out of contiguous hard drive space to store them.

Recent Solution Attempts

In order to be able to locate my files, the best solution I have heretofore come up with is this:

  1. Burn all files to BDR, DVD, or CD optical disc storage and use freeware programs like Agent Ransack and Snap2HTML.
  2. With Snap2HTML I make a searchable index of files and folders (keeping the folder structure intact), which allows me to quickly and easily document the contents of any storage media, including optical discs (which most file-indexing programs cannot read).
  3. A searchable HTML file is generated for each disc processed.
  4. All HTML files are stored in the same folder.
  5. When I need to retrieve a file or set of files with similar naming conventions, I just use Agent Ransack to do a file-contents search on all the HTML files generated by Snap2HTML, which identifies which discs the files are located on (a rough sketch of this search follows the list).
  6. Manually load each disc and copy the file to a hard drive so it can be worked with.
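
To make step 5 concrete, here is a minimal Python sketch of what that search amounts to: scan each Snap2HTML index file for a filename pattern and report which disc the matches belong to. The folder path and naming pattern are hypothetical stand-ins for my own setup, not anything Snap2HTML or Agent Ransack provides.

  # Sketch of step 5: search every Snap2HTML index for a filename pattern
  # to learn which disc a file was burned to. Paths and patterns below are
  # hypothetical placeholders.
  import re
  from pathlib import Path

  INDEX_DIR = Path(r"D:\DiscIndexes")  # hypothetical folder of per-disc HTML indexes
  PATTERN = re.compile(r"sunset_.*\.jpg", re.IGNORECASE)  # hypothetical naming convention

  for html_file in INDEX_DIR.glob("*.html"):
      text = html_file.read_text(encoding="utf-8", errors="ignore")
      hits = PATTERN.findall(text)
      if hits:
          # Each index file is named after the disc it documents, so a hit
          # here tells me which disc to load in step 6.
          print(f"{html_file.stem}: {len(hits)} matching file(s)")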

The Problem with My Current Approach

The problem with this approach is that it is tedious and time-consuming. A better solution would be something like maintaining a library catalog/database. A catalog of files is basically what I’m generating with the method above, but it isn’t very efficient, and the files are spread across dozens of discs that have to be manually loaded to be retrieved.
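
As a rough illustration of that catalog idea (a sketch only, not a finished design), a single SQLite table could record which disc each file lives on, so retrieval becomes a query instead of a disc-by-disc hunt. The table, column names, and sample data below are my own guesses at what such a catalog would need.

  # Rough sketch of a file catalog in SQLite. Schema and sample data are
  # invented for illustration.
  import sqlite3

  con = sqlite3.connect("file_catalog.db")
  con.execute("""
      CREATE TABLE IF NOT EXISTS files (
          name      TEXT,     -- file name as burned to disc
          folder    TEXT,     -- folder path on the disc
          media     TEXT,     -- disc label or drive name
          size      INTEGER,  -- size in bytes
          burned_on TEXT      -- date of the backup session
      )
  """)
  con.execute(
      "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
      ("sunset_beach.jpg", r"Photos\2021", "BDR-2021-03", 4815162, "2021-03-14"),
  )
  con.commit()

  # Retrieval is then a single query rather than loading discs one by one.
  for media, folder in con.execute(
          "SELECT media, folder FROM files WHERE name LIKE ?", ("%sunset%",)):
      print(media, folder)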

A Breakdown of Current Organizational System Issues to Solve

  1. Try to burn entire folders of related content onto as few discs as possible. Optimally, all photos would fit on one disc, all graphic designs on another, all writings on another, etc. Unfortunately, it didn’t work out that way; I end up with about 6-7 discs per backup session.
  2. Reduce the size and frequency of backed-up file duplication. I periodically back up all my files to disc. Between backups, several files have been added, of course, but there have also been file and folder deletions, renamings, and moves. That leads to another problem: keeping track of a file or folder that has gone through many revisions is currently difficult (see the duplicate-detection sketch after this list).
  3. I need a bigger hard drive to migrate all my files onto. Like 300 Terabytes. But the cost for such a unit (if it exists) is likely prohibitive at this point. So for now, I have files partitioned across eight large external hard drives and two internal laptop drives.
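
Issue 2 is partly a duplicate-detection problem: the same file gets burned again under a new name or in a new folder. One way to spot that, sketched below with a placeholder drive path, is to hash file contents, which ignores names and locations entirely.

  # Sketch: find files with identical content on a drive by hashing them.
  # The root path is a placeholder for one of my external drives.
  import hashlib
  from collections import defaultdict
  from pathlib import Path

  ROOT = Path(r"E:\Backups")  # hypothetical drive/folder to scan

  def sha256_of(path: Path) -> str:
      # Hash in chunks so large video files are not loaded into memory at once.
      h = hashlib.sha256()
      with path.open("rb") as f:
          for chunk in iter(lambda: f.read(1 << 20), b""):
              h.update(chunk)
      return h.hexdigest()

  by_hash = defaultdict(list)
  for p in ROOT.rglob("*"):
      if p.is_file():
          by_hash[sha256_of(p)].append(p)

  for digest, paths in by_hash.items():
      if len(paths) > 1:
          print(f"Duplicate content {digest[:12]}:")
          for p in paths:
              print(f"  {p}")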

Possible Solutions

  • Create a catalog of digital files using a third-party open source software solution.
  • Create a folder for every creative project. For each writing, create a folder for it. For each graphic design, create a folder for it to live in. Do the same for each video or audio file. Don’t create a folder for each photo, but each photo that has edits should have a folder of edits created so that all the edits are in one place (see the sketch after this list). Collages are considered designs, not edits. Adding a watermark to a photo is an edit.
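
Here is a rough sketch of that per-photo edits-folder rule. It assumes a hypothetical naming convention of my own (edits of IMG_0042.jpg saved as IMG_0042_<something>.jpg next to the original) and simply gathers those edits into an IMG_0042_edits subfolder.

  # Sketch: gather a photo's edited variants into a "<photo>_edits" folder.
  # The directory and naming convention are assumptions for illustration.
  from pathlib import Path

  PHOTO_DIR = Path(r"E:\Photos\2021")  # placeholder path

  for photo in sorted(PHOTO_DIR.glob("*.jpg")):
      stem = photo.stem
      edits = list(PHOTO_DIR.glob(f"{stem}_*.jpg"))
      if not edits:
          continue  # photos without edits don't get a folder
      edits_dir = PHOTO_DIR / f"{stem}_edits"
      edits_dir.mkdir(exist_ok=True)
      for edit in edits:
          edit.rename(edits_dir / edit.name)
          print(f"Moved {edit.name} -> {edits_dir.name}")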

Risks with this Approach

  1. I have thousands of files that will need folders, and Windows 10 limits how many subfolders a folder can contain.
  2. Giving the files and folders meaningful, semantic names will mean that in some cases the full path length exceeds the limit (see the path-length sketch after this list).
  3. In addition to the aforementioned limits, there is also a performance cost.
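
Risk 2 can at least be audited ahead of time. By default, Windows enforces a 260-character MAX_PATH limit unless long-path support is enabled; the sketch below (with a placeholder root folder) walks a folder tree and flags paths that come close to that limit.

  # Sketch: flag paths near the classic Windows MAX_PATH limit of 260 chars.
  # The root folder is a placeholder.
  from pathlib import Path

  ROOT = Path(r"E:\Projects")  # hypothetical folder tree to audit
  MAX_PATH = 260

  long_paths = [p for p in ROOT.rglob("*") if len(str(p)) >= MAX_PATH - 20]
  for p in sorted(long_paths, key=lambda p: len(str(p)), reverse=True):
      print(f"{len(str(p))} chars: {p}")
  print(f"{len(long_paths)} path(s) within 20 characters of the limit")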

I don’t have much time left tonight, so here is a quick summary of these Windows limits (I haven’t verified the accuracy of these figures, but at least it is a place to start):

SpiritX on Microsoft Forums says:

You can put 4,294,967,295 files into a single folder if drive is formatted with NTFS (would be unusual if it were not) as long as you do not exceed 256 terabytes (single file size and space) or all of disk space that was available whichever is less.

For older FAT32 drives the limits are 65,534 files in a single folder and max file size of 4 Gigabytes and 2TB of total space or all of disk space that was available or whichever is less.

harrymc on SuperUser.com said:

As far the theoretical capacities of NTFS are concerned, there is no problem.

The Microsoft article on Maximum Sizes on an NTFS Volume specifies that the maximum of files per volume is 4,294,967,295, and that should also be the maximum on folders. However, you would need an extremely fast computer with lots of RAM to be able to even view that folder in Explorer.

From my own experience, on a good computer of several years ago, viewing a folder with thousands of sub-folders took some dozen of seconds just to show the folder.
… I really suggest to rethink again your folder architecture.

And MarkR on StackOverflow.com:

But the problem is always tools. Third party tools (such as MS explorer, your backup tool, etc) are likely to suck or at least be extremely unusable with large numbers of files per directory.

Anything which does a directory scan, is likely to be quite slow, but worse, some of these tools have poor algorithms which don’t scale to even modest (10k+) numbers of files per directory.

Conclusion

Based on my research today, it seems that although a folder can theoretically hold an extremely high number of files or subfolders (into the millions and beyond), accessing and reading them is where the bottleneck lies. Third-party search and indexing tools, and possibly (likely?) system resources, are the weak link.

Unfortunately, in my few hours of researching this issue online, I was not able to find anything that lists “practical file and folder size limits for Windows NTFS folder architecture.”

Further Reading: