The Interconnectedness of Things

Part 2: Modernizing Document Identification and Indexing Across the Federal Government

QFlow Systems, LLC

What happens after documents enter your system—and why does everything still feel so slow?

In this episode of The Interconnectedness of Things, host Emily Nava sits down with Dr. Andrew Hutson to unpack one of the most overlooked (and time-consuming) phases of the document lifecycle: identification, triage, and indexing.

Despite decades of investment in metadata, tagging, and search tools, many agencies still struggle with inconsistent classification, institutional knowledge gaps, and the manual burden placed on staff. The result? Bottlenecks that delay work, reduce efficiency, and limit the value of digital systems.

They explore how emerging approaches, like vector embeddings and AI-augmented transcription, are transforming this process. By converting documents into mathematical representations of meaning, organizations can move beyond keyword search to true semantic understanding. The result is smarter automation: systems that can recognize, classify, and enrich documents based on patterns from thousands of prior records, without adding more work for users.
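To make that concrete, here is a minimal sketch of semantic search, using the open-source sentence-transformers library and a small off-the-shelf model as stand-ins for whatever embedding stack a given production system actually runs:

```python
# Semantic search in miniature: documents and query become vectors,
# and similarity is measured in meaning rather than matching words.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Invoice for office furniture, Q3 procurement",
    "Purchase order: desks and chairs for the regional office",
    "Annual leave policy update for all staff",
]
query = "furniture purchasing records"

# Normalized embeddings make cosine similarity a plain dot product.
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

for doc, score in sorted(zip(docs, doc_vecs @ query_vec), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
# The two procurement documents rank highest even though the purchase
# order shares almost no literal words with the query.
```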

They also discuss:

  •  Why traditional metadata strategies fall short in dynamic environments 
  •  How poor standardization at intake creates downstream inefficiencies 
  •  The hidden cost of “corporate amnesia” in document management systems 
  •  How AI-driven indexing unlocks better workflows, retention, and retrieval 

This episode is part two of a five-part series on the full lifecycle of work—focusing on how to reduce cognitive load, eliminate manual bottlenecks, and enable agencies to truly “do more with what they have.”

About "The Interconnectedness of Things"
Welcome to "The Interconnectedness of Things," where hosts Dr. Andrew Hutson and Emily Nava explore the ever-evolving landscape of technology, innovation, and how these forces shape our world. Each episode dives deep into the critical topics of enterprise solutions, AI, document management, and more, offering insights and practical advice for businesses and tech enthusiasts alike.

Brought to you by QFlow Systems
QFlow helps manage your documents in a secure and organized way. It works with your existing software to make it easy for you to find all your documents in one place. Discover how QFlow can transform your organization at qflow.com

Follow Us!
Andrew Hutson - LinkedIn
Emily Nava - LinkedIn 

Intro and Outro music provided by Marser

WEBVTT

00:00:02.916 --> 00:00:17.726
<v Nava Emily>Hello, and welcome back to another episode of The Interconnectedness of Things. I'm your friendly host, Emily Nava, and I'm joined here today with our COO, Dr. Andrew Hutson. Hi, Hutson.

00:00:17.727 --> 00:00:19.770
<v Hutson>Hey.

00:00:19.771 --> 00:00:51.200
<v Nava Emily>So last episode, we talked about work intake and how work is put into systems. This episode, we're talking about: once work or data is put into your system, now what?

00:00:51.201 --> 00:01:10.766
<v Nava Emily>Okay. So I'm just gonna start off here with a question for you, Hutson.

00:01:10.767 --> 00:01:12.595
<v Hutson>Alright. Set me up.

00:01:12.596 --> 00:01:21.086
<v Nava Emily>So why is document identification and triage still one of the slowest parts of government work?

00:01:21.087 --> 00:02:02.300
<v Hutson>Oh, man. This is a good question. And I'm gonna refer back to stuff that we said in the last episode. So, for those listening, hey, hop back there and give it a listen if you're interested in this kind of stuff. So if we assume that the best way to set up intake, and hope to use AI tools, is through structure, that does place a burden on the individual to identify and route those documents through processes.

00:02:02.301 --> 00:02:04.835
<v Hutson>It's a big cognitive load.

00:02:04.836 --> 00:02:06.275
<v Nava Emily>Mhmm. It takes a lot of time.

00:02:05.707 --> 00:03:02.326
<v Hutson>Something that can change a lot. Oh, yeah. It takes a lot of time. And, also, the target changes at unpredictable intervals to adjust to changing environments, changing regulations, changing climates; all of that change just makes it inconsistent. So in my view, at least over the last two decades, if not longer, private industry, with the public sector catching up, has invested a huge amount of time categorizing, identifying, and adding metadata to specific documents for the purpose of retrieval.

00:03:02.327 --> 00:03:09.955
<v Hutson>That's why they did it. How can I find it again? So standardization was put into how we classify stuff, with centralized,

00:03:09.956 --> 00:03:12.005
<v Nava Emily>Mhmm.

00:03:12.006 --> 00:03:26.609
<v Hutson>just to try to give some hope that we could find it again and then later, not only retrieval, but relation. These documents are related to these other documents by some commonality.

00:03:26.610 --> 00:03:35.395
<v Nava Emily>So to ground this in a real situation: SharePoint, which a lot of agencies use. That search bar in SharePoint.

00:03:34.706 --> 00:03:35.955
<v Hutson>Mhmm.

00:03:35.956 --> 00:03:48.161
<v Nava Emily>How can we make that work the best it can? And that's by adding different metadata to the documents. Is that what you're getting at?

00:03:48.162 --> 00:04:32.301
<v Hutson>That's what I'm getting at. And I'll go farther: not even using the search bar. They had metadata tags in SharePoint, advanced searches to target these fields. They call them site columns. Just to try to get stuff back. And it's cumbersome and it's hard and not everybody knows how to do it. And further, if you have new folks coming in, or you have seasoned folks going out, then that knowledge, skill, and ability turns into kind of a corporate amnesia.

00:04:32.302 --> 00:04:39.075
<v Hutson>How is this supposed to be classified? Why do we do it that way? What's the benefit of this?

00:04:39.076 --> 00:04:39.741
<v Nava Emily>Mhmm. Right.

00:04:39.742 --> 00:05:11.515
<v Hutson>That's a tone everybody uses. So I think we're in a moment of change that has been accelerated by the use of AI, but has been around before the craze of AI. And that's something called vector embeddings. Oh my god. If you're googling it right now, I don't blame you.

00:05:11.516 --> 00:05:16.286
<v Nava Emily>Which I have done several times.

00:05:16.287 --> 00:05:23.435
<v Hutson>So now I'm interested. Nava, you've looked it up. What's your understanding of vector embedding so far?

00:05:23.436 --> 00:05:35.980
<v Nava Emily>Well, okay. First I should preface that my original context for the word vector is through the lens of graphic design. And so to me,

00:05:35.101 --> 00:05:36.300
<v Hutson>Oh, yeah. Yeah.

00:05:36.301 --> 00:05:50.906
<v Nava Emily>that's just an endlessly scalable image. I didn't know what was going on in the background. All I knew is I could make a logo on a button, or I could make it on a billboard, and it doesn't matter.

00:05:50.907 --> 00:05:51.494
<v Hutson>Mhmm.

00:05:51.495 --> 00:05:56.600
<v Nava Emily>But that mhmm.

00:05:52.012 --> 00:05:58.280
<v Hutson>First of all, I love that application of vectors. Well done.

00:05:58.281 --> 00:06:26.211
<v Nava Emily>But now, in the age of AI, vector means something a little bit different, although I think the technical background of it might be the same. From what I can understand, a vector embedding is information that is stored in numbers.

00:06:26.212 --> 00:06:52.835
<v Hutson>Mhmm. Nailed it. So, like vector embeddings, an SVG, which is a vector graphics file format; how that vector or that image can scale is because it's not defined by pixels. It's defined by math,

00:06:52.836 --> 00:06:53.281
<v Nava Emily>Yeah.

00:06:53.282 --> 00:07:25.341
<v Hutson>Math being the vectors. And so it's an articulation of an angle and a position; if you go look at the paths, they're just anchor points: where they're positioned, and then how they connect to one another. So it can scale infinitely and not be a problem, which is cool. I think it's cool. Anybody who doesn't think it's cool? Like, alright. That's fine. Stick with your graphics.

00:07:25.342 --> 00:07:31.135
<v Hutson>Now when we move to vector embeddings,

00:07:31.136 --> 00:07:31.666
<v Nava Emily>Mhmm.

00:07:31.667 --> 00:07:34.670
<v Hutson>I'm gonna do as simple an explanation as I can. You tell me how I do.

00:07:34.671 --> 00:07:35.586
<v Nava Emily>Okay.

00:07:35.587 --> 00:08:08.606
<v Hutson>So, using a simple model, an LLM, it's going to look at your documents and the words, and it's going to convert them into vector math. So it's basically saying: here is the numeric definition of your content. Why in the world would you care about that?

00:08:08.607 --> 00:08:37.350
<v Hutson>Well, my friend, if we hearken back to basic trigonometry and cosine similarity, what this is able to do is actually measure semantic relevance between documents, because of this vector translation from an LLM, so that you can see whether something is related to another document by more than just keywords.
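A toy illustration of the cosine similarity described above. The three-dimensional vectors are invented for the example; real embedding models emit hundreds of dimensions.

```python
# Cosine similarity: the cosine of the angle between two embedding
# vectors, straight out of basic trigonometry.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional "embeddings" for three documents.
memo_a = np.array([0.9, 0.1, 0.3])
memo_b = np.array([0.8, 0.2, 0.4])   # semantically close to memo_a
policy = np.array([0.1, 0.9, 0.2])   # about something else entirely

print(cosine_similarity(memo_a, memo_b))  # ~0.98: strongly related
print(cosine_similarity(memo_a, policy))  # ~0.27: weakly related
```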

00:08:37.351 --> 00:08:39.181
<v Nava Emily>Gotcha. Okay.

00:08:39.182 --> 00:09:12.700
<v Hutson>Okay. Great. Now what? I can see if something is kind of like something else. Cool. Thanks for coming out. Well, alright. Calm down. What this can do, if used intelligently, and this is in our product, where we're doing this, is you can store these vector embeddings alongside those keywords.

00:09:12.701 --> 00:10:03.526
<v Hutson>If your movement from paper to digital included OCR, optical character recognition, which converted the image of your scan into letters, then that next step of converting it into vectors allows you to see that similarity. And then, if you take that similarity and pair it with the actual metadata from the document, now you can see: I have a new document entering the system, sans metadata, but it is similar to these 10,000 other documents that are classified in a very particular way.

00:10:03.527 --> 00:10:36.451
<v Hutson>And so it can use those to then identify and fill out the different properties that that document could have. That decreases the burden by nearly 90% or more when identifying a document, saying, hey, we've seen this before.
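A hedged sketch of that auto-fill step: suggesting metadata for an untagged document by majority vote over its nearest already-classified neighbors. The function and field names are illustrative, not any particular product's API.

```python
# Hypothetical nearest-neighbor metadata suggestion.
from collections import Counter
import numpy as np

def suggest_doc_type(new_vec, corpus_vecs, corpus_tags, k=10):
    """corpus_vecs: (N, d) normalized embeddings of classified documents.
    corpus_tags: list of N metadata dicts, e.g. {"doc_type": "invoice"}."""
    scores = corpus_vecs @ new_vec            # cosine sims (vectors normalized)
    nearest = np.argsort(scores)[::-1][:k]    # indices of the top-k matches
    votes = Counter(corpus_tags[i]["doc_type"] for i in nearest)
    doc_type, count = votes.most_common(1)[0]
    return doc_type, count / k                # suggested label + rough confidence
```

A low vote share can route the document to a person instead, so the system only auto-indexes when the existing corpus strongly agrees.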

00:10:36.452 --> 00:10:51.635
<v Hutson>And if you invested decades in identifying those documents and they've been relatively similar, then, holy cow, through the roof. You can really just get the system to do the work for you.

00:10:51.636 --> 00:10:53.075
<v Nava Emily>And that's true automation.

00:10:51.786 --> 00:10:53.715
<v Hutson>And to me, that's go ahead.

00:10:53.716 --> 00:10:57.625
<v Nava Emily>And that's true automation to me.

00:10:57.626 --> 00:11:28.141
<v Hutson>Yeah. It's not a workflow per se, but it's unburdening the worker from doing the thing that's slowing them down. Because honestly, there isn't much value; if you look at a ratio of time spent to value made, it's not very big. It's kinda upside down when you're classifying something. So what happens more often than not? And I've seen this with countless teams.

00:11:28.142 --> 00:11:30.735
<v Hutson>They look for workarounds to not have to fill it out,

00:11:30.736 --> 00:11:31.341
<v Nava Emily>Yep.

00:11:31.342 --> 00:11:34.290
<v Hutson>or they make everything optional, which makes it moot.

00:11:34.291 --> 00:11:36.701
<v Nava Emily>Mhmm.

00:11:36.702 --> 00:12:12.486
<v Hutson>In our system, and how we've deployed it and coached our customers, it now opens the door for workflows that can be articulated and triggered based on metadata, and automated retention schedules that can now be based on that metadata. And the next benefit coming is that we can auto-index it and auto-identify it. Based off of the work that you've already put in, you now get an additional benefit without having to do a single thing more.
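As a sketch of the metadata-driven automation described here, a hypothetical retention rule keyed to a doc_type field; the schedule values are invented for the example.

```python
# Illustrative retention schedule driven entirely by document metadata.
from datetime import date, timedelta

RETENTION_YEARS = {"invoice": 7, "personnel_record": 30, "memo": 3}  # hypothetical

def disposition_date(doc_type: str, filed: date) -> date:
    years = RETENTION_YEARS.get(doc_type, 10)     # conservative fallback
    return filed + timedelta(days=365 * years)    # ignores leap days

print(disposition_date("invoice", date(2024, 1, 15)))  # roughly seven years out
```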

00:12:12.487 --> 00:12:15.115
<v Hutson>So that, to me, is why it's pretty powerful.

00:12:15.116 --> 00:12:25.195
<v Nava Emily>Yeah. And, again, we keep saying this over and over, but the more standardized you are at the beginning of the process, the smoother the rest of the process is gonna be.

00:12:24.752 --> 00:12:25.835
<v Hutson>Oh,

00:12:25.836 --> 00:12:27.115
<v Nava Emily>If you have a train wreck at the beginning,

00:12:26.672 --> 00:12:32.351
<v Hutson>yeah. You can't put the toothpaste back in

00:12:30.861 --> 00:12:32.511
<v Nava Emily>it's gonna be a train wreck.

00:12:32.512 --> 00:12:34.780
<v Hutson>the tube, as they say,

00:12:34.781 --> 00:12:35.340
<v Nava Emily>They do say that.

00:12:35.232 --> 00:12:41.335
<v Hutson>which isn't true. I totally tried it. You can do it.

00:12:41.336 --> 00:12:47.015
<v Nava Emily>That was, like, a life lesson thing we did in third grade. A teacher passed out toothpaste

00:12:45.787 --> 00:12:48.695
<v Hutson>Yeah.

00:12:48.696 --> 00:12:51.735
<v Nava Emily>to everybody, to all the students, and we had to squeeze it all out on our desks,

00:12:50.907 --> 00:12:52.455
<v Hutson>Uh-huh.

00:12:52.456 --> 00:12:53.655
<v Nava Emily>which was fun for a third grader.

00:12:52.962 --> 00:12:54.589
<v Hutson>Uh-huh.

00:12:54.590 --> 00:12:55.230
<v Nava Emily>And then she said,

00:12:54.721 --> 00:12:55.389
<v Hutson>Yeah.

00:12:55.390 --> 00:12:58.241
<v Nava Emily>now put it back in.

00:12:58.242 --> 00:13:02.749
<v Hutson>No problem. I'm gonna need a compressor.

00:13:02.750 --> 00:13:05.406
<v Nava Emily>I think we got a toothpick.

00:13:05.407 --> 00:13:11.005
<v Hutson>Oh, well, that, I mean, obviously, it's not gonna work with a toothpick. That's just the wrong tool.

00:13:10.216 --> 00:13:12.366
<v Nava Emily>Right.

00:13:12.367 --> 00:13:20.121
<v Hutson>That's that's trying to make you prove a point rather than giving you the freedom to prove or disprove it.

00:13:18.616 --> 00:13:25.310
<v Nava Emily>Well, when you're in a class full of third graders, you can kinda railroad things like that. But but,

00:13:23.882 --> 00:13:25.790
<v Hutson>Yeah. I guess so.

00:13:25.791 --> 00:13:27.801
<v Nava Emily>I mean, the sentiment's the same.

00:13:27.802 --> 00:13:40.355
<v Hutson>I'm gonna give that teacher a talking-to. I've decided. Oh, how'd I do with my vector explanation?

00:13:40.356 --> 00:13:45.155
<v Nava Emily>It started off with me getting a little glazed over, but then it started to make sense.

00:13:44.487 --> 00:13:45.715
<v Hutson>Yeah.

00:13:45.716 --> 00:14:06.086
<v Nava Emily>Once you were talking about noting the similarities, the system is like, okay, I've seen something similar to this before. Maybe not exactly this, but something similar. And now we can connect the dots that way.

00:14:06.087 --> 00:14:15.455
<v Hutson>The vector seems to be more reliable than saying, okay, I see all the words you've used, and this document has a lot of the same words.

00:14:15.456 --> 00:14:17.686
<v Nava Emily>Mhmm.

00:14:17.687 --> 00:14:50.215
<v Hutson>For some reason, that doesn't work. Just because you have the same words doesn't make it the same document. Because it has to do with where those words are in context, and how they relate to what surrounds them. That is really how humans see similarity.

00:14:50.216 --> 00:15:12.855
<v Nava Emily>Mhmm. So in that, that allows for variability in your process too, which is gonna

00:14:59.766 --> 00:15:35.160
<v Hutson>Which is gonna happen. Yeah. There's nothing you can do about it. So this is just a cool way that you can start to find the signal through the noise. Think about trying to do that with just a paper record. Boggles the mind. So now you've got it to digital, and you invested that time. And now we're just converting the digital mathematically, so that you can get a better ROI from the investment that you made identifying all those documents.

00:15:35.161 --> 00:15:35.826
<v Nava Emily>Yeah.

00:15:35.827 --> 00:15:56.546
<v Hutson>And that's something I always challenge back every time: you know, you're doing a data warehouse, you're doing a new intake, like, what's in it for you? And if you can't say what's in it for someone, it becomes real hard in change management.

00:15:55.276 --> 00:15:58.874
<v Nava Emily>Right. Which is the name of the game when you're implementing

00:15:58.067 --> 00:15:59.914
<v Hutson>Holy. Oh,

00:15:59.915 --> 00:16:02.270
<v Nava Emily>any kind of AI or automation?

00:16:01.266 --> 00:16:36.141
<v Hutson>yeah. Yeah. Any of this, you know, you want me to hand the reins over to a computer? I get hit with two questions, typically. First, how dare you? Followed up with, who do you think you are? So that's a tough conversation starter, but we get through it. Mostly, my answer is: I didn't think of myself as very daring, and I am no one of importance.

00:16:36.142 --> 00:16:51.895
<v Hutson>So let's go back to the process, shall we? Alright. That's all I got on this one. You got more stuff you wanna talk about?

00:16:51.896 --> 00:17:28.160
<v Nava Emily>Yeah. So kind of like with the first one: let's just say a leader of a federal agency mission area is pretty good with their intake process. They've got it standardized. What would they look forward to, or ask for? What would be their next step in possibly implementing some AI into the digitization or identification of documents?

00:17:28.161 --> 00:17:32.106
<v Nava Emily>What kind of questions should they be asking?

00:17:32.107 --> 00:17:45.594
<v Hutson>Well, like the first question: what is all this work that we did setting up the intake doing? What's it doing for us? Is it just organizing for organizing's sake, or is there some end in mind?

00:17:45.595 --> 00:17:46.571
<v Nava Emily>Mhmm.

00:17:46.572 --> 00:18:43.121
<v Hutson>Or are they experiencing pushback on filling out all these fields, classifying everything? You know, those two scenarios are pretty direct: give me a call, or, I guess, you can do your research on how you can create those embeddings and leverage them in your work process. For us, it's just built in. So if you want an easy button, it's having it done and flipping the switch on embeddings, and now you've got the power to do other things. Which, with every conversation we're having recently, the retirements folks are going through, the changing jobs, the reductions in funding, this is, like, perfectly positioned to help folks out.

00:18:43.122 --> 00:18:55.175
<v Hutson>And at the end of the day, that's why we're doing this. I hate to see folks miserable in their job just because they've got to upload a document. Like, let's get you back to what you love doing.

00:18:55.176 --> 00:18:57.726
<v Nava Emily>Yep. Ease the burden.

00:18:57.727 --> 00:19:00.790
<v Hutson>Yep.

00:19:00.791 --> 00:19:39.680
<v Nava Emily>Alright. Well, that's phase two of the life cycle of work. Again, we started with intake. Now we're at the stage where you're indexing your documents, digitizing and organizing them so that they can be retrieved later. In the next episode, we'll most likely be talking about what happens when you do need to retrieve that document, when it needs to be escalated, when it needs to be routed, and how that process can work, either through standardization, AI, or both, and what happens when it breaks.

00:19:39.681 --> 00:19:45.360
<v Nava Emily>So I hope that you enjoyed this episode if you did, if you found it insightful, if you found it funny.