How to Use AI for Company Documents: Summarization, Extraction, and Beyond
How AI simplifies document workflows with intelligent extraction.
Jan 17th, 2025 12:04pm by Tommy Thyen
Every organization handles documents in some way: registration forms, invoices, blog posts, and technical write-ups, to name just a few. These documents are critical for communicating information between departments and with customers. They contain seemingly limitless combinations of styles and data types in seemingly limitless file formats. With so many ways of receiving information, it can be difficult to extract it accurately and in a format that gives the reader enough context to absorb it.
Raw data extraction has been around for years, but recent advances in artificial intelligence now let us add Intelligent Document Processing (IDP) and summarization capabilities to document workflows. From a software development perspective, accounting for every document style and input format used to take hours of manual work. Tables were a particular area of concern, as they vary widely in structure. Some have column headers, some have blank cells, and some exist only as an image within a document. With IDP, advanced AI models can make this type of extraction far easier. Tables can now be consumed regardless of their structure and output in a logical row/column format, typically presented as JSON or XML.
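To make the "logical row/column format" concrete, here is a minimal Python sketch of the last step only: turning already-extracted rows into JSON records. It does not perform any AI extraction itself. The function name `table_to_records`, the `column_N` fallback names, and the sample data are all assumptions made for illustration, not the output format of any particular IDP product.

```python
import json

def table_to_records(rows, has_header=True):
    """Convert a list of rows (lists of cell strings) into row/column records."""
    if not rows:
        return []
    width = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    if has_header:
        header, body = padded[0], padded[1:]
        # Fall back to a positional name when a header cell is blank.
        names = [h.strip() or f"column_{i + 1}" for i, h in enumerate(header)]
    else:
        names = [f"column_{i + 1}" for i in range(width)]
        body = padded
    # Represent blank cells as None so they serialize to JSON null.
    return [
        {name: (cell.strip() or None) for name, cell in zip(names, row)}
        for row in body
    ]

# Hypothetical extracted table: one blank header cell and one short row.
raw = [
    ["Item", "Qty", ""],
    ["Widget", "2", "9.99"],
    ["Gadget", ""],
]
records = table_to_records(raw)
print(json.dumps(records, indent=2))
```

Blank cells become `null` in the JSON rather than empty strings, so downstream code can tell "missing" apart from "present but empty". This sketch assumes header names are unique; duplicate headers would overwrite one another.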
In addition to structural context, Large Language Models (LLMs) can provide human-like summarizations of input documents. This can trim hours of reading down to a single-paragraph summary and even extend beyond documents to summarizing virtual meetings or other long-form content. Retrieval Augmented Generation (RAG) adds to this feature by allowing LLMs to reference sources beyond their original training data. This provides a way to keep responses accurate as time passes and information shifts. This combination of summarization and structured output is the most significant advantage of modern AI for document-related workflows.
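The retrieval half of RAG can be sketched in a few lines of Python. This is a deliberately simplified illustration: it ranks documents by shared words rather than by the embedding similarity that production systems typically use, and it stops at building the prompt instead of calling a model. The names `tokenize`, `retrieve`, and `build_prompt`, along with the sample documents, are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=2):
    """Return up to k documents that share the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(d)), i) for i, d in enumerate(documents)]
    # Highest overlap first; ties keep the original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [documents[i] for score, i in scored[:k] if score > 0]

def build_prompt(question, documents, k=2):
    """Place the retrieved sources ahead of the question for the model."""
    context = retrieve(question, documents, k)
    sources = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"

# Hypothetical company documents that the model was never trained on.
docs = [
    "Invoice 1042 was paid on March 3.",
    "The refund policy allows returns within 30 days.",
    "Office hours are 9 to 5 on weekdays.",
]
prompt = build_prompt("When was invoice 1042 paid?", docs, k=1)
print(prompt)
```

Because the sources are looked up at question time, updating the document list is enough to keep answers current, with no retraining involved.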
Speaking from personal experience, I use public LLMs like Microsoft’s Copilot and OpenAI’s ChatGPT more often than I’d like to admit. Contrary to popular belief, these AI assistants cannot do your job for you. What they do provide, however, is a fantastic ability to narrow a web search down to only the relevant information and to trivialize mundane tasks, such as sorting out simple syntax differences between coding languages. Before this type of AI, developers could spend hours searching for the right forum post to answer their question, or days parsing obscure documentation to find a specific class or method that met their requirement. Now, a well-formulated prompt can return a useful answer with related reference links in seconds.
These benefits come with a fair share of tradeoffs regarding data privacy and the ethical concerns of AI. LLMs must be trained before use, which requires massive amounts of validated inputs for accurate results. This raises questions like: Where did this data come from? Who owns it? And who validated it? High-volume models accessible via APIs may refine their results based on user prompts. This means that input data like code snippets, images, or documents is processed by a third party and can potentially expose Personally Identifiable Information (PII). Developers must take exceptional care when using these resources to prevent unwanted sharing of confidential data.
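One simple precaution is to scrub obvious identifiers from text before it leaves your environment. The Python sketch below does this with regular expressions. The patterns (email, US-style Social Security number, US-style phone number) and the `redact` helper are illustrative assumptions, not a complete PII detector: names, addresses, and other formats will pass through untouched.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text):
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# Hypothetical snippet a developer might be tempted to paste into a prompt.
sample = "Contact Jane at jane.doe@example.com or 555-123-4567. SSN: 123-45-6789."
safe = redact(sample)
print(safe)
```

Note that the name "Jane" survives redaction here, which is exactly why a filter like this should be treated as a first line of defense rather than a guarantee.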
Access to these online models has never been easier. Most have a free tier with an (almost) unlimited number of uses. Nowadays, you can even grab the source code of openly available models and create your own, training them on data you provide for the problems you need to solve. This technology can be embedded in all types of applications, providing awesome capabilities and a huge increase in productivity. However, Uncle Ben from the original Spider-Man had it right when he said, “With great power comes great responsibility.” Data and privacy must be protected. Regulations must be set, and guidelines must be followed, to use the capabilities AI provides legally and optimally.
Overall, AI is a potent tool that boosts productivity and efficiency, helping organizations both make and save money. It fills a massive gap in document-based data extraction, providing contextual outputs that can be quickly analyzed to produce an action plan. Its summarization capabilities extend beyond documents to web searches on any topic you want to know more about. AI is an invaluable asset to any organization, provided the technology is understood and the proper precautions are taken.
Tommy Thyen is a Solution Engineer and Developer Advocate at Apryse, the market leader in document processing technology. In his current role, Tommy is at the intersection of technology and customer success, specializing in a broad range of programming languages….
Credit to the Original Article | Explore More of Their Work If You Found This Article Enjoyable.
https://thenewstack.io/how-to-use-ai-for-company-documents-summarization-extraction-and-beyond/



