
New function – CleanText word count analysis. What is it?

We at WebsiteDownloader have some exciting news to share with you. We have recently released CleanText, a new file format that makes exporting text from websites easier and more efficient than ever before! But before we dive into the specifics of this new functionality, here’s the mandatory corny joke we start every blog post with.

Why did the translator cross the road? To get to the CleanText functionality, of course!

With that formality out of the way, let’s get down to business, shall we?

Introducing CleanText: The new file format for efficient website text analysis

CleanText is a must-have feature for any translation agency or individual translator looking to streamline their workflow and save time and money. With CleanText, you can export the text of a website into a .txt document without repetitive text such as menus, headers, and footers, all with just a few clicks.

CleanText also improves the accuracy of word count analysis for translation projects by separating out repeated text, which only needs to be translated once, so the estimated total word count is more precise. Translation agencies can then provide more accurate quotes to their clients and avoid overcharging or undercharging for their services.

CleanText was created in response to feedback from our users, who asked us to create a faster, easier, and more cost-effective solution than traditional computer-assisted translation (CAT) tools. With CleanText, we have achieved that and much more!

Best of CleanText: Analyzing repetitions and segments in different languages for better translation quotes

One of the most useful features of CleanText for translators and translation agencies is its analysis of repetitions and no matches, reported as both segment counts and source word counts, similar to well-known CAT tools like memoQ and Trados. This feature is particularly valuable for translators who want to optimize their translation process and prepare word-count-based offers for clients.
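To make the idea of repetitions and no matches concrete, here is a minimal Python sketch of how such counts could be produced. It is not CleanText’s actual implementation: it assumes each page has already been split into segments, follows the common CAT-tool convention that the first occurrence of a segment counts as a no match and every later identical occurrence as a repetition, and counts words naively by splitting on whitespace. The function name analyze_segments and the sample pages are hypothetical.

```python
from collections import Counter


def analyze_segments(pages: dict[str, list[str]]) -> dict[str, int]:
    """Count segments and source words as no matches or repetitions.

    Assumption (CAT-tool convention, not confirmed for CleanText): the first
    occurrence of a segment is a no match; every later identical occurrence,
    on any page, is a repetition.
    """
    seen: set[str] = set()
    counts = Counter()
    for segments in pages.values():
        for segment in segments:
            text = " ".join(segment.split())  # normalize whitespace
            if not text:
                continue
            bucket = "repetition" if text in seen else "no_match"
            seen.add(text)
            counts[f"{bucket}_segments"] += 1
            counts[f"{bucket}_words"] += len(text.split())  # naive word count
    return dict(counts)


# Hypothetical pages that share the menu items "Home" and "About us".
pages = {
    "/": ["Home", "About us", "Welcome to our shop."],
    "/contact": ["Home", "About us", "Write to us today."],
}
print(analyze_segments(pages))
# {'no_match_segments': 4, 'no_match_words': 11, 'repetition_segments': 2, 'repetition_words': 3}
```

With this toy input, the shared menu items are counted once as no matches and once as repetitions, which is exactly the text a quote would not charge at the full rate twice.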

CleanText is designed to provide precise, high-quality content that is ready for translation, with the added benefit of reducing translation costs by avoiding overcharging for repetitive text. By using this new file format, translation agencies can prepare quotes for their clients in less time and with greater accuracy and avoid the hassle of having to sort through irrelevant content.

CleanText also includes a language detection feature, which automatically detects the language of the website being analyzed. This is a valuable feature for translators, as it allows them to quickly identify the source language of the content they will be translating.
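Language detection can be done in many ways, and this post does not say which method CleanText uses. As a deliberately simple illustration, the Python sketch below scores a text against small, hypothetical stopword lists for three languages and picks the best match, falling back to English when nothing matches. Real detectors rely on statistical models and cover far more languages.

```python
import re

# Toy stopword lists (hypothetical); a real detector would use statistical
# models and cover every supported language.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "nicht"},
    "fr": {"le", "la", "et", "est", "les"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language whose stopwords occur most often in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(word in stop for word in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # No stopword hits at all: fall back to the default language.
    return best if scores[best] > 0 else default


print(detect_language("Der Hund und die Katze"))  # de
```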

But what sets CleanText apart from other similar tools is how easy it is to use. You don’t need to worry about getting hold of the right files: simply enter the website URL and let the tool scan and analyze the content for you. You’ll get a highly accurate word count and can choose to download only the files you need, whether that’s all content, no matches, or just repetitions. This makes CleanText a truly time- and cost-effective method for website translation.

CleanText limitations: What you need to know before using the tool

However, keep in mind that CleanText currently supports only 18 languages: Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Italian, Norwegian, Polish, Portuguese, Russian, Slovenian, Spanish, Swedish, and Turkish. If a website is written in a language that CleanText does not yet support, English rules will be used for text segmentation. But don’t worry: our developers are constantly working on support for new languages, so more may be added in the future.
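The English fallback for segmentation is easy to express in code. This Python sketch keys the 18 supported languages by ISO 639-1 code (our own choice of identifiers; CleanText’s internal codes are not documented in this post) and returns the language whose segmentation rules would apply.

```python
# The 18 supported languages, keyed by ISO 639-1 code (identifiers are an assumption).
SUPPORTED = {
    "cs", "da", "nl", "en", "et", "fi", "fr", "de", "el",
    "it", "no", "pl", "pt", "ru", "sl", "es", "sv", "tr",
}


def segmentation_language(detected: str) -> str:
    """Use the detected language if supported, otherwise fall back to English."""
    code = detected.strip().lower()
    return code if code in SUPPORTED else "en"


print(segmentation_language("de"))  # de
print(segmentation_language("hu"))  # en (Hungarian is not on the list)
```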

CleanText features: The juice of it all!

  • Repetition analysis: This feature allows translators to easily identify and eliminate repetitive text, saving time and improving the accuracy of word count analysis.
  • No matches analysis: This feature identifies text that has no match elsewhere on the website, i.e. the unique content that needs to be translated in full.
  • Segment analysis: This feature breaks down the text into segments for easier translation and editing.
  • Source word analysis: This feature helps in identifying the total number of source words in the text, allowing translators to estimate the time required for translation more accurately.
  • Time elapsed functionality: This feature allows users to keep track of the time that has elapsed since the analysis started.
  • Status indicator: This shows users the current status of the analysis, such as whether it is currently running or complete.
  • Language detection: This feature automatically detects the language of the website being analyzed, allowing translators to quickly identify the source language of the content they will be translating.
  • Selective URL analysis: Users can choose to analyze only the content on selected URLs, which can help speed up the analysis process and make it more accurate.

One of the most impressive features of CleanText is its ability to export text into a TXT file that is properly segmented for easy import into any CAT tool. This makes the translation process much smoother and eliminates the need for manual cleanup, saving translators hours of work with each website.
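What “properly segmented” means depends on the segmentation rules in use. As a hypothetical minimal example, this Python sketch splits text into sentences after ., ! or ? followed by whitespace and writes one segment per line to a UTF-8 .txt file (an assumed layout, not CleanText’s documented output). Production segmenters, such as the SRX rule sets used by CAT tools, also handle abbreviations, numbers, and language-specific punctuation, which this regex does not.

```python
import re
from pathlib import Path

# Naive rule: a segment ends after ., ! or ? that is followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list[str]:
    """Split text into sentence-like segments, dropping empty pieces."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def export_txt(texts: list[str], path: Path) -> None:
    """Write every segment of every text on its own line, UTF-8 encoded."""
    lines = [seg for text in texts for seg in segment(text)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


print(segment("Welcome to our shop. Free delivery! Questions? Call us."))
# ['Welcome to our shop.', 'Free delivery!', 'Questions?', 'Call us.']
```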

How to use CleanText: A step-by-step guide to optimizing website content and preparing translation offers

How does CleanText work?

  1. First, navigate to WebsiteDownloader.com on your web browser.
  2. Enter the URL of the website you want to download and analyze in the input field provided on the homepage.
  3. Click the “Start downloading your website” button and wait until the scan is complete. Depending on the size of the website, the scan may take some time. If you don’t need the entire site to be scanned, pause the process and check which pages have been scanned.
  4. Once the scan is complete, you can start using CleanText to analyze the scanned website. On the right side of the screen, you will see a panel with a list of options. Click on the “CleanText analysis” option. This will reveal a new section on the site.
  5. Click the “Start analysis” button to analyze the text content of the website. Don’t forget that only text content on the selected URLs (subpages) will be analyzed!
  6. The time the analysis takes depends on the number of subpages (approx. 400 subpages per minute; this rate may drop during heavy traffic on WebsiteDownloader.com).
  7. Once the analysis is complete, you can download the results by clicking on the “Download results” button. The downloaded file will be in a .zip format.
  8. Extract the files from the downloaded .zip file on your computer. You will find four files: “all”, “no matches text”, “repetition”, and a .csv file with analysis counts.
  9. The “all” file contains all the text content from the website. The “no matches text” file contains text content that has no matches on the website. The “repetition” file contains text content that repeats on different pages of the website, such as menus and footers. The .csv file contains the analysis counts.
  10. You can use the information provided in the analysis to optimize your website’s content and structure for better performance or for preparing a translation offer for your client.
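The figures from steps 6 and 9 can feed into rough project planning. This Python sketch estimates analysis time from the approximate rate of 400 subpages per minute and turns word counts into a weighted quote. The CSV column names, the 30% repetition weight, and the per-word rate are all hypothetical assumptions, not CleanText’s actual export format or recommended pricing.

```python
import csv
import io
import math

# Approximate throughput from step 6; it may be lower during heavy traffic.
SUBPAGES_PER_MINUTE = 400


def estimated_minutes(subpages: int, rate: int = SUBPAGES_PER_MINUTE) -> int:
    """Rough analysis time, rounded up to whole minutes."""
    return math.ceil(subpages / rate)


# Hypothetical CSV layout: the real column names in the CleanText export may differ.
SAMPLE_CSV = """category,segments,words
no_match,120,1450
repetition,300,900
"""

# Hypothetical pricing: no matches at the full rate, repetitions at 30% of it.
WEIGHTS = {"no_match": 1.0, "repetition": 0.3}


def weighted_words(csv_text: str, weights: dict[str, float]) -> float:
    """Sum word counts per category times their weight (unknown categories count in full)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["words"]) * weights.get(row["category"], 1.0) for row in rows)


def quote(csv_text: str, rate_per_word: float, weights: dict[str, float] = WEIGHTS) -> float:
    """Price of the job in the currency of rate_per_word, rounded to two decimals."""
    return round(weighted_words(csv_text, weights) * rate_per_word, 2)


print(estimated_minutes(2500))  # 7
print(quote(SAMPLE_CSV, 0.10))  # 172.0
```

Weighting repetitions below the full rate mirrors how the post describes avoiding overcharging for repeated text; the actual discount is a business decision for each agency.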

What others say: Pavel’s journey

To give you an example of how CleanText can save you time and effort, let’s hear from one of our users. Pavel, a language professional, recently had a project where he needed to translate a website with a lot of complex formatting, embedded code, and other distractions that made word counting a hassle.

Before CleanText, Pavel would have had to manually clean up the text, which would have taken him hours. With CleanText, Pavel was able to download the text and get an accurate word count in seconds, which saved him a lot of time and effort.

Check out what Pavel thought about WebsiteDownloader’s new functionality here: https://bit.ly/3nhJb8a

In conclusion, CleanText is an invaluable tool for translators, localization specialists, and anyone who works with web content. It is easy to use, cost-effective, and provides precise analysis that saves time and improves the accuracy of word count estimates. Check it out for yourself!

Thank you for reading, and happy translating!

WebsiteDownloader – saving anteaters everywhere from website copying fatigue!
