Website Sources

Website sources keep your getchat bot supplied with current, relevant information from your site. This guide explains how they work and how to get the most out of them.

How Website Sources work

Regular Content Downloads:

For every assigned website source, getchat downloads the page content and ingests it into the chatbot's knowledge base. Repeating this download regularly keeps the bot up to date with content changes on your website.

Efficient Content Extraction:

During content ingestion, all HTML tags are stripped away. Tags therefore don't inflate your character count, and the bot works only with the text content of the page.
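To illustrate what tag stripping does to a page and to its character count, here is a minimal sketch using Python's standard `html.parser` module. It is not getchat's actual extraction code: the `TextExtractor` class and `strip_html` function are hypothetical names, and skipping the contents of `script` and `style` elements is an assumption made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and drops all tags (hypothetical helper)."""

    # Assumption: script and style bodies are not useful text, so skip them.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(markup: str) -> str:
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


page = (
    "<html><body><h1>Opening hours</h1>"
    "<p>Mon to Fri, <b>9:00</b> to 17:00</p></body></html>"
)
extracted = strip_html(page)

print(extracted)                  # Opening hours Mon to Fri, 9:00 to 17:00
print(len(page), len(extracted))  # the extracted text is the shorter of the two
```

Comparing the two lengths shows why stripping matters for the character count: only the extracted text would be counted, not the surrounding markup.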

Transparency in Content Capture:

You can view both the original HTML markup and the extracted content without HTML tags at any time, so you always know exactly what content powers your chatbot's responses.

Responsive Linking:

If content from a particular page matches a user's query, the chatbot includes a link to that website source in its reply. This establishes credibility and points users to the primary source for more detail.

Automated Source Creation with Crawlers:

As you add new pages to your website, our crawlers automatically detect them and create new website sources. To learn how our crawlers operate, see our documentation on crawlers.

More about crawlers

Things to consider

Update Frequency based on Plan:

How often getchat updates content from your website sources depends on the plan you've subscribed to. Each plan offers a different download frequency. For a full breakdown of our plans, see our pricing page.

Adherence to Web Standards:

We respect the permissions websites set for robots. Our system only accesses content that a site permits robots to access.

The user agent for getchat is: getchat-crawler/1.0 (+https://getchat.org)

With this identifier, you can easily allow or block our bot's access to specific pages or sections of your site using the robots.txt file.

Need to understand more about robots.txt? Google provides a comprehensive guide on creating and managing a robots.txt file: https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt
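As a rough way to preview how a set of robots.txt rules would apply to our user agent, the sketch below feeds an example rule set to Python's standard `urllib.robotparser`. The `/drafts/` path and the `example.com` URLs are made up for illustration, and Python's matcher is only an approximation here: nothing above states that getchat's crawler resolves rules in exactly the same way.

```python
from urllib.robotparser import RobotFileParser

# User agent string as documented above.
GETCHAT_USER_AGENT = "getchat-crawler/1.0 (+https://getchat.org)"

# Hypothetical robots.txt: block the getchat crawler from /drafts/ only.
ROBOTS_TXT = """\
User-agent: getchat-crawler
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Report whether the example rules let the getchat user agent fetch url."""
    return parser.can_fetch(GETCHAT_USER_AGENT, url)


print(allowed("https://example.com/pricing"))          # True
print(allowed("https://example.com/drafts/new-page"))  # False
```

To test a live site instead of an inline string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the robots.txt file before you call `can_fetch()`.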

Non-execution of JavaScript:

Our crawlers do not execute JavaScript. Make sure that any content you want the bot to be trained on is present in the page's HTML even when JavaScript is disabled. Content that only appears after client-side scripts run will not be captured during the crawl.
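One way to check a page yourself is to look for an important phrase in the raw HTML, before any script has run. The following sketch assumes that approach; `visible_without_js` and `fetch_raw_html` are hypothetical helper names, and the two sample pages are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class StaticText(HTMLParser):
    """Collects text that exists in the markup itself, ignoring script bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if not self._in_script:
            self.parts.append(data)


def visible_without_js(markup: str, phrase: str) -> bool:
    """True if phrase appears in the page text without running any JavaScript."""
    parser = StaticText()
    parser.feed(markup)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return phrase.lower() in text.lower()


def fetch_raw_html(url: str) -> str:
    """Download a page the way a non-JavaScript client sees it (not run below)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


server_rendered = '<div id="app"><p>Returns accepted within 30 days</p></div>'
client_rendered = (
    '<div id="app"></div>'
    '<script>document.getElementById("app").textContent = '
    '"Returns accepted within 30 days";</script>'
)

print(visible_without_js(server_rendered, "returns accepted"))  # True
print(visible_without_js(client_rendered, "returns accepted"))  # False
```

In the second page the phrase exists only inside a script, so a crawler that does not execute JavaScript would see an empty `div`. For a real page, pass the result of `fetch_raw_html(url)` as the first argument.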

Questions and support

For further assistance or questions about website sources, ask the getchat bot or contact our support team.