Server-side Jobs

Currently, the "website crawler" is the only job that runs alongside the application server(s).

Website crawler

The website crawler must be started together with the application server. Its task is to classify the websites that users specify during registration. The crawler continuously watches the "websites" database table for newly added web pages. When it finds a new page, it tries to download the page, extract all words from it, and classify the resulting bag of words into the category tree. The results are then stored in the table.

Each website is currently crawled only once; pages are not revisited after they have been classified.
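
The crawl cycle described above (watch the table, download, extract words, classify, store the result, never revisit) can be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation. The column names (id, url, category, crawled), the use of SQLite, the flat keyword sets standing in for the category tree, and all function names are assumptions made for the example.

```python
import re
import sqlite3
import time
import urllib.request
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_words(html):
    """Return a bag of words (word -> count) for the visible text of a page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return Counter(re.findall(r"[a-z]+", text))


def classify(bag, categories):
    """Pick the category whose keywords occur most often in the bag.

    Assumption: categories is a flat dict of name -> keyword set. The real
    category tree and classifier are not described in this document.
    """
    scores = {name: sum(bag[w] for w in kws) for name, kws in categories.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def download(url):
    """Fetch a page over HTTP and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def process_pending(conn, fetch, categories):
    """Classify every website not yet crawled; return how many were handled."""
    rows = conn.execute("SELECT id, url FROM websites WHERE crawled = 0").fetchall()
    for site_id, url in rows:
        try:
            category = classify(extract_words(fetch(url)), categories)
        except OSError:
            category = None  # download failed; the site stays unclassified
        # Setting crawled = 1 ensures each website is processed only once.
        conn.execute(
            "UPDATE websites SET category = ?, crawled = 1 WHERE id = ?",
            (category, site_id),
        )
    conn.commit()
    return len(rows)


def run(conn, categories, poll_seconds=5.0):
    """Poll the table forever, as a job running next to the application server."""
    while True:
        process_pending(conn, download, categories)
        time.sleep(poll_seconds)
```

In this sketch the download function is passed in as a parameter, so the cycle can be exercised without network access by supplying a stand-in that returns fixed HTML. A failed download is marked as crawled as well, which matches the crawl-once behaviour but is a design choice of the example; a real job might prefer to retry.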