Need a Java solution: Crawler Agent

Elena_83


If you have ideas for a solution, send me a private message and leave your contact info.

The Store Agent can be represented as three components:
1) Crawler
2) Administrative tool
3) Parser

Crawler

The Crawler does not only crawl predefined sites; it must also be able to crawl other related sites, either by being fed keywords or by finding them automatically.

The system keeps a predefined list of sites (stores).
The Crawler selects a site from this list, then visits every page of the selected site.
The Crawler pre-treats every page of the site (separates images from text, etc.).
After pre-treatment, the Crawler saves all pages of the site into the Primary database.
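
The crawl steps above could be sketched roughly like this. It is only a minimal illustration under several assumptions: the class and method names (CrawlerSketch, StoredPage, preTreat, crawlSite) are made up, the "Primary database" is just an in-memory map, pages are fetched through a caller-supplied function instead of real HTTP, and only absolute links with double-quoted attributes that start with the site's start URL are followed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the original spec.
final class CrawlerSketch {

    /** One pre-treated page: text and image references are stored separately. */
    static final class StoredPage {
        final String url;
        final String text;
        final List<String> images;

        StoredPage(String url, String text, List<String> images) {
            this.url = url;
            this.text = text;
            this.images = images;
        }
    }

    // Simplified patterns: they only understand double-quoted attributes.
    private static final Pattern IMG =
            Pattern.compile("<img\\s+[^>]*src=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern LINK =
            Pattern.compile("<a\\s+[^>]*href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Pre-treatment: collect image URLs, then strip tags to get plain text. */
    static StoredPage preTreat(String url, String html) {
        List<String> images = new ArrayList<>();
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            images.add(m.group(1));
        }
        String text = html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
        return new StoredPage(url, text, images);
    }

    /**
     * Breadth-first visit of every page reachable from startUrl whose URL
     * starts with startUrl. The fetcher returns the HTML of a URL, or null
     * if the page cannot be loaded. The returned map stands in for the
     * Primary database (assumption).
     */
    static Map<String, StoredPage> crawlSite(String startUrl, Function<String, String> fetcher) {
        Map<String, StoredPage> primaryDb = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched, skip it
            }
            primaryDb.put(url, preTreat(url, html));
            Matcher m = LINK.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                // Stay on the selected site and never queue a page twice.
                if (link.startsWith(startUrl) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return primaryDb;
    }
}
```

In a real crawler the fetcher would wrap an HTTP client, and a proper HTML parser would probably be a better choice than regular expressions; both were left out here to keep the sketch self-contained.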

Administrative tool
The Administrative tool allows two major things:
1) adding sites to the predefined list of stores in the system
2) checking the quality of parsing in the Secondary database

We assume that the administrator checks the quality of automatic parsing before a site is added to the stores list. This will be implemented in the following way.
• There is a task list in the system; it allows the administrator to perform asynchronous activities in the system
• The administrator adds a task to add some site to the stores list
• The Crawler checks the task list, processes the corresponding site and saves it into the Primary database
• The Parser parses the site and saves the results into the Secondary database, but the site does not yet receive the status "Approved by Administrator"
• The administrator notices that the task is completed and approves or rejects the results
• After approval, the system adds the site to the stores list
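
One possible way to model the task list described above is a small state machine. This is only a sketch under my own assumptions: the status names (QUEUED, CRAWLED, PARSED, APPROVED, REJECTED) and all class and method names are hypothetical, and everything is kept in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; status and method names are assumptions.
final class TaskListSketch {

    enum Status { QUEUED, CRAWLED, PARSED, APPROVED, REJECTED }

    static final class Task {
        final String siteUrl;
        Status status = Status.QUEUED;

        Task(String siteUrl) {
            this.siteUrl = siteUrl;
        }
    }

    private final List<Task> tasks = new ArrayList<>();
    private final Set<String> storeList = new LinkedHashSet<>();

    /** Administrator: request that a site be added to the stores list. */
    Task addSiteTask(String siteUrl) {
        Task task = new Task(siteUrl);
        tasks.add(task);
        return task;
    }

    /** Crawler and Parser poll for the oldest task in the state they handle. */
    Optional<Task> nextWithStatus(Status status) {
        return tasks.stream().filter(t -> t.status == status).findFirst();
    }

    /** Crawler: the site has been saved into the Primary database. */
    void markCrawled(Task task) {
        advance(task, Status.QUEUED, Status.CRAWLED);
    }

    /** Parser: results are in the Secondary database, awaiting review. */
    void markParsed(Task task) {
        advance(task, Status.CRAWLED, Status.PARSED);
    }

    /** Administrator: approve or reject; only approval adds the site. */
    void review(Task task, boolean approved) {
        advance(task, Status.PARSED, approved ? Status.APPROVED : Status.REJECTED);
        if (approved) {
            storeList.add(task.siteUrl);
        }
    }

    Set<String> storeList() {
        return Collections.unmodifiableSet(storeList);
    }

    // Rejects out-of-order transitions so a site cannot skip the review step.
    private static void advance(Task task, Status expected, Status next) {
        if (task.status != expected) {
            throw new IllegalStateException(
                    "Task for " + task.siteUrl + " is " + task.status + ", expected " + expected);
        }
        task.status = next;
    }
}
```

The point of the explicit transitions is that a site can reach the stores list only through review(task, true), which matches the requirement that nothing is added before the administrator approves it.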

Site developers periodically change a site's code, which is why the administrator should check the quality of the Parser's work from time to time. The Administrative tool should let the administrator perform such selective inspections and remove a selected site from the stores list.

Parser
Like the Crawler, the Parser is an agent in the system.
The Parser monitors the Primary database for new sites added by the Crawler.
The Parser's job is to divide pages into blocks (each block corresponds to one item sold in a store) and then parse each block (i.e. split it into category, brand name, price, etc.).
After that, the Parser saves the data into the Secondary database.
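
To illustrate the block-splitting and field-parsing steps, here is a minimal sketch. It relies on a strong assumption that is not in the spec: that the pre-treated page text already has one product per blank-line-separated block, with fields written as "Category: ...", "Brand: ..." and "Price: ..." lines. Real store pages will need site-specific rules; the names ParserSketch, Good, splitIntoBlocks, parseBlock and parsePage are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the "Key: value" page layout is an assumption.
final class ParserSketch {

    /** One parsed item, as it might be stored in the Secondary database. */
    static final class Good {
        final String category;
        final String brand;
        final BigDecimal price;

        Good(String category, String brand, BigDecimal price) {
            this.category = category;
            this.brand = brand;
            this.price = price;
        }
    }

    /** Step 1: divide the page text into blocks separated by blank lines. */
    static List<String> splitIntoBlocks(String pageText) {
        List<String> blocks = new ArrayList<>();
        for (String block : pageText.split("\\n\\s*\\n")) {
            if (!block.trim().isEmpty()) {
                blocks.add(block.trim());
            }
        }
        return blocks;
    }

    /** Step 2: split one block into category, brand name and price. */
    static Good parseBlock(String block) {
        Map<String, String> fields = new HashMap<>();
        for (String line : block.split("\\n")) {
            String[] keyValue = line.split(":", 2);
            if (keyValue.length == 2) {
                fields.put(keyValue[0].trim().toLowerCase(Locale.ROOT), keyValue[1].trim());
            }
        }
        String category = fields.get("category");
        String brand = fields.get("brand");
        String price = fields.get("price");
        if (category == null || brand == null || price == null) {
            throw new IllegalArgumentException("Incomplete block: " + block);
        }
        // BigDecimal keeps prices exact; it throws NumberFormatException on bad input.
        return new Good(category, brand, new BigDecimal(price));
    }

    /** Both steps for a whole page. */
    static List<Good> parsePage(String pageText) {
        List<Good> goods = new ArrayList<>();
        for (String block : splitIntoBlocks(pageText)) {
            goods.add(parseBlock(block));
        }
        return goods;
    }
}
```

Blocks that fail to parse raise an exception here; a real Parser might instead record them so the administrator can see them during the quality check.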
 