A Node.js-based web crawler that can scale on the go!
- Supports both static and dynamic page crawling
Prerequisites:
- Linux (Ubuntu)
- Redis (see the sketch below)
- Node.js
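One common way a crawler that "scales on the go" uses Redis is as a shared URL queue: workers push newly discovered links onto a Redis list, and any number of crawler processes pop from it. The sketch below only illustrates that pattern; the key name, the client library (the redis npm package), and the queue logic are assumptions, not this project's actual implementation.

```js
// Illustrative sketch only: not this project's actual Redis usage.
// Assumes the `redis` npm package (v4+) and a Redis server on localhost.
const { createClient } = require('redis');

const QUEUE_KEY = 'crawler:url-queue'; // hypothetical key name

async function main() {
  const client = createClient(); // defaults to redis://localhost:6379
  await client.connect();

  // Seed the shared queue with a start URL.
  await client.lPush(QUEUE_KEY, 'https://stacksapien.com');

  // Any number of worker processes can run this loop against the same
  // Redis instance, which is what lets the crawl scale horizontally.
  let url;
  while ((url = await client.rPop(QUEUE_KEY)) !== null) {
    console.log('would crawl:', url);
    // Fetch/render the page here, then lPush any newly discovered links.
  }

  await client.quit();
}

main().catch(console.error);
```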
Installation:
- Install Node.js by executing the commands below in the root directory of the project:
$ cd init-scripts/
$ sudo bash install-nodejs.sh
- Install Redis:
$ sudo bash install-redis.sh
- Install project dependencies. In the root directory of the project, execute the following command:
$ npm install
Usage:
$ node index.js "<url>" "path-to-store-url"
$ node index.js "https://stacksapien.com" "./temp"
- In the above example, files like valid-urls.txt, external-urls.txt & invalid-urls.txt will be generated in the temp folder of your project directory.
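The generated lists are plain text files, so a crawl's results can be post-processed with a few lines of Node.js. The sketch below assumes one URL per line in valid-urls.txt and that the crawl was run with "./temp" as the output directory; both are assumptions about the output, not something this README guarantees.

```js
// Illustrative sketch only: assumes one URL per line in valid-urls.txt.
const fs = require('fs');
const path = require('path');

const outputDir = './temp'; // the directory passed on the command line above
const validUrls = fs
  .readFileSync(path.join(outputDir, 'valid-urls.txt'), 'utf8')
  .split('\n')
  .filter((line) => line.trim().length > 0);

console.log(`Crawl found ${validUrls.length} valid URLs`);
validUrls.slice(0, 5).forEach((url) => console.log(url));
```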