Technology Stack: PHP, Selenium, MySQL, Laravel, jQuery, Bootstrap
I was the lead developer in a small team tasked with creating an automated screen-scraping tool for two social media sites. The resulting web application, Data 360, provided a responsive interface that gave users access to the scraped data.
Data 360 had four main components: (1) user and project management, (2) database querying and management, (3) a Selenium screen scraper for social media sites, and (4) a website spider. Data 360 was designed to automatically scrape two social media sites, Ask.fm and Vkontakte.com. Selenium would crawl each site, scrape data based on configurable search terms, and store the results in a full-text-search-enabled MySQL database. This data could then be queried and archived from the web interface.
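To illustrate what a full-text-search-enabled MySQL store looks like, here is a minimal schema and query sketch. The table and column names are hypothetical, not Data 360's actual schema, and InnoDB FULLTEXT indexes require MySQL 5.6 or later (earlier versions needed MyISAM):

```sql
-- Hypothetical table for scraped posts, with a FULLTEXT index on the body
CREATE TABLE posts (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    source VARCHAR(32) NOT NULL,        -- e.g. 'ask.fm' or 'vkontakte.com'
    author VARCHAR(255),
    body TEXT NOT NULL,
    scraped_at DATETIME NOT NULL,
    FULLTEXT KEY ft_body (body)
) ENGINE=InnoDB;

-- Full-text query against the scraped data
SELECT id, source, author, scraped_at
FROM posts
WHERE MATCH(body) AGAINST('search terms' IN NATURAL LANGUAGE MODE);
```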
The Data 360 user and project management component has a three-tier authentication system. The three roles are: (1) Admin, (2) Read-Only, and (3) Read-Write. Data 360 project management allows authenticated users to create projects that store the database queries relevant to that project.
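A role check like this is commonly implemented in Laravel as route middleware. The following is a sketch only; the class name, role strings, and `role` attribute on the user model are assumptions, not Data 360's actual identifiers:

```php
<?php
// Hypothetical sketch of a three-role check as Laravel route middleware.
namespace App\Http\Middleware;

use Closure;

class CheckRole
{
    public function handle($request, Closure $next, $role)
    {
        // Assumes a `role` attribute (admin / read-only / read-write)
        // on the authenticated user model.
        if (!$request->user() || $request->user()->role !== $role) {
            abort(403, 'Insufficient privileges');
        }
        return $next($request);
    }
}
```

A write route would then be guarded with something like `->middleware('role:read-write')`, while read routes accept either non-admin role.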
The Data 360 database querying and management section allows authenticated users to query all databases through an open text field, with the option to preview or export the results. Authorized users can also access a graphical view of the database structure of all associated databases.
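An open text field that accepts raw SQL needs at least a statement check and a row cap before previewing. This is a minimal sketch under those assumptions; the function name and limit are illustrative, not the project's actual code:

```php
<?php
// Hypothetical preview helper for the open-text query field.
use Illuminate\Support\Facades\DB;

function previewQuery(string $sql, int $limit = 100): array
{
    // Only allow read-only SELECT statements from the text field.
    if (stripos(ltrim($sql), 'select') !== 0) {
        throw new InvalidArgumentException('Only SELECT queries are allowed');
    }
    // Runs against the default connection; cap rows for the preview pane.
    $rows = DB::select($sql);
    return array_slice($rows, 0, $limit);
}
```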
The Data 360 Selenium screen-scrape tool consists of two customized Selenium instances that crawl the social media sites Ask.fm and Vkontakte.com. It scrapes and stores postings and some images based on configurable search terms. The scrape is automated to run every night, and an archive procedure keeps the active data set from growing too large.
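The nightly run and the archive step could be wired together with Laravel's task scheduler. This is a sketch only; the Artisan command names, times, and cutoff are assumptions, not Data 360's actual configuration:

```php
<?php
// Hypothetical scheduling sketch, in app/Console/Kernel.php.
protected function schedule(Schedule $schedule)
{
    // Kick off the Selenium scrape every night.
    $schedule->command('scrape:run')->dailyAt('02:00');

    // Periodically move stale rows into an archive table, e.g.
    // INSERT INTO posts_archive SELECT * FROM posts WHERE scraped_at < ?;
    // followed by a matching DELETE.
    $schedule->command('scrape:archive')->weekly();
}
```

The scheduler itself is driven by a single system cron entry invoking `php artisan schedule:run` every minute.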
The Data 360 website spidering feature allows an authorized user to initiate a web crawl of a given website and screen-scrape data matching configured terms. The data derived from these crawls could be queried and then archived once it was no longer needed.
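At its core, a term-driven spider fetches a page, records which configured terms appear, and collects outgoing links to follow. The sketch below shows that single-page step in plain PHP with `DOMDocument`; the function name and parameters are illustrative, and the real spider ran on Selenium rather than raw HTTP fetches:

```php
<?php
// Hypothetical single-page crawl step: fetch, match terms, collect links.
function crawlPage(string $url, array $terms): array
{
    $html = @file_get_contents($url);
    if ($html === false) {
        return ['matches' => [], 'links' => []];
    }

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings on messy real-world markup

    // Record which configured terms appear anywhere in the page text.
    $text = $doc->textContent;
    $matches = array_values(array_filter(
        $terms,
        function ($term) use ($text) {
            return stripos($text, $term) !== false;
        }
    ));

    // Collect outgoing links for the crawl frontier.
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }

    return ['matches' => $matches, 'links' => $links];
}
```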