Web scraping, or how to automate absolutely everything

You may have learned from our article on APIs that an API is a way to download data and present it in a spreadsheet interface. This method is very convenient and provides ready-made, structured data from the primary source. But what if we need information that is not available through an API, for example the prices for a category of goods in an online store?

You will agree that it would be nice to get this data, say 230 pages with 100 items on each page, exported as a table at the press of a single button, in about half a minute.

In this article we explain what web scraping is and how it can be useful for a financial specialist. Web scraping is the process of gathering information from a web page. The method works by locating information through the elements of the page's markup language (HTML, XHTML).
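
To make the idea of locating data through markup elements concrete, here is a minimal sketch that uses only Python's standard library. The HTML fragment, the class names `product-card` and `price`, and the `PriceCollector` helper are all hypothetical and invented for illustration; a real store will use its own markup.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; real pages use their own class names.
SAMPLE_HTML = """
<div class="product-card"><span class="price">19.99</span></div>
<div class="product-card"><span class="price">5.49</span></div>
"""


class PriceCollector(HTMLParser):
    """Collects the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute name, value) pairs.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        # Keep text only while we are inside a price element.
        if self._in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


collector = PriceCollector()
collector.feed(SAMPLE_HTML)
print(collector.prices)  # ['19.99', '5.49']
```

The point is only that the same tag and class name identify the same kind of data on every card, so one rule is enough to pick up all of them.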

An example of the steps for building a database with web scraping:

  1. Send a GET request to the URL of the page that displays the information you need. In Python, for example, this can be done with the requests library.
  2. Find and collect the data. The BeautifulSoup4 library can help with this. Using the find method, we locate the elements we need on the page (for example, a product card). You will notice that in any online store the product cards look alike. So, most likely, each card's markup object in the HTML has the same class names (identifiers). These are what we refer to when looking for information. Once you have selected one card, you can write a loop that extracts the information from all the other cards by referencing the same identifier. We now have a list of information.
  3. Load the data into a CaseWare IDEA spreadsheet. A video on how to connect to the IDEA program using Python will soon be released on our YouTube channel. This connection will expand your capabilities not only in process automation but also in more in-depth data analysis.
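
The steps above could be sketched roughly as follows, assuming the requests and BeautifulSoup4 libraries are installed. Several details are assumptions made for illustration and do not come from any particular store: the class names `product-card`, `name` and `price`, the `page` query parameter, and the function names themselves. The connection to IDEA is not shown; the sketch writes a CSV file as a generic stand-in for the final table.

```python
import csv

import requests
from bs4 import BeautifulSoup


def fetch_page(url, page):
    """Step 1: send a GET request and return the page's HTML."""
    # The "page" query parameter is an assumption; sites paginate differently.
    response = requests.get(url, params={"page": page}, timeout=10)
    response.raise_for_status()
    return response.text


def parse_cards(html):
    """Step 2: find every product card and collect its name and price."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Hypothetical class names; inspect the real page to find the actual ones.
    for card in soup.find_all("div", class_="product-card"):
        name = card.find("span", class_="name")
        price = card.find("span", class_="price")
        if name is None or price is None:
            continue  # skip cards that lack one of the fields
        rows.append({
            "name": name.get_text(strip=True),
            "price": price.get_text(strip=True),
        })
    return rows


def save_csv(rows, path):
    """Step 3 (stand-in): write the collected rows to a CSV table."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(rows)


def scrape(url, pages, path):
    """Run all three steps over several pages; return the number of rows."""
    rows = []
    for page in range(1, pages + 1):
        rows.extend(parse_cards(fetch_page(url, page)))
    save_csv(rows, path)
    return len(rows)
```

With a hypothetical catalog address, a call such as `scrape("https://example.com/catalog", pages=230, path="prices.csv")` would walk through all 230 pages and collect the cards into one table. Before running anything like this against a real site, check its terms of use and keep the request rate modest.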

With web scraping, you can open the way to practically unlimited collection of information as part of an automated audit and data analysis process.