AI-powered data analysis with Python and Github Copilot

AI-powered coding makes it easier than ever to analyze data. We show how to perform an interactive Python analysis in Visual Studio Code with the Github Copilot. Including the full code as download. 

How does AI-powered coding for data analytics work?

Instead of analysing and visualizing your data in Excel, databases or dashboards, you can also do this with Python and other programming languages. The learning curve is steep at first, but you will save valuable time in the long run. Once you have developed the analysis code, you can often reuse it easily. This allows you to perform the same coding tasks and analyses faster and faster. You can also avoid errors that often occur when copying back and forth in Excel.

The first step is to choose a suitable development environment (IDE). Visual Studio Code, Google Colab or other tools are often used here (see article: The most popular Python IDEs). Python has proven to be a particularly good programming language for data analyses. Analyses with Python are best implemented in so-called “Python Notebooks”. This is because the code can be executed quickly in code cells and results can be visualized and even documented. Python notebooks are a mixture of code environment and wiki.

Generative AI can be used for AI support, as this makes it possible to generate code instead of rewriting it. This is exactly what the Github Copilot offers.

What is Github Copilot?

Github Copilot is an AI tool that can generate code in many programming languages. It is based on a cooperation between GitHub and OpenAI. The underlying language model is called “Codex” (technically based on GPT-3) and has been trained on the extensive code data of the popular Github platform. Github Copilot is available as a plugin for the most important IDEs and delivers very good code results with which you can significantly increase your coding and data analytics productivity.

Advantages of Github Copilot

  • Faster coding (our opinion: a 10-fold increase is possible)
  • Code in all programming languages and frameworks, even if you haven’t learned them before
  • It’s easier to learn coding because you can ask Copilot questions about the code at any time
  • Usecomplex data analyses such as clustering, regression or classification, even if you are not a data scientist
  • Developers can have code documented automatically and create unit tests automatically

In which IDEs can Github Copilot be used?

  • Visual Studio Code
  • Visual Studio
  • JetBrains suite (IntelliJ IDEA and others)
  • Vim
  • Neovim
  • Azure Data Studio
  • Github.com (Github Copilot Enterprise)

What does Github Copilot cost?

  • approx. 10 € / month

Which languages can Github Copilot generate?

The answer is simple: Github Copilot can code all programming languages that are available on Github. This means almost all

  • Python
  • Java
  • PHP
  • HTML, CSS
  • Javascript, Typescript
  • Javascript frameworks such as React, Angular, vue.js, Svelte
  • SQL
  • Scala, Rust, Ruby, Go, …
  • various others

Tutorial: Interactive data analysis with Python and Github Copilot

In this tutorial, we show you how to implement an SEO analysis of ranking changes with Python. A small highlight: You can even use a dropdown to interactively select which URL you want to analyze. We use Visual Studio Code and the Github Copilot as tools.

Steps:

  1. Install Visual Studio Code
  2. Install Github Copilot
  3. Download Sistrix data
  4. Prepare Python analysis (install Python, create file)
  5. Perform Python analysis

Step 1: Install Visual Studio Code

Visual Studio Code is Microsoft’s free code solution and probably the most widely used IDE in the world. You can download Visual Studio Code for Windows, MacOS or Linux here:

Visual Studio Code ist eine der beliebtesten kostenlosen IDEs zur Python-Entwicklung
Visual Studio Code is one of the most popular free IDEs for Python development

Step 2: Install Github Copilot

For the Github Copilot you need a Github account and a Copilot subscription (10€/month)

Create a Github account

Install Github Copilot

Activate Visual Studio extensions

  • To be able to use Copilot and Python in VS Code in general, a few extensions must now be installed
  • To do this, click on the extension icon in the menu on the left, search for the extension using the search function and click to install.
  • Practical: Many of the extensions are automatically suggested to you. Simply click on “Install” in the notification window.

  • Check whether the Github copilot is activated: The copilot is now displayed in the icon at the bottom right

Step 3: Download Sistrix data

To analyze SEO ranking data, we download the data from Sistrix here:

  • Keywords > Ranking Changes > Export

Step 4: Prepare Python analysis (install Python, create file)

Install Python

Create a new Python file

  • Create a Python file: For the analysis, we create a new Python file in Visual Studio Code, which we name “seo-analysis.ipynb”. This identifies the Python notebook format.
  • Extensions: If we are still missing extensions for this, VS Code shows us a note at the bottom right, which we simply confirm.

Select Virtual Environment

  • The most difficult step for Python beginners: Python needs a virtual environment (“venv”) in which the Python runtime and all necessary libraries etc. are installed. Click on “Run All” in the menu at the top to execute the code and select the proposed Python environment in the dialog that appears.

Step 5: Create Python analysis with Github Copilot

Now that we have everything ready, let’s create the actual analysis.

Setting up our notebook:

We prepare the data in a Python notebook cell and display it in another cell. In between, we place headings as Markdown cells. To make headings bold, you can mark them with the “#” character space, e.g. “# Step 1”. Tip: Headlines can be marked in different sizes, e.g. with “# Headline 1”, “## Headline 2”, “### Headline 3” etc.

Use Github Copilot:

You can use the Copilot in several ways:

  • In the editor: Suggestions are made automatically when coding. Press Tab to accept the suggested line. You then jump to the next line with Return and accept again with Tab until the code is finished.
  • In the window: Press CTRL I (MacOS: Command I ) to open a prompt window. Here you can describe what you want to have done. E.g. “Create a tool for analyzing rankings” or “add documentation” or others. Similar to a Git diff, you can now see a comparison of the generated and original code and you can accept or reject the code.
  • In the chat: In the black menu bar on the left, you open the chat area and can now chat with the copilot. This is practical as the chat is displayed next to the code.

Python code – Part 1: Reading in data

We let Github Copilot create the code to read in and clean the file. Here are some suggestions for a suitable prompt.

read the csv file "seo-data.csv", separated by ";"

create a new column "Type" with the value "Lost" if Position#2 is NaN, "Lower" if Position#2 is higher than Position#1, "Higher" if Position#2 is lower than Position#1

set Change to difference between Position#1 and Position#2

remove entries where SearchVolume is 0

create df_result. This contains the grouped data by URL, the sum of ClicksDelta

Python code – Part 2: Displaying charts

Now we let Github Copilot display the charts. Here are some suggestions for the prompt:

Show two histograms:
1. Change histogram: Show count of keywords for by position. Bin size is 5.
2. Position histogram: Show in the same histogram how many rankings per position we have on date #1 (blue) vs. date #2 (orange). Bin size is 5.

Add a dropdown which contains all URLs. When changing the dropdown, show the chart for the selected URL.

The result: Interactive SEO data analysis in the Python notebook

With this quickly created tool, you can now interactively analyze your data and gain insights. This type of analysis would be much more time-consuming in Excel, as you would have to create charts for each of the 500 URLs. And with just a few clicks, the analysis can be carried out again and again with new data or for other projects. New features can now also be added using AI coding support with Github Copilot. So you can build your own analysis tool just the way you want it. Have fun coding!

Here you can download the full code: