DEV Community: Vinícius Gajo

Microsoft Azure Trial Hackathon on DEV: OpenCV with Azure Functions and Python

Vinícius Gajo — Mon, 28 Feb 2022 13:37:52 +0000

Overview of My Submission

Have you ever thought about deploying your custom Python + OpenCV projects on a cheap and easy-to-use platform? Read this post to find a way to do it using Azure Functions, the serverless compute service available on the Microsoft Azure cloud.

In this tutorial I'll show you how to develop a simple Python application that uses the built-in Canny function to detect edges in images submitted by clients.

As mentioned before, this project will be deployed to Azure Functions. The client submits an image through an HTTP POST endpoint, our Python script extracts its edges, and the final cool result is sent back to the client.

Alright, let's start then. The setup I used to develop this project consists of:

Ubuntu 20.04 - OS;
Node.js v14.17.0 and npm 7.14.0;
Python 3.8.5 and pip 22.0.3;
Azure CLI tool (AKA az) 2.31.0;
Insomnia (to trigger the endpoint, although you can use curl directly).

After installing the required tools (newer versions will probably work too, just give it a try), the first package you need to install is azure-functions-core-tools, using npm. Then you can use it to start a new project:

# this command will install the required npm package
$ npm install -g azure-functions-core-tools
# later you can explore your global packages with
# npm ls --global

# next step is to start our project using some
# boilerplate to make it easier to develop
$ func init AzureFunctions-OpenCV
# select 4 for python project

After running those commands you will see a result like this on your screen:

The next step is to add template code to this project folder:

# move within the project folder
$ cd AzureFunctions-OpenCV/

# create a new template project
$ func new
# select the option 9 for HttpTrigger

Open this folder in your IDE (Integrated Development Environment) of preference. In my case I'll go with VS Code, although recently I'm experimenting with Emacs and I'm really liking it (I'll write about it in a future post).

Next, update the requirements.txt file, adding the packages required to use OpenCV with Python. Make sure that your local file has this content:

# Do not include azure-functions-worker as it may conflict with the Azure Functions platform

azure-functions==1.9.0
numpy==1.20.1
opencv-contrib-python==4.5.5.62

There we just added the numpy and opencv-contrib-python packages, pinning the package versions we want to use (a best practice to increase reproducibility). To install them locally you can use the following command in the terminal:

# install packages mentioned in requirements.txt
$ pip install -r requirements.txt

Now let's start exploring our real project. Its main program will be inside the OpenCVHttpTrigger/__init__.py file. To check that it is working fine in your local environment, you can just change its code to print the installed OpenCV version.

First, copy this script to the __init__.py file:

import logging

import azure.functions as func

# import opencv package
import cv2

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    # print OpenCV's version
    logging.info(f'OpenCV version: {cv2.__version__}')

    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')

    if name:
        return func.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return func.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
             status_code=200
        )

Make sure that your project and dependencies are cool and start this project using the following command in the terminal:

# start this project locally
$ func host start

# ................
# in a different terminal window
# while running the server
# check its GET endpoint using curl
$ curl http://localhost:7071/api/OpenCVHttpTrigger

# results in
This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.

After reaching this endpoint you should see the following logs in the server:

There's a line there saying OpenCV version: 4.5.5, which is what we want to see for now.

Cool, dependencies are ok. Let's focus now on setting our local environment to trigger our future application endpoint.

In this tutorial I'll show you two ways to do this. First, with Insomnia, you'll have access to a pretty UI. Later, I'll show you how to use curl in the terminal directly.

Since this is not the main goal of this tutorial, I will not cover in much detail how to configure Insomnia locally from scratch. In your environment, after making sure that this program is installed, you'll be able to just load my configuration and have access to the same endpoints.

Just check this GIF to see how I loaded the configuration JSON for my Insomnia instance.

Now, getting back to Azure Functions: the official docs contain the following important information (reference link).

The HTTP request length is limited to 100 MB (104,857,600 bytes), and the URL length is limited to 4 KB (4,096 bytes). These limits are specified by the httpRuntime element of the runtime's Web.config file.

If a function that uses the HTTP trigger doesn't complete within 230 seconds, the Azure Load Balancer will time out and return an HTTP 502 error. The function will continue running but will be unable to return an HTTP response. For long-running functions, we recommend that you follow async patterns and return a location where you can ping the status of the request. For information about how long a function can run, see Scale and hosting - Consumption plan.

So, you need to make sure that your request length is not bigger than 100 MB and that the computation time is less than 230 seconds.
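
To make the size limit concrete, here is a minimal sketch of a guard that could run before any image processing. The helper name is_body_within_limit and the constant name are my own illustration and are not part of the azure-functions package; only the 100 MB figure comes from the docs quoted above.

```python
# Limit quoted in the Azure Functions docs: 100 MB (104,857,600 bytes).
MAX_REQUEST_BYTES = 100 * 1024 * 1024


def is_body_within_limit(body: bytes, limit: int = MAX_REQUEST_BYTES) -> bool:
    """Return True when the body is non-empty and fits within the limit.

    Hypothetical helper for illustration; it is not part of azure-functions.
    """
    return 0 < len(body) <= limit


if __name__ == "__main__":
    # A tiny non-empty body passes, an empty one does not.
    print(is_body_within_limit(b"\xff\xd8\xff"))
    print(is_body_within_limit(b""))
```

Inside the function you could call it with the bytes returned by req.get_body() and answer with an error status when the check fails, instead of letting the decoding step crash.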

Since we are dealing with images and this Canny algorithm is relatively fast and lightweight, we are safe to go. Finally, let's start exploring the real project code (__init__.py):

import logging
import azure.functions as func

import cv2
import numpy as np

# https://stackoverflow.com/a/37032551
def loadImageFromRequestBody(
        req: func.HttpRequest) -> np.ndarray:
    """Load image as uint8 array from the request body."""
    img_bin = req.get_body()
    img_buffer = np.asarray(bytearray(img_bin), dtype=np.uint8)
    return img_buffer

# https://docs.opencv.org/4.x/dd/d1a/group__imgproc__feature.html#ga04723e007ed888ddf11d9ba04e2232de
# cv.Canny(image, threshold1, threshold2[, edges[, apertureSize[, L2gradient]]]) -> edges
# cv.Canny(dx, dy, threshold1, threshold2[, edges[, L2gradient]]) -> edges
def extractEdges(
        buf: np.ndarray,
        threshold1: int,
        threshold2: int) -> np.ndarray:
    """Transform the input image to show its edges using the Canny algorithm."""
    img = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)
    # imdecode returns None when the buffer is not a valid image
    if img is None:
        raise ValueError('Request body is not a decodable image.')
    img_edges = cv2.Canny(img, threshold1, threshold2)
    return img_edges

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    # CONSTANTS
    THRESHOLD1 = 20
    THRESHOLD2 = 60

    img_buffer = loadImageFromRequestBody(req)
    img_edges = extractEdges(
        img_buffer, THRESHOLD1, THRESHOLD2)

    # Tips to debug locally
    # cv2.imshow('Image edges', img_edges)
    # cv2.waitKey(0)

    img_encoded = cv2.imencode('.jpg', img_edges)
    img_response = img_encoded[1].tobytes()

    # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
    headers = {
        'Content-Type': 'image/jpeg',
        'Content-Disposition': 'attachment; filename="image.jpg"',
        'Access-Control-Allow-Origin': '*'
    }

    return func.HttpResponse(
        body=img_response,
        headers=headers,
        status_code=200)

As you can see, we have changed almost all of the previous code. We use type annotations to make clear what our functions expect to receive and what they return; links and comments throughout the code show where I got the knowledge to build it; and docstrings make the functions easier to reason about.

According to this code, you should send the image as binary data in a POST request. Take a look at the following piece of code to understand how to use it with curl:

# start the function locally
$ func host start

# submit a local image:
$ curl --request POST \
  --url http://localhost:7071/api/OpenCVHttpTrigger \
  --header 'Content-Type: image/jpeg' \
  --data-binary '@/home/gajo/Pictures/myself-01.jpg' \
  --output ~/Desktop/image.jpg

To use this command you need an image named myself-01.jpg in the ~/Pictures directory (adjust the path for your local environment). When it finishes, you'll find the result in the ~/Desktop folder as image.jpg.
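
If you prefer to trigger the endpoint from Python instead of curl, the same request can be sketched with the standard library only. The function names build_edge_request and fetch_edges are hypothetical, and the sketch assumes the function is running locally on the URL used above.

```python
import urllib.request


def build_edge_request(url: str, image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw image bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )


def fetch_edges(url: str, image_path: str, output_path: str) -> None:
    """Send the image at image_path and save the response to output_path."""
    with open(image_path, "rb") as image_file:
        request = build_edge_request(url, image_file.read())
    with urllib.request.urlopen(request) as response:
        body = response.read()
    with open(output_path, "wb") as output_file:
        output_file.write(body)
```

With the local server running, a call like fetch_edges("http://localhost:7071/api/OpenCVHttpTrigger", "myself-01.jpg", "image.jpg") should produce the same file that the curl command writes.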

Example result:

Input:

Output:

As you can see, in the output image it is clear where the edges of the input picture are.

Notice that this result depends heavily on the values we used for the thresholds in the Python code. Look at the next example:

Input:

Model's Instagram account.

Output with:

# code inside the main() function

# CONSTANTS
THRESHOLD1 = 20
THRESHOLD2 = 60

Output with:

# code inside the main() function

# CONSTANTS
THRESHOLD1 = 100
THRESHOLD2 = 300

As you can see, with those different thresholds we get different results. I recommend testing those values with your own images first and then updating the code to use your tuned values.

Also, as a continuation idea, it would be cool if the user could send those threshold values through the request.

Recommendation: as a rule of thumb, set the higher threshold to three times the value of the lower threshold.
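
As a starting point for that continuation idea, here is a minimal sketch of how the thresholds could be read from the query string. The parameter names t1 and t2 and the helper parse_thresholds are assumptions of mine; the defaults are the constants used in this post, and the 3:1 fallback follows the rule of thumb above.

```python
from typing import Mapping, Tuple

DEFAULT_LOW = 20
DEFAULT_HIGH = 60


def parse_thresholds(params: Mapping[str, str]) -> Tuple[int, int]:
    """Read Canny thresholds from query parameters, falling back to defaults.

    The parameter names "t1" and "t2" are hypothetical. When only "t1" is
    given, the high threshold is set to three times the low one.
    """
    raw_low = params.get("t1")
    raw_high = params.get("t2")
    if raw_low is None and raw_high is None:
        return DEFAULT_LOW, DEFAULT_HIGH
    # int() raises ValueError for non-numeric input such as "abc".
    low = int(raw_low) if raw_low is not None else DEFAULT_LOW
    high = int(raw_high) if raw_high is not None else 3 * low
    if low < 0 or high < low:
        raise ValueError("thresholds must satisfy 0 <= t1 <= t2")
    return low, high
```

In main() this could be called as parse_thresholds(req.params), catching ValueError to answer with a 400 status when the client sends invalid values.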

After setting the code and testing locally, the final step is to publish this Azure Function. To do it you first need to have an Azure account and the Azure CLI tool installed.

First, log in using the CLI with:

$ az login

After running this command a new window will open in your browser and you'll be prompted to log in to your Azure account. Just follow those instructions.

The next step is to create a resource group for your Azure Function services. You can do it with the following commands in the CLI:

# select some location near to you
# use this command to see all the available locations
# az account list-locations
$ az group create --location "East US 2" --name "dev-azure-hackathon"

Next step is to create a storage account for your application. You can easily do it with the following command:

# this storage name must be unique in the whole world
# I recommend using the following pattern
# storagecv<MMDDYYYY>
# where MM stands for the month
# DD stands for the day
# YYYY stands for the year
$ az storage account create \
  -n storagecv02202022 \
  -l "East US 2" \
  -g "dev-azure-hackathon" \
  --sku Standard_LRS
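
The naming pattern above can be sketched in Python as well. The helper storage_account_name is hypothetical; the validation reflects the Azure rule that storage account names must be 3 to 24 characters long and contain only lowercase letters and digits. Keep in mind that a date suffix alone does not guarantee that the name is still free.

```python
import re
from datetime import date

# Azure storage account names: 3-24 characters, lowercase letters and digits.
_VALID_NAME = re.compile(r"[a-z0-9]{3,24}")


def storage_account_name(day: date, prefix: str = "storagecv") -> str:
    """Build a name following the storagecv<MMDDYYYY> pattern."""
    name = f"{prefix}{day.strftime('%m%d%Y')}"
    if not _VALID_NAME.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


if __name__ == "__main__":
    print(storage_account_name(date(2022, 2, 20)))  # storagecv02202022
```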

After creating the storage account, the next step is to create the Azure Function itself. You could create it using the following command:

$ az functionapp create \
    --consumption-plan-location eastus2 \
    --runtime python \
    --runtime-version 3.8 \
    --functions-version 3 \
    --name opencvhttptrigger \
    --resource-group dev-azure-hackathon \
    --os-type linux \
    --storage-account storagecv02202022

Finally, the last step is to publish our Azure Function using the following command:

$ func azure functionapp publish opencvhttptrigger

In my local development scenario, the first time I used this command it did not work properly. At least I did not get the expected result. But in the second run everything worked fine.

Now you can go to the Azure portal at this URL. There you need to log in to your account (the same one you used in the CLI) and search in the top bar for the resource group you created before.

There you'll see the services you created through the CLI:

The next step is to click on your function app to open its page. Then, in the left menu, click on the Functions button. There you should confirm that your Azure Function is listed:

On this page, first click on the function name to open its page.

Now, click on Get Function URL in the top menu. Here you can choose which authorization key you want to use.

Finally, just copy this link and update the URL you used to test this project locally to use the Azure link. That's it.
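
For reference, the copied link usually has a predictable shape, which the sketch below rebuilds. It assumes the default azurewebsites.net hostname and a function key passed in the code query parameter; the helper build_function_url is hypothetical, and the link copied from the portal remains the source of truth.

```python
from urllib.parse import urlencode


def build_function_url(app_name: str, function_name: str, key: str = "") -> str:
    """Rebuild the public URL of an HTTP-triggered function.

    Assumes the default <app>.azurewebsites.net hostname. The key, when
    given, is URL-encoded and sent in the "code" query parameter.
    """
    base = f"https://{app_name}.azurewebsites.net/api/{function_name}"
    if not key:
        return base
    return f"{base}?{urlencode({'code': key})}"


if __name__ == "__main__":
    # "<FUNCTION_KEY>" is a placeholder, not a real key.
    print(build_function_url("opencvhttptrigger", "OpenCVHttpTrigger", "<FUNCTION_KEY>"))
```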

Now you have an Azure Function deployed where you can send an image and get its detected edges. Pretty cool!

Submission Category:

Computing Captains.

In this project I basically used just Azure Functions and its related services.

Link to Code on GitHub

64J0 / AzureFunctions-OpenCV

Microsoft Azure Trial Hackathon on DEV project submission. Using OpenCV (Python) in Azure Functions.


Additional Resources / Info

Finally, in this section I'll just share some additional resources and some links that I used while developing this project.

First, with OPENCV WITH AZURE FUNCTIONS I understood how to get started using OpenCV along with Azure Functions in Python. This article is pretty clear and shows commands to run the project using the CLI tool, which is pretty awesome during the development phase.

Also, I would like to congratulate the people who write docs. Although they are not perfect and it is sometimes pretty tricky to find the information I want, the Python manual for developing Azure Functions is very good. You can access it through this link.

Apart from those, there are some cool references in the code comments. You can check them later to better understand where some ideas come from.

Last thing I want to present is a demo video I recorded to show how this project works. Take a look here:

How to publish a NuGet package using dotnet CLI

Vinícius Gajo — Sun, 13 Feb 2022 19:45:43 +0000

Cover image by Claudio Schwarz, available on Unsplash.

In this post, I'll teach you how to publish a NuGet package using the .NET CLI to make it available to be downloaded and used by other people around the world.

I got the motivation to write this article after trying to find some tutorials on how to make a NuGet package and noticing that most of those are for Visual Studio users in the Windows environment, and this does not fit my needs since I use Ubuntu.

This is a pretty common situation that affects lots of developers that want to contribute to the open source community.

First you develop your tool locally. Then, the next step is to pack it and deliver it to a package management platform, like npm for Node.js or NuGet for .NET projects.

To illustrate the process I'll be using a project I have recently started that is called Fubernetes. My goal with this project is to make it easier to craft Kubernetes YAML configuration by taking advantage of F#'s type system.

Disclaimer: This project is still in early development and lots of Kubernetes objects are not mapped yet.

Setup

During this tutorial, I'll present the commands I have tested in a Linux Ubuntu environment. It is required to have the .NET CLI tool installed.

In my local environment I have these SDKs installed:

$ dotnet --list-sdks

5.0.201 [~/dotnet/sdk]
5.0.401 [~/dotnet/sdk]
6.0.101 [~/dotnet/sdk]

But I expect it to work fine with .NET 5 or any later version. To check if you have the required tool you can run the following command in a terminal:

$ dotnet pack --help

Finally, the last required piece is to have a .NET project. For testing purposes you could use the project I mentioned before (Fubernetes) or just stick with a Console application, adapting the commands.

Procedure

1 - After configuring your local setup, go to the NuGet page and create an account.

2 - Next step is to pack the project. Since this specific project does not depend upon external packages we could just run:

$ dotnet pack --configuration release Fsharp-K8s.Main/

Microsoft (R) Build Engine version 17.0.0+c9eb9dd64 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  All projects are up-to-date for restore.
  Main -> ~/Desktop/codes/fsharp-k8s/Fsharp-K8s.Main/bin/release/net5.0/Fubernetes.dll
  Successfully created package '~/Desktop/codes/fsharp-k8s/Fsharp-K8s.Main/bin/release/Fubernetes.1.0.0.nupkg'.

Now you could inspect the folder bin/release/ to see the generated binaries.
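
A .nupkg file is a regular ZIP archive, so you can also peek inside it before publishing. The sketch below uses only the Python standard library; the helper name list_package_entries is my own.

```python
import zipfile
from typing import List


def list_package_entries(nupkg_path: str) -> List[str]:
    """Return the sorted file names stored inside a .nupkg (a ZIP archive)."""
    with zipfile.ZipFile(nupkg_path) as package:
        return sorted(package.namelist())
```

Calling it with the path printed by dotnet pack should list, among other entries, the .nuspec manifest and the compiled DLL.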

3 - Go back to the NuGet page, click on your name in the top right and click on API keys. You'll need to generate a new key in order to use the terminal to upload your project binaries.

Continuing, click on + Create and this page will open for you to input the required values.

There you should give your key a name, set a reasonable expiration date and select your account as the package owner. For testing purposes you could leave the default option for scopes selected: Push > Push new packages and package versions.

Since we do not have any package created yet it is better to not set any additional rules for this API token. So, in glob pattern just put a * and hit the Create button.

Now you'll be redirected to the key box similar to this one:

There you can get the key by clicking the copy button. Keep this value secret, since anyone who has it can act on your account within the scopes you selected.

If you lose the copied key, click the Regenerate button and the copy option will be available again.

Now, you can send your package to NuGet using this last command:

$ dotnet nuget push Fsharp-K8s.Main/bin/release/Fubernetes.1.0.0.nupkg --api-key <SECRET_KEY> --source https://api.nuget.org/v3/index.json
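
To avoid pasting the secret directly in the terminal (where it ends up in the shell history), one option is to read it from an environment variable and assemble the same command from a small script. This is only a sketch: the variable name NUGET_API_KEY and the helper build_push_command are assumptions of mine, while the flags are the ones used in the command above.

```python
import os
from typing import List

NUGET_SOURCE = "https://api.nuget.org/v3/index.json"


def build_push_command(package_path: str, env_var: str = "NUGET_API_KEY") -> List[str]:
    """Assemble the `dotnet nuget push` command, reading the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"set the {env_var} environment variable first")
    return [
        "dotnet", "nuget", "push", package_path,
        "--api-key", api_key,
        "--source", NUGET_SOURCE,
    ]
```

The resulting list can be passed to subprocess.run(command, check=True) to perform the actual upload.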

4 - Finally, it is just a matter of time until your package gets indexed and appears for everyone to download. During this time it goes through several validations, such as malware detection.

According to the docs, this usually takes under 15 minutes, and after it finishes you'll receive a confirmation e-mail.

Also, to check the package status, you can go to the NuGet platform again, click on your name in the top right, and hit Manage packages.

After some time you'll see it published like this:

Conclusion

In this article, I have covered all the steps required to publish a .NET package to NuGet using the dotnet CLI tool. I hope that it is useful for you and makes it easier to develop new cool open source tools.

That's it, see you later.


Starting with NixOS using QEMU

Vinícius Gajo — Sat, 08 Jan 2022 00:57:09 +0000

In this post I'll present the results of some research I did to better understand several components of a virtualization tool called QEMU. My goal, in the end, is to start a NixOS virtual machine to run some experiments with this interesting OS.

BIOS

Reference link.

BIOS, short for Basic Input/Output System, is a ROM (read-only memory) chip found on motherboards that allows you to access and set up your computer system at the most basic level.

The BIOS includes instructions on how to load basic computer hardware. It also includes a test referred to as a POST (Power-On Self-Test) that helps verify the computer meets requirements to boot up properly. If the computer does not pass the POST, you hear a combination of beeps indicating what is malfunctioning in the computer.

POST - Test the computer hardware and make sure no errors exist before loading the OS.
Bootstrap loader - Locate the OS. If a capable OS is located, the BIOS will pass control to it.
BIOS drivers - Low-level drivers that give the computer basic operational control over your computer's hardware.
BIOS setup or CMOS setup - Configuration program that allows you to configure hardware settings including system settings, such as date, time, and computer passwords.

The BIOS does things like configure the keyboard, mouse, and other hardware, set the system clock, test the memory, and so on. Then it looks for a drive, which uses either an MBR or a GPT partition table, and loads the boot loader from it.

UEFI

UEFI stands for Unified Extensible Firmware Interface. It is a publicly available specification that defines a software interface between an operating system and platform firmware.

UEFI replaces the legacy BIOS firmware interface originally present in all IBM PC-compatible computers, with most UEFI firmware implementations providing support for legacy BIOS services. UEFI can support remote diagnostics and repair of computers, even with no operating system installed.

KVM

Reference link.

KVM stands for Kernel-based Virtual Machine. It's an open source virtualization technology built into Linux. Specifically, KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).

KVM is part of Linux.
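
Since KVM is exposed to user space through the /dev/kvm device node, a quick way to check whether your user can use it is to test that file. The helper below is a minimal sketch with a name of my own choosing; it only checks file access and does not verify that your CPU supports virtualization extensions.

```python
import os


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Return True when the KVM device node exists and is readable and writable."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("KVM usable:", kvm_available())
```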

QEMU

Reference link.

According to the site, QEMU is a generic and open source machine emulator and virtualizer.

1. Emulator -

Hardware or software that enables one computer system (called the host) to behave like another computer system (called the guest). An emulator typically enables the host system to run software or use peripheral devices designed for the guest system. Emulation refers to the ability of a computer program in an electronic device to emulate (or imitate) another program or device.

2. Virtualizer -

Virtualization means a variety of technologies for managing computer resources by providing a software interface, known as an "abstraction layer", between the software (operating system and applications) and the hardware. Virtualization turns "physical" RAM and storage into "logical" resources.

2.1. Hardware virtualization -

This is what most computer people are referring to when they talk about virtualization. It partitions the computer's RAM into separate and isolated "virtual machines" (VMs) simulating multiple computers within one physical computer. Hardware virtualization enables multiple copies of the same or different operating systems to run in the computer and prevents the OS and its application in one VM from interfering with the OS and applications in another VM.

2.2. Network and storage virtualization -

In a network, virtualization consolidates multiple devices into a logical view so they can be managed from a single console. Virtualization also enables multiple storage devices to be accessed the same way no matter their type or location.

2.3. Application virtualization -

Application virtualization refers to several techniques that make running applications protected, flexible and easy to manage.

2.4. OS virtualization -

Under the control of one operating system, a server is split into "containers" that each handle an application.

With this tool it's possible to:

Run operating systems for any machine, on any supported architecture. It provides a virtual model of an entire machine (CPU, memory and emulated devices) to run a guest OS.
Run programs for another Linux/BSD target, on any supported architecture.
Run KVM and Xen virtual machines with near native performance.

YouTube - QEMU: A proper guide!.

Partition information

In this section I'll be sharing other necessary topics to understand the complete installation of the NixOS image.

Swap memory

Reference link.

Memory swapping is a computer technology that enables an operating system to provide more memory to a running application or process than is available in physical random access memory (RAM). When the physical system memory is exhausted, the operating system can opt to make use of memory swapping techniques to get additional memory.

Memory swapping works by making use of virtual memory and storage space in an approach that provides additional resources when required. In short, this additional memory enables the computer to run faster and crunch data better.

With memory swapping, the operating system makes use of storage disk space to provide the functional equivalent of memory storage space.

The process of memory swapping is managed by an operating system or by a virtual machine hypervisor.

Advantages of memory swapping:

More memory: memory swapping is a critical component of memory management, enabling an operating system to handle requests that would otherwise overwhelm a system.
Continuous operations: swap file memory can be written to disk in a continuous manner, enabling faster lookup times for operations.
System optimization: application processes of lesser importance and demand can be relegated to swap space, saving the higher performance physical memory for higher value operations.

Limitations of memory swapping:

Performance: disk storage space, when called up by memory swapping, does not offer the same performance as physical RAM for process execution.
Disk limitations: swap files are reliant on the stability and availability of storage media, which might not be as stable as system memory.
Capacity: memory swapping is limited by the available swap space that has been allocated by an operating system or hypervisor.
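
On Linux you can see how much swap is configured and in use by reading /proc/meminfo. The sketch below parses that text format; the function names are my own, and the sample values in the usage note are made up for illustration.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value} mapping.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, separator, rest = line.partition(":")
        parts = rest.split()
        if separator and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kb(text: str) -> int:
    """Return the amount of swap currently in use, in kB."""
    info = parse_meminfo(text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)
```

For example, with the text "SwapTotal: 2097148 kB" and "SwapFree: 2000000 kB" on separate lines, swap_used_kb returns 97148. On a real machine you would pass the content of open("/proc/meminfo").read().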

LVM volumes

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

Systemd

Reference link.

systemd is a software suite that provides an array of system components for Linux operating systems. Its main aim is to unify service configuration and behavior across Linux distributions; systemd's primary component is a "system and service manager" - an init system used to bootstrap user space and manage user processes. It also provides replacements for various daemons and utilities, including device management, login management, network connection management, and event logging. The name systemd adheres to the Unix convention of naming daemons by appending the letter d.

Software RAID devices

Reference link.

RAID, which stands for "Redundant Array of Inexpensive Disks", is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both. This was in contrast to the previous concept of highly reliable mainframe disk drives referred to as "single large expensive disk" (SLED).

UEFI (GPT) x Legacy Boot (MBR)

Reference link.

The main difference between UEFI and legacy boot is that UEFI is the latest method of booting a computer that is designed to replace BIOS while the legacy boot is the process of booting the computer using BIOS firmware.

Also, UEFI is generally recommended because it includes more security features (with less complex code) than the legacy BIOS mode.

GPT and MBR refer to the partition scheme used on the drive.

Q: So, what's a partition?

A: It is a logical division of a hard disk drive (HDD) or a solid state drive (SSD). Each partition can vary in size and typically serves a different function.

In Linux there's typically a root partition (/), one for swap, which helps with memory management, and a large /home partition. The /home partition is similar to the C: partition in Windows in that it's where you store most of your files.

Program to check the partitions: GParted.

An overview of MBR and GPT partitions

Before a drive can be divided into individual partitions, it needs to be configured to use a specific partition scheme or table.

A partition table tells the OS how the partitions and data on the drive are organized. MBR stands for Master Boot Record, and is a bit of reserved space at the beginning of the drive that contains the information about how the partitions are organized. The MBR also contains code to launch the OS, which is sometimes called the boot loader.

GPT is an abbreviation of GUID Partition Table, and is a newer standard that's slowly replacing MBR. Unlike MBR partition table, GPT stores the data about how all the partitions are organized and how to boot the OS throughout the drive. That way if one partition is erased or corrupted, it's still possible to boot and recover some of the data.
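
Both schemes leave a recognizable signature at the start of the drive, so it is possible to tell them apart by looking at the first two sectors. The sketch below assumes 512-byte sectors and uses a helper name of my own. GPT is checked first because GPT disks also carry a protective MBR with the same boot signature.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def detect_partition_scheme(first_two_sectors: bytes) -> str:
    """Classify a disk as "gpt", "mbr" or "unknown" from its first 1024 bytes."""
    if len(first_two_sectors) < 2 * SECTOR_SIZE:
        raise ValueError("need at least the first 1024 bytes of the disk")
    # The GPT header lives in the second sector and starts with "EFI PART".
    if first_two_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART":
        return "gpt"
    # The MBR ends with the boot signature 0x55 0xAA at offset 510.
    if first_two_sectors[510:512] == b"\x55\xaa":
        return "mbr"
    return "unknown"
```

On a real system the bytes could come from reading a disk image file in binary mode, for instance the first 1024 bytes of a raw image.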

Some differences:

The maximum capacity of MBR partition tables is only about 2 TB. You can use a drive that's larger than 2 TB with MBR, but only the first 2 TB of the drive will be used. The rest of the storage on the drive will be wasted.
In contrast, GPT partition tables offer a maximum capacity of about 9.4 ZB (with 512-byte sectors), where 1 ZB = 1 billion TB.
MBR partition tables can have a maximum of 4 separate partitions. However, one of those partitions can be configured to be an extended partition, which is a partition that can be split up into 23 additional partitions. So the absolute maximum number of partitions an MBR partition table can have is 26.
GPT partition tables allow for up to 128 separate partitions, which is more than enough for most real world applications.
As MBR is older, it's usually paired with older Legacy BIOS systems, while GPT is found on newer UEFI systems. This means that MBR partitions have better software and hardware compatibility, though GPT is starting to catch up.
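
The capacity limits follow from the width of the sector addresses each scheme stores: 32 bits for MBR and 64 bits for GPT. Here is the arithmetic as a short sketch, assuming classic 512-byte sectors (drives with 4096-byte sectors raise both limits accordingly).

```python
SECTOR_SIZE = 512  # bytes; assumption: classic 512-byte sectors

mbr_max_bytes = 2**32 * SECTOR_SIZE  # MBR stores sector addresses in 32 bits
gpt_max_bytes = 2**64 * SECTOR_SIZE  # GPT stores sector addresses in 64 bits

TB = 10**12  # decimal terabyte
ZB = 10**21  # decimal zettabyte

print(f"MBR: {mbr_max_bytes / TB:.2f} TB")  # MBR: 2.20 TB
print(f"GPT: {gpt_max_bytes / ZB:.2f} ZB")  # GPT: 9.44 ZB
print(f"1 ZB = {ZB // TB:,} TB")            # 1 ZB = 1,000,000,000 TB
```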

Steps

Choose an interface for the system

i3wm gaps
dwm -> built with C code
install the minimum system and install the interface later

Download the minimal image and configure it to use with QEMU.

  # download the minimal image:
  $ wget https://channels.nixos.org/nixos-21.05/latest-nixos-minimal-x86_64-linux.iso
  # it will download a file named: latest-nixos-minimal-x86_64-linux.iso

  # config the image
  # cmd template -> qemu-img create -f qcow2 NAME.img XG
  $ qemu-img create -f qcow2 nixos-test.img 20G
  # command used to create, convert and modify disk images
  # -f:
  #   Stands for format option. qcow2 stands for copy on write 2nd generation.


  # bootstrap the machine
  # cmd template -> qemu-system-x86_64 -boot d -cdrom image.iso -m 512 -hda mydisk.img
  $ qemu-system-x86_64 -enable-kvm -boot d \
      -cdrom latest-nixos-minimal-x86_64-linux.iso \
      -m 2G -cpu host -smp 2 -hda nixos-test.img
  # command used to boot an image
  # to get the help use the -h flag
  # -enable-kvm:
  #   Enable KVM full virtualization support. This option is only available if KVM support
  #   is enabled when compiling.
  # -boot
  #   Specify boot order drives as a string of drive letters. Valid drive letters depend on
  #   the target architecture. The x86 PC uses: a, b (floppy 1 and 2), c (first hard disk),
  #   d (first CD-ROM), n-p (Etherboot from network adapter 1-4), hard disk boot is the default.
  # -cdrom
  #   Use file as CD-ROM image (you cannot use -hdc and -cdrom at the same time). You can use
  #   the host CD-ROM by using /dev/cdrom as filename.
  # -m
  #   Set the quantity of RAM.
  # -hda
  #   Use file as the hard disk 0 image.

  # start the vm after closing it
  $ qemu-system-x86_64 -enable-kvm -boot d \
      -m 2G -cpu host -smp 2 -hda nixos-test.img
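
If you start the VM often, the two command variants above can be generated by a small script. This is only a sketch: the helper build_qemu_command and its parameters are my own, the flags are the ones explained in the comments above, and in this version the CD-ROM boot option is added only when an ISO is attached.

```python
from typing import List, Optional


def build_qemu_command(
        disk_image: str,
        iso: Optional[str] = None,
        memory: str = "2G",
        cpus: int = 2) -> List[str]:
    """Assemble the qemu-system-x86_64 command line as a list of arguments."""
    command = [
        "qemu-system-x86_64", "-enable-kvm",
        "-m", memory, "-cpu", "host", "-smp", str(cpus),
        "-hda", disk_image,
    ]
    if iso is not None:
        # Boot from the CD-ROM (drive letter d) only when an ISO is attached.
        command += ["-boot", "d", "-cdrom", iso]
    return command


if __name__ == "__main__":
    print(" ".join(build_qemu_command(
        "nixos-test.img", iso="latest-nixos-minimal-x86_64-linux.iso")))
```

The returned list can be handed to subprocess.run to launch the machine.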

Follow the installation steps provided by the docs. Link here.

Some useful keyboard commands:

Ctrl-Alt-G -> release the mouse from inside the VM window.
Ctrl-Alt-F -> toggle fullscreen.

Create a function to count words in F# docs using F#

Vinícius Gajo — Sun, 01 Aug 2021 22:37:24 +0000

Hello folks, hope you're good. In this post (that's actually my first post in the dev.to platform) I'll be sharing a program I wrote with F# to count how many words are in the F# docs in the Microsoft platform.

F# is an open-source, cross-platform programming language that makes it easy to write succinct, performant, robust, and practical code. - Microsoft docs.

My motivation to write this post is to share some things I'm learning about the F# language and also to confirm some information regarding my understanding of the algorithm implemented.

Alright, before getting into the code I want to say thanks to my friend JZ, who inspired me to write this function. I also used an algorithm that he wrote in C# for this same task as a guide, so double thanks.

Requirements

To reproduce this code you should have these tools installed:

.NET SDK version 5
An IDE with support for F# syntax (for now I'm using VS Code with the Ionide extension, but in the future, I'll probably move to Emacs)

Disclaimer: I won't cover the introductory aspects of the F# language in my explanation. I'm assuming that the reader already knows how to create a function and things like that. If you want a more detailed explanation, please comment on this post.

Starting the project

Alright, after installing the SDK and the IDE, just open a terminal and run the following command (I'm using Ubuntu 20.04 to develop):

# start the project with a boilerplate project
$ dotnet new console -o WordsCounter -lang "F#"

After running this command, you'll notice that a new folder has been created with the name WordsCounter. Inside this folder you'll see two files and another folder (Program.fs, WordsCounter.fsproj, and obj/).

In this tutorial, we will be concerned only with the Program.fs file, since all the logic of the program will be written in this file.

Let's continue. At this point you should open this file in your favorite IDE just to check the code.

The code

When you open the file in your IDE it should display these lines of code:

// Learn more about F# at http://docs.microsoft.com/dotnet/fsharp

open System

// Define a function to construct a message to print
let from whom =
    sprintf "from %s" whom

[<EntryPoint>]
let main argv =
    let message = from "F#" // Call the function
    printfn "Hello world %s" message
    0 // return an integer exit code

This is the template of a console application written with F# code. If you want to run this project, simply type in the terminal:

$ dotnet run

With this command, the project will be compiled and the string "Hello world from F#" should be displayed in the terminal.

At this point the project has been compiled, and if you check the WordsCounter/ folder again you'll notice that there is a new folder called bin/.

This is the folder where the compiled project lives. Also, some files can be used to debug the application but let's keep it simple for now.

For the sake of simplicity, I'll use the interactive way of running an F# program, because this is the way I run most of my introductory programs.

With this approach, you can test your code faster and learn faster too. I'm using this tool a lot to write my code, so I think that you should consider using it too.

To create an F# script that runs in interactive mode you only need to change the extension of the file. So, just rename the file from Program.fs to Program.fsx, and that's it for now.

Running the template this way (with the interactive tool) will not print the result in the terminal, because F# Interactive does not call the function marked with [<EntryPoint>]. But that's ok, our code will work later.
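If you want to see the template's message in a script anyway, one option is to call the function yourself, since a script is evaluated from top to bottom and nothing invokes main automatically. This is only a minimal sketch that reuses the template's names without the attribute, it is not part of the final program:

```fsharp
// Same helper as in the template.
let from whom = sprintf "from %s" whom

// No [<EntryPoint>] attribute here: in a script nothing calls `main` for us.
let main (argv: string[]) =
    let message = from "F#"
    printfn "Hello world %s" message
    0 // return an integer exit code

// A script is evaluated from top to bottom, so this explicit call
// is what makes the message appear when running `dotnet fsi Program.fsx`.
let exitCode = main [||]
```

The program we write next avoids the issue entirely, because it runs its pipeline at the top level of the script.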

Alright, the actual code we will use is in fact the following:

#if INTERACTIVE
#r "nuget: HtmlAgilityPack"
#endif

open System
open System.Net
open HtmlAgilityPack

let fetchHtmlContent (uri: Uri) =
    let httpClient = new Http.HttpClient()
    httpClient.GetStringAsync(uri)

let htmlNodeIsLeaf (node: HtmlNode) =
    (not node.HasChildNodes)
    && not (String.IsNullOrWhiteSpace(node.InnerHtml))

let countWords (textNodes: seq<HtmlNode>) =
    Seq.fold 
        (fun (acc) (node: HtmlNode) -> acc + node.InnerText.Split(" ").Length) 
        0 // acc initial value
        textNodes

let printResults url quantityOfWords =
    printfn "\nUrl: %s \nQuantity of words: %i" url quantityOfWords

let program (url: string) =
    async {
        let uri = Uri(url)
        let! rawHtml = fetchHtmlContent(uri) |> Async.AwaitTask
        let html = HtmlDocument()
        html.LoadHtml(rawHtml)

        let documentNode = html.DocumentNode

        // xpath -> select html components
        let singleNode =
            documentNode.SelectSingleNode(@"//*[@id=""main-column""]")

        let descendants = singleNode.Descendants()
        let textNodes = descendants |> Seq.where htmlNodeIsLeaf

        let quantityOfWords = countWords textNodes

        printResults url quantityOfWords
    }


let listOfTargetSites =
    [ "https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/literals" (* 576 *)
      "https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/sequences" (* 4675 *) ]

listOfTargetSites
|> List.map program
|> Async.Parallel
|> Async.RunSynchronously
|> ignore

You may be thinking that this is complex, but don't worry. I'll explain what is happening in the code in the following section.

Explanation

In the first lines, we're importing an external library called HtmlAgilityPack using NuGet, which is the default package manager for .NET applications.

#if INTERACTIVE
#r "nuget: HtmlAgilityPack"
#endif

This package is useful because it provides functions to work with HTML documents, as the name suggests. If you want to know more, check the HtmlAgilityPack docs.

You'll notice that the examples are written with the C# syntax, but that's ok. Using F# we need to get comfortable with this situation.

The surrounding #if INTERACTIVE and #endif lines make sure that the #r command is only used when running the code with the interactive tool. This is a really nice feature, since we can use the same code without changes in the build process.

If you want to know more about this interactive syntax, please check the F# Interactive page in the official docs.
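To see the directive in isolation, here is a small sketch. The runMode name is just something I made up for the illustration, it doesn't appear in the program:

```fsharp
#if INTERACTIVE
// Only evaluated when the file runs with `dotnet fsi`.
let runMode = "script (dotnet fsi)"
#else
// Only compiled when the file is part of a project (`dotnet build` / `dotnet run`).
let runMode = "compiled project"
#endif

printfn "Running as: %s" runMode
```

Exactly one of the two branches is kept, so the same file works in both situations.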

Let's continue, the next lines are used to open the packages we need to write the algorithm:

open System
open System.Net
open HtmlAgilityPack

Basically, we're opening the System namespace to use some built-in operations with strings and to get the Uri class, which is later passed to GetStringAsync().

The next namespace, System.Net, as the name suggests, is used to handle network requests. Opening it is what lets us write Http.HttpClient in the code.

And the last one, HtmlAgilityPack, comes from the package we imported and is used to perform operations on HTML documents, like web scraping.

Ok, now let's jump to the function called program. This is the main function that controls the flow of the algorithm, so let's look at it closely.

This function receives a URL as a string, then it requests this URL to fetch the content of the site. After this operation, the site content is parsed and we get its DocumentNode.

In the next phase, we search in this DocumentNode for the element with the id of main-column. The syntax used to define this element in the SelectSingleNode() function is called XPath and is beyond the scope of this post. If you are interested in the XPath syntax, an XPath cheatsheet is a good place to start.

We are searching for this element because all the relevant content of the docs page is inside it.
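To make the selection step easier to picture, here is a small sketch that runs the same XPath against a tiny page. The sampleHtml content is hypothetical (a real docs page is much bigger), and the null check is an extra precaution that the program above doesn't have:

```fsharp
#if INTERACTIVE
#r "nuget: HtmlAgilityPack"
#endif

open HtmlAgilityPack

// Hypothetical page with a sidebar and the main column.
let sampleHtml =
    """<html><body>
         <div id="sidebar"><p>Navigation links</p></div>
         <div id="main-column"><h1>Literals</h1><p>Some text here</p></div>
       </body></html>"""

let doc = HtmlDocument()
doc.LoadHtml(sampleHtml)

// "//*" matches any element at any depth; [@id="main-column"] keeps the one with that id.
let mainColumn =
    doc.DocumentNode.SelectSingleNode(@"//*[@id=""main-column""]")

// SelectSingleNode returns null when nothing matches, so check before using the node.
if isNull mainColumn then
    printfn "No element with the id main-column"
else
    printfn "%s" mainColumn.InnerText
```

The text of the sidebar is left out, because only the descendants of the selected element are visited.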

Continuing, we dive into the leaves of the HTML structure (a tree), which in this case are more likely to contain the words we want to count.

Assumption: It's more probable that we will find the text content in the last level of the tree, its leaves.

Then, we perform a fold operation, which is very similar to reduce, except that with fold we can specify the initial value of the accumulator.

Basically, with this operation, we iterate through all the nodes and add a value to the accumulator on each step. In this case, we add the number of words in the phrase of each leaf.
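The difference between fold and reduce is easier to see with plain strings. This is just a sketch with made-up phrases standing in for the text of each leaf, and wordsIn is a helper name I introduced for the example:

```fsharp
// Hypothetical phrases standing in for the text of each leaf node.
let phrases = [ "F# is fun"; "fold takes a seed"; "reduce does not" ]

// Same counting rule as countWords: split on a single space.
let wordsIn (phrase: string) = phrase.Split(' ').Length

// fold: the accumulator starts at 0 and can have a different type than the items.
let totalWithFold =
    phrases |> List.fold (fun acc phrase -> acc + wordsIn phrase) 0

// reduce: there is no seed, so the items are mapped to counts first.
let totalWithReduce =
    phrases |> List.map wordsIn |> List.reduce (+)

// fold also works on an empty list (it returns the seed); List.reduce would throw.
let totalOfEmpty =
    ([]: string list) |> List.fold (fun acc phrase -> acc + wordsIn phrase) 0

printfn "%d %d %d" totalWithFold totalWithReduce totalOfEmpty
```

Since a page could have no leaves with text at all, starting from 0 with fold is the safer choice here.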

In the end, we just print the result in the console for the user to see and store this information. In my case, I'm using it in a spreadsheet with the links I want to study just to get an estimated time based on my previous readings.

Execution

For this example, I'll be using only two links from the F# docs provided by Microsoft.

In the last two blocks of code, we're defining a list with the URLs of the pages whose words we want to count, and starting the algorithm.

let listOfTargetSites =
    [ "https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/literals" (* 576 *)
      "https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/sequences" (* 4675 *) ]

listOfTargetSites
|> List.map program
|> Async.Parallel
|> Async.RunSynchronously
|> ignore

Basically, in the last block of code, we're running a map operation to apply that specific function (the program() itself) to each of the entries in the list.

After this, we tell the .NET runtime to run the computations in parallel and we wait until all of them have finished. That's why we sometimes get strange results in the console.

Each computation calls printfn on its own as soon as its request and parsing are done, and nothing coordinates those calls, so the output of two computations can get mixed together.

For example, in one of my runs the first two operations went ok, but in the last one the string with the number of words was merged with another line in a buggy way.
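One possible way to avoid the mixed output is to return the values from each computation and print them only after everything has finished. This is a sketch, not the code of this post: countWordsAt is a hypothetical stand-in for program, and the sleep and the fake count replace the real request and parsing:

```fsharp
// Stand-in for `program`: instead of printing, it returns the values to print.
let countWordsAt (url: string) =
    async {
        do! Async.Sleep 100 // placeholder for the request and the parsing
        return url, url.Length // placeholder for the real quantity of words
    }

let results =
    [ "https://example.com/a"; "https://example.com/b" ]
    |> List.map countWordsAt
    |> Async.Parallel // the work still runs concurrently
    |> Async.RunSynchronously // blocks until every computation has finished

// Printing happens in one place, after all the work is done,
// so the lines of different pages cannot be mixed together.
for (url, quantityOfWords) in results do
    printfn "\nUrl: %s \nQuantity of words: %i" url quantityOfWords
```

Async.Parallel returns the results in the same order as the input list, so the output order is also stable between runs.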

Finally, to test the code we wrote, just type this command in the terminal:

$ dotnet fsi Program.fsx

The result should be one block per URL, showing the URL and its quantity of words.

Conclusion

That's it guys. With this program we get a good estimate of how many words we have on each page of the F# docs provided by Microsoft. Please don't consider this result a flawless estimate, since there are lots of things we didn't consider when writing the code, just to keep it simple and easy to understand.

Also, if you intend to use this code on a different site, just remember to change the XPath according to your context.

If you want to talk to me please write a comment in this post or contact me on LinkedIn.

I've been studying F# for 3 weeks now and I'm really liking it, please consider giving it a try. See you later with more posts.