Here are seven tips and code bites that I use every day in my work as a data scientist.

code bites
Image by author

In this story, I will share what I use in my day-to-day work and what has helped me improve my code. Check the list below to see if there’s anything new for you!

String formatting with f-strings

Hallelujah! That is what I thought when I learned about the Python 3.6+ update that includes a new way of formatting strings: the Python formatted string literal. …
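As a quick illustration of the formatted string literal, here is a minimal sketch. The variable names and values are made up for the example and are not taken from the story.

```python
# Hypothetical values, used only for illustration.
name = "Ada"
accuracy = 0.91234

# Expressions inside the braces are evaluated at runtime.
greeting = f"Hello, {name}!"

# A format spec after the colon controls how the value is rendered.
report = f"Accuracy: {accuracy:.2%}"

# The equivalent str.format() call, for comparison.
report_old = "Accuracy: {:.2%}".format(accuracy)

print(greeting)
print(report)
```

Both `report` and `report_old` hold the same string; the f-string simply keeps the value next to the place where it is used.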


Use these three built-in modules to format your scripts the Pythonic way.

Photo by the author.

In this article, I’ll show you three scripting conventions and corresponding built-in modules to help better format your Python scripts. These modules are designed to adhere to the DRY (don’t repeat yourself) principle and are there to improve the quality of your code and scripts!

In short, we’ll go over the following three components:

Always Use an ifmain

ifmain refers to the last lines of code in a Python script that you often see: if __name__ == "__main__":. When…
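A minimal sketch of the ifmain convention could look like this. The `main` function name is a common habit rather than a requirement, and its body is a placeholder.

```python
def main() -> int:
    """Entry point of the script; returns 0 to signal success."""
    print("Running as a script")
    return 0


if __name__ == "__main__":
    # This branch runs only when the file is executed directly,
    # not when it is imported as a module from another file.
    main()
```

Importing this file from another module gives access to `main` without running it, which is the point of the guard.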


How to use, format, and fade “flash” messages in Django, both with page reload and asynchronous JavaScript submission

Code
Photo by the author.

In this article, you will learn how the Django messages framework works and how you can use its power, including a way to fade the messages when submitting forms via JavaScript without reloading the page.


Learn about distributed task queues for making asynchronous API requests

Django, RabbitMQ, and Celery logos
Image by author

What happens when a user sends a request, but processing that request takes longer than the HTTP request-response cycle? What if you’re accessing multiple databases or want to return a document too large to process within the time window? What if you want to access an API, but the number of requests is throttled to a maximum of n requests per t time window?

These are some of the questions that came up during the data collection process for my master’s thesis. For my research, microposts from Twitter were scraped via the Twitter API. …


How to incorporate the immensely popular JSON data format in Python for storage of textual data.

Image by author

In this story, we’ll look at JavaScript Object Notation or JSON, probably the world’s preferred data-interchange format and a sure-fire upgrade from more traditional file storage options (XML, CSV, TSV).

We’ll cover the basics of creating and loading JSON files, file storage, and newline-delimited JSON storage, and then look at a more specific use case: working with textual data and JSON.
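The basics mentioned above can be sketched with Python's built-in `json` module. The records and file names below are invented for the example, and a temporary directory is used so that nothing is left on disk.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records, used only for illustration.
records = [
    {"id": 1, "text": "first example"},
    {"id": 2, "text": "second example"},
]

with tempfile.TemporaryDirectory() as tmp:
    # Regular JSON: the whole list is stored as a single document.
    json_path = Path(tmp) / "records.json"
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    with json_path.open(encoding="utf-8") as f:
        loaded = json.load(f)

    # Newline-delimited JSON: one complete JSON object per line.
    ndjson_path = Path(tmp) / "records.ndjson"
    with ndjson_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    with ndjson_path.open(encoding="utf-8") as f:
        loaded_lines = [json.loads(line) for line in f if line.strip()]

print(loaded == records, loaded_lines == records)
```

The newline-delimited variant can be read and appended to one record at a time, which helps when the file grows too large to load in one go.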

Why JSON

JSON is widely used in web applications as the preferred way to interchange data, especially between the front end and back-end middleware.


Image by author

Pairwise Cohen’s kappa and group Fleiss’ kappa (𝜅) coefficients for categorical annotations

In this story, we’ll explore Inter-Annotator Agreement (IAA), a measure of how consistently multiple annotators make the same annotation decision for a certain category. Supervised Natural Language Processing algorithms use a labeled dataset that is often annotated by humans. An example would be the annotation scheme for my master’s thesis, where tweets were labeled as either abusive or non-abusive.

IAA shows you how clear your annotation guidelines are, how uniformly your annotators understood them, and how reproducible the annotation task is. It is a vital part of both the validation and reproducibility of classification results.
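To make the pairwise case concrete, here is a minimal sketch of Cohen's kappa for two annotators, computed as (observed agreement − chance agreement) / (1 − chance agreement). The function name and the toy labels are assumptions for the example, not the story's own code or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions,
    # summed over the categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of four tweets.
annotator_1 = ["abusive", "abusive", "non-abusive", "non-abusive"]
annotator_2 = ["abusive", "non-abusive", "non-abusive", "non-abusive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)
```

In this toy example the annotators agree on three of four tweets (0.75), chance agreement is 0.5, and kappa therefore comes out at 0.5.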

Accuracy and F1…


Testing w/ selenium
Image by author.

A test-driven development cycle is a vital part of the migration process of your Django app to production. Learn the basics of TDD in this brief introduction.

I learned the hard way that functional parts, such as database operations, log-ins, or registrations, can break without you noticing. Your users, however, will notice. Testing is a vital part of the production process, and in this story we’ll briefly introduce how to test your Django application.

This story covers:

Test-Driven Development Cycle


Icon by Loris Grillet

Over the last few years, the industry has developed a preference for vanilla JavaScript, without relying on external libraries.

jQuery is a lightweight and easy-to-use JavaScript library that helps create complex functionality with a few lines of code. jQuery code is shorter and often simpler than the equivalent vanilla JS code. So why do many developers move away from this library and towards plain or vanilla JavaScript?

In this story we’ll go through:

Fading an element in JavaScript and jQuery

jQuery is a library that has a high level of abstraction, making it easy to use but…


user registration w/ email verification

A tutorial detailing user registration and password reset routes for your Django app, with email verification and unique token generation.

This tutorial will provide you with all that you need to know to create a user registration route with email verification. We’ll go through the registration process, unique token generation, and sending emails with the email service SendGrid, which offers a free plan with up to 100 emails per day. We’ll also show you how to create a forgot-password route where users can enter their email address to get a password reset link.

The Django app parts we’re going to go through are the URLs, the project views, the unique token generation, and the registration forms and models. …
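To give an idea of what unique token generation involves, here is a rough standard-library sketch of a signed, time-limited token. It is a stand-in for illustration only: it is not Django's own token generator and not the code from this tutorial, and `SECRET_KEY`, `make_token`, and `check_token` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret; a real app would load this from its settings.
SECRET_KEY = b"replace-me"


def make_token(user_id: int, timestamp: Optional[int] = None) -> str:
    """Return '<timestamp>-<signature>' bound to one user."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    message = f"{user_id}:{timestamp}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{timestamp}-{signature}"


def check_token(user_id: int, token: str, max_age: int = 86400,
                now: Optional[int] = None) -> bool:
    """Accept the token only if it is untampered and not older than max_age seconds."""
    now = int(time.time()) if now is None else now
    try:
        timestamp_str, signature = token.split("-", 1)
        timestamp = int(timestamp_str)
    except ValueError:
        return False  # malformed token
    expected = make_token(user_id, timestamp).split("-", 1)[1]
    # Constant-time comparison avoids leaking information through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    return 0 <= now - timestamp <= max_age


token = make_token(42, timestamp=1000)
print(check_token(42, token, now=1500))
```

Because the signature covers both the user id and the timestamp, a token issued for one user cannot be reused for another, and an expired link can be rejected without storing anything in the database.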


Animation by Enjoyanimation

A naive approach to parsing headers and paragraphs from pdf documents

And now for something completely different: parsing pdf documents and extracting the headers and paragraphs! There are various packages that extract text from pdf documents and convert them to HTML, but I’ve found these to be too elaborate or too complex for the task at hand. In my experience, generic pdf parsers generalize okay-ish over all documents, but for a specific use case of similarly structured documents, we can improve performance with some code of our own!

Methodology

Since pdf files consist of unstructured text, we need to find some similarities over the different documents on how headers and paragraphs are…

Louis de Bruijn

Analytics trainee at ING | MSc Information Science | https://www.louisdebruijn.me
