Seoul Temperature Prediction: Kaggle Notebook

Arsh Anwar
5 min read · Nov 28, 2022

Hey Everyone,
In today’s blog, we will work through a Kaggle notebook that focuses on analyzing temperature variations, creating models to predict the minimum and maximum temperature, and finally deploying the best model on TrueFoundry.

So, let’s get started.

Introduction

Today we are going to work on the Seoul Temperature dataset. Seoul is the capital of South Korea (yes, you’re right, BTS live there).

First, we will start with exploratory data analysis, then do some Plotly visualizations, after which we will train a few models with LuciferML and finally deploy the best one on TrueFoundry.

For those who don’t know

LuciferML is an AutoML library for creating models. It takes away the hassle and does the hard work for you: it preprocesses your data and trains different models on it. It is currently in development.

TrueFoundry is an MLOps platform that provides the fastest post-model pipeline framework for Data Scientists (DS) and Machine Learning Engineers (MLEs). It lets you deploy and monitor a trained model in about 15 minutes. With MLFoundry, you can track experiments with a few lines of code, compare and visualize experiments on a rich dashboard, and log experiment artifacts.

About Dataset

The Seoul Temperature dataset from Kaggle has 25 columns, out of which we have to predict the “Next_Tmin” and “Next_Tmax” columns.
The data describes how the minimum and maximum temperatures are affected by different atmospheric values.

Installation

!pip install lucifer-ml mlfoundry servicefoundry gradio

Importing Libraries

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.style.use('dark_background')
import numpy as np
import pandas as pd
import seaborn as sns
#Importing LuciferML
from luciferml.supervised.regression import Regression
import mlfoundry
import warnings
import servicefoundry.core as sfy
warnings.simplefilter(action='ignore', category=Warning)

Loading Dataset

dataset = pd.read_csv('../input/temperature-forecast-project-using-ml/temp.csv')
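
Before diving in, it helps to peek at the raw data. A minimal sketch using plain pandas:

# Quick look at the data: dimensions, column names, and summary statistics
print(dataset.shape)
print(dataset.columns.tolist())
dataset.describe().T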

Exploratory Data Analysis

There are a few missing values in the dataset, so we will drop them.
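
A minimal sketch of checking for and dropping the missing rows:

# Count missing values per column, then drop the rows that contain them
print(dataset.isna().sum().sort_values(ascending=False).head(10))
dataset = dataset.dropna()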

Visualizations

Findings

  1. Generally, higher values of solar radiation are associated with lower values of maximum temperature
  2. Higher values of solar radiation are associated with lower values of minimum temperature
  3. Lower wind speeds are associated with higher temperatures
  4. Most of the temperature recordings are from days when elevation values are low
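
The plots behind these findings were made with Plotly. A minimal sketch of one of them; the column names used here (e.g. 'Solar radiation') are assumptions and should be adjusted to the actual columns in temp.csv:

import plotly.express as px

# Scatter plot of solar radiation vs. next-day maximum temperature
# (column names are assumptions; check dataset.columns for the real ones)
fig = px.scatter(dataset, x='Solar radiation', y='Next_Tmax',
                 title='Solar radiation vs. next-day maximum temperature')
fig.show()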

Data Preparation

Here, we will create two labels, one for the minimum temperature and the other for the maximum temperature.

features = dataset.drop(['index', 'Date', 'Next_Tmax', 'Next_Tmin'], axis = 1)
labels_max = dataset['Next_Tmax']
labels_min = dataset['Next_Tmin']

Training a lot of Models…

I have trained almost all of the models available in LuciferML for both the min and max temperature and collected their accuracy scores and RMSE values in dictionaries. Below is the CatBoost example; a sketch of the loop over the other predictors follows it.

params_cat = {}

# Dictionaries to collect the scores of each regressor
accuracy_scores_max, RMSE_max = {}, {}
accuracy_scores_min, RMSE_min = {}, {}

regressor_cat_max = Regression(predictor='cat', params=params_cat)
regressor_cat_min = Regression(predictor='cat', params=params_cat)

# For Max Temp
regressor_cat_max.fit(features, labels_max)
result = regressor_cat_max.result()
accuracy_scores_max[result['Regressor']] = result['Accuracy']
RMSE_max[result['Regressor']] = result['RMSE']

# For Min Temp
regressor_cat_min.fit(features, labels_min)
result = regressor_cat_min.result()
accuracy_scores_min[result['Regressor']] = result['Accuracy']
RMSE_min[result['Regressor']] = result['RMSE']
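
The same pattern repeats for the other LuciferML predictors. A compact sketch of that loop; the predictor codes below are examples, and the exact set should be checked against the LuciferML documentation:

# Example predictor codes (assumptions; see the LuciferML docs for the full list)
for code in ['lin', 'rfr', 'xgb', 'lgbm']:
    # Max Temp
    reg_max = Regression(predictor=code, params={})
    reg_max.fit(features, labels_max)
    res = reg_max.result()
    accuracy_scores_max[res['Regressor']] = res['Accuracy']
    RMSE_max[res['Regressor']] = res['RMSE']

    # Min Temp
    reg_min = Regression(predictor=code, params={})
    reg_min.fit(features, labels_min)
    res = reg_min.result()
    accuracy_scores_min[res['Regressor']] = res['Accuracy']
    RMSE_min[res['Regressor']] = res['RMSE']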

Leaderboard
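
With the score dictionaries filled in, a simple leaderboard plot can be drawn, for example with matplotlib (already imported above); a minimal sketch:

# Bar chart of RMSE per regressor for the max-temperature models
names, values = list(RMSE_max.keys()), list(RMSE_max.values())
plt.figure(figsize=(10, 5))
plt.bar(names, values)
plt.xticks(rotation=45, ha='right')
plt.ylabel('RMSE')
plt.title('Max temperature: RMSE by regressor')
plt.tight_layout()
plt.show()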

From the above plots, we can conclude that the CatBoost regressor performed best for both the min and max temperature. Therefore, we will deploy it on TrueFoundry.

Deployments

Experiment tracking is the process of saving/logging all the information produced by every experiment and test.

a) Login to TrueFoundry

sfy.login(api_key)

You can get the API key from the settings page of your TrueFoundry dashboard.

b) Creating a new project

mlf_api = mlfoundry.get_client()
mlf_run = mlf_api.create_run(
    project_name='temp-predictor', run_name='temp-predictor-run-1')

c) Logging Dataset

mlf_run.log_dataset("features", features)
mlf_run.log_dataset("labels_max", labels_max)
mlf_run.log_dataset("labels_min", labels_min)

d) Logging Models

mlf_run.log_model(name='Best_Model_Max', model=regressor_cat_max.regressor,
                  framework=mlfoundry.ModelFramework.SKLEARN, description='Temp_pred_max_best')
mlf_run.log_model(name='Best_Model_Min', model=regressor_cat_min.regressor,
                  framework=mlfoundry.ModelFramework.SKLEARN, description='Temp_pred_min_best')

Deploying model in 3..2..1…

We will deploy our model as a Gradio WebApp.
To deploy the model, we need two files: deploy.py, which contains the deployment code, and requirements.txt.

Directory Structure

.
├── deploy.py
└── requirements.txt

First, we will write the deployment code using the notebook’s magic function.
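
A rough sketch of what deploy.py could look like, using Gradio. The model-loading step and the feature column names below are assumptions (how you retrieve the logged model depends on your MLFoundry/TrueFoundry setup), so treat this as a template rather than the exact code from the notebook:

# deploy.py -- a sketch, assuming the trained regressors were saved with joblib
import gradio as gr
import pandas as pd
import joblib

model_max = joblib.load('model_max.joblib')  # assumption: saved alongside deploy.py
model_min = joblib.load('model_min.joblib')

# Assumption: replace with features.columns.tolist() from the training notebook
FEATURE_COLUMNS = ['Present_Tmax', 'Present_Tmin', 'Solar radiation']

def predict(*values):
    # Build a one-row DataFrame in the training column order and predict both targets
    row = pd.DataFrame([values], columns=FEATURE_COLUMNS)
    return float(model_max.predict(row)[0]), float(model_min.predict(row)[0])

demo = gr.Interface(
    fn=predict,
    inputs=[gr.Number(label=c) for c in FEATURE_COLUMNS],
    outputs=[gr.Number(label='Next_Tmax'), gr.Number(label='Next_Tmin')],
    title='Seoul Temperature Prediction',
)

# Listen on port 8080 to match the Service definition below
demo.launch(server_name='0.0.0.0', server_port=8080)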

Next, we will gather the requirements

requirements = sfy.gather_requirements("deploy.py")

Creating a service and deploying it on our workspace

Here, in resources, we set the memory request to 1000 and the memory limit to 1500 (roughly 1 GB and 1.5 GB) for our model.

from servicefoundry import Build, PythonBuild, Resources, Service

service = Service(
    name="temp-pred-service-1",
    image=Build(
        build_spec=PythonBuild(
            command="python deploy.py",
        ),
    ),
    ports=[{"port": 8080}],
    resources=Resources(memory_limit=1500, memory_request=1000),
)
service.deploy(workspace_fqn='<your-workspace-fqn>')

Deployed Model

Model Deployed Here: https://temp-pred-service-1-demo-projects.tfy-ctl-euwe1-production.production.truefoundry.com/


Arsh Anwar

AI/ML expert. Built LuciferML (100k+ downloads). Co-founder @Revca, building smart solutions for a sustainable future.