SentinelTI Dashboard

An explainable, real-time IP threat intelligence framework built for modern SOCs — powered by ensemble ML, IPInfo location enrichment, and SHAP interpretability.

v2.0 · IEEE Publication

Metric	Result	Target
Precision	95%	>90%
Recall	93%	>85%
F1-Score	94%	>90%
False Positive Rate	4%	<5%
Data Freshness	<10s	<15s
IOC Coverage	>95%	>90%

What is SentinelTI?

SentinelTI is an asynchronous threat intelligence platform that addresses three core failure modes of traditional blacklist-based systems: high false-positive rates, opaque decision-making, and delayed threat visibility.

The framework ingests IP indicators from open-source feeds in real time, enriches them asynchronously — including IP location and ASN context via IPInfo — and runs a stacking ensemble of Random Forest + XGBoost to produce a dynamic maliciousness risk score explained by SHAP.

Research Paper

"An Explainable Machine Learning Framework for IP Threat Intelligence and Maliciousness Risk Scoring" — Soham Shah & Mousmi Pawar, Somaiya Vidyavihar University. IEEE Publication.

Core Problems Solved

Traditional Problem	SentinelTI Solution
Stale blacklists — hours of detection lag	Real-time ingestion, <10s freshness
High false positive rate, alert fatigue	Ensemble ML reduces FPR to 4%
Black-box decisions, no analyst trust	SHAP global + local explainability
No risk granularity (binary flag)	Three-tier stratification: Low / Medium / High
Reactive SOC posture	Async enrichment enables proactive response

Tech Stack

Layer	Technology	Purpose
Backend	`Python`	Core ingestion, enrichment, ML pipeline
Database	`MongoDB`	IOC storage with timestamps, features, scores
ML Models	`Random Forest · XGBoost`	Stacking ensemble for maliciousness scoring
Explainability	`SHAP`	Global & local feature attribution
Dashboard	`Streamlit`	Live SOC visualisation with colour-coded risk
Deployment	`Docker`	Containerised, modular production deployment
Threat Feeds	`Abuse.ch · CINS Army`	Primary IOC sources
Enrichment APIs	`VirusTotal · AbuseIPDB · IPInfo · ASN`	Contextual metadata & IP location enrichment


Getting Started

Get SentinelTI running in your SOC environment in under 10 minutes.

ℹ Prerequisites

You need Python 3.10+, Docker, a running MongoDB instance, and API keys for VirusTotal, AbuseIPDB, and IPInfo.

Installation

bash

# Clone the repository
git clone https://github.com/soham7998/sentinel_TI.git
cd sentinel_TI

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

.env

# MongoDB connection
MONGO_URI=mongodb://localhost:27017
MONGO_DB=sentinelti

# Threat intelligence API keys
VIRUSTOTAL_API_KEY=your_virustotal_key_here
ABUSEIPDB_API_KEY=your_abuseipdb_key_here

# IP location enrichment (replaces GeoIP)
IPINFO_TOKEN=your_ipinfo_token_here

# Feed configuration
ABUSEIP_LIMIT=200
INGESTION_INTERVAL_SEC=300

# Dashboard
STREAMLIT_PORT=8501
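
As a minimal sketch of how the pipeline might read these settings, the snippet below maps the keys above onto a typed object. It assumes the variables have already been exported into the process environment (for example by Docker Compose or a dotenv loader); the `Settings` class and `load_settings` function are hypothetical names, not part of the SentinelTI code base.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env keys shown above."""
    mongo_uri: str
    mongo_db: str
    virustotal_api_key: str
    abuseipdb_api_key: str
    ipinfo_token: str
    abuseip_limit: int
    ingestion_interval_sec: int
    streamlit_port: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # API keys have no sensible default, so fail fast when one is missing.
    required = ("VIRUSTOTAL_API_KEY", "ABUSEIPDB_API_KEY", "IPINFO_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        mongo_db=env.get("MONGO_DB", "sentinelti"),
        virustotal_api_key=env["VIRUSTOTAL_API_KEY"],
        abuseipdb_api_key=env["ABUSEIPDB_API_KEY"],
        ipinfo_token=env["IPINFO_TOKEN"],
        # Numeric values arrive as strings and are converted explicitly.
        abuseip_limit=int(env.get("ABUSEIP_LIMIT", "200")),
        ingestion_interval_sec=int(env.get("INGESTION_INTERVAL_SEC", "300")),
        streamlit_port=int(env.get("STREAMLIT_PORT", "8501")),
    )
```

Failing at start-up on a missing key is a design choice made here for illustration; it surfaces configuration mistakes before the first enrichment call rather than during it.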

Running with Docker

bash

# Build and start all services
docker-compose up --build -d

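# Follow the service logs (Ctrl+C to stop following)
docker-compose logs -f sentinel

# Open the dashboard (macOS; on Linux use xdg-open or browse to the URL)
open http://localhost:8501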

Running Locally

bash

# Start IOC ingestion pipeline
python src/ingestion/ingest.py --limit 200

# Run async enrichment worker (incl. IPInfo location lookup)
python src/enrichment/enrich_worker.py

# Launch Streamlit dashboard
streamlit run dashboard/app.py --server.port 8501

Success

IOCs appear on the dashboard within ~30 seconds. Full SHAP scores with IPInfo location data arrive within ~60 seconds of ingestion.


Architecture Pipeline

A modular, asynchronous pipeline separating fast IOC ingestion from background enrichment — achieving <10s dashboard latency without sacrificing analytical depth.

Framework Diagram

The diagram matches the paper's Figure 1; each stage feeds the next through the asynchronous intelligence pipeline:

1. External Threat Feeds (input layer): Abuse.ch (primary) · CINS Army (supplementary)
2. Fast IOC Ingestion: normalize · deduplicate · limit = 200
3. Asynchronous Enrichment: VirusTotal · AbuseIPDB · IPInfo · ASN
4. MongoDB IOC Store: timestamps · features · scores
5. ML Risk Scoring Engine: Random Forest + XGBoost · risk score 0 → 1
6. SHAP Explainability: global feature importance · local per-IP explanation
7. Actionable SOC Dashboard: evidence-based decisions, tiered LOW / MEDIUM / HIGH
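
The fast ingestion stage (normalize, deduplicate, limit = 200) could look roughly like the sketch below. The function names are hypothetical and the real `ingest.py` may differ; the only behaviour taken from the pipeline description is the three steps and the default limit of 200.

```python
import ipaddress


def normalize_ip(raw: str):
    """Return the canonical string form of an IP, or None if it is not valid."""
    try:
        return str(ipaddress.ip_address(raw.strip()))
    except ValueError:
        return None


def prepare_batch(raw_indicators, limit=200):
    """Normalize, deduplicate (keeping first-seen order), and cap a feed batch."""
    seen = set()
    batch = []
    for raw in raw_indicators:
        ip = normalize_ip(raw)
        if ip is None or ip in seen:
            continue  # skip malformed entries and repeats
        seen.add(ip)
        batch.append(ip)
        if len(batch) >= limit:
            break  # cap the batch so ingestion stays fast
    return batch


feed = [" 137.184.9.29", "137.184.9.29", "not-an-ip", "10.0.0.1"]
print(prepare_batch(feed))  # ['137.184.9.29', '10.0.0.1']
```

Keeping this stage free of network calls is what lets new IOCs reach the store quickly; everything slow is deferred to enrichment.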

Async Enrichment Detail

The core architectural decision is the separation of ingestion from enrichment. When a new IOC arrives:

1. The IOC is stored immediately in MongoDB with enriched: false, so it is visible on the SOC dashboard right away.
2. A background worker calls the VirusTotal, AbuseIPDB, IPInfo (location/ASN/privacy), and ASN APIs asynchronously.
3. Once enrichment completes, the feature vector is assembled and scored by the ML pipeline.
4. The risk score and SHAP values are written back to MongoDB, updating the dashboard in real time.
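
The store-first, enrich-later flow can be illustrated with a small `asyncio` sketch. Everything here is a stand-in: `store` is an in-memory dict in place of MongoDB, `fake_lookup` replaces the real API clients, and the scoring step is omitted. It only shows the ordering guarantee, namely that an IOC is visible with `enriched: False` before any enrichment call completes.

```python
import asyncio

store = {}  # in-memory stand-in for the MongoDB IOC collection


async def fake_lookup(source: str, ip: str) -> dict:
    """Placeholder for one enrichment API call; returns dummy data."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {source: f"data-for-{ip}"}


def ingest(ip: str, queue: asyncio.Queue) -> None:
    """Store the IOC immediately, then hand it to the background worker."""
    store[ip] = {"ip": ip, "enriched": False}
    queue.put_nowait(ip)


async def enrich_worker(queue: asyncio.Queue) -> None:
    """Query all sources concurrently and write the results back."""
    while True:
        ip = await queue.get()
        results = await asyncio.gather(
            *(fake_lookup(src, ip) for src in ("virustotal", "abuseipdb", "ipinfo"))
        )
        enrichment = {}
        for result in results:
            enrichment.update(result)
        store[ip].update(enrichment=enrichment, enriched=True)
        queue.task_done()


async def main(ips):
    queue = asyncio.Queue()
    worker = asyncio.create_task(enrich_worker(queue))
    for ip in ips:
        ingest(ip, queue)
    # Snapshot taken before the worker has had a chance to run.
    before = {ip: doc["enriched"] for ip, doc in store.items()}
    await queue.join()  # wait until every queued IOC is enriched
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return before
```

Running `asyncio.run(main(["137.184.9.29"]))` returns the pre-enrichment snapshot, while `store` afterwards holds the enriched documents.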

IPInfo Integration

Why IPInfo?

IPInfo replaces static GeoIP databases with a live API that provides richer context: city/region/country, organization, ASN, and critically — privacy flags (VPN, proxy, Tor detection). These flags are strong threat indicators fed directly into the ML feature vector as ipinfo_geo_risk.

python · IPInfo enrichment

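import os

import ipinfo

handler = ipinfo.getHandler(os.getenv("IPINFO_TOKEN"))

def enrich_location(ip: str) -> dict:
    """Look up one IP and keep only the fields the pipeline stores."""
    # .all is the raw response dict. Using .get() avoids the AttributeError
    # that attribute access raises when a field is absent from the response
    # (asn and privacy depend on the IPInfo plan).
    d = handler.getDetails(ip).all
    return {
        "country":   d.get("country"),    # ISO country code, e.g. "US"
        "region":    d.get("region"),
        "city":      d.get("city"),
        "org":       d.get("org"),        # e.g. "AS14061 DigitalOcean"
        "asn":       d.get("asn"),
        "privacy":   d.get("privacy"),    # {"vpn":bool, "tor":bool, "proxy":bool}
        "latitude":  d.get("latitude"),
        "longitude": d.get("longitude"),
    }

def compute_ipinfo_geo_risk(details: dict) -> float:
    """Derive a geo risk score in [0, 1] from the IPInfo privacy flags."""
    score = 0.0
    privacy = details.get("privacy") or {}  # tolerate a missing or None value
    if privacy.get("tor"):
        score += 0.5
    if privacy.get("vpn"):
        score += 0.3
    if privacy.get("proxy"):
        score += 0.2
    return min(score, 1.0)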

MongoDB Schema

JSON · IOC Document

{
  "ip": "137.184.9.29",
  "source": "Abuse.ch",
  "first_seen": "2026-02-14T08:10:21Z",
  "enriched": true,
  "location": {
    /* via IPInfo */
    "country": "US",             "region": "California",
    "city": "Santa Clara",      "org": "AS14061 DigitalOcean",
    "asn": "AS14061",
    "privacy": { "vpn": false, "tor": false, "proxy": false }
  },
  "features": {
    "abuse_reports": 42,    "recency_hrs": 3,
    "confidence_pct": 92,   "multi_source": true,
    "attack_types": ["Brute Force", "SSH"],
    "virustotal_detections": 18,
    "ipinfo_geo_risk": 0.0,  /* derived from IPInfo privacy flags */
    "freshness": 0.91
  },
  "risk_score": 0.87,  "risk_level": "HIGH",
  "shap_values": { /* per-feature SHAP φ */ }
}


Machine Learning Scoring

A stacking ensemble of Random Forest and XGBoost, combined via a linear meta-learner to produce calibrated maliciousness probabilities.

Feature Engineering

Each IP address is represented as a feature vector xᵢ = (f₁, f₂, … f_d) built from enrichment data:

Feature	Source	Description
Abuse Reports	AbuseIPDB	Frequency of abuse reports logged against this IP
Confidence Score	AbuseIPDB	Weighted confidence (0–100) of maliciousness
Recency (hours)	Last report timestamp	Hours elapsed since last observed abuse activity
VirusTotal Detections	VirusTotal API	Count of vendor detections (e.g., 18/70)
IPInfo Geo Risk	IPInfo API	Risk score derived from location, org, and privacy flags (VPN/Tor/proxy)
Multi-Source Agreement	Feed consensus	Number of independent feeds reporting this IP
Attack Type Diversity	AbuseIPDB categories	Count of distinct attack categories observed
Freshness Feature	Temporal weighting	Recent threats weighted higher than stale reports

The target variable is binary: yᵢ ∈ {0, 1}, where 1 = malicious, 0 = benign.
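
A minimal sketch of how the stored `features` object might be turned into the numeric vector xᵢ is shown below. The column order, the `attack_type_count` name, and the default of 0.0 for missing values are assumptions made for illustration; the actual pipeline may encode these differently.

```python
# Hypothetical column order; the real feature_cols list may differ.
FEATURE_COLS = [
    "abuse_reports", "confidence_pct", "recency_hrs", "virustotal_detections",
    "ipinfo_geo_risk", "multi_source", "attack_type_count", "freshness",
]


def build_feature_vector(features: dict) -> list:
    """Flatten an IOC's `features` object into a list of floats."""
    derived = dict(features)
    # Attack Type Diversity: count of distinct categories, not the raw list.
    derived["attack_type_count"] = len(set(features.get("attack_types", [])))
    # float() handles both a boolean flag (True -> 1.0) and a feed count.
    derived["multi_source"] = float(features.get("multi_source", 0))
    return [float(derived.get(col, 0.0)) for col in FEATURE_COLS]


example = {
    "abuse_reports": 42, "recency_hrs": 3, "confidence_pct": 92,
    "multi_source": True, "attack_types": ["Brute Force", "SSH"],
    "virustotal_detections": 18, "ipinfo_geo_risk": 0.0, "freshness": 0.91,
}
print(build_feature_vector(example))
# [42.0, 92.0, 3.0, 18.0, 0.0, 1.0, 2.0, 0.91]
```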

Random Forest

P_RF(y=1 | xᵢ) = (1/K) · Σ hₖ(xᵢ) for k = 1..K

python

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=15,
    min_samples_split=5, class_weight='balanced', random_state=42)
rf.fit(X_train, y_train)
rf_probs = rf.predict_proba(X_test)[:, 1]

XGBoost

P_XGB(y=1 | xᵢ) = sigmoid(Σ fₘ(xᵢ)) for m = 1..M

python

from xgboost import XGBClassifier
xgb = XGBClassifier(n_estimators=300, max_depth=6,
    learning_rate=0.05, scale_pos_weight=3, eval_metric='logloss')
xgb.fit(X_train, y_train)
xgb_probs = xgb.predict_proba(X_test)[:, 1]

Stacking Ensemble

ŷᵢ_stack = β₀ + β₁·P_RF(y=1|xᵢ) + β₂·P_XGB(y=1|xᵢ)

P_final(y=1|xᵢ) = sigmoid(ŷᵢ_stack)

Why a linear meta-learner?

The β₁ and β₂ coefficients directly expose the relative trust in each base model — maintaining full explainability at every hierarchy level.
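
To make the two stacking equations concrete, here is a small worked example in plain Python. The coefficient values are invented for illustration and are not the fitted weights from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stack_score(p_rf: float, p_xgb: float,
                beta0: float, beta1: float, beta2: float) -> float:
    """P_final = sigmoid(beta0 + beta1 * P_RF + beta2 * P_XGB)."""
    return sigmoid(beta0 + beta1 * p_rf + beta2 * p_xgb)


# Hypothetical coefficients: XGBoost is trusted twice as much as Random Forest.
score = stack_score(p_rf=0.8, p_xgb=0.9, beta0=-3.0, beta1=2.0, beta2=4.0)
print(round(score, 2))  # z = -3.0 + 1.6 + 3.6 = 2.2, sigmoid(2.2) ≈ 0.90
```

With these made-up weights, moving P_XGB by 0.1 shifts the logit by 0.4, while the same change in P_RF shifts it by only 0.2, which is the sense in which the coefficients expose relative trust.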

python

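import numpy as np
from sklearn.linear_model import LogisticRegression

# X_val / y_val: a held-out validation split the base models were not trained
# on, so the meta-learner does not fit to overconfident training-set outputs.
rf_val_probs = rf.predict_proba(X_val)[:, 1]
xgb_val_probs = xgb.predict_proba(X_val)[:, 1]

Z_train = np.column_stack([rf_val_probs, xgb_val_probs])  # one column per base model
Z_test  = np.column_stack([rf_probs, xgb_probs])
meta = LogisticRegression()
meta.fit(Z_train, y_val)
final_scores = meta.predict_proba(Z_test)[:, 1]
# coef_[0] holds (β₁, β₂); intercept_[0] is β₀
print(f"RF weight: {meta.coef_[0][0]:.3f}, XGB: {meta.coef_[0][1]:.3f}")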


SHAP Explainability

Global model-wide importance and local per-IP attribution — making every risk score fully auditable by SOC analysts.

Global Feature Importance

Iⱼ = (1/n) · Σ |φᵢⱼ| for i = 1..n

Feature	Mean |SHAP| (Iⱼ)
Confidence Score	0.82
Abuse Reports	0.74
VirusTotal Detections	0.61
IPInfo Geo Risk	0.44
Attack Types	0.38
Multi-Source	0.30
Freshness Feature	0.24
Recency (hrs)	0.18

🔍 Key Insight

Confidence Score and Abuse Reports dominate. IPInfo Geo Risk ranks 4th — its privacy-flag signals (Tor, VPN, proxy) carry meaningful weight that static GeoIP lookups missed entirely.
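
The global importance formula Iⱼ = (1/n) · Σ |φᵢⱼ| is just a column-wise mean of absolute SHAP values. The sketch below computes it with the standard library on a tiny made-up matrix (two IPs, two features); the numbers are illustrative only.

```python
from statistics import fmean


def global_importance(shap_matrix, feature_names):
    """Return (feature, I_j) pairs sorted from most to least important.

    shap_matrix has one row per sample and one column per feature.
    """
    columns = zip(*shap_matrix)  # transpose: one tuple of phi values per feature
    scores = {
        name: fmean([abs(phi) for phi in column])
        for name, column in zip(feature_names, columns)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values: rows are IPs, columns are features.
phi = [[0.4, -0.1],
       [-0.2, 0.3]]
ranking = global_importance(phi, ["confidence_pct", "abuse_reports"])
for name, importance in ranking:
    print(f"{name}: {importance:.2f}")
```

Taking the absolute value first matters: a feature that pushes some IPs strongly towards malicious and others strongly towards benign would otherwise average out to zero.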

Local Per-IP Explanation

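f(xᵢ) = φ₀ + Σ φᵢⱼ for j = 1..d

Here f is the output of the model being explained and φ₀ is its expected value. The code below runs TreeExplainer on the XGBoost base model, whose SHAP values are expressed in log-odds (raw margin) units by default, so they sum to the XGBoost margin rather than directly to the stacked probability P_final.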

python

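import shap

# Explain the XGBoost base model (the `xgb` classifier fitted earlier)
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)  # shape: (n_samples, n_features)
# Global view: distribution of SHAP values per feature
shap.summary_plot(shap_values, X_test, feature_names=feature_cols)
# Local view: per-feature contributions for the first test IP
shap.force_plot(explainer.expected_value, shap_values[0],
    X_test.iloc[0], feature_names=feature_cols)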

Interpreting SHAP Output in the SOC

SHAP φ Value	Interpretation	Analyst Action
φ > 0.3	Feature strongly indicates malicious	Investigate this specific signal
0.0 < φ ≤ 0.3	Feature weakly supports malicious	Note in context of other signals
φ ≈ 0	Feature has negligible contribution	No action needed
φ < 0	Feature argues against maliciousness	Consider false positive likelihood
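
The table above can be applied mechanically when rendering an explanation. The sketch below is one way to do that; the 0.3 cut-off comes from the table, while the `eps` tolerance used for "φ ≈ 0" is an assumption, since the table does not define it.

```python
def interpret_phi(phi: float, strong: float = 0.3, eps: float = 0.01) -> str:
    """Map one SHAP value to the interpretation column of the table."""
    if abs(phi) <= eps:  # eps is an assumed tolerance for "approximately zero"
        return "negligible contribution"
    if phi > strong:
        return "strongly indicates malicious"
    if phi > 0:
        return "weakly supports malicious"
    return "argues against maliciousness"


def explain(shap_values: dict) -> list:
    """Rank features by |phi| and attach an interpretation to each."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, phi, interpret_phi(phi)) for name, phi in ranked]


sample = {"confidence_pct": 0.41, "abuse_reports": 0.28, "ipinfo_geo_risk": 0.19}
for name, phi, meaning in explain(sample):
    print(f"{name:>16}  {phi:+.2f}  {meaning}")
```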


Risk Scoring & Metrics

Three-tier stratification gives SOC analysts clear, actionable priority levels — eliminating the equal-urgency problem of binary blacklists.

Risk Level Classification

Level	Score	Meaning
LOW	0.0 – 0.3	Benign or stale reports. No action needed.
MEDIUM	0.3 – 0.7	Suspicious or mixed signals. Review recommended.
HIGH	≥ 0.7	Active threat. Immediate action required.

⚠ Threshold Tuning

The defaults (0.3, 0.7) are calibrated for an FPR below 5%. Adjust them in config/risk_thresholds.yaml for your environment.
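
A minimal sketch of the thresholding step, using the default cut-offs (0.3, 0.7). Treating a score of exactly 0.3 as MEDIUM is an assumption, chosen to mirror the "≥ 0.7" rule for HIGH; the function name and parameters are hypothetical.

```python
def risk_level(score: float, low_max: float = 0.3, high_min: float = 0.7) -> str:
    """Map a maliciousness probability in [0, 1] to LOW / MEDIUM / HIGH."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be in [0, 1], got {score}")
    if score >= high_min:
        return "HIGH"
    if score >= low_max:  # assumed: the boundary value belongs to the higher tier
        return "MEDIUM"
    return "LOW"


for s in (0.12, 0.30, 0.55, 0.87):
    print(s, risk_level(s))
```

Raising `high_min` reduces the number of HIGH alerts at the cost of missing borderline threats, so any change is best checked against the FPR target.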

Evaluation Metrics

Metric	Formula	Description
Precision	`TP / (TP + FP)`	Reliability of malicious predictions
Recall	`TP / (TP + FN)`	Ability to detect all actual threats
F1-Score	`2·(P·R)/(P+R)`	Harmonic mean of precision and recall
FPR	`FP / (FP + TN)`	Benign IPs incorrectly flagged
Score Stability	`σ(Rᵢ)`	Std dev of risk scores across time windows
Data Freshness	`mean(t_current − t_IOC)`	Average IOC age at dashboard display time
Feature Drift (KS)	`sup|F₁(x) − F₂(x)|`	Distributional shift: training vs. deployment
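
The four classification metrics in the table follow directly from confusion-matrix counts. The sketch below evaluates them for a set of made-up counts; these are not the paper's evaluation data.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1, and FPR from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Hypothetical counts for illustration only.
m = confusion_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that precision and FPR share the same FP count but different denominators, which is why a model can have a low FPR on a mostly benign dataset and still produce a noticeable share of false alerts.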

Benchmark Results

Metric	Result	Target	Met
Precision	95%	>90%	✓
Recall	93%	>85%	✓
F1-Score	94%	>90%	✓
FPR	4%	<5%	✓
Freshness	<10s	<15s	✓
Feature Drift	Stable	KS <0.1	✓
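
The feature-drift check compares the distribution of a feature at training time with its distribution in deployment using the two-sample Kolmogorov–Smirnov statistic, sup|F₁(x) − F₂(x)|. A dependency-free sketch follows; the 0.1 threshold is the one quoted above, while the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(train, live) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(train), sorted(live)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        f1 = bisect_right(a, x) / len(a)
        f2 = bisect_right(b, x) / len(b)
        gap = max(gap, abs(f1 - f2))
    return gap


def is_stable(train, live, threshold: float = 0.1) -> bool:
    """True when drift stays below the KS threshold."""
    return ks_statistic(train, live) < threshold


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

In practice the check would be run per feature over a sliding window of recent IOCs; `scipy.stats.ks_2samp` provides the same statistic together with a p-value if SciPy is available.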

Comparison with Prior Work

System	Model	Precision	Real-time	Explainable
AIRPA (Lewis et al.)	Random Forest	85%	No	No
Anjum & Chowdhury	SVM, KNN	91%	Partial	No
Palaniappan et al.	RF, SVM	89%	No	No
SentinelTI	RF + XGBoost Stack	95%	Yes	SHAP ✓


API Reference

REST endpoints for programmatic access to SentinelTI's ingestion, scoring, and explanation pipeline.

ℹ Base URL

All endpoints are relative to http://localhost:8000/api/v1. Authenticate with the header Authorization: Bearer <token>.

Endpoints

POST /score

Request

POST /api/v1/score
Authorization: Bearer <token>
Content-Type: application/json

{ "ips": ["137.184.9.29"], "explain": true }

Response

{
  "results": [{
    "ip": "137.184.9.29",
    "risk_score": 0.87, "risk_level": "HIGH",
    "location": {
      "country": "US", "city": "Santa Clara",
      "org": "AS14061 DigitalOcean",
      "privacy": {"tor":false,"vpn":false}
    },
    "shap": { "confidence_pct":0.41, "abuse_reports":0.28, "ipinfo_geo_risk":0.19 }
  }]
}

Python

import requests
resp = requests.post("http://localhost:8000/api/v1/score",
    headers={"Authorization": "Bearer your_token"},
    json={"ips": ["137.184.9.29"], "explain": True})
print(resp.json()["results"][0]["risk_level"])
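
Once a /score response has been received, a client will usually want to pick out the urgent results and the signal behind each one. The sketch below works on the sample response shown for this endpoint; the `summarize` helper is hypothetical, and it assumes the `shap` object may be absent when `explain` is not requested.

```python
import json

# Sample response body, trimmed to the fields used below.
SAMPLE = """
{
  "results": [{
    "ip": "137.184.9.29",
    "risk_score": 0.87, "risk_level": "HIGH",
    "shap": {"confidence_pct": 0.41, "abuse_reports": 0.28, "ipinfo_geo_risk": 0.19}
  }]
}
"""


def summarize(payload: dict, level: str = "HIGH") -> list:
    """Return (ip, risk_score, top_feature) for every result at the given level."""
    rows = []
    for result in payload.get("results", []):
        if result.get("risk_level") != level:
            continue
        shap = result.get("shap") or {}  # may be missing without explain=true
        top = max(shap, key=lambda name: abs(shap[name])) if shap else None
        rows.append((result["ip"], result["risk_score"], top))
    return rows


print(summarize(json.loads(SAMPLE)))
# [('137.184.9.29', 0.87, 'confidence_pct')]
```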

GET /ioc/{ip}

bash

curl http://localhost:8000/api/v1/ioc/137.184.9.29 \
  -H "Authorization: Bearer <token>"
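
The same lookup can be prepared from Python with only the standard library. This sketch builds the request and validates the address first; it does not assume anything about the response body, which is not documented here. The helper name is hypothetical.

```python
import ipaddress
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_ioc_request(ip: str, token: str, base_url: str = BASE_URL):
    """Build an authenticated GET /ioc/{ip} request for a valid IPv4 address."""
    # IPv4Address raises ValueError for malformed input, so bad addresses
    # never reach the API.
    canonical = str(ipaddress.IPv4Address(ip))
    return urllib.request.Request(
        f"{base_url}/ioc/{canonical}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_ioc_request("137.184.9.29", "your_token")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), with the API running locally.
```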

Threat Feeds & Enrichment Sources

Source	Type	Update Freq	Data Provided
Abuse.ch	Primary IOC feed	Every 5 min	Malicious IPs, C2, botnet
CINS Army	Supplementary IOC	Hourly	Threat score list
AbuseIPDB	Enrichment API	Per-query	Abuse reports, confidence, categories
VirusTotal	Enrichment API	Per-query	Vendor detections count
IPInfo	Location & ASN enrichment	Per-query	Country, city, org, ASN, VPN/Tor/proxy flags

Data Schema Reference

Field	Type	Source	Description
`ip`	string	Feed	IPv4 address (canonical)
`abuse_reports`	int	AbuseIPDB	Total abuse report count
`confidence_pct`	float	AbuseIPDB	Weighted confidence 0–100
`recency_hrs`	float	Derived	Hours since last report
`virustotal_detections`	int	VirusTotal	Number of vendor detections
`location.country`	string	IPInfo	ISO country code
`location.org`	string	IPInfo	ASN + organisation name
`location.privacy`	object	IPInfo	VPN / proxy / Tor boolean flags
`ipinfo_geo_risk`	float	Derived / IPInfo	Computed geo risk from IPInfo privacy context
`risk_score`	float	ML Stack	Maliciousness probability [0, 1]
`risk_level`	enum	Thresholding	LOW / MEDIUM / HIGH
`shap_values`	object	SHAP	Per-feature SHAP φ contributions
`freshness`	float	Derived	Temporal relevance score [0, 1]
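
The ranges and enum values in the schema table lend themselves to a simple sanity check before a document is written or displayed. The sketch below is one possible validator; it works on a flat mapping of field names to values and covers only the constraints stated in the table. The function and constant names are hypothetical.

```python
# Numeric ranges taken from the schema table above.
RANGES = {
    "confidence_pct": (0.0, 100.0),
    "risk_score": (0.0, 1.0),
    "freshness": (0.0, 1.0),
}
RISK_LEVELS = {"LOW", "MEDIUM", "HIGH"}


def validate_fields(fields: dict) -> list:
    """Return a list of human-readable problems; empty means the fields pass."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in fields and not low <= fields[name] <= high:
            errors.append(f"{name}={fields[name]} outside [{low}, {high}]")
    if "risk_level" in fields and fields["risk_level"] not in RISK_LEVELS:
        errors.append(f"risk_level={fields['risk_level']!r} is not LOW/MEDIUM/HIGH")
    return errors


print(validate_fields({"confidence_pct": 92, "risk_score": 0.87,
                       "risk_level": "HIGH", "freshness": 0.91}))  # []
print(validate_fields({"risk_score": 1.4, "risk_level": "CRITICAL"}))
```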
