SentinelTI Dashboard

An explainable, real-time IP threat intelligence framework built for modern SOCs — powered by ensemble ML, IPInfo location enrichment, and SHAP interpretability.


 System operational · v2.0 · IEEE Publication
| Metric | Value | Target |
|---|---|---|
| Precision | 95% | >90% |
| Recall | 93% | >85% |
| F1-Score | 94% | >90% |
| False Positive Rate | 4% | <5% |
| Data Freshness | <10s | <15s |
| IOC Coverage | >95% | >90% |

What is SentinelTI?

SentinelTI is an asynchronous threat intelligence platform that addresses three core failure modes of traditional blacklist-based systems: high false-positive rates, opaque decision-making, and delayed threat visibility.

The framework ingests IP indicators from open-source feeds in real time, enriches them asynchronously — including IP location and ASN context via IPInfo — and runs a stacking ensemble of Random Forest + XGBoost to produce a dynamic maliciousness risk score explained by SHAP.

Research Paper

"An Explainable Machine Learning Framework for IP Threat Intelligence and Maliciousness Risk Scoring" — Soham Shah & Mousmi Pawar, Somaiya Vidyavihar University. IEEE Publication.

Core Problems Solved

| Traditional Problem | SentinelTI Solution |
|---|---|
| Stale blacklists with hours of detection lag | Real-time ingestion, <10s freshness |
| High false positive rate, alert fatigue | Ensemble ML reduces FPR to 4% |
| Black-box decisions, no analyst trust | SHAP global + local explainability |
| No risk granularity (binary flag) | Three-tier stratification: Low / Medium / High |
| Reactive SOC posture | Async enrichment enables proactive response |

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Backend | Python | Core ingestion, enrichment, ML pipeline |
| Database | MongoDB | IOC storage with timestamps, features, scores |
| ML Models | Random Forest · XGBoost | Stacking ensemble for maliciousness scoring |
| Explainability | SHAP | Global & local feature attribution |
| Dashboard | Streamlit | Live SOC visualisation with colour-coded risk |
| Deployment | Docker | Containerised, modular production deployment |
| Threat Feeds | Abuse.ch · CINS Army | Primary IOC sources |
| Enrichment APIs | VirusTotal · AbuseIPDB · IPInfo · ASN | Contextual metadata & IP location enrichment |

Getting Started

Get SentinelTI running in your SOC environment in under 10 minutes.

ℹ Prerequisites

You need Python 3.10+, Docker, a running MongoDB instance, and API keys for VirusTotal, AbuseIPDB, and IPInfo.

Installation

bash
# Clone the repository
git clone https://github.com/soham7998/sentinel_TI.git
cd sentinel_TI

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

.env
# MongoDB connection
MONGO_URI=mongodb://localhost:27017
MONGO_DB=sentinelti

# Threat intelligence API keys
VIRUSTOTAL_API_KEY=your_virustotal_key_here
ABUSEIPDB_API_KEY=your_abuseipdb_key_here

# IP location enrichment (replaces GeoIP)
IPINFO_TOKEN=your_ipinfo_token_here

# Feed configuration
ABUSEIP_LIMIT=200
INGESTION_INTERVAL_SEC=300

# Dashboard
STREAMLIT_PORT=8501
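
The variables above can be read into a typed settings object at startup. The following is a minimal sketch, not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical names, and the defaults simply mirror the sample `.env` values.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env values shown above."""
    mongo_uri: str
    mongo_db: str
    ipinfo_token: str
    abuseip_limit: int
    ingestion_interval_sec: int
    streamlit_port: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        mongo_db=env.get("MONGO_DB", "sentinelti"),
        ipinfo_token=env["IPINFO_TOKEN"],  # required: KeyError if missing
        abuseip_limit=int(env.get("ABUSEIP_LIMIT", "200")),
        ingestion_interval_sec=int(env.get("INGESTION_INTERVAL_SEC", "300")),
        streamlit_port=int(env.get("STREAMLIT_PORT", "8501")),
    )
```

Failing fast on a missing `IPINFO_TOKEN` surfaces a misconfigured deployment at startup rather than at the first enrichment call.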

Running with Docker

bash
# Build and start all services
docker-compose up --build -d

docker-compose logs -f sentinel   # view logs
open http://localhost:8501          # dashboard (macOS; on Linux use xdg-open)

Running Locally

bash · Ingestion
# Start IOC ingestion pipeline
python src/ingestion/ingest.py --limit 200

bash · Enrichment
# Run async enrichment worker (incl. IPInfo location lookup)
python src/enrichment/enrich_worker.py

bash · Dashboard
# Launch Streamlit dashboard
streamlit run dashboard/app.py --server.port 8501

Success

IOCs appear on the dashboard within ~30 seconds. Full SHAP scores with IPInfo location data arrive within ~60 seconds of ingestion.

Architecture Pipeline

A modular, asynchronous pipeline separating fast IOC ingestion from background enrichment — achieving <10s dashboard latency without sacrificing analytical depth.

Framework Diagram

The diagram matches the paper's Figure 1. Each stage flows sequentially through the asynchronous intelligence pipeline:

  1. Input layer, external threat feeds: Abuse.ch (primary) · CINS Army (supplementary)
  2. Fast IOC ingestion: Normalize · Deduplicate · Limit = 200
  3. Asynchronous enrichment: VirusTotal · AbuseIPDB · IPInfo · ASN
  4. MongoDB IOC store: Timestamps · Features · Scores
  5. ML risk scoring engine: Random Forest + XGBoost · Risk score 0 → 1
  6. SHAP explainability: Global feature importance · Local per-IP explanation
  7. Actionable SOC dashboard with evidence-based decisions: LOW · MEDIUM · HIGH
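
The fast-ingestion stage (normalize, deduplicate, limit to 200) could look roughly like the sketch below. It is an illustration under assumptions: `normalize_feed` is a hypothetical helper, and the feed is assumed to be a plain list of lines with one IP per line and `#` comment lines, which may not match the real feed formats.

```python
import ipaddress


def normalize_feed(raw_lines, limit=200):
    """Return up to `limit` unique, canonical IP strings in first-seen order."""
    seen = set()
    result = []
    for line in raw_lines:
        candidate = line.strip()
        if not candidate or candidate.startswith("#"):
            continue  # skip blank lines and feed comments
        try:
            ip = str(ipaddress.ip_address(candidate))  # canonical form
        except ValueError:
            continue  # not a valid IP literal
        if ip in seen:
            continue  # duplicate within this batch
        seen.add(ip)
        result.append(ip)
        if len(result) >= limit:
            break
    return result
```

Capping the batch keeps each ingestion cycle short, which is what lets new IOCs reach the dashboard before enrichment has run.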

Async Enrichment Detail

The core architectural decision is the separation of ingestion from enrichment. When a new IOC arrives:

  1. Stored immediately in MongoDB with enriched: false — visible on SOC dashboard instantly.
  2. Background worker calls VirusTotal, AbuseIPDB, IPInfo (location/ASN/privacy), and ASN APIs asynchronously.
  3. Once enrichment completes, the feature vector is assembled and scored by the ML pipeline.
  4. Risk score and SHAP values are written back to MongoDB, updating the dashboard in real time.
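
Step 2 above can be sketched with `asyncio.gather`, which runs the lookups concurrently so the slowest API sets the total latency rather than the sum of all of them. The three `fetch_*` coroutines are placeholders that return canned values taken from the sample document; a real worker would make HTTP calls there.

```python
import asyncio


async def fetch_virustotal(ip):
    await asyncio.sleep(0)  # placeholder for a real HTTP call
    return {"virustotal_detections": 18}


async def fetch_abuseipdb(ip):
    await asyncio.sleep(0)  # placeholder for a real HTTP call
    return {"abuse_reports": 42, "confidence_pct": 92}


async def fetch_ipinfo(ip):
    await asyncio.sleep(0)  # placeholder for a real HTTP call
    return {"privacy": {"vpn": False, "tor": False, "proxy": False}}


async def enrich(ip):
    """Run all lookups concurrently; a failed source is skipped, not fatal."""
    results = await asyncio.gather(
        fetch_virustotal(ip),
        fetch_abuseipdb(ip),
        fetch_ipinfo(ip),
        return_exceptions=True,  # collect errors instead of cancelling the rest
    )
    merged = {"ip": ip, "enriched": True}
    for result in results:
        if isinstance(result, Exception):
            continue  # this sketch simply drops the failed source
        merged.update(result)
    return merged


doc = asyncio.run(enrich("137.184.9.29"))
```

Whether a partially enriched IOC should still be marked `enriched: true` is a design choice this sketch does not settle.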

IPInfo Integration

Why IPInfo?

IPInfo replaces static GeoIP databases with a live API that provides richer context: city/region/country, organization, ASN, and critically — privacy flags (VPN, proxy, Tor detection). These flags are strong threat indicators fed directly into the ML feature vector as ipinfo_geo_risk.

python · IPInfo enrichment
import os

import ipinfo

handler = ipinfo.getHandler(os.getenv("IPINFO_TOKEN"))

def enrich_location(ip: str) -> dict:
    """Look up location, ASN, and privacy context for one IP via IPInfo."""
    # Details.all is the raw response dict; .get() returns None for fields
    # the response omits instead of raising AttributeError.
    d = handler.getDetails(ip).all
    return {
        "country":   d.get("country"),
        "region":    d.get("region"),
        "city":      d.get("city"),
        "org":       d.get("org"),        # e.g. "AS14061 DigitalOcean"
        "asn":       d.get("asn"),
        "privacy":   d.get("privacy"),    # {"vpn":bool, "tor":bool, "proxy":bool}
        "latitude":  d.get("latitude"),
        "longitude": d.get("longitude"),
    }

def compute_ipinfo_geo_risk(details: dict) -> float:
    """Derive a geo risk score in [0, 1] from IPInfo privacy flags."""
    score = 0.0
    privacy = details.get("privacy") or {}   # tolerate a missing or null value
    if privacy.get("tor"):
        score += 0.5
    if privacy.get("vpn"):
        score += 0.3
    if privacy.get("proxy"):
        score += 0.2
    return min(score, 1.0)

MongoDB Schema

JSON · IOC Document
{
  "ip": "137.184.9.29",
  "source": "Abuse.ch",
  "first_seen": "2026-02-14T08:10:21Z",
  "enriched": true,
  "location": {
    /* via IPInfo */
    "country": "US",             "region": "California",
    "city": "Santa Clara",      "org": "AS14061 DigitalOcean",
    "asn": "AS14061",
    "privacy": { "vpn": false, "tor": false, "proxy": false }
  },
  "features": {
    "abuse_reports": 42,    "recency_hrs": 3,
    "confidence_pct": 92,   "multi_source": true,
    "attack_types": ["Brute Force", "SSH"],
    "virustotal_detections": 18,
    "ipinfo_geo_risk": 0.0,  /* derived from IPInfo privacy flags */
    "freshness": 0.91
  },
  "risk_score": 0.87,  "risk_level": "HIGH",
  "shap_values": { /* per-feature SHAP φ */ }
}
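
The document's lifecycle (inserted with `enriched: false`, later updated with scores) can be illustrated with plain dictionaries. This is a sketch under assumptions: `new_ioc_document` and `enrichment_update` are hypothetical helpers, and the `$set` payload is the kind of update one would pass to a MongoDB driver, not the project's actual write path.

```python
from datetime import datetime, timezone


def new_ioc_document(ip, source, now=None):
    """Minimal document stored at ingestion time, before enrichment."""
    now = now or datetime.now(timezone.utc)
    return {
        "ip": ip,
        "source": source,
        "first_seen": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "enriched": False,
    }


def enrichment_update(location, features, risk_score, risk_level, shap_values):
    """Build a MongoDB-style $set update written after scoring completes."""
    return {
        "$set": {
            "enriched": True,
            "location": location,
            "features": features,
            "risk_score": risk_score,
            "risk_level": risk_level,
            "shap_values": shap_values,
        }
    }
```

With a driver such as PyMongo, the update would typically be applied with something like `collection.update_one({"ip": ip}, update)`; the collection name and indexing strategy are left open here.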

Machine Learning Scoring

A stacking ensemble of Random Forest and XGBoost, combined via a linear meta-learner to produce calibrated maliciousness probabilities.

Feature Engineering

Each IP address is represented as a feature vector xᵢ = (f₁, f₂, … f_d) built from enrichment data:

| Feature | Source | Description |
|---|---|---|
| Abuse Reports | AbuseIPDB | Frequency of abuse reports logged against this IP |
| Confidence Score | AbuseIPDB | Weighted confidence (0–100) of maliciousness |
| Recency (hours) | Last Report Timestamp | Hours elapsed since last observed abuse activity |
| VirusTotal Detections | VirusTotal API | Count of vendor detections (e.g., 18/70) |
| IPInfo Geo Risk | IPInfo API | Risk score derived from location, org, and privacy flags (VPN/Tor/proxy) via IPInfo |
| Multi-Source Agreement | Feed Consensus | Number of independent feeds reporting this IP |
| Attack Type Diversity | AbuseIPDB Categories | Count of distinct attack categories observed |
| Freshness Feature | Temporal Weighting | Recent threats weighted higher than stale reports |

The target variable is binary: yᵢ ∈ {0, 1}, where 1 = malicious, 0 = benign.
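
As an illustration of how the `features` object from the IOC document might become the vector xᵢ, the sketch below flattens it into a fixed column order. The column order, the `attack_type_count` name, and the encoding of `multi_source` as 0/1 are assumptions made for this example, not the paper's specification.

```python
# Hypothetical column order; the real pipeline may order or name these differently.
FEATURE_COLS = [
    "abuse_reports", "confidence_pct", "recency_hrs", "virustotal_detections",
    "ipinfo_geo_risk", "multi_source", "attack_type_count", "freshness",
]


def to_feature_vector(features):
    """Flatten an IOC `features` dict into a list of floats in FEATURE_COLS order."""
    row = dict(features)
    # Attack type diversity: number of distinct categories observed.
    row["attack_type_count"] = len(set(features.get("attack_types", [])))
    # Booleans become 0.0 / 1.0; counts pass through unchanged.
    row["multi_source"] = float(features.get("multi_source", 0))
    return [float(row.get(col, 0.0)) for col in FEATURE_COLS]


sample = {
    "abuse_reports": 42, "recency_hrs": 3, "confidence_pct": 92,
    "multi_source": True, "attack_types": ["Brute Force", "SSH"],
    "virustotal_detections": 18, "ipinfo_geo_risk": 0.0, "freshness": 0.91,
}
x = to_feature_vector(sample)
```

Keeping the column order in one constant helps ensure that training, scoring, and SHAP plots all label features consistently.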

Random Forest

P_RF(y=1 | xᵢ) = (1/K) · Σ hₖ(xᵢ) for k = 1..K
python
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=15,
    min_samples_split=5, class_weight='balanced', random_state=42)
rf.fit(X_train, y_train)
rf_probs = rf.predict_proba(X_test)[:, 1]

XGBoost

P_XGB(y=1 | xᵢ) = sigmoid(Σ fₘ(xᵢ)) for m = 1..M
python
from xgboost import XGBClassifier
xgb = XGBClassifier(n_estimators=300, max_depth=6,
    learning_rate=0.05, scale_pos_weight=3, eval_metric='logloss')
xgb.fit(X_train, y_train)
xgb_probs = xgb.predict_proba(X_test)[:, 1]

Stacking Ensemble

ŷᵢ_stack = β₀ + β₁·P_RF(y=1|xᵢ) + β₂·P_XGB(y=1|xᵢ)
P_final(y=1|xᵢ) = sigmoid(ŷᵢ_stack)
Why a linear meta-learner?

The β₁ and β₂ coefficients directly expose the relative trust in each base model — maintaining full explainability at every hierarchy level.

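python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit the meta-learner on base-model probabilities from a held-out validation
# split (X_val, y_val are assumed here), not on the base models' training data.
rf_val_probs  = rf.predict_proba(X_val)[:, 1]
xgb_val_probs = xgb.predict_proba(X_val)[:, 1]
Z_train = np.column_stack([rf_val_probs, xgb_val_probs])
Z_test  = np.column_stack([rf_probs, xgb_probs])

meta = LogisticRegression()
meta.fit(Z_train, y_val)
final_scores = meta.predict_proba(Z_test)[:, 1]
# coef_[0] holds (β₁, β₂); intercept_[0] holds β₀
print(f"RF weight: {meta.coef_[0][0]:.3f}, XGB: {meta.coef_[0][1]:.3f}")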
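
To make the stacking formula concrete, here is a small worked example. The coefficients β₀ = −4.0, β₁ = 3.5, β₂ = 4.5 and the base-model probabilities are made-up values chosen for illustration; they are not the fitted weights reported in the paper.

```python
import math


def stack_score(p_rf, p_xgb, beta0, beta1, beta2):
    """Apply the linear meta-learner, then squash the result with a sigmoid."""
    y_stack = beta0 + beta1 * p_rf + beta2 * p_xgb
    return 1.0 / (1.0 + math.exp(-y_stack))


# Hypothetical coefficients and base-model outputs.
p_final = stack_score(p_rf=0.80, p_xgb=0.90, beta0=-4.0, beta1=3.5, beta2=4.5)
# y_stack = -4.0 + 2.8 + 4.05 = 2.85, so p_final is roughly 0.945
```

Because β₂ > β₁ in this example, a change in the XGBoost probability moves the final score more than the same change in the Random Forest probability, which is the kind of relative trust the coefficients expose.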

SHAP Explainability

Global model-wide importance and local per-IP attribution — making every risk score fully auditable by SOC analysts.

Global Feature Importance

Iⱼ = (1/n) · Σ |φᵢⱼ| for i = 1..n
| Feature | Importance Iⱼ |
|---|---|
| Confidence Score | 0.82 |
| Abuse Reports | 0.74 |
| VirusTotal Detections | 0.61 |
| IPInfo Geo Risk | 0.44 |
| Attack Types | 0.38 |
| Multi-Source | 0.30 |
| Freshness Feature | 0.24 |
| Recency (hrs) | 0.18 |

🔍 Key Insight

Confidence Score and Abuse Reports dominate. IPInfo Geo Risk ranks 4th — its privacy-flag signals (Tor, VPN, proxy) carry meaningful weight that static GeoIP lookups missed entirely.
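
The global importance formula Iⱼ = (1/n) · Σ |φᵢⱼ| is a column-wise mean of absolute SHAP values. The sketch below computes it in plain Python on a tiny made-up matrix; `global_importance` is a hypothetical helper, and the numbers are not the paper's values.

```python
def global_importance(shap_rows, feature_names):
    """Mean absolute SHAP value per feature, sorted from most to least important."""
    n = len(shap_rows)
    scores = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, scores), key=lambda kv: kv[1], reverse=True)


# Two IPs (rows) x two features (columns), illustrative values only.
ranking = global_importance(
    [[0.4, -0.1], [-0.2, 0.3]],
    ["confidence_pct", "recency_hrs"],
)
```

Taking the absolute value first matters: a feature that pushes some IPs strongly up and others strongly down would otherwise average out to near zero and look unimportant.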

Local Per-IP Explanation

P_final(y=1|xᵢ) = φ₀ + Σ φᵢⱼ for j = 1..d
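python
import shap

# Explains the XGBoost base model (`xgb` from the section above), not the stacked score.
# For an XGBoost classifier, TreeExplainer attributes in log-odds (margin) space
# by default; φ₀ is explainer.expected_value.
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)

# Global view: one point per IP and feature
shap.summary_plot(shap_values, X_test, feature_names=feature_cols)

# Local view: contributions for the first IP in X_test (a DataFrame)
shap.force_plot(explainer.expected_value, shap_values[0],
    X_test.iloc[0], feature_names=feature_cols)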

Interpreting SHAP Output in the SOC

| SHAP φ Value | Interpretation | Analyst Action |
|---|---|---|
| φ > 0.3 | Feature strongly indicates malicious | Investigate this specific signal |
| 0.0 < φ ≤ 0.3 | Feature weakly supports malicious | Note in context of other signals |
| φ ≈ 0 | Feature has negligible contribution | No action needed |
| φ < 0 | Feature argues against maliciousness | Consider false positive likelihood |
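
The interpretation bands above can be encoded as a small lookup for dashboard tooltips or reports. This is a sketch: `interpret_phi` is a hypothetical helper, and the tolerance `eps=0.01` used for "φ ≈ 0" is an assumption, since the source does not define how close to zero counts as negligible.

```python
def interpret_phi(phi, eps=0.01):
    """Map one SHAP value to (interpretation, analyst action)."""
    if abs(phi) <= eps:  # assumed tolerance for "approximately zero"
        return "negligible", "No action needed"
    if phi > 0.3:
        return "strong", "Investigate this specific signal"
    if phi > 0:
        return "weak", "Note in context of other signals"
    return "against", "Consider false positive likelihood"


# SHAP values taken from the sample /score response in this document.
sample_shap = {"confidence_pct": 0.41, "abuse_reports": 0.28, "ipinfo_geo_risk": 0.19}
labels = {name: interpret_phi(phi)[0] for name, phi in sample_shap.items()}
```

For the sample values, only `confidence_pct` crosses the 0.3 band, so it would be the signal an analyst looks at first.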

Risk Scoring & Metrics

Three-tier stratification gives SOC analysts clear, actionable priority levels — eliminating the equal-urgency problem of binary blacklists.

Risk Level Classification

| Level | Score | Meaning |
|---|---|---|
| LOW | 0.0 – 0.3 | Benign or stale reports. No action needed. |
| MEDIUM | 0.3 – 0.7 | Suspicious or mixed signals. Review recommended. |
| HIGH | ≥ 0.7 | Active threat. Immediate action required. |

⚠ Threshold Tuning

Defaults (0.3, 0.7) calibrated for FPR <5%. Adjust in config/risk_thresholds.yaml for your environment.
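
A thresholding function for the three tiers might look like the sketch below. Because the ranges in the table share the boundary 0.3, this example assumes half-open intervals: a score of exactly 0.3 is treated as MEDIUM and exactly 0.7 as HIGH. The `risk_level` function name and its parameters are illustrative; the format of `config/risk_thresholds.yaml` is not shown in this document and is not assumed here.

```python
def risk_level(score, low=0.3, high=0.7):
    """Map a maliciousness probability in [0, 1] to LOW / MEDIUM / HIGH."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= high:
        return "HIGH"
    if score >= low:  # assumption: the lower boundary belongs to MEDIUM
        return "MEDIUM"
    return "LOW"
```

For the sample IOC with `risk_score` 0.87, this returns `"HIGH"`, matching the `risk_level` stored in the document.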

Evaluation Metrics

| Metric | Formula | Description |
|---|---|---|
| Precision | TP / (TP + FP) | Reliability of malicious predictions |
| Recall | TP / (TP + FN) | Ability to detect all actual threats |
| F1-Score | 2·(P·R)/(P+R) | Harmonic mean of precision and recall |
| FPR | FP / (FP + TN) | Benign IPs incorrectly flagged |
| Score Stability | σ(Rᵢ) | Std dev of risk scores across time windows |
| Data Freshness | mean(t_current − t_IOC) | Average IOC age at dashboard display time |
| Feature Drift (KS) | sup\|F₁(x) − F₂(x)\| | Distributional shift: training vs. deployment |
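
The classification formulas and the KS drift statistic can be checked with a few lines of plain Python. The confusion-matrix counts below are hypothetical, chosen only so that the results land near the headline figures (95% precision, 93% recall, 4% FPR); they are not the paper's evaluation data.

```python
import bisect


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and false positive rate from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_values, x):
        # Fraction of values less than or equal to x.
        return bisect.bisect_right(sorted_values, x) / len(sorted_values)

    # The supremum is attained at one of the observed data points.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


# Hypothetical counts for illustration only.
m = classification_metrics(tp=930, fp=49, fn=70, tn=1176)
```

Note that the F1 value follows from precision and recall alone: with P = 0.95 and R = 0.93, 2·P·R/(P+R) is about 0.94, consistent with the benchmark figures.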

Benchmark Results

| Metric | Result | Target | Met |
|---|---|---|---|
| Precision | 95% | >90% | ✓ |
| Recall | 93% | >85% | ✓ |
| F1-Score | 94% | >90% | ✓ |
| FPR | 4% | <5% | ✓ |
| Freshness | <10s | <15s | ✓ |
| Feature Drift | Stable | KS <0.1 | ✓ |

Comparison with Prior Work

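| System | Model | Precision | Real-time | Explainable |
|---|---|---|---|---|
| AIRPA (Lewis et al.) | Random Forest | 85% | No | No |
| Anjum & Chowdhury | SVM, KNN | 91% | Partial | No |
| Palaniappan et al. | RF, SVM | 89% | No | No |
| SentinelTI | RF + XGBoost Stack | 95% | Yes | SHAP ✓ |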

API Reference

REST endpoints for programmatic access to SentinelTI's ingestion, scoring, and explanation pipeline.

ℹ Base URL

All endpoints relative to http://localhost:8000/api/v1. Auth via Authorization: Bearer <token>.

Endpoints

POST /score

http · Request
POST /api/v1/score
Authorization: Bearer <token>
Content-Type: application/json

{ "ips": ["137.184.9.29"], "explain": true }

JSON · Response
{
  "results": [{
    "ip": "137.184.9.29",
    "risk_score": 0.87, "risk_level": "HIGH",
    "location": {
      "country": "US", "city": "Santa Clara",
      "org": "AS14061 DigitalOcean",
      "privacy": {"tor":false,"vpn":false}
    },
    "shap": { "confidence_pct":0.41, "abuse_reports":0.28, "ipinfo_geo_risk":0.19 }
  }]
}

python · Client
import requests
resp = requests.post("http://localhost:8000/api/v1/score",
    headers={"Authorization": "Bearer your_token"},
    json={"ips": ["137.184.9.29"], "explain": True})
print(resp.json()["results"][0]["risk_level"])
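
Once a `/score` response has been parsed, a client will often want only the entries at or above a given risk level, along with the features that drove each score. The sketch below works on the already-decoded JSON, so it makes no network calls; `summarize_results` is a hypothetical helper and the payload mirrors the sample response above.

```python
LEVEL_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def summarize_results(payload, min_level="HIGH", top_n=3):
    """Keep results at or above `min_level` and rank their SHAP features."""
    summary = []
    for result in payload.get("results", []):
        if LEVEL_ORDER[result["risk_level"]] < LEVEL_ORDER[min_level]:
            continue
        # Rank features by absolute contribution, largest first.
        ranked = sorted(
            result.get("shap", {}).items(),
            key=lambda kv: abs(kv[1]),
            reverse=True,
        )
        summary.append({
            "ip": result["ip"],
            "risk_score": result["risk_score"],
            "top_features": [name for name, _ in ranked[:top_n]],
        })
    return summary


sample_payload = {
    "results": [{
        "ip": "137.184.9.29",
        "risk_score": 0.87, "risk_level": "HIGH",
        "shap": {"confidence_pct": 0.41, "abuse_reports": 0.28, "ipinfo_geo_risk": 0.19},
    }]
}
alerts = summarize_results(sample_payload)
```

Sorting by absolute value keeps strongly negative contributions visible too, which helps when reviewing a possible false positive.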

GET /ioc/{ip}

bash
curl http://localhost:8000/api/v1/ioc/137.184.9.29 \
  -H "Authorization: Bearer <token>"

Threat Feeds & Enrichment Sources

| Source | Type | Update Frequency | Data Provided |
|---|---|---|---|
| Abuse.ch | Primary IOC feed | Every 5 min | Malicious IPs, C2, botnet |
| CINS Army | Supplementary IOC | Hourly | Threat score list |
| AbuseIPDB | Enrichment API | Per-query | Abuse reports, confidence, categories |
| VirusTotal | Enrichment API | Per-query | Vendor detections count |
| IPInfo | Location & ASN enrichment | Per-query | Country, city, org, ASN, VPN/Tor/proxy flags |

Data Schema Reference

| Field | Type | Source | Description |
|---|---|---|---|
| ip | string | Feed | IPv4 address (canonical) |
| abuse_reports | int | AbuseIPDB | Total abuse report count |
| confidence_pct | float | AbuseIPDB | Weighted confidence 0–100 |
| recency_hrs | float | Derived | Hours since last report |
| virustotal_detections | int | VirusTotal | Number of vendor detections |
| location.country | string | IPInfo | ISO country code |
| location.org | string | IPInfo | ASN + organisation name |
| location.privacy | object | IPInfo | VPN / proxy / Tor boolean flags |
| ipinfo_geo_risk | float | Derived / IPInfo | Computed geo risk from IPInfo privacy context |
| risk_score | float | ML Stack | Maliciousness probability [0, 1] |
| risk_level | enum | Thresholding | LOW / MEDIUM / HIGH |
| shap_values | object | SHAP | Per-feature SHAP φ contributions |
| freshness | float | Derived | Temporal relevance score [0, 1] |