SentinelTI Dashboard
An explainable, real-time IP threat intelligence framework built for modern SOCs — powered by ensemble ML, IPInfo location enrichment, and SHAP interpretability.
What is SentinelTI?
SentinelTI is an asynchronous threat intelligence platform that addresses three core failure modes of traditional blacklist-based systems: high false-positive rates, opaque decision-making, and delayed threat visibility.
The framework ingests IP indicators from open-source feeds in real time, enriches them asynchronously — including IP location and ASN context via IPInfo — and runs a stacking ensemble of Random Forest + XGBoost to produce a dynamic maliciousness risk score explained by SHAP.
"An Explainable Machine Learning Framework for IP Threat Intelligence and Maliciousness Risk Scoring" — Soham Shah & Mousmi Pawar, Somaiya Vidyavihar University. IEEE Publication.
Core Problems Solved
| Traditional Problem | SentinelTI Solution |
|---|---|
| Stale blacklists — hours of detection lag | Real-time ingestion, <10s freshness |
| High false positive rate, alert fatigue | Ensemble ML reduces FPR to 4% |
| Black-box decisions, no analyst trust | SHAP global + local explainability |
| No risk granularity (binary flag) | Three-tier stratification: Low / Medium / High |
| Reactive SOC posture | Async enrichment enables proactive response |
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Backend | Python | Core ingestion, enrichment, ML pipeline |
| Database | MongoDB | IOC storage with timestamps, features, scores |
| ML Models | Random Forest · XGBoost | Stacking ensemble for maliciousness scoring |
| Explainability | SHAP | Global & local feature attribution |
| Dashboard | Streamlit | Live SOC visualisation with colour-coded risk |
| Deployment | Docker | Containerised, modular production deployment |
| Threat Feeds | Abuse.ch · CINS Army | Primary IOC sources |
| Enrichment APIs | VirusTotal · AbuseIPDB · IPInfo · ASN | Contextual metadata & IP location enrichment |
Getting Started
Get SentinelTI running in your SOC environment in under 10 minutes.
You need Python 3.10+, Docker, a running MongoDB instance, and API keys for VirusTotal, AbuseIPDB, and IPInfo.
Installation
# Clone the repository
git clone https://github.com/soham7998/sentinel_TI.git
cd sentinel_TI

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
Configuration
# MongoDB connection
MONGO_URI=mongodb://localhost:27017
MONGO_DB=sentinelti

# Threat intelligence API keys
VIRUSTOTAL_API_KEY=your_virustotal_key_here
ABUSEIPDB_API_KEY=your_abuseipdb_key_here

# IP location enrichment (replaces GeoIP)
IPINFO_TOKEN=your_ipinfo_token_here

# Feed configuration
ABUSEIP_LIMIT=200
INGESTION_INTERVAL_SEC=300

# Dashboard
STREAMLIT_PORT=8501
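As a minimal sketch of how a service might read these values, assuming they are exported as environment variables (for example from a .env file): the `Settings` class and `load_settings` helper below are hypothetical names used for illustration, not part of the SentinelTI codebase.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed container for the variables listed above."""
    mongo_uri: str
    mongo_db: str
    virustotal_api_key: str
    abuseipdb_api_key: str
    ipinfo_token: str
    abuseip_limit: int
    ingestion_interval_sec: int
    streamlit_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables shown above, falling back to the documented values."""
    return Settings(
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        mongo_db=env.get("MONGO_DB", "sentinelti"),
        # API keys have no sensible default; an empty string marks them as unset.
        virustotal_api_key=env.get("VIRUSTOTAL_API_KEY", ""),
        abuseipdb_api_key=env.get("ABUSEIPDB_API_KEY", ""),
        ipinfo_token=env.get("IPINFO_TOKEN", ""),
        # Numeric settings arrive as strings and are converted explicitly.
        abuseip_limit=int(env.get("ABUSEIP_LIMIT", "200")),
        ingestion_interval_sec=int(env.get("INGESTION_INTERVAL_SEC", "300")),
        streamlit_port=int(env.get("STREAMLIT_PORT", "8501")),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"INGESTION_INTERVAL_SEC": "60"})
print(settings.ingestion_interval_sec, settings.mongo_db)
```

Converting the numeric values at load time means a malformed entry fails at startup rather than in the middle of an ingestion cycle.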
Running with Docker
# Build and start all services
docker-compose up --build -d
docker-compose logs -f sentinel   # view logs
open http://localhost:8501        # dashboard
Running Locally
# Start IOC ingestion pipeline
python src/ingestion/ingest.py --limit 200

# Run async enrichment worker (incl. IPInfo location lookup)
python src/enrichment/enrich_worker.py

# Launch Streamlit dashboard
streamlit run dashboard/app.py --server.port 8501

IOCs appear on the dashboard within ~30 seconds. Full SHAP scores with IPInfo location data arrive within ~60 seconds of ingestion.
Architecture Pipeline
A modular, asynchronous pipeline separating fast IOC ingestion from background enrichment — achieving <10s dashboard latency without sacrificing analytical depth.
Framework Diagram
The interactive diagram below matches the paper's Figure 1 — each stage flows sequentially through the asynchronous intelligence pipeline:
Async Enrichment Detail
The core architectural decision is the separation of ingestion from enrichment. When a new IOC arrives:
- Stored immediately in MongoDB with enriched: false, so it is visible on the SOC dashboard instantly.
- A background worker calls VirusTotal, AbuseIPDB, IPInfo (location/ASN/privacy), and ASN APIs asynchronously.
- Once enrichment completes, the feature vector is assembled and scored by the ML pipeline.
- Risk score and SHAP values are written back to MongoDB, updating the dashboard in real time.
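The steps above can be sketched with asyncio. This is an illustration under stated assumptions, not the project's worker: an in-memory dict stands in for MongoDB, and `fake_enrich` is a hypothetical placeholder for the real VirusTotal, AbuseIPDB, and IPInfo calls.

```python
import asyncio

# In-memory stand-in for the MongoDB collection (assumption for illustration).
store: dict[str, dict] = {}


async def ingest(ip: str, queue: asyncio.Queue) -> None:
    """Store the IOC immediately, then hand it to the background worker."""
    store[ip] = {"ip": ip, "enriched": False}
    await queue.put(ip)


async def fake_enrich(ip: str) -> dict:
    """Hypothetical placeholder for the external enrichment API calls."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"abuse_reports": 0, "ipinfo_geo_risk": 0.0}


async def enrich_worker(queue: asyncio.Queue) -> None:
    """Enrich queued IOCs and write the result back to the store."""
    while True:
        ip = await queue.get()
        store[ip]["features"] = await fake_enrich(ip)
        store[ip]["enriched"] = True
        queue.task_done()


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(enrich_worker(queue))
    await ingest("137.184.9.29", queue)
    # Snapshot of what a dashboard would see before enrichment has run.
    visible_before = dict(store["137.184.9.29"])
    await queue.join()  # wait for background enrichment to finish
    worker.cancel()
    return visible_before


before = asyncio.run(main())
print(before["enriched"], store["137.184.9.29"]["enriched"])
```

The point of the queue is that `ingest` returns as soon as the record is stored, so slow enrichment APIs never delay the first appearance of an IOC.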
IPInfo Integration
IPInfo replaces static GeoIP databases with a live API that provides richer context: city/region/country, organization, ASN, and critically — privacy flags (VPN, proxy, Tor detection). These flags are strong threat indicators fed directly into the ML feature vector as ipinfo_geo_risk.
import os

import ipinfo

handler = ipinfo.getHandler(os.getenv("IPINFO_TOKEN"))


def enrich_location(ip: str) -> dict:
    """Look up location, ASN, and privacy context for one IP via IPInfo."""
    # .all is the raw response dict; with .get(), fields missing from the
    # response come back as None instead of raising AttributeError.
    d = handler.getDetails(ip).all
    return {
        "country": d.get("country"),
        "region": d.get("region"),
        "city": d.get("city"),
        "org": d.get("org"),          # e.g. "AS14061 DigitalOcean"
        "asn": d.get("asn"),
        "privacy": d.get("privacy"),  # {"vpn": bool, "tor": bool, "proxy": bool}
        "latitude": d.get("latitude"),
        "longitude": d.get("longitude"),
    }


def compute_ipinfo_geo_risk(details: dict) -> float:
    """Derive a geo risk score from IPInfo privacy flags."""
    score = 0.0
    privacy = details.get("privacy") or {}  # tolerate a missing or None value
    if privacy.get("tor"):
        score += 0.5
    if privacy.get("vpn"):
        score += 0.3
    if privacy.get("proxy"):
        score += 0.2
    return min(score, 1.0)
MongoDB Schema
{
"ip": "137.184.9.29",
"source": "Abuse.ch",
"first_seen": "2026-02-14T08:10:21Z",
"enriched": true,
"location": {
/* via IPInfo */
"country": "United States", "region": "California",
"city": "Santa Clara", "org": "AS14061 DigitalOcean",
"asn": "AS14061",
"privacy": { "vpn": false, "tor": false, "proxy": false }
},
"features": {
"abuse_reports": 42, "recency_hrs": 3,
"confidence_pct": 92, "multi_source": true,
"attack_types": ["Brute Force", "SSH"],
"virustotal_detections": 18,
"ipinfo_geo_risk": 0.0, /* derived from IPInfo privacy flags */
"freshness": 0.91
},
"risk_score": 0.87, "risk_level": "HIGH",
"shap_values": { /* per-feature SHAP φ */ }
}
Machine Learning Scoring
A stacking ensemble of Random Forest and XGBoost, combined via a linear meta-learner to produce calibrated maliciousness probabilities.
Feature Engineering
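Each IP address is represented as a feature vector xᵢ = (f₁, f₂, …, f_d) built from enrichment data, i.e. the fields stored under features in the MongoDB schema (abuse_reports, recency_hrs, confidence_pct, multi_source, attack_types, virustotal_detections, ipinfo_geo_risk, freshness).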
The target variable is binary: yᵢ ∈ {0, 1}, where 1 = malicious, 0 = benign.
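A minimal sketch of how the features sub-document from the MongoDB schema could be flattened into a numeric vector. The column order, the `attack_type_count` encoding, and the helper name `build_feature_vector` are assumptions for illustration; the actual pipeline may order or encode fields differently.

```python
from typing import Any

# Assumed feature order; not confirmed by the source.
FEATURE_COLS = [
    "abuse_reports", "recency_hrs", "confidence_pct", "multi_source",
    "attack_type_count", "virustotal_detections", "ipinfo_geo_risk", "freshness",
]


def build_feature_vector(features: dict[str, Any]) -> list[float]:
    """Flatten a `features` sub-document into a numeric vector x_i."""
    row = dict(features)
    # Hypothetical encoding: represent the attack_types list by its length.
    row["attack_type_count"] = len(row.get("attack_types") or [])
    # Missing values default to 0.0; booleans become 0.0 / 1.0.
    return [float(row.get(col) or 0.0) for col in FEATURE_COLS]


# Values copied from the example document in the MongoDB schema.
example = {
    "abuse_reports": 42, "recency_hrs": 3,
    "confidence_pct": 92, "multi_source": True,
    "attack_types": ["Brute Force", "SSH"],
    "virustotal_detections": 18,
    "ipinfo_geo_risk": 0.0,
    "freshness": 0.91,
}
x = build_feature_vector(example)
print(x)
```

Fixing the column order in one place matters because the same order has to be used at training time, at scoring time, and when labelling SHAP values.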
Random Forest
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=15,
    min_samples_split=5,
    class_weight='balanced',
    random_state=42,
)
rf.fit(X_train, y_train)
rf_probs = rf.predict_proba(X_test)[:, 1]  # P(malicious) for each test IP
XGBoost
from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.05,
    scale_pos_weight=3,
    eval_metric='logloss',
)
xgb.fit(X_train, y_train)
xgb_probs = xgb.predict_proba(X_test)[:, 1]  # P(malicious) for each test IP
Stacking Ensemble
The meta-learner's coefficients β₁ (Random Forest) and β₂ (XGBoost) directly expose the relative trust placed in each base model, so the ensemble stays explainable at the stacking level as well as at the feature level.
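import numpy as np
from sklearn.linear_model import LogisticRegression

# rf_val_probs / xgb_val_probs are assumed to be the base models' probabilities
# on a held-out validation split (labels y_val), computed the same way as
# rf_probs / xgb_probs above.
Z_train = np.column_stack([rf_val_probs, xgb_val_probs])
Z_test = np.column_stack([rf_probs, xgb_probs])

meta = LogisticRegression()
meta.fit(Z_train, y_val)
final_scores = meta.predict_proba(Z_test)[:, 1]

# coef_[0] holds one weight per base model, in column order (RF, XGB)
print(f"RF weight: {meta.coef_[0][0]:.3f}, XGB: {meta.coef_[0][1]:.3f}")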
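To make the role of the coefficients concrete, here is a small sketch of what a fitted logistic meta-learner computes for a single IP: a sigmoid over a weighted sum of the two base-model probabilities. The function name `stack_score` and the coefficient values are hypothetical, chosen only for illustration, and are not fitted values from the paper.

```python
import math


def stack_score(p_rf: float, p_xgb: float, b0: float, b1: float, b2: float) -> float:
    """Logistic meta-learner: sigmoid(b0 + b1 * p_rf + b2 * p_xgb)."""
    z = b0 + b1 * p_rf + b2 * p_xgb
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: intercept b0, RF weight b1, XGBoost weight b2.
B0, B1, B2 = -3.0, 2.5, 3.5

low = stack_score(0.10, 0.05, B0, B1, B2)   # both base models say benign
high = stack_score(0.90, 0.95, B0, B1, B2)  # both base models say malicious
print(f"{low:.3f} {high:.3f}")
```

With these illustrative weights, a larger b2 than b1 would mean the meta-learner leans more on XGBoost than on Random Forest when the two disagree.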
SHAP Explainability
Global model-wide importance and local per-IP attribution — making every risk score fully auditable by SOC analysts.
Global Feature Importance
Confidence Score and Abuse Reports dominate. IPInfo Geo Risk ranks 4th — its privacy-flag signals (Tor, VPN, proxy) carry meaningful weight that static GeoIP lookups missed entirely.
Local Per-IP Explanation
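import shap

# xgb is the fitted XGBClassifier from the XGBoost section;
# feature_cols is the list of feature column names.
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)

# Global view: feature importance across the whole test set
shap.summary_plot(shap_values, X_test, feature_names=feature_cols)

# Local view: per-feature contributions to the first IP's score
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0], feature_names=feature_cols)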
Interpreting SHAP Output in the SOC
| SHAP φ Value | Interpretation | Analyst Action |
|---|---|---|
| φ > 0.3 | Feature strongly indicates malicious | Investigate this specific signal |
| 0.0 < φ ≤ 0.3 | Feature weakly supports malicious | Note in context of other signals |
| φ ≈ 0 | Feature has negligible contribution | No action needed |
| φ < 0 | Feature argues against maliciousness | Consider false positive likelihood |
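The table can be applied mechanically when rendering explanations. The sketch below is one possible mapping; the helper name `interpret_shap` and the tolerance `eps` used to decide what counts as φ ≈ 0 are assumptions, since the table does not define a numeric cutoff.

```python
def interpret_shap(phi: float, eps: float = 0.01) -> tuple[str, str]:
    """Map one SHAP value to (interpretation, analyst action) per the table."""
    if abs(phi) <= eps:  # eps is an assumed tolerance for "approximately zero"
        return ("Feature has negligible contribution", "No action needed")
    if phi > 0.3:
        return ("Feature strongly indicates malicious",
                "Investigate this specific signal")
    if phi > 0:
        return ("Feature weakly supports malicious",
                "Note in context of other signals")
    return ("Feature argues against maliciousness",
            "Consider false positive likelihood")


# SHAP values from the POST /score example response.
shap_row = {"confidence_pct": 0.41, "abuse_reports": 0.28, "ipinfo_geo_risk": 0.19}
for feature, phi in shap_row.items():
    print(feature, interpret_shap(phi)[1])
```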
Risk Scoring & Metrics
Three-tier stratification gives SOC analysts clear, actionable priority levels — eliminating the equal-urgency problem of binary blacklists.
Risk Level Classification
The default thresholds (0.3, 0.7) are calibrated for FPR <5%. Adjust them in config/risk_thresholds.yaml for your environment.
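A minimal sketch of the three-tier mapping with the default thresholds. The function name `risk_level` and the boundary handling (a score exactly at a threshold falls into the higher tier) are assumptions; the source does not state which side of each threshold is inclusive.

```python
def risk_level(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a maliciousness probability in [0, 1] to LOW / MEDIUM / HIGH."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be in [0, 1], got {score}")
    if score < low:
        return "LOW"
    if score < high:  # assumed: scores equal to `high` count as HIGH
        return "MEDIUM"
    return "HIGH"


print(risk_level(0.87))  # the example IOC from the MongoDB schema
```

Keeping the thresholds as parameters mirrors the idea that they are tuned per environment rather than hard-coded.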
Evaluation Metrics
| Metric | Formula | Description |
|---|---|---|
| Precision | TP / (TP + FP) | Reliability of malicious predictions |
| Recall | TP / (TP + FN) | Ability to detect all actual threats |
| F1-Score | 2·(P·R)/(P+R) | Harmonic mean of precision and recall |
| FPR | FP / (FP + TN) | Benign IPs incorrectly flagged |
| Score Stability | σ(Rᵢ) | Std dev of risk scores across time windows |
| Data Freshness | mean(t_current − t_IOC) | Average IOC age at dashboard display time |
| Feature Drift (KS) | sup|F₁(x) − F₂(x)| | Distributional shift: training vs. deployment |
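The formulas in the table translate directly into code. The sketch below uses only the standard library; the function names are hypothetical and the confusion-matrix counts in the usage line are toy numbers, not results from the paper.

```python
from bisect import bisect_right
from statistics import pstdev


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, recall, F1, and FPR from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # avoid division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }


def score_stability(scores: list[float]) -> float:
    """Standard deviation of one IP's risk scores across time windows."""
    return pstdev(scores)


def ks_statistic(train: list[float], live: list[float]) -> float:
    """Two-sample KS statistic: sup |F1(x) - F2(x)| over empirical CDFs."""
    a, b = sorted(train), sorted(live)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The supremum is attained at one of the observed sample points.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)  # toy counts
print(metrics, ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A KS value near 0 suggests a feature's deployment distribution still resembles training, while a value near 1 indicates the two samples barely overlap.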
Benchmark Results
Comparison with Prior Work
| System | Model | Precision | Real-time | Explainable |
|---|---|---|---|---|
| AIRPA (Lewis et al.) | Random Forest | 85% | No | No |
| Anjum & Chowdhury | SVM, KNN | 91% | Partial | No |
| Palaniappan et al. | RF, SVM | 89% | No | No |
| SentinelTI | RF + XGBoost Stack | 95% | Yes | SHAP ✓ |
API Reference
REST endpoints for programmatic access to SentinelTI's ingestion, scoring, and explanation pipeline.
All endpoints are relative to http://localhost:8000/api/v1. Authenticate with an Authorization: Bearer <token> header.
Endpoints
POST /score
POST /api/v1/score
Authorization: Bearer <token>
Content-Type: application/json

{ "ips": ["137.184.9.29"], "explain": true }

Response:

{
  "results": [{
    "ip": "137.184.9.29",
    "risk_score": 0.87, "risk_level": "HIGH",
    "location": {
      "country": "US", "city": "Santa Clara",
      "org": "AS14061 DigitalOcean",
      "privacy": {"tor":false,"vpn":false}
    },
    "shap": { "confidence_pct":0.41, "abuse_reports":0.28, "ipinfo_geo_risk":0.19 }
  }]
}

Python client:

import requests

resp = requests.post(
    "http://localhost:8000/api/v1/score",
    headers={"Authorization": "Bearer your_token"},
    json={"ips": ["137.184.9.29"], "explain": True},
)
print(resp.json()["results"][0]["risk_level"])
GET /ioc/{ip}
curl http://localhost:8000/api/v1/ioc/137.184.9.29 \
-H "Authorization: Bearer <token>"
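The same lookup can be prepared from Python with the standard library. This sketch only builds the request, because sending it needs a running SentinelTI instance; the helper name `build_ioc_request` is hypothetical, and the shape of the response body is not documented here, so it is not parsed.

```python
import ipaddress
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_ioc_request(ip: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for /ioc/{ip}."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return urllib.request.Request(
        f"{BASE_URL}/ioc/{ip}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_ioc_request("137.184.9.29", "your_token")
print(req.get_method(), req.full_url)

# To send it against a running instance:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     body = resp.read()
```

Validating the address before building the URL keeps malformed input from reaching the API as an unexpected path segment.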
Threat Feeds & Enrichment Sources
| Source | Type | Update Freq | Data Provided |
|---|---|---|---|
| Abuse.ch | Primary IOC feed | Every 5 min | Malicious IPs, C2, botnet |
| CINS Army | Supplementary IOC | Hourly | Threat score list |
| AbuseIPDB | Enrichment API | Per-query | Abuse reports, confidence, categories |
| VirusTotal | Enrichment API | Per-query | Vendor detections count |
| IPInfo | Location & ASN enrichment | Per-query | Country, city, org, ASN, VPN/Tor/proxy flags |
Data Schema Reference
| Field | Type | Source | Description |
|---|---|---|---|
| ip | string | Feed | IPv4 address (canonical) |
| abuse_reports | int | AbuseIPDB | Total abuse report count |
| confidence_pct | float | AbuseIPDB | Weighted confidence 0–100 |
| recency_hrs | float | Derived | Hours since last report |
| virustotal_detections | int | VirusTotal | Number of vendor detections |
| location.country | string | IPInfo | ISO country code |
| location.org | string | IPInfo | ASN + organisation name |
| location.privacy | object | IPInfo | VPN / proxy / Tor boolean flags |
| ipinfo_geo_risk | float | Derived / IPInfo | Computed geo risk from IPInfo privacy context |
| risk_score | float | ML Stack | Maliciousness probability [0, 1] |
| risk_level | enum | Thresholding | LOW / MEDIUM / HIGH |
| shap_values | object | SHAP | Per-feature SHAP φ contributions |
| freshness | float | Derived | Temporal relevance score [0, 1] |