A reinforcement learning project implementing a Deep Q-Network (DQN) agent that learns to evade an enemy in a grid-based pursuit-evasion game.
This project demonstrates core DQN concepts including:
- Deep Q-Learning: Neural network-based Q-value approximation
- Experience Replay: Breaking correlations in sequential experiences
- Target Networks: Stabilizing training with separate target and online networks
- Epsilon-Greedy Exploration: Balancing exploration and exploitation
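The last point can be sketched concretely. This is a minimal, illustrative epsilon-greedy selector (the function name and defaults are ours, not the project's actual code):

```python
import random

def select_action(q_values, epsilon, num_actions=4):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(num_actions)  # explore
    # exploit: index of the largest Q-value
    return max(range(num_actions), key=lambda a: q_values[a])
```

Early in training epsilon is high, so most actions are random; as it decays, the agent increasingly exploits its learned Q-values.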
```
┌─────────────┐
│  ChaseEnv   │  (Grid-based game environment)
└─────────────┘
       ↓
┌─────────────┐
│  DQN Model  │  (Neural network for Q-value estimation)
└─────────────┘
       ↓
┌─────────────┐
│ ReplayBuffer│  (Experience storage and sampling)
└─────────────┘
       ↓
┌─────────────┐
│  Training   │  (DQN training loop)
└─────────────┘
```
- env.py: Chase game environment with Pygame visualization
  - 5×5 grid-based pursuit-evasion game
  - Player vs. deterministic enemy AI
  - Reward: +1 for survival, -10 for being caught
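A deterministic pursuer of this kind typically steps one cell toward the player each turn. A minimal sketch of that chase rule and the stated reward scheme (illustrative names, not the actual env.py code):

```python
def enemy_step(enemy, player):
    """Move the enemy one cell toward the player along the larger axis gap."""
    ex, ey = enemy
    px, py = player
    if abs(px - ex) >= abs(py - ey):
        ex += (px > ex) - (px < ex)  # sign of the x gap
    else:
        ey += (py > ey) - (py < ey)  # sign of the y gap
    return ex, ey

def reward(enemy, player):
    """+1 for surviving the step, -10 if caught."""
    return -10.0 if enemy == player else 1.0
```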
- model.py: Deep Q-Network neural network
  - 2 hidden layers with 64 neurons each
  - ReLU activation functions
  - Input: 4D state [player_x, player_y, enemy_x, enemy_y]
  - Output: 4 Q-values (one per action)
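A network matching these stated dimensions can be sketched in PyTorch as follows (the actual class in model.py may be structured differently):

```python
import torch.nn as nn

class DQN(nn.Module):
    """4-D state in, 4 Q-values out, two 64-unit hidden layers with ReLU."""
    def __init__(self, state_dim=4, num_actions=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),  # raw Q-values, no final activation
        )

    def forward(self, x):
        return self.net(x)
```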
- buffer.py: Experience Replay Buffer
  - Stores up to 2000 transitions
  - Random sampling for mini-batch training
  - Reduces correlation between consecutive samples
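A buffer with this behavior can be sketched with a bounded deque and uniform sampling (illustrative code, not the actual buffer.py):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)."""
    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform random sampling breaks temporal correlation
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```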
- train.py: Main training loop
  - 20 episodes of gameplay
  - Epsilon-greedy action selection with decay
  - Target network updates every 5 episodes
  - Model saved after training
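The core gradient step inside such a loop computes a TD target from the target network and minimizes the squared error against the online network's Q-values. A hedged sketch (function name and batch layout are ours, not the project's):

```python
import torch

def dqn_update(online, target, optimizer, batch, gamma=0.99):
    """One gradient step on the TD error for a sampled mini-batch."""
    states, actions, rewards, next_states, dones = batch
    q = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # the target network supplies stable bootstrap values
        next_q = target(next_states).max(dim=1).values
        td_target = rewards + gamma * next_q * (1.0 - dones)
    loss = torch.nn.functional.mse_loss(q, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The periodic "target network update" mentioned above would then be a weight copy, e.g. `target.load_state_dict(online.state_dict())` every 5 episodes.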
```bash
# Clone or download the repository
cd DQN-Chase

# Install dependencies
pip install -r requirements.txt

# Train the DQN agent
python train.py
```

The training process will:
- Initialize the environment and networks
- Run 20 episodes of the game
- Display the game window with real-time rendering
- Print training progress (episode, score, epsilon)
- Save the trained model to dqn_model.pth
| Parameter | Value | Description |
|---|---|---|
| grid_size | 5 | Size of the game grid |
| num_episodes | 20 | Number of training episodes |
| max_steps | 100 | Maximum steps per episode |
| batch_size | 8 | Mini-batch size for training |
| gamma | 0.99 | Discount factor for future rewards |
| epsilon_min | 0.05 | Minimum exploration rate |
| epsilon_decay | 0.995 | Exploration decay rate per episode |
| learning_rate | 0.001 | Adam optimizer learning rate |
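The exploration schedule implied by epsilon_decay and epsilon_min is a multiplicative decay with a floor; a one-line sketch (illustrative, with the table's values as defaults):

```python
def decay_epsilon(epsilon, epsilon_decay=0.995, epsilon_min=0.05):
    """Multiplicative per-episode decay, floored at the minimum rate."""
    return max(epsilon_min, epsilon * epsilon_decay)
```

Starting from 1.0, epsilon shrinks by 0.5% each episode until it reaches 0.05.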
- Player (Blue Square): Controlled by the DQN agent; must evade the enemy
- Enemy (Red Square): Moves deterministically toward the player
- Actions: 0=Right, 1=Left, 2=Up, 3=Down
- Reward: +1.0 for each step survived, -10.0 if caught
- Episode Ends: When player is caught or max_steps reached
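The action encoding above maps naturally to coordinate deltas. A hypothetical mapping (it assumes x grows rightward and y grows downward, as is common in Pygame; the actual env.py may differ):

```python
# Hypothetical mapping of action indices to grid moves (dx, dy)
ACTION_DELTAS = {
    0: (1, 0),   # Right
    1: (-1, 0),  # Left
    2: (0, -1),  # Up
    3: (0, 1),   # Down
}

def move(pos, action, grid_size=5):
    """Apply an action, clamping the result to the grid bounds."""
    dx, dy = ACTION_DELTAS[action]
    x = min(max(pos[0] + dx, 0), grid_size - 1)
    y = min(max(pos[1] + dy, 0), grid_size - 1)
    return x, y
```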
After training, the model is saved as dqn_model.pth and console output includes:
```
Episode 1: Random chance: 0.995
Episode 1 finished | Score: 42.00 | Random chance: 0.995
Episode 5: Target network updated
...
Training finished. Model saved as 'dqn_model.pth'
```
- Increase gamma (toward 1.0) for longer-term planning
- Lower learning_rate for more stable training
- Increase batch_size for more accurate gradient estimates
- Increase num_episodes for better convergence
- Adjust epsilon_decay to control the exploration schedule
- Python 3.8+
- PyTorch 2.0+
- NumPy 1.20+
- Pygame 2.0+
See requirements.txt for specific versions.
- Larger grid environments (10×10)
- Multiple enemies
- Obstacles on the grid
- Training visualization with TensorBoard
- Policy evaluation metrics
- Double DQN for improved stability
- Dueling DQN architecture
This project is open source and available for educational purposes.
Author: Faedrop
Repository: DQN-Chase