Science Fair Projects

Study and Analysis of Q-Learning Algorithm Parameters for Decision Making using a Developed Simulation Tool


The objective: The objective of this research is to study the Q-Learning Algorithm (QLA) and to develop simulation software for understanding how to select optimal values of its two parameters: the learning rate and the weight of future rewards.

Hypotheses: 1) If (a) both the learning rate (alpha) and the weight of future rewards (gamma) are set to 1; and/or (b) either alpha or gamma is set to 0, there is no learning involved. 2) To find the most efficient path to the goal, the optimal combination of alpha and gamma is 0.5 and 1, respectively. 3) If the sum of alpha and gamma equals 1, the average computation time (time to reach the goal) is constant, regardless of environment complexity.
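The hypotheses can be read against the standard tabular Q-learning update rule. The abstract does not print the exact form used in the project, so the rule below is an assumption (the textbook form), with s' denoting the next state and r the reward received:

```latex
\begin{align*}
Q(s,a) &\leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[\,r + \gamma \max_{a'} Q(s',a')\right] \\[4pt]
\alpha = 0 &: \quad Q(s,a) \leftarrow Q(s,a) \\
\gamma = 0 &: \quad Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\, r \\
\alpha = \gamma = 1 &: \quad Q(s,a) \leftarrow r + \max_{a'} Q(s',a')
\end{align*}
```

Under this form, alpha = 0 leaves the table unchanged, gamma = 0 ignores everything beyond the immediate reward, and alpha = gamma = 1 discards the old estimate while passing future value back undiscounted.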


QLA is a reinforcement learning (RL) method formulated within the framework of the Markov decision process (MDP). QLA pseudo-code forms the basis of this research. Other materials include a Windows-based laptop with 4 GB of RAM, a C++ compiler, and an environment in which to test the learning agent.
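As a rough illustration of what the QLA pseudo-code reduces to, here is a minimal C++ sketch of a single tabular update. The project's actual code is not shown in the abstract, so the table sizes and the names `QTable` and `q_update` are hypothetical, and the textbook update rule is assumed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical toy dimensions; the original environment is not described.
constexpr std::size_t kStates = 4;
constexpr std::size_t kActions = 2;
using QTable = std::array<std::array<double, kActions>, kStates>;

// One tabular Q-learning update (assumed textbook form):
//   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
inline void q_update(QTable& q, std::size_t s, std::size_t a, double reward,
                     std::size_t s_next, double alpha, double gamma) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    assert(gamma >= 0.0 && gamma <= 1.0);
    // Best value currently estimated for the next state.
    const double best_next =
        *std::max_element(q[s_next].begin(), q[s_next].end());
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
}
```

For example, if the next state's entries are 2 and 5, the reward is 1, alpha is 0.5 and gamma is 1, an entry that starts at 0 moves to 0.5 * (1 + 5) = 3. With alpha set to 0 the same call leaves the entry untouched.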


A virtual environment (i.e. a simulation tool) was written from scratch in C++. An AI (Artificial Intelligence) agent was tested within the environment using discrete values of alpha and gamma. Computation time was used to judge the optimal path under the combined effect of the chosen alpha and gamma values. 1) The first hypothesis was proven correct: (a) with alpha and gamma both set to 1, all states became goal states, and (b) with either alpha or gamma set to 0, learning took an infinite amount of time. 2) The second hypothesis was proven incorrect: the optimal combination of alpha and gamma was 0.9 and 1, respectively, as computation time was shortest with these values. 3) The third hypothesis is currently under study.
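The kind of experiment described here can be sketched on a toy problem. The C++ code below is not the project's simulation tool. It assumes a hypothetical 1-D corridor with the goal at the right end, a reward of 1 on entering the goal and 0 otherwise, and the textbook update rule. To stay deterministic it replaces the exploring agent with exhaustive sweeps over every state-action pair. The function name `sweeps_until_learned` and the "strictly prefers right everywhere" success criterion are likewise assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the simulation tool: a corridor of n_states
// cells with the goal in the last cell. Actions: 0 = left, 1 = right.
// Returns the number of full sweeps until "right" is strictly preferred
// in every non-goal state, or -1 if that never happens within max_sweeps.
int sweeps_until_learned(std::size_t n_states, double alpha, double gamma,
                         int max_sweeps = 1000) {
    assert(n_states >= 2);
    const std::size_t goal = n_states - 1;
    // Q-table, value-initialized to zero; the goal row stays zero (terminal).
    std::vector<std::array<double, 2>> q(n_states);

    for (int sweep = 1; sweep <= max_sweeps; ++sweep) {
        for (std::size_t s = 0; s < goal; ++s) {
            for (std::size_t a = 0; a < 2; ++a) {
                // Moving left from cell 0 keeps the agent in cell 0.
                const std::size_t next =
                    (a == 1) ? s + 1 : (s == 0 ? 0 : s - 1);
                // Assumed reward scheme: 1 on entering the goal, else 0.
                const double reward = (next == goal) ? 1.0 : 0.0;
                const double best_next = std::max(q[next][0], q[next][1]);
                q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
            }
        }
        // Ties count as "not learned": the greedy choice must be unambiguous.
        bool learned = true;
        for (std::size_t s = 0; s < goal; ++s) {
            if (!(q[s][1] > q[s][0])) { learned = false; break; }
        }
        if (learned) return sweep;
    }
    return -1;
}
```

Calling `sweeps_until_learned(5, alpha, gamma)` over a grid of alpha and gamma values gives a crude, machine-independent proxy for computation time. In this toy setting, alpha = 0 or gamma = 0 never yields an unambiguous policy, and alpha = gamma = 1 drives every entry to the same value so that no action is preferred. Both outcomes depend on the assumed reward scheme and success criterion, so they illustrate the reported results rather than reproduce them.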


MDP-based RL in the AI domain learns without labeled training examples and draws on mathematics and reasoning, computer algorithms, and software technology. It is emerging as an important area of interdisciplinary research, with potential applications in areas such as unmanned exploration, evolutionary research, and feature recognition. The research confirms that Q-Learning is a powerful technique that can be applied in these areas.

The two Q-Learning Algorithm parameters, learning rate and weight of future rewards, were studied and analyzed with purpose-built simulation software in order to understand the effect of their combined values and to identify the optimal combination.

Science Fair Project done by Abhijit S. Fnu





Copyright © 2013 through 2015