Science Fair Projects

Study and Analysis of Q-Learning Algorithm Parameters for Decision Making using a Developed Simulation Tool


The objective: This research studies the Q-Learning Algorithm (QLA) and develops simulation software to understand how to select its two parameters optimally: the learning rate and the weight of future rewards.

Hypotheses: 1) No learning occurs if (a) both the learning rate (alpha) and the weight of future rewards (gamma) are set to 1, or (b) either alpha or gamma is set to 0. 2) The optimal combination for finding an efficient path to the goal is alpha = 0.5 and gamma = 1. 3) If alpha and gamma sum to 1, the average computation time (time to reach the goal) is constant, regardless of environment complexity.
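To make the roles of alpha and gamma concrete, here is a minimal C++ sketch of the standard tabular Q-learning update. The function name qUpdate and its parameters are illustrative assumptions, not code from the project's simulation tool. It shows one arithmetic fact relevant to the first hypothesis: with alpha = 0 the stored value never changes, whatever the reward.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One tabular Q-learning update (standard textbook form):
//   Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
// qNext holds Q(s',a') for every action a' available in the next state;
// pass an empty vector when the next state is terminal.
// NOTE: qUpdate is a hypothetical helper written for illustration only.
double qUpdate(double qOld, double reward, const std::vector<double>& qNext,
               double alpha, double gamma) {
    assert(alpha >= 0.0 && alpha <= 1.0);  // learning rate
    assert(gamma >= 0.0 && gamma <= 1.0);  // weight of future rewards
    double bestNext =
        qNext.empty() ? 0.0 : *std::max_element(qNext.begin(), qNext.end());
    return qOld + alpha * (reward + gamma * bestNext - qOld);
}
```

For example, qUpdate(2.0, 10.0, {1.0, 4.0}, 0.0, 0.9) returns the old value 2.0 unchanged, while the same call with alpha = 0.5 and gamma = 1 moves the value halfway toward the target 10 + 4 = 14, giving 8.0. With gamma = 0 the future term drops out and only the immediate reward is used.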


QLA is a reinforcement learning (RL) method formulated on the Markov decision process (MDP) framework. The QLA pseudo-code forms the basis of this research. Other materials include a Windows-based laptop with 4 GB of RAM, a C++ compiler, and an environment in which to test the learning agent.


A virtual environment (i.e., a simulation tool) was written from scratch in C++. An AI (Artificial Intelligence) agent was tested within the environment using discrete values of alpha and gamma. Computation time was used to judge how well each combination of alpha and gamma found the optimal path.

1) The first hypothesis was confirmed: (a) with alpha and gamma both set to 1, all states became goal states, and (b) with either alpha or gamma set to 0, learning took an infinite amount of time. 2) The second hypothesis was rejected: the optimal combination was alpha = 0.9 and gamma = 1, which gave the shortest computation time. 3) The third hypothesis is currently under study.
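The kind of experiment described above can be sketched in a few dozen lines of C++. Everything here is an assumption made for illustration: the corridor environment, the reward of -1 per move, the uniformly random exploration, the episode count, and the names Corridor and trainAndMeasure. None of it comes from the project's actual simulation tool, and it measures greedy path length rather than wall-clock time.

```cpp
#include <algorithm>
#include <array>
#include <random>
#include <vector>

// Hypothetical test environment: a corridor of numStates cells.
// The agent starts in cell 0 and the goal is the last cell.
// Actions: 0 = left, 1 = right.
struct Corridor {
    int numStates;
    int goal() const { return numStates - 1; }
    int step(int state, int action) const {
        int next = state + (action == 1 ? 1 : -1);
        return next < 0 ? 0 : next;  // wall at the left end
    }
};

// Trains a Q-table using uniformly random actions (Q-learning is off-policy,
// so it can learn the greedy policy while exploring randomly), then follows
// the greedy policy from cell 0.
// Returns the number of greedy steps needed to reach the goal, or -1 if the
// goal is not reached within maxSteps.
int trainAndMeasure(double alpha, double gamma, int numStates = 6,
                    int episodes = 200, int maxSteps = 1000,
                    unsigned seed = 42) {
    Corridor env{numStates};
    std::vector<std::array<double, 2>> q(numStates,
                                         std::array<double, 2>{0.0, 0.0});
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pickAction(0, 1);

    for (int ep = 0; ep < episodes; ++ep) {
        int s = 0;
        for (int t = 0; t < maxSteps && s != env.goal(); ++t) {
            int a = pickAction(rng);
            int next = env.step(s, a);
            double reward = -1.0;  // assumed reward scheme: every move costs 1
            // The goal is terminal, so its future value is 0.
            double bestNext = (next == env.goal())
                                  ? 0.0
                                  : std::max(q[next][0], q[next][1]);
            q[s][a] += alpha * (reward + gamma * bestNext - q[s][a]);
            s = next;
        }
    }

    int s = 0;
    for (int steps = 0; steps < maxSteps; ++steps) {
        if (s == env.goal()) return steps;
        int a = (q[s][1] > q[s][0]) ? 1 : 0;  // ties go to "left"
        s = env.step(s, a);
    }
    return -1;
}
```

Calling trainAndMeasure(0.0, 0.9) leaves the Q-table at zero, so the greedy agent never leaves cell 0 and the function returns -1. With alpha = 0.9 the agent should learn the direct 5-step path in this corridor. Results for other settings depend heavily on the assumed reward scheme and environment, so they will not necessarily match the results reported above.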


In the AI domain, RL based on the MDP learns without supervision and combines mathematics and reasoning, computer algorithms, and software technology. It is emerging as an important area of interdisciplinary research, with potential applications in areas such as unmanned exploration, evolutionary research, and feature recognition. This research indicates that Q-Learning is a powerful technique that can be applied in these areas.

The Q-Learning Algorithm parameters (learning rate and weight of future rewards) were studied and analyzed using purpose-built simulation software to understand the effect of their combined values.

Science Fair Project done by Abhijit S. Fnu







Copyright © 2013 through 2015