Given a conditional distribution $P_{Y\mid X}$, I'd like to find the prior distribution $P_X$ that maximizes the mutual information $I(X;Y)$, where $P_Y(y)=\int P_{Y\mid X}(y\mid x)P_X(x) \, \text{d}x$, subject to the constraint $E_{P_X}[-\log(X)]=a$. Without the constraint, this corresponds to finding the channel capacity $C(X;Y):=\max_{P_X}I(X;Y)$.
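To make the constrained problem concrete, it can be written as a Lagrangian. This is only a restatement of the problem, not a solution, and the multipliers $\lambda$ and $\mu$ are notation introduced here for illustration:

$$\mathcal{L}(P_X,\lambda,\mu) = I(X;Y) - \lambda\left(E_{P_X}[-\log(X)] - a\right) - \mu\left(\int P_X(x) \, \text{d}x - 1\right)$$

Here $\lambda$ enforces the moment constraint and $\mu$ enforces that $P_X$ integrates to one; the non-negativity of $P_X$ would still have to be handled separately.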

In my particular case, $P_{Y\mid X}$ is a Bernoulli distribution with $X\in(0,1)$ as its parameter, i.e. $P(Y=1\mid X=x)=x$, and I'm looking for a distribution over that parameter.

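My intuition tells me that in my particular case this should be some Beta distribution, perhaps something like $\text{Beta}(1/a,1)$, which at least satisfies the constraint because $E[-\log(X)]=1/\alpha$ for $X\sim\text{Beta}(\alpha,1)$. However, I don't know how to approach such a problem, much less in the general case.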
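A candidate prior can at least be checked numerically. The following is a minimal sketch, assuming the parameter is discretized on a midpoint grid over $(0,1)$ and using NumPy; the function names, the grid size, and the value $a=0.5$ are illustrative choices, not part of the problem. It evaluates $I(X;Y)=H_b(E[X])-E[H_b(X)]$ for the Bernoulli case, where $H_b$ is the binary entropy, together with the constrained moment $E[-\log(X)]$. It only evaluates a given prior and does not show that the prior is optimal.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H_b(p) in nats, clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def mutual_information(x, w):
    """I(X;Y) in nats for Y | X=x ~ Bernoulli(x), with prior weights w on grid x."""
    p_y1 = np.sum(w * x)  # P(Y=1) = E[X]
    return binary_entropy(p_y1) - np.sum(w * binary_entropy(x))

def neg_log_moment(x, w):
    """E[-log X] under the discretized prior."""
    return np.sum(w * -np.log(x))

# Midpoint grid on (0, 1); n is an arbitrary illustrative resolution.
n = 2000
x = (np.arange(n) + 0.5) / n

# Candidate prior Beta(1/a, 1), whose density is (1/a) * x**(1/a - 1).
a = 0.5  # illustrative constraint value
density = (1 / a) * x ** (1 / a - 1)
w = density / density.sum()  # normalize to probability weights on the grid

print("I(X;Y)     =", mutual_information(x, w))
print("E[-log X]  =", neg_log_moment(x, w))
```

Comparing the printed mutual information against other priors with the same value of $E[-\log(X)]$ would give a rough sanity check on the Beta guess, though a finite grid cannot settle the question.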

Could anyone point me in the right direction?
